{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "BK9dmG7MSOya" }, "source": [ "# Machine Learning Workshop\n", "## Workshop 5: Estimating Bike-Sharing Demand Using *Neural Networks*" ] },
{ "cell_type": "markdown", "metadata": { "id": "8PjU4ItOTaGr" }, "source": [ "# Introduction\n", "\n", "This activity revisits the [*Bike Sharing Demand*](https://www.kaggle.com/c/bike-sharing-demand) competition problem from Workshop 3.\n", "This time the estimates must be obtained with a *Multilayer Perceptron* (MLP). It is important to keep the *Root Mean Squared Logarithmic Error* (RMSLE) as the performance metric so that the results can be compared with those obtained in Workshop 3.\n", "\n", "Both the theoretical questions and the practical part of this activity are tied to the content of Chapter 10 (*Introduction to Artificial Neural Networks with Keras*) of the course textbook.\n" ] },
{ "cell_type": "markdown", "metadata": { "id": "hnXnqw5KS3zd" }, "source": [ "## Objectives\n", "\n", "\n", "* Work with MLP models using the [*Keras*](https://keras.io/api/) library.\n", "* Try out some of the tools available for hyperparameter search.\n", "* Interpret models with *Shapley Values*." 
] }, { "cell_type": "markdown", "metadata": { "id": "graphic-longitude" }, "source": [ "## Ways of working" ] },
{ "cell_type": "markdown", "metadata": { "id": "similar-surgery" }, "source": [ "#### Option 1: Work locally" ] },
{ "cell_type": "markdown", "metadata": { "id": "616868da" }, "source": [ "##### Downloading the data available on Kaggle" ] },
{ "cell_type": "markdown", "metadata": { "id": "9a76c306" }, "source": [ "To download the Bike Sharing Demand dataset (this requires the Kaggle API client to be installed and configured with your API token):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-04-06T08:45:00.703408Z", "start_time": "2022-04-06T08:44:59.289873Z" }, "id": "a25f6679", "outputId": "d70c8c1e-530a-45ef-e259-f3addee20c02", "scrolled": true }, "outputs": [], "source": [ "!kaggle competitions download -c bike-sharing-demand" ] },
{ "cell_type": "markdown", "metadata": { "id": "e60a6af5" }, "source": [ "Unzip the downloaded file:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-04-06T08:45:00.781065Z", "start_time": "2022-04-06T08:45:00.769278Z" } }, "outputs": [], "source": [ "import shutil\n", "shutil.unpack_archive('./bike-sharing-demand.zip', './')" ] },
{ "cell_type": "markdown", "metadata": { "id": "efficient-thailand" }, "source": [ "#### Option 2: Work in *Colab*" ] },
{ "cell_type": "markdown", "metadata": { "id": "compound-criminal" }, "source": [ "You can work in Google Colab. To do so, you need a **Google Drive** account and must run a notebook stored in that account; otherwise, the changes made during the session will not be saved. If you already have an account, open the notebook and then go to `File --> Save a copy in Drive`." ] },
{ "cell_type": "markdown", "metadata": { "id": "e_rith_Skga5" }, "source": [ "The following cell mounts your personal Google Drive:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 21875, "status": "ok", "timestamp": 1645451536176, "user": { "displayName": "Emiliano Acevedo", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s64", "userId": "09255842080725499836" }, "user_tz": 180 }, "id": "timely-power", "outputId": "9b878f94-05b8-4598-baa4-08a3e132868d" }, "outputs": [], "source": [ "from google.colab import drive\n", "drive.mount('/content/drive')" ] },
{ "cell_type": "markdown", "metadata": { "id": "9WC47WBdkeqj" }, "source": [ "Next, go to your [Kaggle](https://www.kaggle.com/) account (or create one if you have not done so yet), click the profile icon in the top-right corner of the screen, and select \"Your Account\" from the drop-down list. Then select the \"Account\" tab and click \"Create new API token\". A file named kaggle.json will be downloaded automatically to your downloads folder. This file contains your login credentials, which give you access to the API." ] },
{ "cell_type": "markdown", "metadata": { "id": "changing-enhancement" }, "source": [ "The following cell performs the setup needed to fetch data from the Kaggle platform. It will ask you to upload the kaggle.json file downloaded earlier." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 90 }, "executionInfo": { "elapsed": 31400, "status": "ok", "timestamp": 1645451632279, "user": { "displayName": "Emiliano Acevedo", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s64", "userId": "09255842080725499836" }, "user_tz": 180 }, "id": "convinced-person", "outputId": "59951021-dcb6-4622-e24e-062806a6ce7f" }, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')\n", "from google.colab import files\n", "\n", "# The file requested below enables the Kaggle API in the environment you are working in.\n", "# Download it from your Kaggle profile: in the API section, click 'Create New API Token'.\n", "\n", "uploaded = files.upload()\n", "\n", "for fn in uploaded.keys():\n", "    print('User uploaded file \"{name}\" with length {length} bytes'.format(\n", "        name=fn, length=len(uploaded[fn])))\n", "\n", "# Then move kaggle.json into the folder where the API expects to find it.\n", "!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json" ] },
{ "cell_type": "markdown", "metadata": { "id": "fossil-australian" }, "source": [ "Once the *token* is saved, the data can be downloaded; in this case, the Bike Sharing Demand dataset:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-04-06T08:18:39.849067Z", "start_time": "2022-04-06T08:18:38.482792Z" }, "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 2104, "status": "ok", "timestamp": 1645451663400, "user": { "displayName": "Emiliano Acevedo", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s64", "userId": "09255842080725499836" }, "user_tz": 180 }, "id": "independent-eagle", "outputId": "bfe00bd3-81a5-487f-e60f-8b6e2ce1a189" }, "outputs": [], "source": [ "!kaggle competitions download -c bike-sharing-demand" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-04-06T08:19:07.842906Z", "start_time": "2022-04-06T08:18:41.405686Z" } }, "outputs": [], "source": [ "!unzip bike-sharing-demand.zip" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Packages to use" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This activity uses some auxiliary libraries that must be installed. Re-run the following cell until it executes without errors. If an error occurs, the missing package can be installed from the notebook with the command:\n", "\n", "`!pip install missing_package`" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-04-06T08:45:12.980879Z", "start_time": "2022-04-06T08:45:10.992955Z" } }, "outputs": [], "source": [ "# Import comet_ml at the top of the file, before the other libraries.\n", "from comet_ml import Experiment\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "#import seaborn as sns\n", "#sns.set_theme(style=\"whitegrid\")\n", "\n", "df_train = pd.read_csv('train.csv')\n", "df_test = pd.read_csv('test.csv')\n", "df_submission = pd.read_csv('sampleSubmission.csv')" ] },
{ "cell_type": "markdown", "metadata": { "id": "HQi0j-avej0Q" }, "source": [ "## Part 1 - Data processing\n", "\n", "Since you are already familiar with the data, the same preprocessing used in Workshop 3 is implemented here." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-04-06T08:45:17.367597Z", "start_time": "2022-04-06T08:45:17.333795Z" }, "id": "WAJk7qqqphJa" }, "outputs": [], "source": [ "df_train['datetime'] = pd.to_datetime(df_train['datetime'])\n", "\n", "# Extract calendar features from the timestamp.\n", "df_train['hour'] = df_train['datetime'].dt.hour\n", "df_train['weekday'] = df_train['datetime'].dt.weekday\n", "df_train['month'] = df_train['datetime'].dt.month\n", "df_train['year'] = df_train['datetime'].dt.year\n", "\n", "y_train_full = df_train['count']\n", "# 'casual' and 'registered' must be dropped: they are directly related to the target column and do not appear in the *test* set.\n", "df_train = df_train.drop(['datetime', 'casual', 'registered', 'count'], axis=1)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "- Since you will be working with neural networks, do you think any change to the preprocessing is advisable?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-04-06T08:45:19.115054Z", "start_time": "2022-04-06T08:45:19.108573Z" }, "id": "3RotPZI9O67F" }, "outputs": [], "source": [] },
{ "cell_type": "markdown", "metadata": { "id": "nyhdy7JagA0k" }, "source": [ "## Part 2 - Multilayer Perceptron (MLP)\n", "\n", "Following the example in the section *Building a Regression MLP Using the Sequential API*:\n", "\n", "\n", "* Implement an estimator that keeps the hyperparameters of the example.\n", "* What is the total number of trainable parameters in the network?\n", "* Randomly select *10%* of the data for validation, and plot the training and validation *loss* (*Mean Squared Logarithmic Error*).\n", "\n", "***Note:** Observe that the example adds a normalization layer.*" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "metadata": { "id": "jzchTK1jrIxA" }, "source": [ "## Part 3 - Fine-tuning\n", "\n", "Following the example in the section *Fine-Tuning Neural Network Hyperparameters*:\n", "\n", "\n", "* Use the *RandomSearch* tool from *KerasTuner* to search for the hyperparameters of the model implemented in *Keras*.\n", "* Try the *tip* suggested in the section *Number of Neurons per Hidden Layer* and comment on the results." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "metadata": { "id": "HmUt2QrywG60" }, "source": [ "## Part 4 - Fine-tuning (Optuna)\n", "\n", "* Use *Optuna* to search for the hyperparameters of the *Keras* model. You are encouraged to follow one of these examples: [*keras_simple*](https://github.com/optuna/optuna-examples/blob/main/keras/keras_simple.py), [OptunaSearchCV](https://github.com/optuna/optuna-examples/blob/main/sklearn/sklearn_optuna_search_cv_simple.py).\n", "\n", "***Note:** Optuna can also be used to optimize techniques other than neural networks.*\n", "\n", "***Note 2:** Keras Tuner supports Bayesian optimization.*" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0DM4D3m-1X64", "outputId": "b2ed6754-9b75-4a83-8e61-c3e8394b613a" }, "outputs": [], "source": [] },
{ "cell_type": "markdown", "metadata": { "id": "cQoU9uS1lkM0" }, "source": [ "## Part 5 - Pipeline\n", "\n", "\n", "\n", "* Incorporate the best-performing estimator into a *pipeline* similar to the one implemented in Workshop 3. The [scikeras](https://adriangb.com/scikeras/stable/migration.html) library may be useful.\n", "* Submit the results for the *test* data to the competition page.\n", "\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "hXPwnG-rhKN2", "outputId": "1f40a4c8-e48a-4cdb-bf22-03e86d33ae5e" }, "outputs": [], "source": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Part 6 - Shapley Values\n", "\n", "- Following the [example](https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html) presented in the section *Explaining a non-additive boosted tree model* of the *shap* library documentation, compute the Shapley values for the best model you have found.\n", "- Choose a sample and use the *shap.plots.waterfall* function to examine the explanation of the model's prediction.\n", "\n", "A more detailed explanation of Shapley values and their use in interpreting machine learning models can be found [here](https://christophm.github.io/interpretable-ml-book/shapley.html)." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "Taller5_demanda_de_bicicletas_con_NNs_solucion.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false }, "vscode": { "interpreter": { "hash": "cf9ed6eb9f80a715f753bb5491e7f879990bf814a1e5372a2cadce9b619c9f4f" } } }, "nbformat": 4, "nbformat_minor": 1 }