{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Part A [Optional]\n",
        "\n",
        "In this part you will implement a scikit-learn transformer from scratch.\n",
        "\n",
        "The goal is to understand how transformers work and to see scikit-learn's conventions in practice.\n",
        "\n",
        "A transformer is an object whose main method is `transform`, which applies a transformation to the input features."
      ],
      "metadata": {
        "id": "g0NDHbHleKBv"
      }
    },
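    {
      "cell_type": "markdown",
      "source": [
        "The fit/transform contract can be sketched in plain Python before bringing in scikit-learn. This is only an illustration under stated assumptions, not scikit-learn's implementation: the class name `TinyScaler` and the sample data are hypothetical. `fit` learns per-column statistics from the training data and returns `self`; `transform` reuses those stored statistics on any later data.\n",
        "\n",
        "```python\n",
        "from statistics import fmean, pstdev\n",
        "\n",
        "\n",
        "class TinyScaler:\n",
        "    # Hypothetical name, used for illustration only.\n",
        "\n",
        "    def fit(self, rows):\n",
        "        # zip(*rows) yields the columns of a list of rows.\n",
        "        columns = list(zip(*rows))\n",
        "        self.mean_ = [fmean(col) for col in columns]\n",
        "        # Population standard deviation (divides by n, not n - 1).\n",
        "        self.std_ = [pstdev(col) for col in columns]\n",
        "        return self  # returning self allows chaining fit(...).transform(...)\n",
        "\n",
        "    def transform(self, rows):\n",
        "        # Assumes every column has a non-zero standard deviation.\n",
        "        return [\n",
        "            [(x - u) / s for x, u, s in zip(row, self.mean_, self.std_)]\n",
        "            for row in rows\n",
        "        ]\n",
        "\n",
        "\n",
        "train = [[0, 0], [0, 0], [1, 1], [1, 1]]\n",
        "scaler = TinyScaler().fit(train)\n",
        "scaled_train = scaler.transform(train)\n",
        "# New data is scaled with the statistics learned from the training data.\n",
        "scaled_new = scaler.transform([[2, 2]])\n",
        "print(scaled_train)\n",
        "print(scaled_new)\n",
        "```\n",
        "\n",
        "The statistics are computed only in `fit`, so `transform` never looks at the distribution of the data it receives."
      ],
      "metadata": {}
    },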
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tYJeFWHVK8a8"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "from sklearn.base import BaseEstimator, TransformerMixin\n",
        "\n",
        "# Note that the class inherits from BaseEstimator and TransformerMixin.\n",
        "# This provides extra methods \"for free\" (get_params/set_params and\n",
        "# fit_transform) and keeps the transformer compatible with the rest of\n",
        "# scikit-learn, as long as we implement fit/transform and follow its conventions.\n",
        "class CustomStandardScaler(BaseEstimator, TransformerMixin):\n",
        "\n",
        "    \"\"\"Standardize features by removing the mean and scaling to unit variance.\n",
        "\n",
        "    This is a custom implementation of sklearn.preprocessing.StandardScaler for\n",
        "    learning purposes only.\n",
        "\n",
        "    The standard score of a sample `x` is calculated as:\n",
        "\n",
        "        z = (x - u) / s\n",
        "\n",
        "    where `u` is the mean of the training samples or zero if `with_mean=False`,\n",
        "    and `s` is the standard deviation of the training samples or one if\n",
        "    `with_std=False`.\n",
        "\n",
        "    Centering and scaling happen independently on each feature by computing\n",
        "    the relevant statistics on the samples in the training set. Mean and\n",
        "    standard deviation are then stored to be used on later data using\n",
        "    :meth:`transform`.\n",
        "\n",
        "    Parameters\n",
        "    ----------\n",
        "    with_mean : bool, default=True\n",
        "        If True, center the data before scaling.\n",
        "\n",
        "    with_std : bool, default=True\n",
        "        If True, scale the data to unit variance (or equivalently,\n",
        "        unit standard deviation).\n",
        "\n",
        "    Attributes\n",
        "    ----------\n",
        "    mean_ : ndarray of shape (n_features,) or None\n",
        "        The mean value for each feature in the training set.\n",
        "        Equal to ``None`` when ``with_mean=False``.\n",
        "\n",
        "    std_ : ndarray of shape (n_features,) or None\n",
        "        The standard deviation for each feature in the training set.\n",
        "        Equal to ``None`` when ``with_std=False``.\n",
        "\n",
        "\n",
        "    See Also\n",
        "    --------\n",
        "    sklearn.preprocessing.StandardScaler : Original transformer from scikit-learn.\n",
        "\n",
        "    Examples\n",
        "    --------\n",
        "    >>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]\n",
        "    >>> scaler = CustomStandardScaler()\n",
        "    >>> print(scaler.fit(data))\n",
        "    CustomStandardScaler()\n",
        "    >>> print(scaler.mean_)\n",
        "    [0.5 0.5]\n",
        "    >>> print(scaler.transform(data))\n",
        "    [[-1. -1.]\n",
        "     [-1. -1.]\n",
        "     [ 1.  1.]\n",
        "     [ 1.  1.]]\n",
        "    >>> print(scaler.transform([[2, 2]]))\n",
        "    [[3. 3.]]\n",
        "    \"\"\"\n",
        "    # Every input parameter must have a default value, and every\n",
        "    # parameter the transformer needs is passed through __init__.\n",
        "    def __init__(self, with_mean=True, with_std=True):\n",
        "        super().__init__()\n",
        "        # Store each input parameter in an attribute with the same name and\n",
        "        # exactly the value the user passed.\n",
        "        # Example: an input parameter called pepe is stored as self.pepe.\n",
        "\n",
        "        # == your code starts here ====\n",
        "        self.with_mean = with_mean\n",
        "        self.with_std = with_std\n",
        "        # == your code ends here ====\n",
        "\n",
        "        # Computed attributes are named with a trailing underscore.\n",
        "        self.mean_ = None\n",
        "        self.std_ = None\n",
        "\n",
        "    def fit(self, X, y=None):\n",
        "\n",
        "        # Compute the per-feature mean and standard deviation of X and store\n",
        "        # them in self.mean_ and self.std_ respectively, depending on the\n",
        "        # parameters given by the user.\n",
        "\n",
        "        # == your code starts here ====\n",
        "        # One possible solution, using the population standard deviation (ddof=0).\n",
        "        X = np.asarray(X, dtype=float)\n",
        "        self.mean_ = X.mean(axis=0) if self.with_mean else None\n",
        "        self.std_ = X.std(axis=0) if self.with_std else None\n",
        "        # == your code ends here ====\n",
        "\n",
        "        # IMPORTANT: fit always returns self.\n",
        "        return self\n",
        "\n",
        "    def transform(self, X, y=None):\n",
        "        # Subtract the mean self.mean_ and divide by the standard deviation\n",
        "        # self.std_, depending on the parameters given by the user.\n",
        "        # == your code starts here ====\n",
        "        # One possible solution; it assumes no feature is constant (std_ > 0).\n",
        "        X = np.asarray(X, dtype=float)\n",
        "        if self.with_mean:\n",
        "            X = X - self.mean_\n",
        "        if self.with_std:\n",
        "            X = X / self.std_\n",
        "        # == your code ends here ====\n",
        "        return X"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "In the next cell we compare the `CustomStandardScaler` implemented above with the one provided by scikit-learn, to verify that both produce the same output and that our implementation is correct."
      ],
      "metadata": {
        "id": "KyV8ytO6gOxN"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.preprocessing import StandardScaler\n",
        "\n",
        "np.random.seed(0)\n",
        "X = np.random.normal(5, 2, (200, 8))\n",
        "\n",
        "for with_mean in [True, False]:\n",
        "    for with_std in [True, False]:\n",
        "        X_sklearn = StandardScaler(with_mean=with_mean, with_std=with_std).fit(X).transform(X)\n",
        "        X_nuestro = CustomStandardScaler(with_mean=with_mean, with_std=with_std).fit(X).transform(X)\n",
        "\n",
        "        # Compare with a floating-point tolerance: two implementations can\n",
        "        # accumulate rounding error differently, so exact equality is too strict.\n",
        "        assert np.allclose(X_sklearn, X_nuestro), (with_mean, with_std)"
      ],
      "metadata": {
        "id": "-8qPmPyjShId"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Next we take a dataset, iris, and train a classifier. The goal is to use our `CustomStandardScaler` in a grid search later on.\n",
        "\n",
        "## Questions:\n",
        "In the call to `train_test_split`:\n",
        "- What does the `stratify=y` parameter do? Why is it important?\n",
        "- What does the `shuffle=True` parameter do? Why is it important?"
      ],
      "metadata": {
        "id": "h2t7zcJGge6M"
      }
    },
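    {
      "cell_type": "markdown",
      "source": [
        "As a hint for both questions, the sketch below contrasts a naive split with a shuffled, stratified one on a toy label list. It is written in plain Python for illustration and is not how `train_test_split` is implemented; the helper `stratified_test_indices` and the toy labels are hypothetical.\n",
        "\n",
        "```python\n",
        "import random\n",
        "from collections import Counter, defaultdict\n",
        "\n",
        "\n",
        "def stratified_test_indices(labels, test_fraction, seed=0):\n",
        "    # Hypothetical helper: sample the same fraction from every class.\n",
        "    rng = random.Random(seed)\n",
        "    by_class = defaultdict(list)\n",
        "    for index, label in enumerate(labels):\n",
        "        by_class[label].append(index)\n",
        "    test = []\n",
        "    for indices in by_class.values():\n",
        "        rng.shuffle(indices)  # shuffle within each class before sampling\n",
        "        test.extend(indices[: round(len(indices) * test_fraction)])\n",
        "    return sorted(test)\n",
        "\n",
        "\n",
        "# Toy labels sorted by class: 80 samples of class 0, then 20 of class 1.\n",
        "labels = [0] * 80 + [1] * 20\n",
        "\n",
        "# Without shuffling, taking the last 10% yields only class 1.\n",
        "naive_test = labels[-10:]\n",
        "\n",
        "# Stratified: the test set keeps the 80/20 class ratio.\n",
        "test_idx = stratified_test_indices(labels, test_fraction=0.1)\n",
        "stratified_test = [labels[i] for i in test_idx]\n",
        "\n",
        "print(Counter(naive_test))\n",
        "print(Counter(stratified_test))\n",
        "```\n",
        "\n",
        "Comparing the two counters shows why a split that ignores ordering and class proportions can give a test set that does not represent the data."
      ],
      "metadata": {}
    },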
    {
      "cell_type": "code",
      "source": [
        "from sklearn.datasets import load_iris\n",
        "from sklearn.model_selection import train_test_split\n",
        "\n",
        "iris = load_iris()\n",
        "\n",
        "y = iris.target\n",
        "X = iris.data\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y,\n",
        "                                                    test_size=0.1,\n",
        "                                                    random_state=0,\n",
        "                                                    stratify=y,\n",
        "                                                    shuffle=True)\n",
        "\n",
        "\n",
        "X_train.shape, y_train.shape"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "kqkdIHsiOE95",
        "outputId": "8ef04f01-eff0-49b9-d4d6-b796641d8c22"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "((135, 4), (135,))"
            ]
          },
          "metadata": {},
          "execution_count": 77
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Implement a classification pipeline that uses the `CustomStandardScaler` you developed. Run a grid search over the pipeline that tries, at a minimum, every combination of the parameters `CustomStandardScaler.with_mean` and `CustomStandardScaler.with_std`."
      ],
      "metadata": {
        "id": "-6oDbgQyhG4i"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.pipeline import Pipeline\n",
        "from sklearn.neighbors import KNeighborsClassifier\n",
        "from sklearn.model_selection import GridSearchCV\n",
        "\n",
        "pipe = Pipeline([\n",
        "    (\"scaler\", CustomStandardScaler()),\n",
        "    (\"clf\", KNeighborsClassifier()),  # final step: the classifier imported above\n",
        "])\n",
        "\n",
        "gs = GridSearchCV(\n",
        "    pipe,\n",
        "    {\n",
        "        \"scaler__with_mean\": [True, False],\n",
        "        \"scaler__with_std\": [True, False],\n",
        "        \"clf__n_neighbors\": [5, 10, 15],\n",
        "    },\n",
        "    cv=6,\n",
        "    n_jobs=-1,\n",
        "    scoring=(\"accuracy\", \"f1_macro\"),  # every metric we want to track\n",
        "    refit=\"accuracy\",  # the metric used to pick and refit the best model\n",
        ")\n",
        "\n",
        "gs.fit(X_train, y_train)\n",
        "print(gs.best_params_)\n",
        "print(gs.best_score_)"
      ],
      "metadata": {
        "id": "v68dx5hOORlm"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "To finish, we load all the runs into a single pandas DataFrame as a quick way to inspect the results."
      ],
      "metadata": {
        "id": "s3th-szjhk0T"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "# show every column when displaying a DataFrame\n",
        "pd.set_option('display.max_columns', None)\n",
        "# load the grid search results into a DataFrame\n",
        "pd.DataFrame(gs.cv_results_)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "TKQ48IgHPybR",
        "outputId": "84ded1e6-b3f5-4dfb-fa22-bc33aa76d137"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "    mean_fit_time  std_fit_time  mean_score_time  std_score_time  \\\n",
              "0        0.002353      0.000816         0.009110        0.003944   \n",
              "1        0.001512      0.000131         0.007573        0.004027   \n",
              "2        0.001884      0.000986         0.005648        0.002699   \n",
              "3        0.001446      0.000099         0.008071        0.004958   \n",
              "4        0.002101      0.001313         0.005209        0.000959   \n",
              "5        0.001772      0.000372         0.006268        0.001582   \n",
              "6        0.004500      0.003316         0.009153        0.003559   \n",
              "7        0.001475      0.000125         0.005685        0.001782   \n",
              "8        0.001441      0.000052         0.004896        0.000112   \n",
              "9        0.001500      0.000069         0.005179        0.000726   \n",
              "10       0.001516      0.000066         0.004809        0.000103   \n",
              "11       0.001328      0.000155         0.004345        0.000738   \n",
              "\n",
              "   param_clf__n_neighbors param_scaler__with_mean param_scaler__with_std  \\\n",
              "0                       5                    True                   True   \n",
              "1                       5                    True                  False   \n",
              "2                       5                   False                   True   \n",
              "3                       5                   False                  False   \n",
              "4                      10                    True                   True   \n",
              "5                      10                    True                  False   \n",
              "6                      10                   False                   True   \n",
              "7                      10                   False                  False   \n",
              "8                      15                    True                   True   \n",
              "9                      15                    True                  False   \n",
              "10                     15                   False                   True   \n",
              "11                     15                   False                  False   \n",
              "\n",
              "                                               params  split0_test_accuracy  \\\n",
              "0   {'clf__n_neighbors': 5, 'scaler__with_mean': T...              1.000000   \n",
              "1   {'clf__n_neighbors': 5, 'scaler__with_mean': T...              0.956522   \n",
              "2   {'clf__n_neighbors': 5, 'scaler__with_mean': F...              1.000000   \n",
              "3   {'clf__n_neighbors': 5, 'scaler__with_mean': F...              0.956522   \n",
              "4   {'clf__n_neighbors': 10, 'scaler__with_mean': ...              0.956522   \n",
              "5   {'clf__n_neighbors': 10, 'scaler__with_mean': ...              1.000000   \n",
              "6   {'clf__n_neighbors': 10, 'scaler__with_mean': ...              0.956522   \n",
              "7   {'clf__n_neighbors': 10, 'scaler__with_mean': ...              1.000000   \n",
              "8   {'clf__n_neighbors': 15, 'scaler__with_mean': ...              1.000000   \n",
              "9   {'clf__n_neighbors': 15, 'scaler__with_mean': ...              1.000000   \n",
              "10  {'clf__n_neighbors': 15, 'scaler__with_mean': ...              1.000000   \n",
              "11  {'clf__n_neighbors': 15, 'scaler__with_mean': ...              1.000000   \n",
              "\n",
              "    split1_test_accuracy  split2_test_accuracy  split3_test_accuracy  \\\n",
              "0               0.913043              1.000000              0.909091   \n",
              "1               0.913043              1.000000              0.954545   \n",
              "2               0.913043              1.000000              0.909091   \n",
              "3               0.913043              1.000000              0.954545   \n",
              "4               0.913043              0.956522              0.954545   \n",
              "5               0.956522              0.956522              0.909091   \n",
              "6               0.913043              0.956522              0.954545   \n",
              "7               0.956522              0.956522              0.909091   \n",
              "8               0.913043              1.000000              0.909091   \n",
              "9               0.956522              1.000000              0.909091   \n",
              "10              0.913043              1.000000              0.909091   \n",
              "11              0.956522              1.000000              0.909091   \n",
              "\n",
              "    split4_test_accuracy  split5_test_accuracy  mean_test_accuracy  \\\n",
              "0                    1.0                   1.0            0.970356   \n",
              "1                    1.0                   1.0            0.970685   \n",
              "2                    1.0                   1.0            0.970356   \n",
              "3                    1.0                   1.0            0.970685   \n",
              "4                    1.0                   1.0            0.963439   \n",
              "5                    1.0                   1.0            0.970356   \n",
              "6                    1.0                   1.0            0.963439   \n",
              "7                    1.0                   1.0            0.970356   \n",
              "8                    1.0                   1.0            0.970356   \n",
              "9                    1.0                   1.0            0.977602   \n",
              "10                   1.0                   1.0            0.970356   \n",
              "11                   1.0                   1.0            0.977602   \n",
              "\n",
              "    std_test_accuracy  rank_test_accuracy  split0_test_f1_macro  \\\n",
              "0            0.041939                   5              1.000000   \n",
              "1            0.032562                   3              0.955556   \n",
              "2            0.041939                   5              1.000000   \n",
              "3            0.032562                   3              0.955556   \n",
              "4            0.029966                  11              0.955556   \n",
              "5            0.033597                   5              1.000000   \n",
              "6            0.029966                  11              0.955556   \n",
              "7            0.033597                   5              1.000000   \n",
              "8            0.041939                   5              1.000000   \n",
              "9            0.034508                   1              1.000000   \n",
              "10           0.041939                   5              1.000000   \n",
              "11           0.034508                   1              1.000000   \n",
              "\n",
              "    split1_test_f1_macro  split2_test_f1_macro  split3_test_f1_macro  \\\n",
              "0               0.907407              1.000000              0.910714   \n",
              "1               0.907407              1.000000              0.954751   \n",
              "2               0.907407              1.000000              0.910714   \n",
              "3               0.907407              1.000000              0.954751   \n",
              "4               0.907407              0.954751              0.955556   \n",
              "5               0.954751              0.954751              0.910714   \n",
              "6               0.907407              0.954751              0.955556   \n",
              "7               0.954751              0.954751              0.910714   \n",
              "8               0.907407              1.000000              0.910714   \n",
              "9               0.954751              1.000000              0.907407   \n",
              "10              0.907407              1.000000              0.910714   \n",
              "11              0.954751              1.000000              0.907407   \n",
              "\n",
              "    split4_test_f1_macro  split5_test_f1_macro  mean_test_f1_macro  \\\n",
              "0                    1.0                   1.0            0.969687   \n",
              "1                    1.0                   1.0            0.969619   \n",
              "2                    1.0                   1.0            0.969687   \n",
              "3                    1.0                   1.0            0.969619   \n",
              "4                    1.0                   1.0            0.962212   \n",
              "5                    1.0                   1.0            0.970036   \n",
              "6                    1.0                   1.0            0.962212   \n",
              "7                    1.0                   1.0            0.970036   \n",
              "8                    1.0                   1.0            0.969687   \n",
              "9                    1.0                   1.0            0.977026   \n",
              "10                   1.0                   1.0            0.969687   \n",
              "11                   1.0                   1.0            0.977026   \n",
              "\n",
              "    std_test_f1_macro  rank_test_f1_macro  \n",
              "0            0.042880                   5  \n",
              "1            0.034298                   9  \n",
              "2            0.042880                   5  \n",
              "3            0.034298                   9  \n",
              "4            0.031632                  11  \n",
              "5            0.033366                   3  \n",
              "6            0.031632                  11  \n",
              "7            0.033366                   3  \n",
              "8            0.042880                   5  \n",
              "9            0.035247                   1  \n",
              "10           0.042880                   5  \n",
              "11           0.035247                   1  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-f09bea61-c8ad-4808-bab0-20064caea745\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>mean_fit_time</th>\n",
              "      <th>std_fit_time</th>\n",
              "      <th>mean_score_time</th>\n",
              "      <th>std_score_time</th>\n",
              "      <th>param_clf__n_neighbors</th>\n",
              "      <th>param_scaler__with_mean</th>\n",
              "      <th>param_scaler__with_std</th>\n",
              "      <th>params</th>\n",
              "      <th>split0_test_accuracy</th>\n",
              "      <th>split1_test_accuracy</th>\n",
              "      <th>split2_test_accuracy</th>\n",
              "      <th>split3_test_accuracy</th>\n",
              "      <th>split4_test_accuracy</th>\n",
              "      <th>split5_test_accuracy</th>\n",
              "      <th>mean_test_accuracy</th>\n",
              "      <th>std_test_accuracy</th>\n",
              "      <th>rank_test_accuracy</th>\n",
              "      <th>split0_test_f1_macro</th>\n",
              "      <th>split1_test_f1_macro</th>\n",
              "      <th>split2_test_f1_macro</th>\n",
              "      <th>split3_test_f1_macro</th>\n",
              "      <th>split4_test_f1_macro</th>\n",
              "      <th>split5_test_f1_macro</th>\n",
              "      <th>mean_test_f1_macro</th>\n",
              "      <th>std_test_f1_macro</th>\n",
              "      <th>rank_test_f1_macro</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0.002353</td>\n",
              "      <td>0.000816</td>\n",
              "      <td>0.009110</td>\n",
              "      <td>0.003944</td>\n",
              "      <td>5</td>\n",
              "      <td>True</td>\n",
              "      <td>True</td>\n",
              "      <td>{'clf__n_neighbors': 5, 'scaler__with_mean': T...</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.913043</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.909091</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.970356</td>\n",
              "      <td>0.041939</td>\n",
              "      <td>5</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.907407</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.910714</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.969687</td>\n",
              "      <td>0.042880</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>0.001512</td>\n",
              "      <td>0.000131</td>\n",
              "      <td>0.007573</td>\n",
              "      <td>0.004027</td>\n",
              "      <td>5</td>\n",
              "      <td>True</td>\n",
              "      <td>False</td>\n",
              "      <td>{'clf__n_neighbors': 5, 'scaler__with_mean': T...</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>0.913043</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.954545</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.970685</td>\n",
              "      <td>0.032562</td>\n",
              "      <td>3</td>\n",
              "      <td>0.955556</td>\n",
              "      <td>0.907407</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.954751</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.969619</td>\n",
              "      <td>0.034298</td>\n",
              "      <td>9</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>0.001884</td>\n",
              "      <td>0.000986</td>\n",
              "      <td>0.005648</td>\n",
              "      <td>0.002699</td>\n",
              "      <td>5</td>\n",
              "      <td>False</td>\n",
              "      <td>True</td>\n",
              "      <td>{'clf__n_neighbors': 5, 'scaler__with_mean': F...</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.913043</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.909091</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.970356</td>\n",
              "      <td>0.041939</td>\n",
              "      <td>5</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.907407</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.910714</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.969687</td>\n",
              "      <td>0.042880</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>0.001446</td>\n",
              "      <td>0.000099</td>\n",
              "      <td>0.008071</td>\n",
              "      <td>0.004958</td>\n",
              "      <td>5</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>{'clf__n_neighbors': 5, 'scaler__with_mean': F...</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>0.913043</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.954545</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.970685</td>\n",
              "      <td>0.032562</td>\n",
              "      <td>3</td>\n",
              "      <td>0.955556</td>\n",
              "      <td>0.907407</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.954751</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.969619</td>\n",
              "      <td>0.034298</td>\n",
              "      <td>9</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>0.002101</td>\n",
              "      <td>0.001313</td>\n",
              "      <td>0.005209</td>\n",
              "      <td>0.000959</td>\n",
              "      <td>10</td>\n",
              "      <td>True</td>\n",
              "      <td>True</td>\n",
              "      <td>{'clf__n_neighbors': 10, 'scaler__with_mean': ...</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>0.913043</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>0.954545</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.963439</td>\n",
              "      <td>0.029966</td>\n",
              "      <td>11</td>\n",
              "      <td>0.955556</td>\n",
              "      <td>0.907407</td>\n",
              "      <td>0.954751</td>\n",
              "      <td>0.955556</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.962212</td>\n",
              "      <td>0.031632</td>\n",
              "      <td>11</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>0.001772</td>\n",
              "      <td>0.000372</td>\n",
              "      <td>0.006268</td>\n",
              "      <td>0.001582</td>\n",
              "      <td>10</td>\n",
              "      <td>True</td>\n",
              "      <td>False</td>\n",
              "      <td>{'clf__n_neighbors': 10, 'scaler__with_mean': ...</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>0.909091</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.970356</td>\n",
              "      <td>0.033597</td>\n",
              "      <td>5</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.954751</td>\n",
              "      <td>0.954751</td>\n",
              "      <td>0.910714</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.970036</td>\n",
              "      <td>0.033366</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>0.004500</td>\n",
              "      <td>0.003316</td>\n",
              "      <td>0.009153</td>\n",
              "      <td>0.003559</td>\n",
              "      <td>10</td>\n",
              "      <td>False</td>\n",
              "      <td>True</td>\n",
              "      <td>{'clf__n_neighbors': 10, 'scaler__with_mean': ...</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>0.913043</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>0.954545</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.963439</td>\n",
              "      <td>0.029966</td>\n",
              "      <td>11</td>\n",
              "      <td>0.955556</td>\n",
              "      <td>0.907407</td>\n",
              "      <td>0.954751</td>\n",
              "      <td>0.955556</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.962212</td>\n",
              "      <td>0.031632</td>\n",
              "      <td>11</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>0.001475</td>\n",
              "      <td>0.000125</td>\n",
              "      <td>0.005685</td>\n",
              "      <td>0.001782</td>\n",
              "      <td>10</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>{'clf__n_neighbors': 10, 'scaler__with_mean': ...</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>0.909091</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.970356</td>\n",
              "      <td>0.033597</td>\n",
              "      <td>5</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.954751</td>\n",
              "      <td>0.954751</td>\n",
              "      <td>0.910714</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.970036</td>\n",
              "      <td>0.033366</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>0.001441</td>\n",
              "      <td>0.000052</td>\n",
              "      <td>0.004896</td>\n",
              "      <td>0.000112</td>\n",
              "      <td>15</td>\n",
              "      <td>True</td>\n",
              "      <td>True</td>\n",
              "      <td>{'clf__n_neighbors': 15, 'scaler__with_mean': ...</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.913043</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.909091</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.970356</td>\n",
              "      <td>0.041939</td>\n",
              "      <td>5</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.907407</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.910714</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.969687</td>\n",
              "      <td>0.042880</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>0.001500</td>\n",
              "      <td>0.000069</td>\n",
              "      <td>0.005179</td>\n",
              "      <td>0.000726</td>\n",
              "      <td>15</td>\n",
              "      <td>True</td>\n",
              "      <td>False</td>\n",
              "      <td>{'clf__n_neighbors': 15, 'scaler__with_mean': ...</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.909091</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.977602</td>\n",
              "      <td>0.034508</td>\n",
              "      <td>1</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.954751</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.907407</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.977026</td>\n",
              "      <td>0.035247</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>0.001516</td>\n",
              "      <td>0.000066</td>\n",
              "      <td>0.004809</td>\n",
              "      <td>0.000103</td>\n",
              "      <td>15</td>\n",
              "      <td>False</td>\n",
              "      <td>True</td>\n",
              "      <td>{'clf__n_neighbors': 15, 'scaler__with_mean': ...</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.913043</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.909091</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.970356</td>\n",
              "      <td>0.041939</td>\n",
              "      <td>5</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.907407</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.910714</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.969687</td>\n",
              "      <td>0.042880</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>0.001328</td>\n",
              "      <td>0.000155</td>\n",
              "      <td>0.004345</td>\n",
              "      <td>0.000738</td>\n",
              "      <td>15</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>{'clf__n_neighbors': 15, 'scaler__with_mean': ...</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.956522</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.909091</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.977602</td>\n",
              "      <td>0.034508</td>\n",
              "      <td>1</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.954751</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.907407</td>\n",
              "      <td>1.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0.977026</td>\n",
              "      <td>0.035247</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-f09bea61-c8ad-4808-bab0-20064caea745')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-f09bea61-c8ad-4808-bab0-20064caea745 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-f09bea61-c8ad-4808-bab0-20064caea745');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-8dddc715-dc15-42cb-8469-af174d648407\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-8dddc715-dc15-42cb-8469-af174d648407')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const charts = await google.colab.kernel.invokeFunction(\n",
              "          'suggestCharts', [key], {});\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-8dddc715-dc15-42cb-8469-af174d648407 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 79
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import classification_report\n",
        "y_pred = gs.predict(X_test)\n",
        "\n",
        "print(classification_report(y_test, y_pred))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "BWgXdyuuUScw",
        "outputId": "3bc46983-1d53-463e-ef00-f26ed1545235"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "              precision    recall  f1-score   support\n",
            "\n",
            "           0       1.00      1.00      1.00         5\n",
            "           1       1.00      1.00      1.00         5\n",
            "           2       1.00      1.00      1.00         5\n",
            "\n",
            "    accuracy                           1.00        15\n",
            "   macro avg       1.00      1.00      1.00        15\n",
            "weighted avg       1.00      1.00      1.00        15\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Parte B\n",
        "\n",
        "En esta parte vamos a poner en práctica los conceptos vistos en clase. Vamos a usar el daraset de `california housing`. Este dataset es para estimar precios promedios de casas en California, pero, lo convertiremos en un problema de clasificacion binaria: casas baratas vs casas baratas, en función de si su precio es mayor o no que el promedio de precios del dataset."
      ],
      "metadata": {
        "id": "aICK6TUQiLgm"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(fetch_california_housing().DESCR)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "qXqa_DSFcQrg",
        "outputId": "3c49794c-ad32-4408-c080-63afc6152276"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            ".. _california_housing_dataset:\n",
            "\n",
            "California Housing dataset\n",
            "--------------------------\n",
            "\n",
            "**Data Set Characteristics:**\n",
            "\n",
            "    :Number of Instances: 20640\n",
            "\n",
            "    :Number of Attributes: 8 numeric, predictive attributes and the target\n",
            "\n",
            "    :Attribute Information:\n",
            "        - MedInc        median income in block group\n",
            "        - HouseAge      median house age in block group\n",
            "        - AveRooms      average number of rooms per household\n",
            "        - AveBedrms     average number of bedrooms per household\n",
            "        - Population    block group population\n",
            "        - AveOccup      average number of household members\n",
            "        - Latitude      block group latitude\n",
            "        - Longitude     block group longitude\n",
            "\n",
            "    :Missing Attribute Values: None\n",
            "\n",
            "This dataset was obtained from the StatLib repository.\n",
            "https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html\n",
            "\n",
            "The target variable is the median house value for California districts,\n",
            "expressed in hundreds of thousands of dollars ($100,000).\n",
            "\n",
            "This dataset was derived from the 1990 U.S. census, using one row per census\n",
            "block group. A block group is the smallest geographical unit for which the U.S.\n",
            "Census Bureau publishes sample data (a block group typically has a population\n",
            "of 600 to 3,000 people).\n",
            "\n",
            "A household is a group of people residing within a home. Since the average\n",
            "number of rooms and bedrooms in this dataset are provided per household, these\n",
            "columns may take surprisingly large values for block groups with few households\n",
            "and many empty houses, such as vacation resorts.\n",
            "\n",
            "It can be downloaded/loaded using the\n",
            ":func:`sklearn.datasets.fetch_california_housing` function.\n",
            "\n",
            ".. topic:: References\n",
            "\n",
            "    - Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,\n",
            "      Statistics and Probability Letters, 33 (1997) 291-297\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.datasets import fetch_california_housing\n",
        "from sklearn.model_selection import train_test_split\n",
        "\n",
        "X, y = fetch_california_housing(return_X_y=True, as_frame=False)\n",
        "\n",
        "y_mean = np.mean(y)\n",
        "\n",
        "# pasamos el target a binario: 0 o 1\n",
        "y[y<=y_mean] = 0\n",
        "y[y>y_mean] = 1\n",
        "y = y.astype(int)\n",
        "\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y,\n",
        "                                                    test_size=0.1,\n",
        "                                                    random_state=0,\n",
        "                                                    stratify=y,\n",
        "                                                    shuffle=True)\n",
        "X_train.shape, y_train.shape"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "eyRcP85XYole",
        "outputId": "3cba3d9b-22fe-41bd-a94a-82d58c28c975"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "((18576, 8), (18576,))"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Vamos a penalizar distintos los errores:\n",
        "- Una casa cara que es clasificada como barata, va a tener un costo de 1\n",
        "- Una casa barata que es clasificada como cara, va a tener un costo de 2\n",
        "\n",
        "Definir la matriz de costos como un numpy array.\n",
        "\n",
        "Con esta matriz, implementar la `expected_cost_los`: el costo esperado visto en clase:"
      ],
      "metadata": {
        "id": "6DxS9LbdjBpZ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import confusion_matrix, make_scorer\n",
        "\n",
        "COST_MATRIX = np.array([\n",
        "    [xx, yy],\n",
        "    [ww, zz]\n",
        "])\n",
        "\n",
        "assert COST_MATRIX.shape == (2, 2)\n",
        "\n",
        "def expected_cost_loss(y_true, y_pred):\n",
        "    # == su codigo empieza aqui ====\n",
        "\n",
        "    cost =\n",
        "    # == su codigo termina aqui ====\n",
        "    return cost\n",
        "\n",
        "expected_cost_scorer = make_scorer(expected_cost_loss, greater_is_better=False)"
      ],
      "metadata": {
        "id": "qny8-Gccbl18"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Implementar un pipeline para encontrar un clasificador probabilistico (o sea, asegurense que tenga disponible un `predict_proba`) y sus parámetros para minimizar el costo esperado definido anteriormente."
      ],
      "metadata": {
        "id": "JkHEd-SOjk8F"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.pipeline import Pipeline\n",
        "from sklearn.neighbors import KNeighborsClassifier\n",
        "from sklearn.model_selection import GridSearchCV\n",
        "\n",
        "# == su codigo empieza aqui ====\n",
        "pipe = Pipeline([\n",
        "\n",
        "])\n",
        "\n",
        "gs = GridSearchCV(\n",
        "\n",
        ")\n",
        "# == su codigo termina aqui ====\n",
        "gs.fit(X_train, y_train)\n",
        "print(gs.best_params_)\n",
        "print(gs.best_score_)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "gevd8mera_Gk",
        "outputId": "e255d744-33dd-4360-8515-8972b52fc6ec"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{'scaler__with_mean': True, 'scaler__with_std': True}\n",
            "-0.2635120585701981\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Evaluar el clasificador entrenado. Reportar el costo esperado y el costo esperado normalizado para el mejor clasificador encontrado."
      ],
      "metadata": {
        "id": "AZeNhXb5j6i4"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import classification_report\n",
        "\n",
        "# == su codigo empieza aqui ====\n",
        "# == su codigo termina aqui ===="
      ],
      "metadata": {
        "id": "Ge_7ABDJkYf2"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Reportar auc_ROC y auc_PR [opcional: graficarlos]"
      ],
      "metadata": {
        "id": "M_vp1lPd1Zsx"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# == su codigo empieza aqui ====\n",
        "# == su codigo termina aqui ===="
      ],
      "metadata": {
        "id": "w1tUtE1v1cUX"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "\n",
        "Para el clasificador entrenado, encontrar un threshold que minimice la funcion de costo esperado"
      ],
      "metadata": {
        "id": "WVQyMo7S1MVw"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# == su codigo empieza aqui ====\n",
        "# == su codigo termina aqui ===="
      ],
      "metadata": {
        "id": "tkiK2GmZknQy"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Su clasificador, esta bien calibrado? Calibrarlo. Mostrar el brier_score_loss antes y después ed calibrarlo. [opcional: mostrar diagramas de calibración antes y después de calibrarlo]"
      ],
      "metadata": {
        "id": "TP_4F7xG1oNR"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# == su codigo empieza aqui ====\n",
        "# == su codigo termina aqui ===="
      ],
      "metadata": {
        "id": "HuYJHH-C1ytM"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}