{"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.2"},"colab":{"provenance":[]}},"cells":[
{"cell_type":"markdown","metadata":{"id":"tmPdeNjs-usF"},"source":["# Class 2\n","In this class we will put into practice the concepts covered in the first class.\n","\n","We will use a well-known dataset: Titanic, which contains data about the ship's passengers. The goal is to train a binary classifier that, given the passenger data, correctly predicts whether each passenger survived.\n","\n","Let's start by downloading it in the following cell:"]},
{"cell_type":"code","metadata":{"id":"8hFEhzXLOt_1","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1692767441373,"user_tz":180,"elapsed":2450,"user":{"displayName":"Rodrigo Laguna","userId":"16454622511494825193"}},"outputId":"ec0c1dc5-7e90-4c05-c037-525f69415ff3"},"source":["! wget https://eva.fing.edu.uy/pluginfile.php/255092/mod_folder/content/0/titanic.txt"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["--2023-08-23 05:10:38-- https://eva.fing.edu.uy/pluginfile.php/255092/mod_folder/content/0/titanic.txt\n","Resolving eva.fing.edu.uy (eva.fing.edu.uy)... 164.73.32.9\n","Connecting to eva.fing.edu.uy (eva.fing.edu.uy)|164.73.32.9|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 116946 (114K) [text/plain]\n","Saving to: ‘titanic.txt.2’\n","\n","titanic.txt.2 100%[===================>] 114.21K 144KB/s in 0.8s \n","\n","2023-08-23 05:10:41 (144 KB/s) - ‘titanic.txt.2’ saved [116946/116946]\n","\n"]}]},
{"cell_type":"markdown","source":["The following cell checks that the download was successful. If you run into problems, you can download the file manually from the course EVA page, in the class 2 materials."],"metadata":{"id":"XdsYHtBOrFrq"}},
{"cell_type":"code","source":["import os\n","assert os.path.isfile(\"titanic.txt\"), \"The file was not downloaded! Check that the previous cell ran correctly.\""],"metadata":{"id":"lMgBki_gkLpa"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","metadata":{"id":"n_YDGSfw-usI"},"source":["Next we import some general-purpose libraries.\n","\n","_Note: it is always a good idea to import general-purpose libraries at the beginning._"]},
{"cell_type":"code","metadata":{"id":"F2AJS_Oz-usK"},"source":["import sklearn as sk\n","import numpy as np\n","import matplotlib.pyplot as plt\n","\n","RANDOM_STATE = 0"],"execution_count":null,"outputs":[]},
{"cell_type":"markdown","metadata":{"id":"iJBU6x7--usL"},"source":["## Data import and preprocessing\n","In every machine learning project it is essential to get to know the dataset.\n","Having a good picture of the data lets you determine, among other things:\n","* Which attributes are numerical and which are categorical.\n","* Whether the attributes need to be normalized (this also depends on the algorithm to be used).\n","* Whether there are missing values.\n","\n","A quick way to get this kind of overview with pandas is sketched in the next cell."]},
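{"cell_type":"markdown","metadata":{},"source":["The following is a minimal sketch (a reference only, not part of the assignment) of how such an overview can be obtained with pandas. It assumes the `titanic.txt` file downloaded above; the DataFrame name `df` is arbitrary:\n","\n","```python\n","import pandas as pd\n","\n","df = pd.read_csv('titanic.txt')\n","\n","print(df.dtypes)                   # numerical vs. object (often categorical) columns\n","print(df.describe(include='all'))  # basic statistics per column\n","print(df.isnull().mean() * 100)    # percentage of missing values per column\n","```"]},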
{"cell_type":"markdown","metadata":{"id":"wti5JaJH-usM"},"source":["### Importing the data\n","First we load the data.\n","\n","Looking at the file, can you identify columns or instances that are not needed?"]},
{"cell_type":"code","metadata":{"id":"GWBSuIrh-usN","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1692767442395,"user_tz":180,"elapsed":466,"user":{"displayName":"Rodrigo Laguna","userId":"16454622511494825193"}},"outputId":"f8ccf4ee-7304-401c-f073-85442c7e6df5"},"source":["import os\n","import pandas as pd\n","\n","assert os.path.isfile('titanic.txt'), 'titanic.txt not found; make sure it was downloaded'\n","\n","# read the dataset using pandas\n","data = pd.read_csv('titanic.txt')\n","# drop the row.names column, which only contains the row number\n","# we also drop the 'name' and 'home.dest' attributes,\n","# since they contain free text and we have not yet seen how to handle it\n","data.drop(['row.names', 'name', 'home.dest'], axis=1, inplace=True)\n","\n","print('Percentage of missing values per column:\\n')\n","print(data.isnull().mean()*100)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Percentage of missing values per column:\n","\n","pclass 0.000000\n","survived 0.000000\n","age 51.789794\n","embarked 37.471439\n","room 94.135567\n","ticket 94.744859\n","boat 73.571973\n","sex 0.000000\n","dtype: float64\n"]}]},
{"cell_type":"code","metadata":{"id":"LyMW796DWxuf"},"source":["# === Your code starts here ===\n","# Modify the columns list so that it contains only the attributes\n","# we want to use. Note that it currently holds all of them;\n","# remove the ones you consider useless\n","columns = ['pclass', 'embarked',\n","           'room', 'ticket', 'boat', 'sex', 'age']\n","\n","# === Your code ends here ===\n","\n","# keep the data as numpy arrays\n","X = data[columns].values\n","y = data['survived'].values\n","\n","print('X has shape', X.shape)\n","print('y has shape', y.shape)"],"execution_count":null,"outputs":[]},
{"cell_type":"markdown","metadata":{"id":"W3RjY6eBTR0N"},"source":["In the following cell, count how many cases there are in each class, in order to choose an appropriate metric:\n","\n","_Hint: there are several ways to do this; for example, you can use the [`np.unique`](https://numpy.org/doc/stable/reference/generated/numpy.unique.html) function, illustrated in the sketch right after this exercise._"]},
{"cell_type":"code","metadata":{"id":"EVMcTboQef94"},"source":["# === Your code starts here ===\n","\n","# === Your code ends here ==="],"execution_count":null,"outputs":[]},
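{"cell_type":"markdown","metadata":{},"source":["As a reference, here is a toy sketch (made-up labels, not the assignment data) of how `np.unique` with `return_counts=True` reports each distinct value and how many times it appears:\n","\n","```python\n","import numpy as np\n","\n","toy_labels = np.array([0, 1, 0, 0, 1, 0])\n","values, counts = np.unique(toy_labels, return_counts=True)\n","print(values)  # [0 1]\n","print(counts)  # [4 2]\n","```"]},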
{"cell_type":"markdown","metadata":{"id":"jAs6wOanc1yx"},"source":["Let's look at a few random examples, using the `show_some_samples` function:"]},
{"cell_type":"code","metadata":{"id":"OFlsWmESc9Eg"},"source":["def show_some_samples(X, y, columns=columns, n_samples=3, seed=None):\n","    \"\"\"\n","    Show random instances from X with their labels in y.\n","    X: dataset to sample, as a numpy array of shape (n_instances, n_features)\n","    y: labels, as a numpy array of shape (n_instances,)\n","    columns: list of strings with the name of each column, so len(columns) == n_features\n","    n_samples: number of samples to show\n","    seed: seed to set before choosing examples\n","    \"\"\"\n","    if seed is not None:\n","        np.random.seed(seed=seed)\n","\n","    idx = np.random.choice(len(X), n_samples)\n","\n","    for i, (x, t) in enumerate(zip(X[idx], y[idx])):\n","        print(f'==== idx {idx[i]:6d} :: target = {t} ====')\n","        for feat_name, feat_value in zip(columns, x):\n","            print(f'\\t{feat_name}: {feat_value}')\n","\n","# here is the invocation, you do not need to change anything\n","show_some_samples(X, y)"],"execution_count":null,"outputs":[]},
{"cell_type":"markdown","metadata":{"id":"cZ9ux4FAYxSq"},"source":["# Question A\n","\n","What should you do first?\n","\n","1. Fill in the missing values with the chosen policy (for example, the most frequent value)\n","2. Split the dataset into training and test sets\n","\n","**Justify your answer.**\n","\n","_Note: reorder the following cells according to what you think is most appropriate._\n"]},
{"cell_type":"markdown","metadata":{"id":"ofvGN3TGX3g_"},"source":["# 1: Split into train and test\n","In the following cell, use the `sklearn.model_selection.train_test_split` function to split the dataset into training and test sets. We will hold out 30% for test, using the constant `RANDOM_STATE` as the seed:"]},
{"cell_type":"code","metadata":{"id":"_uP92xlyYQ6_"},"source":["from sklearn.model_selection import train_test_split\n","\n","# === Your code starts here ===\n","# X_train, X_test, y_train, y_test = # complete the call\n","X_train, X_test, y_train, y_test =\n","# === Your code ends here ===\n","\n","# The following code is an automatic check that everything went well\n","# If no error is raised, everything is OK\n","assert len(X_train) == len(y_train), f'X_train and y_train should have the same number of elements: {len(X_train)} != {len(y_train)}'\n","assert len(X_test) == len(y_test), f'X_test and y_test should have the same number of elements: {len(X_test)} != {len(y_test)}'\n","assert X_train.shape[1] == X_test.shape[1], f'X_train and X_test should have the same attributes: {X_train.shape[1]} != {X_test.shape[1]}'\n","assert len(X) * 0.28 < len(X_test) < len(X) * 0.32, 'Check that the test set is 30%'"],"execution_count":null,"outputs":[]},
{"cell_type":"markdown","metadata":{"id":"a3hSa8VZZ1V3"},"source":["# 2: Impute missing values\n","Use the `sklearn.impute.SimpleImputer` class to fill in the missing attribute values.\n","\n","Try the `mean` and `most_frequent` strategies. A minimal usage sketch is shown in the next cell."]},
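{"cell_type":"markdown","metadata":{},"source":["As a reference, here is a minimal toy sketch (made-up numbers, not the assignment data) of how a `SimpleImputer` is fit and then applied. In the exercise, the imputer should be fit on the training data only:\n","\n","```python\n","from sklearn.impute import SimpleImputer\n","import numpy as np\n","\n","# toy column with one missing value\n","toy = np.array([[1.0], [2.0], [np.nan], [2.0]])\n","\n","imp = SimpleImputer(strategy='most_frequent')\n","imp.fit(toy)               # the statistics are learned from this data only\n","print(imp.transform(toy))  # the NaN becomes 2.0, the most frequent value\n","```"]},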
{"cell_type":"markdown","source":["# Question B\n","\n","**Which strategy do you think is more appropriate? Justify your answer.**"],"metadata":{"id":"gSTWVPAqt9eX"}},
{"cell_type":"code","metadata":{"id":"W2w2mlnfCXAG"},"source":["from sklearn.impute import SimpleImputer\n","\n","# === Your code starts here ===\n","# define the imputer and fit it\n","imputer =\n","# === Your code ends here ===\n","\n","X_train_fill = imputer.transform(X_train)\n","X_test_fill = imputer.transform(X_test)"],"execution_count":null,"outputs":[]},
{"cell_type":"markdown","metadata":{"id":"8BERiOyhk6o8"},"source":["In the following cell we look at the contents of each attribute: how many values of each kind there are.\n","\n","The goal of running it is to help us decide which attributes are categorical and how we are going to encode them."]},
{"cell_type":"code","metadata":{"id":"saYImSVLjPcY"},"source":["# Iterate over each column\n","for idx, clm in enumerate(columns):\n","    print(f'==={idx}: {clm}===')\n","    # For each column, count how many unique values there are\n","    unq, cnt = np.unique(X_train_fill[:, idx], return_counts=True)\n","    for u, c in zip(unq, cnt):\n","        # show each unique value, its count,\n","        # and the percentage of the dataset it represents\n","        print(f'\\t{u}: {c} - {100*c/cnt.sum():5.2f} %')"],"execution_count":null,"outputs":[]},
{"cell_type":"markdown","metadata":{"id":"FuS1iypXbAqC"},"source":["# 3: Encode categorical attributes\n","The next step is to encode the categorical attributes.\n","\n","In class we mainly mentioned two strategies: `sklearn.preprocessing.OrdinalEncoder` and `sklearn.preprocessing.OneHotEncoder`. Use the one (or ones) you consider most appropriate; a toy comparison of the two is sketched below.\n","\n","**HINT**\n","\n","So far we have at most 8 attributes. However, **not all of them are categorical**, so in practice only some of them actually need to be transformed.\n","\n","Moreover, each attribute may require a different encoding.\n","\n","For this we will rely on the `sklearn.compose.ColumnTransformer` transformer, which applies a different transformer to each attribute (column) and works as follows:\n","\n","```python\n","from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, MinMaxScaler\n","from sklearn.compose import ColumnTransformer\n","\n","transformers = [\n","    # ('arbitrary_name', Transformer, [column indices])\n","    (\"trnf1\", OrdinalEncoder(), [0]),\n","    (\"trnf2\", OneHotEncoder(), [1, 2]),\n","    (\"scaler\", MinMaxScaler(), [3])\n","]\n","\n","ct = ColumnTransformer(transformers, remainder='passthrough')\n","\n","ct.fit(X)\n","X_trans = ct.transform(X)\n","```\n","\n","The `ColumnTransformer` receives a list of transformers to apply, indicating which columns each one applies to, and you must specify what to do with the remaining columns. In the example, `remainder='passthrough'` means that those values are passed through without any modification."]},
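{"cell_type":"markdown","metadata":{},"source":["To make the difference between the two encoders concrete, here is a toy sketch (made-up values, not the assignment data; it assumes scikit-learn >= 1.2, where the dense-output flag is called `sparse_output`, while older versions use `sparse=False`):\n","\n","```python\n","from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder\n","import numpy as np\n","\n","toy = np.array([['a'], ['b'], ['a'], ['c']])\n","\n","print(OrdinalEncoder().fit_transform(toy))\n","# a single column with values 0, 1, 0, 2: an arbitrary numeric order is imposed\n","\n","print(OneHotEncoder(sparse_output=False).fit_transform(toy))\n","# one column per category: rows [1,0,0], [0,1,0], [1,0,0], [0,0,1]\n","```"]},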
{"cell_type":"code","metadata":{"id":"fzIh-AMZeNHp"},"source":["print(columns)"],"execution_count":null,"outputs":[]},
{"cell_type":"code","metadata":{"id":"fLx6zeFDa_vv","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1692767442847,"user_tz":180,"elapsed":12,"user":{"displayName":"Rodrigo Laguna","userId":"16454622511494825193"}},"outputId":"cf95073a-135b-483d-febd-ba41bd3852d3"},"source":["from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, MinMaxScaler\n","from sklearn.compose import ColumnTransformer\n","\n","# === Your code starts here ===\n","# suggestion: check the shape before and after the transformations\n","# to make sure everything is consistent\n","# define the column transformer (ct) and fit it\n","\n","\n","ct = ColumnTransformer  # continue the definition\n","\n","# === Your code ends here ===\n","X_train_fill_num = ct.transform(X_train_fill)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n"]}]},
{"cell_type":"markdown","metadata":{"id":"faScon6DUlmZ"},"source":["Check that the shape makes sense:"]},
{"cell_type":"code","metadata":{"id":"BpuQJwZCGGNq"},"source":["X_train_fill_num.shape"],"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["# Question C\n","How many columns does it have, why, and what does each one represent?\n","\n","_Note: it is not strictly necessary to know which column corresponds to what (that is: what is in column 0? what is in column 1? ... you do not need to answer at that level of detail)._\n","\n","_What is expected is that if, for example, you have 13 columns at this point, you explain what they are and how you arrived at them._"],"metadata":{"id":"URsO7vYjufSe"}},
{"cell_type":"markdown","metadata":{"id":"s8avUgpLEFqS"},"source":["\n","# 4: Select attributes\n","Use one of the strategies seen in class to keep only 5 attributes. A toy sketch of one of them follows the exercise."]},
{"cell_type":"code","metadata":{"id":"arZObzwgEFV0"},"source":["from sklearn.feature_selection import RFE, SelectKBest, chi2, SequentialFeatureSelector\n","from sklearn.tree import DecisionTreeClassifier\n","\n","# === Your code starts here ===\n","# define the feature selector (fs) and fit it\n","\n","\n","fs =\n","\n","\n","# === Your code ends here ===\n","X_train_fill_num_selected = fs.transform(X_train_fill_num)"],"execution_count":null,"outputs":[]},
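{"cell_type":"markdown","metadata":{},"source":["As a reference, here is a toy sketch (made-up data, not the assignment arrays) of one of the imported strategies, `SelectKBest`; any of the other imported selectors could be used instead:\n","\n","```python\n","from sklearn.feature_selection import SelectKBest, chi2\n","import numpy as np\n","\n","# 6 toy instances with 3 non-negative features (chi2 requires non-negative values)\n","X_toy = np.array([[1, 0, 3], [2, 1, 3], [1, 0, 2], [3, 1, 0], [2, 1, 1], [3, 1, 0]])\n","y_toy = np.array([0, 1, 0, 1, 1, 1])\n","\n","selector = SelectKBest(chi2, k=2)\n","selector.fit(X_toy, y_toy)              # scores each feature against the labels\n","print(selector.get_support())           # boolean mask of the k selected features\n","print(selector.transform(X_toy).shape)  # (6, 2)\n","```"]},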
{"cell_type":"markdown","metadata":{"id":"ttbi5CSYi10k"},"source":["# 5: Train a classifier\n","In the next step we will train a [`sklearn.tree.DecisionTreeClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) classifier on the data, after all the transformations we have applied so far.\n","\n","Use `sklearn.model_selection.GridSearchCV` or `sklearn.model_selection.RandomizedSearchCV` to select the best parameters.\n","\n","Use cross-validation with 10 folds. Use a metric appropriate for the problem."]},
{"cell_type":"markdown","source":["# Question D\n","- What is the **smallest** number of folds I can use?\n","- What is the **largest** number of folds I can use?\n","- What happens at each of these extremes?\n","- Which metric are you going to use? **Briefly justify.**\n"],"metadata":{"id":"JSTN30Lwwqiq"}},
{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"O5iE2CTEk720","executionInfo":{"status":"ok","timestamp":1692767443824,"user_tz":180,"elapsed":980,"user":{"displayName":"Rodrigo Laguna","userId":"16454622511494825193"}},"outputId":"6df74044-a85f-4b32-a629-ccda62661a04"},"source":["from sklearn.tree import DecisionTreeClassifier\n","from sklearn.model_selection import GridSearchCV, RandomizedSearchCV\n","\n","# === Your code starts here ===\n","# define the grid search or random search (grid) and fit it\n","grid =\n","# === Your code ends here ===\n","grid.fit(X_train_fill_num_selected, y_train)\n","\n","print(grid.best_params_)\n","grid.best_score_"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["{'criterion': 'gini', 'max_depth': 2, 'max_features': None, 'splitter': 'random'}\n"]},{"output_type":"execute_result","data":{"text/plain":["0.6807936838257316"]},"metadata":{},"execution_count":15}]},
{"cell_type":"markdown","metadata":{"id":"93GHxfCdHVgt"},"source":["# 6: Pipeline\n","Bundle all the steps executed so far into a single Pipeline."]},
{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"qib2hHQDHIJ0","executionInfo":{"status":"ok","timestamp":1692767443825,"user_tz":180,"elapsed":7,"user":{"displayName":"Rodrigo Laguna","userId":"16454622511494825193"}},"outputId":"3ed39a71-4f2a-43c6-91b9-09e4a356d1f5"},"source":["from sklearn.pipeline import Pipeline\n","from sklearn.model_selection import cross_validate\n","\n","\n","# === Your code starts here ===\n","# define the pipeline (pipe) and fill in the selected metric\n","\n","pipe =\n","result = cross_validate(pipe, X_train, y_train, cv=10, scoring='accuracy')\n","# === Your code ends here ===\n","\n","\n","score_mean = result['test_score'].mean()\n","score_std = result['test_score'].std()\n","\n","print(f'obtained score: {score_mean:.3f} ± {score_std:.3f}')"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["obtained score: 0.683 ± 0.060\n"]},{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n","/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n","/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n","/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n","/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n","/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n","/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n","/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n","/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n","/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n"," warnings.warn(\n"]}]},
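{"cell_type":"markdown","metadata":{},"source":["As a starting point for the next step, here is a sketch of how a pipeline can be assembled and tuned with `GridSearchCV`. The step names, the chosen steps and the hyperparameter grid are illustrative placeholders, not the required solution; in particular, your pipeline will also need the encoding and selection steps before a classifier can be fit on `X_train`:\n","\n","```python\n","from sklearn.pipeline import Pipeline\n","from sklearn.model_selection import GridSearchCV\n","from sklearn.impute import SimpleImputer\n","from sklearn.tree import DecisionTreeClassifier\n","\n","# illustrative pipeline: imputation followed by a classifier\n","sketch_pipe = Pipeline([\n","    ('imputer', SimpleImputer(strategy='most_frequent')),\n","    ('clf', DecisionTreeClassifier(random_state=RANDOM_STATE)),\n","])\n","\n","# hyperparameters of a step are addressed as <step_name>__<parameter>\n","param_grid = {'clf__max_depth': [2, 4, 8, None]}\n","\n","search = GridSearchCV(sketch_pipe, param_grid, cv=10, scoring='accuracy')\n","# search.fit(X_train, y_train)  # enable once the pipeline handles your raw columns\n","```"]},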
{"cell_type":"markdown","metadata":{"id":"HRGI1jD9KQ6X"},"source":["# 7: Obtain the best possible classifier\n","\n","Now that all the steps are inside a pipeline, we can revisit the decisions made at each step in order to obtain the best possible classifier.\n"]},
{"cell_type":"code","metadata":{"id":"QY_lAf3cJ6iT"},"source":["from sklearn.metrics import classification_report\n","from sklearn.neighbors import KNeighborsClassifier\n","from sklearn.ensemble import RandomForestClassifier\n","from sklearn.svm import SVC\n","from sklearn.linear_model import LogisticRegression\n","from sklearn.ensemble import GradientBoostingClassifier\n","\n","# === Your code starts here ===\n","# if necessary, add more cells\n","# if necessary, import other modules\n","# if you want, take the opportunity to revisit all the decisions made so far\n"],"execution_count":null,"outputs":[]},
{"cell_type":"markdown","metadata":{"id":"vEvlfuYASElX"},"source":["# 8: Evaluation\n","\n","Once this classifier has been found, evaluate it on the test dataset with the `sklearn.metrics.classification_report` function. A toy sketch of its inputs is shown below."]},
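{"cell_type":"markdown","metadata":{},"source":["As a reference, `classification_report` simply compares true labels with predicted labels; here is a toy sketch (made-up labels, not the assignment data):\n","\n","```python\n","from sklearn.metrics import classification_report\n","\n","y_true_toy = [0, 0, 1, 1, 1, 0]\n","y_pred_toy = [0, 1, 1, 1, 0, 0]\n","\n","# prints precision, recall, f1-score and support for each class\n","print(classification_report(y_true_toy, y_pred_toy))\n","```"]},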
{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"HJqeS7bHNqFg","executionInfo":{"status":"ok","timestamp":1692767719647,"user_tz":180,"elapsed":57,"user":{"displayName":"Rodrigo Laguna","userId":"16454622511494825193"}},"outputId":"53eac4ab-cd44-45ea-8969-4df87e63f2b1"},"source":["from sklearn.metrics import classification_report\n","\n","# === Your code starts here ===\n","# Use the best model found to classify X_test\n","# Make sure it is trained on the correct data\n","y_pred =\n","# === Your code ends here ===\n","print(classification_report(y_test, y_pred))"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["              precision    recall  f1-score   support\n","\n","           0       0.84      0.81      0.82       264\n","           1       0.64      0.68      0.66       130\n","\n","    accuracy                           0.77       394\n","   macro avg       0.74      0.75      0.74       394\n","weighted avg       0.77      0.77      0.77       394\n","\n"]}]},
{"cell_type":"markdown","source":["# Question E\n","Assuming that the output of the previous cell is the following:\n","```python\n","              precision    recall  f1-score   support\n","\n","           0       0.84      0.81      0.82       264\n","           1       0.64      0.68      0.66       130\n","\n","    accuracy                           0.77       394\n","   macro avg       0.74      0.75      0.74       394\n","weighted avg       0.77      0.77      0.77       394\n","```\n","\n","Explain in your own words, for a non-technical person, what it means to obtain:\n","\n","- precision = 0.84 for class 0\n","- recall = 0.68 for class 1"],"metadata":{"id":"nBit7JXfoCLi"}}]}