{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "woh2l3SQBqh_" }, "source": [ "# Práctico 4 - Regresión lineal bayesiana" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7HUwTrcGBqiE" }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy.stats import norm\n", "from scipy.stats import multivariate_normal" ] }, { "cell_type": "markdown", "metadata": { "id": "it-OqujWBqiF" }, "source": [ "## Setting" ] }, { "cell_type": "markdown", "source": [ "Consideramos el modelo lineal\n", "$$y=Xw+\\epsilon$$\n", "donde $\\epsilon \\sim \\mathcal{N}(0,\\,\\sigma^{2})$ con $\\sigma^{2}$ conocida." ], "metadata": { "id": "DvkntOE0htRD" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kMlIYhZCBqiG" }, "outputs": [], "source": [ "# Datos de entrenamiento\n", "X_train = np.array([206, 188, 219, 372, 345, 231, 203, 170, 55, 91, 292, 141, 129, 170, 324]).reshape(-1, 1)\n", "y_train = np.array([29, 25, 31, 25, 29, 30, 26, 23, 12, 15, 28, 24, 23, 22, 30])\n", "\n", "# Datos de validación\n", "X_val = np.array([213, 80, 391, 250, 57, 303, 263, 157, 72, 157, 188, 216, 362, 283, 308]).reshape(-1, 1)\n", "y_val = np.array([30, 16, 25, 26, 9, 28, 28, 25, 13, 23, 26, 25, 28, 33, 30])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nGL21bwpBqiH" }, "outputs": [], "source": [ "# Matriz de diseño: agregar columna de unos para el término independiente b\n", "X = np.hstack([np.ones((X_train.shape[0], 1)), X_train])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "uEb522NVBqiI" }, "outputs": [], "source": [ "print(X)" ] }, { "cell_type": "markdown", "metadata": { "id": "9Ebdzvw1BqiJ" }, "source": [ "## Regresión lineal con máxima verosimilitud" ] }, { "cell_type": "markdown", "source": [ "Calcular los estimadores de máxima verosimilitud para el modelo lineal, usando que $\\hat{\\theta}_{MLE}=(X^TX)^{-1}X^Ty$" ], "metadata": { "id": 
"-ig3UozLcm5h" } }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "dGeuL9crBqiK" }, "outputs": [], "source": [ "#COMPLETAR\n", "#theta =" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Pywy5yZ7BqiM", "outputId": "34aa2e8f-73b3-438b-e68e-c8f6e1a4f9a9" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "MLE de b: 15.715950272533467\n", "MLE de w: 0.04345049295663197\n" ] } ], "source": [ "print(f'MLE de b: {theta[0]}')\n", "print(f'MLE de w: {theta[1]}')" ] }, { "cell_type": "markdown", "source": [ "Calcular el MSE en train" ], "metadata": { "id": "7rCDmB5GfVep" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "IWTGEDmxBqiN" }, "outputs": [], "source": [ "#COMPLETAR\n", "#mse =\n", "print(f'MLE de sigma (mse): {mse}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Ej6tfZRHBqiO" }, "outputs": [], "source": [ "# Suponemos sigma conocido\n", "sigma2 = mse" ] }, { "cell_type": "markdown", "metadata": { "id": "yMAeRdXIBqiP" }, "source": [ "## Distribución a priori" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NKR6SteGBqiP" }, "outputs": [], "source": [ "# Hiperparámetros de la distribución a priori\n", "mu_0 = np.array([0, 0])\n", "Sigma_0 = np.array([[100, 0], [0, 100]])\n", "\n", "# Inversa de la matriz de covarianza a priori\n", "Sigma_0_inv = np.linalg.inv(Sigma_0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZMjmSqgwBqiP" }, "outputs": [], "source": [ "print(\"Matriz de covarianza a priori (Sigma_0):\\n\", Sigma_0)\n", "print(\"\\nMedia a priori (mu_0):\\n\", mu_0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gskXz5skBqiP" }, "outputs": [], "source": [ "# Definir el grid donde evaluaremos la densidad\n", "b = np.linspace(-30, 30, 500)\n", "w = np.linspace(-30, 30, 500)\n", "B, W = np.meshgrid(b, w)\n", "pos = 
np.dstack((B, W))\n", "\n", "# Evaluate the density at each grid point\n", "P = multivariate_normal(mu_0, Sigma_0).pdf(pos)\n", "\n", "# Plot the filled contour levels\n", "plt.contourf(B, W, P, levels=50, cmap=\"viridis\", extend='both')\n", "plt.colorbar()\n", "\n", "# Overlay white contour lines\n", "plt.contour(B, W, P, levels=10, colors=\"white\", linewidths=1)\n", "\n", "plt.title(\"Prior density\")\n", "plt.xlabel(\"b (intercept)\")\n", "plt.ylabel(\"w (slope)\")\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "bY4ZOkgRBqiP" }, "source": [ "## Posterior distribution" ] }, { "cell_type": "markdown", "source": [ "Compute the mean and the covariance matrix of the posterior distribution" ], "metadata": { "id": "k60KQ6rnfieB" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aLYsgc7NBqiQ" }, "outputs": [], "source": [ "# TODO: complete\n", "# Posterior covariance matrix\n", "Sigma =\n", "\n", "# Posterior mean\n", "mu =\n", "\n", "print(\"Posterior covariance matrix (Sigma):\\n\", Sigma)\n", "print(\"\\nPosterior mean (mu):\\n\", mu)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "37tf4knmBqiQ" }, "outputs": [], "source": [ "# Define the grid on which we evaluate the density (mean +/- 3 standard deviations)\n", "b_min = mu[0]-3*np.sqrt(Sigma[0,0])\n", "b_max = mu[0]+3*np.sqrt(Sigma[0,0])\n", "w_min = mu[1]-3*np.sqrt(Sigma[1,1])\n", "w_max = mu[1]+3*np.sqrt(Sigma[1,1])\n", "\n", "b = np.linspace(b_min, b_max, 500)\n", "w = np.linspace(w_min, w_max, 500)\n", "B, W = np.meshgrid(b, w)\n", "pos = np.dstack((B, W))\n", "\n", "# Evaluate the density at each grid point\n", "P = multivariate_normal(mu, Sigma).pdf(pos)\n", "\n", "# Plot the filled contour levels\n", "plt.contourf(B, W, P, levels=50, cmap=\"viridis\", extend='both')\n", "plt.colorbar()\n", "\n", "# Overlay white contour lines\n", 
"plt.contour(B, W, P, levels=10, colors=\"white\", linewidths=1)\n", "\n", "plt.plot(mu[0], mu[1], marker='o', linestyle=' ', color='white', label=r'$\\mu$')\n", "plt.plot(theta[0], theta[1], marker='x', linestyle=' ', color ='red', label = 'MLE')\n", "\n", "plt.title(\"Densidad a posteriori\")\n", "plt.xlabel(\"b (intercepto)\")\n", "plt.ylabel(\"w (pendiente)\")\n", "plt.legend()\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "62QlawuTBqiQ" }, "source": [ "## Predicciones" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "i1ByRbpcBqiQ" }, "outputs": [], "source": [ "# Valor de x_nuevo\n", "x_nuevo = 200" ] }, { "cell_type": "markdown", "source": [ "Calcular la media y la varianza para $y_{nuevo}$" ], "metadata": { "id": "TGDOKSq8kk7H" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-hbMlzneBqiQ" }, "outputs": [], "source": [ "#COMPLETAR\n", "# Calculamos la media y la varianza para y_nuevo\n", "mu_pred =\n", "var_pred =" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZqeYQzPaBqiQ" }, "outputs": [], "source": [ "# Definir valores de y para evaluar la densidad\n", "y_values = np.linspace(mu_pred - 4*np.sqrt(var_pred), mu_pred + 4*np.sqrt(var_pred), 400)\n", "\n", "# Evaluar la densidad en cada punto\n", "pdf_values = norm.pdf(y_values, mu_pred, np.sqrt(var_pred))\n", "\n", "# Graficar\n", "plt.plot(y_values, pdf_values, label=r\"$x_{nuevo} =$\"+ str(x_nuevo))\n", "plt.title(r\"Densidad a posteriori para $y_{nuevo}$\")\n", "plt.xlabel(r\"$y_{nuevo}$\")\n", "plt.ylabel(\"Densidad\")\n", "plt.legend()\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "source": [ "Calcular y graficar la media y la varianza para cada valor de $x_{nuevo}$" ], "metadata": { "id": "OgvwoRlklecV" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "CCX3cq6EBqiR" }, "outputs": [], "source": [ "# Rango para x_nuevo\n", "x_nuevo_values = 
np.linspace(min(X_train.ravel()) - 20, max(X_train.ravel()) + 20, 400)\n", "\n", "###### TODO: complete ######\n", "# Means and variances for each value of x_nuevo\n", "mean_values =\n", "variance_values =\n", "######################\n", "\n", "# Plot the training data\n", "plt.scatter(X_train, y_train, color='tab:blue', label='Training data')\n", "\n", "# Plot the line of predictive means\n", "plt.plot(x_nuevo_values, mean_values, color='tab:orange', label=r'Mean of $y_{nuevo}$')\n", "\n", "# Shade a band of one standard deviation (square root of the variance) around the mean\n", "upper_bound = np.array(mean_values) + np.sqrt(variance_values)\n", "lower_bound = np.array(mean_values) - np.sqrt(variance_values)\n", "plt.fill_between(x_nuevo_values, lower_bound, upper_bound, color='tab:orange', alpha=0.2)\n", "\n", "plt.title(\"Training data and posterior prediction\")\n", "plt.xlabel(r\"$x_{nuevo}$\")\n", "plt.ylabel(r\"$y_{nuevo}$\")\n", "plt.legend()\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "q6Y3ZqtPBqiR" }, "source": [ "## Exercise: Bayesian vs Maximum Likelihood\n", "\n", "1. Bayesian polynomial regression: Fit Bayesian polynomial regression models for degrees $d=2, 3, 4, 5$.\n", "\n", "2. Visualizing the Bayesian predictions: For each degree $d$, plot the mean of the predictions with its variance shown as a shaded band.\n", " \n", "3. Polynomial regression with maximum likelihood: Fit polynomial regression models using MLE for degrees $d=2, 3, 4, 5$.\n", "\n", "4. Visualizing the maximum likelihood predictions: For each degree $d$, plot the model predictions with their variance shown as a shaded band (assume the same value of $\\sigma^2$ as in the previous parts).\n", " \n", "5. Choosing the polynomial degree: Based on the plots obtained in the previous steps, choose the polynomial degree you consider most suitable for modeling the data."
] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.15" }, "orig_nbformat": 4, "colab": { "provenance": [] } }, "nbformat": 4, "nbformat_minor": 0 }