{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"(sup_reg_ex: develop)=\n",
"# Regression Model Development (part 1)\n",
"\n",
"Supervised algorithms use inputs (independent variables) and labeled outputs (dependent variable -the \"answers\") to create a model that can measure its performance and learn over time. Splitting the data into independent and dependent variables, we have the following (again, this will be very similar to the [previous example](sup_class_ex:develop)):"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"#Note: we only repeat this step from before because this is a new .ipynb page;\n",
"# it only needs to be executed once per file.\n",
"\n",
"#We'll import libraries as needed, but when submitting, having them all at the top is best practice\n",
"import pandas as pd\n",
"\n",
"# Reloading the dataset (the raw CSV has no header row, so we supply the column names)\n",
"url = \"https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv\"\n",
"column_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'type']\n",
"df = pd.read_csv(url, names=column_names) #read CSV into Python as a dataframe\n",
"\n",
"#Choosing the variables\n",
"X = df.drop(columns=['sepal-length']) #independent variables\n",
"y = df[['sepal-length']].copy() #dependent variable"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"(sup_reg_ex: develop: train)=\n",
"## Train Model\n",
"\n",
"Recall, splitting the data into training and testing sets is not required, but it is good practice. Furthermore, it provides content for part D. As with the [previous example](sup_class_ex:develop), we'll use [scikit-learn aka sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) built-ins for this."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"#split the variable sets into training and testing subsets\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.333, random_state=41)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" Independents variables\n",
" \n",
" \n",
" | \n",
" sepal-width | \n",
" petal-length | \n",
" petal-width | \n",
" type | \n",
"
\n",
" \n",
" \n",
" \n",
" 111 | \n",
" 2.700000 | \n",
" 5.300000 | \n",
" 1.900000 | \n",
" Iris-virginica | \n",
"
\n",
" \n",
" 82 | \n",
" 2.700000 | \n",
" 3.900000 | \n",
" 1.200000 | \n",
" Iris-versicolor | \n",
"
\n",
" \n",
" 130 | \n",
" 2.800000 | \n",
" 6.100000 | \n",
" 1.900000 | \n",
" Iris-virginica | \n",
"
\n",
" \n",
" 27 | \n",
" 3.500000 | \n",
" 1.500000 | \n",
" 0.200000 | \n",
" Iris-setosa | \n",
"
\n",
" \n",
" 33 | \n",
" 4.200000 | \n",
" 1.400000 | \n",
" 0.200000 | \n",
" Iris-setosa | \n",
"
\n",
" \n",
"
\n",
" \n",
"\n",
" Dependents variables\n",
" \n",
" \n",
" | \n",
" sepal-length | \n",
"
\n",
" \n",
" \n",
" \n",
" 111 | \n",
" 6.400000 | \n",
"
\n",
" \n",
" 82 | \n",
" 5.800000 | \n",
"
\n",
" \n",
" 130 | \n",
" 7.400000 | \n",
"
\n",
" \n",
" 27 | \n",
" 5.200000 | \n",
"
\n",
" \n",
" 33 | \n",
" 5.500000 | \n",
"
\n",
" \n",
"
\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Styled, side-by-side displays are nice but not required.\n",
"from IPython.display import display_html\n",
"X_train_styler = X_train.head(5).style.set_table_attributes(\"style='display:inline'\").set_caption('Independent variables')\n",
"y_train_styler = y_train.head(5).style.set_table_attributes(\"style='display:inline'\").set_caption('Dependent variable')\n",
"space = \"\\xa0\" * 10 #space between the two tables\n",
"display_html(X_train_styler._repr_html_() + space + y_train_styler._repr_html_(), raw=True)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Review sklearn's nice [supervised learning library](https://scikit-learn.org/stable/supervised_learning.html). Note that many of these models have both classification and regression extensions."
]
},
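{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, a quick sketch (these paired imports are standard sklearn, though we won't use them on this page):\n",
"\n",
"```python\n",
"# Several sklearn model families come in classifier/regressor pairs:\n",
"from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor\n",
"from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor\n",
"from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor\n",
"```\n",
"\n",
"We'll start with the regression workhorse:"
]
},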
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LinearRegression "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Our data is mostly quantitative and the scatterplots indicate some linear relations between variables. So linear regression isn't a bad place to start. Once we've trained and tested a linear regression model, we'll easily be able to experiment with different algorithms. \n",
"\n",
":::{margin} Is linear regression ML?\n",
"It depends on who you ask. Google \"Is linear regression machine learning?\" and you'll see some interesting (and entertaining) discussion. For the capstone, it applies an algorithm to data so the answer is -yes. \n",
":::"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": [
"scroll-output"
]
},
"outputs": [
{
"ename": "ValueError",
"evalue": "could not convert string to float: 'Iris-virginica'",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn[5], line 2\u001b[0m\n\u001b[0;32m 1\u001b[0m linear_reg_model \u001b[39m=\u001b[39m LinearRegression()\n\u001b[1;32m----> 2\u001b[0m linear_reg_model\u001b[39m.\u001b[39;49mfit(X_train,y_train)\n",
"File \u001b[1;32mc:\\Users\\ashej\\.virtualenvs\\jupyter-books-WZpnkDri\\Lib\\site-packages\\sklearn\\linear_model\\_base.py:649\u001b[0m, in \u001b[0;36mLinearRegression.fit\u001b[1;34m(self, X, y, sample_weight)\u001b[0m\n\u001b[0;32m 645\u001b[0m n_jobs_ \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mn_jobs\n\u001b[0;32m 647\u001b[0m accept_sparse \u001b[39m=\u001b[39m \u001b[39mFalse\u001b[39;00m \u001b[39mif\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mpositive \u001b[39melse\u001b[39;00m [\u001b[39m\"\u001b[39m\u001b[39mcsr\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mcsc\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mcoo\u001b[39m\u001b[39m\"\u001b[39m]\n\u001b[1;32m--> 649\u001b[0m X, y \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_validate_data(\n\u001b[0;32m 650\u001b[0m X, y, accept_sparse\u001b[39m=\u001b[39;49maccept_sparse, y_numeric\u001b[39m=\u001b[39;49m\u001b[39mTrue\u001b[39;49;00m, multi_output\u001b[39m=\u001b[39;49m\u001b[39mTrue\u001b[39;49;00m\n\u001b[0;32m 651\u001b[0m )\n\u001b[0;32m 653\u001b[0m sample_weight \u001b[39m=\u001b[39m _check_sample_weight(\n\u001b[0;32m 654\u001b[0m sample_weight, X, dtype\u001b[39m=\u001b[39mX\u001b[39m.\u001b[39mdtype, only_non_negative\u001b[39m=\u001b[39m\u001b[39mTrue\u001b[39;00m\n\u001b[0;32m 655\u001b[0m )\n\u001b[0;32m 657\u001b[0m X, y, X_offset, y_offset, X_scale \u001b[39m=\u001b[39m _preprocess_data(\n\u001b[0;32m 658\u001b[0m X,\n\u001b[0;32m 659\u001b[0m y,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 662\u001b[0m sample_weight\u001b[39m=\u001b[39msample_weight,\n\u001b[0;32m 663\u001b[0m )\n",
"File \u001b[1;32mc:\\Users\\ashej\\.virtualenvs\\jupyter-books-WZpnkDri\\Lib\\site-packages\\sklearn\\base.py:554\u001b[0m, in \u001b[0;36mBaseEstimator._validate_data\u001b[1;34m(self, X, y, reset, validate_separately, **check_params)\u001b[0m\n\u001b[0;32m 552\u001b[0m y \u001b[39m=\u001b[39m check_array(y, input_name\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39my\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mcheck_y_params)\n\u001b[0;32m 553\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m--> 554\u001b[0m X, y \u001b[39m=\u001b[39m check_X_y(X, y, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mcheck_params)\n\u001b[0;32m 555\u001b[0m out \u001b[39m=\u001b[39m X, y\n\u001b[0;32m 557\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m no_val_X \u001b[39mand\u001b[39;00m check_params\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39mensure_2d\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39mTrue\u001b[39;00m):\n",
"File \u001b[1;32mc:\\Users\\ashej\\.virtualenvs\\jupyter-books-WZpnkDri\\Lib\\site-packages\\sklearn\\utils\\validation.py:1104\u001b[0m, in \u001b[0;36mcheck_X_y\u001b[1;34m(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)\u001b[0m\n\u001b[0;32m 1099\u001b[0m estimator_name \u001b[39m=\u001b[39m _check_estimator_name(estimator)\n\u001b[0;32m 1100\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[0;32m 1101\u001b[0m \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mestimator_name\u001b[39m}\u001b[39;00m\u001b[39m requires y to be passed, but the target y is None\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m 1102\u001b[0m )\n\u001b[1;32m-> 1104\u001b[0m X \u001b[39m=\u001b[39m check_array(\n\u001b[0;32m 1105\u001b[0m X,\n\u001b[0;32m 1106\u001b[0m accept_sparse\u001b[39m=\u001b[39;49maccept_sparse,\n\u001b[0;32m 1107\u001b[0m accept_large_sparse\u001b[39m=\u001b[39;49maccept_large_sparse,\n\u001b[0;32m 1108\u001b[0m dtype\u001b[39m=\u001b[39;49mdtype,\n\u001b[0;32m 1109\u001b[0m order\u001b[39m=\u001b[39;49morder,\n\u001b[0;32m 1110\u001b[0m copy\u001b[39m=\u001b[39;49mcopy,\n\u001b[0;32m 1111\u001b[0m force_all_finite\u001b[39m=\u001b[39;49mforce_all_finite,\n\u001b[0;32m 1112\u001b[0m ensure_2d\u001b[39m=\u001b[39;49mensure_2d,\n\u001b[0;32m 1113\u001b[0m allow_nd\u001b[39m=\u001b[39;49mallow_nd,\n\u001b[0;32m 1114\u001b[0m ensure_min_samples\u001b[39m=\u001b[39;49mensure_min_samples,\n\u001b[0;32m 1115\u001b[0m ensure_min_features\u001b[39m=\u001b[39;49mensure_min_features,\n\u001b[0;32m 1116\u001b[0m estimator\u001b[39m=\u001b[39;49mestimator,\n\u001b[0;32m 1117\u001b[0m input_name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mX\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[0;32m 1118\u001b[0m )\n\u001b[0;32m 1120\u001b[0m y \u001b[39m=\u001b[39m _check_y(y, multi_output\u001b[39m=\u001b[39mmulti_output, y_numeric\u001b[39m=\u001b[39my_numeric, estimator\u001b[39m=\u001b[39mestimator)\n\u001b[0;32m 1122\u001b[0m check_consistent_length(X, y)\n",
"File \u001b[1;32mc:\\Users\\ashej\\.virtualenvs\\jupyter-books-WZpnkDri\\Lib\\site-packages\\sklearn\\utils\\validation.py:877\u001b[0m, in \u001b[0;36mcheck_array\u001b[1;34m(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)\u001b[0m\n\u001b[0;32m 875\u001b[0m array \u001b[39m=\u001b[39m xp\u001b[39m.\u001b[39mastype(array, dtype, copy\u001b[39m=\u001b[39m\u001b[39mFalse\u001b[39;00m)\n\u001b[0;32m 876\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m--> 877\u001b[0m array \u001b[39m=\u001b[39m _asarray_with_order(array, order\u001b[39m=\u001b[39;49morder, dtype\u001b[39m=\u001b[39;49mdtype, xp\u001b[39m=\u001b[39;49mxp)\n\u001b[0;32m 878\u001b[0m \u001b[39mexcept\u001b[39;00m ComplexWarning \u001b[39mas\u001b[39;00m complex_warning:\n\u001b[0;32m 879\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[0;32m 880\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mComplex data not supported\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39m{}\u001b[39;00m\u001b[39m\\n\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39mformat(array)\n\u001b[0;32m 881\u001b[0m ) \u001b[39mfrom\u001b[39;00m \u001b[39mcomplex_warning\u001b[39;00m\n",
"File \u001b[1;32mc:\\Users\\ashej\\.virtualenvs\\jupyter-books-WZpnkDri\\Lib\\site-packages\\sklearn\\utils\\_array_api.py:185\u001b[0m, in \u001b[0;36m_asarray_with_order\u001b[1;34m(array, dtype, order, copy, xp)\u001b[0m\n\u001b[0;32m 182\u001b[0m xp, _ \u001b[39m=\u001b[39m get_namespace(array)\n\u001b[0;32m 183\u001b[0m \u001b[39mif\u001b[39;00m xp\u001b[39m.\u001b[39m\u001b[39m__name__\u001b[39m \u001b[39min\u001b[39;00m {\u001b[39m\"\u001b[39m\u001b[39mnumpy\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mnumpy.array_api\u001b[39m\u001b[39m\"\u001b[39m}:\n\u001b[0;32m 184\u001b[0m \u001b[39m# Use NumPy API to support order\u001b[39;00m\n\u001b[1;32m--> 185\u001b[0m array \u001b[39m=\u001b[39m numpy\u001b[39m.\u001b[39;49masarray(array, order\u001b[39m=\u001b[39;49morder, dtype\u001b[39m=\u001b[39;49mdtype)\n\u001b[0;32m 186\u001b[0m \u001b[39mreturn\u001b[39;00m xp\u001b[39m.\u001b[39masarray(array, copy\u001b[39m=\u001b[39mcopy)\n\u001b[0;32m 187\u001b[0m \u001b[39melse\u001b[39;00m:\n",
"File \u001b[1;32mc:\\Users\\ashej\\.virtualenvs\\jupyter-books-WZpnkDri\\Lib\\site-packages\\pandas\\core\\generic.py:2070\u001b[0m, in \u001b[0;36mNDFrame.__array__\u001b[1;34m(self, dtype)\u001b[0m\n\u001b[0;32m 2069\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m__array__\u001b[39m(\u001b[39mself\u001b[39m, dtype: npt\u001b[39m.\u001b[39mDTypeLike \u001b[39m|\u001b[39m \u001b[39mNone\u001b[39;00m \u001b[39m=\u001b[39m \u001b[39mNone\u001b[39;00m) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m np\u001b[39m.\u001b[39mndarray:\n\u001b[1;32m-> 2070\u001b[0m \u001b[39mreturn\u001b[39;00m np\u001b[39m.\u001b[39;49masarray(\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_values, dtype\u001b[39m=\u001b[39;49mdtype)\n",
"\u001b[1;31mValueError\u001b[0m: could not convert string to float: 'Iris-virginica'"
]
}
],
"source": [
"linear_reg_model = LinearRegression()\n",
"linear_reg_model.fit(X_train,y_train)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"*Error? Wait, what happened!?!* The error produces a lot of output, but the last line makes things clear:\n",
"\n",
"`ValueError: could not convert string to float: 'Iris-virginica'`\n",
"\n",
"The algorithm expected numbers; it does not know what to do with the flower types (strings). So how do we fix this?\n",
"\n",
"(sup_reg_ex: develop: train: categorical_1)=\n",
"## Processing Categorical Data (the easy way)\n",
"\n",
"One way to fix a problem is to avoid it. You are not required to use all the data, only some of it. Sometimes choosing the right variables is the real trick. [Dimensionality reduction](https://en.wikipedia.org/wiki/Dimensionality_reduction) is an important part of data science. Here, those flower types DO matter, and it would be best to include that data, but goal #1 is to get things working. Improving things is step #2 and step #3 and step #4 and ... step $\\# \\infty$.\n",
"\n",
"If we did want to keep the flower types, the standard fix is to encode them as numbers. A minimal sketch (pandas' built-in one-hot encoding; we won't use it here):"
]
},
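{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"# One-hot encode the 'type' column: each flower type becomes its own 0/1 column.\n",
"# (Not executed here; shown only as the alternative to dropping the column.)\n",
"X_train_encoded = pd.get_dummies(X_train, columns=['type'])\n",
"X_test_encoded = pd.get_dummies(X_test, columns=['type'])\n",
"```\n",
"\n",
"For now, just to get things rolling, let's remove the column with the categorical data:"
]
},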
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"scroll-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" sepal-width | \n",
" petal-length | \n",
" petal-width | \n",
"
\n",
" \n",
" \n",
" \n",
" 119 | \n",
" 2.2 | \n",
" 5.0 | \n",
" 1.5 | \n",
"
\n",
" \n",
" 128 | \n",
" 2.8 | \n",
" 5.6 | \n",
" 2.1 | \n",
"
\n",
" \n",
" 135 | \n",
" 3.0 | \n",
" 6.1 | \n",
" 2.3 | \n",
"
\n",
" \n",
" 91 | \n",
" 3.0 | \n",
" 4.6 | \n",
" 1.4 | \n",
"
\n",
" \n",
" 112 | \n",
" 3.0 | \n",
" 5.5 | \n",
" 2.1 | \n",
"
\n",
" \n",
" 71 | \n",
" 2.8 | \n",
" 4.0 | \n",
" 1.3 | \n",
"
\n",
" \n",
" 123 | \n",
" 2.7 | \n",
" 4.9 | \n",
" 1.8 | \n",
"
\n",
" \n",
" 85 | \n",
" 3.4 | \n",
" 4.5 | \n",
" 1.6 | \n",
"
\n",
" \n",
" 147 | \n",
" 3.0 | \n",
" 5.2 | \n",
" 2.0 | \n",
"
\n",
" \n",
" 143 | \n",
" 3.2 | \n",
" 5.9 | \n",
" 2.3 | \n",
"
\n",
" \n",
" 127 | \n",
" 3.0 | \n",
" 4.9 | \n",
" 1.8 | \n",
"
\n",
" \n",
" 39 | \n",
" 3.4 | \n",
" 1.5 | \n",
" 0.2 | \n",
"
\n",
" \n",
" 38 | \n",
" 3.0 | \n",
" 1.3 | \n",
" 0.2 | \n",
"
\n",
" \n",
" 93 | \n",
" 2.3 | \n",
" 3.3 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 23 | \n",
" 3.3 | \n",
" 1.7 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 133 | \n",
" 2.8 | \n",
" 5.1 | \n",
" 1.5 | \n",
"
\n",
" \n",
" 30 | \n",
" 3.1 | \n",
" 1.6 | \n",
" 0.2 | \n",
"
\n",
" \n",
" 83 | \n",
" 2.7 | \n",
" 5.1 | \n",
" 1.6 | \n",
"
\n",
" \n",
" 37 | \n",
" 3.1 | \n",
" 1.5 | \n",
" 0.1 | \n",
"
\n",
" \n",
" 41 | \n",
" 2.3 | \n",
" 1.3 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 81 | \n",
" 2.4 | \n",
" 3.7 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 120 | \n",
" 3.2 | \n",
" 5.7 | \n",
" 2.3 | \n",
"
\n",
" \n",
" 43 | \n",
" 3.5 | \n",
" 1.6 | \n",
" 0.6 | \n",
"
\n",
" \n",
" 2 | \n",
" 3.2 | \n",
" 1.3 | \n",
" 0.2 | \n",
"
\n",
" \n",
" 64 | \n",
" 2.9 | \n",
" 3.6 | \n",
" 1.3 | \n",
"
\n",
" \n",
" 62 | \n",
" 2.2 | \n",
" 4.0 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 56 | \n",
" 3.3 | \n",
" 4.7 | \n",
" 1.6 | \n",
"
\n",
" \n",
" 67 | \n",
" 2.7 | \n",
" 4.1 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 49 | \n",
" 3.3 | \n",
" 1.4 | \n",
" 0.2 | \n",
"
\n",
" \n",
" 63 | \n",
" 2.9 | \n",
" 4.7 | \n",
" 1.4 | \n",
"
\n",
" \n",
" 79 | \n",
" 2.6 | \n",
" 3.5 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 54 | \n",
" 2.8 | \n",
" 4.6 | \n",
" 1.5 | \n",
"
\n",
" \n",
" 106 | \n",
" 2.5 | \n",
" 4.5 | \n",
" 1.7 | \n",
"
\n",
" \n",
" 90 | \n",
" 2.6 | \n",
" 4.4 | \n",
" 1.2 | \n",
"
\n",
" \n",
" 145 | \n",
" 3.0 | \n",
" 5.2 | \n",
" 2.3 | \n",
"
\n",
" \n",
" 14 | \n",
" 4.0 | \n",
" 1.2 | \n",
" 0.2 | \n",
"
\n",
" \n",
" 141 | \n",
" 3.1 | \n",
" 5.1 | \n",
" 2.3 | \n",
"
\n",
" \n",
" 51 | \n",
" 3.2 | \n",
" 4.5 | \n",
" 1.5 | \n",
"
\n",
" \n",
" 139 | \n",
" 3.1 | \n",
" 5.4 | \n",
" 2.1 | \n",
"
\n",
" \n",
" 70 | \n",
" 3.2 | \n",
" 4.8 | \n",
" 1.8 | \n",
"
\n",
" \n",
" 97 | \n",
" 2.9 | \n",
" 4.3 | \n",
" 1.3 | \n",
"
\n",
" \n",
" 55 | \n",
" 2.8 | \n",
" 4.5 | \n",
" 1.3 | \n",
"
\n",
" \n",
" 32 | \n",
" 4.1 | \n",
" 1.5 | \n",
" 0.1 | \n",
"
\n",
" \n",
" 104 | \n",
" 3.0 | \n",
" 5.8 | \n",
" 2.2 | \n",
"
\n",
" \n",
" 136 | \n",
" 3.4 | \n",
" 5.6 | \n",
" 2.4 | \n",
"
\n",
" \n",
" 18 | \n",
" 3.8 | \n",
" 1.7 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 108 | \n",
" 2.5 | \n",
" 5.8 | \n",
" 1.8 | \n",
"
\n",
" \n",
" 98 | \n",
" 2.5 | \n",
" 3.0 | \n",
" 1.1 | \n",
"
\n",
" \n",
" 45 | \n",
" 3.0 | \n",
" 1.4 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 68 | \n",
" 2.2 | \n",
" 4.5 | \n",
" 1.5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sepal-width petal-length petal-width\n",
"119 2.2 5.0 1.5\n",
"128 2.8 5.6 2.1\n",
"135 3.0 6.1 2.3\n",
"91 3.0 4.6 1.4\n",
"112 3.0 5.5 2.1\n",
"71 2.8 4.0 1.3\n",
"123 2.7 4.9 1.8\n",
"85 3.4 4.5 1.6\n",
"147 3.0 5.2 2.0\n",
"143 3.2 5.9 2.3\n",
"127 3.0 4.9 1.8\n",
"39 3.4 1.5 0.2\n",
"38 3.0 1.3 0.2\n",
"93 2.3 3.3 1.0\n",
"23 3.3 1.7 0.5\n",
"133 2.8 5.1 1.5\n",
"30 3.1 1.6 0.2\n",
"83 2.7 5.1 1.6\n",
"37 3.1 1.5 0.1\n",
"41 2.3 1.3 0.3\n",
"81 2.4 3.7 1.0\n",
"120 3.2 5.7 2.3\n",
"43 3.5 1.6 0.6\n",
"2 3.2 1.3 0.2\n",
"64 2.9 3.6 1.3\n",
"62 2.2 4.0 1.0\n",
"56 3.3 4.7 1.6\n",
"67 2.7 4.1 1.0\n",
"49 3.3 1.4 0.2\n",
"63 2.9 4.7 1.4\n",
"79 2.6 3.5 1.0\n",
"54 2.8 4.6 1.5\n",
"106 2.5 4.5 1.7\n",
"90 2.6 4.4 1.2\n",
"145 3.0 5.2 2.3\n",
"14 4.0 1.2 0.2\n",
"141 3.1 5.1 2.3\n",
"51 3.2 4.5 1.5\n",
"139 3.1 5.4 2.1\n",
"70 3.2 4.8 1.8\n",
"97 2.9 4.3 1.3\n",
"55 2.8 4.5 1.3\n",
"32 4.1 1.5 0.1\n",
"104 3.0 5.8 2.2\n",
"136 3.4 5.6 2.4\n",
"18 3.8 1.7 0.3\n",
"108 2.5 5.8 1.8\n",
"98 2.5 3.0 1.1\n",
"45 3.0 1.4 0.3\n",
"68 2.2 4.5 1.5"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train_no_type = X_train.drop(columns = ['type'])\n",
"X_test_no_type = X_test.drop(columns = ['type'])\n",
"\n",
"X_test_no_type"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the models will train without error:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"LinearRegression()
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org. "
],
"text/plain": [
"LinearRegression()"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linear_reg_model.fit(X_train_no_type, y_train)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"And the model can make predictions for an entire set:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"scroll-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"array([[5.98770812],\n",
" [6.42220879],\n",
" [6.81026327],\n",
" [6.3038615 ],\n",
" [6.48153317],\n",
" [5.75587752],\n",
" [6.02106191],\n",
" [6.34587216],\n",
" [6.31716812],\n",
" [6.788514 ],\n",
" [6.23165894],\n",
" [5.01764515],\n",
" [4.57470184],\n",
" [5.07393461],\n",
" [4.87302581],\n",
" [6.48997581],\n",
" [4.88812177],\n",
" [6.34092093],\n",
" [4.885904 ],\n",
" [4.00445291],\n",
" [5.46842817],\n",
" [6.62636673],\n",
" [4.85349432],\n",
" [4.71509986],\n",
" [5.50178197],\n",
" [5.57125107],\n",
" [6.43782043],\n",
" [6.00331976],\n",
" [4.86637251],\n",
" [6.31473613],\n",
" [5.44667891],\n",
" [6.08460761],\n",
" [5.63522521],\n",
" [6.01862993],\n",
" [6.08060051],\n",
" [5.19561829],\n",
" [6.06972588],\n",
" [6.28433001],\n",
" [6.47065854],\n",
" [6.29098332],\n",
" [6.06929744],\n",
" [6.16124571],\n",
" [5.58789409],\n",
" [6.64589822],\n",
" [6.60683523],\n",
" [5.3817326 ],\n",
" [6.61032665],\n",
" [4.89225584],\n",
" [4.57691961],\n",
" [5.58233992]])"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_pred_no_type = linear_reg_model.predict(X_test_no_type)\n",
"y_pred_no_type"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Or a single input:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[4.71509986]]\n"
]
}
],
"source": [
"# The model was trained on a dataframe, so it expects dataframe input with matching feature names\n",
"# Recall we removed the flower 'type' column, and we are predicting sepal-length\n",
"column_names_short = ['sepal-width', 'petal-length', 'petal-width']\n",
"\n",
"# Create a one-row dataframe as the input; this avoids a warning about missing feature names.\n",
"# Alternatively, linear_reg_model.predict([[3.2, 1.3, .2]]) also runs, but raises that warning.\n",
"input_df = pd.DataFrame(np.array([[3.2, 1.3, .2]]), columns = column_names_short)\n",
"\n",
"print(linear_reg_model.predict(input_df))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
"Your model can only predict on data similar to what it was trained with. Since this model was trained with a dataframe, a matching new dataframe, `input_df`, was created to predict a single input. Alternatively, we could have converted the original data to arrays using `ravel()` or `.values` before training (see the [previous example](sup_class_ex:develop:train)); a sketch of that alternative follows.\n",
"```"
]
},
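{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that array-based alternative (not executed here; it would retrain the model on plain NumPy arrays, after which list inputs predict without the feature-name warning):\n",
"\n",
"```python\n",
"# Train on arrays instead of dataframes; .values strips the feature names,\n",
"# and ravel() flattens y to the 1-D shape sklearn prefers.\n",
"array_model = LinearRegression()\n",
"array_model.fit(X_train_no_type.values, y_train.values.ravel())\n",
"print(array_model.predict([[3.2, 1.3, .2]]))  # no warning\n",
"```"
]
},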
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"(sup_reg_ex: develop: accuracy)=\n",
"## Accuracy Analysis for Regression (part 1)\n",
"\n",
"Now that the model works, we can work on improving it. But first we'll need metrics so we can tell if we're making progress. As we are trying to predict a continuous number, even the very best model will have errors in almost every prediction (if not, it's almost certainly [overfitted](https://en.wikipedia.org/wiki/Overfitting)). Whereas measuring the success of our classification model was a simple ratio, here we need a way to measure how much the predictions deviate from the actual values. See sklearn's list of [metrics and scoring](https://scikit-learn.org/stable/modules/model_evaluation.html) for regression. \n",
"\n",
"To get things started, we'll use the *mean squared error* (MSE), a popular metric for evaluating regression models; its definition is below. Regression metrics are covered in more depth in the [Regression Accuracy Analysis](sup_reg_ex: develop: accuracy) section. "
]
},
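{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For $n$ test points with actual values $y_i$ and predictions $\\hat{y}_i$:\n",
"\n",
"$$\\mathrm{MSE} = \\frac{1}{n}\\sum_{i=1}^{n}\\left(y_i - \\hat{y}_i\\right)^2$$"
]
},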
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.12264182555541722"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.metrics import mean_squared_error\n",
"\n",
"mean_squared_error(y_test, y_pred_no_type)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"What does this mean? The closer the MSE is to 0, the better. However, this value is not given in terms of the dependent variable, and MSE values are not comparable across use cases, i.e., comparing an MSE from your project to that of a different model is not comparing \"apples to apples.\" A \"good\" MSE will depend on your data and project needs.\n",
"\n",
"This value can be used to determine if tweaks improve the results. For example, reviewing the [visualizations of this dataset](sup_class_ex:descriptive_methods_and_visualizations), we might expect that the regression coefficients (numbers determining the lines directions) should be positive, and try `LinearRegression(positive = True)`."
]
},
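{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"(A quick way to see the learned coefficients, sketched here rather than run: the fitted estimator exposes one slope per feature in `coef_` and the intercept in `intercept_`.)\n",
"\n",
"```python\n",
"# Inspect the fitted slopes; any negative entries are candidates for the constraint below.\n",
"print(linear_reg_model.coef_, linear_reg_model.intercept_)\n",
"```"
]
},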
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.10302108975537984"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linear_reg_model2 = LinearRegression(positive = True)\n",
"linear_reg_model2.fit(X_train_no_type, y_train)\n",
"y_pred_no_type = linear_reg_model2.predict(X_test_no_type)\n",
"\n",
"mean_squared_error(y_test, y_pred_no_type)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And $0.103 < 0.122$."
]
}
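,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A closing aside (a sketch, reusing the variables above): taking the square root of the MSE gives the *root mean squared error* (RMSE), which IS in the units of the dependent variable, here centimeters of sepal length.\n",
"\n",
"```python\n",
"import numpy as np\n",
"# RMSE is just the square root of MSE, so it shares units with sepal-length (cm)\n",
"rmse = np.sqrt(mean_squared_error(y_test, y_pred_no_type))\n",
"print(rmse)\n",
"```"
]
}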
],
"metadata": {
"kernelspec": {
"display_name": "jupyter-books-WZpnkDri",
"language": "python",
"name": "jupyter-books-wzpnkdri"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}