{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"(sup_class_ex:develop)=\n",
"# Model Development\n",
"\n",
"Supervised algorithms use inputs (independent variables) and labeled outputs (dependent variable -the \"answers\") to create a model that can measure its performance and learn over time. Splitting the data into independent and dependent variables, we have the following:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#Note: we only repeat this step from before, because this is a new .ipyb page.\n",
"# it only needs to be executed once per file. \n",
" \n",
"#We'll import libraries as needed, but when submitting, having them all at the top is best practice\n",
"import pandas as pd\n",
"\n",
"# Reloading the dataset\n",
"url = \"https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv\"\n",
"column_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'type']\n",
"df = pd.read_csv(url, names = column_names) #read CSV into Python as a dataframe\n",
"\n",
"X = df.drop(columns=['type']) #indpendent variables\n",
"y = df[['type']].copy() #dependent variables"
]
},
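{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (a minimal sketch, not required): the Iris dataset has 150 rows, so `X` should have four feature columns and `y` a single label column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Sanity check (sketch): 150 rows; 4 feature columns in X, 1 label column in y\n",
"print(X.shape, y.shape) #expect (150, 4) and (150, 1)"
]
},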
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
"The focus of [Task 2 part D *Data Product*](task2d:dataproduct) will be the what, how, and why of your model's development.\n",
"```"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"(sup_class_ex:develop:train)=\n",
"## Training the Model\n",
"\n",
"```{margin}\n",
"A model can learn the details and noise particular to the training data so well that it doesn't perform well on new data. This is called [*overfitting*](https://en.wikipedia.org/wiki/Overfitting). Overcomplicated non-linear and nonparametric models are more susceptible to this. The term *overtraining* can be used synonymously or to mean too much training causing overfitting. \n",
"```\n",
"\n",
"Studying for a test when you have all the answers beforehand will likely yield a good grade. But how well would that grade measure understanding of material outside those answers? Similarly, supervised methods tend to perform well when tested on their training data, but you want your model to perform well on *unseen* data. So while it's not required, separating data used to train and test the model (validation) is good practice. Furthermore, it provides content for part D of the documentation. \n",
"\n",
"Fortunately, most libraries have built-in functions for this. Here we'll stick with [scikit-learn aka sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) validation processes We'll need to randomly split the data into independent (input values) and dependent (output, i.e., the answers) variables. For now, we'll keep things as DataFrames, but later convert them to 2-d arrays"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"#split the variable sets into training and testing subsets\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.333, random_state=41)"
]
},
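{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check (a minimal sketch, assuming the split above), we can confirm the subset sizes and that a fixed `random_state` reproduces the same \"random\" split:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Sketch: with test_size=0.333, 50 of the 150 rows are held out for testing\n",
"print(len(X_train), len(X_test)) #expect 100 and 50\n",
"\n",
"#Sketch: repeating the split with the same random_state selects identical rows\n",
"X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=0.333, random_state=41)\n",
"print(X_train.index.equals(X_train2.index)) #True"
]
},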
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" Independents variables\n",
" \n",
" \n",
" | \n",
" sepal-length | \n",
" sepal-width | \n",
" petal-length | \n",
" petal-width | \n",
"
\n",
" \n",
" \n",
" \n",
" 111 | \n",
" 6.400000 | \n",
" 2.700000 | \n",
" 5.300000 | \n",
" 1.900000 | \n",
"
\n",
" \n",
" 82 | \n",
" 5.800000 | \n",
" 2.700000 | \n",
" 3.900000 | \n",
" 1.200000 | \n",
"
\n",
" \n",
" 130 | \n",
" 7.400000 | \n",
" 2.800000 | \n",
" 6.100000 | \n",
" 1.900000 | \n",
"
\n",
" \n",
" 27 | \n",
" 5.200000 | \n",
" 3.500000 | \n",
" 1.500000 | \n",
" 0.200000 | \n",
"
\n",
" \n",
" 33 | \n",
" 5.500000 | \n",
" 4.200000 | \n",
" 1.400000 | \n",
" 0.200000 | \n",
"
\n",
" \n",
"
\n",
" \n",
"\n",
" Dependents variables\n",
" \n",
" \n",
" | \n",
" type | \n",
"
\n",
" \n",
" \n",
" \n",
" 111 | \n",
" Iris-virginica | \n",
"
\n",
" \n",
" 82 | \n",
" Iris-versicolor | \n",
"
\n",
" \n",
" 130 | \n",
" Iris-virginica | \n",
"
\n",
" \n",
" 27 | \n",
" Iris-setosa | \n",
"
\n",
" \n",
" 33 | \n",
" Iris-setosa | \n",
"
\n",
" \n",
"
\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Nice displays are nice but not required. \n",
"from IPython.display import display_html \n",
"X_train_styler = X_train.head(5).style.set_table_attributes(\"style='display:inline'\").set_caption('Independents variables')\n",
"y_train_styler = y_train.head(5).style.set_table_attributes(\"style='display:inline'\").set_caption('Dependents variables')\n",
"space = \"\\xa0\" * 10 #space between columns\n",
"display_html(X_train_styler._repr_html_()+ space + y_train_styler._repr_html_(), raw=True)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Read the [docs](sklearn_link)! By default `train_test_split`, [\"randomly\"](https://engineering.mit.edu/engage/ask-an-engineer/can-a-computer-generate-a-truly-random-number/) splits the sets. Setting the seed (or state) with `random_state` controls the experiments. See [should you use a random seed?](https://datascience.stackexchange.com/questions/78109/should-you-use-random-state-or-random-seed-in-machine-learning-models).\n",
"\n",
"We can now train a model using the independent (usually denoted `X`) and dependent variables (usually denoted `y`) from the training data. Sklearn has a deep [supervised learning library](https://scikit-learn.org/stable/supervised_learning.html). Note that many of these models (including SVM) have both classification and regression extensions. "
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\ashej\\.virtualenvs\\jupyter-books-WZpnkDri\\Lib\\site-packages\\sklearn\\utils\\validation.py:1141: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" y = column_or_1d(y, warn=True)\n"
]
},
{
"data": {
"text/html": [
"SVC(C=1)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org. "
],
"text/plain": [
"SVC(C=1)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn import svm\n",
"\n",
"svm_model = svm.SVC(gamma='scale', C=1) #Creates a svm model object. Mote, 'scale' and 1.0 are gamma and C's respective defaults \n",
"svm_model.fit(X_train,y_train)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"What's with the warning? `DataConversionWarning: A column-vector y was passed when a 1d array was expected.`\n",
"\n",
"Looking at the `sklearn.svm.SVC` [docs for the `fit` function](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC.fit), a 1d array was expected for the `y`, but we gave it a DataFrame. This is a warning -not an error, and the model appears to work. However, it's best practice to clean warnings up when possible, and in this case, it's an easy fix.\n",
"\n",
"```{margin}\n",
"[What's that `gamma` and `C` for?](https://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html), read the docs!\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"SVC(C=1)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org. "
],
"text/plain": [
"SVC(C=1)"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_train_array, y_test_array = y_train['type'].values, y_test['type'].values\n",
"svm_model.fit(X_train,y_train_array) \n",
"#Note: you can also use 'svm_model.fit(X_train,y_train_array.ravel())' "
]
},
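{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As the margin note above hints, `gamma` and `C` are hyperparameters you can experiment with. A minimal sketch (the value of `C` here is arbitrary, purely for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Sketch: hyperparameters are set when constructing the model object.\n",
"#A smaller C softens the margin (more regularization); 0.1 is an arbitrary illustrative value\n",
"svm_alt = svm.SVC(gamma='scale', C=0.1)\n",
"svm_alt.fit(X_train, y_train_array)"
]
},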
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Applying the Model\n",
"\n",
"Now we've trained the model (without warnings)! What does that mean? Sklearn's SVM algorithm creates an equation representing the relationship between the variables. \n",
"\n",
"$$F_{\\text{predict}}(X)=\\text{prediction(s)}$$\n",
"\n",
"For example, the Iris at index `82` has the values:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" sepal-length | \n",
" sepal-width | \n",
" petal-length | \n",
" petal-width | \n",
"
\n",
" \n",
" \n",
" \n",
" 82 | \n",
" 5.8 | \n",
" 2.7 | \n",
" 3.9 | \n",
" 1.2 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sepal-length sepal-width petal-length petal-width\n",
"82 5.8 2.7 3.9 1.2"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.loc[[82]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And `svm_model.predict(X_train.loc[[82]])` inputs the flower dimensions into the prediction function:\n",
"\n",
"$$F_{\\text{predict}}(5.8, 2.7, 3.9, 1.2)=\\text{Iris-versicolor}$$"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Iris-versicolor']\n"
]
}
],
"source": [
"print(svm_model.predict(X_train.loc[[82]]))\n",
"#Alternatively: svm_model.predict([[5.8, 2.7, 3.9, 1.2]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Which in this example turns out to be correct:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" sepal-length | \n",
" sepal-width | \n",
" petal-length | \n",
" petal-width | \n",
" type | \n",
"
\n",
" \n",
" \n",
" \n",
" 82 | \n",
" 5.8 | \n",
" 2.7 | \n",
" 3.9 | \n",
" 1.2 | \n",
" Iris-versicolor | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sepal-length sepal-width petal-length petal-width type\n",
"82 5.8 2.7 3.9 1.2 Iris-versicolor"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[[82]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Applying the prediction function to the entire dataset, we get a prediction for each flower:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"output_scroll"
]
},
"outputs": [
{
"data": {
"text/plain": [
"array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', ..., 'Iris-virginica',\n",
" 'Iris-virginica', 'Iris-virginica'], dtype=object)"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictions = svm_model.predict(X)\n",
"print(predictions)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"But how good are these predictions? Answering that question is our next step. "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "jupyter-books-WZpnkDri",
"language": "python",
"name": "jupyter-books-wzpnkdri"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "3ff4b9f9a77e43d422b45ad0e34f66a3a995e732d437005df0ccbc0093bddc0e"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}