{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Running Python code requires a running Python kernel. Click the {fa}`rocket` --> {guilabel}`Live Code` button above on this page to run the code below.\n", "\n", "```{warning}\n", "🚧 This site is under construction! As of now, the Python kernel may not run on the page or may have very long wait times. Also, expect typos.👷🏽‍♀️\n", "```\n", "(sup_class_ex)=\n", "# Example: Supervised Classification App\n", "\n", "A supervised classification method fits the project requirements well, so it is a good place to start. The nature of your data and your organizational needs dictate which methods you can use. So what type of data works with supervised classification methods? \n", "\n", "- One of the features (columns) contains mutually exclusive *categories* you want to predict (the dependent variable).\n", "- At least one other feature (the independent variable(s)).\n", "\n", ":::{margin}\n", "Classifying non-mutually exclusive categories is called *multi-label* or *multi-output* classification. Not to be confused with the *multiclass* classification presented in this example, multi-label classification requires different techniques, particularly for measuring accuracy. See [Introduction to Multi-label Classification](https://www.geeksforgeeks.org/an-introduction-to-multilabel-classification/) for more information. \n", ":::\n", "\n", "This will be a simple example. Simple data. Simple model. Simple interface. However, it does demonstrate the minimum requirements for [part C](task2c). We'll also show how things can be progressively improved, building on the *working* code. Simple is a great place to start; scaling up is typically easier than going in the other direction. \n", "\n", "

\n", "

\n", "\n", "Let's look at the famous [Fisher's Iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set): " ] }, { "cell_type": "code", "execution_count": 2, "id": "b053efd1", "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
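<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>sepal-length</th><th>sepal-width</th><th>petal-length</th><th>petal-width</th><th>type</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"<tr><th>1</th><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"<tr><th>2</th><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"<tr><th>3</th><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"<tr><th>4</th><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>145</th><td>6.7</td><td>3.0</td><td>5.2</td><td>2.3</td><td>Iris-virginica</td></tr>\n",
"<tr><th>146</th><td>6.3</td><td>2.5</td><td>5.0</td><td>1.9</td><td>Iris-virginica</td></tr>\n",
"<tr><th>147</th><td>6.5</td><td>3.0</td><td>5.2</td><td>2.0</td><td>Iris-virginica</td></tr>\n",
"<tr><th>148</th><td>6.2</td><td>3.4</td><td>5.4</td><td>2.3</td><td>Iris-virginica</td></tr>\n",
"<tr><th>149</th><td>5.9</td><td>3.0</td><td>5.1</td><td>1.8</td><td>Iris-virginica</td></tr>\n",
"</tbody>\n",
"</table>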
\n", "" ], "text/plain": [ " sepal-length sepal-width petal-length petal-width type\n", "0 5.1 3.5 1.4 0.2 Iris-setosa\n", "1 4.9 3.0 1.4 0.2 Iris-setosa\n", "2 4.7 3.2 1.3 0.2 Iris-setosa\n", "3 4.6 3.1 1.5 0.2 Iris-setosa\n", "4 5.0 3.6 1.4 0.2 Iris-setosa\n", ".. ... ... ... ... ...\n", "145 6.7 3.0 5.2 2.3 Iris-virginica\n", "146 6.3 2.5 5.0 1.9 Iris-virginica\n", "147 6.5 3.0 5.2 2.0 Iris-virginica\n", "148 6.2 3.4 5.4 2.3 Iris-virginica\n", "149 5.9 3.0 5.1 1.8 Iris-virginica" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# We'll import libraries as needed, but when submitting,\n", "# it's best to have them all at the top.\n", "import pandas as pd\n", "\n", "# Load this well-worn dataset. The CSV has no header row, so we supply the column names.\n", "url = \"https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv\"\n", "column_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'type']\n", "df = pd.read_csv(url, names=column_names)  # read CSV into Python as a DataFrame\n", "pd.options.display.show_dimensions = False  # suppresses dimension output\n", "display(df)  # displays the DataFrame\n", "# Code hide and toggle are managed with Jupyter cell 'tags'." ] }, { "attachments": {}, "cell_type": "markdown", "id": "3e1087d3", "metadata": {}, "source": [ "Though we described everything as \"simple,\" we'll also see that this dataset is quite *rich* with angles to investigate. At this point, we have many options, but for a classification project we need a categorical feature as our dependent variable, and for this, we have only one choice: **type**." 
] }, { "cell_type": "code", "execution_count": 3, "id": "9edff8f0-c39e-41f0-ba20-738072a29da3", "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
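<table>\n",
"<thead>\n",
"<tr><th>&nbsp;</th><th>sepal-length</th><th>sepal-width</th><th>petal-length</th><th>petal-width</th><th>type</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td><td style=\"background-color: yellow\">Iris-setosa</td></tr>\n",
"<tr><th>1</th><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td><td style=\"background-color: yellow\">Iris-setosa</td></tr>\n",
"<tr><th>2</th><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td><td style=\"background-color: yellow\">Iris-setosa</td></tr>\n",
"<tr><th>3</th><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td><td style=\"background-color: yellow\">Iris-setosa</td></tr>\n",
"<tr><th>4</th><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td><td style=\"background-color: yellow\">Iris-setosa</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td style=\"background-color: yellow\">...</td></tr>\n",
"<tr><th>145</th><td>6.7</td><td>3.0</td><td>5.2</td><td>2.3</td><td style=\"background-color: yellow\">Iris-virginica</td></tr>\n",
"<tr><th>146</th><td>6.3</td><td>2.5</td><td>5.0</td><td>1.9</td><td style=\"background-color: yellow\">Iris-virginica</td></tr>\n",
"<tr><th>147</th><td>6.5</td><td>3.0</td><td>5.2</td><td>2.0</td><td style=\"background-color: yellow\">Iris-virginica</td></tr>\n",
"<tr><th>148</th><td>6.2</td><td>3.4</td><td>5.4</td><td>2.3</td><td style=\"background-color: yellow\">Iris-virginica</td></tr>\n",
"<tr><th>149</th><td>5.9</td><td>3.0</td><td>5.1</td><td>1.8</td><td style=\"background-color: yellow\">Iris-virginica</td></tr>\n",
"</tbody>\n",
"</table>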
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Preserves the Jupyter preview style (the '...' row) after applying .style. This is for presentation only.\n", "def display_df(dataframe, column_names, highlighted_col, precision=2):\n", "    pd.set_option(\"display.precision\", 2)\n", "    # Placeholder row of '...' standing in for the omitted middle rows\n", "    columns_dict = {}\n", "    for i in column_names:\n", "        columns_dict[i] = '...'\n", "    # Stack the first five rows, the placeholder row, and the last five rows, then apply the styling\n", "    df2 = pd.concat([dataframe.iloc[:5, :],\n", "                     pd.DataFrame(index=['...'], data=columns_dict),\n", "                     dataframe.iloc[-5:, :]]).style.format(precision=precision).set_properties(subset=[highlighted_col], **{'background-color': 'yellow'})\n", "    pd.options.display.show_dimensions = True\n", "    display(df2)\n", "\n", "# Display the DataFrame with the highlighted column\n", "display_df(df, column_names, 'type', 1)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "96de1903-332e-49f8-b7e7-96b50fc940ee", "metadata": { "tags": [] }, "source": [ ":::{sidebar} Watch\n", "\n", ":::\n", "\n", "The highlighted column, **type**, provides the category to predict or classify (the dependent variable), and the non-highlighted columns provide the information used to make that prediction (the independent variables)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" }, "vscode": { "interpreter": { "hash": "3ff4b9f9a77e43d422b45ad0e34f66a3a995e732d437005df0ccbc0093bddc0e" } } }, "nbformat": 4, "nbformat_minor": 5 }