Supervised algorithms use inputs (independent variables) and labeled outputs (the dependent variable, i.e., the “answers”) to create a model that can measure its performance and learn over time. Splitting the data into independent and dependent variables, we have the following:
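The split described above might look like the following minimal sketch. The original code cell is not shown here, so the names `df`, `columns`, and the `class` column are assumptions, and the Iris data is rebuilt from sklearn's bundled copy rather than from the file the course uses.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: rebuild the Iris data as a DataFrame. The column names follow
# the feature names used in this section; "class" is a hypothetical label column.
iris = load_iris()
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
df = pd.DataFrame(iris.data, columns=columns)
df["class"] = ["Iris-" + iris.target_names[i] for i in iris.target]

# Independent variables (the inputs) and dependent variable (the "answers").
X = df[columns]
y = df[["class"]]  # double brackets keep y as a one-column DataFrame

print(X.shape, y.shape)
print(X.head())
```

Both `X` and `y` stay as DataFrames at this point, which matches the approach taken in the rest of this section.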
Training the Model
Studying for a test when you have all the answers beforehand will likely yield a good grade. But how well would that grade measure understanding of material outside those answers? Similarly, supervised methods tend to perform well when tested on their training data, but you want your model to perform well on unseen data. So while it’s not required, separating data used to train and test the model (validation) is good practice. Furthermore, it provides content for part D of the documentation.
Fortunately, most libraries have built-in functions for this. Here we’ll stick with the validation processes in scikit-learn (aka sklearn). We’ll need to separate the data into independent (input values) and dependent (output, i.e., the answers) variables, and then randomly split both into training and test sets. For now, we’ll keep things as DataFrames, but later convert them to 2-d arrays.
Read the docs! By default, train_test_split “randomly” splits the sets. Setting the seed (or state) with random_state makes the split repeatable, which controls the experiments. See “Should you use a random seed?”.
We can now train a model using the independent (usually denoted X) and dependent variables (usually denoted y) from the training data. Sklearn has a deep supervised learning library. Note that many of these models (including SVM) have both classification and regression extensions.
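As a rough sketch of the random split, assuming the Iris data is rebuilt from sklearn's bundled copy: the values `test_size=50` and `random_state=41` are illustrative assumptions, not settings confirmed by this page.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: same reconstructed Iris DataFrames as a stand-in for the course data.
iris = load_iris()
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
X = pd.DataFrame(iris.data, columns=columns)
y = pd.DataFrame({"class": ["Iris-" + iris.target_names[i] for i in iris.target]})

# Hold out 50 rows for testing; the seed makes the "random" split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=50, random_state=41
)

# Running the split again with the same seed selects the same rows.
X_train_again, *_ = train_test_split(X, y, test_size=50, random_state=41)

print(X_train.shape, X_test.shape)
print(X_train.index.equals(X_train_again.index))
```

Leaving `random_state` unset would give a different split on each run, which makes results harder to compare between experiments.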
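A minimal sketch of the training step, assuming the reconstructed data and split parameters below (both are assumptions). Passing the one-column DataFrame `y_train` straight to `fit` is intentional here, and the sketch records any warnings that result so they can be inspected.

```python
import warnings

import pandas as pd
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: reconstructed Iris DataFrames and an illustrative split.
iris = load_iris()
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
X = pd.DataFrame(iris.data, columns=columns)
y = pd.DataFrame({"class": ["Iris-" + iris.target_names[i] for i in iris.target]})
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=50, random_state=41
)

svm_model = svm.SVC(C=1)

# Record warnings instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    svm_model.fit(X_train, y_train)  # y_train is a one-column DataFrame

for w in caught:
    print(w.category.__name__, "-", w.message)
print(svm_model.classes_)
```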
D:\Task_1_updates\course_webpages\C964\.venv\Lib\site-packages\sklearn\utils\validation.py:1365: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
SVC(C=1)
[Estimator display: fitted on training data of shape (100, 4) with features ['sepal-length','sepal-width','petal-length','petal-width'] and classes ['Iris-setosa','Iris-versicolor','Iris-virginica'].]
What’s with the warning? DataConversionWarning: A column-vector y was passed when a 1d array was expected.
Looking at the sklearn.svm.SVC docs for the fit function, a 1d array was expected for y, but we gave it a one-column DataFrame (a column vector). This is a warning, not an error, and the model appears to work. However, it’s best practice to clean warnings up when possible, and in this case, it’s an easy fix.
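One way to apply the fix the warning itself suggests is to flatten `y_train` with `ravel()` before fitting. This is a sketch under assumed data and split parameters. To check that the warning is gone, it turns `DataConversionWarning` into an error while fitting.

```python
import warnings

import pandas as pd
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.exceptions import DataConversionWarning
from sklearn.model_selection import train_test_split

# Assumption: reconstructed Iris DataFrames and an illustrative split.
iris = load_iris()
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
X = pd.DataFrame(iris.data, columns=columns)
y = pd.DataFrame({"class": ["Iris-" + iris.target_names[i] for i in iris.target]})
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=50, random_state=41
)

# Flatten the (n_samples, 1) column vector into a 1d array of shape (n_samples,).
y_train_1d = y_train.values.ravel()
print(y_train.shape, "->", y_train_1d.shape)

svm_model = svm.SVC(C=1)
with warnings.catch_warnings():
    # Fail loudly if the column-vector warning is raised again.
    warnings.simplefilter("error", DataConversionWarning)
    svm_model.fit(X_train, y_train_1d)
```

Selecting the column as a Series, for example `y_train["class"]`, is another option that also gives `fit` a 1d input.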
SVC(C=1)
Applying the Model
Now we’ve trained the model (without warnings)! What does that mean? Sklearn’s SVM algorithm creates an equation representing the relationship between the variables.
\[F_{\text{predict}}(X)=\text{prediction(s)}\]
For example, the Iris at index 82 has the values:
And svm_model.predict(X_train.loc[[82]]) inputs the flower dimensions into the prediction function:
\[F_{\text{predict}}(5.8, 2.7, 3.9, 1.2)=\text{Iris-versicolor}\]
Which in this example turns out to be correct:
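The single-flower prediction can be sketched as follows. Because the split below is an assumption, row 82 is not guaranteed to land in `X_train`, so this sketch looks it up in the full `X` instead of using `X_train.loc[[82]]`.

```python
import pandas as pd
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: reconstructed Iris DataFrames and an illustrative split.
iris = load_iris()
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
X = pd.DataFrame(iris.data, columns=columns)
y = pd.DataFrame({"class": ["Iris-" + iris.target_names[i] for i in iris.target]})
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=50, random_state=41
)

svm_model = svm.SVC(C=1)
svm_model.fit(X_train, y_train.values.ravel())

# Double brackets keep the row as a one-row DataFrame with feature names,
# which is the 2d input shape that predict expects.
flower = X.loc[[82]]
prediction = svm_model.predict(flower)

print(flower.to_string())
print("predicted:", prediction[0])
print("actual:   ", y.loc[82, "class"])
```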
Applying the prediction function to the entire dataset, we get a prediction for each flower:
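Predicting for every flower at once might look like the sketch below. The data, the split, and the `results` table with its `actual` and `predicted` columns are assumptions added for illustration.

```python
import pandas as pd
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: reconstructed Iris DataFrames and an illustrative split.
iris = load_iris()
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
X = pd.DataFrame(iris.data, columns=columns)
y = pd.DataFrame({"class": ["Iris-" + iris.target_names[i] for i in iris.target]})
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=50, random_state=41
)

svm_model = svm.SVC(C=1)
svm_model.fit(X_train, y_train.values.ravel())

# predict returns one label per input row.
predictions = svm_model.predict(X)

# Hypothetical helper table: place each prediction next to the known answer.
results = X.copy()
results["actual"] = y["class"]
results["predicted"] = predictions
print(results.head())
```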
But how good are these predictions? Answering that question is our next step.