Choose a Classifier

In Classification Learner, you can explore several types of classifiers. To see all available classifier options, on the Classification Learner tab, click the arrow on the far right of the Classifier section to expand the list of classifiers. The options in the Classifier gallery are preset starting points with different settings, suitable for a range of different classification problems.

For help choosing a classifier type, see the table showing typical characteristics of different supervised learning algorithms. Use the table as a guide for your initial choice of algorithms. Decide on the tradeoff you want in speed, memory usage, flexibility, and interpretability. The best classifier type depends on your data.

Tip: Try a decision tree or discriminant first, because these classifiers are fast and easy to interpret. If the models are not accurate enough when predicting the response, try other classifiers with higher flexibility. To control flexibility, see the details for each classifier type. To avoid overfitting, look for a model of lower flexibility that provides sufficient accuracy.
Choose a Classifier Type

| Classifier | Prediction Speed | Memory Usage | Interpretability |
|---|---|---|---|
| Decision Trees | Fast | Small | Easy |
| Discriminant Analysis | Fast | Small for linear; large for quadratic | Easy |
| Support Vector Machines | Medium for linear; slow for others | Medium for linear; all others: medium for multiclass, large for binary | Easy for Linear SVM; hard for all other kernel types |
| Nearest Neighbor Classifiers | Slow for cubic; medium for others | Medium | Hard |
| Ensemble Classifiers | Fast to medium, depending on choice of algorithm | Low to high, depending on choice of algorithm | Hard |
The tables on this page describe general characteristics of speed and memory usage for all the preset classifiers. The classifiers were tested with various data sets (up to 7000 observations, 80 predictors, and 50 classes), and the results define the following groups:

Speed
• Fast: about 0.01 second
• Medium: about 1 second
• Slow: about 100 seconds

Memory
• Small: about 1 MB
• Medium: about 4 MB
• Large: about 100 MB
These tables provide a general guide. Your results depend on your data and the speed of your machine.
To read a description of each classifier in Classification Learner, switch to the details view.
Tip: After you choose a classifier type (for example, decision trees), try training using each of the classifiers. The options in the Classifier gallery are starting points with different settings. Try them all to see which option produces the best model with your data.
For workflow instructions, see Explore Classification Models Interactively.

Categorical Predictor Support

In Classification Learner, the classifier gallery only shows classifier types that support your selected data.

| Classifier | All predictors numeric | All predictors categorical | Some categorical, some numeric |
|---|---|---|---|
| Decision Trees | Yes | Yes | Yes |
| Discriminant Analysis | Yes | No | No |
| SVM | Yes | Yes | Yes |
| Nearest Neighbor | Euclidean distance only | Hamming distance only | No |
| Ensembles | Yes | Yes, except Subspace Discriminant | Yes, except any Subspace |
Decision Trees

Decision trees are easy to interpret, fast for fitting and prediction, and low on memory usage, but they can have low predictive accuracy. Try to grow simpler trees to prevent overfitting. Control the depth with the Maximum number of splits setting.

Tip: Model flexibility increases with the Maximum number of splits setting.
| Classifier Type | Prediction Speed | Memory Usage | Interpretability | Model Flexibility |
|---|---|---|---|---|
| Simple Tree | Fast | Small | Easy | Low. Few leaves to make coarse distinctions between classes (maximum number of splits is 4). |
| Medium Tree | Fast | Small | Easy | Medium. Medium number of leaves for finer distinctions between classes (maximum number of splits is 20). |
| Complex Tree | Fast | Small | Easy | High. Many leaves to make many fine distinctions between classes (maximum number of splits is 100). |
Tip: Try training each of the decision tree options in the Classifier gallery. Train them all to see which settings produce the best model with your data. Select the best model in the History list. To try to improve your model, try feature selection, and then try changing some advanced options.
You train classification trees to predict responses to data. To predict a response, follow the decisions in the tree from the root (beginning) node down to a leaf node. The leaf node contains the response. Statistics and Machine Learning Toolbox™ trees are binary. Each step in a prediction involves checking the value of one predictor (variable).

For example, consider a simple classification tree based on two predictors, x1 and x2. To predict, start at the top node. At each decision, check the values of the predictors to decide which branch to follow. When the branches reach a leaf node, the data is classified either as type 0 or type 1.

You can visualize your decision tree model by exporting the model from the app, and then entering:

view(trainedClassifier.ClassificationTree,'Mode','graph')

For a complex tree trained with the fisheriris data, this command opens a figure showing the full tree.
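If you prefer to work at the command line instead of exporting from the app, a minimal sketch (assuming Statistics and Machine Learning Toolbox and its bundled fisheriris data set) trains a comparable tree directly with fitctree:

```matlab
% Train a tree like the Complex Tree preset on Fisher's iris data,
% then open the graphical tree viewer and classify one observation.
load fisheriris                          % meas: 150x4 predictors; species: class labels
tree = fitctree(meas, species, 'MaxNumSplits', 100);
view(tree, 'Mode', 'graph')              % same viewer as trainedClassifier.ClassificationTree
label = predict(tree, meas(1,:))         % follows the tree from root to a leaf
```

Here predict walks the tree from the root node down to a leaf, exactly as described above.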
Tip: For an example, see Explore Decision Trees Interactively.
Advanced Tree Options

Classification trees in Classification Learner use the fitctree function. You can set the following options (a fitctree sketch showing how they map to the function follows the list):

• Maximum number of splits: Specify the maximum number of splits or branch points to control the depth of your tree. When you grow a decision tree, consider its simplicity and predictive power. To change the number of splits, click the buttons or enter a positive integer value in the Maximum number of splits box.
  - A complex tree with many leaves is usually highly accurate on the training data. However, the tree might not show comparable accuracy on an independent test set. A leafy tree tends to overtrain, and its validation accuracy is often far lower than its training (or resubstitution) accuracy.
  - In contrast, a simple tree does not attain high training accuracy, but it can be more robust in that its training accuracy can approach that of a representative test set. Also, a simple tree is easy to interpret.

• Split criterion: Specify the split criterion measure for deciding when to split nodes. Try each of the three settings to see if they improve the model with your data. The options are Gini's diversity index, Twoing rule, and Maximum deviance reduction (also known as cross entropy). The classification tree tries to optimize to pure nodes containing only one class. Gini's diversity index (the default) and the deviance criterion measure node impurity. The twoing rule is a different measure for deciding how to split a node, where maximizing the twoing rule expression increases node purity. For details of these split criteria, see ClassificationTree Definitions.

• Surrogate decision splits: Only for missing data. Specify surrogate use for decision splits. If you have data with missing values, use surrogate splits to improve the accuracy of predictions. When you set Surrogate decision splits to On, the classification tree finds at most 10 surrogate splits at each branch node. To change the number, click the buttons or enter a positive integer value in the Maximum surrogates per node box. When you set Surrogate decision splits to Find All, the classification tree finds all surrogate splits at each branch node. The Find All setting can use considerable time and memory.
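At the command line, these app settings correspond to fitctree name-value pairs. A minimal sketch (the particular values are illustrative, not app defaults):

```matlab
% Illustrative fitctree call mirroring the app's advanced tree options.
load fisheriris
tree = fitctree(meas, species, ...
    'MaxNumSplits', 20, ...              % Maximum number of splits
    'SplitCriterion', 'twoing', ...      % 'gdi' (default), 'twoing', or 'deviance'
    'Surrogate', 'on');                  % find up to 10 surrogate splits per branch node
```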
Discriminant Analysis

Discriminant analysis is a popular first classification algorithm to try because it is fast, accurate, and easy to interpret. Discriminant analysis is good for wide data sets.

Discriminant analysis assumes that different classes generate data based on different Gaussian distributions. To train a classifier, the fitting function estimates the parameters of a Gaussian distribution for each class.

| Classifier Type | Prediction Speed | Memory Usage | Interpretability | Model Flexibility |
|---|---|---|---|---|
| Linear Discriminant | Fast | Small | Easy | Low. Creates linear boundaries between classes. |
| Quadratic Discriminant | Fast | Large | Easy | Low. Creates nonlinear boundaries between classes (ellipse, parabola, or hyperbola). |
Advanced Discriminant Options

Discriminant analysis in Classification Learner uses the fitcdiscr function. For either linear or quadratic discriminants, you can change the Regularization option. If you change the regularization option, training can fail if you have predictors with zero variance or if any of the covariance matrices of your predictors are singular. (A fitcdiscr sketch follows the list.)

• Linear discriminants: If your predictors are independent or your data set is small, leave the default Diagonal Covariance. If your predictors are dependent, try selecting the Auto option to set regularization automatically and see if that improves your model. If training fails, use Diagonal Covariance instead.

• Quadratic discriminants: You can try selecting None to remove regularization and see if that improves your model. The None option can cause training to fail if any of the covariance matrices of your predictors are singular; in that case, try removing predictors, select the Diagonal Covariance option instead, or try another classifier.
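For reference, a minimal command-line sketch with fitcdiscr (how the app's Regularization option maps to function parameters is an assumption here):

```matlab
% Linear vs. quadratic discriminant analysis on Fisher's iris data.
load fisheriris
lda = fitcdiscr(meas, species, 'DiscrimType', 'linear');
qda = fitcdiscr(meas, species, 'DiscrimType', 'quadratic');
% Linear-discriminant regularization is controlled by the 'Gamma' and 'Delta' parameters.
fprintf('LDA resubstitution error: %.3f\n', resubLoss(lda));
fprintf('QDA resubstitution error: %.3f\n', resubLoss(qda));
```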
Support Vector Machines

In Classification Learner, you can train SVMs when your data has two or more classes.

Tip: Model flexibility decreases with the Kernel scale setting.
| Classifier Type | Prediction Speed | Memory Usage | Interpretability | Model Flexibility |
|---|---|---|---|---|
| Linear SVM | Binary: Fast. Multiclass: Medium | Medium | Easy | Low. Makes a simple linear separation between classes. |
| Quadratic SVM | Binary: Fast. Multiclass: Slow | Binary: Medium. Multiclass: Large | Hard | Medium |
| Cubic SVM | Binary: Fast. Multiclass: Slow | Binary: Medium. Multiclass: Large | Hard | Medium |
| Fine Gaussian SVM | Binary: Fast. Multiclass: Slow | Binary: Medium. Multiclass: Large | Hard | High; decreases with the Kernel scale setting. Makes finely detailed distinctions between classes, with kernel scale set to sqrt(P)/4. |
| Medium Gaussian SVM | Binary: Fast. Multiclass: Slow | Binary: Medium. Multiclass: Large | Hard | Medium. Medium distinctions, with kernel scale set to sqrt(P). |
| Coarse Gaussian SVM | Binary: Fast. Multiclass: Slow | Binary: Medium. Multiclass: Large | Hard | Low. Makes coarse distinctions between classes, with kernel scale set to sqrt(P)*4, where P is the number of predictors. |
Tip: Try training each of the support vector machine options in the Classifier gallery. Train them all to see which settings produce the best model with your data. Select the best model in the History list. To try to improve your model, try feature selection, and then try changing some advanced options.
An SVM classifies data by finding the best hyperplane that separates data points of one class from those of the other class. The best hyperplane for an SVM means the one with the largest margin between the two classes. Margin means the maximal width of the slab parallel to the hyperplane that has no interior data points. The support vectors are the data points that are closest to the separating hyperplane; these points are on the boundary of the slab. The following figure illustrates these definitions, with + indicating data points of type 1, and – indicating data points of type –1.
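In standard SVM notation (added here for clarity; these symbols are not from the original page), with class labels y_i in {−1, +1}, the hard-margin SVM solves:

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n.
```

The slab bounded by w'x + b = 1 and w'x + b = −1 has width 2/||w||, so minimizing ||w|| maximizes the margin; the support vectors are the points where the constraint holds with equality.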
SVMs can also use a soft margin, meaning a hyperplane that separates many, but not all, data points. For an example, see Explore Support Vector Machines Interactively.

Advanced SVM Options

If you have exactly two classes, Classification Learner uses the fitcsvm function to train the classifier. If you have more than two classes, the app uses the fitcecoc function to reduce the multiclass classification problem to a set of binary classification subproblems, with one SVM learner for each subproblem. To examine the code for the binary and multiclass classifier types, you can generate code from your trained classifiers in the app.

You can set the following options in the app. (A command-line sketch with fitcsvm and fitcecoc follows the list.)
• Kernel function: Specify the Kernel function used to compute the Gram matrix:
  - Linear kernel, easiest to interpret
  - Gaussian or Radial Basis Function (RBF) kernel
  - Quadratic
  - Cubic

• Box constraint level: Specify the box constraint to keep the allowable values of the Lagrange multipliers in a box, a bounded region. To tune your SVM classifier, try increasing the box constraint level. Click the buttons or enter a positive scalar value in the Box constraint level box. Increasing the box constraint level can decrease the number of support vectors, but also can increase training time. The Box Constraint parameter is the soft-margin penalty known as C in the primal equations, and is a hard "box" constraint in the dual equations.

• Kernel scale mode: Specify manual kernel scaling if desired. When you set Kernel scale mode to Auto, the software uses a heuristic procedure to select the scale value. The heuristic procedure uses subsampling, so to reproduce results, set a random number seed using rng before training the classifier. When you set Kernel scale mode to Manual, you can specify a value. Click the buttons or enter a positive scalar value in the Manual kernel scale box. The software divides all elements of the predictor matrix by the value of the kernel scale, and then applies the appropriate kernel norm to compute the Gram matrix.

• Multiclass method: Only for data with 3 or more classes. This method reduces the multiclass classification problem to a set of binary classification subproblems, with one SVM learner for each subproblem. One-vs-One trains one learner for each pair of classes; it learns to distinguish one class from the other. One-vs-All trains one learner for each class; it learns to distinguish one class from all others.

• Standardize data: Specify whether to scale each coordinate distance. If predictors have widely different scales, standardizing can improve the fit.
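A minimal command-line sketch of these choices (the presets here are illustrative; ionosphere and fisheriris are example data sets that ship with the toolbox):

```matlab
% Binary SVM with a Gaussian kernel (fitcsvm), and a multiclass SVM
% built from binary learners via error-correcting output codes (fitcecoc).
load ionosphere                          % X: 351x34 predictors; Y: 'b'/'g' labels
rng(1)                                   % reproducible 'auto' kernel scaling (uses subsampling)
svmBinary = fitcsvm(X, Y, ...
    'KernelFunction', 'rbf', ...         % Gaussian/RBF kernel
    'BoxConstraint', 1, ...              % soft-margin penalty C
    'KernelScale', 'auto', ...           % heuristic kernel scale
    'Standardize', true);

load fisheriris                          % 3 classes, so the app would use fitcecoc
t = templateSVM('KernelFunction', 'linear', 'Standardize', true);
svmMulti = fitcecoc(meas, species, ...
    'Learners', t, ...
    'Coding', 'onevsone');               % or 'onevsall'
```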
Nearest Neighbor Classifiers

Nearest neighbor classifiers typically have good predictive accuracy in low dimensions, but might not in high dimensions. They have high memory usage, and are not easy to interpret.

Tip: Model flexibility decreases with the Number of neighbors setting.

| Classifier Type | Prediction Speed | Memory Usage | Interpretability | Model Flexibility |
|---|---|---|---|---|
| Fine KNN | Medium | Medium | Hard | Finely detailed distinctions between classes. The number of neighbors is set to 1. |
| Medium KNN | Medium | Medium | Hard | Medium distinctions between classes. The number of neighbors is set to 10. |
| Coarse KNN | Medium | Medium | Hard | Coarse distinctions between classes. The number of neighbors is set to 100. |
| Cosine KNN | Medium | Medium | Hard | Medium distinctions between classes, using a cosine distance metric. The number of neighbors is set to 10. |
| Cubic KNN | Slow | Medium | Hard | Medium distinctions between classes, using a cubic distance metric. The number of neighbors is set to 10. |
| Weighted KNN | Medium | Medium | Hard | Medium distinctions between classes, using a distance weight. The number of neighbors is set to 10. |
Tip: Try training each of the nearest neighbor options in the Classifier gallery. Train them all to see which settings produce the best model with your data. Select the best model in the History list. To try to improve your model, try feature selection, and then (optionally) try changing some advanced options.
What is k-Nearest Neighbor classification? Categorizing query points based on their distance to points (or neighbors) in a training data set can be a simple yet effective way of classifying new points. You can use various metrics to determine the distance. Given a set X of n points and a distance function, k-nearest neighbor (kNN) search lets you find the k closest points in X to a query point or set of points. kNN-based algorithms are widely used as benchmark machine learning rules.
For an example, see Explore Nearest Neighbor Classification Interactively.

Advanced KNN Options

Nearest neighbor classifiers in Classification Learner use the fitcknn function. You can set the following options (a fitcknn sketch follows the list):

• Number of neighbors: Specify the number of nearest neighbors to find for classifying each point when predicting. Specify a fine classifier (low number of neighbors) or a coarse classifier (high number of neighbors). For example, a fine KNN uses one neighbor, and a coarse KNN uses 100. Many neighbors can be time consuming to fit.

• Distance metric: You can use various metrics to determine the distance to points. For definitions, see the class ClassificationKNN.

• Distance weight: Specify the distance weighting function. You can choose Equal (no weights), Inverse (weight is 1/distance), or Squared Inverse (weight is 1/distance^2).

• Standardize data: Specify whether to scale each coordinate distance. If predictors have widely different scales, standardizing can improve the fit.
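A minimal fitcknn sketch corresponding to the Weighted KNN preset (the mapping from preset to name-value pairs is inferred from the table above):

```matlab
% Weighted KNN: 10 neighbors, squared-inverse distance weighting.
load fisheriris
knn = fitcknn(meas, species, ...
    'NumNeighbors', 10, ...
    'Distance', 'euclidean', ...
    'DistanceWeight', 'squaredinverse', ...
    'Standardize', true);
label = predict(knn, [5.9 3.0 5.1 1.8])  % classify one new observation
```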
Ensemble Classifiers

Ensemble classifiers meld results from many weak learners into one high-quality ensemble predictor. Qualities depend on the choice of algorithm.

Tip: Model flexibility increases with the Number of learners setting. All ensemble classifiers tend to be slow to fit because they often need many learners.
| Classifier Type | Prediction Speed | Memory Usage | Interpretability | Ensemble Method | Model Flexibility |
|---|---|---|---|---|---|
| Boosted Trees | Fast | Low | Hard | AdaBoost, with Decision Tree learners | Medium to high; increases with the Number of learners or Maximum number of splits setting. Tip: Boosted trees can usually do better than bagged, but might require parameter tuning and more learners. |
| Bagged Trees | Medium | High | Hard | Random forest (Bag, with Decision Tree learners) | High; increases with the Number of learners setting. Tip: Try this classifier first. |
| Subspace Discriminant | Medium | Low | Hard | Subspace, with Discriminant learners | Medium; increases with the Number of learners setting. Good for many predictors. |
| Subspace KNN | Medium | Medium | Hard | Subspace, with Nearest Neighbor learners | Medium; increases with the Number of learners setting. Good for many predictors. |
| RUSBoost Trees | Fast | Low | Hard | RUSBoost, with Decision Tree learners | Medium; increases with the Number of learners or Maximum number of splits setting. Good for skewed data (with many more observations of one class). |
| GentleBoost or LogitBoost (not available in the Classifier gallery; if you have 2-class data, choose Boosted Trees and change the method to GentleBoost) | Fast | Low | Hard | GentleBoost or LogitBoost, with Decision Tree learners | Medium; increases with the Number of learners or Maximum number of splits setting. For binary classification only. |
Bagged trees use Breiman's 'random forest' algorithm. For reference, see Breiman, L. "Random Forests." Machine Learning 45, pp. 5–32, 2001.

Tips

• Try bagged trees first. Boosted trees can usually do better but might require searching many parameter values, which is time-consuming.

• Try training each of the ensemble classifier options in the Classifier gallery. Train them all to see which settings produce the best model with your data. Select the best model in the History list. To try to improve your model, try feature selection, PCA, and then (optionally) try changing some advanced options.

• For boosting ensemble methods, you can get fine detail with either deeper trees or larger numbers of shallow trees. As with single tree classifiers, deep trees can cause overfitting. Experiment to choose the best tree depth for the trees in the ensemble, in order to trade off data fit with tree complexity. Use the Number of learners and Maximum number of splits settings.
For an example, see Explore Ensemble Classification Interactively.

Advanced Ensemble Options
Ensemble classifiers in Classification Learner use the fitensemble function. You can set the following options (a fitensemble sketch follows the list):

• For help choosing Ensemble method and Learner type, see the ensemble table. Try the presets first.

• Maximum number of splits: For boosting ensemble methods, specify the maximum number of splits or branch points to control the depth of your tree learners. Many branches tend to overfit, and simpler trees can be more robust and easy to interpret. Experiment to choose the best tree depth for the trees in the ensemble.

• Number of learners: Try changing the number of learners to see if you can improve the model. Many learners can produce high accuracy, but can be time consuming to fit. Start with a few dozen learners, and then inspect the performance. An ensemble with good predictive power can need a few hundred learners.

• Learning rate: Specify the learning rate for shrinkage. If you set the learning rate to less than 1, the ensemble requires more learning iterations but often achieves better accuracy. 0.1 is a popular choice.

• Subspace dimension: For subspace ensembles, specify the number of predictors to sample in each learner. The app chooses a random subset of the predictors for each learner. The subsets chosen by different learners are independent.
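A minimal fitensemble sketch of a boosted-tree ensemble using these settings (the specific values are illustrative, not app defaults):

```matlab
% AdaBoost ensemble of shallow trees with shrinkage, evaluated by cross-validation.
load fisheriris
shallowTree = templateTree('MaxNumSplits', 20);      % depth of each tree learner
ens = fitensemble(meas, species, 'AdaBoostM2', 100, shallowTree, ...
    'LearnRate', 0.1);                               % 100 learners, learning rate 0.1
cv = crossval(ens, 'KFold', 5);
fprintf('5-fold classification loss: %.3f\n', kfoldLoss(cv));
```

AdaBoostM2 is the multiclass variant of AdaBoost; for 2-class data you would use 'AdaBoostM1' instead.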
For next steps in training models, see Explore Classification Models Interactively.
Related Examples

• Explore Classification Models Interactively
• Select Data and Validation for Classification Problem
• Feature Selection and Feature Transformation
• Assess Classifier Performance
• Export Classification Model to Predict New Data
• Explore Decision Trees Interactively