# ML model types for categorization

These families of ML models can be used in categorization experiments:

- Margin classifiers
- Decision tree ensembles
- Linear
- Naive Bayes

## Margin classifiers

Margin classifiers are based on the concept of margin, the distance between data points and a decision boundary. Support Vector Machine (SVM) models, for example, separate data points into groups by finding the line (or, in higher dimensions, the hyperplane) that best divides the classes.
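As a minimal sketch of the idea, assuming the scikit-learn `SVC` estimator and invented toy data (the document does not specify which library Platform uses internally):

```python
# Illustrative margin classifier: a linear SVM separating two toy clusters.
from sklearn.svm import SVC

# Two well-separated clusters in 2-D (invented data).
X = [[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel fits the separating line that maximizes the margin.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# decision_function gives each point's signed distance to the boundary;
# the sign of that distance determines the predicted class.
print(clf.decision_function([[0.5, 0.5], [4.5, 4.5]]))
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))
```

One point near each cluster is classified according to which side of the learned boundary it falls on.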

Margin classifiers available in Platform are:

> **Note**
>
> Passive Aggressive and SGD are the only models available for Online-ML experiments.

## Decision tree ensembles

Decision tree ensembles are based on collections of decision trees (essentially chains of if-then branches) that infer decision rules from the features of the training samples.

These models are an efficient way to combine the predictions of different estimators built with a given learning algorithm.

They can be used to generate multiple decision tree estimators following two different approaches:

- Bagging: generates multiple decision trees in parallel, each trained on a random sample of the data, and combines their predictions by voting or averaging.
- Boosting: starts with a single estimator and iteratively adds estimators, each trained to correct the remaining errors of the ones before it.

The available model types based on decision trees are:

- Random Forest, using bagging
- Gradient Boosting (Gboost), using boosting
- Extreme Gradient Boosting (XGBoost), using boosting
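As an illustration of the two approaches, assuming the scikit-learn estimators of the same names and invented toy data (not necessarily what Platform runs internally):

```python
# A bagging ensemble (Random Forest) and a boosting ensemble
# (Gradient Boosting) trained on the same invented toy data.
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X = [[0, 0], [1, 1], [0, 1], [8, 8], [9, 8], [8, 9]]
y = [0, 0, 0, 1, 1, 1]

# Bagging: each of the 10 trees sees a random sample of the data,
# and the forest combines their votes.
bagging = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Boosting: the 10 trees are built sequentially, each correcting
# the errors left by the previous ones.
boosting = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X, y)

print(bagging.predict([[0.5, 0.5], [8.5, 8.5]]))
print(boosting.predict([[0.5, 0.5], [8.5, 8.5]]))
```

On this separable example both ensembles classify the two query points correctly; the practical differences (robustness to noise, training cost, sensitivity to hyperparameters) only show up on realistic data.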

## Linear

Linear models determine functions that describe the relationship between a dependent variable and one or more independent variables.

The linear model available is Logistic Regression.
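A minimal sketch of the idea behind Logistic Regression, assuming the scikit-learn estimator of the same name and invented data: it passes a linear combination of the features through a logistic function to produce a class probability.

```python
# Logistic Regression on a one-feature toy problem (invented data).
from sklearn.linear_model import LogisticRegression

X = [[0], [1], [2], [8], [9], [10]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression().fit(X, y)

# Hard class labels for two clear-cut points.
print(clf.predict([[1], [9]]))

# Probabilities for an ambiguous point between the two groups;
# the two class probabilities always sum to 1.
print(clf.predict_proba([[5]]))
```

The probabilistic output is what distinguishes it from a pure margin classifier: besides the predicted category, it quantifies how confident that prediction is.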

## Naive Bayes

Naive Bayes algorithms are based on Bayes' theorem.

They assume that the contribution of each feature to a given prediction is independent of the others and equally important. This "naive" assumption is rarely true, especially in text classification (where the occurrences of different words are highly correlated), but in specific conditions it can turn out to be a reasonable approximation.
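A sketch of Naive Bayes applied to text categorization, assuming the scikit-learn `CountVectorizer` and `MultinomialNB` classes; the categories and documents are invented for illustration:

```python
# Naive Bayes text categorization on an invented two-category corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = [
    "stock market shares trading",
    "market prices and shares fall",
    "football match goal score",
    "the team won the match",
]
labels = ["finance", "finance", "sport", "sport"]

# Bag-of-words counts: each word is an independent feature,
# exactly the "naive" assumption described above.
vec = CountVectorizer()
X = vec.fit_transform(docs)

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["shares and market prices"])))
```

Even though word occurrences are clearly correlated, counting them independently is often enough to separate broad categories like these.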

Naive Bayes models available in Platform are: