ML model types for categorization

These families of ML models can be used in categorization experiments:

Margin classifiers
Decision tree ensembles
Linear
Naive Bayes

Margin classifiers

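Margin classifiers are based on the concept of margin, that is, the distance between data points and a decision boundary. Support Vector Machine (SVM) models, for example, separate data points into groups with a line or, in n-dimensional space, a hyperplane, chosen so that the margin between the boundary and the nearest points of each group is as large as possible.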
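The idea of margin can be illustrated with a minimal sketch that measures the signed distance of a point from a linear decision boundary and assigns the class by the side the point falls on. The boundary coefficients and the sample points are hypothetical values chosen for illustration; this is not how Platform trains or applies its models.

```python
import math

def signed_distance(weights, bias, point):
    """Signed distance from `point` to the boundary weights . x + bias = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return score / norm

def classify(weights, bias, point):
    """Assign the class by the side of the boundary the point falls on."""
    return 1 if signed_distance(weights, bias, point) >= 0 else -1

# Hypothetical 2-D boundary: x1 + x2 - 1 = 0
weights, bias = [1.0, 1.0], -1.0
points = [(2.0, 2.0), (0.0, 0.0), (0.5, 0.5)]

distances = [signed_distance(weights, bias, p) for p in points]
labels = [classify(weights, bias, p) for p in points]
```

The larger the absolute distance, the further the point lies from the boundary. A point with distance 0 lies exactly on the boundary.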

Margin classifiers available in Platform are:

Note

Passive aggressive and SGD are the only models available for Online-ML experiments.

Decision tree ensembles

Decision tree ensembles are based on a collection of decision trees (typically structured as if-then branches) that infer decision rules from the data features of the training samples.

These models are an efficient way to combine the predictions of different estimators built with a given learning algorithm.

They can be used to generate multiple decision tree estimators following two different approaches:

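Bagging: trains multiple decision trees independently, each on a random sample of the training data (typically drawn with replacement), and combines their predictions, for example by majority vote.
Boosting: starts with a single estimator and iteratively adds more estimators, each one trained to reduce the errors left by the previous ones, to boost accuracy.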
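The difference between the two approaches can be sketched in a few lines of standard-library Python. This is a toy illustration under stated assumptions: the helper names (bootstrap_sample, majority_vote, fit_stump, boost) are hypothetical, the "trees" are one-split stumps, and the boosting part fits a small 1-D numeric target for brevity. It does not reproduce the Random Forest, Gradient Boosting or XGBoost implementations.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Bagging step: draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(labels):
    """Bagging aggregation: the most common label among the trees wins."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(xs, targets):
    """Fit a one-split stump: pick the threshold with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right) if right else left_mean
        error = sum((t - (left_mean if x <= threshold else right_mean)) ** 2
                    for x, t in zip(xs, targets))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, rounds, learning_rate=0.5):
    """Boosting: each new stump is fitted to the errors left by the previous ones."""
    stumps = []
    predictions = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: sum(learning_rate * s(x) for s in stumps)

def squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))

# Bagging: every tree would be trained on its own resampled copy of the data.
rows = [("doc1", "sports"), ("doc2", "politics"), ("doc3", "sports")]
sample = bootstrap_sample(rows, random.Random(0))
vote = majority_vote(["sports", "politics", "sports"])

# Boosting: more rounds leave a smaller remaining error on the training data.
xs, ys = [1, 2, 3, 4], [0.0, 0.0, 1.0, 1.0]
one_round = boost(xs, ys, rounds=1)
ten_rounds = boost(xs, ys, rounds=10)
```

In bagging the estimators are independent of each other and can be built in parallel. In boosting each estimator depends on the ones built before it.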

The available model types based on decision trees are:

Random Forest, using bagging
Gradient Boosting (Gboost), using boosting
Extreme Gradient Boosting (XGBoost), using boosting

Linear

Linear models determine functions that describe the relationship between a dependent variable and one or more independent variables.
The linear model available is Logistic Regression.
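As a minimal sketch of how a Logistic Regression model turns a linear function of the features into a prediction, the weighted sum of the features is passed through the logistic (sigmoid) function to obtain a probability between 0 and 1. The function name, weights and feature values below are hypothetical and chosen only for illustration.

```python
import math

def predict_probability(weights, bias, features):
    """Probability of the positive class for one sample."""
    # Linear part: weighted sum of the independent variables plus a bias term.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    # Logistic function: maps any real number to the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model with two features
weights, bias = [1.5, -2.0], 0.25

neutral = predict_probability([0.0, 0.0], 0.0, [3.0, 7.0])  # z = 0
likely = predict_probability(weights, bias, [2.0, 0.5])     # z = 2.25
unlikely = predict_probability(weights, bias, [0.0, 2.0])   # z = -3.75
```

A sample is usually assigned to the positive class when the probability exceeds a threshold such as 0.5, which corresponds to z = 0.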

Naive Bayes

Naive Bayes algorithms are based on Bayes' theorem.
They assume that the contribution of each feature to a given prediction is independent and equal. This "naive" assumption is far from correct, especially in text classification (where different textual features, such as words, are highly correlated), but in specific conditions it can be a reasonable approximation.
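The independence assumption can be made concrete with a minimal sketch of a word-count Naive Bayes classifier: the score of a category is the category's prior probability multiplied by the probability of each word taken separately, as if the words were unrelated. The training documents, category names and function names are hypothetical, and add-one smoothing is an assumption of this sketch; the Naive Bayes models available in Platform may work differently.

```python
import math
from collections import Counter, defaultdict

def train(documents):
    """documents: list of (tokens, label). Returns the counts used by predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in documents:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary

def predict(model, tokens):
    """Return the label with the highest log-probability score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_docs)  # prior of the category
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # "Naive" step: each word contributes independently of the others.
            # Add-one smoothing avoids zero probabilities for unseen words.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set
documents = [
    ("goal match team".split(), "sports"),
    ("election vote party".split(), "politics"),
    ("team win match".split(), "sports"),
]
model = train(documents)

first = predict(model, ["match", "team"])
second = predict(model, ["vote"])
```

Logarithms are summed instead of multiplying probabilities to avoid numerical underflow on long texts.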

Naive Bayes models available in Platform are: