ML model types for categorization
These families of ML models can be used in categorization experiments:
- Margin classifiers
- Decision tree ensembles
- Linear
- Naive Bayes
Margin classifiers
Margin classifiers are based on the concept of margin, that is, the distance between data points and a decision boundary. Support Vector Machine (SVM) models, for example, separate data points into groups with a line or, in n-dimensional space, a hyperplane chosen to keep that margin as large as possible.
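To make the notion of margin concrete, here is a minimal sketch that computes the signed distance between a few points and a linear decision boundary. The boundary coefficients, the sample points, and the function name `signed_margin` are hypothetical and chosen only for illustration; this is not Platform code.

```python
import math

def signed_margin(weights, bias, point, label):
    """Signed distance from `point` to the hyperplane w.x + b = 0.

    `label` is +1 or -1. The result is positive when the point lies on the
    side of the boundary that matches its label, negative otherwise.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return label * score / norm

# Hypothetical 2-D boundary: x1 + x2 - 3 = 0
weights, bias = [1.0, 1.0], -3.0
samples = [([3.0, 2.0], +1), ([0.0, 1.0], -1), ([2.0, 2.0], -1)]
margins = [signed_margin(weights, bias, p, y) for p, y in samples]
```

The first two samples get a positive margin, while the third gets a negative one because it lies on the wrong side of the boundary. A margin classifier looks for the boundary that keeps these distances large for the training samples.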
Margin classifiers available in Platform are:
Note
Passive aggressive and SGD are the only models available for Online-ML experiments.
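To illustrate why this kind of model suits online learning, the sketch below applies the basic passive-aggressive update rule, as described in the literature, to one sample at a time. It is an assumption-laden illustration, not Platform's implementation: the function name `pa_update` and the sample values are hypothetical, and labels are assumed to be +1 or -1.

```python
def pa_update(weights, x, y):
    """One passive-aggressive step for a single sample (x, y), y in {+1, -1}.

    Passive: if the sample is already classified with margin >= 1, the
    weights are returned unchanged. Aggressive: otherwise the weights move
    just enough for the sample to reach a margin of 1.
    """
    score = sum(w * xi for w, xi in zip(weights, x))
    loss = max(0.0, 1.0 - y * score)          # hinge loss
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(weights)
    tau = loss / sq_norm                       # step size
    return [w + tau * y * xi for w, xi in zip(weights, x)]

# Samples arrive one at a time; no full retraining is needed.
weights = [0.0, 0.0]
stream = [([1.0, 2.0], +1), ([2.0, -1.0], -1)]
for x, y in stream:
    weights = pa_update(weights, x, y)
```

Each update uses only the current sample and the current weights, which is what allows the model to learn incrementally from a stream of data.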
Decision tree ensembles
Decision tree ensembles are based on a collection of decision trees (typically if-then branches) and infer decision rules from the data features of the training samples.
These models are an efficient way to combine the predictions of different estimators built with a given learning algorithm.
The decision tree estimators of an ensemble can be generated following two different approaches:
- Bagging: trains multiple decision trees independently, each on a random sample of the training data drawn with replacement, and combines their predictions.
- Boosting: starts with one single estimator and iteratively adds more estimators to boost accuracy and reduce the remaining errors.
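The bagging approach can be sketched as follows. To keep the example short, each "tree" is reduced to a one-level decision stump on a single numeric feature, and predictions are combined by majority vote. The data set and all function names are hypothetical; real ensembles such as Random Forest use full trees and additional randomization.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Fit a threshold rule: predict 1 if x >= threshold, else 0."""
    best_threshold, best_errors = None, None
    for threshold, _ in samples:
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(samples, n_estimators, rng):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    stumps = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(samples) for _ in samples]
        stumps.append(fit_stump(bootstrap))
    return stumps

def bagging_predict(stumps, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(int(x >= threshold) for threshold in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical data: (feature value, class label)
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
stumps = bagging_fit(data, n_estimators=25, rng=random.Random(0))
predictions = [bagging_predict(stumps, x) for x in (0.0, 5.0)]
```

Boosting differs in that the estimators are not trained independently: each new estimator is fitted to correct the errors left by the ones already in the ensemble.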
The available model types based on decision trees are:
- Random Forest, using bagging
- Gradient Boosting (Gboost), using boosting
- Extreme Gradient Boosting (XGBoost), using boosting
Linear
Linear models determine functions that describe the relationship between a dependent variable and one or more independent variables.
The linear model available is Logistic Regression.
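As a minimal sketch of how a logistic regression model turns a linear function of the features into a class probability, consider the following. The weights, bias, and feature values are hypothetical; in practice they are learned from the training data.

```python
import math

def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample.

    A linear combination of the features is passed through the logistic
    (sigmoid) function, which maps any real number into the range (0, 1).
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, already-trained parameters for a two-feature model
weights, bias = [1.5, -2.0], 0.25
probability = predict_proba(weights, bias, [1.0, 0.5])
predicted_class = 1 if probability >= 0.5 else 0
```

Here the linear score is positive, so the probability is above 0.5 and the sample is assigned to the positive class.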
Naive Bayes
Naive Bayes algorithms are based on Bayes' theorem.
They assume that each feature contributes independently and equally to a given prediction. This "naive" assumption is rarely correct, especially in text classification (where textual features are often highly correlated), but under specific conditions it can turn out to be a reasonable approximation.
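The independence assumption is what makes the computation simple: the score of a class is its prior probability multiplied by one likelihood per feature. Below is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. The tiny training set, the labels, and the function names are hypothetical and only serve the illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents):
    """documents: list of (tokens, label) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in documents:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, token_counts, vocabulary

def predict(model, tokens):
    """Return the label with the highest log posterior score."""
    class_counts, token_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)              # log prior
        total_tokens = sum(token_counts[label].values())
        for token in tokens:
            # Each token contributes independently (the "naive" assumption);
            # add-one smoothing avoids zero probabilities for unseen tokens.
            likelihood = (token_counts[label][token] + 1) / (
                total_tokens + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    (["cheap", "pills", "offer"], "spam"),
    (["limited", "offer", "cheap"], "spam"),
    (["meeting", "agenda", "notes"], "work"),
    (["project", "meeting", "notes"], "work"),
]
model = train_naive_bayes(docs)
label = predict(model, ["cheap", "offer"])
```

Logarithms are summed instead of multiplying probabilities to avoid numerical underflow on longer texts.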
Naive Bayes models available in Platform are: