ML model types for categorization
The ML model types that can be used in categorization projects are classified into the following groups:
Support Vector Machine
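Support Vector Machine (SVM) models separate data points into groups with a decision boundary: a line in two dimensions or, more generally, a hyperplane in n-dimensional space, chosen to leave the widest possible margin between the groups.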
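As an illustration of the idea, the following is a minimal sketch of a linear SVM trained with sub-gradient descent on the hinge loss. It is not the implementation used by any specific product or library: the function names (`train_linear_svm`, `predict`), the hyperparameter values and the toy data are all assumptions made for the example, and labels are assumed to be encoded as +1 and -1.

```python
# Minimal linear SVM sketch (hinge loss + L2 regularization).
# All names, hyperparameters and data are illustrative assumptions.

def train_linear_svm(points, labels, epochs=200, lr=0.1, reg=0.01):
    """Return (weights, bias) of a separating hyperplane w.x + b = 0.

    Labels must be +1 or -1.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin:
                # move the hyperplane away from it.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only shrink the weights,
                # which favors a wider margin.
                w = [wi - lr * reg * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Classify x by the side of the hyperplane it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Two linearly separable groups of 2-D points.
points = [(-2.0, -1.0), (-1.0, -2.0), (-1.5, -1.5),
          (2.0, 1.0), (1.0, 2.0), (1.5, 1.5)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, p) for p in points]
```

On this toy data the learned hyperplane separates the two groups, so `predictions` matches `labels`. Production SVM implementations typically use more sophisticated solvers and may support non-linear kernels.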
The available SVM model types are:
Decision trees ensemble
Decision trees ensemble models generate collections of decision trees (typically if-then branches) and infer decision rules from the data features of the training samples.
These models are an efficient way to combine the predictions of different estimators built with a given learning algorithm.
They can be used to generate multiple decision tree estimators following two different approaches:
- Bagging: the Random Forest model approach, which trains multiple decision trees on random subsets of the training data and combines their predictions.
- Boosting: the Gradient Boosting (Gboost) and Extreme Gradient Boosting (XGBoost) model approach, which starts with a single estimator and iteratively adds more estimators to boost accuracy and reduce the remaining errors.
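The difference between the two approaches can be sketched with one-split trees ("stumps") on a tiny 1-D regression problem. This is a simplified illustration under stated assumptions, not how Random Forest, Gradient Boosting or XGBoost are actually implemented: real Random Forests also randomize the features considered at each split, and real boosting libraries use deeper trees and additional regularization. All function names, parameters and data below are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D inputs.

    Returns (threshold, left_value, right_value): inputs <= threshold
    predict left_value, all others predict right_value.
    """
    mean = sum(ys) / len(ys)
    best = (float("inf"), mean, mean)      # fallback: constant prediction
    best_sse = sum((y - mean) ** 2 for y in ys)
    for t in sorted(set(xs))[:-1]:         # thresholds between distinct values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - lv) ** 2 for y in left)
               + sum((y - rv) ** 2 for y in right))
        if sse < best_sse:
            best_sse, best = sse, (t, lv, rv)
    return best

def stump_predict(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def bagging_fit(xs, ys, n_trees=10, seed=0):
    """Bagging: train each stump on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def bagging_predict(stumps, x):
    """Average the independent predictions of all stumps."""
    return sum(stump_predict(s, x) for s in stumps) / len(stumps)

def boosting_fit(xs, ys, n_trees=10, lr=0.5):
    """Boosting: each new stump is fitted to the errors still remaining."""
    base = sum(ys) / len(ys)               # start from a single constant estimator
    preds = [base] * len(xs)
    stumps, errors = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, x) for p, x in zip(preds, xs)]
        errors.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, errors

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]

bagged = bagging_fit(xs, ys)
base, boosted, errors = boosting_fit(xs, ys)
```

In the bagging half, the stumps are trained independently and their predictions are averaged. In the boosting half, the stumps are trained in sequence, and the `errors` list records how the mean squared training error shrinks as estimators are added.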
The available model types based on decision trees are:
Linear
Linear models determine functions that describe the relationship between a dependent variable and one or more independent variables.
The linear model available is Logistic Regression.
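To make the idea concrete, here is a minimal sketch of binary Logistic Regression trained with stochastic gradient descent. It only illustrates the technique and is not the implementation behind the model type listed above: the function names, the learning rate, the number of epochs and the toy data are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(rows, labels, epochs=500, lr=0.1):
    """Learn weights and bias so that sigmoid(w.x + b) approximates P(label = 1)."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y                    # gradient of the log loss w.r.t. the score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Two independent variables per sample, one binary dependent variable.
rows = [(-2.0, -1.0), (-1.0, -1.5), (-1.5, -0.5),
        (1.0, 1.5), (2.0, 1.0), (1.5, 0.5)]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(rows, labels)
probabilities = [predict_proba(w, b, x) for x in rows]
predicted = [1 if p > 0.5 else 0 for p in probabilities]
```

The model is linear because the score `w.x + b` is a weighted sum of the independent variables. The sigmoid only converts that score into a probability, which is then thresholded at 0.5 to pick a category.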
Naive Bayes
Naive Bayes models are a family of classification algorithms based on Bayes' Theorem.
Naive Bayes models assume that each feature contributes independently and equally to a given prediction. This "naive" assumption rarely holds, especially in text classification (where textual features are often highly correlated), but under specific conditions it can turn out to be a reasonable approximation.
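The independence assumption is easiest to see in code. Below is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is an illustration only: the function names, the whitespace tokenization, the category names and the sample documents are all assumptions made for the example, and words never seen in training are simply ignored.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Count documents per category and word occurrences per category."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        for word in doc.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def classify(model, doc):
    """Return the category with the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)        # prior P(category)
        for word in doc.lower().split():
            if word in vocab:
                # "Naive" step: each word contributes its own factor,
                # independently of every other word in the document.
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

docs = [
    "invoice payment overdue reminder",
    "payment received invoice attached",
    "password reset login problem",
    "cannot login account locked",
]
labels = ["billing", "billing", "support", "support"]

model = train_naive_bayes(docs, labels)
first = classify(model, "invoice payment question")
second = classify(model, "login password help")
```

Each word adds its own term to the score regardless of which other words appear in the document, which is exactly the independence assumption described above. On this toy data, `first` is `"billing"` and `second` is `"support"`.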
The available Naive Bayes model types are: