Complement Naive Bayes

Complement Naive Bayes (CNB) is an adaptation of the standard Multinomial Naive Bayes (MNB) algorithm that is particularly suited for imbalanced data sets.

Specifically, CNB computes the model weights for each class from the complement of that class: the parameters for a class are estimated from the training data of all other classes, not from the class itself.
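
The complement estimate can be illustrated with a minimal sketch in plain Python. It follows the formulation commonly given for CNB (smoothed complement term frequencies, log weights, prediction by the smallest complement score) and is not the implementation described on this page. The function names `complement_weights` and `predict`, the smoothing value `alpha`, and the toy data are assumptions chosen for illustration.

```python
import math


def complement_weights(docs, labels, alpha=1.0):
    """Estimate per-class weights from the complement of each class.

    docs   : list of term-count vectors (one list of counts per document)
    labels : list of class labels, aligned with docs
    alpha  : additive smoothing value (assumed; illustrative default)
    """
    n_features = len(docs[0])
    weights = {}
    for c in sorted(set(labels)):
        # Sum term counts over every document that is NOT in class c.
        comp = [0.0] * n_features
        for counts, y in zip(docs, labels):
            if y != c:
                for i, n in enumerate(counts):
                    comp[i] += n
        total = sum(comp) + alpha * n_features
        # Log of the smoothed complement term frequency.
        weights[c] = [math.log((comp[i] + alpha) / total)
                      for i in range(n_features)]
    return weights


def predict(weights, counts):
    """Pick the class whose complement matches the document worst."""
    def score(c):
        return sum(n * w for n, w in zip(counts, weights[c]))
    return min(weights, key=score)


# Toy data: two terms, one document of class "a", two of class "b".
docs = [[2, 0], [0, 3], [0, 1]]
labels = ["a", "b", "b"]
w = complement_weights(docs, labels)
print(predict(w, [1, 0]))
```

Because each class's weights are built from all the other classes, a rare class still gets its estimate from a large pool of counts, which is one way to read the claim that CNB suits imbalanced data.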

The inventors of CNB showed empirically that the parameter estimates for CNB are more stable than those for MNB. CNB regularly outperforms MNB, often by a considerable margin, on text classification tasks.
Training is typically very fast, and the model can reach relatively good predictive performance even when the training set is small (dozens of samples per class). It can also be more robust than MNB when classes are not evenly balanced or, thanks to the Normalize parameter, when training documents have different lengths.
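
As a rough illustration of what a normalization step can look like, the sketch below rescales each class's weight vector to unit L1 norm, as in the weight normalization proposed for CNB. That the Normalize parameter does exactly this is an assumption; the text above does not say how it works. The function name `normalize_weights` and the input layout (a dict mapping each class to its list of weights) are hypothetical.

```python
def normalize_weights(weights):
    """Scale each class's weight vector so its absolute values sum to 1.

    weights : dict mapping class label -> list of per-term weights
    (assumed layout; this mirrors the L1 weight normalization from the
    CNB literature, not necessarily the Normalize parameter itself)
    """
    normalized = {}
    for c, w in weights.items():
        norm = sum(abs(v) for v in w)
        # Leave an all-zero vector unchanged to avoid dividing by zero.
        normalized[c] = [v / norm for v in w] if norm > 0 else list(w)
    return normalized


print(normalize_weights({"a": [-1.0, -3.0], "b": [-2.0, -2.0]}))
```

After this step every class's weights have the same total magnitude, so no class dominates the score simply because its raw weights are larger.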