Multinomial Naive Bayes

Multinomial Naive Bayes is one of the two classic Naive Bayes variants used in text classification.

The smoothed probability P(x_i | y) of feature i appearing in a sample belonging to class y is estimated as:

(Ny_i + alpha)/(Ny + alpha * n)

where:

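Ny_i = Sum(x_i) is the number of times feature i appears in the samples of class y in the training set.
Ny = Sum(Ny_i) is the total count of all features for class y.
n is the number of features.
alpha >= 0 is the smoothing parameter. It prevents a zero probability for a feature that never occurs with class y in the training set. Setting alpha = 1 is called Laplace smoothing, and alpha < 1 is called Lidstone smoothing.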
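
The estimate above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name fit_feature_probs and the toy count vectors are hypothetical, and each sample is assumed to be a list of non-negative feature counts of the same length.

```python
def fit_feature_probs(X, y, alpha=1.0):
    """Estimate smoothed P(x_i | y) from count vectors.

    X is a list of count vectors (one per sample) and y the matching
    list of class labels. Both names are hypothetical.
    """
    n = len(X[0])  # n: number of features

    # Ny_i: per-class sum of the counts of each feature i
    counts = {}
    for row, label in zip(X, y):
        acc = counts.setdefault(label, [0] * n)
        for i, c in enumerate(row):
            acc[i] += c

    probs = {}
    for label, ny_i in counts.items():
        ny = sum(ny_i)  # Ny: total count of all features for this class
        probs[label] = [(c + alpha) / (ny + alpha * n) for c in ny_i]
    return probs


# Toy data (hypothetical): 3 features, 2 classes
X = [[2, 1, 0], [1, 0, 0], [0, 1, 3]]
y = ["spam", "spam", "ham"]
probs = fit_feature_probs(X, y, alpha=1.0)
```

With alpha = 1, the third feature never occurs in the "spam" samples, yet its estimated probability is 1/7 rather than 0. For each class, the estimates sum to 1 over all features.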

Training is typically very fast, and the model can give relatively good predictions when:

The training set is relatively small (dozens of samples per class).
Training and test data are internally well balanced (the different classes are equally represented in the data distribution).
The dataset consists mainly of documents of similar length.