Support Vector Machines (SVM) sliding window
The principle behind this paradigm is that sequence tagging can be translated into a local classification task: the input consists of features extracted from the pivot token (the token to be labeled) and from a fixed context window around it.
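As a minimal sketch of this idea, the function below builds a feature map for a pivot token from its surrounding window. The function name, feature keys, and padding symbol are illustrative, not part of the described implementation:

```python
def window_features(tokens, i, size=2):
    """Features for pivot token i: the token itself plus its context window."""
    feats = {"pivot": tokens[i]}
    for offset in range(-size, size + 1):
        if offset == 0:
            continue
        j = i + offset
        # Positions outside the sentence are padded with a sentinel value.
        feats[f"ctx[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats

tokens = ["John", "lives", "in", "Paris", "."]
features = window_features(tokens, 3)
# e.g. {'pivot': 'Paris', 'ctx[-2]': 'lives', 'ctx[-1]': 'in',
#       'ctx[1]': '.', 'ctx[2]': '<PAD>'}
```

In practice the feature map would also include surface-form features (capitalization, digits, affixes) rather than raw tokens alone.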
A linear SVM classifier is used as the base classifier over a given window. The sequence labeling task is modeled through the BIO (BEGIN, INNER, OTHER) format, as for Conditional Random Fields (CRF). A specific non-balanced configuration of the linear SVM is used, mainly because the class distribution tends to be highly unbalanced in the entity extraction use case: the OTHER class usually dominates, and with a balanced configuration the rare classes would be predicted too often, generating many false positives.
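To make the imbalance concrete, the sketch below computes inverse-frequency class weights from a BIO label distribution; down-weighting the dominant OTHER class is one common way to realize such a non-balanced configuration. The label counts are illustrative, and the weighting formula is an assumption, not necessarily the scheme used here:

```python
from collections import Counter

# Illustrative BIO label distribution: OTHER ("O") dominates.
labels = ["O"] * 90 + ["B-PERS"] * 4 + ["I-PERS"] * 2 + ["B-LOC"] * 3 + ["I-LOC"] * 1
counts = Counter(labels)
total = len(labels)

# Inverse-frequency weights: rare classes get larger weights.
weights = {cls: total / (len(counts) * n) for cls, n in counts.items()}
# weights["O"] is far smaller than the weights of the entity classes.
```

Linear SVM implementations typically expose such per-class weights directly (for example via a class-weight parameter), so the rare entity classes can be emphasized without resampling the data.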
The prediction for token N is injected as a context feature for the token at position N+1, which adds a soft sequential constraint to the otherwise local prediction. Once the classifier has predicted a class for every window, a harmonization post-processing step is triggered: it forces the predictions to follow the BIO format (a BEGIN_CLASS label always preceding the corresponding INNER_CLASS labels).
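The harmonization step can be sketched as follows. This is a minimal illustration of one plausible repair rule (promoting an orphan INNER tag to BEGIN), not the library's actual post-processing code:

```python
def harmonize(labels):
    """Force BIO consistency: an I- tag must continue a B- or I- tag of the same class."""
    fixed = []
    prev = "O"
    for tag in labels:
        if tag.startswith("I-"):
            cls = tag[2:]
            if prev not in ("B-" + cls, "I-" + cls):
                # Orphan INNER tag: promote it to BEGIN of the same class.
                tag = "B-" + cls
        fixed.append(tag)
        prev = tag
    return fixed

print(harmonize(["O", "I-LOC", "I-LOC", "O", "I-PERS"]))
# → ['O', 'B-LOC', 'I-LOC', 'O', 'B-PERS']
```

Other repair strategies are possible (e.g. dropping the orphan tag instead of promoting it); the point is only that the final sequence is guaranteed to be well-formed BIO.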