Stochastic Gradient Descent (SGD)

Description

Stochastic Gradient Descent (SGD) is an optimization method, used here to train a linear classifier. This implementation is a linear SVM trained with SGD.

The gradient of the loss is estimated from one sample at a time, and the model is updated after each estimate. With every training example, the algorithm takes a small step down the cost function towards its minimum.
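The per-sample update described above can be sketched in plain Python. This is a minimal illustration, not the implementation behind this model type: it assumes a hinge loss with an L2 penalty and a fixed learning rate, and the function names, default values, and toy data are all made up for the example.

```python
import random

def sgd_linear_svm(samples, labels, alpha=0.01, learning_rate=0.1, epochs=20, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) with one update per sample.

    labels must be -1 or +1; alpha is the L2 regularization strength.
    All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Gradient of the penalty (alpha / 2) * ||w||^2 is alpha * w.
            # Subgradient of the hinge loss max(0, 1 - margin) is -y * x
            # when margin < 1, and 0 otherwise.
            for j in range(n_features):
                grad = alpha * w[j]
                if margin < 1:
                    grad -= y * x[j]
                w[j] -= learning_rate * grad
            if margin < 1:
                b += learning_rate * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Tiny linearly separable toy set (made up for illustration).
X = [[2.0, 2.0], [3.0, 1.5], [2.5, 3.0], [-2.0, -1.5], [-3.0, -2.0], [-1.5, -2.5]]
y = [1, 1, 1, -1, -1, -1]
w, b = sgd_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

Each pass through the inner loop touches a single sample, so the cost of one update depends only on the number of features, not on the size of the training set.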

SGD has been successfully applied to the large-scale, sparse machine learning problems often encountered in text classification and NLP. When the data is sparse, the classifiers in this module scale easily to problems with large training sets and a large number of features.

Properties

SGD training provides two major advantages over standard SVM training, because it updates its parameters online, one sample at a time:

Lower memory consumption.
Scalability: larger datasets can be used for training.
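Both advantages come from never needing the full dataset in memory at once. As a rough sketch, assuming scikit-learn's SGDClassifier as a stand-in (this page does not name the library behind the model type), training can proceed batch by batch with partial_fit. The batches generator and its synthetic data are hypothetical placeholders for chunks read from disk.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def batches(n_batches=50, batch_size=20):
    """Yield small synthetic batches, standing in for chunks read from disk."""
    for _ in range(n_batches):
        labels = rng.integers(0, 2, size=batch_size)
        # Class 0 is centred at (-2, -2), class 1 at (+2, +2).
        features = rng.normal(size=(batch_size, 2)) + (labels[:, None] * 4.0 - 2.0)
        yield features, labels

clf = SGDClassifier(loss="hinge", random_state=0)
for features, labels in batches():
    # The full set of classes is passed explicitly because a single
    # batch might not contain every label.
    clf.partial_fit(features, labels, classes=np.array([0, 1]))

test_points = np.array([[2.0, 2.0], [-2.0, -2.0]])
predicted = clf.predict(test_points)
```

Only one batch is held in memory at a time, so the peak memory use is set by the batch size rather than by the size of the dataset.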

Hyperparameters

The hyperparameters for this model type are:

SGD alpha regularization parameter
Class weight
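To show how these two hyperparameters might be set, here is a hedged sketch that again assumes scikit-learn's SGDClassifier as a stand-in, where they correspond to the alpha and class_weight arguments. The values and the synthetic, imbalanced data are illustrative assumptions, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Imbalanced synthetic data: 180 samples of class 0, 20 of class 1.
X = np.vstack([
    rng.normal(-1.0, 1.0, size=(180, 2)),
    rng.normal(+1.0, 1.0, size=(20, 2)),
])
y = np.array([0] * 180 + [1] * 20)

clf = SGDClassifier(
    loss="hinge",             # hinge loss gives a linear SVM
    alpha=1e-3,               # regularization strength; larger values penalize large weights more
    class_weight="balanced",  # reweight classes inversely to their frequency
    random_state=0,
)
clf.fit(X, y)
```

A larger alpha pulls the weights towards zero and yields a simpler model, while the class weight makes errors on the rare class count for more during training.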