Innuo

Showing posts with the label online learning

Sparse online kernel logistic regression

December 06, 2009

In a previous post, I talked about an idea for sparsifying kernel logistic regression by using random prototypes. I also showed how the prototypes themselves (as well as the kernel parameters) can be updated. (Update Apr 2010. Slides for a tutorial on this stuff.) (As a brief aside, I note that an essentially identical approach was used to sparsify Gaussian Process Regression by Snelson and Ghahramani. For GPR they use gradient ascent on the log-likelihood to learn the prototypes and labels, which is akin to learning the prototypes and betas for logistic regression. The set of prototypes and labels generated by their algorithm can be thought of as a pseudo training set.) I recently (with the help of my super-competent Java developer colleague Hiroko Bretz) implemented the sparse kernel logistic regression algorithm. The learning is done in an online fashion (i.e., using stochastic gradient descent). It seems to perform reasonably well on large datasets. Below I'll show its behav...
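
To make the idea concrete, here is a minimal sketch of prototype-based kernel logistic regression trained by stochastic gradient descent. It is not the Java implementation mentioned above. The class name `SparseKernelLogReg`, the RBF kernel, the bandwidth `sigma`, the constant `learning_rate`, and the bias term are all assumptions made for illustration. As a simplification, only the betas and the bias are updated; the prototypes stay fixed.

```python
import math


def rbf(x, u, sigma):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, u))
    return math.exp(-d2 / (2.0 * sigma * sigma))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SparseKernelLogReg:
    """Hypothetical sketch: p(y=1|x) = sigmoid(bias + sum_j beta_j * k(x, u_j))."""

    def __init__(self, prototypes, sigma=1.0, learning_rate=0.1):
        # Prototypes u_j, e.g. a small random sample of the training inputs.
        self.prototypes = [list(u) for u in prototypes]
        self.beta = [0.0] * len(self.prototypes)
        self.bias = 0.0
        self.sigma = sigma
        self.learning_rate = learning_rate

    def _features(self, x):
        # Kernel evaluations against the prototypes only, so the cost per
        # example depends on the number of prototypes, not the dataset size.
        return [rbf(x, u, self.sigma) for u in self.prototypes]

    def predict_proba(self, x):
        k = self._features(x)
        return sigmoid(self.bias + sum(b * kj for b, kj in zip(self.beta, k)))

    def update(self, x, y):
        """One SGD step on the log-likelihood for a single example, y in {0, 1}."""
        k = self._features(x)
        p = sigmoid(self.bias + sum(b * kj for b, kj in zip(self.beta, k)))
        error = y - p  # gradient of the log-likelihood w.r.t. the linear score
        for j, kj in enumerate(k):
            self.beta[j] += self.learning_rate * error * kj
        self.bias += self.learning_rate * error
```

Prototypes could be drawn with something like `random.Random(0).sample(training_inputs, m)`. Updating them as well would need the gradient of the kernel with respect to each prototype, which this sketch leaves out.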

Online logistic regression

August 03, 2009

I like Bob Carpenter's paper -- "Lazy sparse online logistic regression ...". In addition to being a nice overview of logistic regression, it describes online training for logistic regression by stochastic gradient descent under various parameter priors. Another cool feature is that if the feature dimensionality is large but the examples are sparse, only the parameters corresponding to the features that are non-zero (for the current example) need to be updated (this is the lazy part). It is super easy to implement (a few hundred lines in C, for an svm_light-like stand-alone application) and trains very fast, as attested to by Leon Bottou. There is one issue about the regularization discount in a truly online setting where there is no "end of epoch", which was discussed by Carpenter. He suggests leaving it at a constant, which, as he points out, corresponds to steadily decreasing the variance of the prior with the number of examples. In my implementation I u...
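
The lazy update described above can be sketched as follows. This is an illustrative Python sketch, not the C implementation mentioned in the post. It assumes a Gaussian (L2) prior, a constant learning rate, and examples stored as `{feature: value}` dictionaries; the class name `LazySparseLogReg` and its parameters are hypothetical. Each weight records the step at which it was last regularized, and the skipped shrinkage is applied in one go the next time the feature appears.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LazySparseLogReg:
    """Hypothetical sketch of lazily regularized online logistic regression."""

    def __init__(self, learning_rate=0.1, l2=0.001):
        self.learning_rate = learning_rate
        self.l2 = l2                         # strength of the assumed Gaussian prior
        self.w = defaultdict(float)          # sparse weight vector
        self.last_seen = defaultdict(int)    # steps of shrinkage already applied
        self.t = 0                           # number of examples processed

    def _catch_up(self, j):
        # Apply all the L2 shrinkage steps that were skipped while
        # feature j was absent from the examples.
        skipped = self.t - self.last_seen[j]
        if skipped > 0:
            self.w[j] *= (1.0 - self.learning_rate * self.l2) ** skipped
        self.last_seen[j] = self.t

    def predict_proba(self, x):
        # Note: weights of features not seen recently may be stale here;
        # update() refreshes them before using them.
        z = sum(self.w.get(j, 0.0) * v for j, v in x.items())
        return sigmoid(z)

    def update(self, x, y):
        """One SGD step for a sparse example x and a label y in {0, 1}."""
        for j in x:
            self._catch_up(j)
        error = y - self.predict_proba(x)
        # Only the weights of features present in x are touched.
        for j, v in x.items():
            self.w[j] += self.learning_rate * error * v
        self.t += 1
```

The work per example is proportional to the number of non-zero features, regardless of the total dimensionality. How the regularization strength should be scaled when there is no epoch boundary is exactly the open question raised above, and this sketch simply keeps it constant.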