Showing posts with the label kernel logistic regression

Sparse online kernel logistic regression

In a previous post, I talked about an idea for sparsifying kernel logistic regression by using random prototypes. I also showed how the prototypes themselves (as well as the kernel parameters) can be updated. (Update Apr 2010. Slides for a tutorial on this stuff.) (As a brief aside, I note that an essentially identical approach was used to sparsify Gaussian Process Regression by Snelson and Ghahramani. For GPR they use gradient ascent on the log-likelihood to learn the prototypes and labels, which is akin to learning the prototypes and betas for logistic regression. The set of prototypes and labels generated by their algorithm can be thought of as a pseudo training set.) I recently (with the help of my super-competent Java developer colleague Hiroko Bretz) implemented the sparse kernel logistic regression algorithm. The learning is done in an online fashion (i.e., using stochastic gradient descent). It seems to perform reasonably well on large datasets. Below I'll show its behav...
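As a rough illustration of the online learning described above, here is a minimal sketch of one stochastic gradient step for a prototype-based kernel logistic regression. It is not the Java implementation mentioned in the post: the Gaussian (RBF) kernel, the names `rbf`, `sgd_step`, `gamma`, and `lr`, and the choice to keep the prototypes and kernel width fixed (updating only the betas) are all assumptions made for brevity.

```python
import numpy as np

def rbf(x, prototypes, gamma):
    """Kernel values k(x, p) for every prototype p (assumed Gaussian kernel)."""
    d2 = np.sum((prototypes - x) ** 2, axis=1)
    return np.exp(-gamma * d2)

def sgd_step(x, y, prototypes, beta0, beta, gamma=1.0, lr=0.1):
    """One SGD step on the negative log-likelihood of a single example (x, y).

    beta0 has shape (n_classes,), beta has shape (n_classes, n_prototypes).
    Both are updated in place; prototypes and gamma are held fixed here.
    Returns the loss measured before the update.
    """
    k = rbf(x, prototypes, gamma)        # kernel features of x
    scores = beta0 + beta @ k            # one linear score per class
    scores -= scores.max()               # stabilise the softmax
    p = np.exp(scores)
    p /= p.sum()
    err = p.copy()
    err[y] -= 1.0                        # gradient of the loss w.r.t. the scores
    beta0 -= lr * err
    beta -= lr * np.outer(err, k)
    return -np.log(p[y])
```

Calling `sgd_step` once per incoming example gives the online behaviour; the per-example cost depends on the number of prototypes rather than on the size of the training set.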

An effective kernelization of logistic regression

I will present a sparse kernelization of logistic regression where the prototypes are not necessarily from the training data.

Traditional sparse kernel logistic regression

Consider an $latex M$-class logistic regression model given by $latex P(y|x)\propto\exp(\beta_{y0} + \sum_{j=1}^{d}\beta_{yj}x_j)$ for $latex y=0,1,\ldots,M$, where $latex j$ indexes the $latex d$ features. Fitting the model to a data set $latex D = \{x_i, y_i\}_{i=1,\ldots,N}$ involves estimating the betas to maximize the likelihood of $latex D$.

The above logistic regression model is quite simple (because the classifier is a linear function of the features of the example), and in some circumstances we might want a classifier that can produce a more complex decision boundary. One way to achieve this is by kernelization. We write $latex P(y|x) \propto \exp(\beta_{y0} + \sum_{i=1}^N \beta_{yi} k(x,x_i))$ for $latex y=0,1,\ldots,M$, where $latex k(\cdot,\cdot)$ is a kernel function. In order to be able to use this c...
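To make the kernelized model above concrete, the following sketch computes $latex P(y|x)$ by normalizing $latex \exp(\beta_{y0} + \sum_i \beta_{yi} k(x, x_i))$ over the classes. The Gaussian kernel, the parameter `gamma`, and the function names are assumptions chosen for illustration; the post does not commit to a particular kernel in this excerpt.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Z (assumed kernel)."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def class_probabilities(X, centers, beta0, beta, gamma=1.0):
    """P(y|x) for each row of X.

    centers: the points x_i the kernel is evaluated against, shape (n, d).
    beta0:   per-class intercepts, shape (n_classes,).
    beta:    per-class kernel weights, shape (n_classes, n).
    """
    K = rbf_kernel(X, centers, gamma)            # K[r, i] = k(x_r, x_i)
    scores = beta0[None, :] + K @ beta.T         # one score per example and class
    scores -= scores.max(axis=1, keepdims=True)  # stabilise the exponentials
    P = np.exp(scores)
    return P / P.sum(axis=1, keepdims=True)
```

With `centers` set to the full training set this is the dense model, with one weight per class per training example; passing a smaller set of points is what makes the model sparse.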