Posts

Showing posts from October, 2010

The Cult of Universality in Statistical Learning Theory

The question is frequently raised as to why the theory and practice of machine learning are so divergent. Whereas if you glance at any article about classification, chances are that you will find symbol upon lemma & equation upon inequality, making claims about the bounds on the error rates, that should putatively guide the engineer in the solution of her problem. However, the situation seems to be that the engineer having been forewarned by her pragmatic colleagues (or having checked a few herself) that these bounds are vacuous for most realistic problems, circumvents them altogether in her search for any useful nuggets in the article. So why do these oft-ignored analyses still persist in a field that is largely comprised of engineers? From my brief survey of the literature it seems that one  (but, by no means, the only) reason is the needless preponderance of worst-case thinking . (Being a panglossian believer of the purity of science and of the intentions of its workers, I

Random Fourier Features for Kernel Density Estimation

Image
The NIPS paper Random Fourier Features for Large-scale Kernel Machines , by Rahimi and Recht presents a method for randomized feature mapping where dot products in the transformed feature space approximate (a certain class of) positive definite (p.d.) kernels in the original space. We know that for any p.d. kernel there exists a deterministic map that has the aforementioned property but it may be infinite dimensional. The paper presents results indicating that with the randomized map we  can get away with only a "small" number of features (at least for a classification setting). Before applying the method to density estimation let us review the relevant section of the paper briefly. Bochner's Theorem and Random Fourier Features Assume that we have data in $R^d$ and a continuous p.d. kernel $K(x,y)$ defined for every pair of points $x,y \in R^d$. Assume further that the kernel is shift-invariant, i.e., $K(x,y) = K(x-y) \triangleq K(\delta)$ and that the kernel is