Probability density estimation as classification

Perhaps it has always been obvious to exalted statistical minds that density estimation can be viewed as a classification problem and, perhaps, solved using a classifier.

Assume that we have samples $latex \{x_i\}_{i=1,\ldots,N}$ of a random vector $latex X$ whose distribution has bounded support. In fact, without loss of generality, let the support be the unit hypercube $latex [0,1]^d$. We are required to estimate $latex P_X(x)$, the density of $latex X$.

Now assume that we generate a bunch of samples $latex \{z_i\}_{i=1,\ldots,M}$ uniformly distributed in $latex [0,1]^d$. We assign a label $latex y = 1$ to all the samples $latex x_i$, a label $latex y = 0$ to all $latex z_i$, and build a classifier $latex \psi$ between the two sample sets. In other words, we construct an estimate $latex P_\psi(y=1|x)$ of the posterior class probability $latex P(y=1|x)$ $latex \forall x \in [0,1]^d$.

Now, we know that

$latex P(y=1|x)=\frac{N\,P_X(x)}{N\,P_X(x)+M\,U(x)}$

where $latex U(x) = 1$ is the uniform density over the unit hypercube. The above equation can be solved for $latex P_X(x)$ to obtain an estimate

$latex \hat{P}_X(x)=\frac{M}{N}\frac{P_\psi(y=1|x)}{P_\psi(y=0|x)}$
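As a concrete illustration (my own sketch, not from the post), here is the recipe with logistic regression standing in for $latex \psi$, scikit-learn as the implementation, and a Beta(2,2) target density (pdf $latex 6x(1-x)$) on $latex [0,1]$; the log features are an illustrative convenience chosen so that the true log-odds lies in the model class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Example target density on [0, 1]: Beta(2, 2), with pdf 6 x (1 - x).
N, M = 5_000, 50_000
x = rng.beta(2.0, 2.0, size=N)      # class y = 1: samples of X
z = rng.uniform(0.0, 1.0, size=M)   # class y = 0: uniform samples

# Features chosen so the true log-odds, log(6 x (1 - x)) + log(N / M),
# is linear in them (an illustrative convenience, not a requirement).
def feats(t):
    return np.column_stack([np.log(t), np.log1p(-t)])

X_train = np.vstack([feats(x), feats(z)])
y_train = np.concatenate([np.ones(N), np.zeros(M)])

# Weak regularization (large C), since we want the plain ML fit.
clf = LogisticRegression(C=1e6).fit(X_train, y_train)

# Invert the posterior: P_hat(x) = (M / N) * P(y=1|x) / P(y=0|x)
grid = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
p1 = clf.predict_proba(feats(grid))[:, 1]
p_hat = (M / N) * p1 / (1.0 - p1)

for g, est in zip(grid, p_hat):
    print(f"x = {g:.1f}: estimate {est:.3f}, true {6 * g * (1 - g):.3f}")
```

With $latex M \gg N$ the uniform samples pin down the denominator of the posterior well, and the estimates land close to the true pdf values.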

Because $latex M$ is in our control, ideally we would like to obtain

$latex \hat{P}_X(x)=\frac{1}{N}\lim_{M \rightarrow \infty}\frac{M\,P_\psi(y=1|x)}{P_\psi(y=0|x)}$

The question is: because we know the distribution of the samples for class 0 (uniform!), can the limit, for any particular classifier (say, a Gaussian process classifier or logistic regression), be computed or approximated without actually sampling and then learning?

This paper (which I haven't yet read) may be related.

Update Aug 24, 2009. The uniform distribution can be replaced by any other proposal distribution from which we can draw samples and whose support includes the support of the density we wish to estimate. George, thanks for pointing this out.
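To spell out the generalization (a sketch following the same derivation as above): with class-0 samples drawn from a proposal density $latex Q(x)$ instead of the uniform, the posterior becomes

$latex P(y=1|x)=\frac{N\,P_X(x)}{N\,P_X(x)+M\,Q(x)}$

and solving for $latex P_X(x)$ gives the estimate

$latex \hat{P}_X(x)=\frac{M}{N}\,Q(x)\,\frac{P_\psi(y=1|x)}{P_\psi(y=0|x)}$

which reduces to the earlier formula when $latex Q(x)=U(x)=1$.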
