Probability density estimation as classification

Perhaps it has always been obvious to exalted statistical minds that density estimation can be viewed as a classification problem and, perhaps, solved using a classifier.

Assume that we have samples $latex \{x_i\}_{i=1,\ldots,N}$ of a random vector $latex X$ whose distribution has bounded support. In fact, without loss of generality, let the support be the unit hypercube $latex [0,1]^d$. We are required to estimate $latex P_X(x)$, the density of $latex X$.

Now assume that we generate a bunch of samples $latex \{z_i\}_{i=1,\ldots,M}$ uniformly distributed in $latex [0,1]^d$. We assign a label $latex y = 1$ to all the samples $latex x_i$, a label $latex y = 0$ to all $latex z_i$, and build a classifier $latex \psi$ between the two sample sets. In other words, we construct an estimate $latex P_\psi(y=1|x)$ of the posterior class probability $latex P(y=1|x)$ $latex \forall x \in [0,1]^d$.

Now, we know that

$latex P(y=1|x)=\frac{N\,P_X(x)}{N\,P_X(x)+M\,U(x)}$

where $latex U(x) = 1$ is the uniform density over the unit hypercube. The above equation can be solved for $latex P_X(x)$ to obtain an estimate

$latex \hat{P}_X(x)=\frac{M}{N}\frac{P_\psi(y=1|x)}{P_\psi(y=0|x)}$
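As a concrete illustration (my own sketch, not from the post), here is the recipe with logistic regression standing in for $latex \psi$, scikit-learn as the implementation, and a Beta(2,2) target density (pdf $latex 6x(1-x)$) on $latex [0,1]$; the log features are an illustrative convenience chosen so that the true log-odds lies in the model class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Example target density on [0, 1]: Beta(2, 2), with pdf 6 x (1 - x).
N, M = 5_000, 50_000
x = rng.beta(2.0, 2.0, size=N)      # class y = 1: samples of X
z = rng.uniform(0.0, 1.0, size=M)   # class y = 0: uniform samples

# Features chosen so the true log-odds, log(6 x (1 - x)) + log(N / M),
# is linear in them (an illustrative convenience, not a requirement).
def feats(t):
    return np.column_stack([np.log(t), np.log1p(-t)])

X_train = np.vstack([feats(x), feats(z)])
y_train = np.concatenate([np.ones(N), np.zeros(M)])

# Weak regularization (large C), since we want the plain ML fit.
clf = LogisticRegression(C=1e6).fit(X_train, y_train)

# Invert the posterior: P_hat(x) = (M / N) * P(y=1|x) / P(y=0|x)
grid = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
p1 = clf.predict_proba(feats(grid))[:, 1]
p_hat = (M / N) * p1 / (1.0 - p1)

for g, est in zip(grid, p_hat):
    print(f"x = {g:.1f}: estimate {est:.3f}, true {6 * g * (1 - g):.3f}")
```

With $latex M \gg N$ the uniform samples pin down the denominator of the posterior well, and the estimates land close to the true pdf values.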

Because $latex M$ is in our control, ideally we would like to obtain

$latex \hat{P}_X(x)=\frac{1}{N}\lim_{M \rightarrow \infty}\frac{M\,P_\psi(y=1|x)}{P_\psi(y=0|x)}$

The question is: because we know the distribution of the samples for class 0 (uniform!), can the limit, for any particular classifier (say, a Gaussian process classifier or logistic regression), be computed or approximated without actually sampling and then learning?

This paper (which I haven't yet read) may be related.

Update Aug 24, 2009. The uniform distribution can be replaced by any other proposal distribution from which we can draw samples and whose support includes the support of the density we wish to estimate. George, thanks for pointing this out.
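To spell out the generalization (a sketch following the same derivation as above): with class-0 samples drawn from a proposal density $latex Q(x)$ instead of the uniform, the posterior becomes

$latex P(y=1|x)=\frac{N\,P_X(x)}{N\,P_X(x)+M\,Q(x)}$

and solving for $latex P_X(x)$ gives the estimate

$latex \hat{P}_X(x)=\frac{M}{N}\,Q(x)\,\frac{P_\psi(y=1|x)}{P_\psi(y=0|x)}$

which reduces to the earlier formula when $latex Q(x)=U(x)=1$.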
