Probability density estimation as classification
Perhaps it has always been obvious to exalted statistical minds that density estimation can be viewed as a classification problem and (perhaps) carried out using a classifier.
Assume that we have samples $latex \{x_i\}_{i=1,\ldots,N}$ of a random vector $latex X$ whose distribution has bounded support. In fact, without loss of generality, let the support be the unit hypercube $latex [0,1]^d$. We are required to estimate $latex P_X(x)$, the density of $latex X$.
Now assume that we generate a bunch of samples $latex \{z_i\}_{i=1,\ldots,M}$ uniformly distributed in $latex [0,1]^d$. We assign a label $latex y = 1$ to all the samples $latex x_i$, a label $latex y = 0$ to all the $latex z_i$, and build a classifier $latex \psi$ between the two sample sets. In other words, we construct an estimate $latex P_\psi(y=1|x)$ of the posterior class probability $latex P(y=1|x)$ $latex \forall x \in [0,1]^d$.
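As a concrete illustration, here is a minimal sketch in Python of the construction above; the choice of scikit-learn's logistic regression as $latex \psi$, and the Beta-distributed stand-in for the unknown $latex X$, are my own assumptions for demonstration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, N, M = 2, 1000, 5000

# Stand-in for the samples x_i of the unknown density P_X; a Beta
# distribution on [0,1]^d is used purely so the sketch runs end to end.
x = rng.beta(2.0, 5.0, size=(N, d))

# z_i: M samples drawn uniformly from the unit hypercube [0,1]^d.
z = rng.uniform(0.0, 1.0, size=(M, d))

# Label the x_i with y=1 and the z_i with y=0, then fit the
# classifier psi on the pooled data.
X_pool = np.vstack([x, z])
y_pool = np.concatenate([np.ones(N), np.zeros(M)])
psi = LogisticRegression().fit(X_pool, y_pool)
```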
Now, we know that

$latex P_\psi(y=1|x) \approx P(y=1|x) = \frac{N\,P_X(x)}{N\,P_X(x) + M\,U(x)}$

where $latex U(x) = 1$ is the uniform density over the unit hypercube. The above equation can be solved for $latex P_X(x)$ to obtain the estimate
$latex \hat{P}_X(x)=\frac{M}{N}\frac{P_\psi(y=1|x)}{P_\psi(y=0|x)}$
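Continuing the sketch, the estimate is just this reweighted posterior ratio read off the fitted classifier (the epsilon guard against division by zero is my addition):

```python
def density_estimate(psi, query, N, M, eps=1e-12):
    """Evaluate (M/N) * P_psi(y=1|x) / P_psi(y=0|x) at the query points."""
    proba = psi.predict_proba(query)
    p0, p1 = proba[:, 0], proba[:, 1]
    return (M / N) * p1 / np.maximum(p0, eps)

# Evaluate the estimate at a few points in [0,1]^d.
query = rng.uniform(0.0, 1.0, size=(5, d))
print(density_estimate(psi, query, N, M))
```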
Because $latex M$ is under our control, ideally we would like to obtain
$latex \hat{P}_X(x)=\frac{1}{N} \lim_{M \rightarrow \infty} \frac{M \, P_\psi(y=1|x)}{P_\psi(y=0|x)}$
The question is: since we know the distribution of the class-0 samples (uniform!), can this limit be computed or approximated for a particular classifier (say, a Gaussian process classifier or logistic regression) without actually sampling and then learning?
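I don't know of a closed form, but the limit is easy to probe empirically by growing $latex M$ and refitting; a crude sketch along those lines:

```python
# Crude empirical probe of the limit: the estimate (M/N) * p1/p0
# should stabilize as M grows, since the class-0 sample increasingly
# resembles the uniform density it is drawn from.
for M_big in (10**3, 10**4, 10**5):
    z_big = rng.uniform(0.0, 1.0, size=(M_big, d))
    pool = np.vstack([x, z_big])
    labels = np.concatenate([np.ones(N), np.zeros(M_big)])
    psi_big = LogisticRegression(max_iter=1000).fit(pool, labels)
    proba = psi_big.predict_proba(query)
    print(M_big, (M_big / N) * proba[:, 1] / proba[:, 0])
```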
This paper (which I haven't yet read) may be related.
Update Aug 24, 2009. The uniform distribution can be substituted by any other proposal distribution from which we can draw samples and whose support includes the support of the density we wish to estimate. George, thanks for pointing this out.
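Concretely, if the proposal has density $latex q(x)$, the factor $latex U(x)=1$ in the derivation becomes $latex q(x)$, giving $latex \hat{P}_X(x)=\frac{M}{N}\,q(x)\,\frac{P_\psi(y=1|x)}{P_\psi(y=0|x)}$. A sketch with a Gaussian proposal (the particular proposal is my choice, for illustration only):

```python
from scipy.stats import multivariate_normal

# A Gaussian proposal; its support (all of R^d) covers [0,1]^d.
q = multivariate_normal(mean=0.5 * np.ones(d), cov=0.25 * np.eye(d))
z_q = q.rvs(size=M, random_state=1).reshape(M, d)

pool = np.vstack([x, z_q])
labels = np.concatenate([np.ones(N), np.zeros(M)])
psi_q = LogisticRegression().fit(pool, labels)

# The uniform factor U(x)=1 is replaced by q(x) for a general proposal.
proba = psi_q.predict_proba(query)
print((M / N) * q.pdf(query) * proba[:, 1] / np.maximum(proba[:, 0], 1e-12))
```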