
Showing posts with the label Semi-supervised learning

The redundancy of view-redundancy for co-training

Blum and Mitchell's co-training is a (very deservedly) popular semi-supervised learning algorithm that relies on class-conditional feature independence and view-redundancy (or view-agreement). I will argue that the view-redundancy assumption is unnecessary, and along the way show how surrogate learning can be plugged into co-training (which is not all that surprising, considering that both are multi-view semi-supervised algorithms that rely on class-conditional view-independence). I'll first explain co-training with an example.

Co-training - the setup

Consider a $latex y \in \{0,1\}$ classification problem on the feature space $latex \mathcal{X}=\mathcal{X}_1 \times \mathcal{X}_2$. I.e., a feature vector $latex x$ can be split into two parts as $latex x = [x_1, x_2]$. We make the rather restrictive assumption that $latex x_1$ and $latex x_2$ are class-conditionally independent for both classes. I.e., $latex P(x_1, x_2|y) = P(x_1|y) P(x_2|y)$ for $latex y \in ...
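To make the two-view setup concrete, here is a minimal sketch of Blum-and-Mitchell-style co-training on a split $latex x = [x_1, x_2]$: a classifier per view, each one pseudo-labeling the unlabeled pool for the other. The choice of logistic regression, the confidence threshold, and the number of rounds are illustrative assumptions, not details from the post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab,
             n_rounds=10, conf_thresh=0.95):
    """Minimal co-training sketch: two view-specific classifiers
    teach each other via confident pseudo-labels."""
    X1_lab, X2_lab = np.asarray(X1_lab), np.asarray(X2_lab)
    y_lab = np.asarray(y_lab)
    X1_unlab, X2_unlab = np.asarray(X1_unlab), np.asarray(X2_unlab)

    for _ in range(n_rounds):
        clf1 = LogisticRegression(max_iter=1000).fit(X1_lab, y_lab)
        clf2 = LogisticRegression(max_iter=1000).fit(X2_lab, y_lab)
        if len(X1_unlab) == 0:
            break
        # Each view scores the unlabeled pool; keep only confident examples.
        p1 = clf1.predict_proba(X1_unlab).max(axis=1)
        p2 = clf2.predict_proba(X2_unlab).max(axis=1)
        confident = (p1 >= conf_thresh) | (p2 >= conf_thresh)
        if not confident.any():
            break
        # Pseudo-label each confident example with its more confident view.
        pseudo = np.where(p1 >= p2,
                          clf1.predict(X1_unlab),
                          clf2.predict(X2_unlab))
        # Move the confident examples into the labeled set and repeat.
        X1_lab = np.vstack([X1_lab, X1_unlab[confident]])
        X2_lab = np.vstack([X2_lab, X2_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo[confident]])
        X1_unlab, X2_unlab = X1_unlab[~confident], X2_unlab[~confident]
    return clf1, clf2
```

At test time one can combine the two view classifiers however one likes (e.g., average their predicted probabilities); the class-conditional independence assumption is what justifies letting each view teach the other.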

A surrogate learning mystery

I'll present an application of the surrogate learning idea in the previous post. It is mildly surprising at first blush, which I'll contrive to make more mysterious.

Murders she induced

For readers who appreciate this sort of thing, here's the grandmother of all murder mysteries. In this one Miss Marple solves a whole bunch of unrelated murders all at once. Miss Marple, having finally grown wise to the unreliability of feminine intuition in solving murder mysteries, spent some time learning statistics. She then convinced one of her flat-footed friends at Scotland Yard to give her the files on all the unsolved murders that were just sitting around gathering dust and waiting for someone with her imagination and statistical wiles. She came home with the massive pile of papers and sat down to study them. The file on each murder carefully listed all the suspects, with their possible motives, their accessibility to the murder weapon, psychological characteristics, previous ...

Surrogate learning with mean independence

In this paper we showed that if we have a feature $latex x_1$ that is class-conditionally statistically independent of the rest of the features, denoted $latex x_2$, learning a classifier between the two classes $latex y=0$ and $latex y=1$ can be transformed into learning a predictor of $latex x_1$ from $latex x_2$ and another of $latex y$ from $latex x_1$. Since the first predictor can be learned on unlabeled examples and the second is a classifier on a 1-D space, the learning problem becomes easy. In a sense, $latex x_1$ acts as a surrogate for $latex y$. Similar ideas can be found in Ando and Zhang '07, Quadrianto et al. '08, Blitzer et al. '06, and others.

Derivation from mean-independence

I'll now derive a similar surrogate learning algorithm from mean independence rather than full statistical independence. Recall that the random variable $latex U$ is mean-independent of the r.v. $latex V$ if $latex E[U|V] = E[U]$. Albeit weaker than full independence, mean...
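As a rough sketch of how the two predictors above can be combined when $latex x_1$ is a single real-valued feature: class-conditional mean-independence gives the identity $latex E[x_1|x_2] = E[x_1|y=0]P(y=0|x_2) + E[x_1|y=1]P(y=1|x_2)$, which can be inverted for $latex P(y=1|x_2)$. The linear regressor and the clipping below are my illustrative choices, not details from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def surrogate_posterior(X2_unlab, x1_unlab, x1_lab, y_lab, X2_test):
    """Sketch: estimate P(y=1|x2) using x1 as a surrogate for y.

    Assumes x1 is class-conditionally mean-independent of x2, i.e.
    E[x1 | x2, y] = E[x1 | y], so that
    E[x1 | x2] = E[x1|y=0] * P(y=0|x2) + E[x1|y=1] * P(y=1|x2).
    """
    x1_lab, y_lab = np.asarray(x1_lab), np.asarray(y_lab)

    # Predictor of the surrogate x1 from x2, fit on unlabeled data alone.
    reg = LinearRegression().fit(X2_unlab, x1_unlab)
    e_x1_given_x2 = reg.predict(X2_test)

    # Class-conditional means of x1, estimated from the labeled examples.
    m0 = x1_lab[y_lab == 0].mean()
    m1 = x1_lab[y_lab == 1].mean()

    # Invert the mixture identity above to recover the posterior.
    post = (e_x1_given_x2 - m0) / (m1 - m0)
    return np.clip(post, 0.0, 1.0)
```

Note that only the two class-conditional means of $latex x_1$ need labeled examples; the regressor of $latex x_1$ on $latex x_2$ is fit entirely on unlabeled data, which is where the labeled-data savings come from.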