The redundancy of view-redundancy for co-training
Blum and Mitchell's co-training is a (very deservedly) popular semi-supervised learning algorithm that relies on class-conditional feature independence and view-redundancy (or view-agreement). I will argue that the view-redundancy assumption is unnecessary, and along the way show how surrogate learning can be plugged into co-training (which is not all that surprising, considering that both are multi-view semi-supervised algorithms that rely on class-conditional view-independence). I'll first explain co-training with an example.

Co-training - The setup

Consider a $latex y \in \{0,1\}$ classification problem on the feature space $latex \mathcal{X}=\mathcal{X}_1 \times \mathcal{X}_2$. I.e., a feature vector $latex x$ can be split into two as $latex x = [x_1, x_2]$. We make the rather restrictive assumption that $latex x_1$ and $latex x_2$ are class-conditionally independent for both classes. I.e., $latex P(x_1, x_2|y) = P(x_1|y) P(x_2|y)$ for $latex y \in \{0,1\}$.
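
To make the setup concrete, here is a minimal sketch (not from the original post; the Gaussian means and class prior are made-up illustrative values) of generating data that satisfies the class-conditional view-independence assumption: given the label $latex y$, the two views are drawn independently of each other.

```python
# A minimal sketch of the co-training setup: two views x1, x2 that are
# independent given the class y, i.e. P(x1, x2 | y) = P(x1 | y) P(x2 | y).
# The specific distributions below are illustrative assumptions, not from the post.
import numpy as np

rng = np.random.default_rng(0)

def sample_two_view_data(n, p_y1=0.5):
    """Draw n examples with class-conditionally independent views."""
    y = (rng.random(n) < p_y1).astype(int)
    # View 1: its distribution depends only on y.
    x1 = rng.normal(loc=np.where(y == 1, 1.0, -1.0), scale=1.0)
    # View 2: also depends only on y, and is drawn independently of x1,
    # so the factorization P(x1, x2 | y) = P(x1 | y) P(x2 | y) holds by construction.
    x2 = rng.normal(loc=np.where(y == 1, 2.0, 0.0), scale=1.0)
    return np.column_stack([x1, x2]), y

X, y = sample_two_view_data(1000)
```

Each row of `X` is a feature vector $latex x = [x_1, x_2]$; a co-training-style algorithm would train one classifier per view (one on the `x1` column, one on the `x2` column).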