This post is about using minimax estimation for robust learning when the test data distribution is expected to differ from the training data distribution, i.e., learning that is robust to data drift.

Cost Sensitive Loss Functions

Given a training data set $latex D = \{(x_i, y_i)\}_{i=1,\ldots,N}$, most learning algorithms learn a classifier $latex \phi$, parametrized by a vector $latex w$, by minimizing a loss function

$latex \sum_{i=1}^N l(x_i, y_i, w) + f(w)$

where $latex l(x_i, y_i, w)$ is the loss on example $latex i$ and $latex f(w)$ is some function that penalizes complexity. For example, for logistic regression with labels $latex y_i \in \{-1, +1\}$, the loss function looks like

$latex \sum_{i=1}^N \log\left(1 + \exp(-y_i w^T x_i)\right) + \lambda \|w\|^2$

for some $latex \lambda > 0$.

If, in addition, the examples come with costs $latex c_i$ (that somehow specify the importance of minimizing the loss on that particular example), we can perform cost sensitive learning by over/under-sampling the training data or by minimizing a cost-weighted loss function (see this paper by Zadrozny et al.)

$latex \sum_{i=1}^N c_i \, l(x_i, y_i, w) + f(w)$

We further constrain $latex \sum_i^N c_i...