This post is about using minimax estimation for robust learning when the test data distribution is expected to differ from the training data distribution, i.e., learning that is robust to data drift.

Cost Sensitive Loss Functions

Given a training data set $latex D = \{x_i, y_i\}_{i=1,\ldots,N}$, most learning algorithms learn a classifier $latex \phi$ that is parametrized by a vector $latex w$ by minimizing a loss function of the form

$latex \min_w \sum_{i=1}^{N} l(x_i, y_i, w) + f(w)$

where $latex l(x_i, y_i, w)$ is the loss on example $latex i$ and $latex f(w)$ is some function that penalizes complexity. For example, for logistic regression (with labels $latex y_i \in \{-1, +1\}$) the loss function looks like

$latex \min_w \sum_{i=1}^{N} \log\left(1 + \exp(-y_i w^T x_i)\right) + \lambda \|w\|^2$

for some $latex \lambda > 0$.

If, in addition, the examples came with costs $latex c_i$ (that somehow specify the importance of minimizing the loss on that particular example), we can perform cost-sensitive learning by over/under-sampling the training data, or by minimizing a cost-weighted loss function (see this paper by Zadrozny et al.):

$latex \min_w \sum_{i=1}^{N} c_i \, l(x_i, y_i, w) + f(w)$

We further constrain $latex \sum_i^N c_i...
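
To make the cost-weighted objective concrete, here is a minimal sketch of cost-weighted, L2-regularized logistic regression fit by plain gradient descent. The function names (`weighted_logistic_loss`, `fit`), the step size, and the iteration count are illustrative choices, not anything prescribed by the discussion above; in practice one would reach for an off-the-shelf solver that accepts per-example weights.

```python
import numpy as np

def weighted_logistic_loss(w, X, y, c, lam):
    """Cost-weighted regularized logistic loss:
    sum_i c_i * log(1 + exp(-y_i w.x_i)) + lam * ||w||^2,
    with labels y in {-1, +1} and per-example costs c."""
    margins = y * (X @ w)
    return np.sum(c * np.log1p(np.exp(-margins))) + lam * np.dot(w, w)

def fit(X, y, c, lam=0.1, lr=0.1, steps=500):
    """Minimize the weighted loss by gradient descent (illustrative only)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        # d/dw of the weighted loss: each example contributes
        # -c_i * y_i * sigmoid(-margin_i) * x_i, plus the L2 term.
        s = -c * y / (1.0 + np.exp(margins))
        grad = X.T @ s + 2.0 * lam * w
        w -= lr * grad
    return w
```

Setting all $latex c_i = 1$ recovers the ordinary (unweighted) logistic regression objective; increasing $latex c_i$ for a given example pulls the decision boundary toward classifying that example correctly.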