This post is about using minimax estimation for robust learning when the test data distribution is expected to differ from the training data distribution, i.e. learning that is robust to data drift.

Cost Sensitive Loss Functions

Given a training data set $D = \{x_i, y_i\}_{i=1}^N$, most learning algorithms learn a classifier $\phi$, parametrized by a vector $w$, by minimizing a loss function of the form

$$\sum_{i=1}^N l(x_i, y_i, w) + f(w),$$

where $l(x_i, y_i, w)$ is the loss on example $i$ and $f(w)$ is some function that penalizes complexity. For example, for logistic regression the loss function looks like

$$\sum_{i=1}^N \log\left(1 + \exp(-y_i w^T x_i)\right) + \lambda \|w\|^2$$

for some $\lambda > 0$. If, in addition, the examples come with costs $c_i$ (that somehow specify the importance of minimizing the loss on that particular example), we can perform cost-sensitive learning by over/under-sampling the training data or by minimizing a cost-weighted loss function

$$\sum_{i=1}^N c_i\, l(x_i, y_i, w) + f(w)$$

(see this paper by Zadrozny et al.). We further constrain $\sum_{i=1}^N c_i$...
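As a concrete illustration of the cost-weighted loss above, here is a minimal NumPy sketch of the weighted, L2-regularized logistic loss. The function name, argument names, and the numerically stable `logaddexp` trick are my own choices for exposition, not part of the post:

```python
import numpy as np

def weighted_logistic_loss(w, X, y, c, lam):
    """Cost-weighted regularized logistic loss (illustrative sketch).

    w   : (d,) parameter vector
    X   : (N, d) feature matrix
    y   : (N,) labels in {-1, +1}
    c   : (N,) per-example costs c_i
    lam : L2 penalty strength (lambda > 0)
    """
    margins = y * (X @ w)
    # log(1 + exp(-m)) computed stably as logaddexp(0, -m)
    per_example = np.logaddexp(0.0, -margins)
    # sum_i c_i * l(x_i, y_i, w) + lambda * ||w||^2
    return float(c @ per_example + lam * (w @ w))
```

Setting all `c_i = 1` recovers the ordinary logistic loss; any off-the-shelf optimizer (e.g. `scipy.optimize.minimize`) can then be pointed at this objective to fit `w`.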