We study nonparametric estimation of a conditional probability for classification
based on a collection of finite-dimensional models. To gain flexibility,
different types of models, linear or nonlinear, are allowed as long as each
satisfies a dimensionality assumption. We show that, with a suitable model selection
criterion, the penalized maximum likelihood estimator has risk bounded
by an index of resolvability expressing a good trade-off among approximation
error, estimation error, and model complexity. The bound does not require
any assumption on the target conditional probability and can
be used to show adaptation properties of estimators based on model selection.
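The penalized maximum likelihood step can be illustrated in a toy setting. The sketch below is not the paper's procedure: it assumes a one-dimensional feature in [0, 1], uses nested histogram (piecewise-constant) models for P(Y=1 | X=x) whose dimension is the number of bins m, and selects m by minimizing a hypothetical criterion -loglik + lam * (m + C_m). The penalty constant `lam`, the complexity term `C_m`, and all function names are illustrative assumptions.

```python
import math
import random

def fit_histogram_model(xs, ys, m):
    """MLE of P(Y=1 | X=x) among functions constant on m equal bins of [0, 1]."""
    ones = [0] * m
    totals = [0] * m
    for x, y in zip(xs, ys):
        b = min(int(x * m), m - 1)  # bin index; x == 1.0 falls in the last bin
        totals[b] += 1
        ones[b] += y
    # The per-bin MLE is the fraction of ones; empty bins default to 1/2.
    return [ones[b] / totals[b] if totals[b] else 0.5 for b in range(m)]

def log_likelihood(xs, ys, probs):
    """Bernoulli log-likelihood of labels ys under the piecewise-constant probs."""
    m = len(probs)
    ll = 0.0
    for x, y in zip(xs, ys):
        p = probs[min(int(x * m), m - 1)]
        # The argument is positive when probs is the MLE fitted on these same data.
        ll += math.log(p if y == 1 else 1.0 - p)
    return ll

def select_model(xs, ys, max_bins, lam):
    """Pick m minimizing -loglik + lam * (m + C_m); returns (m, fitted probs)."""
    best = None
    for m in range(1, max_bins + 1):
        probs = fit_histogram_model(xs, ys, m)
        c_m = 2 * math.log(m + 1)  # hypothetical complexity term; sum of exp(-C_m) < 1
        crit = -log_likelihood(xs, ys, probs) + lam * (m + c_m)
        if best is None or crit < best[0]:
            best = (crit, m, probs)
    return best[1], best[2]

# Usage: a step-shaped conditional probability (0.2 below x = 0.5, 0.8 above).
rng = random.Random(0)
n = 2000
xs = [rng.random() for _ in range(n)]
ys = [1 if rng.random() < (0.2 if x < 0.5 else 0.8) else 0 for x in xs]
m_hat, p_hat = select_model(xs, ys, max_bins=20, lam=math.log(n) / 2)  # BIC-like lam (assumption)
print(m_hat, [round(p, 2) for p in p_hat])
```

Larger models always raise the fitted likelihood; the penalty term is what trades that gain against dimension and complexity, which is the trade-off the index of resolvability quantifies.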
As a demonstration, we show that, for high-dimensional features,
when spline models (with different smoothness orders, numbers of knots,
and interaction orders), neural network models, and sparse subset models
from a multi-indexed basis are considered, the resulting estimator automatically
attains optimal or near-optimal rates of convergence
over Sobolev classes with unknown orders of interaction and smoothness,
over classes of functions whose gradient has an integrable Fourier transform,
and over some sparse function classes, as if one knew in advance which of them
contains the true conditional probability. The corresponding classifier also
converges optimally or nearly optimally, simultaneously over these classes.
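One standard way rates for the conditional probability estimate p_hat carry over to a classifier is the plug-in rule, which predicts 1 where p_hat(x) > 1/2. Its excess misclassification risk over the Bayes rule equals E[|2p(X) - 1| 1{rules disagree}] and is at most 2 E|p_hat(X) - p(X)|; the paper's own argument may differ. The sketch below checks this numerically, assuming X uniform on a finite grid; the functions `plug_in` and `excess_risk_and_bound` and the example p and p_hat are hypothetical.

```python
def plug_in(p_hat):
    """Plug-in classifier: predict 1 exactly where the estimated P(Y=1|X=x) exceeds 1/2."""
    return lambda x: 1 if p_hat(x) > 0.5 else 0

def excess_risk_and_bound(p, p_hat, grid):
    """For X uniform on grid and true P(Y=1|X=x) = p(x), return
    (excess risk of plug_in(p_hat) over the Bayes rule, the bound 2 * E|p_hat(X) - p(X)|)."""
    g_hat, g_bayes = plug_in(p_hat), plug_in(p)
    n = len(grid)
    # Where the two rules disagree, the extra error probability at x is |2p(x) - 1|.
    excess = sum(abs(2 * p(x) - 1) for x in grid if g_hat(x) != g_bayes(x)) / n
    bound = 2 * sum(abs(p_hat(x) - p(x)) for x in grid) / n
    return excess, bound

# Usage: true p(x) = x on [0, 1) and an estimate shifted up by 0.1.
grid = [i / 100 for i in range(100)]
true_p = lambda x: x
est_p = lambda x: min(1.0, x + 0.1)
excess, bound = excess_risk_and_bound(true_p, est_p, grid)
print(excess, bound)
```

The bound holds pointwise: where the rules disagree, p(x) and p_hat(x) lie on opposite sides of 1/2, so |2p(x) - 1| is at most 2|p_hat(x) - p(x)|.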
Index Terms: Minimax adaptive estimation, minimax rates of convergence,
model selection, nonparametric classification, neural networks, resolvability,
sparse approximation, wavelets.
Copies of preprints are available from the author upon request. Use the preprint number (located at the top of the page) and make the request directly to the author, Iowa State University, Department of Statistics, Snedecor Hall, Ames, IA 50011-1210.