%A O. L. Mangasarian
%T Mathematical Programming in Data Mining
%D August 1996 - Revised November 1996 and March 1997
%R 96-05
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Mathematical programming approaches to three fundamental problems
are described: feature selection, clustering and robust
representation.
The feature selection problem is characterized by optimization methods
that recognize irrelevant and redundant features and suppress them. This
creates a lean model that often generalizes better to new unseen data.
Clustering is exemplified by the unsupervised learning of patterns and
clusters that may exist in various databases. A mathematical programming
formulation of this problem is proposed that is theoretically
justifiable and practically implementable.
Robust representation is concerned with minimizing the degradation
of a trained model when it is applied to new problems.
A novel approach is proposed that purposely tolerates
a small error in the training process in order to avoid
overfitting data that may contain errors.
Examples of applications of these concepts are given.