%A P. S. Bradley
%T Mathematical Programming Approaches to Machine Learning and Data Mining
%D September 1998
%R 98-11
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X
Machine learning problems of supervised classification, unsupervised
clustering and parsimonious
approximation
are formulated as mathematical programs.
The feature selection problem arising in the supervised classification
task is
effectively addressed by calculating a separating plane
by minimizing
separation error and the number of problem
features utilized.
The support vector machine approach is formulated using various norms
to measure the margin of separation.
The clustering problem of
assigning $m$ points in $n$-dimensional real space to $k$ clusters is
formulated as minimizing a piecewise-linear concave function over a
polyhedral set.
This problem is also formulated in
a novel fashion by minimizing the sum of squared distances of data
points to
nearest cluster planes characterizing the $k$ clusters.
The problem of obtaining a parsimonious solution to a
linear system
where the right hand side vector may be corrupted
by noise is formulated as minimizing the system residual plus either
the number of nonzero elements in the solution vector or the norm of
the solution vector.
The feature selection problem,
the clustering problem and the parsimonious approximation
problem can all be stated as the minimization of a concave function over a
polyhedral region and
are solved by a
theoretically justifiable, fast and finite successive linearization
algorithm.
Numerical tests indicate the utility and efficiency of these
formulations on real-world databases.
In particular, the feature selection
approach via concave minimization computes a separating-plane based
classifier that improves upon the generalization ability of a
separating plane computed
without feature suppression. This approach produces
classifiers utilizing fewer original problem features than the support
vector machine approaches, with comparable generalization ability.
The clustering techniques are shown to be effective and efficient data
mining tools in medical
survival analysis applications.
The parsimonious approximation methods yield improved results in a
signal processing application, with high signal to noise ratio, over
least squares and a lengthy combinatorial search.
These results support the claim that
mathematical programming is effective as the basis of
data mining tools to
extract patterns from a database which contain ``knowledge'' and thus
achieve ``knowledge discovery in databases''.