%A P. S. Bradley %T Mathematical Programming Approaches to Machine Learning and Data Mining %D September 1998 %R 98-11 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X Machine learning problems of supervised classification, unsupervised clustering and parsimonious approximation are formulated as mathematical programs. The feature selection problem arising in the supervised classification task is effectively addressed by calculating a separating plane by minimizing separation error and the number of problem features utilized. The support vector machine approach is formulated using various norms to measure the margin of separation. The clustering problem of assigning $m$ points in $n$-dimensional real space to $k$ clusters is formulated as minimizing a piecewise-linear concave function over a polyhedral set. This problem is also formulated in a novel fashion by minimizing the sum of squared distances of data points to nearest cluster planes characterizing the $k$ clusters. The problem of obtaining a parsimonious solution to a linear system where the right hand side vector may be corrupted by noise is formulated as minimizing the system residual plus either the number of nonzero elements in the solution vector or the norm of the solution vector. The feature selection problem, the clustering problem and the parsimonious approximation problem can all be stated as the minimization of a concave function over a polyhedral region and are solved by a theoretically justifiable, fast and finite successive linearization algorithm. Numerical tests indicate the utility and efficiency of these formulations on real-world databases. In particular, the feature selection approach via concave minimization computes a separating-plane based classifier that improves upon the generalization ability of a separating plane computed without feature suppression. This approach produces classifiers utilizing fewer original problem features than the support vector machine approaches, with comparable generalization ability. The clustering techniques are shown to be effective and efficient data mining tools in medical survival analysis applications. The parsimonious approximation methods yield improved results in a signal processing application, with high signal to noise ratio, over least squares and a lengthy combinatorial search. These results support the claim that mathematical programming is effective as the basis of data mining tools to extract patterns from a database which contain ``knowledge'' and thus achieve ``knowledge discovery in databases''.