The following papers are available via ftp from the Machine Learning Research Group of the University of Wisconsin, Madison. These papers can be retrieved via a WWW browser, such as Mosaic, or by anonymous ftp. The URL for our publications page is: http://www.cs.wisc.edu/~shavlik/publications.html The URL for our group's "home" page is: http://www.cs.wisc.edu/~shavlik/uwml.html The URL for retrieving these papers via anonymous ftp is: ftp://ftp.cs.wisc.edu/machine-learning/shavlik-group/ If your computer cannot uncompress ".Z" files, simply leave off the ".Z" when you request the ftp transfer; our ftp server will then uncompress the file before transmitting it. Email shavlik@cs.wisc.edu if you have any major problems. ------ THESES ------ @phdthesis{dimaio.thesis ,author = "Frank DiMaio" ,title = "Probabilistic Methods for Interpreting Electron-Density Maps" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "2007" ,filename = "dimaio.thesis.pdf dimaio.thesis.ppt" ,code = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/" ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/" ,note = "(Also appears as UW Technical Report CS-TR-07-1617)" ,abstract= "With recent advances in structural genomics, there has been considerable interest in the rapid determination of protein structures in a high-throughput setting. One bottleneck in this process arises in protein crystallography, and deals with interpretation of the electron-density map, the three-dimensional 'picture' of the protein that crystallography produces. This thesis presents a novel solution to this important problem of electron-density map interpretation. I apply probabilistic methods to automate the interpretation of poor-quality electron-density maps.

I show my probabilistic approach to density-map interpretation leads to more complete and more accurate protein models, in terms of the fraction of the protein automatically interpreted, as well as the RMS error of my method's inferred models versus 'ground truth' (the deposited structure), than do other automated approaches. My probabilistic approach is also amenable to production of multiple protein models that explain the observed density. I show that multiple static conformations generated by my framework do a better job of explaining the observed density than does a single structure, based on the R-free metric, which measures the difference between observed and predicted crystallographic reflection data on a testset of held-aside data. My method accurately interprets 3-4A density maps, further extending the resolution of density maps that can be automatically interpreted.

This thesis also describes several computational contributions. I describe a significant improvement over previous work in three-dimensional template matching in electron-density maps. I use the spherical-harmonic decomposition of a template to rapidly search for all rotations of the template. This offers both improved efficiency and accuracy compared to previous work, producing better models in 60% of the running time. I present a novel joint type as well as improved methods for collision-handling in part-based object-recognition. Finally, I present a general part-based object-recognition framework specialized for identifying topologically complex objects in large three-dimensional images. My framework introduces an algorithm that improves the efficiency of current probabilistic inference algorithms. This improved efficiency allows recognition of objects with hundreds of parts. Although originally developed for density-map interpretation, these computational contributions may be beneficial in other problem domains." } @phdthesis{molla.thesis ,author = "Michael Molla" ,title = "Novel Uses for Machine Learning and Other Computational Methods for the Design and Interpretation of Genetic Microarrays" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "2007" ,filename = "molla.thesis.pdf molla.thesis.ppt" ,note = "(Also appears as UW Technical Report CS-TR-07-1612)" ,abstract= "It is clear that high-throughput techniques, such as rapid DNA sequencing and gene chips are changing the science of genetics. Hypothesis-driven science is now strongly complemented by these newer data-driven approaches. Over the course of the past decade, DNA microarrays, also known as gene chips, have come into prominence for genetic-level analysis throughout the life sciences. Using these microarrays, a scientist is able to perform hundreds of thousands of experiments on the surface of a single one-inch-by-one-inch wafer in the space of a single afternoon, generating more data than an army of researchers could have a generation ago. This potential flood of data brings many informatic challenges in both analysis and design. It is well understood that computer science will play a crucial role in their development and application. This thesis presents novel applications of machine learning and other computational methods to central tasks in highthroughput biology. These tasks include gene-chip design, detection of genomic variation, and the interpretation of gene-expression patterns." } @phdthesis{goadrich.thesis ,author = "Mark Goadrich" ,title = "Learning Ensembles of First-Order Clauses that Optimize Precision-Recall Curves" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "2007" ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/IE-protein-location/" ,filename = "goadrich.thesis.pdf goadrich.thesis.ppt" ,note = "(Also appears as UW Technical Report CS-TR-07-1609)" ,abstract= "Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data, such as biomedical information extraction, citation matching, and learning relationships in social networks. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy, and to examine their tradeoffs by plotting a precision-recall curve. The goal of this thesis is to find new approaches within ILP particularly suited for large, highly skewed domains. I propose and investigate Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs thresholding methods to combine sets of selected clauses. I compare Gleaner to ensembles of standard theories learned by Aleph, a standard ILP algorithm, using a number of large relational domains. I find that Gleaner produces comparable testset results in a fraction of the training time and outperforms Aleph ensembles when given the same amount of training time. I explore extensions to Gleaner with respect to searching and combining clauses, namely find- ing ways to fully explore the hypothesis space as well as to make better use of those found clauses. I also use Gleaner to estimate the probability that a query is true, further investigate the properties underlying precision-recall curves, and then conclude with a discussion of future work in this area." } @mastersthesis{geisler.thesis ,author = "Benjamin Geisler" ,title = "An Empirical Study of Machine Learning Algorithms Applied to Modeling Player Behavior in a 'First Person Shooter' Video Game" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "2002" ,filename = "geisler.thesis.pdf geisler.thesis.ps" ,abstract= "Modern video games present many challenging applications for artificial intelligence. Agents must not only appear intelligent but must also be fun to play against. In the video game genre of the first person shooter an agent must mimic all the behaviors of a human soldier in a combat situation. The standard opponent in a 'first person shooter' uses a finite-state machine and a series of hand coded rules. Drawbacks of this system include a high level of predictability of opponents and a large amount of work manually programming each rule. Modern advances in machine learning have enabled agents to accurately learn rules from a set of examples. By sampling data from an expert player we use these machine learning algorithms in a first person shooter. With this system in place, the programmer has less work when hand coding the combat rules and the learned behaviors are often more unpredictable and life-like than any hard-wired finite state machine. This thesis explores several popular machine learning algorithms and shows how these algorithms can be applied to the game. The empirical study includes decision trees, Na�e Bayes classifiers, neural networks, and neural networks trained using boosting and bagging methods. We show that a subset of AI behaviors can be learned by player modeling using machine learning techniques. Under this system we have successfully been able to learn the combat behaviors of an expert player and apply them to an agent in a modified version of the video game Soldier of Fortune 2. The following tasks were learned: speed of acceleration, direction of movement, direction of facing, and jumping. We evaluate both empirically and aesthetically which learner performed the best and make recommendations for the future. We also have created a system which uses these learned behaviors in a finite-state system within the game at real time." } @phdthesis{eliassi-rad.thesis ,author = "Tina Eliassi-Rad" ,title = "Building Intelligent Agents that Learn to Retrieve and Extract Information" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "2001" ,filename = "eliassi-rad.thesis.pdf eliassi-rad.thesis.ps" ,note = "(Also appears as UW Technical Report CS-TR-01-1431)" ,abstract= "The rapid growth of on-line information has created a surge of interest in tools that are able to retrieve and extract information from on-line documents. In this thesis, I present and evaluate a computer system that rapidly and easily builds instructable and self-adaptive software agents for both the information retrieval (IR) and the information extraction (IE) tasks. My system is called WAWA (short for Wisconsin Adaptive Web Assistant). WAWA interacts with the user and an on-line (textual) environment (e.g., the Web) to build an intelligent agent for retrieving and extracting information. WAWA has two sub-systems: (i) an information retrieval (IR) sub-system, called WAWA-IR; and, (ii) an information extraction (IE) sub-system, called WAWA-IE. WAWA-IR is a general search-engine agent, which can be trained to produce specialized and personalized IR agents. WAWA-IE is a general extractor system, which creates specialized agents that accurately extract pieces of information from documents in the domain of interest. WAWA utilizes a theory-refinement approach to build its intelligent agents. There are four four primary advantages of using such an approach. First, WAWA's agents are able to perform reasonably well initially because they are able to utilize users' prior knowledge. Second, users' prior knowledge does not have to be correct since it is refined through learning. Third, the use of prior knowledge, plus the continual dialog between the user and an agent, decreases the need for a large number of training examples because training is not limited to a binary representation of positive and negative examples. Finally, WAWA provides an appealing middle ground between non-adaptive agent programming languages and systems that solely learn user preferences from training examples. WAWA's agents have performed quite well in empirical studies. WAWA-IR experiments demonstrate the efficacy of incorporating the feedback provided by the Web into the agent's neural networks to improve the evaluation of potential hyperlinks to traverse. WAWA-IE experiments produce results that are competitive with other state-of-art systems. Moreover, they demonstrate that WAWA-IE agents are able to intelligently and efficiently select from the space of possible extractions and solve multi-slot extraction problems." } @phdthesis{allex.thesis ,author = "Carolyn F. Allex" ,title = "Computational Methods for Fast and Accurate DNA Fragment Assembly" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "1999" ,filename = "allex.thesis99.pdf allex.thesis99.ps" ,note = "(Also appears as UW Technical Report CS-TR-99-1406)" ,abstract= "As advances in technology result in the production of increasing amounts of DNA sequencing data in decreasing amounts of time, developing computational methods that allow data analysis to keep pace is imperative. In this dissertation, I present methods that improve the speed and accuracy of DNA fragment assembly. One critical characteristic of automatic methods for fragment assembly is that they must be accurate. Currently, to ensure accurate sequences, the data that underlies questionable base calls must be examined by human editors so that the correct base call can be determined. This manual process is both error-prone and time-consuming. Automatic methods that yield high accuracy and few questionable calls can reduce errors and lessen the need for manual inspections. In my work, I developed a method, Trace-Evidence, that automatically produces highly accurate consensus sequences, even with few aligned sequences. Most assembly programs analyze only base calls when determining a consensus sequence. The key to the high accuracy is that I incorporate morphological information about the underlying ABI trace data. This is accomplished through a new representation of traces, Trace-Class, that characterizes the height and shape of traces. The new representation not only yields high accuracy when used in consensus-calling methods, but also produces improved results when used in removing poor-quality data, and when used as inputs for neural networks for consensus determination. The need for fast processing is becoming more important as the size of sequencing projects increases. Almost all existing fragment assembly programs perform pairwise comparisons of reads, resulting in execution times proportional to n2, where n is the number of reads. I describe a new algorithm for fragment layout, SLIC, that runs in time proportional to n. SLIC relies on subsequences of bases that occur in overlapping regions of fragment reads. Subsequences that are common to two or more fragment reads are aligned to determine the overall layout of reads. The work I present provides improvements to currently available computational methods for DNA sequencing that can serve as a foundation for further study in developing better solutions to problems in fragment assembly." } @phdthesis{craven.thesis ,author = "Mark W. Craven" ,title = "Extracting Comprehensible Models from Trained Neural Networks" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "1996" ,filename = "craven.thesis.pdf craven.thesis.ps" ,note = "(Also appears as UW Technical Report CS-TR-96-1326)" ,code = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/trepan" ,abstract= "Although neural networks have been used to develop highly accurate classifiers in numerous real-world problem domains, the models they learn are notoriously difficult to understand. This thesis investigates the task of extracting comprehensible models from trained neural networks, thereby alleviating this limitation. The primary contribution of the thesis is an algorithm that overcomes the significant limitations of previous methods by taking a novel approach to the task of extracting comprehensible models from trained networks. This algorithm, called TREPAN, views the task as an inductive learning problem. Given a trained network, or any other learned model, TREPAN uses queries to induce a decision tree that approximates the function represented by the model. Unlike previous work in this area, TREPAN is broadly applicable as well as scalable to large networks and problems with high-dimensional input spaces. The thesis presents experiments that evaluate TREPAN by applying it to individual networks and to ensembles of neural networks trained in classification, regression, and reinforcement-learning domains. These experiments demonstrate that TREPAN is able to extract decision trees that are comprehensible, yet maintain high levels of fidelity to their respective networks. In problem domains in which neural networks provide superior predictive accuracy to conventional decision tree algorithms, the trees extracted by TREPAN also exhibit superior accuracy, but are comparable in terms of complexity, to the trees learned directly from the training data. A secondary contribution of this thesis is an algorithm, called BBP, that constructively induces simple neural networks. The motivation underlying this algorithm is similar to that for TREPAN: to learn comprehensible models in problem domains in which neural networks have an especially appropriate inductive bias. The BBP algorithm, which is based on a hypothesis-boosting method, learns perceptrons that have relatively few connections. This algorithm provides an appealing combination of strengths: it provides learnability guarantees for a fairly natural class of target functions; it provides good predictive accuracy in a variety of problem domains; and it constructs syntactically simple models, thereby facilitating human comprehension of what it has learned. These algorithms provide mechanisms for improving the understanding of what a trained neural network has learned." } @phdthesis{opitz.thesis ,author = "David W. Opitz" ,title = "An Anytime Approach to Connectionist Theory Refinement: Refining the Topologies of Knowledge-Based Neural Networks" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "1995" ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/" ,filename = "opitz.thesis.pdf opitz.thesis.ps" ,note = "(Also appears as UW Technical Report CS-TR-95-1281)(" ,abstract= "Many scientific and industrial problems can be better understood by learning from samples of the task at hand. For this reason, the machine learning and statistics communities devote considerable research effort on generating inductive-learning algorithms that try to learn the true ``concept'' of a task from a set of its examples. Often times, however, one has additional resources readily available, but largely unused, that can improve the concept that these learning algorithms generate. These resources include available computer cycles, as well as prior knowledge describing what is currently known about the domain. Effective utilization of available computer time is important since for most domains an expert is willing to wait for weeks, or even months, if a learning system can produce an improved concept. Using prior knowledge is important since it can contain information not present in the current set of training examples.
In this thesis, I present three ``anytime'' approaches to connectionist theory refinement. Briefly, these approaches start by translating a set of rules describing what is currently known about the domain into a neural network, thus generating a knowledge-based neural network (KNN). My approaches then utilize available computer time to improve this KNN by continually refining its weights and topology. My first method, TopGen, searches for good ``local'' refinements to the KNN topology. It does this by adding nodes to the KNN in a manner analogous to symbolically adding rules and conjuncts to an incorrect rule base. My next approach, REGENT, uses genetic algorithms to find better ``global'' changes to this topology. REGENT proceeds by using (a) the domain-specific rules to help create the initial population of KNNs and (b) crossover and mutation operators specifically designed for KNNs. My final algorithm, ADDEMUP, searches for an ``ensemble'' of KNNs that work together to produce an effective composite prediction. ADDEMUP works by using genetic algorithms to continually create new networks, keeping the set of networks that are as accurate as possible while disagreeing with each other as much as possible. Empirical results show that these algorithms successfully achieve each of their respective goals." } @phdthesis{maclin.thesis ,author = "Richard Maclin" ,title = "Learning from Instruction and Experience: Methods for Incorporating Procedural Domain Theories into Knowledge-Based Neural Networks" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "1995" ,filename = "maclin.thesis.firsthalf.ps maclin.thesis.secondhalf.ps maclin.thesis.firsthalf.pdf maclin.thesis.secondhalf.pdf" ,note = "(Also appears as UW Technical Report CS-TR-95-1285)" ,abstract= "This thesis defines and evaluates two systems that allow a teacher to provide instructions to a machine learner. My systems, FSKBANN and RATLE, expand the language that a teacher may use to provide advice to the learner. In particular, my techniques allow a teacher to give partially correct instructions about procedural tasks - tasks that are solved as sequences of steps. FSKBANN and RATLE allow a computer to learn both from instruction and from experience.
Experiments with these systems on several testbeds demonstrate that they produce learners that successfully use and refine the instructions they are given.
In my initial approach, FSKBANN, the teacher provides instructions as a set of propositional rules organized around one or more finite-state automata (FSAs). FSKBANN maps the knowledge in the rules and FSAs into a recurrent neural network. I used FSKBANN to refine the Chou-Fasman algorithm, a method for solving the secondary-structure prediction problem, a difficult task in molecular biology. FSKBANN produces a refined algorithm that outperforms the original (non-learning) Chou-Fasman algorithm, as well as a standard neural-network approach.
My second system, RATLE, allows a teacher to communicate advice, using statements in a simple programming language, to a connectionist, reinforcement-learning agent. The teacher indicates conditions of the environment and actions the agent should take under those conditions. RATLE allows the teacher to give advice continuously by translating the teacher's statements into additions to the agent's neural network. The RATLE language also includes novel (to the theory-refinement literature) features such as multi-step plans and looping constructs. In experiments with RATLE on two simulated testbeds involving multiple agents, I demonstrate that a RATLE agent receiving advice outperforms both an agent that does not receive advice and an agent that receives instruction, but does not refine it.
My methods provide an appealing approach for learning from both instruction and experience in procedural tasks. This work widens the ``information pipeline'' between humans and machine learners, without requiring that the human provide absolutely correct information to the learner." } @phdthesis{towell.thesis ,author = "Geoffrey G. Towell" ,title = "Symbolic Knowledge and Neural Networks: Insertion, Refinement and Extraction" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "1991" ,note = "(Also appears as UW Technical Report 1072)" ,data = "http://ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology/splice-junction-gene-sequences/" ,filename = "towell.thesis.1-2.ps towell.thesis.3.ps towell.thesis.4-7.ps towell.thesis.app.ps towell.thesis.1-2.pdf towell.thesis.3.pdf towell.thesis.4-7.pdf towell.thesis.app.pdf" ,abstract= "Explanation-based and empirical learning are two largely complementary methods of machine learning. These approaches to machine learning both have serious problems which preclude their being a general purpose learning method. However, a ``hybrid'' learning method that combines explanation-based with empirical learning may be able to use the strengths of one learning method to address the weaknesses of the other method. Hence, a system that effectively combines the two approaches to learning can be expected to be superior to either approach in isolation. This thesis describes a hybrid system called KBANN which is shown to be an effective combination of these two learning methods.
KBANN (Knowledge-Based Artificial Neural Networks) is a three-part hybrid learning system built on top of ``neural'' learning techniques. The first part uses a set of approximately-correct rules to determine the structure and initial link weights of an artificial neural network, thereby making the rules accessible for modification by neural learning. The second part of KBANN modifies the resulting network using essentially standard neural learning techniques. The third part of KBANN extracts refined rules from trained networks.
KBANN is evaluated by empirical tests in the domain of molecular biology. Networks created by KBANN are shown to be superior, in terms of their ability to correctly classify unseen examples, to a wide variety of learning systems as well as techniques proposed by experts in the problems investigated. In addition, empirical tests show that KBANN is robust to errors in the initial rules and insensitive to problems resulting from the presence of extraneous input features.
The third part of KBANN, which extracts rules from trained networks, addresses a significant problem in the use of neural networks -- understanding what a neural network learns. Empirical tests of the proposed rule-extraction method show that it simplifies understanding of trained networks by reducing the number of: consequents (hidden units), antecedents (weighted links), and possible antecedent weights. Surprisingly, the extracted rules are often more accurate at classifying examples not seen during training than the trained network from which they came." } @phdthesis{gutstein.thesis ,author = "Eric Gutstein" ,title = "SIFT: A Self-Improving Fractions Tutor" ,school = "Department of Computer Sciences, University of Wisconsin-Madison" ,year = "1993" } ----------------- CONFERENCE PAPERS ----------------- @inproceedings{walker.ilp07 ,filename = "walker.ilp07.pdf " ,author = "T. Walker and L. Torrey and J. Shavlik and R. Maclin" ,title = "Building Relational World Models for Reinforcement Learning" ,booktitle = "Proceedings of the Seventeenth Conference on Inductive Logic Programming" ,year = "2007" ,address = "Corvallis, Oregon" ,abstract = "Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capacity to exploit the relational nature of the domain. Our algorithm, AMBIL, constructs relational world models in the form of relational Markov decision processes (MDPs). AMBIL works backwards from collections of high-reward states, utilizing inductive logic programming to learn their preimage, logical definitions of the region of state space that leads to the high-reward states via some action. These learned preimages are chained together to form an MDP that abstractly represents the domain. AMBIL estimates the reward and transition probabilities of this MDP from past experience. Since our MDPs are small, AMBIL uses value iteration to quickly estimate the Q-values of each action in the induced states and determine a policy. AMBIL is able to employ complex background knowledge and supports relational representations. Empirical evaluation on both synthetic domains and a sub-task of the RoboCup soccer domain shows significant performance gains compared to standard Q-learning." } @inproceedings{oliphant.ilp07 ,filename = "oliphant.ilp07.pdf oliphant.ilp07-presentation.ppt oliphant.ilp07-poster.ps" ,author = "L. Oliphant and J. Shavlik" ,title = "Using Bayesian Networks to Direct Stochastic Search in Inductive Logic Programming" ,booktitle = "Proceedings of the Seventeenth Conference on Inductive Logic Programming" ,year = "2007" ,address = "Corvallis, Oregon" ,abstract = "Stochastically searching the space of candidate clauses is an appealing way to scale up ILP to large datasets. We address an approach that uses a Bayesian network model to adaptively guide search in this space. We examine guiding search towards areas that previously performed well and towards areas that ILP has not yet thoroughly explored. We show improvement in area under the curve for recall-precision curves using these modifications." } @inproceedings{dimaio.bibm07 ,filename = "dimaio.bibm07.pdf" ,author = "F. DiMaio and A. Soni and G. Phillips and J. Shavlik" ,title = "Improved Methods for Template-Matching in Electron-Density Maps Using Spherical Harmonics" ,booktitle = "Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'07)" ,year = 2007 ,address = "Fremont, CA" ,code = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/" ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/" ,abstract = "An important problem in high-throughput protein crystallography is constructing a protein model from an electron-density map. Previous work by some of this paper's authors describes an automated approach to this otherwise time-consuming process. An important step in the previous method requires searching the density map for many small template protein-fragments. This previous approach uses Fourier convolution to quickly match some of the template to the entire density map. This paper proposes an alternate approach that makes use of the spherical-harmonic decomposition of the template and of some region in the density map. This new framework allows us to mask specific regions of the map, enabling a first-pass filter to eliminate a majority of the density map without requiring an expensive rotational search. We show our new template matching method improves accuracy and reduces running time, compared to our previous approach. Protein models constructed using this matching also show significant accuracy improvement." } @inproceedings{goadrich.ilp07 ,filename = "goadrich.ilp07.pdf" ,author = "M. Goadrich and J. Shavlik" ,title = "Combining Clauses with Various Precisions and Recalls to Produce Accurate Probabilistic Estimates" ,booktitle = "Proceedings of the Seventeenth Conference on Inductive Logic Programming" ,year = "2007" ,address = "Corvallis, Oregon" ,abstract = "Statistical Relational Learning (SRL) combines the benefits of probabilistic machine learning approaches with complex, structured domains from Inductive Logic Programming (ILP). We propose a new SRL algorithm, GleanerSRL, to generate the probability that an example is positive within highly-skewed relational domains. In this work, we combine clauses from Gleaner, an ILP algorithm for learning a wide variety of first-order clauses, with the propositional learning technique of support vector machines to learn well-calibrated probabilities. We find that our results are comparable to SRL algorithms SAYU and SAYU-VISTA on a well-known relational testbed. " } @inproceedings{torrey.ilp07 ,filename = "torrey.ilp07.pdf torrey.ilp07.ppt torrey.ilp07-poster.pdf" ,author = "L. Torrey and J. Shavlik and T. Walker and R. Maclin" ,title = "Relational Macros for Transfer in Reinforcement Learning" ,booktitle = "Proceedings of the Seventeenth Conference on Inductive Logic Programming" ,year = "2007" ,address = "Corvallis, Oregon" ,abstract = "We describe an application of inductive logic programming to transfer learning. Transfer learning is the use of knowledge learned in a source task to improve learning in a related target task. The tasks we work with are in reinforcement-learning domains. Our approach transfers relational macros, which are finite-state machines in which the transition conditions and the node actions are represented by first-order logical clauses. We use inductive logic programming to learn a macro that characterizes successful behavior in the source task, and then use the macro for decision-making in the early learning stages of the target task. Through experiments in the RoboCup simulated soccer domain, we show that Relational Macro Transfer via Demonstration (RMT-D) from a source task can provide a substantial head start in the target task." } @inproceedings{maclin.aaai07 ,filename = "maclin.aaai07.pdf maclin.aaai07.ppt" ,author = "R. Maclin and E. Wild and J. Shavlik and L. Torrey and T. Walker" ,title = "Refining Rules Incorporated into Knowledge-Based Support Vector Learners Via Successive Linear Programming" ,booktitle = "Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI'07)" ,year = 2007 ,address = "Vancouver, British Columbia" ,abstract = "Knowledge-based classification and regression methods are especially powerful forms of learning. They allow a system to take advantage of prior domain knowledge supplied either by a human user or another algorithm, combining that knowledge with data to produce accurate models. A limitation of the use of prior knowledge occurs when the provided knowledge is incorrect. Such knowledge likely still contains useful information, but knowledge-based learners might not be able to fully exploit such information. In fact, incorrect knowledge can lead to poorer models than result from knowledge-free learners. We present a support-vector method for incorporating and refining domain knowledge that not only allows the learner to make use of that knowledge, but also suggests changes to the provided knowledge. Our approach is built on the knowledge-based classification and regression methods presented by Fung, Mangasarian, & Shavlik (2002; 2003) and by Mangasarian, Shavlik, & Wild (2004). Experiments on artificial data sets with known properties, as well as on a real-world data set, demonstrate that our method learns more accurate models while also adjusting the provided rules in intuitive ways. Our new algorithm provides an appealing extension to knowledge-based, support-vector learning that is not only able to combine knowledge from rules with data, but is also able to use the data to modify and change those rules to better fit the data." } @inproceedings{dimaio.icdm06 ,filename = "dimaio.icdm06.pdf dimaio.icdm06.ppt" ,author = "F. DiMaio and J. Shavlik" ,title = "Belief Propagation in Large, Highly Connected Graphs for 3D Part-Based Object Recognition" ,pages = "845-850" ,year = 2006 ,booktitle = "Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM'06)" ,address = "Hong Kong" ,code = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/" ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/" ,note = "A longer version of this paper appears as UW Machine Learning Research Group Working Paper 06-1." ,abstract = "We describe a part-based object-recognition framework, specialized to mining complex 3D objects from detailed 3D images. Objects are modeled as a collection of parts together with a pairwise potential function. An efficient inference algorithm - based on belief propagation (BP) - finds the optimal layout of parts, given some input image. We introduce AggBP, a message aggregation scheme for BP, in which groups of messages are approximated as a single message. For objects consisting of N parts, we reduce CPU time and memory requirements from O(N^2) to O(N). We apply AggBP on synthetic data as well as a real-world task identifying protein fragments in three-dimensional images. These experiments show that our improvements result in minimal loss in accuracy in significantly less time." } @inproceedings{torrey.ecml06 ,filename = "torrey.ecml06.pdf torrey.ecml06.ppt" ,author = "L. Torrey and J. Shavlik and T. Walker and R. Maclin" ,title = "Skill Acquisition via Transfer Learning and Advice Taking" ,pages = "425-436" ,year = 2006 ,code = "http://www.biostat.wisc.edu/~ml-group/RoboCup/" ,booktitle = "Proceedings of the Seventeenth European Conference on Machine Learning (ECML'06)" ,address = "Berlin, Germany" ,abstract = "We describe a reinforcement learning system that transfers skills from a previously learned source task to a related target task. The system uses inductive logic programming to analyze experience in the source task, and transfers rules for when to take actions. The target task learner accepts these rules through an advice-taking algorithm, which allows learners to benefit from outside guidance that may be imperfect. Our system accepts a human-provided mapping, which specifies the similarities between the source and target tasks and may also include advice about the differences between them. Using three tasks in the RoboCup simulated soccer domain, we demonstrate that this system can speed up reinforcement learning substantially." } @inproceedings{chen.vldb06 ,filename = "chen.vldb06.pdf" ,author = "B. Chen and R. Ramakrishnan and J. Shavlik and P. Tamma" ,title = "Bellwether Analysis: Predicting Global Aggregates from Local Regions" ,pages = "655-666" ,year = 2006 ,booktitle = "Proceedings of the Thirty-Second International Conference on Very Large Data Bases (VLDB'06)" ,address = "Seoul, Korea" ,abstract = "Massive datasets are becoming commonplace in a wide range of domains, and mining them is recognized as a challenging problem with great potential value. Motivated by this challenge, much effort has been concentrated on developing scalable versions of machine learning algorithms. An often overlooked issue is that large datasets are rarely labeled with the outputs that we wish to learn to predict, due to the human labor required. We make the key observation that analysts can often use queries to define labels for cases, which leads to the problem of learning to predict such query-produced labels. Of course, if a dataset is available in its entirety, we can simply run the query again to compute labels. The interesting scenarios are those where, after the predictive model is trained, new data is gathered at significant incremental cost and, perhaps, over time. The challenge is to accurately predict the query-labels for the projected completion of new datasets, based only on certain cost-effective subsets, which we call bellwethers." } @inproceedings{maclin.aaai06 ,filename = "maclin.aaai06.pdf maclin.aaai06.ppt" ,author = "R. Maclin and J. Shavlik and T. Walker and L. Torrey" ,title = "A Simple and Effective Method for Incorporating Advice into Kernel Methods" ,year = 2006 ,booktitle = "Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI'06)" ,address = "Boston, MA" ,abstract = "We propose a simple mechanism for incorporating advice (prior knowledge), in the form of simple rules, into support-vector methods for both classification and regression. Our approach is based on introducing inequality constraints associated with datapoints that match the advice. These constrained datapoints can be standard examples in the training set, but can also be unlabeled data in a semi-supervised, advice-taking approach. Our new approach is simpler to implement and more efficiently solved than the knowledge-based support vector classification methods of Fung, Mangasarian and Shavlik (2002; 2003) and the knowledge-based support vector regression method of Mangasarian, Shavlik, and Wild (2004), while performing approximately as well as these more complex approaches. Experiments using our new approach on a synthetic task and a reinforcementlearning problem within the RoboCup soccer simulator show that our advice-taking method can significantly outperform a method without advice and perform similarly to prior advice-taking, support-vector machines." } @inproceedings{davis.icml06 ,filename = "davis.icml06.pdf" ,author = "J. Davis and M. Goadrich" ,title = "The Relationship Between Precision-Recall and ROC Curves" ,year = 2006 ,booktitle = "Proceedings of the Twenty-Third International Conference on Machine Learning (ICML'06)" ,address = "Pittsburgh, PA" ,abstract = "Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance. We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. Finally, we also note differences in the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve." } @inproceedings{torrey.ecml05 ,filename = "torrey.ecml05.pdf torrey.ecml05.ppt" ,author = "L. Torrey and T. Walker and J. Shavlik and R. Maclin" ,title = "Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another" ,pages = "412-424" ,year = 2005 ,booktitle = "Proceedings of the Sixteenth European Conference on Machine Learning (ECML'05)" ,address = "Porto, Portugal" ,abstract = "We present a method for transferring knowledge learned in one task to a related task. Our problem solvers employ reinforcement learning to acquire a model for one task. We then transform that learned model into advice for a new task. A human teacher provides a mapping from the old task to the new task to guide this knowledge transfer. Advice is incorporated into our problem solver using a knowledge-based support vector regression method that we previously developed. This advice-taking approach allows the problem solver to refine or even discard the transferred knowledge based on its subsequent experiences. We empirically demonstrate the effectiveness of our approach with two games from the RoboCup soccer simulator: KeepAway and BreakAway. Our results demonstrate that a problem solver learning to play BreakAway using advice extracted from KeepAway outperforms a problem solver learning without the benefit of such advice." } @inproceedings{torrey.kcap05 ,filename = "torrey.kcap05.pdf" ,author = "L. Torrey and T. Walker and J. Shavlik and R. Maclin" ,title = "Knowledge Transfer Via Advice Taking" ,pages = "217-218" ,year = "2005" ,booktitle = "Proceedings of the Third International Conference on Knowledge Capture (KCAP'05)" ,address = "Banff, Canada" ,abstract = "We present a framework for knowledge transfer from one reinforcement learning task to a related task through advicetaking mechanisms. We discuss the importance of transfer in complex domains such as RoboCup soccer, and show how touse automatically generated advice to perform transfer." } @inproceedings{bravo.ilp05 ,filename = "bravo.ilp05.pdf" ,author = "H.C. Bravo and D. Page and R. Ramakrishnan and J. Shavlik and V.S. Costa" ,title = "A Framework for Set-Oriented Computation in Inductive Logic Programming and its Application in Generalizing Inverse Entailment" ,booktitle = "Proceedings of the Fifteenth International Conference on Inductive Logic Programming" ,pages = "69-86" ,year = "2005" ,address = "Bonn, Germany" ,abstract = "We propose a new approach to Inductive Logic Programming that systematically exploits caching and offers a number of advantages over current systems. It avoids redundant computation, is more amenable to the use of set-oriented generation and evaluation of hypotheses, and allows relational DBMS technology to be more easily applied to ILP systems. Further, our approach opens up new avenues such as probabilistically scoring rules during search and the generation of probabilistic rules. As a first example of the benefits of our ILP framework, we propose a scheme for defining the hypothesis search space through Inverse Entailment using multiple example seeds." } @inproceedings{maclin.aaai05 ,filename = "maclin.aaai05.pdf maclin.aaai05.ppt" ,author = "R. Maclin and J. Shavlik and L. Torrey and T. Walker and E. Wild" ,title = "Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression" ,booktitle = "Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI'05)" ,pages = "819-824" ,year = 2005 ,code = "http://www.biostat.wisc.edu/~ml-group/RoboCup/" ,address = "Pittsburgh, PA" ,abstract = "We present a novel formulation for providing advice to a reinforcement learner that employs support-vector regression as its function approximator. Our new method extends a recent advice-giving technique, called Knowledge-Based Kernel Regression (KBKR), that accepts advice concerning a single action of a reinforcement learner. In KBKR, users can say that in some set of states, an action's value should be greater than some linear expression of the current state. In our new technique, which we call Preference KBKR ( Pref-KBKR ) , the user can provide advice in a more natural manner by recommending that some action is preferred over another in the specified set of states. Specifying preferences essentially means that users are giving advice about policies rather than Q values, which is a more natural way for humans to present advice. We present the motivation for preference advice and a proof of the correctness of our extension to KBKR. In addition, we show empirical results that our method can make effective use of advice on a novel reinforcement-learning task, based on the RoboCup simulator, which we call Breakaway. Our work demonstrates the significant potential of advice-giving techniques for addressing complex reinforcement learning problems, while further demonstrating the use of support-vector regression for reinforcement learning." } @inproceedings{davis.ijcai05 ,filename = "davis.ijcai05.pdf" ,author = "J. Davis and E. Burnside and I. Dutra and D. Page and R. Ramakrishnan and V. Santos Costa and J. Shavlik" ,title = "View Learning for Statistical Relational Learning: With an Application to Mammography" ,booktitle = "Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence" ,pages = "677-683" ,year = "2005" ,address = "Edinburgh, Scotland" ,abstract = "Statistical relational learning (SRL) constructs probabilistic models from relational databases. A key capability of SRL is the learning of arcs (in the Bayes net sense) connecting entries in different rows of a relational table, or in different tables. Nevertheless, SRL approaches currently are constrained to use the existing database schema. For many database applications, users find it profitable to define alternative ''views'' of the database, in effect defining new fields or tables. Such new fields or tables can also be highly useful in learning. We provide SRL with the capability of learning new views." } @inproceedings{dimaio-nips05 ,filename = "dimaio.nips05.pdf dimaio.nips05.doc dimaio.nips05-poster.pdf" ,author = "F. DiMaio and J. Shavlik and G. Phillips" ,title = "Pictorial Structures for Molecular Modeling: Interpreting Density Maps" ,booktitle = "Advances in Neural Information Processing Systems (NIPS) 17" ,year = "2005" ,pages = "369-376" ,address = "Vancouver, Canada" ,abstract = "X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4�resolution, producing root-mean-squared errors ranging from 1.38 to 1.84�" } @inproceedings{goadrich-ilp04 ,filename = "goadrich.ilp04.pdf goadrich.ilp04.ppt" ,author = "M. Goadrich and L. Oliphant and J. Shavlik" ,title = "Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction" ,booktitle = "Proceedings of the Fourteenth International Conference on Inductive Logic Programming" ,year = "2004" ,pages = "98-115" ,address = "Porto, Portugal" ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/IE-protein-location/" ,abstract = "Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. Our research has focused on Information Extraction (IE), a task that typically involves many more negative examples than positive examples. IE is the process of finding facts in unstructured text, such as biomedical journals, and putting those facts in an organized system. In particular, we have focused on learning to recognize instances of the protein-localization relationship in Medline abstracts. We view the problem as a machine-learning task: given positive and negative extractions from a training corpus of abstracts, learn a logical theory that performs well on a held-aside testing set. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. We propose Gleaner, a randomized search method which collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an ''at least N of these M clauses'' thresholding method to combine the selected clauses. We compare Gleaner to ensembles of standard Aleph theories and find that Gleaner produces comparable testset results in a fraction of the training time needed for ensembles." } @inproceedings{dimaio-ilp04 ,filename = "dimaio.ilp04.pdf dimaio.ilp04.ppt dimaio.ilp04.doc" ,author = "F. DiMaio and J. Shavlik" ,title = "Learning an Approximation to Inductive Logic Programming Clause Evaluation" ,booktitle = "Proceedings of the Fourteenth International Conference on Inductive Logic Programming" ,year = "2004" ,pages = "80-97" ,address = "Porto, Portugal" ,abstract = "One challenge faced by many Inductive Logic Programming (ILP) systems is poor scalability to problems with large search spaces and many examples. Randomized search methods such as stochastic clause selection (SCS) and rapid random restarts (RRR) have proven somewhat successful at addressing this weakness. However, on datasets where hypothesis evaluation is computationally expensive, even these algorithms may take unreasonably long to discover a good solution. We attempt to improve the performance of these algorithms on datasets by learning an approximation to ILP hypothesis evaluation. We generate a small set of hypotheses, uniformly sampled from the space of candidate hypotheses, and evaluate this set on actual data. These hypotheses and their corresponding evaluation scores serve as training data for learning an approximate hypothesis evaluator. We outline three techniques that make use of the trained evaluation-function approximator in order to reduce the computation required during an ILP hypothesis search. We test our approximate clause evaluation algorithm using the popular ILP system Aleph. Empirical results are provided on several benchmark datasets. We show that the clause evaluation function can be accurately approximated." } @inproceedings{shavlik-kdd04 ,filename = "shavlik-kdd04.pdf shavlik-kdd04.doc skavlik-kdd04.ppt" ,author = "J. Shavlik and M. Shavlik" ,title = "Selection, Combination, and Evaluation of Effective Software Sensors for Detecting Abnormal Computer Usage" ,booktitle = "Proceedings of the Tenth International Conference on Knowledge Discovery and Data Mining" ,year = "2004" ,pages = "276-285" ,address = "Seattle, WA" ,abstract = "We present and empirically analyze a machine-learning approach for detecting intrusions on individual computers. Our Winnow-based algorithm continually monitors user and system behavior, recording such properties as the number of bytes transferred over the last 10 seconds, the programs that currently are running, and the load on the CPU. In all, hundreds of measurements are made and analyzed each second. Using this data, our algorithm creates a model that represents each particular computer's range of normal behavior. Parameters that determine when an alarm should be raised, due to abnormal activity, are set on a per-computer basis, based on an analysis of training data. A major issue in intrusion-detection systems is the need for very low false-alarm rates. Our empirical results suggest that it is possible to obtain high intrusion-detection rates (95%) and low false-alarm rates (less than one per day per computer), without ''stealing'' too many CPU cycles (less than 1%). We also report which system measurements are the most valuable in terms of detecting intrusions. A surprisingly large number of different measurements prove significantly useful." } @inproceedings{molla.csb04 ,filename = "molla.csb04.pdf molla.csb04.doc molla.csb04.ppt" ,author = "M. Molla and J. Shavlik and T. Albert and T. Richmond and S. Smith" ,title = "A Self-Tuning Method for One-Chip SNP Identification" ,booktitle = "Proceedings of the IEEE Conference on Computational Systems Bioinformatics" ,year = "2004" ,pages = "69-79" ,address = "Stanford, CA" ,abstract = "Current methods for interpreting oligonucleotide-based SNP-detection microarrays, SNP chips, are based on statistics and require extensive parameter tuning as well as extremely high-resolution images of the chip being processed. We present a method, based on a simple data-classification technique called nearest-neighbors that, on haploid organisms, produces results comparable to the published results of the leading statistical methods and requires very little in the way of parameter tuning. Furthermore, it can interpret SNP chips using lower-resolution scanners of the type more typically used in current microarray experiments. Along with our algorithm, we present the results of a SNP-detection experiment where, when independently applying this algorithm to six identical SARS SNP chips, we correctly identify all 24 SNPs in a particular strain of the SARS virus, with between 6 and 13 false positives across the six experiments." } @inproceedings{baiao.ilp03 ,filename = "baiao.ilp03.pdf baiao.ilp03.ps baiao.ilp03.doc" ,author = "F. Baiao and M. Mattoso and J. Shavlik and G. Zaverucha" ,title = "Applying Theory Revision to the Design of Distributed Databases" ,booktitle = "Proceedings of the Thirteenth International Conference on Inductive Logic Programming" ,pages = "57-74" ,year = "2003" ,address = "Szeged, Hungary" ,abstract = "This work presents the application of theory revision to the design of distributed databases to automatically revise a heuristic-based algorithm (called analysis algorithm) through the use of the FORTE system. The analysis algorithm decides the fragmentation technique to be used in each class of the database and its Prolog implementation is provided as the initial domain theory. Fragmentation schemas with previously known performance, obtained from experimental results on top of an object database benchmark, are provided as the set of examples. We show the effectiveness of our approach in finding better fragmentation schemas with improved performance." } @inproceedings{fung.colt03 ,filename = "fung.colt03.pdf fung.colt03.ps" ,author = "G. Fung and O. Mangasarian and J. Shavlik" ,title = "Knowledge-Based Nonlinear Kernel Classifiers" ,booktitle = "16th Annual Conference on Learning Theory (COLT) and 7th Annual Workshop on Kernel Machines, Proceedings" ,year = "2003" ,address = "Washington, DC" ,abstract = "Prior knowledge in the form of multiple polyhedral sets, each belonging to one of two categories, is introduced into a reformulation of a nonlinear kernel support vector machine (SVM) classifier. The resulting formulation leads to a linear program that can be solved efficiently. This extends, in a rather unobvious fashion, previous work that incorporated similar prior knowledge into a linear SVM classifier. Numerical tests on standard-type test problems, such as exclusive-or prior knowledge sets and a checkerboard with 16 points and prior knowledge instead of the usual 1000 points, show the effectiveness of the proposed approach in generating sharp nonlinear classifiers based mostly or totally on prior knowledge." } @inproceedings{dutra.euro03 ,filename = "dutra.euro03.pdf dutra.euro03.ps" ,author = "I. Dutra and D. Page and V. Santos Costa and J. Shavlik and M. Waddell" ,title = "Toward Automatic Management of Embarrassingly Parallel Applications" ,booktitle = "Proceedings of International Conference on Parallel and Distributed Computing (Euro-Par)" ,year = "2003" ,address = "Klagenfurt, Austria" ,abstract = "Large-scale applications that require executing very large numbers of tasks are only feasible through parallelism. In this work we present a system that automatically handles large numbers of experiments and data in the context of machine learning. Our system controls all experiments, including re-submission of failed jobs and relies on available resource managers to spawn jobs through pools of machines. Our results show that we can manage a very large number of experiments, using a reasonable amount of idle CPU cycles, with very little user intervention." } @inproceedings{fung.nips02 ,filename = "fung.nips02.pdf fung.nips02.ps" ,author = "G. Fung and O. Mangasarian and J. Shavlik" ,title = "Knowledge-Based Support Vector Machine Classifiers" ,booktitle = "Proceedings of Sixteenth Conference on Neural Information Processing Systems (NIPS)" ,year = "2002" ,address = "Vancouver, Canada" ,abstract = "Prior knowledge in the form of multiple polyhedral sets, each belonging to one of two categories, is introduced into a reformulation of a linear support vector machine classifier. The resulting formulation leads to a linear program that can be solved efficiently. Real world examples, from DNA sequencing and breast cancer prognosis, demonstrate the effectiveness of the proposed method. Numerical results show improvement in the test set accuracy after the incorporation of prior knowledge into ordinary, data-based linear support vector machine classifiers. One experiment also shows that a linear classifier, based solely on prior knowledge, far outperforms the direct application of prior knowledge rules to classify data." } @inproceedings{dutra.ilp02 ,filename = "dutra.ilp02.pdf dutra.ilp02.ps" ,author = "I. Dutra and D. Page and V. Santos Costa and J. Shavlik" ,title = "An Empirical Evaluation of Bagging in Inductive Logic Programming" ,booktitle = "Proceedings of the Twelfth International Conference on Inductive Logic Programming" ,year = "2002" ,pages = "48--65" ,address = "Sydney, Australia" ,abstract = "Ensembles have proven useful for a variety of applications, with a variety of machine learning approaches. While Quinlan has applied boosting to FOIL, the widely-used approach of bagging has never been employed in ILP. Bagging has the advantage over boosting that the different members of the ensemble can be learned and used in parallel. This advantage is especially important for ILP where run-times often are high. We evaluate bagging on three different application domains using the complete-search ILP system, Aleph. We contrast bagging with an approach where we take advantage of the non-determinism in ILP search, by simply allowing Aleph to run multiple times, each time choosing ''seed'' examples at random." } @inproceedings{shavlik.raid01 ,filename = "shavlik.raid01.pdf" ,author = "J. Shavlik and M. Shavlik and M. Fahland" ,title = "Evaluating Software Sensors for Actively Profiling Windows 2000 Computer Users" ,booktitle = "Presented at the Fourth International Symposium on Recent Advances in Intrusion Detection" ,year = "2001" ,address = "Davis, CA" ,abstract = "We report on a new, on-going intrusion-detection project that empirically investigates the usefulness of ''stealing'' a small amount of CPU cycles (1%), main memory (16MB), and disk memory (100 MB) in order to continually gather and analyze dozens of fine-grained system measurements, such as network traffic, identity of the current programs executing, and the user's typing speed. The underlying scientific hypothesis is that a properly chosen set of measurements can provide a ''fingerprint'' that is unique to each user. Hence, such measurements could serve to help distinguish appropriate use of a given computer from misuse, especially by insiders. " } @inproceedings{eliassi-rad.icml01 ,filename = "eliassi-rad.icml01.pdf eliassi-rad.icml01.ps" ,author = "T. Eliassi-Rad and J. Shavlik" ,title = "A Theory-Refinement Approach to Information Extraction" ,booktitle = "Proceedings of the Eighteenth International Conference on Machine Learning" ,year = "2001" ,address = "Williamstown, MA" ,abstract = "We investigate applying theory refinement to the task of extracting information from text. In theory refinement, partial domain knowledge (which may be incorrect) is given to a supervised learner. The provided knowledge guides the learner in its task, but the learner can refine or even discard this knowledge during training. Our supervised learner is a knowledge-based neural network that initially contains compiled prior knowledge about a particular information extraction (IE) task. The prior knowledge needs to specify the extraction slots for the specific IE task. Our approach uses generate-and-test to address the IE task. In the generation step, we produce candidate extractions by intelligently searching the space of possible extractions. In the test step, we use the trained network to judge each candidate and output those that exceed a system-selected threshold. Experiments on the CMU seminar-announcements and the Yeast subcellular-localization domains demonstrate our approach's value." } @inproceedings{craven.ismb2000 ,filename = "craven-ismb00.pdf craven-ismb00.ps" ,author = "M. Craven and D. Page and J. Shavlik and J. Bockhorst and J. Glasner" ,title = "Probabilistic Learning Approach to Whole-Genome Operon Prediction" ,booktitle = "Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology" ,year = 2000 ,address = "La Jolla, CA" ,abstract = "We present a computational approach to predicting operons in the genomes of prokaryotic organisms. Our approach uses machine learning methods to induce predictive models for this task from a rich variety of data types including sequence data, gene expression data, and functional annotations associated with genes. We use multiple learned models that individually predict promoters, terminators and operons themselves. A key part of our approach is a dynamic programming method that uses our predictions to map every known and putative gene in a given genome into its most probable operon. We evaluate our approach using data from the E. coli K-12 genome." } @inproceedings{craven.icml2000 ,filename = "craven-icml00.pdf craven-icml00.ps" ,author = "M. Craven and D. Page and J. Shavlik and J. Bockhorst and J. Glasner" ,title = "Using Multiple Levels of Learning and Diverse Evidence Sources to Uncover Coordinately Controlled Genes" ,booktitle = "Proceedings of the Seventeenth International Conference on Machine Learning" ,year = 2000 ,address = "Palo Alto, CA" ,abstract = "Now that the complete genomes of numerous organisms have been determined, a key problem in computational molecular biology is uncovering the relationships that exist among the genes in each organism and the regulatory mechanisms that control their operation. We are developing computational methods for discovering such regulatory mechanisms and relationships. Toward this end, we have developed a machine learning approach to identifying sets of genes that are coordinately controlled in the E. coli genome. A number of factors make this an interesting application for machine learning: (i) there is a rich variety of data types that provide useful evidence for this task, (ii) the overall problem of uncovering regulatory mechanisms can be decomposed in multiple machine learning subtasks operating at different levels of detail, (iii) there are not any known negative training examples, and (iv) some of the features are misleading in their predictiveness." } @inproceedings{goecks.iui2000 ,filename = "goecks-iui2000.doc goecks-iui2000.html" ,author = "J. Goecks and J. Shavlik" ,title = "Learning Users' Interests by Unobtrusively Observing Their Normal Behavior" ,booktitle = "Proceedings of the 2000 International Conference on Intelligent User Interfaces" ,year = 2000 ,address = "New Orleans, LA" ,pages = "129-132" ,abstract = "For intelligent interfaces attempting to learn a user's interests, the cost of obtaining labeled training instances is prohibitive because the user must directly label each training instance, and few users are willing to do so. We present an approach that circumvents the need for human-labeled pages. Instead, we learn ``surrogate'' tasks where the desired output is easily measured, such as the number of hyperlinks clicked on a page or the amount of scrolling performed. Our assumption is that these outputs will highly correlate with the user's interests. In other words, by unobtrusively ``observing'' the user's behavior we are able to learn functions of value. For example, an intelligent browser could silently observe the user's browsing behavior during the day, then use these training examples to learn such functions and gather, during the middle of the night, pages that are likely to be of interest to the user. Previous work has focused on learning a user profile by passively observing the hyperlinks clicked on and those passed over. We extend this approach by measuring user mouse and scrolling activity in addition to user browsing activity. We present empirical results that demonstrate our agent can accurately predict some easily measured aspects of one's use of his or her browser." } @inproceedings{shavlik.iui99 ,filename = "shavlik.iui99.pdf shavlik.iui99.ps" ,author = "J. Shavlik and S. Calcari and T. Eliassi-Rad and J. Solock" ,title = "An Instructable, Adaptive Interface for Discovering and Monitoring Information on the World-Wide Web" ,booktitle = "Proceedings of the 1999 International Conference on Intelligent User Interfaces" ,year = 1999 ,address = "Redondo Beach, CA" ,pages = "157 - 160" ,abstract = "We are creating a customizable, intelligent interface to the World-Wide Web that assists a user in locating specific, current, and relevant information. The Wisconsin Adaptive Web Assistant (WAWA) is capable of accepting instructions regarding what type of information the users are seeking and how to go about looking for it. WAWA compiles these instructions into neural networks, which means that the system's behavior can be modified via training examples. Users can create these training examples by rating pages retrieved by WAWA, but more importantly the system uses techniques from reinforcement learning to internally create its own examples (users can also later provide additional instructions). WAWA uses these neural networks to guide its autonomous navigation of the Web, thereby producing an interface to the Web that users periodically instruct and which in the background searches the Web for relevant information, including periodically revisiting pages that change regularly." } @inproceedings{allex.ismb97 ,filename = "allex.ismb97.pdf allex.ismb97.ps" ,author = "C. Allex and S. Baldwin and J. Shavlik and F. Blattner" ,title = "Increasing Consensus Accuracy in DNA Fragment Assemblies by Incorporating Fluorescent Trace Representations" ,booktitle = "Proceedings, Fifth International Conference on Intelligent Systems for Molecular Biology" ,year = 1997 ,pages = "" ,publisher = "AAAI Press" ,pub_addr = "Menlo Park, CA" ,address = "Halkidiki, Greece" ,abstract = "We present a new method for determining the consensus sequence in DNA fragment assemblies. The new method, Trace-Evidence, directly incorporates aligned ABI trace information into consensus calculations via our previously described representation, Trace-Data Classifications. The new method extracts and sums evidence indicated by the representation to determine consensus calls. Using the Trace-Evidence method results in automatically produced consensus sequences that are more accurate and less ambiguous than those produced with standard majority-voting methods. Additionally, these improvements are achieved with less coverage than required by the standard methods �using Trace-Evidence and a coverage of only three, error rates are as low as those with a coverage of over ten sequences." } @inproceedings{allex.ismb96 ,filename = "allex.ismb96.pdf allex.ismb96.ps" ,author = "C. Allex and S. Baldwin and J. Shavlik and F. Blattner" ,title = "Improving the Quality of Automatic DNA Sequence Assembly using Fluorescent Trace-Data Classifications" ,booktitle = "Proceedings, Fourth International Conference on Intelligent Systems for Molecular Biology" ,year = 1996 ,pages = "3-14" ,publisher = "AAAI Press" ,pub_addr = "Menlo Park, CA" ,address = "St. Louis, MO" ,abstract = "Virtually all large-scale sequencing projects use automatic sequence-assembly programs to aid in the determination of DNA sequences. The computer-generated assemblies require substantial hand-editing to transform them into submissions for GenBank. As the size of sequencing projects increases, it becomes essential to improve the quality of the automated assemblies so that this time-consuming hand-editing may be reduced. Current ABI sequencing technology uses base calls made from fluorescently-labeled DNA fragments run on gels. We present a new representation for the fluorescent trace data associated with individual base calls. This representation can be used before, during, and after fragment assembly to improve the quality of assemblies. We demonstrate one such use -- end-trimming of sub-optimal data -- that results in a significant improvement in the quality of subsequent assemblies." } @inproceedings{cherkauer.kdd96 ,filename = "cherkauer.kdd96.pdf cherkauer.kdd96.ps" ,author = "K. Cherkauer and J. Shavlik" ,title = "Growing Simpler Decision Trees to Facilitate Knowledge Discovery" ,booktitle = "Proceedings, Second International Conference on Knowledge Discovery and Data Mining" ,year = 1996 ,pages = "315-318" ,publisher = "AAAI Press" ,pub_addr = "Menlo Park, CA" ,address = "Portland, OR" ,abstract = "When using machine learning techniques for knowledge discovery, output that is comprehensible to a human is as important as predictive accuracy. We introduce a new algorithm, SET-Gen, that improves the comprehensibility of decision trees grown by standard C4.5 without reducing accuracy. It does this by using genetic search to select the set of input features C4.5 is allowed to use to build its tree. We test SET-Gen on a wide variety of real-world datasets and show that SET-Gen trees are significantly smaller and reference significantly fewer features than trees grown by C4.5 without using SET-Gen. Statistical significance tests show that the accuracies of SET-Gen's trees are either not distinguishable from or are more accurate than those of the original C4.5 trees on all ten datasets tested." } @inproceedings{opitz.nips96 ,author = "D. Opitz and J. Shavlik" ,title = "Generating Accurate and Diverse Members of a Neural-Network Ensemble" ,booktitle = "Advances in Neural Information Processing Systems" ,volume = 8 ,year = 1996 ,publisher = "MIT Press" ,address = "Denver, CO" ,pages = "535-543" ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/promoters-large/" ,filename = "opitz.nips96.pdf opitz.nips96.ps" ,abstract= "Neural-network ensembles have been shown to be very accurate classification techniques. Previous work has shown that an effective ensemble should consist of networks that are not only highly correct, but ones that make their errors on different parts of the input space as well. Most existing techniques, however, only indirectly address the problem of creating such a set of networks. In this paper we present a technique called ADDEMUP that uses genetic algorithms to directly search for an accurate and diverse set of trained networks. ADDEMUP works by first creating an initial population, then uses genetic operators to continually create new networks, keeping the set of networks that are as accurate as possible while disagreeing with each other as much as possible. Experiments on three DNA problems show that ADDEMUP is able to generate a set of trained networks that is more accurate than several existing approaches. Experiments also show that ADDEMUP is able to effectively incorporate prior knowledge, if available, to improve the quality of its ensemble." } Note: see also opitz.mlrgwp95.ps @inproceedings{cherkauer.nips96 ,author = "K. Cherkauer and J. Shavlik" ,title = "Rapid Quality Estimation of Neural Network Input Representations" ,booktitle = "Advances in Neural Information Processing Systems" ,volume = 8 ,year = 1996 ,pages = "45-51" ,publisher = "MIT Press" ,address = "Denver, CO" ,filename = "cherkauer.nips96.pdf cherkauer.nips96.ps" ,abstract= "The choice of an input representation for a neural network can have a profound impact on its accuracy in classifying novel instances. However, neural networks are typically computationally expensive to train, making it difficult to test large numbers of alternative representations. This paper introduces fast quality measures for neural network representations, allowing one to quickly and accurately estimate which of a collection of possible representations for a problem is the best. We show that our measures for ranking representations are more accurate than a previously published measure, based on experiments with three difficult, real-world pattern recognition problems." } Note: see also cherkauer.ijcai-wkshp95.ps @inproceedings{craven.nips96 ,author = "M. Craven and J. Shavlik" ,title = "Extracting Tree-Structured Representations of Trained Networks" ,booktitle = "Advances in Neural Information Processing Systems" ,volume = 8 ,year = 1996 ,pages = "24-30" ,address = "Denver, CO" ,publisher = "MIT Press" ,filename = "craven.nips96.pdf craven.nips96.ps" ,code = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/trepan" ,abstract= "A significant limitation of neural networks is that the representations they learn are usually incomprehensible to humans. We present a novel algorithm, TREPAN, for extracting comprehensible, symbolic representations from trained neural networks. Our algorithm uses queries to induce a decision tree that approximates the concept represented by a given network. Our experiments demonstrate that TREPAN is able to produce decision trees that maintain a high level of fidelity to their respective networks, while being comprehensible and accurate. Unlike previous work in this area, our algorithm is both general in its applicability and scales well to large networks and problems with high-dimensional input spaces." } @inproceedings{jackson.nips96 ,author = "J. Jackson and M. Craven" ,title = "Learning Sparse Perceptrons" ,booktitle = "Advances in Neural Information Processing Systems" ,volume = 8 ,year = 1996 ,pages = "654-662" ,address = "Denver, CO" ,publisher = "MIT Press" ,filename = "jackson.nips96.pdf jackson.nips96.ps" ,abstract= "We introduce a new algorithm designed to learn sparse perceptrons over input representations which include high-order features. Our algorithm, which is based on a hypothesis-boosting method, is able to PAC-learn a relatively natural class of target concepts. Moreover, the algorithm appears to work well in practice: on a set of three problem domains, the algorithm produces classifiers that utilize small numbers of features yet exhibit good generalization performance. Perhaps most importantly, our algorithm generates concept descriptions that are easy for humans to understand." } @inproceedings{shavlik.icnn96 ,author = "J. Shavlik" ,title = "An Overview of Research at Wisconsin on Knowledge-Based Neural Networks" ,booktitle = "Proceedings of the International Conference on Neural Networks" ,year = 1996 ,address = "Washington, DC" ,filename = "shavlik.icnn96.pdf shavlik.icnn96.ps" ,pages = "65-69" ,abstract= "Recent research at the University of Wisconsin on knowledge-based neural networks is surveyed. This work has focused on (a) using symbolically represented background knowledge to improve neural-network learning and (b) extracting comprehensible symbolic representations from trained networks. Important open issues are discussed." } @inproceedings{maclin.ijcai95 ,author = "R. Maclin and J. Shavlik" ,title = "Combining the Predictions of Multiple Classifiers: Using Competitive Learning to Initialize Neural Networks" ,booktitle = "Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence" ,year = 1995 ,address = "Montreal, Canada" ,filename = "maclin.ijcai95.pdf maclin.ijcai95.ps" ,pages = "524-530" ,abstract= "The primary goal of inductive learning is to generalize well - that is, induce a function that accurately produces the correct output for future inputs. Hansen and Salamon showed that, under certain assumptions, combining the predictions of several separately trained neural networks will improve generalization. One of their key assumptions is that the individual networks should be independent in the errors they produce. In the standard way of performing backpropagation this assumption may be violated, because the standard procedure is to initialize network weights in the region of weight space near the origin. This means that backpropagation's gradient-descent search may only reach a small subset of the possible local minima. In this paper we present an approach to initializing neural networks that uses competitive learning to intelligently create networks that are originally located far from the origin of weight space, thereby potentially increasing the set of reachable local minima. We report experiments on two real-world datasets where combinations of networks initialized with our method generalize better than combinations of networks initialized the traditional way." } @inproceedings{craven.ismb95 ,author = "M. Craven and R. Mural and L. Hauser and E. Uberbacher" ,title = "Predicting Protein Folding Classes without Overly Relying on Homology" ,booktitle = "Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology" ,year = 1995 ,address = "Cambridge, England" ,publisher = "AAAI Press" ,pages = "98-106" ,filename = "craven.ismb95.pdf craven.ismb95.ps" } @inproceedings{opitz.mlc94 ,author = "D. Opitz and J. Shavlik" ,title = "Using Genetic Search to Refine Knowledge-Based Neural Networks" ,booktitle = "Machine Learning: Proceedings of the Eleventh International Conference" ,year = 1994 ,address = "New Brunswick, NJ" ,publisher = "Morgan Kaufmann" ,pages = "208-216" ,filename = "opitz.mlc94.pdf opitz.mlc94.ps" ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/" ,abstract= "An ideal inductive-learning algorithm should exploit all available resources, such as computing power and domain-specific knowledge, to improve its ability to generalize. Connectionist theory-refinement systems have proven to be effective at utilizing domain-specific knowledge; however, most are unable to exploit available computing power. This weakness occurs because they lack the ability to refine the topology of the networks they produce, thereby limiting generalization, especially when given impoverished domain theories. We present the REGENT algorithm, which uses genetic algorithms to broaden the type of networks seen during its search. It does this by using (a) the domain theory to help create an initial population and (b) crossover and mutation operators specifically designed for knowledge-based networks. Experiments on three real-world domains indicate that our new algorithm is able to significantly increase generalization compared to a standard connectionist theory-refinement system, as well as our previous algorithm for growing knowledge-based networks." } @inproceedings{craven.mlc94 ,author = "M. Craven and J. Shavlik" ,title = "Using Sampling and Queries to Extract Rules from Trained Neural Networks" ,booktitle = "Proceedings of the Eleventh International Conference on Machine Learning" ,year = 1994 ,pages = "37-45" ,address = "New Brunswick, NJ" ,filename = "craven.mlc94.pdf craven.mlc94.ps" ,abstract= "Concepts learned by neural networks are difficult to understand because they are represented using large assemblages of real-valued parameters. One approach to understanding trained neural networks is to extract symbolic rules that describe their classification behavior. There are several existing rule-extraction approaches that operate by searching for such rules. We present a novel method that casts rule extraction not as a search problem, but instead as a learning problem. In addition to learning from training examples, our method exploits the property that networks can be efficiently queried. We describe algorithms for extracting both conjunctive and M-of-N rules, and present experiments that show that our method is more efficient than conventional search-based approaches." } @inproceedings{maclin.aaai94 ,author = "R. Maclin and J. Shavlik" ,title = "Incorporating Advice into Agents that Learn from Reinforcements" ,booktitle = "Proceedings of the Twelfth National Conference on Artificial Intelligence" ,year = 1994 ,address = "Seattle, WA" ,pages = "694-699" ,filename = "maclin.aaai94.pdf maclin.aaai94.ps" ,abstract= "Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training episodes. We present an approach that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer. In our approach, the advice-giver watches the learner and occasionally makes suggestions, expressed as instructions in a simple programming language. Based on techniques from knowledge-based neural networks, these programs are inserted directly into the agent's utility function. Subsequent reinforcement learning further integrates and refines the advice. We present empirical evidence that shows our approach leads to statistically-significant gains in expected reward. Importantly, the advice improves the expected reward regardless of the stage of training at which it is given." } @inproceedings{opitz.ijcai93 ,author = "D. Opitz and J. Shavlik" ,title = "Heuristically Expanding Knowledge-Based Neural Networks" ,booktitle = "Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence" ,year = 1993 ,month = "September" ,address = "Chambery, France" ,pages = "1360-1365" ,filename = "opitz.ijcai93.pdf opitz.ijcai93.ps" ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/" ,abstract= "Knowledge-based neural networks are networks whose topology is determined by mapping the dependencies of a domain-specific rulebase into a neural network. However, existing network training methods lack the ability to add new rules to the (reformulated) rulebases. Thus, on domain theories that are lacking rules, generalization is poor, and training can corrupt the original rules, even those that were initially correct. We present TOPGEN, an extension to the KBANN algorithm, that heuristically searches for possible expansions of a knowledge-based neural network, guided by the domain theory, the network, and the training data. It does this by dynamically adding hidden nodes to the neural representation of the domain theory, in a manner analogous to adding rules and conjuncts to the symbolic rulebase. Experiments indicate that our method is able to heuristically find effective places to add nodes to the knowledge-base network and verify that new nodes must be added in an intelligent manner." } @inproceedings{craven.ijcai93 ,author = "M. Craven and J. Shavlik" ,title = "Learning to Represent Codons: A Challenge Problem for Constructive Induction" ,booktitle = "Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence" ,year = 1993 ,month = "September" ,pages = "1319-1324" ,address = "Chambery, France" ,filename = "craven.ijcai93.pdf craven.ijcai93.ps" ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/coding/" ,abstract= "The ability of an inductive learning system to find a good solution to a given problem is dependent upon the representation used for the features of the problem. Systems that perform constructive induction are able to change their representation by constructing new features. We describe an important, real-world problem - finding genes in DNA - that we believe offers an interesting challenge to constructive-induction researchers. We report experiments that demonstrate that: (1) two different input representations for this task result in significantly different generalization performance for both neural networks and decision trees; and (2) both neural and symbolic methods for constructive induction fail to bridge the gap between these two representations. We believe that this real-world domain provides an interesting challenge problem for constructive induction because the relationship between the two representations is well known, and because the representational shift involved in constructing the better representation is not imposing." } @inproceedings{cherkauer.ismb93, author = "K. Cherkauer and J. Shavlik", title = "Protein Structure Prediction: Selecting Salient Features From Large Candidate Pools", booktitle = "Proceedings of the First International Conference on Intelligent Systems for Molecular Biology", year = 1993, address = "Bethesda, MD", pages = "74-82", publisher = "AAAI Press", filename = "cherkauer.ismb93.pdf cherkauer.ismb93.ps" ,abstract= "We introduce a parallel approach, ``DT-Select,'' for selecting features used by inductive learning algorithms to predict protein secondary structure. DT-Select is able to rapidly choose small, nonredundant feature sets from pools containing hundreds of thousands of potentially useful features. It does this by building a decision tree, using features from the pool, that classifies a set of training examples. The features included in the tree provide a compact description of the training data and are thus suitable for use as inputs to other inductive learning algorithms. Empirical experiments in the protein secondary-structure task, in which sets of complex features chosen by DT-Select are used to augment a standard artificial neural network representation, yield surprisingly little performance gain, even though features are selected from very large feature pools. We discuss some possible reasons for this result." } @inproceedings{craven.mlc93, author = "M. Craven and J. Shavlik", title = "Learning Symbolic Rules Using Artificial Neural Networks", booktitle = "Proceedings of the Tenth International Conference on Machine Learning", year = 1993, address = "Amherst, MA", pages = "73-80", publisher = "Morgan Kaufmann", filename = "craven.mlc93.pdf craven.mlc93.ps" ,abstract= "A distinct advantage of symbolic learning algorithms over artificial neural networks is that typically the concept representations they form are more easily understood by humans. One approach to understanding the representations formed by neural networks is to extract symbolic rules from trained networks. In this paper we describe and investigate an approach for extracting rules from networks that uses (1) the NofM extraction algorithm, and (2) the network training method of soft weight-sharing. Previously, the NofM algorithm had been successfully applied only to knowledge-based neural networks. Our experiments demonstrate that our extracted rules generalize better than rules learned using the C4.5 system. In addition to being accurate, our extracted rules are also reasonably comprehensible." } @inproceedings{craven.hicss93, author = "M. Craven and J. Shavlik", title = "Learning to Predict Reading Frames in {{E}. coli} {DNA} Sequences", booktitle = "Proceedings of the 26th Hawaii International Conference on System Sciences", year = 1993, address = "Wailea, HI", publisher = "IEEE Computer Society Press", pages = "773-782", data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/coding-HICSS/", filename = "craven.hicss93.pdf craven.hicss93.ps" ,abstract= "Two fundamental problems in analyzing DNA sequences are (1) locating the regions of a DNA sequence that encode proteins, and (2) determining the reading frame for each region. We investigate using artificial neural networks (ANNs) to find coding regions, determine reading frames, and detect frameshift errors in E. coli DNA sequences. We describe our adaptation of the approach used by Uberbacher and Mural to identify coding regions in human DNA, and we compare the performance of ANNs to several conventional methods for predicting reading frames. Our experiments demonstrate that ANNs can outperform these conventional approaches." } @inproceedings{towell.aaai92, author = "G. Towell and J. Shavlik", title = "Using Symbolic Learning to Improve Knowledge-Based Neural Networks", booktitle = "Proceedings of the Tenth National Conference on Artificial Intelligence", pages = "177-182", year = "1992", address = "San Jose, CA", filename = "towell.aaai92.pdf towell.aaai92.ps" ,abstract= "The previously-reported KBANN system integrates existing knowledge into neural networks by defining the network topology and setting initial link weights. Standard neural learning techniques can then be used to train such networks, thereby refining the information upon which the network is based. However, standard neural learning techniques are reputed to have difficulty training networks with multiple layers of hidden units; KBANN commonly creates such networks. In addition, standard neural learning techniques ignore some of the information contained in the networks created by KBANN. This paper describes a symbolic inductive learning algorithm for training such networks that uses this previously-ignored information and which helps to address the problems of training ``deep'' networks. Empirical evidence shows that this method improves not only learning speed, but also the ability of networks to generalize correctly to testing examples." } @inproceedings{maclin.aaai92, author = "R. Maclin and J. Shavlik", title = "Using Knowledge-Based Neural Networks to Improve Algorithms: Refining the Chou-Fasman Algorithm for Protein Folding", booktitle = "Proceedings of the Tenth National Conference on Artificial Intelligence", pages = "165-170", year = "1992", address = "San Jose, CA", filename = "maclin.aaai92.pdf maclin.aaai92.ps" ,abstract= "We describe a method for using machine learning to refine algorithms represented as generalized finite-state automata. The knowledge in an automaton is translated into an artificial neural network, and then refined with backpropagation on a set of examples. Our technique for translating an automaton into a network extends KBANN, a system that translates a set of propositional rules into a corresponding neural network. The extended system, FSKBANN, allows one to refine the large class of algorithms that can be represented as state-based processes. As a test, we use FSKBANN to refine the Chou-Fasman algorithm, a method for predicting how globular proteins fold. Empirical evidence shows the refined algorithm FSKBANN produces is statistically significantly more accurate than both the original Chou-Fasman algorithm and a neural network trained using the standard approach." } @inproceedings{scott.nips4, author = "G. Scott and J. Shavlik and W. Ray", title = "Refining {PID} Controllers using Neural Networks", booktitle = "Advances in Neural Information Processing Systems", year = "1992", volume = "4", pages = "555-562", editor = "J. Moody and S. Hanson and R. Lippmann", address = "Denver, CO", publisher = "Morgan Kaufmann", filename = "scott.nips4.pdf scott.nips4.ps" ,abstract= "The KBANN (Knowledge-Based Artificial Neural Networks) approach uses neural networks to refine knowledge that can be written in the form of simple propositional rules. We extend this idea further by presenting the MANNCON (Multivariable Artificial Neural Network Control) algorithm by which the mathematical equations governing a PID (Proportional-Integral-Derivative) controller determine the topology and initial weights of a network, which is further trained using backpropagation. We apply this method to the task of controlling the outflow and temperature of a water tank, producing statistically-significant gains in accuracy over both a standard neural network approach and a non-learning PID controller. Furthermore, using the PID knowledge to initialize the weights of the network produces statistically less variation in testset accuracy when compared to networks initialized with small random numbers." } @inproceedings{towell.nips4, author = "G. Towell and J. Shavlik", title = "Interpretation of Artificial Neural Networks: Mapping knowledge-based Neural Networks into Rules", booktitle = "Advances in Neural Information Processing Systems", year = 1992, volume = 4, pages = "977-984", editor = "J. Moody and S. Hanson and R. Lippmann", address = "Denver, CO", publisher = "Morgan Kaufmann", data = "http://ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology/splice-junction-gene-sequences/", filename = "towell.nips4.pdf towell.nips4.ps" ,abstract= "We propose and empirically evaluate a method for the extraction of expert-comprehensible rules from trained neural networks. Our method operates in the context of a three-step process for learning that uses rule-based domain knowledge in combination with neural networks. Empirical tests using real-worlds problems from molecular biology show that the rules our method extracts from trained neural networks: closely reproduce the accuracy of the network from which they came, are superior to the rules derived by a learning system that directly refines symbolic rules, and are expert-comprehensible." } Note: see also towell.mlj93 @inproceedings{noordewier.nips3, author = "M. Noordewier and G. Towell and J. Shavlik", title = "Training Knowledge-Based Neural Networks to Recognize Genes in {DNA} Sequences", booktitle = "Advances in Neural Information Processing Systems", year = 1991, pages = "530-536", volume = 3, editor = "R. Lippmann and J. Moody and D. Touretzky", publisher = "Morgan Kaufmann", address = "Denver, CO", data = "http://ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology/splice-junction-gene-sequences/" ,filename = "noordewier.nips3.pdf noordewier.nips3.ps" ,abstract= "We describe the application of a hybrid symbolic/connectionist machine learning algorithm to the task of recognizing important genetic sequences. The symbolic portion of the KBANN system utilizes inference rules that provide a roughly-correct method for recognizing a class of DNA sequences known as eukaryotic splice-junctions. We then map this ``domain theory'' into a neural network and provide training examples. Using the samples, the neural network's learning algorithm adjusts the domain theory so that it properly classifies these DNA sequences. Our procedure constitutes a general method for incorporating preexisting knowledge into artificial neural networks. We present an experiment in molecular genetics that demonstrates the value of doing so." } Note: see also towell.aij94 @inproceedings{towell.aaai90, author = "G. Towell and J. Shavlik and M. Noordewier", title = "Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks", booktitle = "Proceedings of the Eighth National Conference on Artificial Intelligence", year = "1990", pages = "861-866", address = "Boston, MA", filename = "towell.aaai90.pdf towell.aaai90.ps", data = "http://ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology/promoter-gene-sequences" ,abstract= "Standard algorithms for explanation-based learning require complete and correct knowledge bases. The KBANN system relaxes this constraint through the use of empirical learning methods to refine approximately correct knowledge. This knowledge is used to determine the structure of an artificial neural network and the weights on its links, thereby making the knowledge accessible for modification by neural learning. KBANN is evaluated by empirical tests in the domain of molecular biology. Networks created by KBANN are shown to be superior, in terms of their ability to correctly classify unseen examples, to randomly initialized neural networks, decision trees, ``nearest neighbor'' matching, and standard techniques reported in the biological literature. In addition, KBANN's networks improve the initial knowledge in biologically interesting ways." } Note: see also towell.aij94.ps --------------- WORKSHOP PAPERS --------------- @inproceedings{torrey.aaai08 ,filename = "torrey.aaai08.pdf" ,author = "L. Torrey and J. Shavlik and S. Natarajan and P. Kuppili and T. Walker" ,title = "Transfer in Reinforcement Learning via Markov Logic Networks" ,year = 2008 ,wkshp_date = "July 2008" ,conf_date = "July 2008" ,address = "Chicago, IL" ,booktitle = "AAAI'08 Workshop on Transfer Learning for Complex Tasks" ,workshop = "Workshop on Transfer Learning for Complex Tasks" ,abstract = "We propose the use of statistical relational learning, and in particular the formalism of Markov Logic Networks, for transfer in reinforcement learning. Our goal is to extract relational knowledge from a source task and use it to speed up learning in a related target task. We do so by learning a Markov Logic Network that describes the source-task Q-function, and then using it for decision making in the early learning stages of the target task. Through experiments in the RoboCup simulated-soccer domain, we show that this approach can provide a substantial performance benefit in the target task." } @inproceedings{torrey.icml06 ,filename = "torrey.icml-wkshp06.pdf" ,author = "L. Torrey and J. Shavlik and T. Walker and R. Maclin" ,title = "Relational Skill Transfer via Advice Taking" ,year = 2006 ,wkshp_date = "July 2006" ,conf_date = "July 2006" ,address = "Pittsburg, PA" ,booktitle = "ICML'06 Workshop on Structural Knowledge Transfer for Machine Learning" ,workshop = "Structural Knowledge Transfer for Machine Learning" ,abstract = "We describe a reinforcement learning system that transfers relational skills from a previously learned source task to a related target task. The system uses inductive logic programming to analyze experience in the source task, and transfers rules about when to take actions. The target-task learner accepts these rules through an advice-taking algorithm. Our system also accepts human-provided advice, which can guide the application of transferred skills and provide instruction about new skills in the target task." } @inproceedings{goadrich.icml05 ,filename = "goadrich.icml05.pdf" ,author = "M. Goadrich and L. Oliphant and J. Shavlik" ,title = "Learning to Extract Genic Interactions Using Gleaner" ,year = 2005 ,wkshp_date = "August 2005" ,conf_date = "August 2005" ,address = "Bonn, Germany" ,booktitle = "ICML'05 Workshop on Learning Language in Logic (LLL05)" ,workshop = "Learning Language in Logic" ,abstract = "We explore here the application of Gleaner, an Inductive Logic Programming approach to learning in highly-skewed domains, to the Learning Language in Logic 2005 biomedical information-extraction challenge task. We create and describe a large number of background knowledge predicates suited for this task. We find that Gleaner outperforms standard Aleph theories with respect to recall and that additional linguistic background knowledge improves recall." } @inproceedings{maclin.ijcai05 ,filename = "maclin.ijcai05.pdf maclin.ijcai05-poster.pdf" ,author = "R. Maclin and J. Shavlik and L. Torrey and T. Walker" ,title = "Knowledge Based Support Vector Regression for Reinforcement Learning" ,year = 2005 ,wkshp_date = "August 2005" ,conf_date = "August 2005" ,address = "Edinburgh, Scotland" ,booktitle = "IJCAI'05 Workshop on Reasoning, Representation, and Learning in Computer Games" ,workshop = "Reasoning, Representation, and Learning in Computer Games" ,abstract = "Reinforcement learning (RL) methods have difficulty scaling to large, complex problems. One approach that has proven effective for scaling RL is to make use of advice provided by a human. We extend a recent advice-giving technique, called Knowledge-Based Kernel Regression (KBKR), to RL and evaluate our approach on the KeepAway subtask of the RoboCup soccer simulator. We present empirical results that show our approach can make effective use of advice. Our work not only demonstrates the potential of advice-giving techniques such as KBKR for RL, but also offers insight into some of the design decisions involved in employing support-vector regression in RL." } @inproceedings{walker.icml04 ,filename = "walker.icml04.pdf walker.icml04.doc" ,author = "T. Walker and J. Shavlik and R. Maclin" ,title = "Relational Reinforcement Learning via Sampling the Space of First-Order Conjunctive Features" ,booktitle = "Proceedings of the ICML Workshop on Relational Reinforcement Learning" ,year = "2004" ,publisher = "" ,pub_addr = "" ,wkshp_date = "July 2004" ,conf_date = "July 2004" ,address = "Banff, Canada" ,workshop = "Relational Reinforcement Learning" ,abstract = "We propose a novel method for reinforcement learning in domains that are best described using relational (first-order) features. Our approach is to rapidly sample a large space of such features, selecting a good subset to use as the basis for our Q-function. Our Q-function is created via a regression model that combines the collection of first-order features into a single prediction. To control the effect of the random predictions we use an ensemble approach for our predictions, generating multiple Q-function models and then combining the results of these models into a single prediction. Experiments with our technique on an interesting reinforcement learning problem, the Keep-Away subtask of RoboCup, suggest that our method can learn to effectively predict Q-values for a challenging reinforcement-learning task." } @inproceedings{kuhlmann-aaai04 ,filename = "kuhlmann-aaai04.pdf kuhlmann-aaai04.ps" ,author = "G. Kuhlmann and P. Stone and R. Mooney and J. Shavlik" ,title = "Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer" ,booktitle = "Proceedings of the AAAI Workshop on Supervisory Control of Learning and Adaptive Systems" ,year = "2004" ,publisher = "" ,pub_addr = "" ,wkshp_date = "July 2004" ,conf_date = "July 2004" ,address = "San Jose, CA, USA" ,workshop = "Supervisory Control of Learning and Adaptive Systems" ,abstract = "We describe our current efforts towards creating a reinforcement learner that learns both from reinforcements provided by its environment and from human-generated advice. Our research involves two complementary components: (a) mapping advice expressed in English to a formal advice language and (b) using advice expressed in a formal notation in a reinforcement learner. We use a subtask of the challenging RoboCup simulated soccer task (Noda et al. 1998) as our testbed." } @inproceedings{dimaio.icdm03 ,filename = "dimaio.icdm03.pdf dimaio.icdm03.ppt" ,author = "F. DiMaio and J. Shavlik" ,title = "Speeding Up Relational Data Mining by Learning to Estimate Candidate Hypothesis Scores" ,booktitle = "Proceedings of the ICDM Workshop on Foundations and New Directions of Data Mining" ,year = "2003" ,publisher = "" ,pub_addr = "" ,wkshp_date = "Nov 2003" ,conf_date = "Nov 2003" ,address = "Melbourne, Florida, USA" ,workshop = "Foundations and New Directions in Data Mining" ,note = "A longer and updated version of this paper appears in Proceedings of the Fourteenth International Conference on Inductive Logic Programming (2004)" ,abstract = "The motivation behind multi-relational data mining is knowledge discovery in relational databases containing multiple related tables. One difficulty relational data mining faces is managing intractably large hypothesis spaces. We attempt to overcome this difficulty by first sampling the hypothesis space. We generate a small set of hypotheses, uniformly sampled from the space of candidate hypotheses, and evaluate this set on actual data. These hypotheses and their corresponding evaluation scores serve as training data in learning an approximate hypothesis evaluator. We use this approximate evaluation to quickly rate potential hypotheses without needing to score them on actual data. We test our approximate clause evaluation algorithm using the popular Inductive Logic Programming (ILP) system Aleph. We use a neural network to approximate the hypothesis-evaluation function. The trained neural network replaces Alephs hypothesis evaluation on actual data, scoring potential rules in time independent of the number of examples. Our approximate evaluator can also be used in a heuristic search to help escape local maxima. We test the neural network's ability in learning the hypothesis-evaluation function on four benchmark ILP domains; the neural network is able to accurately approximate the hypothesis-evaluation function." } @inproceedings{dimaio.icml03 ,filename = "dimaio.icml03.pdf dimaio.icml03.ppt" ,author = "F. DiMaio and J. Shavlik and G. Phillips" ,title = "Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps" ,booktitle = "Working Notes of the ICML Workshop on Machine Learning in Bioinformatics" ,year = "2003" ,publisher = "" ,pub_addr = "" ,wkshp_date = "Aug 2003" ,conf_date = "Aug 2003" ,address = "Washington DC, USA" ,workshop = "Machine Learning in Bioinformatics" ,abstract = "One of the most time-consuming steps in determining a protein's structure via x-ray crystallography is interpretation of the electron density map. This can be viewed as a computer-vision problem, since a density map is simply a three-dimensional image of a protein. However, due to the intractably large space of conformations the protein can adopt, building a protein model to match in the density map is extremely difficult. This paper describes the use of pictorial structures to build a flexible protein model from the protein's amino acid sequence. A pictorial structure is a way of representing an object as a collection of parts connected, pairwise, by deformable springs. Model parameters are learned from training data. Using an efficient algorithm to match the model to the density map, the most probable arrangement of the protein's atoms can be found in a reasonable running time. We test the algorithm is on two different tasks. The first is an amino-acid sidechain-refinement task, in which the location of the protein's backbone is approximately known. The algorithm places the remaining atoms into the region of density quite accurately, placing 72% of atoms within 1.0 �of their actual location (as determined by a crystallographer). In the second task, a classification task, the algorithm is used to predict the type of amino acid contained in an unknown region of density. In this task, the algorithm is 61% accurate in discriminating between four different amino acids." } @inproceedings{mooney.nsf02 ,filename = "mooney.nsf02.pdf mooney.nsf02.ps" ,author = "R. Mooney and P. Melville and L. Tang and J. Shavlik and I. Dutra and D. Page and V. Santos Costa" ,title = "Relational Data Mining with Inductive Logic Programming for Link Discovery" ,booktitle = "Proceedings of the National Science Foundation Workshop on Next Generation Data Mining" ,year = 2002 ,publisher = "" ,pub_addr = "" ,wkshp_date = "Nov 2002" ,conf_date = "Nov 2002" ,address = "Baltimore, Maryland, USA" ,workshop = "" ,note = "A longer and updated version of this paper appears as a chapter in ``Data Mining: Next Generation Challenges and Future Directions'', H. Kargupta and A. Joshi (eds.), by AAAI/MIT Press" ,abstract = "Link discovery (LD) is an important task in data mining for counter-terrorism and is the focus of DARPA's Evidence Extraction and Link Discovery (EELD) research program. Link discovery concerns the identification of complex relational patterns that indicate potentially threatening activities in large amounts of relational data. Most data-mining methods assume data is in the form of a feature-vector (a single relational table) and cannot handle multi-relational data. Inductive logic programming is a form of relational data mining that discovers rules in first-order logic from multi-relational data. This paper discusses the application of ILP to learning patterns for link discovery." } @inproceedings{goecks.ijcai-wkshp99 ,filename = "goecks_IJCAI_ML_IF_workshop99.pdf" ,author = "J. Goecks and J. Shavlik" ,title = "Automatically Labeling Web Pages Based on Normal User Actions" ,booktitle = "Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering" ,year = 1999 ,publisher = "" ,pub_addr = "" ,wkshp_date = "Jul 31--Aug 6, 1999" ,conf_date = "Jul 31--Aug 6, 1999" ,address = "Stockholm, Sweden" ,workshop = "" ,abstract = "For agents attempting to learn a user's interests, the cost of obtaining labeled training instances is prohibitive because the user must directly label each training instance, and few users are willing to do so. We present an approach that circumvents the need for human-labeled pages. Instead, we learn 'surrogate' tasks where the desired output is easily measured, such as the number of hyperlinks clicked on a page or the amount of scrolling performed. Our assumption is that these outputs will highly correlate with the user's interests. In other words, by unobtrusively 'observing' the user's behavior we are able to learn functions of value. For example, an agent could silently observe the user's browser behavior during the day, then use these training examples to learn such functions and gather, during the middle of the night, pages that are likely to be of interest to the user. Previous work has focused on learning a user profile by passively observing the hyperlinks clicked on and those passed over. We extend this approach by measuring user mouse and scrolling activity in addition to user browsing activity. We present empirical results that demonstrate our agent can accurately predict some easily measured aspects of one's use of his or her browser." } @inproceedings{shavlik.aaai-wkshp98 ,filename = "shavlik.aaai-wkshp98.pdf shavlik.aaai-wkshp98.ps" ,author = "J. Shavlik and T. Eliassi-Rad" ,title = "Intelligent Agents for Web-Based Tasks: An Advice-Taking Approach" ,booktitle = "Working Notes of the AAAI/ICML-98 Workshop on Learning for Text Categorization" ,year = 1998 ,publisher = "AAAI Press" ,pub_addr = "Menlo Park, CA" ,wkshp_date = "Jul 26--30, 1998" ,conf_date = "Jul 26--30, 1998" ,address = "Madison, WI" ,workshop = "" ,abstract = "We present and evaluate an implemented system with which to rapidly and easily build intelligent software agents for Web-based tasks. Our design is centered around two basic functions: ScoreThisLink and ScoreThisPage. If given highly accurate such functions, standard heuristic search would lead to efficient retrieval of useful information. Our approach allows users to tailor our system's behavior by providing approximate advice about the above functions. This advice is mapped into neural network implementations of the two functions. Subsequent reinforcements from the Web (e.g., dead links) and any ratings of retrieved pages that the user wishes to provide are, respectively, used to refine the link- and page-scoring functions. Hence, our architecture provides an appealing middle ground between non-adaptive agent programming languages and systems that solely learn user preferences from the user's ratings of pages. We describe our internal representation of Web pages, the major predicates in our advice language, how advice is mapped into neural networks, and the mechanisms for refining advice based on subsequent feedback. We also present a case study where we provide some simple advice and specialize our general-purpose system into a ``home-page finder''. An empirical study demonstrates that our approach leads to a more effective home-page finder than that of a leading commercial Web search site." } @inproceedings{shavlik.conald-wkshp98 ,filename = "shavlik.conald-wkshp98.pdf shavlik.conald-wkshp98.ps" ,author = "J. Shavlik and T. Eliassi-Rad" ,title = "Building Intelligent Agents for Web-Based Tasks: A Theory-Refinement Approach" ,booktitle = "Presented at the Conf on Automated Learning and Discovery Workshop on Learning from Text and the Web" ,year = 1998 ,publisher = "" ,pub_addr = "" ,wkshp_date = "Jun 11--13, 1998" ,conf_date = "Jun 11--13, 1998" ,address = "Pittsburgh, PA" ,workshop = "" ,abstract = "We present and evaluate an infrastructure with which to rapidly and easily build intelligent software agents for Web-based tasks. Our design is centered around two basic functions: ScoreThisLink and ScoreThisPage. If given highly accurate such functions, standard heuristic search would lead to efficient retrieval of useful information. Our approach allows users to tailor our system's behavior by providing approximate advice about the above functions. This advice is mapped into neural network implementations of the two functions. Subsequent reinforcements from the Web (e.g., dead links) and any ratings of retrieved pages that the user wishes to provide are, respectively, used to refine the link- and page-scoring functions. Hence, our agent architecture provides an appealing middle ground between non-adaptive ``agent'' programming languages and systems that solely learn user preferences from the user's ratings of pages. We present a case study where we provide some simple advice and specialize our general-purpose system into a ``home-page finder''. An empirical study demonstrates that our approach leads to a more effective home-page finder than that of a leading commercial Web search engine." } @inproceedings{cherkauer.aaai-wkshp96 ,filename = "cherkauer.aaai-wkshp96.pdf cherkauer.aaai-wkshp96.ps" ,author = "K. Cherkauer" ,title = "Human Expert-Level Performance on a Scientific Image Analysis Task by a System Using Combined Artificial Neural Networks" ,booktitle = "Working Notes, Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms Wkshp, 13th Nat Conf on Artificial Intelligence" ,year = 1996 ,pages = "15-21" ,publisher = "AAAI Press" ,pub_addr = "Menlo Park, CA" ,wkshp_date = "Aug 4--5, 1996" ,conf_date = "Aug 4--8, 1996" ,address = "Portland, OR" ,workshop = "" ,abstract = "This paper presents the Plannett system, which combines artificial neural networks to achieve expert-level accuracy on the difficult scientific task of recognizing volcanos in radar images of the surface of the planet Venus. Plannett uses ANNs that vary along two dimensions: the set of input features used to train and the number of hidden units. The ANNs are combined simply by averaging their output activations. When Plannett is used as the classification module of a three-stage image analysis system called JARtool, the end-to-end accuracy (sensitivity and specificity) is as good as that of a human planetary geologist on a four-image test suite. JARtool-Plannett also achieves the best algorithmic accuracy on these images to date." } @inproceedings{craven.ijcai-wkshp95 ,author = "M. Craven and J. Shavlik" ,title = "Extracting Comprehensible Concept Representations from Trained Neural Networks" ,booktitle = "Presented at the IJCAI Workshop on Comprehensibility in Machine Learning" ,year = 1995 ,address = "Montreal, Quebec, Canada" ,filename = "craven.ijcai-wkshp95.pdf craven.ijcai-wkshp95.ps" ,workshop = "" ,abstract= "Although they are applicable to a wide array of problems, and have demonstrated good performance on a number of difficult, real-world tasks, neural networks are not usually applied to problems in which comprehensibility of the acquired concepts is important. The concept representations formed by neural networks are hard to understand because they typically involve distributed, nonlinear relationships encoded by a large number of real-valued parameters. To address this limitation, we have been developing algorithms for extracting ``symbolic'' concept representations from trained neural networks. We first discuss why it is important to be able to understand the concept representations formed by neural networks. We then briefly describe our approach and discuss a number of issues pertaining to comprehensibility that have arisen in our work. Finally, we discuss choices that we have made in our research to date, and open research issues that we have not yet addressed." } @inproceedings{cherkauer.ijcai-wkshp95 ,author = "K. Cherkauer and J. Shavlik" ,title = "Rapidly Estimating the Quality of Input Representations for Neural Networks" ,booktitle = "Working Notes of the IJCAI-95 Workshop on Data Engineering for Inductive Learning, Fourteenth International Joint Conference on Artificial Intelligence" ,year = 1995 ,address = "Montreal, Quebec, Canada" ,filename = "cherkauer.ijcai-wkshp95.pdf cherkauer.ijcai-wkshp95.ps" ,workshop = "" ,abstract= "The choice of an input representation for machine learning can have a profound impact on the accuracy of the learned model in classifying novel instances. A reliable method of rapidly estimating the value of a representation, independent of the learner, would be a powerful tool in the search for better representations. We introduce a fast representation- quality measure that is more accurate than Rendell and Ragavan's blurring metric in rank ordering input representations for neural networks on two difficult, real-world datasets. This work constitutes a step forward both in representation quality measures and in our understanding of the characteristics that engender good representations." } Note: see also cherkauer.nips96.ps @inproceedings{opitz.isiknh94 ,author = "D. Opitz and J. Shavlik" ,title = "Genetically Refining Topologies of Knowledge-Based Neural Networks" ,booktitle = "International Symposium on Integrating Knowledge and Neural Heuristics" ,year = 1994 ,address = "Pensacola, FL" ,pages = "57-66" ,filename = "opitz.isiknh94.pdf opitz.isiknh94.ps" ,workshop = "" ,abstract= "Traditional approaches to connectionist theory refinement map the dependencies of a domain-specific rulebase into a neural network, then refine these reformulated rules using neural learning. These approaches have proven to be effective at classifying previously unseen examples; however, most of these approaches suffer in that they are unable to refine the topology of the networks they produce. Thus, when given an ``impoverished'' domain theory, they generalize poorly. A recently published improvement to these approaches, the TopGen algorithm, addressed this limitation by heuristically searching expansions to the knowledge-based networks produced by these algorithms. We show, however, that TopGen's search is too restricted. In response, we present the REGENT algorithm, which uses genetic algorithms to broaden the type of networks seen during its search. It does this by using (a) the domain theory to help create an initial population and (b) crossover and mutation operators specifically designed for knowledge-based networks. Experiments on three real-world domains indicate that our new algorithm is able to significantly increase generalization when compared to both TopGen and a standard approach that does not alter its knowledge-based network's topology." } @inproceedings{maclin.mlc91, author = "R. Maclin and J. Shavlik", title = "Refining Domain Theories Expressed as Finite-State Automata", booktitle = "Proceedings of the Eighth International Machine Learning Workshop", year = "1991", pages = "524-528", address = "Evanston, IL", publisher = "Morgan Kaufmann", filename = "maclin.mlc91.pdf maclin.mlc91.ps" ,workshop = "" ,abstract= "The KBANN system uses neural networks to refine domain theories. Currently, domain knowledge in KBANN is expressed as non-recursive, propositional rules. We extend KBANN to domain theories expressed as finite-state automata. We apply finite-state KBANN to the task of predicting how proteins fold, producing a small but statistically significant gain in accuracy over both a standard neural network approach and a non-learning algorithm from the biological literature. Our method shows promise at solving this and other real-world problems that can be described in terms of state-dependent decisions." } Note: this work has been superseded by maclin.aaai92 @inproceedings{towell.mlc91, author = "G. Towell and M. Craven and J. Shavlik", title = "Constructive Induction in Knowledge-Based Neural Networks", booktitle = "Proceedings of the Eighth International Machine Learning Workshop", year = 1991, pages = "213-217", address = "Evanston, IL", publisher = "Morgan Kaufmann", filename = "towell.mlc91.pdf towell.mlc91.ps", data = "http://ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology/splice-junction-gene-sequences/", workshop = "" ,abstract= "Artificial neural networks have proven to be a successful, general method for inductive learning from examples. However, they have not often been viewed in terms of constructive induction. We describe a method for using a knowledge-based neural network of the kind created by the KBANN algorithm as the basis of a system for constructive induction. After training, we extract two types of rules from a network: modified versions of the rules initially provided to the knowledge-based neural network, and rules which describe newly constructed features. Our experiments show that the extracted rules are more accurate, at classifying novel examples, than the trained network from which the rules are extracted." } Note: this work has been superseded by towell.nips4, towell.mlj93, @inproceedings{shavlik.cbr91, author = "J. Shavlik", title = "Finding Genes by Case-Based Reasoning in the Presence of Noisy Case Boundaries", booktitle = "Proceedings of the DARPA Cased-Based Reasoning Works