The following papers are available via ftp from the Machine
Learning Research Group of the University of Wisconsin, Madison.

These papers can be retrieved via a WWW browser, such as Mosaic,
or by anonymous ftp.  The URL for our publications page is:
	http://www.cs.wisc.edu/~shavlik/publications.html
The URL for our group's "home" page is:
	http://www.cs.wisc.edu/~shavlik/uwml.html
The URL for retrieving these papers via anonymous ftp is:
	ftp://ftp.cs.wisc.edu/machine-learning/shavlik-group/

If your computer cannot uncompress ".Z" files, simply leave off
the ".Z" when you request the ftp transfer; our ftp server will
then uncompress the file before transmitting it.

Email shavlik@cs.wisc.edu if you have any major problems.


   ------
   THESES
   ------   

@phdthesis{kuusisto.thesis
,author      = "Finn Kuusisto"
,title       = "Machine Learning for Medical Decision Support and Individualized Treatment Assignment"
,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
,year        = "2015"
,filename    = "kuusisto.thesis.pdf kuusisto.thesis.pptx"
,abstract    = "World Health Organization estimates of health care expenditure reveal a global trend of increasing costs, and health care systems need to become more efficient at treating patients to slow this trend.
Incentives are in place to develop information-based health care systems, and I claim that using machine learning tools in medicine will lead to improvements in patient care.
My work demonstrates new methods to improve collaboration between machine learning experts and clinicians, and new methods for modeling individual responses to treatment.

My work in collaboration with clinical experts involves the adaptation of machine learning models to address the challenging task of identifying benign breast cancer biopsies that cannot be definitively diagnosed.
I first adapt an inductive logic programming learner to prefer rules that do not misclassify malignant cases, and show promising results that both adhere to the clinical objective and provide insight into the task.
I later present a framework for collaboration between clinical and machine learning experts, leveraging clinician expertise to build and refine a model that meets the conservative objective of missing no malignant cases.

My work on estimating individual responses to treatment takes lessons from the marketing domain, applying uplift modeling to two primary medical tasks.
One task is to identify patients at greater risk of heart attack due to treatment with COX-2 inhibitors, and another is understanding characteristics of in situ breast cancer specific to older women.
I first present a statistical relational learner that constructs Bayesian networks to maximize area under the uplift curve (AUU), and show that the learned networks capture clinically-relevant characteristics of indolent, in situ breast cancer.
I next present a support vector machine for maximizing AUU and show promising results on both the COX-2 inhibitor and breast cancer tasks, as well as a synthetic marketing task.
Finally, I present a collaboration showing strong evidence that machine learning for individualized treatment effect estimation improves upon current methods in multiple ways.
Overall, I present multiple works that demonstrate improved clinical collaboration and new methods for modeling individual responses to treatment within machine learning."
}

@phdthesis{zhang.thesis
,author      = "Ce Zhang"
,title       = "DeepDive: A Data Management System for Automatic Knowledge Base Construction"
,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
,year        = "2015"
,filename    = "zhang.thesis.pdf zhang.thesis.pptx"
,prelim      = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/zhang.prelim.pptx"
,abstract    = "Many pressing questions in science are macroscopic: they require scientists to consult information expressed in a wide range of resources, many of which are not organized in a structured relational form. Knowledge base construction (KBC) is the process of populating a knowledge base, i.e., a relational database storing factual information, from unstructured inputs. KBC holds the promise of facilitating a range of macroscopic sciences by making information accessible to scientists.

One key challenge in building a high-quality KBC system is that developers must often deal with data that are both diverse in type and large in size. Further complicating the scenario is that these data need to be manipulated by both relational operations and state-of-the-art machine-learning techniques. This dissertation focuses on supporting this complex process of building KBC systems. DeepDive is a data management system that we built to study this problem; its ultimate goal is to allow scientists to build a KBC system by declaratively specifying domain knowledge without worrying about any algorithmic, performance, or scalability issues.

DeepDive was built by generalizing from our experience in building more than ten high-quality KBC systems, many of which exceed human quality or are top-performing systems in KBC competitions, and many of which were built completely by scientists or industry users using DeepDive. From these examples, we designed a declarative language to specify a KBC system and a concrete protocol that iteratively improves the quality of KBC systems. This flexible framework introduces challenges of scalability and performance--Many KBC systems built with DeepDive contain statistical inference and learning tasks over terabytes of data, and the iterative protocol also requires executing similar inference problems multiple times. Motivated by these challenges, we designed techniques that make both the batch execution and incremental execution of a KBC program up to two orders of magnitude more efficient and/or scalable. This dissertation describes the DeepDive framework, its applications, and these techniques, to demonstrate the thesis that it is feasible to build an efficient and scalable data management system for the end-to-end workflow of building KBC systems."
}

@phdthesis{khot.thesis
,author      = "Tushar Khot"
,title       = "Efficient Learning of Statistical Relational Models"
,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
,year        = "2014"
,filename    = "khot.thesis.pdf khot.thesis.pptx"
,abstract    = "Machine Learning has been successfully applied to many prediction problems in varying domains. But standard techniques assume that the examples are independent of each other and have the same number of features. In many domains, the objects can be inter-related and have different number of features. To build probabilistic models over such data, Statistical Relational Learning (SRL) methods have been proposed, which combine first-order logic representation with probabilities. But due to their high expressivity, learning the structure of SRL models can be computationally intensive.

I present a structure-learning approach that learns multiple weak rules of thumb via functional-gradient boosting. My approach can be used to learn the structure of two popular SRL models. I empirically demonstrate it to be more accurate and computationally faster than state-of-the-art methods.

To further increase the applicability of my approach, I extend it to handle missing data by deriving an Expectation-Maximization approach for relational models. To handle Natural Language Processing domains with only positive labeled examples, I present and evaluate a non-parametric approach for relational one-class classification using a tree-based relational distance measure.

Apart from learning models, this thesis also explores knowledge representation in Markov Logic Networks (MLN). I present and evaluate an approach that can convert multi-level combination functions along with their corresponding parameters into MLN clauses. I present an algorithm for converting two combination functions into MLNs and show the correctness of my transformation.

Finally this thesis shows how my approach can be used for Alzheimer's disease prediction from MRI images as well as to augment expert rules for temporal relation extraction. I present my approach for a large-scale novel relation extraction task, where I process terabytes of streaming data to detect changes in extracted relations.

Overall, this thesis presents multiple structure-learning approaches for SRL, starting from a  boosting-based algorithm, which is extended to handle missing values via EM. Next, I present a structure-learning approach for one-class classification by learning a relational distance metric. I present application of these structure-learning approach on multiple SRL datasets and real-world tasks."
}

@phdthesis{walker.thesis
 ,author      = "Trevor Walker"
  ,title       = "Broadening the Applicability of Relational Learning "
   ,school      = "Department of Computer Sciences, University of
   Wisconsin-Madison"
    ,year        = "2011"
    ,filename    = "walker.thesis.pdf walker.thesis.pptx"
  ,note        = "(Also appears as UW Technical Report CS-TR-11-1698)"
  ,abstract  = "Inductive Logic Programming (ILP) provides an effective method of learning logical theories given a 
  set of positive examples, a set of negative examples, a corpus of background knowledge and specification 
  of a search space from which to compose the theories.  While specifying positive and negative examples 
  is relatively straightforward, composing effective background knowledge and search space definition 
  requires detailed understanding of many aspects of the ILP process and limits the usability of ILP.  This 
  research explores a number of techniques to automate the use of ILP for a experts whose expertise lies 
  outside of ILP.  These techniques include automatic generation of background knowledge from user supplied information
  in the form advice about specific training examples, utilization of type hierarchies to 
  constrain search, and an iterative-deepening style search process.   Additionally, I examine methods of 
  knowledge acquisition through human-computer interfaces, facilitating the use of ILP by the novice user."
  }

@phdthesis{soni.thesis
 ,author      = "Ameet Soni"
  ,title       = "Techniques for Improved Probabilistic Inference In
  Protein-Structure Determination via X-Ray Crystallography"
   ,school      = "Department of Computer Sciences, University of
   Wisconsin-Madison"
    ,year        = "2011"
    ,filename    = "soni.thesis.pdf soni.thesis.pptx"
  ,note        = "(Also appears as UW Technical Report CS-TR-11-1703)"
       ,code = "http://research.cs.wisc.edu/acmi/"
        ,abstract="Over the past decade, the field of machine learning has
	seen a large increase in the use and study of probabilistic graphical
	models due to their ability to provide a compact representation of
	complex, multidimensional problems.  Graphical models have
	applications in many areas, including natural language processing,
	computer vision, gene regulatory-network modeling, and medical
	diagnosis.  Recently, the complexity of problems posed in many domains
	has stressed the ability of algorithms to reason in graphical models.
	New techniques for inference are essential to meet the demands of
	these problems in an efficient and accurate manner.

	One such area of application is in the area of structural genomics.
	The task of determining protein structures has been a central one to
	the biological community, with recent years seeing significant
	investments in structural-genomic initiatives.  X-ray crystallography,
	a molecular-imaging technique, is at the core of many of these
	initiatives as it is the most popular method for determining protein
	structures.  In creating a high-throughput crystallography pipeline,
	however, the final step of constructing an all-atom protein model from
	an electron-density map - a three-dimensional image of a molecule
	produced as an intermediate product of X-ray crystallography - remains
	a major bottleneck in need of computational methods.  In difficult
	cases where the image is poor, this can take months of manual effort
	by an experienced crystallographer.

	In this thesis, I develop new inference techniques for the use of
	probabilistic graphical models for the automated determination of
	protein structures in electron-density maps.  The first, guided belief
	propagation using domain knowledge, prioritizes messages in the
	popular belief propagation algorithm for approximate inference.
	Second, I propose Probabilistic Ensembles in ACMI (PEA), a framework
	for leveraging multiple, diverse executions of approximate inference
	to produce more accurate estimations of a variable's posterior
	probability distribution.    Lastly, I present work on the use of
	statistical sampling (particle filtering) for the purpose of providing
	physically feasible, all-atom protein structures.

	I demonstrate that my new methods not only improve the accuracy of the
	probabilistic model in terms of log-likelihood values, but also
	produce protein structures with higher completeness, lower RMS error,
	and better fit to the density map according to R-free factor.  My
	methods interpret difficult electron-density maps (3-4A resolution)
	better than prior inference approaches.  Across a set of poor-quality
	density maps, my work outperforms all related work in the field by
	improving the state-of-the-art technique, ACMI.  In addition, I show
	that the ability to incorporate biochemical domain knowledge is an
	important aspect to probabilistic modeling, creating more accurate
	modeling functions and influencing algorithmic design of belief
	propagation.

	I also describe my contributions on the subtask of three-dimensional
	shape matching in electron-density maps by utilizing
	spherical-harmonic decompositions to quickly align two 3D objects over
	rotations.  I show that spherical-harmonic decompositions, when
	applied to the task of matching small amino-acid fragments, are more
	efficient and accurate than previous work.  I also extend spherical
	harmonics to two other shape-detection tasks: homologous structure
	detection in electron-density maps and feature generation for 3D shape
	classification of local density regions.

	While the application of my work specifically targets the problem of
	protein-structure determination, the issues I pose generalize to
	computational problems seen in many areas of the field of artificial
	intelligence.  Throughout this work, I will refer to, and develop,
	techniques to solve problems seen in probabilistic inference,
	three-dimensional shape matching, and statistical sampling, among
	others."
}
@phdthesis{oliphant.thesis
 ,author      = "Louis Oliphant"
 ,title       = "Adaptively Finding and Combining First-Order Rules for Large, Skewed Data Sets"
 ,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
 ,year        = "2009"
 ,filename    = "oliphant.thesis.pdf"
 ,note	      = "(Also appears as UW Technical Report CS-TR-09-1661)"
 ,abstract = "Inductive Logic Programming (ILP) is a machine-learning approach that uses first-order logic to create human-readable rules from a database of information and a set of positive and negative examples. When working with highly skewed data sets where the negatives vastly outnumber the positives, common metrics such as predictive accuracy and area under the receiver-operator characteristic curves (AUROC) do not work well because these metrics count negatives and positives equally, causing performance on the negative examples to dominate. This thesis explores creating ensembles of rules to maximize area under the recall-precision curves (AURPC), a much better metric that focuses specifically on the coverage and accuracy of labeling the positive examples.  I create an ensemble of rules from a wide range of recall values and combine them to maximize AURPC.  My Gleaner algorithm retains a set of rules for each positive seed example where standard ILP methods keep only a single rule. Gleaning rules from those rules that would normally be discarded and combining them into a single ensemble shows improved predictive performance while reducing the number of rules evaluated.  I evaluate several modified search methods for finding sets of clauses that work well together.  One method applies a probability distribution over the space of rules and stochastically selects rules more likely to improve Gleaner�s predictive performance. A second method follows a boosting framework and weights examples in order to maximize AURPC. Tying together the method of combining rules with the search for good candidate rules shows improvement over the standard Gleaner algorithm. I apply these first-order ensemble techniques to several data sets from two very different domains. The first data sets come from the Information-Extraction (IE) domain where the task is to find specific relationships in text. The next data sets come from the computer-assisted medicaldiagnosis domain. The task is to identify findings on a mammogram as malignant or benign given descriptors of the findings, patient risk factors, radiologist�s score, and information from any previous mammograms. I also include my work with Davis et al.'s SAYU algorithm. I demonstrate methods to improve predictive performance and to increase understanding of malignancy indicators. Inclusion of additional background knowledge that allows for rules to contain ranges of values provides for more complex models that improve predictive performance. I also show that transferred models are able to outperform radiologists at new institutions even when no additional data are available from the new institution. Finally, first-order rules and probability help in improving understanding of malignant indicators. I use these techniques to confirm the importance of high mass density in identifying malignant findings. I also identify surprising pairs of features that perform better than expected at identifying malignant findings than would be expected by looking at the features individually."
 }
@phdthesis{torrey.thesis
 ,author      = "Lisa Torrey"
 ,title       = "Relational Transfer in Reinforcement Learning"
 ,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
 ,year        = "2009"
 ,filename    = "torrey.thesis.pdf torrey.thesis.pptx"
 ,note	= "(Also appears as UW Technical Report CS-TR-09-1657)"
 ,abstract = "Transfer learning is an inherent aspect of human learning. When humans
learn to perform a task, we rarely start from scratch. Instead, we recall
relevant knowledge from previous learning experiences and apply that
knowledge to help us master the new task more quickly.

This principle can be applied to machine learning as well. Machine
learning often addresses single learning tasks in isolation. Even though
multiple related tasks may exist in a domain, many algorithms for machine
learning have no way to utilize those relationships. Algorithms that allow
successful transfer from one task (the source) to another task (the
target) are necessary steps towards making machine learning as adaptable
as human learning.

This thesis investigates transfer methods for reinforcement learning (RL),
where an agent takes series of actions in an environment. RL often
requires substantial amounts of nearly random exploration, particularly in
the early stages of learning. The ability to transfer knowledge from
previous tasks can therefore be an important asset for RL agents. Transfer
from related source tasks can improve the low initial performance that is
common in challenging target tasks.

I focus on transferring relational knowledge that guides action choices.
Relational knowledge typically uses first-order logic to express
information about relationships among objects. First-order logic, unlike
propositional logic, can use variables that generalize over classes of
objects. This greater generalization makes first-order logic more
effective for transfer.

This thesis contributes six transfer algorithms in three categories:
advice-based transfer, macro transfer, and MLN transfer. Advice-based
transfer uses source-task knowledge to provide advice for a target-task
learner, which can follow, refine, or ignore the advice according to its
value. Macro-transfer and MLN-transfer methods use source-task experience
to demonstrate good behavior for a target-task learner.

I evaluate these transfer algorithms experimentally in the complex
reinforcement-learning domain of RoboCup simulated soccer. All of my
algorithms provide empirical benefits compared to non-transfer approaches,
either by increasing initial performance or by enabling faster learning in
the target task."
}

@phdthesis{dimaio.thesis
,author      = "Frank DiMaio"
,title       = "Probabilistic Methods for Interpreting Electron-Density Maps"
,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
,year        = "2007"
,filename    = "dimaio.thesis.pdf dimaio.thesis.ppt"
,code		 = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/"
,data		 = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/"
,note    = "(Also appears as UW Technical Report CS-TR-07-1617)"
,abstract = "With recent advances in structural genomics, there has been
considerable interest in the rapid determination of protein structures in a
high-throughput setting.  One bottleneck in this process arises in protein
crystallography, and deals with interpretation of the electron-density map, the
three-dimensional 'picture' of the protein that crystallography produces.  This
thesis presents a novel solution to this important problem of electron-density
map interpretation.  I apply probabilistic methods to automate the
interpretation of poor-quality electron-density maps. <br><br>I show my probabilistic
approach to density-map interpretation leads to more complete and more accurate
protein models, in terms of the fraction of the protein automatically
interpreted, as well as the RMS error of my method's inferred models versus
'ground truth' (the deposited structure), than do other automated approaches.  
My probabilistic approach is also amenable to production of multiple protein
models that explain the observed density.  I show that multiple static
conformations generated by my framework do a better job of explaining the
observed density than does a single structure, based on the R-free metric, which measures the difference between observed and
predicted crystallographic reflection data on a testset of held-aside data.  My method accurately interprets 3-4A density
maps, further extending the resolution of density maps that can be automatically interpreted. <br><br>This thesis also describes several computational contributions.  I describe a significant improvement over previous work in three-dimensional template matching in electron-density maps.  I use the spherical-harmonic decomposition of a template to rapidly search for all rotations of the template.  This offers both improved efficiency and accuracy compared to previous work, producing better models in 60% of the running time.  I present a novel joint type as well as improved methods for collision-handling in part-based object-recognition.  Finally, I present a general part-based object-recognition framework specialized for identifying topologically complex objects in large three-dimensional images.  My framework introduces an algorithm that improves the efficiency of current probabilistic inference algorithms.  This improved efficiency allows recognition of objects with hundreds of parts.  Although originally developed for density-map interpretation, these computational contributions may be beneficial in other problem domains."
}
   
@phdthesis{molla.thesis
 ,author      = "Michael Molla"
 ,title       = "Novel Uses for Machine Learning and Other Computational Methods for the Design and Interpretation of Genetic Microarrays"
 ,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
 ,year        = "2007"
 ,filename    = "molla.thesis.pdf molla.thesis.ppt"
 ,note	= "(Also appears as UW Technical Report CS-TR-07-1612)"
 ,abstract = "It is clear that high-throughput techniques, such as rapid DNA sequencing and gene chips are changing the science of genetics. Hypothesis-driven science is now strongly complemented by these newer data-driven approaches. Over the course of the past decade, DNA microarrays, also known as gene chips, have come into prominence for genetic-level analysis throughout the life sciences. Using these microarrays, a scientist is able to perform hundreds of thousands of experiments on the surface of a single one-inch-by-one-inch wafer in the space of a single afternoon, generating more data than an army of researchers could have a generation ago. This potential flood of data brings many informatic challenges in both analysis and design. It is well understood that computer science will play a crucial role in their development and application. This thesis presents novel applications of machine learning and other computational methods to central tasks in highthroughput biology. These tasks include gene-chip design, detection of genomic variation, and the interpretation of gene-expression patterns."
}

@phdthesis{goadrich.thesis
 ,author      = "Mark Goadrich"
 ,title       = "Learning Ensembles of First-Order Clauses that Optimize Precision-Recall Curves"
 ,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
 ,year        = "2007"
 ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/IE-protein-location/"
 ,filename    = "goadrich.thesis.pdf goadrich.thesis.ppt"
 ,note	= "(Also appears as UW Technical Report CS-TR-07-1609)"
 ,abstract = "Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced
data, such as biomedical information extraction, citation matching, and learning relationships in
social networks. A common way to measure performance in these domains is to use precision and
recall instead of simply using accuracy, and to examine their tradeoffs by plotting a precision-recall
curve. The goal of this thesis is to find new approaches within ILP particularly suited for large,
highly skewed domains.

I propose and investigate Gleaner, a randomized search method that collects good clauses from
a broad spectrum of points along the recall dimension in recall-precision curves and employs
thresholding methods to combine sets of selected clauses. I compare Gleaner to ensembles of
standard theories learned by Aleph, a standard ILP algorithm, using a number of large relational
domains. I find that Gleaner produces comparable testset results in a fraction of the training time
and outperforms Aleph ensembles when given the same amount of training time.

I explore extensions to Gleaner with respect to searching and combining clauses, namely find-
ing ways to fully explore the hypothesis space as well as to make better use of those found clauses.
I also use Gleaner to estimate the probability that a query is true, further investigate the properties
underlying precision-recall curves, and then conclude with a discussion of future work in this area."
}


@mastersthesis{geisler.thesis
   ,author      = "Benjamin Geisler"
   ,title       = "An Empirical Study of Machine Learning Algorithms
                   Applied to Modeling Player Behavior in a 'First Person
                   Shooter' Video Game"
   ,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
   ,year        = "2002"
   ,filename    = "geisler.thesis.pdf geisler.thesis.ps"
   ,abstract = "Modern video games present many challenging applications
for artificial
intelligence. Agents must not only appear intelligent but must also be
fun to play against. In the video game genre of the first person shooter
an agent must mimic all the behaviors of a human soldier in a combat
situation. The standard opponent in a 'first person shooter' uses a
finite-state machine and a series of hand coded rules. Drawbacks of this
system include a high level of predictability of opponents and a large
amount of work manually programming each rule. Modern advances in machine
learning have enabled agents to accurately learn rules from a set of
examples. By sampling data from an expert player we use these machine
learning algorithms in a first person shooter. With this system in place,
the programmer has less work when hand coding the combat rules and the
learned behaviors are often more unpredictable and life-like than any
hard-wired finite state machine. This thesis explores several popular
machine learning algorithms and shows how these algorithms can be applied
to the game. The empirical study includes decision trees, Na e Bayes
classifiers, neural networks, and neural networks trained using boosting
and bagging methods. We show that a subset of AI behaviors can be learned
by player modeling using machine learning techniques. Under this system
we have successfully been able to learn the combat behaviors of an expert
player and apply them to an agent in a modified version of the video game
Soldier of Fortune 2. The following tasks were learned: speed of
acceleration, direction of movement, direction of facing, and jumping. We
evaluate both empirically and aesthetically which learner performed the
best and make recommendations for the future. We also have created a
system which uses these learned behaviors in a finite-state system within
the game at real time."
}

@phdthesis{eliassi-rad.thesis
   ,author      = "Tina Eliassi-Rad"
   ,title       = "Building Intelligent Agents that Learn to Retrieve and Extract Information"
   ,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
   ,year        = "2001"
   ,filename    = "eliassi-rad.thesis.pdf eliassi-rad.thesis.ps"
   ,note	= "(Also appears as UW Technical Report CS-TR-01-1431)"
   ,abstract = "The rapid growth of on-line information has created a surge of 
    interest in tools that are able to retrieve and extract information from 
    on-line documents.  In this thesis, I present and evaluate a computer system 
    that rapidly and easily builds instructable and self-adaptive software agents
    for both the information retrieval (IR) and the information extraction (IE) 
    tasks.

    My system is called WAWA (short for Wisconsin Adaptive Web Assistant).
    WAWA interacts with the user and an on-line (textual) environment (e.g.,
    the Web) to build an intelligent agent for retrieving and extracting
    information.  WAWA has two sub-systems: (i) an information retrieval
    (IR) sub-system, called WAWA-IR; and, (ii) an information extraction (IE)
    sub-system, called WAWA-IE.  WAWA-IR is a general search-engine agent,
    which can be trained to produce specialized and personalized IR agents.  
    WAWA-IE is a general extractor system, which creates specialized agents
    that accurately extract pieces of information from documents in the domain
    of interest.

    WAWA utilizes a theory-refinement approach to build its intelligent
    agents.  There are four four primary advantages of using such an approach.
    First, WAWA's agents are able to perform reasonably well initially because
    they are able to utilize users' prior knowledge.  Second, users' prior
    knowledge does not have to be correct since it is refined through
    learning.  Third, the use of prior knowledge, plus the continual dialog
    between the user and an agent, decreases the need for a large number of
    training examples because training is not limited to a binary
    representation of positive and negative examples.  Finally, WAWA provides
    an appealing middle ground between non-adaptive agent programming
    languages and systems that solely learn user preferences from training
    examples.

    WAWA's agents have performed quite well in empirical studies.  WAWA-IR
    experiments demonstrate the efficacy of incorporating the feedback
    provided by the Web into the agent's neural networks to improve the
    evaluation of potential hyperlinks to traverse.  WAWA-IE experiments
    produce results that are competitive with other state-of-art systems.  
    Moreover, they demonstrate that WAWA-IE agents are able to intelligently
    and efficiently select from the space of possible extractions and solve
    multi-slot extraction problems."
}

@phdthesis{allex.thesis
   ,author      = "Carolyn F. Allex"
   ,title       = "Computational Methods for Fast and Accurate DNA Fragment Assembly"
   ,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
   ,year        = "1999"
   ,filename    = "allex.thesis99.pdf allex.thesis99.ps"
   ,note	= "(Also appears as UW Technical Report CS-TR-99-1406)"
   ,abstract = "As advances in technology result in the production of
increasing amounts of DNA sequencing data in decreasing amounts of time,
developing computational methods that allow data analysis to keep pace is
imperative. In this dissertation, I present methods that improve the
speed and accuracy of DNA fragment assembly.
One critical characteristic of automatic methods for fragment assembly is
that they must be accurate. Currently, to ensure accurate sequences, the
data that underlies questionable base calls must be examined by human
editors so that the correct base call can be determined. This manual
process is both error-prone and time-consuming. Automatic methods that
yield high accuracy and few questionable calls can reduce errors and
lessen the need for manual inspections. In my work, I developed a method,
Trace-Evidence, that automatically produces highly accurate consensus
sequences, even with few aligned sequences.

Most assembly programs analyze only base calls when determining a
consensus sequence. The key to the high accuracy is that I incorporate
morphological information about the underlying ABI trace data. This is
accomplished through a new representation of traces, Trace-Class, that
characterizes the height and shape of traces. The new representation not
only yields high accuracy when used in consensus-calling methods, but
also produces improved results when used in removing poor-quality data,
and when used as inputs for neural networks for consensus determination.
The need for fast processing is becoming more important as the size of
sequencing projects increases. Almost all existing fragment assembly
programs perform pairwise comparisons of reads, resulting in execution
times proportional to n2, where n is the number of reads. I describe a
new algorithm for fragment layout, SLIC, that runs in time proportional
to n. SLIC relies on subsequences of bases that occur in overlapping
regions of fragment reads. Subsequences that are common to two or more
fragment reads are aligned to determine the overall layout of reads.

The work I present provides improvements to currently available
computational methods for DNA sequencing that can serve as a foundation
for further study in developing better solutions to problems in fragment
assembly."
}


@phdthesis{craven.thesis
   ,author	= "Mark W. Craven"
   ,title	= "Extracting Comprehensible Models from Trained Neural Networks"
   ,school	= "Department of Computer Sciences, University of Wisconsin-Madison"
   ,year	= "1996"
   ,filename	= "craven.thesis.pdf craven.thesis.ps"
   ,note	= "(Also appears as UW Technical Report CS-TR-96-1326)"
   ,code = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/trepan"
   ,abstract = "Although neural networks have been used to develop highly
   accurate classifiers in numerous real-world problem domains, the models
   they learn are notoriously difficult to understand.  This thesis
   investigates the task of extracting comprehensible models from trained
   neural networks, thereby alleviating this limitation.

   The primary contribution of the thesis is an algorithm that overcomes the
   significant limitations of previous methods by taking a novel approach to
   the task of extracting comprehensible models from trained networks.  
   This algorithm, called TREPAN, views the task as an inductive learning
   problem.  Given a trained network, or any other learned model, TREPAN uses
   queries to induce a decision tree that approximates the function represented
   by the model.  Unlike previous work in this area, TREPAN is broadly
   applicable as well as scalable to large networks and problems with
   high-dimensional input spaces.  The thesis presents experiments that
   evaluate TREPAN by applying it to individual networks and to ensembles of
   neural networks trained in classification, regression, and
   reinforcement-learning domains.  These experiments demonstrate that TREPAN
   is able to extract decision trees that are comprehensible, yet maintain high
   levels of fidelity to their respective networks.  In problem domains in
   which neural networks provide superior predictive accuracy to conventional
   decision tree algorithms, the trees extracted by TREPAN also exhibit
   superior accuracy, but are comparable in terms of complexity, to the trees
   learned directly from the training data.

   A secondary contribution of this thesis is an algorithm, called BBP, that
   constructively induces simple neural networks.  The motivation underlying
   this algorithm is similar to that for TREPAN: to learn comprehensible models
   in problem domains in which neural networks have an especially appropriate
   inductive bias.  The BBP algorithm, which is based on a hypothesis-boosting
   method, learns perceptrons that have relatively few connections.  This
   algorithm provides an appealing combination of strengths: it provides
   learnability guarantees for a fairly natural class of target functions;
   it provides good predictive accuracy in a variety of problem domains; and
   it constructs syntactically simple models, thereby facilitating human
   comprehension of what it has learned.

   These algorithms provide mechanisms for improving the understanding of
   what a trained neural network has learned."
}


@phdthesis{opitz.thesis
   ,author	= "David W. Opitz"
   ,title	= "An Anytime Approach to Connectionist Theory Refinement:
                   Refining the Topologies of Knowledge-Based Neural Networks"
   ,school	= "Department of Computer Sciences, University of Wisconsin-Madison"
   ,year	= "1995"
   ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/"
   ,filename	= "opitz.thesis.pdf opitz.thesis.ps"
   ,note	= "(Also appears as UW Technical Report CS-TR-95-1281)("
   ,abstract = "Many scientific and industrial problems can be better understood by 
   learning from samples of the task at hand.  For this reason, the machine
   learning and statistics communities devote considerable research effort 
   on generating inductive-learning algorithms that try to learn the true 
   ``concept'' of a task from a set of its examples.  Often times, however, one 
   has additional resources readily available, but largely unused, that can 
   improve the concept that these learning algorithms generate.  These 
   resources include available computer cycles, as well as prior knowledge 
   describing what is currently known about the domain.  Effective utilization 
   of available computer time is important since for most domains an expert 
   is willing to wait for weeks, or even months, if a learning system can 
   produce an improved concept.  Using prior knowledge is important since it 
   can contain information not present in the current set of training examples.<BR>

   In this thesis, I present three ``anytime'' approaches to connectionist theory
   refinement.  Briefly, these approaches start by translating a set of rules
   describing what is currently known about the domain into a neural network, 
   thus generating a knowledge-based neural network (KNN).  My approaches then 
   utilize available computer time to improve this KNN by continually refining 
   its weights and topology.  My first method, TopGen, searches for good 
   ``local'' refinements to the KNN topology.  It does this by adding nodes to 
   the KNN in a manner analogous to symbolically adding rules and conjuncts 
   to an incorrect rule base.  My next approach, REGENT, uses genetic 
   algorithms to find better ``global'' changes to this topology.  REGENT 
   proceeds by using (a) the domain-specific rules to help create the initial 
   population of KNNs and (b) crossover and mutation operators specifically 
   designed for KNNs.  My final algorithm, ADDEMUP, searches for an ``ensemble'' 
   of KNNs that work together to produce an effective composite prediction.
   ADDEMUP works by using genetic algorithms to continually create new 
   networks, keeping the set of networks that are as accurate as possible while
   disagreeing with each other as much as possible.  Empirical results show 
   that these algorithms successfully achieve each of their respective goals."
}

@phdthesis{maclin.thesis
   ,author      = "Richard Maclin"
   ,title       = "Learning from Instruction and Experience:  Methods for Incorporating Procedural Domain Theories into Knowledge-Based Neural Networks"
   ,school      = "Department of Computer Sciences, University of Wisconsin-Madison"
   ,year        = "1995"
   ,filename	= "maclin.thesis.firsthalf.ps maclin.thesis.secondhalf.ps maclin.thesis.firsthalf.pdf maclin.thesis.secondhalf.pdf"
   ,note	= "(Also appears as UW Technical Report CS-TR-95-1285)"
   ,abstract = "This thesis defines and evaluates two systems that allow a teacher to
   provide instructions to a machine learner.  My systems, FSKBANN and RATLE,
   expand the language that a teacher may use to provide advice to the
   learner.  In particular, my techniques allow a teacher to give partially
   correct instructions about procedural tasks - tasks that are solved as
   sequences of steps.  FSKBANN and RATLE allow a computer to learn both from
   instruction and from experience.<BR>
   
   Experiments with these systems on several testbeds demonstrate that they
   produce learners that successfully use and refine the instructions they are
   given.<BR>
   
   In my initial approach, FSKBANN, the teacher provides instructions as a
   set of propositional rules organized around one or more finite-state
   automata (FSAs).  FSKBANN maps the knowledge in the rules and FSAs into a
   recurrent neural network.  I used FSKBANN to refine the Chou-Fasman
   algorithm, a method for solving the secondary-structure prediction problem,
   a difficult task in molecular biology.  FSKBANN produces a refined
   algorithm that outperforms the original (non-learning) Chou-Fasman
   algorithm, as well as a standard neural-network approach.<BR>

   My second system, RATLE, allows a teacher to communicate advice, using
   statements in a simple programming language, to a connectionist,
   reinforcement-learning agent.  The teacher indicates conditions of the
   environment and actions the agent should take under those conditions.
   RATLE allows the teacher to give advice continuously by translating the
   teacher's statements into additions to the agent's neural network.  The
   RATLE language also includes novel (to the theory-refinement literature)
   features such as multi-step plans and looping constructs.  In experiments
   with RATLE on two simulated testbeds involving multiple agents, I
   demonstrate that a RATLE agent receiving advice outperforms both an agent
   that does not receive advice and an agent that receives instruction, but
   does not refine it.<BR>

   My methods provide an appealing approach for learning from both instruction
   and experience in procedural tasks.  This work widens the ``information
   pipeline'' between humans and machine learners, without requiring that the
   human provide absolutely correct information to the learner."
}

@phdthesis{towell.thesis
   ,author	= "Geoffrey G. Towell"
   ,title	= "Symbolic Knowledge and Neural Networks:
		   Insertion, Refinement and Extraction"
   ,school	= "Department of Computer Sciences, University of Wisconsin-Madison"
   ,year	= "1991"
   ,note	= "(Also appears as UW Technical Report 1072)"
   ,data = "http://ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology/splice-junction-gene-sequences/"
   ,filename	= "towell.thesis.1-2.ps towell.thesis.3.ps towell.thesis.4-7.ps towell.thesis.app.ps towell.thesis.1-2.pdf towell.thesis.3.pdf towell.thesis.4-7.pdf towell.thesis.app.pdf"
   ,abstract = "Explanation-based and empirical learning are two largely complementary methods
  of machine learning.  These approaches to machine learning both have serious
  problems which preclude their being a general purpose learning method.
  However, a ``hybrid'' learning method that combines explanation-based with
  empirical learning may be able to use the strengths of one learning method to
  address the weaknesses of the other method.  Hence, a system that effectively
  combines the two approaches to learning can be expected to be superior to
  either approach in isolation.  This thesis describes a hybrid system called
  KBANN which is shown to be an effective combination of these two learning
  methods.<BR>

  KBANN (Knowledge-Based Artificial Neural Networks) is a three-part hybrid
  learning system built on top of ``neural'' learning techniques. The first part
  uses a set of approximately-correct rules to determine the structure and
  initial link weights of an artificial neural network, thereby making the rules
  accessible for modification by neural learning. The second part of KBANN
  modifies the resulting network using essentially standard neural learning
  techniques. The third part of KBANN extracts refined rules from trained
  networks.<BR>

  KBANN is evaluated by empirical tests in the domain of molecular biology.
  Networks created by KBANN are shown to be superior, in terms of their ability
  to correctly classify unseen examples, to a wide variety of learning systems
  as well as techniques proposed by experts in the problems investigated. In
  addition, empirical tests show that KBANN is robust to errors in the initial
  rules and insensitive to problems resulting from the presence of extraneous
  input features.<BR>

  The third part of KBANN, which extracts rules from trained networks, addresses
  a significant problem in the use of neural networks -- understanding what a
  neural network learns.  Empirical tests of the proposed rule-extraction method
  show that it simplifies understanding of trained networks by reducing the
  number of: consequents (hidden units), antecedents (weighted links), and
  possible antecedent weights.  Surprisingly, the extracted rules are often more
  accurate at classifying examples not seen during training than the trained
  network from which they came."
}


@phdthesis{gutstein.thesis
   ,author	= "Eric Gutstein"
   ,title	= "SIFT: A Self-Improving Fractions Tutor"
   ,school	= "Department of Computer Sciences, University of Wisconsin-Madison"
   ,year	= "1993"
}


   -----------------
   CONFERENCE PAPERS
   -----------------
@inproceedings{weiss.amia15,
  filename  = "weiss.amia15.pdf",
  author    = "J. Weiss and F. Kuusisto and K. Boyd and J. Liu and D. Page",
  title     = "Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution",
  booktitle = "Proceedings of the AMIA Annual Symposium",
  year      = "2015",
  abstract  = "Clinical studies model the average treatment effect (ATE), but apply this population-level effect to future individuals. Due to recent developments of machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individualized treatment effect (ITE), which has better applicability to new patients. We compare ATE-estimation using randomized and observational analysis methods against ITE-estimation using machine learning, and describe how the ITE theoretically generalizes to new population distributions, whereas the ATE may not. On a synthetic data set of statin use and myocardial infarction (MI), we show that a learned ITE model improves true ITE estimation and outperforms the ATE.  We additionally argue that ITE models should be learned with a consistent, non-parametric algorithm from unweighted examples and show experiments in favor of our argument using our synthetic data model and a real data set of D-penicillamine use for primary biliary cirrhosis.",
}


@inproceedings{kuusisto.amiacri15,
  filename  = "kuusisto.amiacri15.pdf kuusisto.amiacri15.slides.pdf",
  author    = "F. Kuusisto and I. Dutra and M. Elezaby and E. Mendonca and J. Shavlik and E. Burnside",
  title     = "Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems",
  booktitle = "Proceedings of the AMIA Joint Summits on Translational Science",
  year      = "2015",
  abstract  = "While the use of machine learning methods in clinical decision support has great potential
  for improving patient care, acquiring standardized, complete, and sufficient training data presents
  a major challenge for methods relying exclusively on machine learning techniques.  Domain experts possess
  knowledge that can address these challenges and guide model development.  We present Advice-Based-Learning
  (ABLe), a framework for incorporating expert clinical knowledge into machine learning models, and show
  results for an example task: estimating the probability of malignancy following a non-definitive breast
  core needle biopsy.  By applying ABLe to this task, we demonstrate a statistically significant improvement
  in specificity (24.0% with p=0.004) without missing a single malignancy.",
}


@inproceedings{natarajan.ilp14
  ,filename = "natarajan.ilp14.pdf"
  ,author = "S. Natarajan and J. Picado T. Khot and K. Kersting and C. Re and J. Shavlik"
  ,title = "Effectively Creating Weakly Labeled Training Examples via Approximate Domain Knowledge"
  ,booktitle = "Proceedings of 24th International Conference on Inductive Logic Programming (ILP)"
  ,year = "2014"
  ,abstract = "One of the challenges to information extraction is the requirement of human annotated examples, commonly called gold-standard examples. Many successful approaches alleviate this problem by employing some form of distant supervision, i.e., look into knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision methods rely on a hand-coded background knowledge that explicitly looks for patterns in text. For example, they assume all sentences containing Person X and Person Y are positive examples of the relation married(X, Y). In this work, we take a different approach � we infer weakly supervised examples for relations from models learned by using knowledge outside the natural language task. We argue that this method creates more robust examples that are particularly useful when learning the entire information-extraction model (the structure and parameters). We demonstrate on three domains that this form of weak supervision yields superior results when learning structure compared to using distant supervision labels or a smaller set of gold-standard labels."
  }


@inproceedings{kuusisto.ecml14,
    filename = "kuusisto.ecml14.pdf kuusisto.ecml14.poster.pdf kuusisto.ecml14.slides.pdf",
    author    = "F. Kuusisto and V. Santos Costa and H. Nassif and E. Burnside and D. Page and J. Shavlik",
    title        = "Support Vector Machines for Differential Prediction",
    booktitle = "Proceedings of the European Conference on Machine Learning (ECML)",
    year       = "2014",
    code       = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/kuusisto.ecml14.svmuplcode.zip",
    data       = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/kuusisto.ecml14.simcustomerdata.zip",
    abstract  = "Machine learning is continually being applied to a growing set of fields,
    including the social sciences, business, and medicine. Some fields present
    problems that are not easily addressed using standard machine learning
    approaches and, in particular, there is growing interest in differential
    prediction. In this type of task we are interested in producing a classifier
    that specifically characterizes a subgroup of interest by maximizing the
    difference in predictive performance for some outcome between subgroups in a
    population. We discuss adapting maximum margin classifiers for differential
    prediction. We first introduce multiple approaches that do not affect the key
    properties of maximum margin classifiers, but which also do not directly attempt
    to optimize a standard measure of differential prediction. We next propose a
    model that directly optimizes a standard measure in this field, the uplift
    measure. We evaluate our models on real data from two medical applications and
    show excellent results."
}


@inproceedings{khot.aaai14,
    filename  = "khot.aaai14.pdf",
    author    = "T. Khot and S. Natarajan and J. Shavlik",
    title     = "Relational One-Class Classification: A Non-Parametric Approach",
    booktitle = "Proceedings of the 28th AAAI Conference",
    year      = "2014",
    abstract  = "One-class classification approaches have been proposed in the literature to learn classifiers from examples of only one class. But these approaches are not directly applicable to relational domains due to their reliance on a feature vector or a distance measure. We propose a non-parametric relational one-class classification approach based on first-order trees. We learn a tree-based distance measure that iteratively introduces new relational features to differentiate relational examples. We update the distance measure so as to maximize the one-class classification performance of our model. We also relate our model definition to existing work on probabilistic combination functions and density estimation. We experimentally show that our approach can discover relevant features and outperform three baseline approaches.",
}


@inproceedings{gokhale.sigmod14
	,filename = "gokhale.sigmod14.pdf"
	,author = "C. Gokhale and S. Das and A. Doan and J. Naughton and N. Rampali and J. Shavlik and X. Zhu"
	,title = "Corleone: Hands-Off Crowdsourcing for Entity Matching"
	,booktitle = "Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data"
	,year = "2014"
	,address = "Snowbird, Utah"
	,abstract = "Recent approaches to crowdsourcing entity matching (EM) are limited in that they crowdsource only parts of the EM workflow, requiring a developer to execute the remaining parts. Consequently, these approaches do not scale to the growing EM need at enterprises and crowdsourcing startups, and cannot handle scenarios where ordinary users (i.e., the masses) want to leverage crowdsourcing to match entities. In response, we propose the notion of hands-off crowdsourcing (HOC), which crowdsources the entire workflow of a task, thus requiring no developers. We show how HOC can represent a next logical direction for crowdsourcing research, scale up EM at enterprises and crowdsourcing startups, and open up crowdsourcing for the masses. We describe Corleone, a HOC solution for EM, which uses the crowd in all major steps of the EM process. Finally, we discuss the implications of our work to executing crowdsourced RDBMS joins, cleaning learning models, and soliciting complex information types from crowd workers."
}


@inproceedings{nassif.ilp13
	,filename = "nassif.ilp13.pdf"
	,author = "H. Nassif and F. Kuusisto and E. Burnside and J. Shavlik"
	,title = "Uplift Modeling with ROC:  An SRL Case Study"
	,booktitle = "Proceedings of the 23rd International Conference on Inductive Logic Programming"
	,year = "2013"
	,address = "Rio de Janeiro, Brazil"
	,abstract = "Uplift modeling is a classification method that determines the incremental impact of an action on a given population. Uplift modeling aims at maximizing the area under the uplift curve, which is the difference between the subject and control sets� area under the lift curve. Lift and uplift curves are seldom used outside of the marketing domain, whereas the related ROC curve is frequently used in multiple areas. Achieving a good uplift using an ROC-based model instead of lift may be more intuitive in several areas, and may help uplift modeling reach a wider audience.

	We alter SAYL, an uplift-modeling statistical relational learner, to use ROC instead of lift. We test our approach on a screening mammography dataset. SAYL-ROC outperforms SAYL on our data, though not significantly, suggesting that ROC can be used for uplift modeling. On the other hand, SAYL-ROC returns larger models, reducing interpretability."
}


@inproceedings{kunapuli.icdm13
  ,filename = "kunapuli.icdm13.pdf"
  ,author = "G. Kunapuli and P. Odom and J. Shavlik and S. Natarajan"
  ,title = "Guiding Autonomous Agents to Better Behaviors through Human Advice"
  ,booktitle = "Proceedings of the 13th IEEE International Conference on Data Mining"
  ,year = "2013"
  ,address = "Dallas, Texas"
  ,abstract = "Inverse Reinforcement Learning (IRL) is an approach for domain-reward discovery from demonstration, where an agent mines the reward function of a Markov decision process by observing an expert acting in the domain. In the standard setting, it is assumed that the expert acts (nearly) optimally, and a large number of trajectories, i.e., training examples are available for reward discovery (and consequently, learning domain behavior). These are not practical assumptions: trajectories are often noisy, and there can be a paucity of examples. Our novel approach incorporates advice-giving into the IRL framework to address these issues. Inspired by preference elicitation, a domain expert provides advice on states and actions (features) by stating preferences over them. We evaluate our approach on several domains and show that with small amounts of targeted preference advice, learning is possible from noisy demonstrations, and requires far fewer trajectories compared to simply learning from trajectories alone."
}


@inproceedings{kuusisto.healthcom13
  ,filename = "kuusisto.healthcom13.pdf"
  ,author = "F. Kuusisto and I. Dutra and H. Nassif and Y. Wu and M. Klein and H. Neuman and J. Shavlik and E. Burnside"
  ,title = "Using Machine Learning to Identify Benign Cases with Non-Definitive Biopsy"
  ,booktitle = "Proceedings of IEEE 15th International Conference on e-Health Networking, Applications and Services (Healthcom)"
  ,year = "2013"
  ,address = "Lisbon, Portugal"
  ,abstract = "When mammography reveals a suspicious finding, a core needle biopsy is usually recommended. In 5% to 15% of these cases, the biopsy diagnosis is non-definitive and a more invasive surgical excisional biopsy is recommended to confirm a diagnosis. The majority of these cases will ultimately be proven benign. The use of excisional biopsy for diagnosis negatively impacts patient quality of life and increases costs to the healthcare system. In this work, we employ a multi-relational machine learning approach to predict when a patient with a non-definitive core needle biopsy diagnosis need not undergo an excisional biopsy procedure because the risk of malignancy is low."
}


@inproceedings{liu.amia13,
  filename = "liu.amia13.pdf",
  author    = "J. Liu and D. Page and H. Nassif and J. Shavlik and P. Peissig and C. McCarty and A. Onitilo and E. Burnside",
  title        = "Genetic Variants Improve Breast Cancer Risk Prediction on Mammograms" ,
  booktitle = "Proceedings of the American Medical Informatics Association Annual Symposium (AMIA)",
  year       = "2013",
  abstract  = "Several recent genome-wide association studies have identified genetic variants associated with breast cancer.  However, how much these genetic variants may help advance breast cancer risk prediction based on other clinical  features, like mammographic findings, is unknown. We conducted a retrospective case-control study, collecting  mammographic findings and high-frequency/low-penetrance genetic variants from an existing personalized medicine  data repository. A Bayesian network was developed using Tree Augmented Naive Bayes (TAN) by training on the  mammographic findings, with and without the 22 genetic variants collected. We analyzed the predictive performance  using the area under the ROC curve, and found that the genetic variants significantly improved breast cancer risk  prediction on mammograms. We also identified the interaction effect between the genetic variants and collected  mammographic findings in an attempt to link genotype to mammographic phenotype to better understand disease patterns, mechanisms, and/or natural history."
  ,pmcid = "PMC3900221"
}


@inproceedings{nassif.ecml13,
    filename = "nassif.ecml13.pdf",
    author    = "H. Nassif and F. Kuusisto and E. Burnside and D. Page and J. Shavlik and V. Santos Costa",
    title        = "Score As You Lift (SAYL): A Statistical Relational Learning Approach to Uplift Modeling",
    booktitle = "Proceedings of the European Conference on Machine Learning (ECML)",
    year       = "2013",
    abstract  = "We introduce Score As You Lift (SAYL),
    a novel Statistical Relational Learning (SRL) algorithm, and apply
    it to an important task in the diagnosis of breast cancer. SAYL
    combines SRL with the marketing concept of uplift modeling, uses the
    area under the uplift curve to direct clause construction and final
    theory evaluation, integrates rule learning and probability
    assignment, and conditions the addition of each new theory rule to
    existing ones.
    Breast cancer, the most common type of cancer among women, is
    categorized into two subtypes: an earlier in situ stage where cancer
    cells are still confined, and a subsequent invasive stage. Currently
    older women with in situ cancer are treated to prevent cancer
    progression, regardless of the fact that treatment may generate
    undesirable side-effects, and the woman may die of other
    causes. Younger women tend to have more aggressive cancers, while
    older women tend to have more indolent tumors. Therefore older women
    whose in situ tumors show significant dissimilarity with in situ
    cancer in younger women are less likely to progress, and can thus be
    considered for watchful waiting.
    Motivated by this important problem, this work makes two main
    contributions. First, we present the first multi-relational uplift
    modeling system, and introduce, implement and evaluate a novel
    method to guide search in an SRL framework. Second, we compare our
    algorithm to previous approaches, and demonstrate that the system
    can indeed obtain differential rules of interest to an expert on
    real data, while significantly improving the data uplift."
}


@inproceedings{niu.icdm12,
    filename = "niu.icdm12.pdf",
    author    = "F. Niu and C. Zhang and C. Re and J. Shavlik",
    title        = "Scaling Inference for Markov Logic via Dual Decomposition",
    booktitle = "Proceedings of the IEEE International Conference on Data Mining (ICDM)",
    year       = "2012",
    abstract  = "Markov logic is a knowledge-representation language
    that allows one to specify large graphical models.
    However, the resulting large graphical models can make inference
    for Markov logic a computationally challenging problem.
    Recently, dual decomposition (DD) has become a popular
    approach for scalable inference on graphical models. We study
    how to apply DD to scale up inference in Markov logic.
    A standard approach for DD first partitions a graphical
    model into multiple tree-structured subproblems. We apply this
    approach to Markov logic and show that DD can outperform
    prior inference approaches. Nevertheless, we observe that
    the standard approach for DD is suboptimal as it does not
    exploit the rich structure often present in the Markov logic
    program. Thus, we describe a novel decomposition strategy
    that partitions a Markov logic program into parts based on its
    structure. A crucial advantage of our approach is that we can
    use specialized algorithms for portions of the input problem -
    some of which have been studied for decades, e.g., coreference
    resolution. Empirically, we show that our program-level decomposition
    approach outperforms both non-decomposition and
    graphical model-based decomposition approaches to Markov
    logic inference on several data-mining tasks."
}

@inproceedings{kunapuli.ecml12,
    filename = "kunapuli.ecml12.pdf kunapuli.ecml12.poster.pdf kunapuli.ecml12.slides.pdf",
    author    = "G. Kunapuli and J. Shavlik",
    title        = "Mirror Descent for Metric Learning: A Unified Approach",
    booktitle = "Proceedings of the European Conference on Machine Learning (ECML)",
    year       = "2012",
    abstract  = "Most metric learning methods are characterized by diverse
    loss functions and projection methods, which naturally begs the question:
    is there a wider framework that can generalize many of these methods?
    In addition, ever persistent issues are those of scalability to large data
    sets and the question of kernelizability. We propose a unified approach to
    Mahalanobis metric learning: an online regularized metric learning algorithm
    based on the ideas of composite objective mirror descent (comid).
    The metric learning problem is formulated as a regularized positive semidefinite
    matrix learning problem, whose update rules can be derived
    using the comid framework. This approach aims to be scalable, kernelizable,
    and admissible to many different types of Bregman and loss
    functions, which allows for the tailoring of several different classes of
    algorithms. The most novel contribution is the use of the trace norm,
    which yields a sparse metric in its eigenspectrum, thus simultaneously
    performing feature selection along with metric learning."
 }
 

@inproceedings{zhang.acl12,
  filename = "zhang.acl12.pdf zhang.acl12-poster.pdf",
  author= "C. Zhang and F. Niu and C. Re and J. Shavlik",
  title="Big Data versus the Crowd: Looking for Relationships in All the Right Places",
  booktitle = "Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL)",
  year       = "2012",
  abstract="Classically, training relation extractors relies
  on high-quality, manually annotated training
  data, which can be expensive to obtain. To
  mitigate this cost, NLU researchers have considered
  two newly available sources of less
  expensive (but potentially lower quality) labeled
  data from distant supervision and crowd
  sourcing. There is, however, no study comparing
  the relative impact of these two sources
  on the precision and recall of post-learning answers.
  To fill this gap, we empirically study
  how state-of-the-art techniques are affected by
  scaling these two sources. We use corpus sizes
  of up to 100 million documents and tens of
  thousands of crowd-source labeled examples.
  Our experiments show that increasing the corpus
  size for distant supervision has a statistically
  significant, positive impact on quality
  (F1 score). In contrast, human feedback has a
  positive and statistically significant, but lower,
  impact on precision and recall.",
  }


@inproceedings{dutra.amia11,
  filename = "dutra.amia11.pdf dutra.amia11.slides.pptx",
  author    = "I. Dutra and H. Nassif and D. Page and J. Shavlik and R. Strigel and Y. Wu and M. Elezabi and E. Burnside",
  title        = "Integrating Machine Learning and Physician Knowledge to Improve the Accuracy of Breast Biopsy",
  booktitle = "Proceedings of the American Medical Informatics Association Annual Symposium (AMIA)",
  year       = "2011",
  abstract  = "In this work we show that combining physician rules and machine learned rules may improve the performance of a
  classifier that predicts whether a breast cancer is missed on percutaneous, image-guided breast core needle biopsy
  (subsequently referred to as ''breast core biopsy''). Specifically, we show how advice in the form of logical rules,
  derived by a sub-specialty, i.e. fellowship trained breast radiologists (subsequently referred to as ''our physician'')
  can guide the search in an inductive logic programming system, and improve the performance of a learned classifier.
  Our dataset of 890 consecutive benign breast core biopsy results along with corresponding mammographic findings
  contains 94 cases that were deemed non-definitive by a multidisciplinary panel of physicians, from which 15 were
  upgraded to malignant disease at surgery. Our goal is to predict upgrade prospectively and avoid surgery in women
  who do not have breast cancer. Our results, some of which trended toward significance, show evidence that inductive
  logic programming may produce better results for this task than traditional propositional algorithms with default
  parameters. Moreover, we show that adding knowledge from our physicians into the learning process may improve
  the performance of the learned classifier trained only on data."
 }


@inproceedings{khot.icdm11,
    filename = "khot.icdm11.pdf khot.icdm11.pptx",
    author    = "T. Khot and S. Natarajan and K. Kersting and J. Shavlik",
    title        = "Learning Markov Logic Networks via Functional Gradient Boosting",
    booktitle = "Proceedings of the IEEE International Conference on Data Mining (ICDM)",
    year       = "2011",
    abstract  = "Recent years have seen a surge of interest in
    Statistical Relational Learning (SRL) models that combine logic
    with probabilities. One prominent example is Markov Logic
    Networks (MLNs). While MLNs are indeed highly expressive,
    this expressiveness comes at a cost. Learning MLNs is a hard
    problem and therefore has attracted much interest in the SRL
    community. Current methods for learning MLNs follow a twostep
    approach: first, perform a search through the space of
    possible clauses and then learn appropriate weights for these
    clauses. We propose to take a different approach, namely
    to learn both the weights and the structure of the MLN
    simultaneously. Our approach is based on functional gradient
    boosting where the problem of learning MLNs is turned into
    a series of relational functional approximation problems. We
    use two kinds of representations for the gradients: clausebased
    and tree-based. Our experimental evaluation on several
    benchmark data sets demonstrates that our new approach can
    learn MLNs as good or better than those found with state-ofthe-
    art methods, but often in a fraction of the time."}


@inproceedings{kunapuli.nips11,
    filename = "kunapuli.nips11.pdf kunapuli.nips11-poster.pdf",
    author    = "G. Kunapuli and R. Maclin and J. Shavlik",
    title        = "Advice Refinement for Knowledge-Based Support Vector Machines",
    booktitle = "Proceedings of the Twenty-Fifth Conference on Neural Information Processing Systems (NIPS)",
    year       = "2011",
    abstract  = " Knowledge-based support vector machines (KBSVMs) incorporate advice from
    domain experts, which can improve generalization significantly. A major limitation
    that has not been fully addressed occurs when the expert advice is imperfect,
 which can lead to poorer models. We propose a model that extends KBSVMs
 and is able to not only learn from data and advice, but also simultaneously improves
 the advice. The proposed approach is particularly effective for knowledge
 discovery in domains with few labeled examples. The proposed model contains
 bilinear constraints, and is solved using two iterative approaches: successive linear
 programming and a constrained concave-convex approach. Experimental results
 demonstrate that these algorithms yield useful refinements to expert advice, as
 well as improve the performance of the learning algorithm overall."}

@inproceedings{soni.acmbcb11
 ,filename = "soni.acmbcb11.pdf soni.acmbcb11.pptx"
  ,author   = "A. Soni and J. Shavlik"
   ,title    = "Probabilistic Ensembles for Improved Inference in
   Protein-Structure Determination"
    ,booktitle = "Proceedings of the ACM International Conference on
    Bioinformatics and Computational Biology (ACM-BCB)"
     ,year     = "2011"
      ,address  = "Chicago, Illinois"
       ,abstract = "Protein X-ray crystallography - the most popular method
       for determining protein structures - remains a laborious process
       requiring a great deal of manual crystallographer effort to interpret
       low-quality protein images.  Automating this process is critical in
       creating a high-throughput protein-structure determination pipeline.
       Previously, our group developed ACMI, a probabilistic framework for
       producing protein-structure models from electron-density maps produced
       via X-ray crystallography.  ACMI uses a Markov Random Field to model
       the 3D location of each non-hydrogen atom in a protein.  Calculating
       the best structure in this model is intractable, so ACMI uses
       approximate inference methods to estimate the optimal structure.
       While previous results have shown ACMI to be the state-of-the-art
       method on this task, its approximate inference algorithm remains
       computationally expensive and susceptible to errors.  In this work, we
       develop Probabilistic Ensembles in ACMI (PEA), a framework for
       leveraging multiple, independent runs of approximate inference to
       produce estimates of protein structures.  Our results show
       statistically significant improvements in the accuracy of inference
       resulting in more complete and accurate  protein structures.  In
       addition, PEA provides a general framework for advanced approximate
       inference methods in complex problem domains."
       }


@inproceedings{natarajan.ijcai11
  ,filename = "natarajan.ijcai11.pdf"
  ,author = "S. Natarajan and S. Joshi and P. Tadepalli and K. Kersting and J. Shavlik"
  ,title = "Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach"
  ,booktitle = "Proceedings of 22nd International Joint Conference on Artificial Intelligence (IJCAI)"
  ,year = "2011"
  ,abstract = "Imitation learning refers to the problem of learning how to behave by observing
  a teacher in action. We consider imitation learning in relational
  domains, in which there is a varying number of objects and relations among
  them. In prior work, simple relational policies are learned by viewing imitation
  learning as supervised learning of a function from states to actions.
  For propositional worlds, functional gradient methods have been proved to be beneficial.
  They are simpler to implement than most existing methods, more
  efficient, more naturally satisfy common constraints on the cost function, and
  better represent our prior beliefs about the form of the function.
  Building on recent generalizations of functional gradient boosting to relational representations, we implement a functional gradient  boosting approach to imitation learning in relational domains. In particular, given a set of traces from the human teacher,  our system learns a policy in the form of a set of relational regression trees that additively approximate the functional gradients. The use of multiple additive trees combined with relational representation  allows for learning more expressive policies than what has been done before. We demonstrate the usefulness of our approach in several different domains."
  }


@inproceedings{walker.kcap11
  ,filename = "walker.kcap11.pdf walker.kcap11.pptx"
  ,author = "T. Walker and  G. Kunapuli and N. Larsen and D. Page and J. Shavlik"
  ,title = "Integrating Knowledge Capture and Supervised Learning through a Human-Computer Interface"
  ,booktitle = "Proceedings of 6th International Conference on Knowledge Capture (KCAP)"
  ,year = "2011"
  ,address = "Banff, Canada"
  ,abstract = "Some supervised-learning algorithms can make effective 
  use of domain knowledge in addition to the input-output 
  pairs commonly  used in machine learning.  However, 
  formulating this additional information often requires an indepth  understanding of the specific knowledge 
  representation used by a given  learning  algorithm.  The 
  requirement to use a formal knowledge-representation 
  language means that most domain experts will not be able 
  to articulate their expertise, even when a learning algorithm 
  is capable of exploiting such valuable information. We 
  investigate a method to ease this knowledge acquisition 
  through the use of a graphical, human-computer interface.
  Our interface allows users to easily provide advice about 
  specific examples, rather than requiring them to provide 
  general rules; we leave the task of properly generalizing 
  such advice to the learning algorithms. We demonstrate the 
  effectiveness of our approach using the Wargus real-time 
  strategy game, comparing learning with no advice to 
  learning with concrete advice provided through our 
  interface, as well as comparing to using generalized advice 
  written by an AI expert.   Our results show that our 
  approach of combining a GUI-based advice language with 
  an advice-taking learning algorithm is an effective way to 
  capture domain knowledge."
  }


@inproceedings{niu.vldb11
  ,filename = "niu.vldb11.pdf niu.vldb11.pptx"
  ,author = "F. Niu and C. Re and A. Doan and J. Shavlik"
  ,title = "Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS"
  ,booktitle = "Proceedings of the 37th International Conference on Very Large Data Bases (VLDB 2011)"
  ,address = "Seattle, Washington"
  ,year = "2011"
  ,abstract = "Markov Logic Networks (MLNs) have emerged as a powerful framework that combines statistical and logical reasoning; they have been applied to many data intensive problems including information extraction, entity resolution, and text mining. Current implementations of MLNs do not scale to large real-world data sets, which is preventing their widespread adoption. We present Tuffy that achieves scalability via three novel contributions: (1) a bottom-up approach to grounding that allows us to leverage the full power of the relational optimizer, (2) a novel hybrid architecture that allows us to perform AI-style local search efficiently using an RDBMS, and (3) a theoretical insight that shows when one can (exponentially) improve the efficiency of stochastic local search. We leverage (3) to build novel partitioning, loading, and parallel algorithms. We show that our approach outperforms state-of-the-art implementations in both quality and speed on several publicly available datasets."
}

@inproceedings{kunapuli.annie10,
  filename = "kunapuli.ANNIE10.pdf kunapuli.ANNIE10.slides.pdf",
  author    = "G. Kunapuli and K. Bennett and R. Maclin and J. Shavlik",
  title        = "The Adviceptron: Giving Advice To The Perceptron",
  booktitle = "Proceedings of the Conference on Artificial Neural Networks In Engineering (ANNIE 2010)",
  address = "St. Louis, MO",
  year       = "2010",
  abstract  = "We propose a novel approach for incorporating prior knowledge into the perceptron. The goal is to update the hypothesis taking into account both label feedback and prior knowledge, in the form of soft polyhedral advice, so as to make increasingly accurate predictions on subsequent rounds.  Advice helps speed up and bias learning so that good generalization can be obtained with less data. The updates to the hypothesis use a hybrid loss that takes into account the margins of both the hypothesis and advice on the current point. Analysis of the algorithm via mistake bounds and experimental results demonstrate that advice can speed up learning"
}

@inproceedings{walker.ilp10
  ,filename = "walker.ilp10.pdf"
  ,author = "T. Walker and C. O'Reilly and G. Kunapuli and S. Natarajan and R. Maclin and D. Page and J. Shavlik"
  ,title = "Automating the ILP Setup Task: Converting User Advice about Specific Examples into General Background Knowledge"
  ,booktitle = "Proceedings of the 20th International Conference on Inductive Logic Programming"
  ,address = "Florence, Italy"
  ,year = "2010"
  ,abstract = "Inductive Logic Programming (ILP) provides an effective method of learning logical theories given a set of positive examples, a set of negative examples, a corpus of background knowledge, and specification of a search space (e.g., via mode definitions) from which to compose the theories. While specifying positive and negative examples is relatively straightforward, composing effective background knowledge and search-space definition requires detailed understanding of many aspects of the ILP process and limits the usability of ILP. We introduce two techniques to automate the use of ILP for a non-ILP expert. These techniques include automatic generation of background knowledge from user-supplied information in the form of a simple relevance language, used to describe important aspects of specific training examples, and an iterative-deepening-style search process."
}

@inproceedings{nassif.ihi10
   ,filename = "nassif.ihi10.pdf"
   ,author = "H. Nassif and D. Page and M. Ayvaci and J. Shavlik and E. Burnside"
   ,title = "Uncovering Age-Specific Invasive and DCIS Breast Cancer Rules Using Inductive Logic Programming"
   ,booktitle = "Proceedings of the 1st ACM International Health Informatics Symposium (IHI '10)"
   ,address = "Arlington, VA"
   ,year = "2010"
   ,abstract = "Breast cancer is the most common type of cancer among women. Current clinical breast cancer diagnosis involves a biopsy, which is a costly, invasive and potentially painful procedure. Some researchers proposed models, based on mammography features and personal information, that help identify pre-biopsy invasive breast carcinoma and ductal carcinoma in situ (DCIS). Recently, a differential discriminating ability between invasive and DCIS has been linked to age.  Based on this finding, we use an age-stratified mammography and biopsy relational dataset and apply Inductive Logic Programming (ILP) techniques to learn age-specific logical rules that classify invasive and DCIS occurrences. We then use statistical modeling to retrieve rules that have a significantly different performance across age-stratas. These final rules reveal a number of interesting results. Although a palpable lump is more commonly associated with younger patients, it turns out to be a better predictor of invasive cancer in older women. A recurrence has a higher probability to be invasive in older and middle-aged women. A previously unreported rule revealed by our technique is that recurrence is more likely a DCIS predictor in younger women. This younger DCIS predicting rule effectively links the current diagnostic mammogram to older studies, and provides opposite predictions across the age divide. The resulting rules are age-specific, can help patients and their physicians make more informed decisions about managing their breast health,  and constitute a personalized predictive model."
}

@inproceedings{natarajan.icmla10,
   filename = "natarajan.icmla10.pdf",
   author    = "S. Natarajan and G. Kunapuli amd K. Judah and P. Tadepalli and K. Kersting and J. Shavlik",
   title        = "Multi-Agent Inverse Reinforcement Learning",
   booktitle = "Proceedings of the International Conference on Machine Learning and Applications (ICMLA 2010)",
   address = "Washington DC, USA",
   year       = "2010",
   abstract  = "Learning the reward function of an agent by observing its behavior is termed inverse reinforcement learning and has applications in learning from demonstration or apprenticeship learning. We introduce the problem of multiagent inverse reinforcement learning, where reward functions of multiple agents are learned by observing their uncoordinated behavior. A centralized controller then learns to coordinate their behavior by optimizing a weighted sum of reward functions of all the agents. We evaluate our approach on a traffic-routing domain, in which a controller coordinates actions of multiple traffic signals to regulate traffic density. We show that the learner is not only able to match but even significantly outperform the expert."
}
@inproceedings{soni.acmbcb10
  ,filename = "soni.acmbcb10.pdf soni.acmbcb10.pptx"
  ,author   = "A. Soni and C. Bingman and J. Shavlik"
  ,title    = "Guiding Belief Propagation using Domain Knowledge for Protein-Structure Determination"
  ,booktitle= "Proceedings of the ACM International Conference on Bioinformatics and Computational Biology (ACM-BCB)"
  ,year     = "2010"
  ,address  = "Niagara Falls, New York"
  ,abstract = "A major bottleneck in high-throughput protein crystallography is producing protein-structure models from an electron-density map.  In previous work, we developed Acmi, a probabilistic framework for sampling all-atom protein-structure models.  Acmi uses a fully connected, pairwise Markov random field to model the 3D location of each non-hydrogen atom in a protein.  Since exact inference in this model is intractable, Acmi uses loopy belief propagation (BP) to calculate marginal probability distributions.  In cases of approximation, BP's message-passing protocol becomes a crucial design decision.  Previously, Acmi took a naive, round-robin protocol to sequentially process messages.  Others have proposed informed methods for message scheduling by ranking messages based on the amount of new information they contain.  These information-theoretic measures, however, fail in the highly connected, large output space domain of protein-structure inference.  In this work, we develop a framework for using domain knowledge as a criterion for prioritizing messages in BP.  Specifically, we show that using predictions of protein-disorder regions effectively guides BP in our task.  Our results show that guiding BP using protein-disorder prediction improves the accuracy of marginal probability distributions and also produces more accurate, complete protein-structure models."
}

@inproceedings{kunapuli.ecml10
  ,filename = "kunapuli.ecml10.pdf kunapuli.ecml10.slides.pdf kunapuli.ecml10.pptx"
  ,author   = "G. Kunapuli and K. Bennett and A. Shabbeer and R. Maclin and J. Shavlik"
  ,title    = "Online Knowledge-Based Support Vector Machines"
  ,booktitle= "Proceedings of the European Conference on Machine Learning (ECML 2010)"
  ,year     = "2010"
  ,address  = "Barcelona, Spain"
  ,abstract = "Prior knowledge, in the form of simple advice rules, can greatly speed up convergence in learning algorithms. Online learning methods predict the label of the current point and then receive the correct label (and learn from that information). The goal of this work is to update the hypothesis taking into account not just the label feedback, but also the prior knowledge, in the form of soft polyhedral advice, so as to make increasingly accurate predictions on subsequent examples.  Advice helps speed up and bias learning so that generalization can be obtained with less data. Our passive-aggressive approach updates the hypothesis using a hybrid loss that takes into account the margins of both the hypothesis and the advice on the current point. Encouraging computational results and loss bounds are provided."
}

@inproceedings{natarajan.ecml10
  ,filename = "natarajan.ecml10.pdf"
  ,author = "S. Natarajan and T. Khot and D. Lowd and P. Tadepalli and K. Kersting and J. Shavlik"
  ,title = "Exploiting Causal Independence in Markov Logic Networks: Combining Undirected and Directed Models"
  ,booktitle = "Proceedings of European Conference in Machine Learning (ECML 2010)"
  ,year = "2010"  
  ,address = "Barcelona, Spain"
  ,abstract = "A new method is proposed for compiling causal independencies into Markov logic networks (MLNs).  An MLN can be viewed as compactly representing a factorization of a joint probability into the product of a set of factors guided by logical formulas. We present a notion of causal independence that enables one to further factorize the factors into a combination of even smaller factors and consequently obtain a finer-grain factorization of the joint probability. The causal independence lets us specify the factor in terms of weighted, directed clauses and operators, such as ''or'', ''sum'' or ''max'', on the contribution of the variables involved in the factors, hence combining both undirected and directed knowledge. Our experimental evaluations shows that making use of the finer-grain factorization provided by causal independence can improve quality of parameter learning in MLNs."
}

@inproceedings{oliphant.ilp09
  ,filename = "oliphant.ilp09.pdf"
  ,author = "L. Oliphant and E. Burnside and J. Shavlik"
  ,title = "Boosting First-Order Clauses for Large, Skewed Data Sets"
  ,booktitle = "Proceedings of the Nineteenth Conference on Inductive Logic Programming"
  ,year = "2009"
  ,address = "Leuven, Belgium"
  ,abstract = "Creating an effective ensemble of clauses for large, skewed data sets requires finding a diverse, high-scoring set of clauses and then combining them in such a way as to maximize predictive performance.  We have adapted the RankBoost algorithm in order to maximize area under the recall-precision curve, a much better metric when working with highly skewed data sets than ROC curves.  We have also explored a range of possibilities for the weak hypotheses used by our modified RankBoost algorithm beyond using individual clauses.  We provide results on four large, skewed data sets showing that our modified RankBoost algorithm outperforms the original on area under the recall-precision curves."

}

@inproceedings{natarajan.icmla09,
  filename = "natarajan.icmla09.pdf natarajan.icmla09.pptx",
  author = "S. Natarajan and P. Tadepalli and G. Kunapuli and J. Shavlik",
  title = " Learning Parameters for Relational Probabilistic Models with Noisy-Or Combining Rule ",
  booktitle = "Proceedings of the International Conference on Machine Learning and Applications",
   year = "2009",
   address = "Florida",
   abstract = " Languages that combine predicate logic with probabilities are needed to succinctly represent knowledge in many real-world domains. We consider a formalism based on universally quantified conditional influence statements that capture local interactions between object attributes. The effects of different conditional influence statements can be combined using rules such as Noisy-OR. To combine multiple instantiations of the same rule we need other combining rules at a lower level. In this paper we derive and implement algorithms based on gradient-descent and EM for learning the parameters of these multi-level combining rules. We compare our approaches to learning in Markov Logic Networks and show superior performance in multiple domains."
}

@inproceedings{torrey.ilp09
  ,filename = "torrey.ilp09.pdf torrey.ilp09.pptx"
  ,author = "L. Torrey and J. Shavlik"
  ,title = "Policy Transfer via Markov Logic Networks"
  ,booktitle = "Proceedings of the Nineteenth Conference on Inductive Logic Programming"
  ,year = "2009"
  ,address = "Leuven, Belgium"
  ,abstract = "We propose using a statistical-relational model, the Markov Logic Network, for knowledge transfer in reinforcement learning. Our goal is to extract relational knowledge from a source task and use it to speed up learning in a related target task.  We show that Markov Logic Networks are effective models for capturing both source-task Q-functions and source-task policies. We apply them via demonstration, which involves using them for decision making in an initial stage of the target task before continuing to learn. Through experiments in the RoboCup simulated-soccer domain, we show that transfer via Markov Logic Networks can significantly improve early performance in complex tasks, and that transferring policies is more effective than transferring Q-functions."
}
@inproceedings{shavlik.ijcai09
  ,filename = "shavlik.ijcai09.pdf shavlik.ijcai09.pptx"
  ,author = "J.Shavlik and S. Natarajan"
  ,title = "Speeding Up Inference in Markov Logic Networks by Preprocessing
to Reduce the Size of the Resulting Grounded Network"
  ,booktitle = "Proceedings of the Twenty-first International Joint Conference on
  Artificial Intelligence"
  ,year = "2009"
  ,address = "Pasadena, California"
  ,abstract = "Statistical-relational reasoning has received much attention
due to its ability to robustly model complex relationships. A key challenge
is tractable inference, especially in domains involving many objects,
due to the combinatorics involved. One can accelerate inference by using
approximation techniques, ``lazy'' algorithms, etc. We consider Markov Logic
Networks (MLNs), which involve counting how often logical formulae are
satisfied.
We propose a preprocessing algorithm that can substantially reduce the
effective size of MLNs
by rapidly counting how often the evidence satisfies each formula,
regardless of the truth values of the query literals.
This is a general preprocessing method that loses no information and can be
used for any MLN inference algorithm.
We evaluate our algorithm empirically in three real-world domains, greatly
reducing the work needed during subsequent inference.
Such reduction might even allow exact inference to be performed when
sampling methods would be otherwise necessary."
}   
@inproceedings{walker.ilp07
  ,filename = "walker.ilp07.pdf walker.ilp07-poster.pdf"
  ,author = "T. Walker and L. Torrey and J. Shavlik and  R. Maclin"
  ,title = "Building Relational World Models for Reinforcement Learning"
  ,booktitle = "Proceedings of the Seventeenth Conference on Inductive Logic Programming"
  ,year = "2007"
  ,address = "Corvallis, Oregon"
  ,abstract = "Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capacity to exploit the relational nature of the domain. Our algorithm, AMBIL, constructs relational world models in the form of relational Markov decision processes (MDPs). AMBIL works backwards from collections of high-reward states, utilizing inductive logic programming to learn their preimage, logical definitions of the region of state space that leads to the high-reward states via some action. These learned preimages are chained together to form an MDP that abstractly represents the domain. AMBIL estimates the reward and transition probabilities of this MDP from past experience. Since our MDPs are small, AMBIL uses value iteration to quickly estimate the Q-values of each action in the induced states and determine a policy. AMBIL is able to employ complex background knowledge and supports relational representations. Empirical evaluation on both synthetic domains and a sub-task of the RoboCup soccer domain shows significant performance gains compared to standard Q-learning."
}

@inproceedings{oliphant.ilp07
 ,filename = "oliphant.ilp07.pdf oliphant.ilp07-presentation.ppt oliphant.ilp07-poster.ps"
 ,author = "L. Oliphant and J. Shavlik"
 ,title = "Using Bayesian Networks to Direct Stochastic Search in Inductive Logic Programming"
 ,booktitle = "Proceedings of the Seventeenth Conference on Inductive Logic Programming"
 ,year = "2007"
 ,address = "Corvallis, Oregon"
 ,abstract = "Stochastically searching the space of candidate clauses is an appealing way to scale up ILP to large datasets.  We address an approach that uses a Bayesian network model to adaptively guide search in this space.  We examine guiding search towards areas that previously performed well and towards areas that ILP has not yet thoroughly explored.  We show improvement in area under the curve for recall-precision curves using these modifications."
}
   
@inproceedings{dimaio.bibm07
 ,filename = "dimaio.bibm07.pdf"
 ,author = "F. DiMaio and A. Soni and G. Phillips and J. Shavlik"
 ,title = "Improved Methods for Template-Matching in Electron-Density Maps Using Spherical Harmonics"
 ,booktitle = "Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'07)"
 ,year = 2007
 ,address = "Fremont, CA"
 ,code		 = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/"
 ,data		 = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/"
 ,abstract = "An important problem in high-throughput protein crystallography is constructing a protein model from an electron-density map.  Previous work by some of this paper's authors describes an automated approach to this otherwise time-consuming process.   An important step in the previous method requires searching the density map for many small template protein-fragments.  This previous approach uses Fourier convolution to quickly match some of the template to the entire density map.  This paper proposes an alternate approach that makes use of the spherical-harmonic decomposition of the template and of some region in the density map.  This new framework allows us to mask specific regions of the map, enabling a first-pass filter to eliminate a majority of the density map without requiring an expensive rotational search.  We show our new template matching method improves accuracy and reduces running time, compared to our previous approach.  Protein models constructed using this matching also show significant accuracy improvement."
}
@inproceedings{goadrich.ilp07
  ,filename = "goadrich.ilp07.pdf"
  ,author = "M. Goadrich and J. Shavlik"
  ,title = "Combining Clauses with Various Precisions and Recalls to Produce
  Accurate Probabilistic Estimates"
  ,booktitle = "Proceedings of the Seventeenth Conference on Inductive Logic Programming"
  ,year = "2007"
  ,address = "Corvallis, Oregon"
  ,abstract = "Statistical Relational Learning (SRL) combines the benefits of
  probabilistic machine learning approaches with complex, structured domains
  from Inductive Logic Programming (ILP).  We propose a new SRL algorithm,
  GleanerSRL, to generate the probability that an example is positive within
  highly-skewed relational domains. In this work, we combine clauses from
  Gleaner, an ILP algorithm for learning a wide variety of first-order clauses,
  with the propositional learning technique of support vector machines to learn
  well-calibrated probabilities. We find that our results are comparable to SRL
  algorithms SAYU and SAYU-VISTA on a well-known relational testbed. "
}

@inproceedings{torrey.ilp07
  ,filename = "torrey.ilp07.pdf torrey.ilp07.ppt torrey.ilp07-poster.pdf"
  ,author = "L. Torrey and J. Shavlik and T. Walker and R. Maclin"
  ,title = "Relational Macros for Transfer in Reinforcement Learning"
  ,booktitle = "Proceedings of the Seventeenth Conference on Inductive Logic Programming"
  ,year = "2007"
  ,address = "Corvallis, Oregon"
  ,abstract = "We describe an application of inductive logic programming 
to transfer learning. Transfer learning is the use of knowledge learned 
in a source task to improve learning in a related target task. The tasks 
we work with are in reinforcement-learning domains. Our approach transfers relational macros, which are finite-state machines in which the transition conditions and the node actions are represented by first-order logical clauses. We use inductive logic programming to learn a macro that characterizes successful behavior in the source task, and then use the macro for decision-making in the early learning stages of the target task. Through experiments in the RoboCup simulated soccer domain, we show that Relational Macro Transfer via Demonstration (RMT-D) from a source task can provide a substantial head start in the target task."
}
@inproceedings{maclin.aaai07
  ,filename = "maclin.aaai07.pdf maclin.aaai07.ppt"
  ,author = "R. Maclin and E. Wild and J. Shavlik and L. Torrey and T. Walker"
  ,title = "Refining Rules Incorporated into Knowledge-Based Support Vector Learners Via Successive Linear Programming"
  ,booktitle = "Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI'07)"
  ,year = 2007
  ,address = "Vancouver, British Columbia"
  ,abstract = "Knowledge-based classification and regression methods are especially powerful forms of learning. They allow a system to take advantage of prior domain knowledge supplied either by a human user or another algorithm, combining that knowledge with data to produce accurate models. A limitation of the use of prior knowledge occurs when the provided knowledge is incorrect. Such knowledge likely still contains useful information, but knowledge-based learners might not be able to fully exploit such information. In fact, incorrect knowledge can lead to poorer models than result from knowledge-free learners. We present a support-vector method for incorporating and refining domain knowledge that not only allows the learner to make use of that knowledge, but also suggests changes to the provided knowledge. Our approach is built on the knowledge-based classification and regression methods presented by Fung, Mangasarian, & Shavlik (2002; 2003) and by Mangasarian, Shavlik, & Wild (2004). Experiments on artificial data sets with known properties, as well as on a real-world data set, demonstrate that our method learns more accurate models while also adjusting the provided rules in intuitive ways. Our new algorithm provides an appealing extension to knowledge-based, support-vector learning that is not only able to combine knowledge from rules with data, but is also able to use the data to modify and change those rules to better fit the data."
}

   
@inproceedings{dimaio.icdm06
	,filename  = "dimaio.icdm06.pdf dimaio.icdm06.ppt"
	,author    = "F. DiMaio and J. Shavlik"
	,title     = "Belief Propagation in Large, Highly Connected Graphs for 3D Part-Based Object Recognition"
   	,pages     = "845-850"
	,year      = 2006
	,booktitle = "Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM'06)"
	,address   = "Hong Kong"
	,code		 = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/"
	,data		 = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/"
	,note      = "A longer version of this paper appears as UW Machine Learning Research Group Working Paper 06-1."
	,abstract  = "We describe a part-based object-recognition framework, specialized to mining complex 3D objects from detailed 3D images. Objects are modeled as a collection of parts together with a pairwise potential function. An efficient inference algorithm - based on belief propagation (BP) - finds the optimal layout of parts, given some input image. We introduce AggBP, a message aggregation scheme for BP, in which groups of messages are approximated as a single message. For objects consisting of N parts, we reduce CPU time and memory requirements from O(N^2) to O(N).  We apply AggBP on synthetic data as well as a real-world task identifying protein fragments in three-dimensional images.  These experiments show that our improvements result in minimal loss in accuracy in significantly less time."
}

@inproceedings{torrey.ecml06
	,filename  = "torrey.ecml06.pdf torrey.ecml06.ppt"
	,author    = "L. Torrey and J. Shavlik and T. Walker and  R. Maclin"
	,title     = "Skill Acquisition via Transfer Learning and Advice Taking"
	,pages     = "425-436"
	,year      = 2006
	,code = "http://www.biostat.wisc.edu/~ml-group/RoboCup/"
	,booktitle = "Proceedings of the Seventeenth European Conference on Machine Learning (ECML'06)"
	,address   = "Berlin, Germany"
	,abstract  = "We describe a reinforcement learning system that transfers skills from a previously learned source task to a related target task. The system uses inductive logic programming to analyze experience in the source task, and transfers rules for when to take actions. The target task learner accepts these rules through an advice-taking algorithm, which allows learners to benefit from outside guidance that may be imperfect.  Our system accepts a human-provided mapping, which specifies the similarities between the source and target tasks and may also include advice about the differences between them. Using three tasks in the RoboCup simulated soccer domain, we demonstrate that this system can speed up reinforcement learning substantially."
}

@inproceedings{chen.vldb06
	,filename  = "chen.vldb06.pdf"
	,author    = "B. Chen and R. Ramakrishnan and J. Shavlik and P. Tamma"
	,title     = "Bellwether Analysis: Predicting Global Aggregates from Local Regions"
	,pages     = "655-666"
	,year      = 2006
	,booktitle = "Proceedings of the Thirty-Second International Conference on Very Large Data Bases (VLDB'06)"
	,address   = "Seoul, Korea"
	,abstract  = "Massive datasets are becoming commonplace in a wide range of domains, and mining them is recognized as a challenging problem with great potential value. Motivated by this challenge, much effort has been concentrated on developing scalable versions of machine learning algorithms. An often overlooked issue is that large datasets are rarely labeled with the outputs that we wish to learn to predict, due to the human labor required. We make the key observation that analysts can often use queries to define labels for cases, which leads to the problem of learning to predict such query-produced labels. Of course, if a dataset is available in its entirety, we can simply run the query again to compute labels. The interesting scenarios are those where, after the predictive model is trained, new data is gathered at significant incremental cost and, perhaps, over time. The challenge is to accurately  predict the query-labels for the projected completion of new datasets, based only on certain cost-effective subsets, which we call bellwethers."
}

@inproceedings{maclin.aaai06
	,filename  = "maclin.aaai06.pdf maclin.aaai06.ppt"
	,author    = "R. Maclin and J. Shavlik and T. Walker and L. Torrey"
	,title     = "A Simple and Effective Method for Incorporating Advice into Kernel Methods"
	,year      = 2006
	,booktitle = "Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI'06)"
	,address   = "Boston, MA"
	,abstract  = "We propose a simple mechanism for incorporating advice (prior knowledge), in the form of simple rules, into support-vector methods for both classification and regression.  Our approach is based on introducing inequality constraints associated with datapoints that match the advice. These constrained datapoints can be standard examples in the training set, but can also be unlabeled data in a semi-supervised, advice-taking approach. Our new approach is simpler to implement and more efficiently solved than the knowledge-based support vector classification methods of Fung, Mangasarian and Shavlik (2002; 2003) and the knowledge-based support vector regression method of Mangasarian, Shavlik, and Wild (2004), while performing approximately as well as these more complex approaches. Experiments using our new approach on a synthetic task and a reinforcementlearning problem within the RoboCup soccer simulator show that our advice-taking method can significantly outperform a method without advice and perform similarly to prior advice-taking, support-vector machines."
}

@inproceedings{davis.icml06
	,filename  = "davis.icml06.pdf"
	,author    = "J. Davis and M. Goadrich"
	,title     = "The Relationship Between Precision-Recall and ROC Curves"
	,year      = 2006
	,booktitle = "Proceedings of the Twenty-Third International Conference on Machine Learning (ICML'06)"
	,address   = "Pittsburgh, PA"
	,abstract  = "Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance. We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. Finally, we also note differences in the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve."
}

@inproceedings{torrey.ecml05
	,filename  = "torrey.ecml05.pdf torrey.ecml05.ppt"
	,author    = "L. Torrey and T. Walker and J. Shavlik and R. Maclin"
	,title     = "Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another"
   	,pages     = "412-424"
	,year      = 2005
	,booktitle = "Proceedings of the Sixteenth European Conference on Machine Learning (ECML'05)"
	,address   = "Porto, Portugal"
	,abstract  = "We present a method for transferring knowledge learned in one task to a related task. Our problem solvers employ reinforcement learning to acquire a model for one task.  We then transform that learned model into advice for a new task.  A human teacher provides a mapping from the old task to the new task to guide this knowledge transfer.  Advice is incorporated into our problem solver using a knowledge-based support vector regression method that we previously developed.  This advice-taking approach allows the problem solver to refine or even discard the transferred knowledge based on its subsequent experiences.  We empirically demonstrate the effectiveness of our approach with two games from the RoboCup soccer simulator: KeepAway and BreakAway.  Our results demonstrate that a problem solver learning to play BreakAway using advice extracted from KeepAway outperforms a problem solver learning without the benefit of such advice."
}

@inproceedings{torrey.kcap05
    ,filename	= "torrey.kcap05.pdf"
    ,author     = "L. Torrey and T. Walker and J. Shavlik and R. Maclin"
    ,title      = "Knowledge Transfer Via Advice Taking"
   ,pages       = "217-218"
    ,year       = "2005"
    ,booktitle  = "Proceedings of the Third International Conference on Knowledge Capture (KCAP'05)"
    ,address    = "Banff, Canada"
    ,abstract   = "We present a framework for knowledge transfer from one reinforcement learning task to a related task through advicetaking mechanisms. We discuss the importance of transfer in complex domains such as RoboCup soccer, and show how touse automatically generated advice to perform transfer."
}

@inproceedings{bravo.ilp05
   ,filename	= "bravo.ilp05.pdf"
   ,author	= "H.C. Bravo and D. Page and R. Ramakrishnan and J. Shavlik and V. Santos Costa"
   ,title	= "A Framework for Set-Oriented Computation in Inductive Logic Programming and its Application in Generalizing Inverse Entailment"
   ,booktitle	= "Proceedings of the Fifteenth International Conference on Inductive Logic Programming"
   ,pages       = "69-86"
   ,year	= "2005"
   ,address	= "Bonn, Germany"
   ,abstract	= "We propose a new approach to Inductive Logic Programming that systematically exploits caching and offers a number of advantages over current systems. It avoids redundant computation, is more amenable to the use of set-oriented generation and evaluation of hypotheses, and allows relational DBMS technology to be more easily applied to ILP systems. Further, our approach opens up new avenues such as probabilistically scoring rules during search and the generation of probabilistic rules. As a first example of the benefits of our ILP framework, we propose a scheme for defining the hypothesis search space through Inverse Entailment using multiple example seeds."
}

@inproceedings{maclin.aaai05
    ,filename = "maclin.aaai05.pdf maclin.aaai05.ppt"
    ,author = "R. Maclin and J. Shavlik and L. Torrey and T. Walker and E. Wild" 
    ,title = "Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression"
    ,booktitle = "Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI'05)"
    ,pages = "819-824"
    ,year = 2005
	,code = "http://www.biostat.wisc.edu/~ml-group/RoboCup/"
    ,address = "Pittsburgh, PA"
    ,abstract = "We present a novel formulation for providing advice to a reinforcement learner that employs support-vector regression as its function approximator.  Our new method extends a recent advice-giving technique, called Knowledge-Based Kernel Regression (KBKR), that accepts advice concerning a single action of a reinforcement learner.  In KBKR, users can say that in some set of states, an action's value should be greater than some linear expression of the current state.  In our new technique, which we call Preference KBKR ( Pref-KBKR ) , the user can provide advice in a more natural manner by recommending that some action is preferred over another in the specified set of states.  Specifying preferences essentially means that users are giving advice about policies rather than Q values, which is a more natural way for humans to present advice.  We present the motivation for preference advice and a proof of the correctness of our extension to KBKR.  In addition, we show empirical results that our method can make effective use of advice on a novel reinforcement-learning task, based on the RoboCup simulator, which we call Breakaway.  Our work demonstrates the significant potential of advice-giving techniques for addressing complex reinforcement learning problems, while further demonstrating the use of support-vector regression for reinforcement learning."
}

@inproceedings{davis.ijcai05
   ,filename	= "davis.ijcai05.pdf"
   ,author	= "J. Davis and E. Burnside and I. Dutra and D. Page and R. Ramakrishnan and V. Santos Costa and J. Shavlik"
   ,title	= "View Learning for Statistical Relational Learning: With an Application to Mammography"
   ,booktitle	= "Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence"
   ,pages       = "677-683"
   ,year	= "2005"
   ,address	= "Edinburgh, Scotland"
   ,abstract	= "Statistical relational learning (SRL) constructs probabilistic models from relational databases. A key capability of SRL is the learning of arcs (in the Bayes net sense) connecting entries in different rows of a relational table, or in different tables.  Nevertheless, SRL approaches currently are constrained to use the existing database schema. For many database applications, users find it profitable to define alternative ''views'' of the database, in effect defining new fields or tables. Such new fields or tables can also be highly useful in learning. We provide SRL with the capability of learning new views."
}

@inproceedings{dimaio-nips05
   ,filename	= "dimaio.nips05.pdf dimaio.nips05.doc dimaio.nips05-poster.pdf"
   ,author	= "F. DiMaio and J. Shavlik and G. Phillips"
   ,title	= "Pictorial Structures for Molecular Modeling: Interpreting Density Maps"
   ,booktitle	= "Advances in Neural Information Processing Systems (NIPS) 17"
   ,year	= "2005"
   ,pages       = "369-376"
   ,address	= "Vancouver, Canada"
   ,abstract	= "X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence.  Matching this pictorial structure into the density map is a way of automating density-map interpretation.  Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation.  DEFT is tested on a set of density maps ranging from 2 to 4 resolution, producing root-mean-squared errors ranging from 1.38 to 1.84 "
}

@inproceedings{goadrich-ilp04
   ,filename	= "goadrich.ilp04.pdf goadrich.ilp04.ppt"
   ,author	= "M. Goadrich and L. Oliphant and J. Shavlik"
   ,title	= "Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction"
   ,booktitle	= "Proceedings of the Fourteenth International Conference on Inductive Logic Programming"
   ,year	= "2004"
   ,pages       = "98-115"
   ,address	= "Porto, Portugal"
   ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/IE-protein-location/"
   ,abstract	= "Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. Our research has focused on Information Extraction (IE), a task that typically involves many more negative examples than positive examples. IE is the process of finding facts in unstructured text, such as biomedical journals, and putting those facts in an organized system. In particular, we have focused on learning to recognize instances of the protein-localization relationship in Medline abstracts. We view the problem as a machine-learning task: given positive and negative extractions from a training corpus of abstracts, learn a logical theory that performs well on a held-aside testing set. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. We propose Gleaner, a randomized search method which collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an ''at least N of these M clauses'' thresholding method to combine the selected clauses. We compare Gleaner to ensembles of standard Aleph theories and find that Gleaner produces comparable testset results in a fraction of the training time needed for ensembles."
}

@inproceedings{dimaio-ilp04
   ,filename	= "dimaio.ilp04.pdf dimaio.ilp04.ppt dimaio.ilp04.doc"
   ,author	= "F. DiMaio and J. Shavlik"
   ,title	= "Learning an Approximation to Inductive Logic Programming Clause Evaluation"
   ,booktitle	= "Proceedings of the Fourteenth International Conference on Inductive Logic Programming"
   ,year	= "2004"
   ,pages       = "80-97"
   ,address	= "Porto, Portugal"
   ,abstract	= "One challenge faced by many Inductive Logic Programming (ILP) systems is poor scalability to problems with large search spaces and many examples. Randomized search methods such as stochastic clause selection (SCS) and rapid random restarts (RRR) have proven somewhat successful at addressing this weakness. However, on datasets where hypothesis evaluation is computationally expensive, even these algorithms may take unreasonably long to discover a good solution.  We attempt to improve the performance of these algorithms on datasets by learning an approximation to ILP hypothesis evaluation.   We generate a small set of hypotheses, uniformly sampled from the space of candidate hypotheses, and evaluate this set on actual data.  These hypotheses and their corresponding evaluation scores serve as training data for learning an approximate hypothesis evaluator.  We outline three techniques that make use of the trained evaluation-function approximator in order to reduce the computation required during an ILP hypothesis search.  We test our approximate clause evaluation algorithm using the popular ILP system Aleph. Empirical results are provided on several benchmark datasets.  We show that the clause evaluation function can be accurately approximated."
}

@inproceedings{shavlik.kdd04
   ,filename	= "shavlik.kdd04.pdf shavlik.kdd04.doc shavlik.kdd04.ppt"
   ,author	= "J. Shavlik and M. Shavlik"
   ,title	= "Selection, Combination, and Evaluation of Effective Software Sensors for Detecting Abnormal Computer Usage"
   ,booktitle	= "Proceedings of the Tenth International Conference on Knowledge Discovery and Data Mining"
   ,year	= "2004"
   ,pages       = "276-285"
   ,address	= "Seattle, WA"
   ,abstract	= "We present and empirically analyze a machine-learning approach for detecting intrusions on individual computers. Our Winnow-based algorithm continually monitors user and system behavior, recording such properties as the number of bytes transferred over the last 10 seconds, the programs that currently are running, and the load on the CPU. In all, hundreds of measurements are made and analyzed each second. Using this data, our algorithm creates a model that represents each particular computer's range of normal behavior. Parameters that determine when an alarm should be raised, due to abnormal activity, are set on a per-computer basis, based on an analysis of training data. A major issue in intrusion-detection systems is the need for very low false-alarm rates. Our empirical results suggest that it is possible to obtain high intrusion-detection rates (95%) and low false-alarm rates (less than one per day per computer), without ''stealing'' too many CPU cycles (less than 1%). We also report which system measurements are the most valuable in terms of detecting intrusions. A surprisingly large number of different measurements prove significantly useful."
}

@inproceedings{molla.csb04
   ,filename	= "molla.csb04.pdf molla.csb04.doc molla.csb04.ppt"
   ,author	= "M. Molla and J. Shavlik and T. Albert and T. Richmond and S. Smith"
   ,title	= "A Self-Tuning Method for One-Chip SNP Identification"
   ,booktitle	= "Proceedings of the IEEE Conference on Computational Systems Bioinformatics"
   ,year	= "2004"
   ,pages       = "69-79"
   ,address	= "Stanford, CA"
   ,abstract	= "Current methods for interpreting oligonucleotide-based SNP-detection microarrays, SNP chips, are based on statistics and require extensive parameter tuning as well as extremely high-resolution images of the chip being processed.  We present a method, based on a simple data-classification technique called nearest-neighbors that, on haploid organisms, produces results comparable to the published results of the leading statistical methods and requires very little in the way of parameter tuning.  Furthermore, it can interpret SNP chips using lower-resolution scanners of the type more typically used in current microarray experiments. Along with our algorithm, we present the results of a SNP-detection experiment where, when independently applying this algorithm to six identical SARS SNP chips, we correctly identify all 24 SNPs in a particular strain of the SARS virus, with between 6 and 13 false positives across the six experiments."
}

@inproceedings{baiao.ilp03
   ,filename	= "baiao.ilp03.pdf baiao.ilp03.ps baiao.ilp03.doc"
   ,author	= "F. Baiao and M. Mattoso and J. Shavlik and G. Zaverucha"
   ,title	= "Applying Theory Revision to the Design of Distributed Databases"
   ,booktitle	= "Proceedings of the Thirteenth International Conference on Inductive Logic Programming"
   ,pages       = "57-74"
   ,year	= "2003"
   ,address	= "Szeged, Hungary"
   ,abstract	= "This work presents the application of theory revision to the design of distributed databases to automatically revise a heuristic-based algorithm (called analysis algorithm) through the use of the FORTE system. The analysis algorithm decides the fragmentation technique to be used in each class of the database and its Prolog implementation is provided as the initial domain theory.  Fragmentation schemas with previously known performance, obtained from experimental results on top of an object database benchmark, are provided as the set of examples. We show the effectiveness of our approach in finding better fragmentation schemas with improved performance."
}

@inproceedings{fung.colt03
   ,filename	= "fung.colt03.pdf fung.colt03.ps"
   ,author	= "G. Fung and O. Mangasarian and J. Shavlik"
   ,title	= "Knowledge-Based Nonlinear Kernel Classifiers"
   ,booktitle   = "16th Annual Conference on Learning Theory (COLT) and 7th Annual Workshop on Kernel Machines, Proceedings"
   ,year	= "2003"
   ,address	= "Washington, DC"
   ,abstract	= "Prior knowledge in the form of multiple polyhedral sets, each belonging to one of two categories, is introduced into a reformulation of a nonlinear kernel support vector machine (SVM) classifier. The resulting formulation leads to a linear program that can be solved efficiently.  This extends, in a rather unobvious fashion, previous work that incorporated similar prior knowledge into a linear SVM classifier. Numerical tests on standard-type test problems, such as exclusive-or prior knowledge sets and a checkerboard with 16 points and prior knowledge instead of the usual 1000 points, show the effectiveness of the proposed approach in generating sharp nonlinear classifiers based mostly or totally on prior knowledge."
}

@inproceedings{dutra.euro03
   ,filename    = "dutra.euro03.pdf dutra.euro03.ps"
   ,author	= "I. Dutra and D. Page and V. Santos Costa and J. Shavlik and M. Waddell"
   ,title	= "Toward Automatic Management of Embarrassingly Parallel Applications"
   ,booktitle   = "Proceedings of International Conference on Parallel and Distributed Computing (Euro-Par)"
   ,year	= "2003"
   ,address     = "Klagenfurt, Austria"
   ,abstract    = "Large-scale applications that require executing very large
numbers of tasks are only feasible through parallelism.  In this work we
present a system that automatically handles large numbers of experiments
and data in the context of machine learning.  Our system controls all experiments,
including re-submission of failed jobs and relies on available resource managers
to spawn jobs through pools of machines.  Our results show that we can manage a 
very large number of experiments, using a reasonable amount of idle CPU cycles,
with very little user intervention."
}

@inproceedings{fung.nips02
   ,filename    = "fung.nips02.pdf fung.nips02.ps"
   ,author	= "G. Fung and O. Mangasarian and J. Shavlik"
   ,title	= "Knowledge-Based Support Vector Machine Classifiers"
   ,booktitle   = "Proceedings of Sixteenth Conference on Neural Information Processing Systems (NIPS)"
   ,year	= "2002"
   ,address     = "Vancouver, Canada"
   ,abstract    = "Prior knowledge in the form of multiple polyhedral sets, each
  belonging to one of two categories, is introduced into a reformulation of a linear
  support vector machine classifier.  The resulting formulation leads to a linear program
  that can be solved efficiently.  Real world examples, from DNA sequencing and breast
  cancer prognosis, demonstrate the effectiveness of the proposed method.  Numerical 
  results show improvement in the test set accuracy after the incorporation of prior 
  knowledge into ordinary, data-based linear support vector machine classifiers.  One
  experiment also shows that a linear classifier, based solely on prior knowledge, far
  outperforms the direct application of prior knowledge rules to classify data."
}

@inproceedings{dutra.ilp02
   ,filename    = "dutra.ilp02.pdf dutra.ilp02.ps"
   ,author	= "I. Dutra and D. Page and V. Santos Costa and J. Shavlik"
   ,title	= "An Empirical Evaluation of Bagging in Inductive Logic Programming"
   ,booktitle   = "Proceedings of the Twelfth International Conference on Inductive Logic Programming"
   ,year	= "2002"
   ,pages 	= "48--65"
   ,address     = "Sydney, Australia"
   ,abstract    = "Ensembles have proven useful for a variety of applications, with a
  variety of machine learning approaches.  While Quinlan has applied
  boosting to FOIL, the widely-used approach of bagging has never been
  employed in ILP.  Bagging has the advantage over boosting that the
  different members of the ensemble can be learned and used in
  parallel.  This advantage is especially important for ILP where
  run-times often are high.  We evaluate bagging on three different
  application domains using the complete-search ILP system, Aleph. We
  contrast bagging with an approach where we take advantage of the
  non-determinism in ILP search, by simply allowing Aleph to run multiple
  times, each time choosing ''seed'' examples at random."
}

@inproceedings{shavlik.raid01
   ,filename    = "shavlik.raid01.pdf"
   ,author	= "J. Shavlik and M. Shavlik and M. Fahland"
   ,title	= "Evaluating Software Sensors for Actively Profiling Windows 2000
                   Computer Users"
   ,booktitle   = "Presented at the Fourth International Symposium on Recent Advances
                   in Intrusion Detection"
   ,year	= "2001"
   ,address     = "Davis, CA"
   ,abstract    = "We report on a new, on-going intrusion-detection project
   that empirically investigates the usefulness of ''stealing'' a small
   amount of CPU cycles (1%), main memory (16MB), and disk memory (100 MB)
   in order to continually gather and analyze dozens of fine-grained system
   measurements, such as network traffic, identity of the current programs
   executing, and the user's typing speed.  The underlying scientific
   hypothesis is that a properly chosen set of measurements can provide a
   ''fingerprint'' that is unique to each user.  Hence, such measurements
   could serve to help distinguish appropriate use of a given computer from
   misuse, especially by insiders. "
}

@inproceedings{eliassi-rad.icml01
   ,filename    = "eliassi-rad.icml01.pdf eliassi-rad.icml01.ps"
   ,author	= "T. Eliassi-Rad and J. Shavlik"
   ,title	= "A Theory-Refinement Approach to Information Extraction"
   ,booktitle   = "Proceedings of the Eighteenth International Conference on 
                   Machine Learning"
   ,year	= "2001"
   ,address     = "Williamstown, MA"
   ,abstract    = "We investigate applying theory refinement to the task of extracting
   information from text.  In theory refinement, partial domain knowledge
   (which may be incorrect) is given to a supervised learner.  The provided
   knowledge guides the learner in its task, but the learner can refine or
   even discard this knowledge during training.  Our supervised learner 
   is a knowledge-based neural network that initially contains compiled 
   prior knowledge about a particular information extraction (IE) task.  
   The prior knowledge needs to specify the extraction slots for the specific 
   IE task.  Our approach uses generate-and-test to address the IE task.  
   In the generation step, we produce candidate extractions by intelligently 
   searching the space of possible extractions.  In the test step, we 
   use the trained network to judge each candidate and output those that exceed 
   a system-selected threshold.  Experiments on the CMU seminar-announcements
   and the Yeast subcellular-localization domains demonstrate our approach's value."
}

@inproceedings{craven.ismb2000
 ,filename     = "craven-ismb00.pdf craven-ismb00.ps"
 ,author       = "M. Craven and D. Page and J. Shavlik and J. Bockhorst and J. Glasner"
 ,title        = "Probabilistic Learning Approach to Whole-Genome Operon Prediction"
 ,booktitle    = "Proceedings of the Eighth International Conference on Intelligent 
 Systems for Molecular Biology"
 ,year         = 2000 
 ,address      = "La Jolla, CA"
 ,abstract     = "We present a computational approach to predicting operons 
 in the genomes of prokaryotic organisms.  Our approach uses machine learning
 methods to induce predictive models for this task from a rich variety of 
 data types including sequence data, gene expression data, and functional 
 annotations associated with genes.  We use multiple learned models that 
 individually predict promoters, terminators and operons themselves.  A key 
 part of our approach is a dynamic programming method that uses our predictions
 to map every known and putative gene in a given genome into its most
 probable operon.  We evaluate our approach using data from the E. coli K-12 
 genome."
}

@inproceedings{craven.icml2000
 ,filename     = "craven-icml00.pdf craven-icml00.ps"
 ,author       = "M. Craven and D. Page and J. Shavlik and J. Bockhorst and J. Glasner"
 ,title        = "Using Multiple Levels of Learning and Diverse Evidence Sources
                 to Uncover Coordinately Controlled Genes"
 ,booktitle    = "Proceedings of the Seventeenth International Conference on Machine 
		 Learning"
 ,year         = 2000 
 ,address      = "Palo Alto, CA"
 ,abstract     = "Now that the complete genomes of numerous organisms have been
 determined, a key problem in computational molecular biology is uncovering the
 relationships that exist among the genes in each organism and the regulatory 
 mechanisms that control their operation.  We are developing computational methods 
 for discovering such regulatory mechanisms and relationships.  Toward this end, 
 we have developed a machine learning approach to identifying sets of genes that 
 are coordinately controlled in the E. coli genome.  A number of factors make this
 an interesting application for machine learning: (i) there is a rich variety of 
 data types that provide useful evidence for this task, (ii) the overall problem 
 of uncovering regulatory mechanisms can be decomposed in multiple machine learning
 subtasks operating at different levels of detail, (iii) there are not any known 
 negative training examples, and (iv) some of the features are misleading in their 
 predictiveness."
}

@inproceedings{goecks.iui2000
 ,filename     = "goecks-iui2000.doc goecks-iui2000.html"
 ,author       = "J. Goecks and J. Shavlik"
 ,title        = "Learning Users' Interests by Unobtrusively Observing 
		  Their Normal Behavior"
 ,booktitle    = "Proceedings of the 2000 International Conference on 
		  Intelligent User Interfaces"
 ,year         = 2000
 ,address      = "New Orleans, LA"
 ,pages	       = "129-132"
 ,abstract     = "For intelligent interfaces attempting to learn a user's interests, 
 the cost of obtaining labeled training instances is prohibitive because the user 
 must directly label each training instance, and few users are willing to do so. 
 We present an approach that circumvents the need for human-labeled pages. Instead, 
 we learn ``surrogate'' tasks where the desired output is easily measured, such as the
 number of hyperlinks clicked on a page or the amount of scrolling performed. Our 
 assumption is that these outputs will highly correlate with the user's interests. 
 In other words, by unobtrusively ``observing'' the user's behavior we are able to 
 learn functions of value. For example, an intelligent browser could silently observe 
 the user's browsing behavior during the day, then use these training examples to 
 learn such functions and gather, during the middle of the night, pages that are 
 likely to be of interest to the user. Previous work has focused on learning a user 
 profile by passively observing the hyperlinks clicked on and those passed over. We 
 extend this approach by measuring user mouse and scrolling activity in addition to 
 user browsing activity. We present empirical results that demonstrate our agent can
 accurately predict some easily measured aspects of one's use of his or her browser."
}
 
@inproceedings{shavlik.iui99
  ,filename     = "shavlik.iui99.pdf shavlik.iui99.ps"
  ,author       = "J. Shavlik and S. Calcari and T. Eliassi-Rad and J. Solock"
  ,title        = "An Instructable, Adaptive Interface for Discovering and
                   Monitoring Information on the World-Wide Web"
  ,booktitle    = "Proceedings of the 1999 International Conference on Intelligent
		   User Interfaces"
  ,year         = 1999
  ,address      = "Redondo Beach, CA"
  ,pages 	= "157 - 160"
  ,abstract     =
"We are creating a customizable, intelligent interface to the World-Wide Web
that assists a user in locating specific, current, and relevant information.
The Wisconsin Adaptive Web Assistant (WAWA) is capable of accepting instructions
regarding what type of information the users are seeking and how to go about 
looking for it.  WAWA compiles these instructions into neural networks, which
means that the system's behavior can be modified via training examples.  Users
can create these training examples by rating pages retrieved by WAWA, but more
importantly the system uses techniques from reinforcement learning to internally
create its own examples (users can also later provide additional instructions).
WAWA uses these neural networks to guide its autonomous navigation of the Web,
thereby producing an interface to the Web that users periodically instruct and 
which in the background searches the Web for relevant information, including 
periodically revisiting pages that change regularly."
}

@inproceedings{allex.ismb97
  ,filename     = "allex.ismb97.pdf allex.ismb97.ps"
  ,author       = "C. Allex and S. Baldwin and J. Shavlik and F. Blattner"
  ,title        = "Increasing Consensus Accuracy in DNA Fragment Assemblies
by Incorporating Fluorescent Trace Representations"
  ,booktitle    = "Proceedings, Fifth International Conference on Intelligent
Systems for Molecular Biology"
  ,year         = 1997
  ,pages        = ""
  ,publisher    = "AAAI Press"
  ,pub_addr     = "Menlo Park, CA"
  ,address      = "Halkidiki, Greece"
  ,abstract     =
"We present a new method for determining the consensus sequence in DNA fragment
assemblies. The new method, Trace-Evidence, directly incorporates aligned ABI
trace information into consensus calculations via our previously described
representation, Trace-Data Classifications. The new method extracts and sums
evidence indicated by the representation to determine consensus calls. Using
the Trace-Evidence method results in automatically produced consensus sequences
that are more accurate and less ambiguous than those produced with standard
majority-voting methods. Additionally, these improvements are achieved with
less coverage than required by the standard methods  using Trace-Evidence and
a coverage of only three, error rates are as low as those with a coverage of
over ten sequences."
}


@inproceedings{allex.ismb96 
  ,filename     = "allex.ismb96.pdf allex.ismb96.ps"
  ,author       = "C. Allex and S. Baldwin and J. Shavlik and F. Blattner"
  ,title        = "Improving the Quality of Automatic DNA Sequence Assembly using Fluorescent Trace-Data Classifications"
  ,booktitle    = "Proceedings, Fourth International Conference on Intelligent Systems for Molecular Biology"
  ,year         = 1996
  ,pages        = "3-14"
  ,publisher    = "AAAI Press"
  ,pub_addr     = "Menlo Park, CA"
  ,address      = "St. Louis, MO"
  ,abstract     =
"Virtually all large-scale sequencing projects use automatic sequence-assembly programs to aid in the determination of DNA sequences. The computer-generated assemblies require substantial hand-editing to transform them into submissions for GenBank. As the size of sequencing projects increases, it becomes essential to improve the quality of the automated assemblies so that this time-consuming hand-editing may be reduced. Current ABI sequencing technology uses base calls made from fluorescently-labeled DNA fragments run on gels. We present a new representation for the fluorescent trace data associated with individual base calls. This representation can be used before, during, and after fragment assembly to improve the quality of assemblies. We demonstrate one such use -- end-trimming of sub-optimal data -- that results in a significant improvement in the quality of subsequent assemblies."
}

@inproceedings{cherkauer.kdd96
  ,filename	= "cherkauer.kdd96.pdf cherkauer.kdd96.ps"
  ,author       = "K. Cherkauer and J. Shavlik"
  ,title        = "Growing Simpler Decision Trees to Facilitate Knowledge
                  Discovery"
  ,booktitle    = "Proceedings, Second International Conference on Knowledge
                   Discovery and Data Mining"
  ,year         = 1996
  ,pages	= "315-318"
  ,publisher    = "AAAI Press"
  ,pub_addr     = "Menlo Park, CA"
  ,address      = "Portland, OR"
  ,abstract	= 
   "When using machine learning techniques for knowledge discovery, output
    that is comprehensible to a human is as important as predictive accuracy.
    We introduce a new algorithm, SET-Gen, that improves the comprehensibility
    of decision trees grown by standard C4.5 without reducing accuracy. It does
    this by using genetic search to select the set of input features C4.5 is
    allowed to use to build its tree. We test SET-Gen on a wide variety of
    real-world datasets and show that SET-Gen trees are significantly smaller
    and reference significantly fewer features than trees grown by C4.5 without
    using SET-Gen. Statistical significance tests show that the accuracies of
    SET-Gen's trees are either not distinguishable from or are more accurate
    than those of the original C4.5 trees on all ten datasets tested."
}

@inproceedings{opitz.nips96
   ,author      = "D. Opitz and J. Shavlik"
   ,title       = "Generating Accurate and Diverse Members of a Neural-Network Ensemble"
   ,booktitle	= "Advances in Neural Information Processing Systems"
   ,volume	= 8
   ,year        = 1996
   ,publisher	= "MIT Press"
   ,address     = "Denver, CO"
   ,pages 	= "535-543"
   ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/promoters-large/"
   ,filename    = "opitz.nips96.pdf opitz.nips96.ps"
   ,abstract = "Neural-network ensembles have been shown to be very accurate classification
   techniques.  Previous work has shown that an effective ensemble should 
   consist of networks that are not only highly correct, but ones that make 
   their errors on different parts of the input space as well.  Most existing 
   techniques, however, only indirectly address the problem of creating such 
   a set of networks.  In this paper we present a technique called ADDEMUP 
   that uses genetic algorithms to directly search for an accurate and diverse 
   set of trained networks.  ADDEMUP works by first creating an initial 
   population, then uses genetic operators to continually create new networks, 
   keeping the set of networks that are as accurate as possible while 
   disagreeing with each other as much as possible.  Experiments on three DNA 
   problems show that ADDEMUP is able to generate a set of trained networks 
   that is more accurate than several existing approaches.  Experiments also 
   show that ADDEMUP is able to effectively incorporate prior knowledge, if 
   available, to improve the quality of its ensemble."
}

   Note: see also opitz.mlrgwp95.ps


@inproceedings{cherkauer.nips96
   ,author      = "K. Cherkauer and J. Shavlik"
   ,title       = "Rapid Quality Estimation of Neural Network Input
                   Representations"
   ,booktitle	= "Advances in Neural Information Processing Systems"
   ,volume	= 8
   ,year        = 1996
   ,pages	= "45-51"
   ,publisher	= "MIT Press"
   ,address     = "Denver, CO"
   ,filename    = "cherkauer.nips96.pdf cherkauer.nips96.ps"
   ,abstract = "The choice of an input representation for a neural network can have a
  profound impact on its accuracy in classifying novel instances. However,
  neural networks are typically computationally expensive to train, making it
  difficult to test large numbers of alternative representations. This paper
  introduces fast quality measures for neural network representations, allowing
  one to quickly and accurately estimate which of a collection of possible
  representations for a problem is the best. We show that our measures for
  ranking representations are more accurate than a previously published
  measure, based on experiments with three difficult, real-world pattern
  recognition problems."
}

  Note: see also cherkauer.ijcai-wkshp95.ps


@inproceedings{craven.nips96
   ,author	= "M. Craven and J. Shavlik"
   ,title	= "Extracting Tree-Structured Representations of Trained Networks"
   ,booktitle	= "Advances in Neural Information Processing Systems"
   ,volume	= 8
   ,year        = 1996
   ,pages	= "24-30"
   ,address     = "Denver, CO"
   ,publisher	= "MIT Press"
   ,filename    = "craven.nips96.pdf craven.nips96.ps"
   ,code = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/trepan"
   ,abstract = "A significant limitation of neural networks is that the representations
  they learn are usually incomprehensible to humans.  We present a novel
  algorithm, TREPAN, for extracting comprehensible, symbolic representations
  from trained neural networks.  Our algorithm uses queries to induce a
  decision tree that approximates the concept represented by a given network.
  Our experiments demonstrate that TREPAN is able to produce decision
  trees that maintain a high level of fidelity to their respective networks,
  while being comprehensible and accurate.  Unlike previous work in this
  area, our algorithm is both general in its applicability and scales well
  to large networks and problems with high-dimensional input spaces."
}


@inproceedings{jackson.nips96
   ,author	= "J. Jackson and M. Craven"
   ,title	= "Learning Sparse Perceptrons"
   ,booktitle	= "Advances in Neural Information Processing Systems"
   ,volume	= 8
   ,year        = 1996
   ,pages 	= "654-662"
   ,address     = "Denver, CO"
   ,publisher	= "MIT Press"
   ,filename    = "jackson.nips96.pdf jackson.nips96.ps"
   ,abstract = "We introduce a new algorithm designed to learn sparse perceptrons
  over input representations which include high-order features.
  Our algorithm, which is based on a hypothesis-boosting method,
  is able to PAC-learn a relatively natural class of target concepts.
  Moreover, the algorithm appears to work well in practice: on a set of
  three problem domains, the algorithm produces classifiers that utilize
  small numbers of features yet exhibit good generalization performance.
  Perhaps most importantly, our algorithm generates concept descriptions
  that are easy for humans to understand."
}


@inproceedings{shavlik.icnn96
   ,author	= "J. Shavlik"
   ,title	= "An Overview of Research at Wisconsin on Knowledge-Based Neural Networks"
   ,booktitle	= "Proceedings of the International Conference on Neural Networks"
   ,year        = 1996
   ,address     = "Washington, DC"
   ,filename    = "shavlik.icnn96.pdf shavlik.icnn96.ps"
   ,pages       = "65-69"
   ,abstract = "Recent research at the University of Wisconsin on knowledge-based neural
networks is surveyed.  This work has focused on (a) using symbolically
represented background knowledge to improve neural-network learning and
(b) extracting comprehensible symbolic representations from trained networks.
Important open issues are discussed."
}


@inproceedings{maclin.ijcai95
   ,author      = "R. Maclin and J. Shavlik"
   ,title       = "Combining the Predictions of Multiple Classifiers:
		   Using Competitive Learning to Initialize Neural Networks"
   ,booktitle	= "Proceedings of the Fourteenth International Joint
		   Conference on Artificial Intelligence"
   ,year        = 1995
   ,address     = "Montreal, Canada"
   ,filename    = "maclin.ijcai95.pdf maclin.ijcai95.ps"
   ,pages       = "524-530"
   ,abstract = "The primary goal of inductive learning is to generalize well - that is,
   induce a function that accurately produces the correct output for future
   inputs.  Hansen and Salamon showed that, under certain assumptions,
   combining the predictions of several separately trained neural networks
   will improve generalization.  One of their key assumptions is that the
   individual networks should be independent in the errors they produce.  In
   the standard way of performing backpropagation this assumption may be
   violated, because the standard procedure is to initialize network weights
   in the region of weight space near the origin.  This means that
   backpropagation's gradient-descent search may only reach a small subset
   of the possible local minima.  In this paper we present an approach to
   initializing neural networks that uses competitive learning to
   intelligently create networks that are originally located far from the
   origin of weight space, thereby potentially increasing the set of
   reachable local minima.  We report experiments on two real-world
   datasets where combinations of networks initialized with our method
   generalize better than combinations of networks initialized the
   traditional way."

}
   

@inproceedings{craven.ismb95
   ,author	= "M. Craven and R. Mural and L. Hauser and E. Uberbacher"
   ,title	= "Predicting Protein Folding Classes without Overly Relying
		   on Homology"
   ,booktitle	= "Proceedings of the Third International Conference on
		   Intelligent Systems for Molecular Biology"
   ,year	= 1995
   ,address	= "Cambridge, England"
   ,publisher	= "AAAI Press"
   ,pages	= "98-106"
   ,filename	= "craven.ismb95.pdf craven.ismb95.ps"
   
}


@inproceedings{opitz.mlc94
   ,author	= "D. Opitz and J. Shavlik"
   ,title	= "Using Genetic Search to Refine Knowledge-Based
		   Neural Networks"
   ,booktitle	= "Machine Learning: Proceedings of the Eleventh International
		   Conference"
   ,year	= 1994
   ,address	= "New Brunswick, NJ"
   ,publisher	= "Morgan Kaufmann"
   ,pages	= "208-216"
   ,filename	= "opitz.mlc94.pdf opitz.mlc94.ps"
   ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/"
   ,abstract = "An ideal inductive-learning algorithm should exploit all available
   resources, such as computing power and domain-specific knowledge,
   to improve its ability to generalize.  Connectionist theory-refinement
   systems have proven to be effective at utilizing domain-specific
   knowledge; however, most are unable to exploit available computing
   power.  This weakness occurs because they lack the ability to refine
   the topology of the networks they produce, thereby limiting generalization,
   especially when given impoverished domain theories.  We present
   the REGENT algorithm, which uses genetic algorithms to broaden the type
   of networks seen during its search.  It does this by using (a) the
   domain theory to help create an initial population and (b) crossover
   and mutation operators specifically designed for knowledge-based
   networks.  Experiments on three real-world domains indicate that
   our new algorithm is able to significantly increase generalization
   compared to a standard connectionist theory-refinement system,
   as well as our previous algorithm for growing knowledge-based networks."
}
   

@inproceedings{craven.mlc94
   ,author	= "M. Craven and J. Shavlik"
   ,title	= "Using Sampling and Queries to Extract Rules from
		   Trained Neural Networks"
   ,booktitle	= "Proceedings of the Eleventh International Conference on
		   Machine Learning"
   ,year	= 1994
   ,pages	= "37-45"
   ,address	= "New Brunswick, NJ"
   ,filename	= "craven.mlc94.pdf craven.mlc94.ps"
   ,abstract = "Concepts learned by neural networks are difficult to understand because
   they are represented using large assemblages of real-valued parameters.
   One approach to understanding trained neural networks is to extract
   symbolic rules that describe their classification behavior.  There are
   several existing rule-extraction approaches that operate by searching
   for such rules.  We present a novel method that casts rule extraction
   not as a search problem, but instead as a learning problem.
   In addition to learning from training examples, our method exploits
   the property that networks can be efficiently queried.  We describe
   algorithms for extracting both conjunctive and M-of-N rules, and
   present experiments that show that our method is more efficient than
   conventional search-based approaches."
}
   

@inproceedings{maclin.aaai94
   ,author	= "R. Maclin and J. Shavlik"
   ,title	= "Incorporating Advice into Agents that Learn from
		   Reinforcements"
   ,booktitle	= "Proceedings of the Twelfth National Conference on
		   Artificial Intelligence"
   ,year	= 1994
   ,address	= "Seattle, WA"
   ,pages	= "694-699"
   ,filename	= "maclin.aaai94.pdf maclin.aaai94.ps"
   ,abstract = "Learning from reinforcements is a promising approach for creating intelligent
   agents.  However, reinforcement learning usually requires a large number of
   training episodes.  We present an approach that addresses this shortcoming by
   allowing a connectionist Q-learner to accept advice given, at any time and in
   a natural manner, by an external observer.  In our approach, the advice-giver
   watches the learner and occasionally makes suggestions, expressed as
   instructions in a simple programming language.  Based on techniques from
   knowledge-based neural networks, these programs are inserted directly into
   the agent's utility function.  Subsequent reinforcement learning further
   integrates and refines the advice.  We present empirical evidence that shows
   our approach leads to statistically-significant gains in expected reward.
   Importantly, the advice improves the expected reward regardless of the stage
   of training at which it is given."
}
   

@inproceedings{opitz.ijcai93
   ,author	= "D. Opitz and J. Shavlik"
   ,title	= "Heuristically Expanding Knowledge-Based Neural Networks"
   ,booktitle	= "Proceedings of the Thirteenth International Joint 
                   Conference on Artificial Intelligence"
   ,year	= 1993
   ,month	= "September"
   ,address	= "Chambery, France"
   ,pages	= "1360-1365"
   ,filename	= "opitz.ijcai93.pdf opitz.ijcai93.ps"
   ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/"
   ,abstract = "Knowledge-based neural networks are networks whose topology is determined
  by mapping the dependencies of a domain-specific rulebase into a neural
  network.  However, existing network training methods lack the ability to
  add new rules to the (reformulated) rulebases.  Thus, on domain theories
  that are lacking rules, generalization is poor, and training can
  corrupt the original rules, even those that were initially correct.
  We present TOPGEN, an extension to the KBANN algorithm, that
  heuristically searches for possible expansions of a knowledge-based
  neural network, guided by the domain theory, the network, and the training
  data.  It does this by dynamically adding hidden nodes to the neural
  representation of the domain theory, in a manner analogous to adding rules
  and conjuncts to the symbolic rulebase.  Experiments indicate that our
  method is able to heuristically find effective places to add nodes to
  the knowledge-base network and verify that new nodes must be added
  in an intelligent manner."
}
  

@inproceedings{craven.ijcai93
   ,author	= "M. Craven and J. Shavlik"
   ,title	= "Learning to Represent Codons: A Challenge Problem for
		   Constructive Induction"
   ,booktitle	= "Proceedings of the Thirteenth International Joint Conference
		   on Artificial Intelligence"
   ,year		= 1993
   ,month	= "September"
   ,pages	= "1319-1324"
   ,address	= "Chambery, France"
   ,filename	= "craven.ijcai93.pdf craven.ijcai93.ps"
   ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/coding/"
   ,abstract = "The ability of an inductive learning system to find a good solution to
  a given problem is dependent upon the representation used for the
  features of the problem.  Systems that perform constructive induction
  are able to change their representation by constructing new features.
  We describe an important, real-world problem - finding genes in DNA -
  that we believe offers an interesting challenge to constructive-induction
  researchers.  We report experiments that demonstrate that: (1) two
  different input representations for this task result in significantly
  different generalization performance for both neural networks and
  decision trees; and (2) both neural and symbolic methods for constructive
  induction fail to bridge the gap between these two representations.
  We believe that this real-world domain provides an interesting
  challenge problem for constructive induction because the relationship
  between the two representations is well known, and because the
  representational shift involved in constructing the better representation
  is not imposing."

}
  

@inproceedings{cherkauer.ismb93,
   author	= "K. Cherkauer and J. Shavlik",
   title	= "Protein Structure Prediction:  Selecting Salient Features
		   From Large Candidate Pools",
   booktitle	= "Proceedings of the First International Conference on
		   Intelligent Systems for Molecular Biology", 
   year		= 1993,
   address	= "Bethesda, MD",
   pages	= "74-82",
   publisher	= "AAAI Press",
   filename	= "cherkauer.ismb93.pdf cherkauer.ismb93.ps"
   ,abstract = "We introduce a parallel approach, ``DT-Select,'' for selecting features
  used by inductive learning algorithms to predict protein secondary structure.
  DT-Select is able to rapidly choose small, nonredundant feature sets from
  pools containing hundreds of thousands of potentially useful features. It
  does this by building a decision tree, using features from the pool, that
  classifies a set of training examples. The features included in the tree
  provide a compact description of the training data and are thus suitable for
  use as inputs to other inductive learning algorithms. Empirical experiments
  in the protein secondary-structure task, in which sets of complex features
  chosen by DT-Select are used to augment a standard artificial neural network
  representation, yield surprisingly little performance gain, even though
  features are selected from very large feature pools. We discuss some possible
  reasons for this result."
}
  

@inproceedings{craven.mlc93,
   author	= "M. Craven and J. Shavlik",
   title	= "Learning Symbolic Rules Using Artificial Neural Networks",
   booktitle	= "Proceedings of the Tenth International Conference on
		   Machine Learning", 
   year		= 1993,
   address	= "Amherst, MA",
   pages	= "73-80",
   publisher	= "Morgan Kaufmann",
   filename	= "craven.mlc93.pdf craven.mlc93.ps"
   ,abstract = "A distinct advantage of symbolic learning algorithms over
  artificial neural networks is that typically the concept representations
  they form are more easily understood by humans.
  One approach to understanding the representations formed by neural
  networks is to extract symbolic rules from trained networks.
  In this paper we describe and investigate an approach for extracting
  rules from networks that uses (1) the NofM extraction algorithm,
  and (2) the network training method of soft weight-sharing.
  Previously, the NofM algorithm had been successfully applied only
  to knowledge-based neural networks. Our experiments demonstrate that our
  extracted rules generalize better than rules learned using
  the C4.5 system.  In addition to being accurate, our extracted rules are
  also reasonably comprehensible."
}
  

@inproceedings{craven.hicss93,
   author	= "M. Craven and J. Shavlik",
   title	= "Learning to Predict Reading Frames in {{E}. coli}
		   {DNA} Sequences",
   booktitle	= "Proceedings of the 26th Hawaii International Conference
		   on System Sciences",
   year		= 1993,
   address	= "Wailea, HI",
   publisher	= "IEEE Computer Society Press",
   pages	= "773-782",
   data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/coding-HICSS/",
   filename	= "craven.hicss93.pdf craven.hicss93.ps"
   ,abstract = "Two fundamental problems in analyzing DNA sequences are (1) locating the
  regions of a DNA sequence that encode proteins, and (2) determining the
  reading frame for each region.  We investigate using artificial neural
  networks (ANNs) to find coding regions, determine reading frames, and
  detect frameshift errors in E. coli DNA sequences.  We describe our
  adaptation of the approach used by Uberbacher and Mural to identify
  coding regions in human DNA, and we compare the performance of ANNs
  to several conventional methods for predicting reading frames.
  Our experiments demonstrate that ANNs can outperform these conventional
  approaches."
}
  

@inproceedings{towell.aaai92,
   author	= "G. Towell and J. Shavlik",
   title	= "Using Symbolic Learning to Improve Knowledge-Based
		   Neural Networks",
   booktitle	= "Proceedings of the Tenth National Conference on
		   Artificial Intelligence",
   pages	= "177-182",
   year		= "1992",
   address	= "San Jose, CA",
   filename	= "towell.aaai92.pdf towell.aaai92.ps"
   ,abstract = "The previously-reported KBANN system integrates existing knowledge into neural
  networks by defining the network topology and setting initial link weights.
  Standard neural learning techniques can then be used to train such networks,
  thereby refining the information upon which the network is based.  However,
  standard neural learning techniques are reputed to have difficulty training
  networks with multiple layers of hidden units; KBANN commonly creates such
  networks.  In addition, standard neural learning techniques ignore some of the
  information contained in the networks created by KBANN. This paper describes a
  symbolic inductive learning algorithm for training such networks that uses
  this previously-ignored information and which helps to address the problems of
  training ``deep'' networks. Empirical evidence shows that this method improves
  not only learning speed, but also the ability of networks to generalize
  correctly to testing examples."

}
  

@inproceedings{maclin.aaai92,
   author	= "R. Maclin and J. Shavlik",
   title	= "Using Knowledge-Based Neural Networks to Improve Algorithms:
		   Refining the Chou-Fasman Algorithm for Protein Folding",
   booktitle	= "Proceedings of the Tenth National Conference on
		   Artificial Intelligence",
   pages	= "165-170",
   year		= "1992",
   address	= "San Jose, CA",
   filename	= "maclin.aaai92.pdf maclin.aaai92.ps"
   ,abstract = "We describe a method for using machine learning to refine algorithms
  represented as generalized finite-state automata.  The knowledge in an
  automaton is translated into an artificial neural network, and then
  refined with backpropagation on a set of examples.  Our technique for
  translating an automaton into a network extends KBANN, a system that
  translates a set of propositional rules into a corresponding neural
  network.  The extended system, FSKBANN, allows one to refine the large
  class of algorithms that can be represented as state-based processes.
  As a test, we use FSKBANN to refine the Chou-Fasman algorithm, a method
  for predicting how globular proteins fold.  Empirical evidence shows the
  refined algorithm FSKBANN produces is statistically significantly more
  accurate than both the original Chou-Fasman algorithm and a neural
  network trained using the standard approach."
}
  

@inproceedings{scott.nips4,
   author       = "G. Scott and J. Shavlik and W. Ray",
   title        = "Refining {PID} Controllers using Neural Networks",
   booktitle    = "Advances in Neural Information Processing Systems",
   year         = "1992",
   volume       = "4",
   pages	= "555-562",
   editor =        "J. Moody and S. Hanson and R. Lippmann",
   address      = "Denver, CO",
   publisher    = "Morgan Kaufmann",
   filename	= "scott.nips4.pdf scott.nips4.ps"
   ,abstract = "The KBANN (Knowledge-Based Artificial Neural Networks) approach uses neural
  networks to refine knowledge that can be written in the form of simple
  propositional rules.  We extend this idea further by presenting the MANNCON
  (Multivariable Artificial Neural Network Control) algorithm by which the
  mathematical equations governing a PID (Proportional-Integral-Derivative)
  controller determine the topology and initial weights of a network, which is
  further trained using backpropagation.  We apply this method to the task of
  controlling the outflow and temperature of a water tank, producing
  statistically-significant gains in accuracy over both a standard neural
  network approach and a non-learning PID controller.  Furthermore, using the
  PID knowledge to initialize the weights of the network produces statistically
  less variation in testset accuracy when compared to networks initialized with
  small random numbers."

}
  

@inproceedings{towell.nips4,
   author	= "G. Towell and J. Shavlik",
   title	= "Interpretation of Artificial Neural Networks:
		   Mapping knowledge-based Neural Networks into Rules",
   booktitle	= "Advances in Neural Information Processing Systems",
   year		= 1992,
   volume	= 4,
   pages	= "977-984",
   editor	= "J. Moody and S. Hanson and R. Lippmann",
   address	= "Denver, CO",
   publisher	= "Morgan Kaufmann",
   data = "http://ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology/splice-junction-gene-sequences/",
   filename	= "towell.nips4.pdf towell.nips4.ps"
   ,abstract = "We propose and empirically evaluate a method for the extraction of
  expert-comprehensible rules from trained neural networks. Our method operates
  in the context of a three-step process for learning that uses rule-based
  domain knowledge in combination with neural networks.  Empirical tests using
  real-worlds problems from molecular biology show that the rules our method
  extracts from trained neural networks: closely reproduce the accuracy of the
  network from which they came, are superior to the rules derived by a learning
  system that directly refines symbolic rules, and are expert-comprehensible."
}
  Note: see also towell.mlj93
  

@inproceedings{noordewier.nips3,
   author	= "M. Noordewier and G. Towell and J. Shavlik",
   title	= "Training Knowledge-Based Neural Networks to Recognize Genes
		   in {DNA} Sequences",
   booktitle	= "Advances in Neural Information Processing Systems",
   year		= 1991,
   pages	= "530-536",
   volume	= 3,
   editor	= "R. Lippmann and J. Moody and D. Touretzky",
   publisher	= "Morgan Kaufmann",
   address	= "Denver, CO",
   data = "http://ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology/splice-junction-gene-sequences/"
   ,filename	= "noordewier.nips3.pdf noordewier.nips3.ps"
   ,abstract = "We describe the application of a hybrid symbolic/connectionist machine
  learning algorithm to the task of recognizing important genetic sequences.
  The symbolic portion of the KBANN system utilizes inference rules that provide
  a roughly-correct method for recognizing a class of DNA sequences known as
  eukaryotic splice-junctions.  We then map this ``domain theory'' into a neural
  network and provide training examples.  Using the samples, the neural
  network's learning algorithm adjusts the domain theory so that it properly
  classifies these DNA sequences.  Our procedure constitutes a general method
  for incorporating preexisting knowledge into artificial neural networks.  We
  present an experiment in molecular genetics that demonstrates the value of
  doing so."
}
  Note: see also towell.aij94
  

@inproceedings{towell.aaai90,
   author	= "G. Towell and J. Shavlik and M. Noordewier",
   title	= "Refinement of Approximate Domain Theories by Knowledge-Based
		   Neural Networks",
   booktitle	= "Proceedings of the Eighth National Conference on
		   Artificial Intelligence",
   year		= "1990",
   pages	= "861-866",
   address	= "Boston, MA",
   filename	= "towell.aaai90.pdf towell.aaai90.ps",
   data = "http://ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology/promoter-gene-sequences"
   ,abstract = "Standard algorithms for explanation-based learning require complete and
  correct knowledge bases. The KBANN system relaxes this constraint through the
  use of empirical learning methods to refine approximately correct knowledge.
  This knowledge is used to determine the structure of an artificial neural
  network and the weights on its links, thereby making the knowledge accessible
  for modification by neural learning.  KBANN is evaluated by empirical tests in
  the domain of molecular biology.  Networks created by KBANN are shown to be
  superior, in terms of their ability to correctly classify unseen examples, to
  randomly initialized neural networks, decision trees, ``nearest neighbor''
  matching, and standard techniques reported in the biological literature.  In
  addition, KBANN's networks improve the initial knowledge in biologically
  interesting ways."

}
  Note: see also towell.aij94.ps
  

   ---------------
   WORKSHOP PAPERS
   ---------------
@inproceedings{li.nips14,
    filename  = "li.nips14.pdf li.nips14.poster.pdf",
    author    = "X. Li and C. Guo and W. Chu and Y. Wang and J. Shavlik",
    title     = "Deep Learning Powered In-Session Contextual Ranking using Clickthrough Data",
    booktitle = "Workshop on Personalization: Methods and Applications, at Neural Information Processing Systems (NIPS)",
    workshop  = "Workshop on Personalization: Methods and Applications, at Neural Information Processing Systems (NIPS)",
    year      = "2014",
    abstract  = "User interactions with search engines provide many cues that can be leveraged to improve the relevance of search results through personalization. The context information (history of queries, clicked documents, etc.) provides strong signals about users' search intent, which can be used to personalize the search experience and improve a web search engine. We demonstrate how to generate the semantic features from in-session contextual information with deep learning models, and incorporate these semantic features into the current ranking model to re-rank the results. We evaluate our approach using a large, real-world search log data from a major commercial web search engine, and the experimental results show our approach can significantly improve the performance of the search engine. Furthermore, we also find that the domain-specific, click-based features can effectively decrease the unsatisfied clicks for the current ranking model to improve the search experience.",
}

@inproceedings{li.smir14,
    filename  = "li.smir14.pdf",
    author    = "X. Li and W. Gao and J. Shavlik",
    title     = "Detecting Semantic Uncertainty by Learning Hedge Cues in Sentences Using an HMM",
    booktitle = "SIGIR 2014 Workshop on Semantic Matching in Information Retrieval",
    workshop  = "SIGIR 2014 Workshop on Semantic Matching in Information Retrieval",
    year      = "2014",
    abstract  = "Detecting speculative assertions is essential to distinguish semantically uncertain information from the factual ones in text. This is critical to the trustworthiness of many intelligent systems that are based on information retrieval and natural language processing techniques, such as question answering or information extraction. We empirically explore three fundamental issues of uncertainty detection: (1) the predictive ability of different learning methods on this task; (2) whether using unlabeled data can lead to a more accurate model; and (3) whether closed-domain training or cross-domain training is better. For these purposes, we adopt two statistical learning approaches to this problem: the commonly used bag-of-words model based on Naive Bayes, and the sequence labeling approach using a Hidden Markov Model (HMM). We empirically compare between our two approaches as well as externally compare with prior results on the CoNLL-2010 Shared Task 1.

Overall, our results are promising: (1) on Wikipedia and biomedical datasets, the HMM model improves over Naive Bayes up to 17.4% and 29.0%, respectively, in terms of absolute F score; (2) compared to CoNLL-2010 systems, our best HMM model achieves 62.9% F score with MLE parameter estimation and 64.0% with EM parameter estimation on Wikipedia dataset, both outperforming the best result (60.2%) of the CoNLL-2010 systems, but our results on the biomedical dataset are less impressive; (3) when the expression ability of a model (e.g., Naive Bayes) is not strong enough, cross-domain training is helpful, and when a model is powerful (e.g., HMM), cross-domain training may produce biased parameters; and (4) under Maximum Likelihood Estimation, combining the unlabeled examples with the labeled helps.",
}


@inproceedings{khot.starai14,
    filename  = "khot.starai14.pdf",
    author    = "T. Khot and S. Natarajan and J. Shavlik",
    title     = "Classification from one class of examples for relational domains",
    booktitle = "Statistical Relational AI (StarAI) Workshop at AAAI",
    workshop  = "Statistical Relational AI (StarAI) Workshop at AAAI",
    year      = "2014",
    abstract  = "One-class classification approaches have been proposed in the literature to learn classifiers from examples of only one class. But these approaches are not directly applicable to relational domains due to their reliance on a feature vector or a distance measure. We propose a non-parametric relational one-class classification approach based on first-order trees. We learn a tree-based distance measure that iteratively introduces new relational features to differentiate relational examples. We update the distance measure so as to maximize the one-class classification performance of our model. We also relate our model definition to existing work on probabilistic combination functions and density estimation. We experimentally show that our approach can discover relevant features and outperform three baseline approaches.",
}


@inproceedings{khot.trec13,
    filename  = "khot.trec13.pdf",
    author    = "T. Khot, C. Zhang, S. Natarajan, C. Re , J. Shavlik",
    title     = "Bootstrapping Knowledge Base Acceleration",
    booktitle = "The Twenty-Second Text REtrieval Conference Proceedings (TREC 2013)",
    workshop  = "Text REtrieval Conference (TREC)",
    year      = "2013",
    abstract  = "The Streaming Slot Filler (SSF) task in TREC Knowledge Base Acceleration track involves detecting changes to slot values (relations) over time. To handle this task, the system needs to extract relations to identify slot-filler values and detect novel values. Being the first attempt at KBA, the biggest challenge that we faced was the scale of the data. We present the approach used by University of Wisconsin for the SSF task and the large scale challenge. We used Elementary, a scalable statistical inference and learning system, developed in University of Wisconsin as our core system. We used Stanford NLP Toolkit to generate parse trees, dependency graphs and named-entity recognition information. These were then converted to features for the logistic regression learner of Elementary. To handle the lack of early SSF training data, we used our existing Knowledge Base Population system to bootstrap a logistic regression.",
}


@inproceedings{narayanchen.ijcai13,
    filename = "narayanchen.ijcai13.pdf narayanchen.ijcai13.slides.pptx narayanchen.ijcai1.poster.pptx",
    author    = "A. Narayan-Chen and L. Xu and J. Shavlik",
    title        = "An Empirical Evaluation of Machine Learning Approaches for Angry Birds",
    booktitle = "IJCAI Symposium on AI in Angry Birds",
    workshop="IJCAI Symposium on AI in Angry Birds",
    year       = "2013",
    abstract = "Angry Birds is a popular video game in which players shoot birds at pigs and other objects. Because of complexities in Angry Birds, such as continuously-valued features, sequential decision making, and the inherent randomness of the physics engine, learning to play Angry Birds intelligently presents a difficult challenge for machine learning. We describe how we used the Weighted Majority Algorithm and Naive Bayesian Networks to learn how to judge possible shots. A major goal of ours is to design an approach that learns the general task of playing Angry Birds rather than learning how to play specific levels. A key aspect of our design is that the features provided to the learning algorithms are a function of the local neighborhood of a shot's expected impact point. To judge generality we evaluate the learning algorithms on game levels not seen during training. Our empirical study shows our learning approaches can play statistically significantly better than a baseline system provided by the organizers of the Angry Birds competition."
}


@inproceedings{khot.starai12,
    filename = "khot.starai12.pdf khot.starai12-poster.pdf",
    author    = "T. Khot and S. Srivastava and S. Natarajan and J. Shavlik",
    title        = "Learning Relational Structure for Temporal Relation Extraction",
    booktitle = "Statistical Relational Learning AI (StarAI) Workshop at UAI",
    workshop="Statistical Relational Learning AI (StarAI) workshop",
    year       = "2012",
    abstract = "Recently there has been a lot of interest in using Statistical Relational Learning (SRL) models for Information Extraction (IE). One of the important IE tasks is extraction of temporal relations between events and time expressions (timex). SRL methods that use hand-written rules have been proposed for various IE tasks.  In contrast, we propose an approach that employs structure learning in SRL to learn such rules. Although not required, our method can also incorporate expert advice either as features or initial theory to learn a more accurate model. We present preliminary results on the TempEval-2 task of classifying relations between events and timexes."
}


@inproceedings{niu.vlds12,
    filename = "niu.vlds12.pdf",
    author    = "F. Niu and C. Zhang and C. Re and J. Shavlik",
    title        = "DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference",
    booktitle = "Workshop on Very Large Data Search",
    workshop="Workshop on Very Large Data Search",
    year       = "2012",
    abstract = "We present an end-to-end (live) demonstration system called
    DeepDive that performs knowledge-base construction (KBC)
    from hundreds of millions of web pages. DeepDive employs
    statistical learning and inference to combine diverse data
    resources and best-of-breed algorithms. A key challenge of
    this approach is scalability, i.e., how to deal with terabytes
    of imperfect data efficiently. We describe how we address
    the scalability challenges to achieve web-scale KBC and the
    lessons we have learned from building DeepDive."
}


@inproceedings{khot.srl12,
    filename = "khot.srl12.pdf",
    author    = "T. Khot and S. Natarajan and K. Kersting and J. Shavlik",
    title        = "Structure Learning with Hidden Data in Relational Domains",
    booktitle = "Statistical Relational Learning Workshop at ICML",
    workshop="Statistical Relational Learning workshop",
    year       = "2012",
    abstract  = "Recent years have seen a surge of interest in learning the structure of Statistical Relational Learning (SRL) models, which combine logic with probabilities. Most of these models apply the closed-world assumption i.e., whatever is not observed is false in the world. We consider the problem of learning the structure
    of SRL models in the presence of hidden data, i.e. we open the closed-world assumption. We develop a functional-gradient boosting algorithm based on EM to learn the structure
    and parameters of the models simultaneously and apply it to learn two kinds of models -- Relational Dependency Networks (RDNs) and Markov Logic Networks (MLNs). Our results in two testbeds demonstrate that the algorithms can effectively learn with missing data."
}


@inproceedings{kunapuli.simws11,
    filename = "kunapuli.simws11.pdf kunapuli.simws11.poster.pdf",
    author    = "G. Kunapuli and J. Shavlik",
    title        = "Mirror Descent for Metric Learning",
    booktitle = "Beyond Mahalanobis: Supervised Large Scale Learning of Similarity Workshop at NIPS",
    workshop="Beyond Mahalanobis: Supervised Large Scale Learning of Similarity Workshop",
    year       = "2011",
    abstract  = "Most metric learning methods are characterized by diverse
			loss functions and projection methods, which naturally begs the question:
			is there a wider framework that can generalize many of these methods?
			In addition, ever persistent issues are those of scalability to large data
			sets and the question of kernelizability. We propose a unified approach to
			Mahalanobis metric learning: an online regularized metric learning algorithm
			based on the ideas of composite objective mirror descent (comid).
			The metric learning problem is formulated as a regularized positive semidefinite
			matrix learning problem, whose update rules can be derived
			using the comid framework. This approach aims to be scalable, kernelizable,
			and admissible to many different types of Bregman and loss
			functions, which allows for the tailoring of several different classes of
			algorithms. The most novel contribution is the use of the trace norm,
			which yields a sparse metric in its eigenspectrum, thus simultaneously
			performing feature selection along with metric learning."
}

@inproceedings{kunapuli.icml11,
 filename = "kunapuli.icml11.pdf kunapuli.icml11.poster.pdf",
 author    = "G. Kunapuli and R. Maclin and J. Shavlik",
 title        = "Advice Refinement for Knowledge-Based Support Vector Machines",
 booktitle = "ICML 2011 Workshop on Combining Learning Strategies for Reducing Label Cost",
 workshop = "Workshop on Combining Learning Strategies for Reducing Label Cost",
 year       = "2011",
 abstract  = "Knowledge-based support vector machines
 (KBSVMs) incorporate advice from experts,
 which can improve accuracy and generalization
 significantly. A major limitation occurs
 when the expert advice is noisy or incorrect
 which can lead to poorer models and decreased
 generalization. We propose a model
 that extends KBSVMs and learns not only
 from data and advice, but also simultaneously
 improves the advice. This model, which
 contains bilinear constraints for advice refinement,
 is effective for learning in domains with
 small data sets. We propose two approaches
 to handle the bilinearity of the formulation
 along with experimental results."
 }


@inproceedings{natarajan.starai10
  ,filename = "natarajan.starai10.pdf"
  ,author = "S. Natarajan and T. Khot and D. Lowd and P. Tadepalli and K. Kersting and J. Shavlik"
  ,title = "Exploiting Causal Independence in Markov Logic Networks: Combining Undirected and Directed Models"
  ,booktitle = "Workshop on Statistical Relational AI at AAAI"
  ,year = "2010"  
  ,address = "Atlanta, GA"
  ,workshop = "Workshop on Statistical Relational AI"
  ,abstract = "A new method is proposed for compiling causal independencies into Markov logic networks (MLNs).  An MLN can be viewed as compactly representing a factorization of a joint probability into the product of a set of factors guided by logical formulas. We present a notion of causal independence that enables one to further factorize the factors into a combination of even smaller factors and consequently obtain a finer-grain factorization of the joint probability. The causal independence lets us specify the factor in terms of weighted, directed clauses and operators, such as ''or'', ''sum'' or ''max'', on the contribution of the variables involved in the factors, hence combining both undirected and directed knowledge. Our experimental evaluations shows that making use of the finer-grain factorization provided by causal independence can improve quality of parameter learning in MLNs."
}

@inproceedings{natarajan.aamas10
  ,filename="natarajan.aamas10.pdf natarajan.aamas10.pptx"
  ,title="Learning from Human Teachers: Issues and Challenges in Bootstrap Learning"
  ,author="S. Natarajan and G. Kunauli and R. Maclin and D. Page and C. O'Reilly and T. Walker and J. Shavlik"
  ,booktitle="AAMAS 2010 Workshop on Agents Learning Interactively from Human Teachers"
  ,workshop="Agents Learning Interactively from Human Teachers"
  ,year="2010"
  ,address="Toronto, Canada"
  ,abstract="Bootstrap Learning (BL) is a new machine learning paradigm that seeks to build an electronic student that can learn using natural instruction provided by a human teacher and by bootstrapping on previously learned concepts.  In our setting, the teacher provides (very few) examples and some advice about the task at hand using a natural instruction interface.  To address this task, we use our Inductive Logic Programming system called WILL to translate the natural instruction into first-order logic.  We present approaches to the various challenges BL raises, namely automatic translation of domain knowledge and instruction into an ILP problem and the automation of ILP runs across different tasks and domains, which we address using a multi-layered approach.  We demonstrate that our system is able to learn effectively in over fifty different lessons across three different domains without any human-performed parameter tuning between tasks."
}

@inproceedings{nassif.icdmw09
  ,filename="nassif.icdmw09.pdf"
  ,title="Information Extraction for Clinical Data Mining: A Mammography Case Study"
  ,author="H. Nassif and R. Woods and E. Burnside and M. Ayvaci and J. Shavlik and D. Page"
  ,booktitle="Proceedings of the 2009 IEEE International Conference on Data Mining Workshops"
  ,year="2009"
  ,workshop="The 2009 International Workshop on Domain Driven Data Mining"
  ,address="Miami, FL"
  ,abstract="Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy.  However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem.  We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences.  The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts. input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts.  On our dataset, our algorithm achieves 97.7% precision, 95.5%  recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level."
}

@inproceedings{natarajan.ilp09
  ,filename = "natarajan.ilp09.pdf"
  ,author = "S. Natarajan and G. Kunapuli and C. O'Reilly and R.Maclin and T. Walker and D. Page and J.Shavlik"
  ,title = "ILP for Bootstrapped Learning: A Layered Approach to Automating the ILP Setup Problem"
  ,booktitle = "Presented at the Nineteenth Conference on Inductive Logic Programming"
  ,workshop = ""
  ,wkshp_date = "2009"
  ,year = "2009"
  ,address = "Leuven, Belgium"
  ,abstract = "This paper introduces a new type of application for ILP called Bootstrapped Learning (BL).  BL brings several challenges to ILP, including the need to (a) automate the 'ILP setup' problem, (b) exploit the fact that a well-meaning teacher is providing pedagogically chosen examples and may be offering hints, (c) deal with small numbers of training examples and sometimes no explicit negative examples; and (d) 'bootstrap', i.e., to automatically base learning in part on the results of earlier learning sessions."
}

@inproceedings{sri.srl09,
  filename="sri.srl09.pdf",
  author="R. De Salvo Braz and S. Natarajan and H. Bui and J. Shavlik and S. Russell",
  title="Anytime Lifted Belief Propagation",
  booktitle="6th International Workshop on Statistical Relational Learning",
  year="2009",
  workshop = "ILP - MLG - SRL",
  wkshp_date = "June 2009",
  address = "Leuven, Belgium",
  abstract = "Lifted first-order probabilistic inference, which manipulates first-order representations directly, has been receiving increasing attention. To date, all lifted inference methods require a model to be shattered against itself and evidence (that is, splitting groups of variables until they are all composed of variables with the exact same properties), before inference starts. In many situations this produces a new model that is not far from propositionalized, therefore canceling the bene�fits of lifted inference. We present an algorithm, Anytime Lifted Belief Propagation, that corresponds to this intuition by performing shattering during belief propagation inference, on an as-needed basis, starting on the most relevant parts of a model first. The trade-off is having an (exact) bound (an interval) on the query's belief rather than an exact belief.  Bounds are useful when approximate answers are sufficient and, in decision-making applications, can even be enough to determine the decision that would be picked from the exact belief. Moreover, the bounds can be made to converge to the exact solution as inference and shattering converge to the entire model. Interestingly, this algorithm mirrors theorem-proving, helping to close the gap between probabilistic and logic inference."
}
@inproceedings{torrey.symp08,
  filename="torrey.symp08.pdf torrey.symp08.ppt",
  author="L. Torrey and T. Walker and R. Maclin and J. Shavlik",
  title="Advice Taking and Transfer Learning: Naturally Inspired Extensions to Reinforcement Learning",
  booktitle="AAAI Fall Symposium on Naturally Inspired AI",
  year="2008",
  wkshp_date = "November 2008",
  workshop = "Naturally Inspired AI",
  address="Washington, DC",
  abstract = "Reinforcement learning (RL) is a machine learning technique with strong links to natural learning. However, it shares several 'unnatural' limitations with many other successful machine learning algorithms. RL agents are not typically able to take advice or to adjust to new situations beyond the specific problem they are asked to learn. Due to limitations like these, RL remains slower and less adaptable than natural learning. Our recent work focuses on extending RL to include the naturally inspired abilities of advice taking and transfer learning. Through experiments in the RoboCup domain, we show that doing so can make RL faster and more adaptable."
}

@inproceedings{torrey.aaai08
    ,filename = "torrey.aaai08.pdf torrey.aaai08.ppt"
    ,author = "L. Torrey and J. Shavlik and S. Natarajan and P. Kuppili and T. Walker"
    ,title = "Transfer in Reinforcement Learning via Markov Logic Networks"
    ,year = 2008
    ,wkshp_date	= "July 2008"
    ,conf_date	= "July 2008"
    ,address	= "Chicago, IL"
    ,booktitle = "AAAI'08 Workshop on Transfer Learning for Complex Tasks"
    ,workshop = "Workshop on Transfer Learning for Complex Tasks"
    ,abstract = "We propose the use of statistical relational learning, and in particular the formalism of Markov Logic Networks, for transfer in reinforcement learning. Our goal is to extract relational knowledge from a source task and use it to speed up learning in a related target task. We do so by learning a Markov Logic Network that describes the source-task Q-function, and then using it for decision making in the early learning stages of the target task. Through experiments in the RoboCup simulated-soccer domain, we show that this approach can provide a substantial performance benefit in the target task."
}

@inproceedings{torrey.icml06
    ,filename = "torrey.icml-wkshp06.pdf"
    ,author = "L. Torrey and J. Shavlik and T. Walker and R. Maclin"
    ,title = "Relational Skill Transfer via Advice Taking"
    ,year = 2006
    ,wkshp_date	= "July 2006"
    ,conf_date	= "July 2006"
    ,address	= "Pittsburg, PA"
    ,booktitle = "ICML'06 Workshop on Structural Knowledge Transfer for Machine Learning"
    ,workshop = "Structural Knowledge Transfer for Machine Learning"
    ,abstract = "We describe a reinforcement learning system that transfers relational skills from a previously learned source task to a related target task. The system uses inductive logic programming to analyze experience in the source task, and transfers rules about when to take actions. The target-task learner accepts these rules through an advice-taking algorithm. Our system also accepts human-provided advice, which can guide the application of transferred skills and provide instruction about new skills in the target task."
}

@inproceedings{goadrich.icml05
    ,filename = "goadrich.icml05.pdf"
    ,author = "M. Goadrich and L. Oliphant and J. Shavlik"
    ,title = "Learning to Extract Genic Interactions Using Gleaner"
    ,year = 2005
    ,wkshp_date	= "August 2005"
    ,conf_date	= "August 2005"
    ,address	= "Bonn, Germany"
    ,booktitle = "ICML'05 Workshop on Learning Language in Logic (LLL05)" 
    ,workshop = "Learning Language in Logic"
    ,abstract = "We explore here the application of Gleaner, an Inductive Logic Programming approach to learning in highly-skewed domains, to the Learning Language in Logic 2005 biomedical information-extraction challenge task. We create and describe a large number of background knowledge predicates suited for this task. We find that Gleaner outperforms standard Aleph theories with respect to recall and that additional linguistic background knowledge improves recall."
}

@inproceedings{maclin.ijcai05
    ,filename = "maclin.ijcai05.pdf  maclin.ijcai05-poster.pdf"
    ,author = "R. Maclin and J. Shavlik and L. Torrey and T. Walker"
    ,title = "Knowledge Based Support Vector Regression for Reinforcement Learning"
    ,year = 2005
    ,wkshp_date	= "August 2005"
    ,conf_date	= "August 2005"
    ,address	= "Edinburgh, Scotland"
    ,booktitle = "IJCAI'05 Workshop on Reasoning, Representation, and Learning in Computer Games" 
    ,workshop = "Reasoning, Representation, and Learning in Computer Games"
    ,abstract = "Reinforcement learning (RL) methods have difficulty scaling to large, complex problems.  One approach that has proven effective for scaling RL is to make use of advice provided by a human.  We extend a recent advice-giving technique, called Knowledge-Based Kernel Regression (KBKR), to RL and evaluate our approach on the KeepAway subtask of the RoboCup soccer simulator.  We present empirical results that show our approach can make effective use of advice.  Our work not only demonstrates the potential of advice-giving techniques such as KBKR for RL, but also offers insight into some of the design decisions involved in employing support-vector regression in RL."
}

@inproceedings{walker.icml04
  ,filename	= "walker.icml04.pdf walker.icml04.doc"
  ,author	= "T. Walker and J. Shavlik and R. Maclin"
  ,title	= "Relational Reinforcement Learning via Sampling the Space of First-Order Conjunctive Features"
  ,booktitle	= "Proceedings of the ICML Workshop on Relational Reinforcement Learning"
  ,year		= "2004"
  ,publisher	= ""
  ,pub_addr	= ""
  ,wkshp_date	= "July 2004"
  ,conf_date	= "July 2004"
  ,address	= "Banff, Canada"
  ,workshop	= "Relational Reinforcement Learning"
  ,abstract	= "We propose a novel method for reinforcement learning in domains that are best described using relational (first-order) features. Our approach is to rapidly sample a large space of such features, selecting a good subset to use as the basis for our Q-function. Our Q-function is created via a regression model that combines the collection of first-order features into a single prediction. To control the effect of the random predictions we use an ensemble approach for our predictions, generating multiple Q-function models and then combining the results of these models into a single prediction. Experiments with our technique on an interesting reinforcement learning problem, the Keep-Away subtask of RoboCup, suggest that our method can learn to effectively predict Q-values for a challenging reinforcement-learning task."
}

@inproceedings{kuhlmann-aaai04
  ,filename	= "kuhlmann-aaai04.pdf kuhlmann-aaai04.ps"
  ,author	= "G. Kuhlmann and P. Stone and R. Mooney and J. Shavlik"
  ,title	= "Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer"
  ,booktitle	= "Proceedings of the AAAI Workshop on Supervisory Control of Learning and Adaptive Systems"
  ,year		= "2004"
  ,publisher	= ""
  ,pub_addr	= ""
  ,wkshp_date	= "July 2004"
  ,conf_date	= "July 2004"
  ,address	= "San Jose, CA, USA"
  ,workshop	= "Supervisory Control of Learning and Adaptive Systems"
  ,abstract	= "We describe our current efforts towards creating a reinforcement learner that learns both from reinforcements provided by its environment and from human-generated advice. Our research involves two complementary components: (a) mapping advice expressed in English to a formal advice language and (b) using advice expressed in a formal notation in a reinforcement learner. We use a subtask of the challenging RoboCup simulated soccer task (Noda et al. 1998) as our testbed."
}

@inproceedings{dimaio.icdm03
  ,filename	= "dimaio.icdm03.pdf dimaio.icdm03.ppt"
  ,author	= "F. DiMaio and J. Shavlik"
  ,title	= "Speeding Up Relational Data Mining by Learning to Estimate Candidate Hypothesis Scores"
  ,booktitle	= "Proceedings of the ICDM Workshop on Foundations and New Directions of Data Mining"
  ,year		= "2003"
  ,publisher	= ""
  ,pub_addr	= ""
  ,wkshp_date	= "Nov 2003"
  ,conf_date	= "Nov 2003"
  ,address	= "Melbourne, Florida, USA"
  ,workshop	= "Foundations and New Directions in Data Mining"
  ,note         = "A longer and updated version of this paper appears in Proceedings of the Fourteenth International Conference on Inductive Logic Programming (2004)"
  ,abstract	= "The motivation behind multi-relational data mining is knowledge discovery in relational databases containing multiple related tables. One difficulty relational data mining faces is managing intractably large hypothesis spaces. We attempt to overcome this difficulty by first sampling the hypothesis space. We generate a small set of hypotheses, uniformly sampled from the space of candidate hypotheses, and evaluate this set on actual data. These hypotheses and their corresponding evaluation scores serve as training data in learning an approximate hypothesis evaluator. We use this approximate evaluation to quickly rate potential hypotheses without needing to score them on actual data. We test our approximate clause evaluation algorithm using the popular Inductive Logic Programming (ILP) system Aleph. We use a neural network to approximate the hypothesis-evaluation function. The trained neural network replaces Alephs hypothesis evaluation on actual data, scoring potential rules in time independent of the number of examples. Our approximate evaluator can also be used in a heuristic search to help escape local maxima. We test the neural network's ability in learning the hypothesis-evaluation function on four benchmark ILP domains; the neural network is able to accurately approximate the hypothesis-evaluation function."
}

@inproceedings{dimaio.icml03
  ,filename	= "dimaio.icml03.pdf dimaio.icml03.ppt"
  ,author	= "F. DiMaio and J. Shavlik and G. Phillips"
  ,title	= "Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps"
  ,booktitle	= "Working Notes of the ICML Workshop on Machine Learning in Bioinformatics"
  ,year		= "2003"
  ,publisher	= ""
  ,pub_addr	= ""
  ,wkshp_date	= "Aug 2003"
  ,conf_date	= "Aug 2003"
  ,address	= "Washington DC, USA"
  ,workshop	= "Machine Learning in Bioinformatics"
  ,abstract	= "One of the most time-consuming steps in determining a protein's structure via x-ray crystallography is interpretation of the electron density map. This can be viewed as a computer-vision problem, since a density map is simply a three-dimensional image of a protein. However, due to the intractably large space of conformations the protein can adopt, building a protein model to match in the density map is extremely difficult. This paper describes the use of pictorial structures to build a flexible protein model from the protein's amino acid sequence. A pictorial structure is a way of representing an object as a collection of parts connected, pairwise, by deformable springs. Model parameters are learned from training data. Using an efficient algorithm to match the model to the density map, the most probable arrangement of the protein's atoms can be found in a reasonable running time. We test the algorithm is on two different tasks. The first is an amino-acid sidechain-refinement task, in which the location of the protein's backbone is approximately known. The algorithm places the remaining atoms into the region of density quite accurately, placing 72% of atoms within 1.0  of their actual location (as determined by a crystallographer). In the second task, a classification task, the algorithm is used to predict the type of amino acid contained in an unknown region of density. In this task, the algorithm is 61% accurate in discriminating between four different amino acids."
}

@inproceedings{mooney.nsf02
  ,filename	= "mooney.nsf02.pdf mooney.nsf02.ps"
  ,author       = "R. Mooney and P. Melville and L. Tang and J. Shavlik and I. Dutra and D. Page and V. Santos Costa"
  ,title        = "Relational Data Mining with Inductive Logic Programming for Link Discovery"
  ,booktitle    = "Proceedings of the National Science Foundation Workshop on Next Generation Data Mining"
  ,year         = 2002
  ,publisher    = ""
  ,pub_addr     = ""
  ,wkshp_date   = "Nov 2002"
  ,conf_date    = "Nov 2002"
  ,address      = "Baltimore, Maryland, USA"
  ,workshop	= ""
  ,note         = "A longer and updated version
       of this paper appears as a chapter in ``Data Mining: Next
       Generation Challenges and Future Directions'', H. Kargupta and
       A. Joshi (eds.), by AAAI/MIT Press"
  ,abstract	=
  "Link discovery (LD) is an important task in data mining for counter-terrorism and is the
   focus of DARPA's Evidence Extraction and Link Discovery (EELD) research program.  Link
   discovery concerns the identification of complex relational patterns that indicate 
   potentially threatening activities in large amounts of relational data.  Most data-mining
   methods assume data is in the form of a feature-vector (a single relational table) and cannot
   handle multi-relational data.  Inductive logic programming is a form of relational
   data mining that discovers rules in first-order logic from multi-relational data.  This
   paper discusses the application of ILP to learning patterns for link discovery."
}

@inproceedings{goecks.ijcai-wkshp99
  ,filename	= "goecks_IJCAI_ML_IF_workshop99.pdf"
  ,author       = "J. Goecks and J. Shavlik"
  ,title        = "Automatically Labeling Web Pages Based on Normal User Actions"
  ,booktitle    = "Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering"
  ,year         = 1999
  ,publisher    = ""
  ,pub_addr     = ""
  ,wkshp_date   = "Jul 31--Aug 6, 1999"
  ,conf_date    = "Jul 31--Aug 6, 1999"
  ,address      = "Stockholm, Sweden"
  ,workshop	= ""
  ,abstract	=
  "For agents attempting to learn a user's interests, the cost of obtaining labeled 
   training instances is prohibitive because the user must directly label each training 
   instance, and few users are willing to do so.  We present an approach that 
   circumvents the need for human-labeled pages.  Instead, we learn 'surrogate' tasks 
   where the desired output is easily measured, such as the number of hyperlinks clicked
   on a page or the amount of scrolling performed. Our assumption is that these outputs 
   will highly correlate with the user's interests.  In other words, by unobtrusively 
   'observing' the user's behavior we are able to learn functions of value.  For 
   example, an agent could silently observe the user's browser behavior during the day, 
   then use these training examples to learn such functions and gather, during the 
   middle of the night, pages that are likely to be of interest to the user.  Previous 
   work has focused on learning a user profile by passively observing the hyperlinks 
   clicked on and those passed over.  We extend this approach by measuring user mouse
   and scrolling activity in addition to user browsing activity. We present empirical
   results that demonstrate our agent can accurately predict some easily measured 
   aspects of one's use of his or her browser."
}

@inproceedings{shavlik.aaai-wkshp98
  ,filename	= "shavlik.aaai-wkshp98.pdf shavlik.aaai-wkshp98.ps"
  ,author       = "J. Shavlik and T. Eliassi-Rad"
  ,title        = "Intelligent Agents for Web-Based Tasks: An Advice-Taking Approach"
  ,booktitle    = "Working Notes of the AAAI/ICML-98 Workshop on Learning for Text Categorization"
  ,year         = 1998
  ,publisher    = "AAAI Press"
  ,pub_addr     = "Menlo Park, CA"
  ,wkshp_date   = "Jul 26--30, 1998"
  ,conf_date    = "Jul 26--30, 1998"
  ,address      = "Madison, WI"
  ,workshop	= ""
  ,abstract	=
  "We present and evaluate an implemented system with which to rapidly and easily build intelligent
   software agents for Web-based tasks. Our design is centered around two basic functions:
   ScoreThisLink and ScoreThisPage. If given highly accurate such functions, standard heuristic
   search would lead to efficient retrieval of useful information. Our approach allows users to tailor
   our system's behavior by providing approximate advice about the above functions. This advice is
   mapped into neural network implementations of the two functions. Subsequent reinforcements
   from the Web (e.g., dead links) and any ratings of retrieved pages that the user wishes to provide
   are, respectively, used to refine the link- and page-scoring functions. Hence, our architecture
   provides an appealing middle ground between non-adaptive agent programming languages and
   systems that solely learn user preferences from the user's ratings of pages. We describe our
   internal representation of Web pages, the major predicates in our advice language, how advice is
   mapped into neural networks, and the mechanisms for refining advice based on subsequent
   feedback. We also present a case study where we provide some simple advice and specialize our
   general-purpose system into a ``home-page finder''. An empirical study demonstrates that our
   approach leads to a more effective home-page finder than that of a leading commercial Web
   search site."
}

@inproceedings{shavlik.conald-wkshp98
  ,filename	= "shavlik.conald-wkshp98.pdf shavlik.conald-wkshp98.ps"
  ,author       = "J. Shavlik and T. Eliassi-Rad"
  ,title        = "Building Intelligent Agents for Web-Based Tasks: A Theory-Refinement Approach"
  ,booktitle    = "Presented at the Conf on Automated Learning and Discovery Workshop on Learning
		   from Text and the Web"
  ,year         = 1998
  ,publisher    = ""
  ,pub_addr     = ""
  ,wkshp_date   = "Jun 11--13, 1998"
  ,conf_date    = "Jun 11--13, 1998"
  ,address      = "Pittsburgh, PA"
  ,workshop	= ""
  ,abstract	=
  "We present and evaluate an infrastructure with which to rapidly and easily build intelligent
   software agents for Web-based tasks. Our design is centered around two basic functions:
   ScoreThisLink and ScoreThisPage. If given highly accurate such functions, standard heuristic
   search would lead to efficient retrieval of useful information. Our approach allows users to tailor
   our system's behavior by providing approximate advice about the above functions. This advice is
   mapped into neural network implementations of the two functions. Subsequent reinforcements
   from the Web (e.g., dead links) and any ratings of retrieved pages that the user wishes to provide
   are, respectively, used to refine the link- and page-scoring functions. Hence, our agent
   architecture provides an appealing middle ground between non-adaptive ``agent'' programming
   languages and systems that solely learn user preferences from the user's ratings of pages. We
   present a case study where we provide some simple advice and specialize our general-purpose
   system into a ``home-page finder''. An empirical study demonstrates that our approach leads to a
   more effective home-page finder than that of a leading commercial Web search engine."
}

@inproceedings{cherkauer.aaai-wkshp96
  ,filename	= "cherkauer.aaai-wkshp96.pdf cherkauer.aaai-wkshp96.ps"
  ,author       = "K. Cherkauer"
  ,title        = "Human Expert-Level Performance on a Scientific Image
                   Analysis Task by a System Using Combined Artificial Neural
                   Networks"
  ,booktitle    = "Working Notes, Integrating Multiple Learned Models for
                   Improving and Scaling Machine Learning Algorithms Wkshp,
                   13th Nat Conf on Artificial Intelligence"
  ,year         = 1996
  ,pages	= "15-21"
  ,publisher    = "AAAI Press"
  ,pub_addr     = "Menlo Park, CA"
  ,wkshp_date   = "Aug 4--5, 1996"
  ,conf_date    = "Aug 4--8, 1996"
  ,address      = "Portland, OR"
  ,workshop	= ""
  ,abstract	=
   "This paper presents the Plannett system, which combines artificial
    neural networks to achieve expert-level accuracy on the difficult
    scientific task of recognizing volcanos in radar images of the surface
    of the planet Venus. Plannett uses ANNs that vary along two dimensions:
    the set of input features used to train and the number of hidden units.
    The ANNs are combined simply by averaging their output activations. When
    Plannett is used as the classification module of a three-stage image
    analysis system called JARtool, the end-to-end accuracy (sensitivity and
    specificity) is as good as that of a human planetary geologist on a
    four-image test suite. JARtool-Plannett also achieves the best
    algorithmic accuracy on these images to date."
}

@inproceedings{craven.ijcai-wkshp95
   ,author	= "M. Craven and J. Shavlik"
   ,title	= "Extracting Comprehensible Concept Representations from
		   Trained Neural Networks"
   ,booktitle	= "Presented at the IJCAI Workshop on Comprehensibility in 
                   Machine Learning"
   ,year	= 1995
   ,address	= "Montreal, Quebec, Canada"
   ,filename	= "craven.ijcai-wkshp95.pdf craven.ijcai-wkshp95.ps"
   ,workshop	= ""
   ,abstract = "Although they are applicable to a wide array of problems, and have
  demonstrated good performance on a number of difficult, real-world
  tasks, neural networks are not usually applied to problems in which
  comprehensibility of the acquired concepts is important.  The concept
  representations formed by neural networks are hard to understand
  because they typically involve distributed, nonlinear relationships
  encoded by a large number of real-valued parameters.  To address this
  limitation, we have been developing algorithms for extracting ``symbolic''
  concept representations from trained neural networks.  We first discuss
  why it is important to be able to understand the concept representations
  formed by neural networks.  We then briefly describe our approach and
  discuss a number of issues pertaining to comprehensibility that have
  arisen in our work.  Finally, we discuss choices that we have made in
  our research to date, and open research issues that we have not yet
  addressed."
}
  

@inproceedings{cherkauer.ijcai-wkshp95
   ,author	= "K. Cherkauer and J. Shavlik"
   ,title	= "Rapidly Estimating the Quality of Input Representations for
                   Neural Networks"
   ,booktitle	= "Working Notes of the IJCAI-95 Workshop on Data Engineering
                   for Inductive Learning, Fourteenth International Joint
                   Conference on Artificial Intelligence"
   ,year	= 1995
   ,address	= "Montreal, Quebec, Canada"
   ,filename	= "cherkauer.ijcai-wkshp95.pdf cherkauer.ijcai-wkshp95.ps"
   ,workshop	= ""
   ,abstract = "The choice of an input representation for machine learning can have a
  profound impact on the accuracy of the learned model in classifying novel
  instances.  A reliable method of rapidly estimating the value of a
  representation, independent of the learner, would be a powerful tool in the
  search for better representations.  We introduce a fast representation-
  quality measure that is more accurate than Rendell and Ragavan's blurring
  metric in rank ordering input representations for neural networks on two
  difficult, real-world datasets.  This work constitutes a step forward both
  in representation quality measures and in our understanding of the
  characteristics that engender good representations."
}

  Note: see also cherkauer.nips96.ps
  

@inproceedings{opitz.isiknh94
   ,author	= "D. Opitz and J. Shavlik"
   ,title	= "Genetically Refining Topologies of Knowledge-Based
		   Neural Networks"
   ,booktitle	= "International Symposium on Integrating Knowledge and
		   Neural Heuristics"
   ,year	= 1994
   ,address	= "Pensacola, FL"
   ,pages	= "57-66"
   ,filename	= "opitz.isiknh94.pdf opitz.isiknh94.ps"
   ,workshop	= ""
   ,abstract = "Traditional approaches to connectionist theory refinement map the
  dependencies of a domain-specific rulebase into a neural network,
  then refine these reformulated rules using neural learning.  These
  approaches have proven to be effective at classifying previously unseen
  examples; however, most of these approaches suffer in that they are
  unable to refine the topology of the networks they produce.  Thus, when
  given an ``impoverished'' domain theory, they generalize poorly.  A recently
  published improvement to these approaches, the TopGen algorithm, addressed
  this limitation by heuristically searching expansions to the knowledge-based
  networks produced by these algorithms.  We show, however, that TopGen's
  search is too restricted.  In response, we present the REGENT algorithm,
  which uses genetic algorithms to broaden the type of networks seen during
  its search.  It does this by using (a) the domain theory to help create
  an initial population and (b) crossover and mutation operators specifically
  designed for knowledge-based networks.  Experiments on three real-world
  domains indicate that our new algorithm is able to significantly increase
  generalization when compared to both TopGen and a standard approach that
  does not alter its knowledge-based network's topology."
}
  
@inproceedings{maclin.mlc91,
   author	= "R. Maclin and J. Shavlik",
   title	= "Refining Domain Theories Expressed as Finite-State Automata",
   booktitle	= "Proceedings of the Eighth International Machine Learning
                   Workshop",
   year		= "1991",
   pages	= "524-528",
   address	= "Evanston, IL",
   publisher	= "Morgan Kaufmann",
   filename	= "maclin.mlc91.pdf maclin.mlc91.ps"
   ,workshop	= ""
   ,abstract = "The KBANN system uses neural networks to refine domain theories.
  Currently, domain knowledge in KBANN is expressed as non-recursive,
  propositional rules.  We extend KBANN to domain theories expressed as
  finite-state automata.  We apply finite-state KBANN to the task of
  predicting how proteins fold, producing a small but statistically significant
  gain in accuracy over both a standard neural network approach and a
  non-learning algorithm from the biological literature.  Our method shows
  promise at solving this and other real-world problems that can be described in
  terms of state-dependent decisions."
}
  Note: this work has been superseded by maclin.aaai92
  

@inproceedings{towell.mlc91,
   author	= "G. Towell and M. Craven and J. Shavlik",
   title	= "Constructive Induction in Knowledge-Based Neural Networks",
   booktitle	= "Proceedings of the Eighth International Machine
		   Learning Workshop",
   year		= 1991,
   pages	= "213-217",
   address	= "Evanston, IL",
   publisher	= "Morgan Kaufmann",
   filename	= "towell.mlc91.pdf towell.mlc91.ps",
   data = "http://ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology/splice-junction-gene-sequences/",
   workshop	= ""
   ,abstract = "Artificial neural networks have proven to be a successful, general method for
  inductive learning from examples.  However, they have not often been viewed in
  terms of constructive induction.  We describe a method for using a
  knowledge-based neural network of the kind created by the KBANN algorithm as
  the basis of a system for constructive induction. After training, we extract
  two types of rules from a network: modified versions of the rules initially
  provided to the knowledge-based neural network, and rules which describe newly
  constructed features.  Our experiments show that the extracted rules are more
  accurate, at classifying novel examples, than the trained network from
  which the rules are extracted."
}
  Note: this work has been superseded by towell.nips4, towell.mlj93,


@inproceedings{shavlik.cbr91,
   author	= "J. Shavlik",
   title	= "Finding Genes by Case-Based Reasoning in the Presence of
		   Noisy Case Boundaries",
   booktitle	= "Proceedings of the DARPA Cased-Based Reasoning Workshop",
   year		= "1991",
   pages	= "327-338",
   publisher	= "Morgan Kaufmann",
   filename	= "shavlik.cbr91.pdf shavlik.cbr91.ps"
   ,workshop	= ""
   ,abstract = "Effectively using previous cases requires that a reasoner first match, in some
  fashion, the current problem against a large library of stored cases.  One
  largely unaddressed task in case-based reasoning is the process of parsing
  continuous input into discrete cases.  If this parsing is not done accurately,
  the relevant previous cases may not be found and the advantages of case-based
  problem solving will be lost.  Parsing the data into cases is further
  complicated when the input data is noisy.  This paper presents an approach to
  applying the case-based paradigm in the presence of noisy case boundaries.
  The approach has been fully implemented and applied in the domain of molecular
  biology; specifically, a successful case-based approach to gene finding is
  described.  An empirical study demonstrates that the method is robust even
  with high error rates.  This system is being used in conjunction with a Human
  Genome project in the Wisconsin Department of Genetics that is sequencing the
  DNA of the bacterium E. coli."
}
  

   --------------
   JOURNAL PAPERS
   --------------
@article{khot.mlj14,
    filename = "khot.mlj14.pdf",
    author   = "T. Khot and S. Natarajan and K. Kersting and J. Shavlik",
    journal  = "Machine Learning",
    title    = "Gradient-based Boosting for Statistical Relational Learning: The Markov Logic Network and Missing Data Cases",
    year     = "2014",
    note     = "(To appear.)",
    abstract = "Recent years have seen a surge of interest in Statistical Relational Learning (SRL) models that combine logic with probabilities. One prominent and highly expressive SRL model is Markov Logic Networks (MLNs), but the expressivity comes at the cost of learning complexity. Most of the current methods for learning MLN structure follow a two-step approach where first they search through the space of possible clauses (i.e. structures) and then learn weights via gradient descent for these clauses. We present a functional-gradient boosting algorithm to learn both the weights (in closed form) and the structure of the MLN simultaneously. Moreover most of the learning approaches for SRL apply the closed-world assumption, i.e., whatever is not observed is assumed to be false in the world. We attempt to open this assumption.We extend our algorithm for MLN structure learning to handle missing data by using an EM-based approach and show this algorithm can also be used to learn Relational Dependency Networks and relational policies. Our results in many domains demonstrate that our approach can effectively learn MLNs even in the presence of missing data.",
}

@article{niu.ijswis12
 ,filename = "niu.ijswis12.pdf"
 ,author = "F. Niu and C. Zhang and C. Re and J. Shavlik"
 ,title = "Elementary: Large-scale Knowledge-base Construction via Machine Learning and Statistical Inference"
 ,journal = "International Journal on Semantic Web and Information Systems (IJSWIS)"
 ,year = 2012
 ,abstract="Researchers have approached knowledge-base construction (KBC) with a wide range of data
 resources and techniques. We present Elementary, a prototype KBC system that is able to
 combine diverse resources and different KBC techniques via machine learning and statistical
 inference to construct knowledge bases. Using Elementary, we have implemented a solution to
 the TAC-KBP challenge with quality comparable to the state of the art, as well as an end-to-end
 online demonstration that automatically and continuously enriches Wikipedia with structured
 data by reading millions of webpages on a daily basis. We describe several challenges and
 our solutions in designing, implementing, and deploying Elementary. In particular, we first
 describe the conceptual framework and architecture of Elementary, and then discuss how we
 address scalability challenges to enable Web-scale deployment. First, to take advantage of
 diverse data resources and proven techniques, Elementary employs Markov logic, a succinct
 yet expressive language to specify probabilistic graphical models. Elementary accepts both
 domain-knowledge rules and classical machine-learning models such as conditional random
 fields, thereby integrating different data resources and KBC techniques in a principled manner.
 Second, to support large-scale KBC with terabytes of data and millions of entities, Elementary
 leverages high-throughput parallel computing infrastructure such as Hadoop, Condor, and
 parallel databases. Furthermore, to scale sophisticated statistical inference, Elementary
 employs a novel decomposition-based approach to Markov logic inference that solves routine
 subtasks such as classification and coreference with specialized algorithms. We empirically
 show that this decomposition-based inference approach achieves higher performance than prior
 inference approaches. To validate the ectiveness of Elementary's approach to KBC, we
 experimentally show that its ability to incorporate diverse signals has positive impacts on KBC
 quality."
}

@article{soni.jbcb12
 ,filename = "http://dx.doi.org/10.1142/S0219720012400094"
 ,author = "A. Soni and J. Shavlik"
 ,title = "Probabilistic Ensembles for Improved Inference in Protein-Structure Determination"
 ,journal = "Journal of Bioinformatics and Computational Biology (JBCB)"
 ,year = 2012
 , note="An extended version of this paper can be found in The Proceedings of the ACM International Conference on Bioinformatics and Computational Biology 2011."
 ,doi = "http://dx.doi.org/10.1142/S0219720012400094"
 ,abstract = "Protein X-ray crystallography -- the most popular method for determining protein
 structure -- remains a laborious process requiring a great deal of manual crystallographer effort
 to interpret low-quality protein images. Automating this process is critical in creating a high-throughput
 protein-structure determination pipeline. Previously, our group developed ACMI, a
 probabilistic framework for producing protein-structure models from electron-density maps
 produced via X-ray crystallography. ACMI uses a Markov Random Field to model the three-dimensional
 (3D) location of each non-hydrogen atom in a protein. Calculating the best structure
 in this model is intractable, so ACMI uses approximate inference methods to estimate the
 optimal structure. While previous results have shown ACMI to be the state-of-the-art method on
 this task, its approximate inference algorithm remains computationally expensive and susceptible
 to errors. In this work, we develop Probabilistic Ensembles in ACMI (PEA), a framework
 for leveraging multiple, independent runs of approximate inference to produce estimates of
 protein structures. Our results show statistically significant improvements in the accuracy of
 inference resulting in more complete and accurate protein structures. In addition, PEA provides
 a general framework for advanced approximate inference methods in complex problem domains."
 }


@article{natarajan.mlj12
 ,filename = "natarajan.mlj12.pdf"
 ,author = "S. Natarajan and T. Khot and K. Kersting and B. Gutmann and J. Shavlik"
 ,title = "Gradient-based Boosting for Statistical Relational Learning: The Relational Dependency Network Case"
 ,journal = "Machine Learning"
 ,year = 2012
 ,doi = "http://dx.doi.org/10.1007/s10994-011-5244-9"
 ,abstract = "Dependency networks approximate a joint probability distribution over multiple random variables as a product of conditions distributions. Relational Dependency Networks (RDNs) are graphical models that extend dependency networks to relational domains. This higher expressivity, however, comes at the expense of a more complex model-selection problem: an unbounded number of relational abstraction levels might need to be explored. Whereas current learning approaches for RDNs learn a single probability tree per random variable, we propose to turn the problem into a series of relational function-approximation problems using gradient-based boosting. In doing so, one can easily induce highly complex features over several iterations and in turn estimate quickly a very expressive model. Our experimental results in several different data sets show that this boosting method results in efficient learning of RDNs when compared to state-of-the-art statistical relational learning approaches."
 }

@article{ayer.cancer10
 ,filename = ""
 ,author = "T. Ayer and O. Alagoz and J. Chhatwal and J. Shavlik and C. Kahn and E. Burnside"
 ,title = "Breast Cancer Risk Estimation with Artificial Neural Networks Revisited:  Discrimination and Calibration"
 ,journal = "Cancer"
 ,year = 2010
 ,doi =  "http://dx.doi.org/10.1002/cncr.25081"
 ,abstract = "Discriminating malignant breast lesions from benign ones and accurately predicting the risk of breast cancer for individual patients are crucial to successful clinical decisions. In the past, several artificial neural network (ANN) models have been developed for breast cancer-risk prediction. All studies have reported discrimination performance, but not one has assessed calibration, which is an equivalently important measure for accurate risk prediction. In this study, the authors have evaluated whether an artificial neural network (ANN) trained on a large prospectively collected dataset of consecutive mammography findings can discriminate between benign and malignant disease and accurately predict the probability of breast cancer for individual patients.Our dataset consisted of 62,219 consecutively collected mammography findings matched with the Wisconsin State Cancer Reporting System. The authors built a 3-layer feedforward ANN with 1000 hidden-layer nodes. The authors trained and tested their ANN by using 10-fold cross-validation to predict the risk of breast cancer. The authors used area the under the receiver-operating characteristic curve (AUC), sensitivity, and specificity to evaluate discriminative performance of the radiologists and their ANN. The authors assessed the accuracy of risk prediction (ie, calibration) of their ANN by using the Hosmer-Lemeshow (H-L) goodness-of-fit test.Their ANN demonstrated superior discrimination (AUC, 0.965) compared with the radiologists (AUC, 0.939; P &lt; .001). The authors' ANN was also well calibrated as shown by an H-L goodness of fit P-value of .13.The authors' ANN can effectively discriminate malignant abnormalities from benign ones and accurately predict the risk of breast cancer for individual abnormalities."
 }
@article{chen.tkdd09
 ,filename = ""
 ,author = "B-C. Chen and R. Ramakrishnan and J. Shavlik and P. Tamma"
 ,title = "Bellwether Analysis: Searching for Cost-Effective Query-Defined Predictors in Large Databases"
 ,year = "2009"
 ,journal = "ACM Transactions on Knowledge Discovery from Data"
 ,volume = 3
 ,number = 1
 ,pages = "1-49"
 ,doi = "http://doi.acm.org/10.1145/1497577.1497582"
 ,abstract = "How to mine massive datasets is a challenging problem with great potential value. Motivated by this challenge, much effort has concentrated on developing scalable versions of machine learning algorithms. However, the cost of mining large datasets is not just computational; preparing the datasets into the right form so that learning algorithms can be applied is usually costly, due to the human labor that is typically required and a large number of choices in data preparation, which include selecting different subsets of data and aggregating data at different granularities. We make the key observation that, for a number of practically motivated problems, these choices can be defined using database queries and analyzed in an automatic and systematic manner. Specifically, we propose a new class of data-mining problem, called bellwether analysis, in which the goal is to find a few query-defined predictors (e.g., first week sales of Peoria, IL of an item) that can be used to accurately predict the result of a target query (e.g., first year worldwide sales of the item) from a large number of queries that define candidate predictors. To make a prediction for a new item, the data needed to generate such predictors has to be collected (e.g., selling the new item in Peoria, IL for a week and collecting the sales data). A useful predictor is one that has high prediction accuracy and a low data-collection cost. We call such a cost-effective predictor a bellwether.  This article introduces bellwether analysis, which integrates database query processing and predictive modeling into a single framework, and provides scalable algorithms for large datasets that cannot fit in main memory. Through a series of extensive experiments, we show that bellwethers do exist in real-world databases, and that our computation techniques achieve good efficiency on large datasets."
}
@article{woods.jdi10
 ,filename = "woods.jdi10.pdf"
 ,author = "R. Woods and L. Oliphant and K. Shinki and D. Page and J. Shavlik and E. Burnside"
 ,title = "Validation of Results from Knowledge Discovery: Mass Density as a Predictor of Breast Cancer"
 ,journal = "Journal of Digital Imaging"
 ,year = 2010
 ,volume ="23"
 ,number = "5"
 ,pages = "554-561"
 ,doi = 10.1007/s10278-009-9235-3
 ,abstract = "The purpose of our study is to identify and quantify the association between high breast mass density and breast malignancy using inductive logic programming (ILP) and conditional probabilities, and validate this association in an independent dataset. We ran our ILP algorithm on 62,219 mammographic abnormalities. We set the Aleph ILP system to generate 10,000 rules per malignant finding with a recall >5% and precision >25%. Aleph reported the best rule for each malignant finding. A total of 80 unique rules were learned. A radiologist reviewed all rules and identified potentially interesting rules. High breast mass density appeared in 24% of the learned rules. We confirmed each interesting rule by calculating the probability of malignancy given each mammographic descriptor. High mass density was the fifth highest ranked predictor. To validate the association between mass density and malignancy in an independent dataset, we collected data from 180 consecutive breast biopsies performed between 2005 and 2007. We created a logistic model with benign or malignant outcome as the dependent variable while controlling for potentially confounding factors. We calculated odds ratios based on dichomotized variables. In our logistic regression model, the independent predictors high breast mass density (OR 6.6, CI 2.5-17.6), irregular mass shape (OR 10.0, CI 3.4-29.5), spiculated mass margin (OR 20.4, CI 1.9-222.8), and subject age (Beta  = 0.09, p < 0.0001) significantly predicted malignancy. Both ILP and conditional probabilities show that high breast mass density is an important adjunct predictor of malignancy, and this association is confirmed in an independent data set of prospectively collected mammographic findings. "
}
@article{dimaio.ijdmb09
 ,filename = "dimaio.ijdmb09.pdf"
 ,author = "F. DiMaio and A. Soni and G. Phillips and J. Shavlik"
 ,title = "Spherical-Harmonic Decomposition for Molecular Recognition in Electron-Density Maps"
 ,journal = "International Journal of Data Mining and Bioinformatics"
 ,year = "2009"
 ,volume = 3
 ,number = 2
 ,pages = "205-227"
 ,nihmsid = "NIHMS68171"
 ,pmcid = "PMC2696052"
 ,doi = "10.1504/IJDMB.2009.024852"
 ,note = "(The paper is in pre-publication form.  It is an extension to: DiMaio et al. (BIBM 2007))"
 ,abstract = "Several methods for automatically constructing a protein  model from an electron-density map require searching for many small protein-fragment templates in the density. We propose to use the spherical-harmonic decomposition of the template and the maps density to speed this matching. Unlike other template-matching approaches, this allows us to eliminate large portions of the map unlikely to match any templates. We train several first-pass filters for this elimination task. We show our new template-matching method improves accuracy and reduces running time, compared to previous approaches. Finally, we extend our method to produce a structural-homology detection algorithm using electron density."
 }

@article{severtson.methods07
  ,filename    = ""
  ,author    = "D. Severtson and L. Pape and D. Page and J. Shavlik and G. Phillips and P. Flatley Brennan"
  ,title    = "Biomedical Informatics Training at the University of Wisconsin-Madison"
  ,journal     = "Methods Inf Med."
  ,year    = "2007"
  ,volume = "46 (Suppl. 1)"
  ,pages = "149-156"
  ,abstract    = "OBJECTIVES: The purpose of this paper is to describe biomedical informatics training at the University of Wisconsin-Madison (UW-Madison).<BR>
METHODS: We reviewed biomedical informatics training, research, and faculty/trainee participation at UW-Madison.<BR>
RESULTS: There are three primary approaches to training 1) The Computation & Informatics in Biology & Medicine Training Program, 2) formal biomedical informatics offered by various campus departments, and 3) individualized programs. Training at UW-Madison embodies the features of effective biomedical informatics training recommended by the American College of Medical Informatics that were delineated as: 1) curricula that integrate experiences among computational sciences and application domains, 2) individualized and interdisciplinary cross-training among a diverse cadre of trainees to develop key competencies that he or she does not initially possess, 3) participation in research and development activities, and 4) exposure to a range of basic informational and computational sciences. CONCLUSIONS: The three biomedical informatics training approaches immerse students in multidisciplinary training and education that is supported by faculty trainers who participate in collaborative research across departments. Training is provided across a range of disciplines and available at different training stages. Biomedical informatics training at UW-Madison illustrates how a large research University, with multiple departments across biological, computational and health fields, can provide effective and productive biomedical informatics training via multiple bioinformatics training approaches."
  ,note = "(Full paper not available.  PubMed ID 17700918)"
}    

@article{dimaio.bioinfo07
  ,filename    = "dimaio.bioinfo07.pdf"
  ,author    = "F. DiMaio and D. Kondrashov and E. Bitto and A. Soni and C. Bingman and G. Phillips and J. Shavlik"
  ,title    = "Creating Protein Models from Electron-Density Maps using Particle-Filtering Methods"
  ,journal     = "Bioinformatics"
  ,year    = "2007"
  ,volume = "23"
  ,pages = "2851-2858"
  ,code		 = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/"
  ,data		 = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/"
  ,doi = "10.1093/bioinformatics/btm480"
  ,pmcid = "PMC2567142"
  ,abstract    = "Motivation:  One bottleneck in high-throughput protein crystallography is interpreting an electron-density map; that is, fitting a molecular model to the 3D picture crystallography produces.  Previously, we developed ACMI, an algorithm that uses a probabilistic model to infer an accurate protein backbone layout.  Here we use a sampling method known as particle filtering to produce a set of all-atom protein models.  We use the output of Acmi to guide the particle filter's sampling, producing an accurate, physically feasible set of structures.
<br><br>Results:  We test our algorithm on ten poor-quality experimental density maps.  We show that particle filtering produces accurate all-atom models, resulting in fewer chains, lower sidechain RMS error, and reduced R factor, compared to simply placing the best-matching sidechains on ACMI's trace.   We show that our approach produces a more accurate model than three leading methods -- TEXTAL, Resolve, and ARP/wARP -- in terms of main chain completeness, sidechain identification, and crystallographic R factor."
}    

@article{dimaio.ismb06
   ,filename    = "dimaio.ismb06.pdf dimaio.ismb06.ppt"
   ,author	= "F. DiMaio and J. Shavlik and G. Phillips"
   ,title	= "A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps"
   ,journal     = "Bioinformatics, Special Issue Based on the Papers Presented at the Fourteenth International Conference on Intelligent Systems for Molecular Biology (ISMB-06), Fortaleza, Brazil"
   ,year	= "2006"
   ,volume      = 22
   ,number      = 14
   ,pages       = "e81-e89"
   ,code		 = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/"
   ,data		 = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/"
   ,abstract    = "One particularly time-consuming step in protein crystallography is interpreting the electron density map; that is, fitting a complete molecular model of the protein into a 3D image of the protein produced by the crystallographic process. In poor-quality electron density maps, the
interpretation may require a significant amount of a crystallographer's time. Our work investigates automating the time-consuming initial backbone
trace in poor-quality density maps. We describe ACMI (Automatic Crystallographic Map Interpreter), which uses a probabilistic model known as a Markov field to represent the protein. Residues of the protein are modeled as nodes in a graph, while edges model pairwise structural interactions. Modeling the protein in this manner allows the model to be flexible, considering an almost infinite number of possible conformations, while rejecting any that are physically impossible. Using an efficient algorithm for approximate inference, belief propagation, allows the most probable trace of the protein's backbone through the density map to be determined. We test ACMI on a set of ten protein density maps (at 2.5 to 4.0 Angstroms resolution), and compare our results to alternative approaches. At these resolutions, ACMI offers a more accurate backbone trace than current approaches."
}

@article{goadrich.mlj06
   ,author       = "M. Goadrich and L. Oliphant and J. Shavlik"
   ,title        = "Gleaner: Creating Ensembles of First-Order Clauses to Improve Recall-Precision Curves"
   ,journal    	= "Machine Learning"
   ,year         = "2006"
   ,volume       = "64"
   ,number	= "1/2/3"
   ,pages	= "231-262"
   ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/IE-protein-location/"
   ,filename	= "goadrich.mlj06.pdf"
   ,abstract = "Many domains in the field of Inductive Logic Programming (ILP) involve
   highly unbalanced data. A common way to measure performance in these domains is to use
   precision and recall instead of simply using accuracy. The goal of our research is to find
   new approaches within ILP particularly suited for large, highly-skewed domains.  We propose
   Gleaner, a randomized search method that collects good clauses from a broad spectrum of
   points along the recall dimension in recall-precision curves and employs an ''at least L of
   these K clauses'' thresholding method to combine sets of selected clauses. Our research
   focuses on Multi-Slot Information Extraction (IE), a task that typically involves many
   more negative examples than positive examples.  We formulate this problem into a relational
   domain, using two large testbeds involving the extraction of important relations from the
   abstracts of biomedical journal articles.  We compare Gleaner to ensembles of standard
   theories learned by Aleph, finding that Gleaner produces comparable testset results in a
   fraction of the training time."

}


@article{scott.NeuralComp92,
   author       = "G. Scott and J. Shavlik and W. Ray",
   title        = "Refining PID Controllers using Neural Networks",
   journal    	= "Neural Computation",
   year         = "1992",
   volume       = "4",
   number	= "5",
   pages	= "746-757",
   filename	= "scott.NeuralComp92.pdf"
   ,abstract = "The KBANN (Knowledge-Based Artificial Neural Networks) approach uses neural
  networks to refine knowledge that can be written in the form of simple
  propositional rules.  We extend this idea further by presenting the MANNCON
  (Multivariable Artificial Neural Network Control) algorithm by which the
  mathematical equations governing a PID (Proportional-Integral-Derivative)
  controller determine the topology and initial weights of a network, which is
  further trained using backpropagation.  We apply this method to the task of
  controlling the outflow and temperature of a water tank, producing
  statistically-significant gains in accuracy over both a standard neural
  network approach and a non-learning PID controller.  Furthermore, using the
  PID knowledge to initialize the weights of the network produces statistically
  less variation in testset accuracy when compared to networks initialized with
  small random numbers."

}


@article{mangasarian.jmlr04
  ,filename    	= "mangasarian.jmlr04.pdf"
  ,author    	= "O. Mangasarian and J. Shavlik and E. Wild"
  ,title    	= "Knowledge-Based Kernel Approximation"
  ,journal     	= "Journal of Machine Learning Research"
  ,year    	= "2004"
  ,volume	= "5"
  ,pages	= "1127-1141"
  ,abstract    	= "Prior knowledge, in the form of linear inequalities that need to be satisfied over multiple polyhedral sets, is incorporated into a function approximation generated by a linear combination of linear or nonlinear kernels. In addition, the approximation needs to satisfy conventional conditions such as having given exact or inexact function values at certain points. Determining such an approximation leads to a linear programming formulation. By using nonlinear kernels and mapping the prior polyhedral knowledge in the input space to one defined by the kernels, the prior knowledge translates into nonlinear inequalities in the original input space. Through a number of computational examples, including a real world breast cancer prognosis dataset, it is shown that prior knowledge can significantly improve function approximation."
}

@article{molla.aimag04
  ,filename    	= "molla.aimag04.pdf molla.aimag04.doc"
  ,author    	= "M. Molla and M. Waddell and D. Page and J. Shavlik"
  ,title    	= "Using Machine Learning to Design and Interpret Gene-Expression Microarrays"
  ,journal     	= "AI Magazine"
  ,note        	= "(To Appear in the Special Issue on Bioinformatics)"
  ,year    	= "2004"
  ,volume	= "25"
  ,number	= "1"
  ,pages	= "23-44"
  ,abstract    	= "Gene-expression microarrays, commonly called 'gene chips,' make it possible
to simultaneously measure the rate at which a cell or tissue is expressing - translating into a
protein - each of its thousands of genes.  One can use these comprehensive snapshots of
biological activity to infer regulatory pathways in cells, identify novel targets for drug
design, and improve the diagnosis, prognosis, and treatment planning for those suffering from
disease.  However, the amount of data this new technology produces is more than one can
manually analyze.  Hence, the need for automated analysis of microarray data offers an
opportunity for machine learning to have a significant impact on biology and medicine.  This
article describes microarray technology, the data it produces, and the types of
machine-learning tasks that naturally arise with this data.  It also reviews some of the recent
prominent applications of machine learning to gene-chip data, points to related tasks where
machine learning may have a further impact on biology and medicine, and describes additional
types of interesting data that recent advances in biotechnology allow biomedical researchers to
collect."
}

@article{bockhorst.bio03
   ,author 	= "J. Bockhorst and M. Craven and D. Page and J. Shavlik and J. Glasner"
   ,title	= "A Bayesian Network Approach to Operon Prediction"
   ,journal     = "Bioinformatics"
   ,volume	= "19"
   ,number	= "10"
   ,year        = "2003"
   ,pages	= "1227-1235"
   ,filename	= "bockhorst.bio03.pdf bockhorst.bio03.ps"
   ,abstract	= "Motivation: In order to understand transcription regulation in a
given prokaryotic genome, it is critical to identify operons, the fundamental units of
transcription, in such species.  While there are a growing number of organisms whose
sequence and gene coordinates are known, by and large their operons are not
known.  Results: We present a probabilisitic approach to predicting operons using Bayesian
networks.  Our approach exploits diverse evidence sources such as sequence and
expression data.  We evaluate our approach on the E. coli K-12 genome where our results
indicate we are able to identify over 78% of its operons at a 10% false positive
rate.  Also, empirical evaluation using a reduced set of data sources suggests that our
approach may have significant value for organisms that do not have as rich of
evidence sources as E. coli."
}

@article{eliassi-rad.umuai01
   ,author	= "T. Eliassi-Rad and J. Shavlik"
   ,title	= "A System for Building Intelligent Agents that Learn to
		   Retrieve and Extract Information"
   ,journal     = "International Journal on User Modeling and User-Adapted 
                   Interaction, Special Issue on User Modeling and Intelligent Agents"
   ,volume      = "13"
   ,number      = "1-2"
   ,year	= "2003"
   ,pages       = "35-88"
   ,filename    = "eliassi-rad.umuai01.pdf eliassi-rad.umuai01.ps"
   ,abstract    = "We present a system for rapidly and easily building instructable and
   self-adaptive software agents that retrieve and extract information.  Our Wisconsin 
   Adaptive Web Assistant (WAWA) constructs intelligent agents by accepting user preferences
   in the form of instructions.  These user-provided instructions are compiled into neural 
   networks that are responsible for the adaptive capabilities of an intelligent agent.  The
   agent's neural networks are modified via user-provided and system-constructed training 
   examples.  Users can create training examples by rating Web pages (or documents), but 
   more importantly WAWA's agents uses techniques from reinforcement learning to 
   internally create their own examples.  Users can also provide additional instruction 
   throughout the life of an agent.  Our experimental evaluations on a home-page finder agent 
   and a seminar-announcement extractor agent illustrate the value of using instructable 
   and adaptive agents for retrieving and extracting information." 
}


@article{molla.cbgi02
   ,filename    = "molla.cbgi02.pdf molla.cbgi02.doc molla.cbgi02.ppt"
   ,author	= "M. Molla and P. Andreae and J. Glasner and F. Blattner and J. Shavlik"
   ,title	= "Interpreting Microarray Expression Data Using Text Annotating the Genes"
   ,journal     = "Information Sciences"
   ,volume      = 146
   ,pages       = "75-88"
   ,note        =  "Also appears in: Proceedings of the 4th Conference on Computational Biology and Genome Informatics, Durham, NC"
   ,year	= "2002"
   ,abstract    = "Microarray expression data is being generated by the gigabyte all over the world with undoubted exponential increases to come. Annotated genomic data is also rapidly pouring into public databases. Our goal is to develop automated ways of combining these two sources of information to produce insight into the operation of cells under various conditions. Our approach is to use machine-learning techniques to identify characteristics of genes that are up-regulated or down-regulated in a particular microarray experiment. We seek models that are (a) accurate, (b) easy to interpret, and (c) stable to small variations in the training data. This paper explores the effectiveness of two standard machine-learning algorithms for this task: Na e Bayes (based on probability) and PFOIL (based on building rules). Although we do not anticipate using our learned models to predict expression levels of genes, we cast the task in a predictive framework, and evaluate the quality of the models in terms of their predictive power on genes held out from the training. The paper reports on experiments using actual E. coli microarray data, discussing the strengths and weaknesses of the two algorithms and demonstrating the trade-offs between accuracy, comprehensibility, and stability."
}

@article{tobler.ismb02
   ,filename    = "tobler.ismb02.pdf tobler.ismb02.doc tobler.ismb02.ppt"
   ,author	= "J. Tobler and M. Molla and E. Nuwaysir and R. Green and J. Shavlik"
   ,title	= "Evaluating Machine Learning Approaches for Aiding Probe Selection for Gene-Expression Arrays"
   ,journal     = "Bioinformatics, Special Issue Based on the Papers Presented at the Tenth International Conference on Intelligent Systems for Molecular Biology (ISMB-02), Edmonton, Canada"
   ,year	= "2002"
   ,volume      = 18
   ,pages       = "S164-S171"
   ,abstract    = "<b>Motivation:</b><br>Microarrays are a fast and cost-effective method of performing thousands of DNA hybridization experiments simultaneously.  DNA probes are typically used to measure the expression level of specific genes.   Because probes greatly vary in the quality of their hybridizations, choosing good probes is a difficult task.    If one could accurately choose probes that are likely to hybridize well, then fewer probes would be needed to represent each gene in a gene-expression microarray, and, hence, more genes could be placed on an array of a given physical size.  Our goal is to empirically evaluate how successfully three standard machine-learning algorithms - na e Bayes, decision trees, and artificial neural networks - can be applied to the task of predicting good probes.  Fortunately it is relatively easy to get training examples for such a learning task: place various probes on a gene chip, add a sample where the corresponding genes are highly expressed, and then record how well each probe measures the presence of its corresponding gene.  With such training examples, it is possible that an accurate predictor of probe quality can be learned. 
<br>
<b>Results:</b>
<br>
Two of the learning algorithms we investigate - na e Bayes and neural networks - learn to predict probe quality surprisingly well.  For example, in the top ten predicted probes for a given gene not used for training, on average about five rank in the top 2.5% of that gene's hundreds of possible probes.  Decision-tree induction and the simple approach of using predicted melting temperature to rank probes perform significantly worse than these two algorithms.  The features we use to represent probes are very easily computed and the time taken to score each candidate probe after training is minor.  Training the na e Bayes algorithm takes very little time, and while it takes over 10 times as long to train a neural network, that time is still not very substantial (on the order of a few hours on a desktop workstation). We also report the information contained in the features we use to describe the probes.  We find the fraction of cytosine in the probe to be the most informative feature.  We also find, not surprisingly, that the nucleotides in the middle of the probes sequence are more informative than those at the ends of the sequence."
}

@article{allex.bioinformatics
   ,author      = "C. Allex and J. Shavlik and F. Blattner"
   ,title       = "Neural Network Input Representations that Produce Accurate
Consensus Sequences from DNA Fragment Assemblies"
   ,journal     = "Bioinformatics"
   ,year        = 1999
   ,filename    = "allex.bioinformatics.pdf allex.bioinformatics.ps"
   ,volume      = 15
   ,pages       = "723-728"
   ,abstract    = " Motivation: Given inputs extracted from an aligned column 
   of DNA bases and the underlying Perkin Elmer Applied Biosystems (ABI) 
   fluorescent traces, our goal is to train a neural network to correctly 
   determine the consensus base for the column. Choosing an appropriate 
   network input representation is critical to success in this task. We 
   empirically compare five representations; one uses only base calls and 
   the others include trace information.
   
   Results: We attained the most accurate results from networks that
   incorporate trace information into their input representations. Based on
   estimates derived from using 10-fold cross-validation, the best network
   topology produces consensus accuracies ranging from 99.26% to over 99.98%
   for coverages from two to six aligned sequences. With a coverage of six,
   it makes only three errors in 20,000 consensus calls. In contrast, the 
   network that only uses base calls in its input representation has over
   double that error rate - eight errors in 20,000 consensus calls."
}


@article{opitz.jair97
   ,author	= "D. Opitz and J. Shavlik"
   ,title	= "Connectionist Theory Refinement: Genetically Searching the Space of Network Topologies"
   ,journal	= "Journal of Artificial Intelligence Research"
   ,year	= 1997
   ,filename	= "opitz.jair97.pdf opitz.jair97.ps"
   ,volume	= 6
   ,pages       = "177-209"
   ,abstract = "An algorithm that learns from a set of examples should 
       ideally be able to exploit the available resources of (a) abundant 
       computing power and (b) domain-specific knowledge to improve its 
       ability to generalize. Connectionist theory-refinement systems, which 
       use background knowledge to select a neural network's topology and 
       initial weights, have proven to be effective at exploiting domain-
       specific knowledge; however, most do not exploit available computing 
       power. This weakness occurs because they lack the ability to refine 
       the topology of the neural networks they produce, thereby limiting 
       generalization, especially when given impoverished domain theories. 
       We present the REGENT algorithm which uses (a) domain-specific 
       knowledge to help create an initial population of knowledge-based 
       neural networks and (b) genetic operators of crossover and mutation 
       (specifically designed for knowledge-based networks) to continually 
       search for better network topologies. Experiments on three real-world 
       domains indicate that our new algorithm is able to significantly 
       increase generalization compared to a standard connectionist theory-
       refinement system, as well as our previous algorithm for growing 
       knowledge-based networks. "
}

@article{craven.fgcs97
   ,author	= "M. Craven and J. Shavlik"
   ,title	= "Using Neural Networks for Data Mining"
   ,journal	= "Future Generation Computer Systems"
   ,year	= 1997
   ,filename	= "craven.fgcs97.pdf craven.fgcs97.ps"
   ,volume      = 13
   ,pages       = "211-229"
   ,abstract = "Neural networks have been successfully applied in a wide range 
      of supervised and unsupervised learning applications.  Neural-network 
      methods are not commonly used for data-mining tasks, however, because 
      they often produce incomprehensible models and require long training 
      times.  In this article, we describe neural-network learning algorithms 
      that are able to produce comprehensible models, and that do not require 
      excessive training times.  Specifically, we discuss two classes of 
      approaches for data mining with neural networks.  The first type of 
      approach, often called rule extraction, involves extracting symbolic 
      models from trained neural networks.  The second approach is to directly 
      learn simple, easy-to-understand networks.  We argue that, given the 
      current state of the art, neural-network methods deserve a place in the 
      tool boxes of data-mining specialists."
}

@article{craven.ijns97
   ,author	= "M. Craven and J. Shavlik"
   ,title	= "Understanding Time-Series Networks: A Case Study in
		   Rule Extraction"
   ,journal	= "International Journal of Neural Systems"
   ,year	= 1997
   ,filename	= "craven.ijns97.pdf craven.ijns97.ps"
   ,volume      = 8
   ,pages       = "373-384"
   ,abstract = "A significant limitation of neural networks is that the 
     representations they learn are usually incomprehensible to humans.  
     We have developed an algorithm, called TREPAN, for extracting 
     comprehensible, symbolic representations from trained neural networks.
     Given a trained network, TREPAN produces a decision tree that
     approximates the concept represented by the network.  In this article, 
     we discuss the application of TREPAN to a neural network trained on a 
     noisy time series task: predicting the Dollar-Mark exchange rate.  
     We present experiments that show that TREPAN is able to extract a 
     decision tree from this network that equals the network in terms of 
     predictive accuracy, yet provides a comprehensible concept representation.
     Moreover, our experiments indicate that decision trees induced directly
     from the training data using conventional algorithms do not match the 
     accuracy nor the comprehensibility of the tree extracted by TREPAN."
}


@article{opitz.consci96
   ,author	= "D. Opitz and J. Shavlik"
   ,title	= "Actively Searching for an Effective Neural-Network Ensemble"
   ,journal	= "Connection Science"
   ,year	= 1996
   ,volume	= 8
   ,number	= "3/4"
   ,pages	= "337-353"
   ,filename	= "opitz.consci96.pdf opitz.consci96.ps"
   ,data = "http://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/ribosome/"
   ,abstract = " A neural-network ensemble is a very successful technique 
 where the outputs of a set of separately trained neural network are 
 combined to form one unified prediction.  An effective ensemble should 
 consist of a set of networks that are not only highly correct, but ones 
 that make their errors on different parts of the input space as well; 
 however, most existing techniques only indirectly address the problem of 
 creating such a set.  We present an algorithm called ADDEMUP that uses genetic
 algorithms to explicitly search for a highly diverse set of accurate trained
 networks.  ADDEMUP works by first creating an initial population, then uses 
 genetic operators to continually create new networks, keeping the set of 
 networks that are highly accurate while disagreeing with each other as much 
 as possible.  Experiments on four real-world domains show that ADDEMUP is 
 able to generate a set of trained networks that is more accurate than several 
 existing ensemble approaches.  Experiments also show that ADDEMUP is able 
 to effectively incorporate prior knowledge, if available, to improve the 
 quality of its ensemble."
}

@article{cherkauer.informatica95
   ,author	= "K. Cherkauer"
   ,title	= "Stuffing Mind into Computer: Knowledge and Learning for
                   Intelligent Systems"
   ,journal	= "Informatica"
   ,year	= 1995
   ,volume	= 19
   ,number	= 4
   ,pages	= "501-511"
   ,month	= "Nov"
   ,filename	= "cherkauer.informatica95.pdf cherkauer.informatica95.ps"
   ,abstract = "The task of somehow putting mind into a computer is one that has been
  pursued by artificial intelligence researchers for decades, and though
  we are getting closer, we have not caught it yet. Mind is an incredibly
  complex and poorly understood thing, but we should not let this stop us
  from continuing to strive toward the goal of intelligent computers. Two
  issues that are essential to this endeavor are knowledge and learning.
  These form the basis of human intelligence, and most people believe they
  are fundamental to achieving similar intelligence in computers. This paper
  explores issues surrounding knowledge acquisition and learning in
  intelligent artificial systems in light of both current philosophies of
  mind and the present state of artificial intelligence research. Its scope
  ranges from the mundane to the (almost) outlandish, with the goal of
  stimulating serious thought about where we are, where we would like to go,
  and how to get there in our attempts to render an intelligence in silicon."

}
  

@article{maclin.mlj96
   ,author	= "R. Maclin and J. Shavlik"
   ,title	= "Creating Advice-Taking Reinforcement Learners"
   ,journal	= "Machine Learning"
   ,volume	= 22
   ,pages	= "251-281"
   ,year	= 1996
   ,filename	= "http://www.springerlink.com/content/r867416747t553n7/fulltext.pdf maclin.mlj96.ps"
   ,abstract = "Learning from reinforcements is a promising approach for creating
   intelligent agents.  However, reinforcement learning usually requires a
   large number of training episodes.  We present and evaluate a design that
   addresses this shortcoming by allowing a connectionist Q-learner to accept
   advice given, at any time and in a natural manner, by an external observer.
   In our approach, the advice-giver watches the learner and occasionally makes
   suggestions, expressed as instructions in a simple imperative programming
   language.  Based on techniques from knowledge-based neural networks, we
   insert these programs directly into the agent's utility function.
   Subsequent reinforcement learning further integrates and refines the advice.
   We present empirical evidence that investigates several aspects of our
   approach and show that, given good advice, a learner can achieve
   statistically significant gains in expected reward.  A second experiment
   shows that advice improves the expected reward regardless of the stage of
   training at which it is given, while another study demonstrates that
   subsequent advice can result in further gains in reward.  Finally, we
   present experimental results that indicate our method is more powerful than
   a naive technique for making use of advice."
}
   

@article{opitz.kbs95
   ,author	= "D. Opitz and J. Shavlik"
   ,title	= "Dynamically Adding Symbolically Meaningful Nodes to 
                     Knowledge-Based Neural Networks"
   ,journal	= "Knowledge-Based Systems"
   ,volume	= 8
   ,number	= 6
   ,pages	= "301-311"
   ,year	= 1995
   ,filename	= "opitz.kbs95.pdf opitz.kbs95.ps"
   ,abstract = "Traditional connectionist theory-refinement systems map the dependencies
  of a domain-specific rule base into a neural network, and then refine
  this network using neural learning techniques.  Most of these systems,
  however, lack the ability to refine their network's topology and are thus
  unable to add new rules to the (reformulated) rule base.  Therefore, on
  domain theories that are lacking rules, generalization is poor, and
  training can corrupt the original rules, even those that were initially
  correct.  We present TopGen, an extension to the KBANN algorithm, that
  heuristically searches for possible expansions to KBANN's network.
  TopGen does this by dynamically adding hidden nodes to the neural
  representation of the domain theory, in a manner analogous to adding
  rules and conjuncts to the symbolic rule base.  Experiments indicate
  that our method is able to heuristically find effective places to add
  nodes to the knowledge bases of four real-world problems, as well as an
  artificial chess domain.  The experiments also verify that new nodes must
  be added in an intelligent manner.  Our algorithm showed statistically
  significant improvements over KBANN in all five domains."
}
  

@article{towell.aij94,
   author	= "G. Towell and J. Shavlik",
   title	= "Knowledge-Based Artificial Neural Networks",
   journal	= "Artificial Intelligence",
   year		= 1994,
   volume	= 70,
   number	= "1/2",
   pages	= "119-165",
   filename	= "towell.aij94.pdf towell.aij94.ps"
   ,abstract = "Explanation-based and empirical learning are two largely complementary methods
  of machine learning.  These approaches both have serious problems which
  preclude their being a general-purpose learning method.  However, a ``hybrid''
  learning method that combines explanation-based with empirical learning may be
  able to use the strengths of one learning method to address the weaknesses of
  the other, thereby resulting in a system superior to either approach in
  isolation.  KBANN (Knowledge-Based Artificial Neural Networks) is a hybrid
  learning system built on top of connectionist learning techniques.  It maps
  problem-specific ``domain theories'', represented in propositional logic, into
  neural networks and then refines this reformulated knowledge using
  backpropagation.  KBANN is evaluated by extensive empirical tests in the
  domain of molecular biology.  Among other results, these tests show that the
  networks created by KBANN are superior, in terms of their ability to correctly
  classify unseen examples, to a wide variety of learning systems, as well as
  techniques proposed by biologists."

}
  

@article{towell.mlj93,
   author	= "G. Towell and J. Shavlik",
   title	= "The Extraction of Refined Rules from Knowledge-Based
		   Neural Networks",
   journal	= "Machine Learning",
   year		= 1993,
   volume	= 13,
   number	= 1,
   pages	= "71-101",
   filename	= "http://www.springerlink.com/content/w1176lw785137344/fulltext.pdf towell.mlj93.ps"
   ,abstract = "Neural networks, despite their empirically-proven abilities, have been little
  used for the refinement of existing knowledge because this task requires a
  three-step process. First, knowledge must be inserted into a neural network.
  Second, the network must be refined. Third, the refined knowledge must be
  extracted from the network. We have previously described a method for the
  first step of this process. Standard neural learning techniques can accomplish
  the second step. In this paper, we propose and empirically evaluate a method
  for the final, and possibly most difficult, step. Our method efficiently
  extracts symbolic rules from trained neural networks. The four major results
  of empirical tests of this method are that the extracted rules:<BR>
  (1) closely reproduce the accuracy
    of the network from which they are extracted;<BR>
  (2) are superior to the rules produced by
    methods that directly refine symbolic rules;<BR>
  (3) are superior to those produced by
    previous techniques for extracting rules from trained neural networks;<BR>
  (4) are ``human comprehensible.''<BR>
   Thus, this method demonstrates that neural networks can be used to
  effectively refine symbolic knowledge.  Moreover, the rule-extraction
  technique developed herein contributes to the understanding of how symbolic
  and connectionist approaches to artificial intelligence can be
  profitably integrated."
}
  

@article{craven.aitools92,
   author	= "M. Craven and J. Shavlik",
   title	= "Visualizing Learning and Computation in Artificial
		   Neural Networks",
   year		= 1992,
   pages	= "399-425",
   journal	= "International Journal on Artificial Intelligence Tools",
   volume	= "1",
   number	= "3",
   filename	= "craven.ijait93.pdf craven.ijait93.ps"
   ,abstract = "Scientific visualization is the process of using graphical images to form
  succinct and lucid representations of numerical data.  Visualization has
  proven to be a useful method for understanding both learning and
  computation in artificial neural networks.  While providing a
  powerful and general technique for inductive learning, artificial
  neural networks are difficult to comprehend because they form
  representations that are encoded by a large number of real-valued
  parameters.  By viewing these parameters pictorially, a better
  understanding can be gained of how a network maps inputs into outputs.
  In this article, we survey a number of visualization techniques for
  understanding the learning and decision-making processes of neural
  networks.  We also describe our work in {\em knowledge-based neural
  networks} and the visualization techniques we have used to understand
  these networks.  In a knowledge-based neural network, the topology and
  initial weight values of the network are determined by an
  approximately-correct set of inference rules.  Knowledge-based
  networks are easier to interpret than conventional networks because
  of the synergy between visualization methods and the relation of the
  networks to symbolic rules."

}
  

@article{maclin.mlj93
   ,author	= "R. Maclin and J. Shavlik"
   ,title	= "Using Knowledge-based Neural Networks To Improve Algorithms:
		   Refining the {Chou-Fasman} Algorithm for Protein Folding"
   ,journal	= "Machine Learning"
   ,volume	= 11
   ,pages	= "195-215"
   ,year	= 1993
   ,filename	= "http://www.springerlink.com/content/l158848665k152n2/fulltext.pdf maclin.mlj93.ps"
   ,abstract = "This paper describes a connectionist method for refining algorithms
   represented as generalized finite-state automata.  The method translates the
   rule-like knowledge in an automaton into a corresponding artificial neural
   network, and then refines the reformulated automaton by applying
   backpropagation to a set of examples.  This technique for translating an
   automaton into a network extends the KBANN algorithm, a system that
   translates a set of propositional rules into a corresponding neural network.
   The extended system, FSKBANN, allows one to refine the large class of
   algorithms that can be represented as state-based processes.  As a test,
   FSKBANN is used to improve the Chou-Fasman algorithm, a method for
   predicting how globular proteins fold.  Empirical evidence shows that the
   multistrategy approach of FSKBANN leads to a statistically significantly
   more accurate solution than both the original Chou-Fasman algorithm and a
   neural network trained using the standard approach.  Extensive statistics
   report the types of errors made by the Chou-Fasman algorithm, the standard
   neural network, and by the FSKBANN network."
}
   

@article{craven.mlrgwp93,
   author	= "M. Craven and J. Shavlik",
   title	= "Machine Learning Approaches to Gene Recognition",
   journal	= "IEEE Expert",
   volume       = 9,
   number	= 2,
   year		= "1993",
   note		= "(The on-line file is a variant of the journal article.)",
   filename	= "craven.mlrgwp93.pdf craven.mlrgwp93.ps"
   ,abstract = "Currently, a major computational problem in molecular biology is to
  identify genes in uncharacterized DNA sequences.  The variation,
  complexity, and incompletely-understood nature of genes make it
  impractical to hand-code algorithms to recognize them.  Machine learning
  methods - which are able to form their own descriptions of genetic
  concepts - offer a promising approach to this problem.  This article
  surveys machine-learning approaches to identifying genes in DNA.  
  We discuss two broad classes of gene-recognition approaches: search by
  signal and search by content.  For both classes, we define the specific
  tasks that they address, describe how these tasks have been framed as
  machine-learning problems, and survey some of the machine-learning
  algorithms that have been applied to them."

}

@article{shavlik.tr92,
   author	= "J. Shavlik",
   title	= "A Framework for Combining Symbolic and Neural Learning",
   journal      = "Machine Learning",
   volume       = "14",
   number	= "3",
   year		= "1994",
   pages	= "321-331",
   note		= "(The local files are an extended version of the journal article.)",
   filename	= "http://www.springerlink.com/content/hkhv65jp0k652t11/fulltext.pdf shavlik.tr92.pdf shavlik.tr92.ps"
   ,abstract = "This article describes an approach to combining symbolic and connectionist
  approaches to machine learning.  A three-stage framework is presented and the
  research of several groups is reviewed with respect to this framework.  The
  first stage involves the insertion of symbolic knowledge into neural networks,
  the second addresses the refinement of this prior knowledge in its neural
  representation, while the third concerns the extraction of the refined
  symbolic knowledge.  Experimental results and open research issues are
  discussed."

}


@article{shavlik.ml91
   ,author	= "J. Shavlik and R. Mooney and G. Towell"
   ,title	= "Symbolic and Neural Network Learning Algorithms: An Experimental Comparison"
   ,journal	= "Machine Learning"
   ,volume	= 6
   ,number	= 2
   ,year	= 1991
   ,pages       = "111-143"
   ,filename 	= "http://www.springerlink.com/content/k3h16122731m9548/fulltext.pdf"
   ,abstract	= "Despite the fact that many symbolic and neural network (connectionist) learning algorithms address
   the same problem of learning from classified examples, very little is known regarding their comparative strengths
   and weaknesses. Experiments comparing the ID3 symbolic learning algorithm with the perception and backpropagation
   neural learning algorithms have been performed using five large, real-world data sets. Overall, backpropagation
   performs slightly better than the other two algorithms in terms of classification accuracy on new examples,
   but takes much longer to train. Experimental results suggest that backpropagation can work significantly better
   on data sets containing numerical data. Also analyzed empirically are the effects of (1) the amount of training
   data, (2) imperfect training examples, and (3) the encoding of the desired outputs. Backpropagation occasionally
   outperforms the other two systems when given relatively small amounts of training data. It is slightly more accurate
   than ID3 when examples are noisy or incompletely specified. Finally, backpropagation more effectively utilizes
   a ''distributed'' output encoding."
}
   
@article{shavlik.ml90
   ,author	= "J. Shavlik"
   ,title	= "Acquiring Recursive and Iterative Concepts with Explanation-Based Learning"
   ,journal	= "Machine Learning"
   ,volume	= 5
   ,number	= 1
   ,year	= 1990
   ,pages       = "39-70"
   ,filename 	= "http://www.springerlink.com/content/l09626h1ng137737/fulltext.pdf"
  ,abstract	= "In explanation-based learning, a specific problem's solution is 
   generalized into a form that can be later used to solve conceptually similar 
   problems.  Most research in explanation-based learning involves relaxing 
   constraints on the variables in the explanation of a specific example, rather 
   than generalizing the graphical structure of the explanation itself. However, 
   this precludes the acquisition of concepts where an iterative or recursive process 
   is implicitly represented in the explanation by a fixed number of applications. 
   This paper presents an algorithm that generalizes explanation structures and 
   reports empirical results that demonstrate the value of acquiring recursive 
   and iterative concepts. The BAGGER2 algorithm learns recursive and iterative 
   concepts, integrates results from multiple examples, and extracts useful 
   subconcepts during generalization.  On problems where learning a recursive 
   rule is not appropriate, the system produces the same result as as standard 
   explanation-based methods.  Applying the learned recursive rules only requires 
   a minor extension to a PROLOG-like problem solver, namely, the ability to 
   explicitly call a specific rule.  Empirical studies demonstrate that 
   generalizing the structure of explanations helps avoid the recently reported 
   negative effects of learning."
}
   
@article{shavlik.ai90,
   author       = "J. Shavlik and G. DeJong",
   title        = "Learning in Mathematically-Based Domains: Understanding and Generalizing Obstacle Cancellations",
   journal    	= "Artificial Intelligence",
   year         = "1990",
   volume       = "45",
   number	= "1-2",
   pages	= "1-45",
   filename	= "shavlik.ai90.pdf"
   ,abstract  = "Mathematical reasoning provides the basis for problem solving and
  learning in many complex domains. A model for applying explanation-based 
  learning in mathematically-based domains is presented, and an implemented 
  learning system is described.  In explanation-based learning, a specific 
  problem's solution is generalized into a form that can later be used to solve
  conceptually similar problems. The presented system's mathematical reasoning 
  processes are guided by the manner in which the variables are cancelled in 
  specific problem solutions.  Analyzing the cancellation of obstacles - 
  variables that preclude the direct evaluation of the problem's unknown - 
  leads to the generalization of the specific solution. Two important general 
  issues in explanation-based learning are also addressed.  Namely, 
  generalizing the number of entities in a situation and acquiring 
  efficiently-applicable concepts."
}
   
@article{shavlik.consci89
   ,author	= "J. Shavlik and G. Towell"
   ,title	= "Combining Explanation-based and Neural Learning: An Algorithm and Empirical Results"
   ,journal	= "Connection Science"
   ,year	= 1989
   ,volume	= 1
   ,number	= 3
   ,pages	= "233-255"
   ,filename	= "shavlik.consci89.pdf"
   ,abstract = "Machine learning is an area where both symbolic and neural 
   approaches have been heavily investigated. However, there has been little 
   research into the synergies achievable by combining these two learning 
   paradigms. A hybrid approach that combines the symbolically-oriented 
   explanation-based learning paradigm with the neural back-propagation 
   algorithm is described. Most realistic problems can never be formulated 
   exactly. However, there is much to be gained by utilizing the capacity to 
   reason nearly correctly.  In the presented EBL-ANN algorithm, a 
   ''roughly-correct'' explanatory capability leads to the acquisition of a 
   classification rule that is almost correct.  The rule is mapped into a neural 
   network, where subsequent refinement improves it. This approach overcomes 
   problems that arise when using imperfect domain theories to build 
   explanations and addresses the problem of choosing a good initial neural 
   network configuration.  Empirical results show that the hybrid system more 
   accurately learns concepts than an explanation-based system by itself and that
   the hybrid also learns much faster than a neural learning system by itself."
 }  

   -------------
   BOOK CHAPTERS
   -------------
@incollection{torrey.springer09,
 author = "L. Torrey and J. Shavlik and T. Walker and R. Maclin",
 filename = "torrey.springer09.pdf",
 title = "Transfer Learning via Advice Taking",
 booktitle = "Recent Advances in Machine Learning, dedicated to the memory of Ryszard S. Michalski",
 publisher = "Springer Studies in Computational Intelligence",
 year = "2009",
 editor = "J. Koronacki and S. Wirzchon and Z. Ras and J. Kacprzyk",
 abstract = "The goal of transfer learning is to speed up learning in a new task by transferring knowledge from one or more related source tasks. We describe a transfer method in which a reinforcement learner analyzes its experience in the source task and learns rules to use as advice in the target task. The rules, which are learned via inductive logic programming, describe the conditions under which an action is successful in the source task. The advice-taking algorithm used in the target task allows a reinforcement learner to benefit from rules even if they are imperfect. A human-provided mapping describes the alignment between the source and target tasks, and may also include advice about the differences between them. Using three tasks in the RoboCup simulated soccer domain, we demonstrate that this transfer method can speed up reinforcement learning substantially."
}

@incollection{torrey.handbook09,
  author 	= "L. Torrey and J. Shavlik",
  title 	= "Transfer Learning",
  booktitle 	= "Handbook of Research on Machine Learning Applications",
  filename 	= "torrey.handbook09.pdf",
  year 		= 2009,
  editor 	= "E. Soria and J. Martin and R. Magdalena and M. Martinez and A. Serrano",
  publisher 	= "IGI Global",
  abstract	= "Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned. While most machine learning algorithms are designed to address single tasks, the development of algorithms that facilitate transfer learning is a topic of ongoing interest in the machine-learning community. This chapter provides an introduction to the goals, formulations, and challenges of transfer learning. It surveys current research in this area, giving an overview of the state of the art and outlining the open problems. The survey covers transfer in both inductive learning and reinforcement learning, and discusses the issues of negative transfer and task mapping in depth."
}
@incollection{dimaio.crc08
   ,author      = "F. DiMaio and A. Soni and J. Shavlik"
   ,title       = "Machine Learning in Structural Biology: Interpreting 3D Protein Images"
   ,booktitle   = "Introduction to Machine Learning and Bioinformatics"
   ,editor      = "S. Mitra and S. Datta and T. Perkins and G. Michailidis"
   ,filename	= dimaio.crc08.pdf
   ,year        = 2008
   ,pages	= "237-276"
   ,publisher   = "Chapman & Hall/CRC Press"
   ,note 	= "(The on-line version is a pre-publication version of the chapter)"
   ,abstract    = "This chapter discusses an important problem that arises in structural biology: given an electron density map - a three-dimensional 'image' of a protein produced from crystallography - identify the chain of protein atoms contained within the image.  This introduction describes in detail the problem of density map interpretation, a background on the algorithms used in automatic interpretation, and a high-level overview of automated map interpretation. The chapter also describes four methods in detail, presenting them in chronological order of development and concludes with a discussion of the advantages and shortcomings of each method, as well as future research directions."
}

@incollection{torrey.springer08
   ,author      = "L. Torrey and J. Shavlik and T. Walker and R. Maclin"
   ,title       = "Rule Extraction for Transfer Learning"
   ,booktitle   = "Rule Extraction from Support Vector Machines"
   ,editor      = "J. Diederich"
   ,filename	= torrey.springer08.pdf
   ,year        = 2008
   ,pages	= "67-82"
   ,publisher   = "Springer"
   ,abstract    = "This chapter discusses transfer learning, which is one practical application of rule extraction. In transfer learning, information from one learning experience is applied to speed up learning in a related task. The chapter describes several techniques for transfer learning in SVM-based reinforcement learning, and shows results from a case study."   
}

@incollection{davis.srl07
   ,author      = "J. Davis and E. Burnside and I. Dutra and D. Page and R. Ramakrishnan and J. Shavlik and V. Santos Costa"
   ,title       = "Learning a New View of a Database: With an Application in Mammography"
   ,booktitle   = "Introduction to Statistical Relational Learning"
   ,editor      = "L. Getoor and B. Taskar"
   ,year        = 2007
   ,filename    = "davis.srl07.pdf"
   ,publisher   = "MIT Press"
   ,abstract    = "(This book chapter does not contain an abstract.)"
   
}

@incollection{mooney.ld02
   ,author      = "R. Mooney and P. Melville and L. Tang and J. Shavlik and I. Dutra and D. Page and V. Santos Costa"
   ,title       = "Relational Data Mining with Inductive Logic Programming for Link Discovery"
   ,booktitle   = "Data Mining: Next Generation Challenges and Future Directions"
   ,editor      = "H. Kargupta and A. Joshi"
   ,pages       = "239-254"
   ,year        = 2003
   ,filename    = "mooney.ld02.pdf mooney.ld02.ps"
   ,publisher   = "AAAI/MIT Press"
   ,abstract    = "Link discovery (LD) is an important task in data mining for
counter-terrorism and is the focus of DARPA's Evidence Extraction and Link
Discovery (EELD) research program.  Link discovery concerns the identification
of complex relational patterns that indicate potentially threatening activities
in large amounts of relational data.  Most data-mining methods assume data is
in the form of a feature-vector (a single relational table) and cannot handle
multi-relational data.  Inductive logic programming is a form of
relational data mining that discovers rules in first-order logic from
multi-relational data.  This paper discusses the application of ILP to
learning patterns for link discovery. "
   
}

@incollection{eliassi-rad.springer2001
   ,author      = "T. Eliassi-Rad and J. Shavlik"
   ,title       = "Intelligent Web Agents that Learn to Retrieve and Extract Information"
   ,booktitle   = "Intelligent Exploration of the Web"
   ,editor      = "P.S. Szczepaniak and F. Segovia and J. Kacprzyk and L.A. Zadeh"
   ,year        = 2003
   ,pages       = "255-274"
   ,filename    = "eliassi-rad.springer2001.pdf eliassi-rad.springer2001.doc"
   ,publisher   = "Springer-Verlag"
   ,abstract    = "We describe systems that use machine learning methods to retrieve and/or extract textual information from the Web.  In particular, we present our Wisconsin Adaptive Web Assistant (WAWA), which constructs a Web agent by accepting user preferences in form of instructions and adapting the agent's behavior as it encounters new information.  Our approach enables WAWA to rapidly build instructable and self-adaptive Web agents for both the information retrieval (IR) and information extraction (IE) tasks.  WAWA uses two neural networks, which provide adaptive capabilities for its agents.  User-provided instructions are compiled into these neural networks and are modified via training examples.  Users can create these training examples by rating pages that WAWA retrieves, but more importantly our system uses techniques from reinforcement learning to internally create its own examples.  Users can also provide additional instruction throughout the life of an agent.  Empirical results on several domains show the advantages of our approach."
   
}

@incollection{craven.mlrgwp92
   ,author      = "M. Craven and J. Shavlik"
   ,title       = "Investigating the Value of a Good Input Representation"
   ,booktitle   = "Computational Learning Theory and Natural Learning Systems, Volume III"
   ,editor      = "T. Petsche and S. Hanson and J. Shavlik"
   ,year        = 1995
   ,filename    = "craven.mlrgwp92.pdf craven.mlrgwp92.ps"
   ,publisher   = "{MIT} Press"
   
}


@incollection{opitz.clnl95
   ,author      = "D. Opitz and J. Shavlik"
   ,title       = "Using Heuristic Search to Expand Knowledge-Based
                   Neural Networks"
   ,booktitle   = "Computational Learning Theory and Natural Learning Systems,
                   Volume III"
   ,editor      = "T. Petsche and S. Hanson and J. Shavlik"
   ,year        = 1995
   ,pages       = "3-19"
   ,filename    = "opitz.clnl95.pdf opitz.clnl95.ps"
   ,publisher   = "{MIT} Press"
   
}
   Note: this work has been superseded by opitz.tr94.ps


@incollection{maclin.mlrgwp91
   ,author      = "R. Maclin and J. Shavlik"
   ,title       = "Refining Algorithms with Knowledge-Based Neural Networks:
                   Improving the Chou-Fasman Algorithm for Protein Folding"
   ,booktitle   = "Computational Learning Theory and Natural Learning Systems,
                   Volume I"
   ,editor      = "S. Hanson and G. Drastal and R. Rivest"
   ,year        = 1994
   ,filename    = "maclin.mlrgwp91.pdf maclin.mlrgwp91.ps"
   ,publisher   = "{MIT} Press"
   
}
   Note: see also maclin.aaai92


@incollection{cherkauer.inbook94,
   author	= "K. Cherkauer and J. Shavlik",
   title	= "Selecting Salient Features for Machine Learning from
		   Large Candidate Pools through Parallel Decision-Tree
		   Construction",
   booktitle	= "Massively Parallel Artificial Intelligence",
   editor	= "H. Kitano",
   year		= "1994",
   pages	= "102-136",
   address	= "Menlo Park, CA",
   publisher	= "AAAI Press/The MIT Press",
   filename	= "cherkauer.mpai.pdf cherkauer.mpai.ps"
   ,abstract = "The particular representation used to describe training and testing examples
  can have profound effects on an inductive algorithm's ability to learn.
  However, the space of possible representations is virtually infinite, so
  choosing a good representation is not a simple task. This chapter describes
  a method whereby the selection of a good input representation for
  classification tasks is automated. This technique, which we call DT-Select
  (``Decision Tree feature Selection''), builds decision trees, via a fast
  parallel implementation of ID3 (Quinlan, 1986), which attempt to correctly
  classify the training data. The internal nodes of the trees are features
  drawn from very large pools of complex general-purpose and domain-specific
  constructed features. Thus, the features included in the trees constitute
  compact and informative sets which can then be used as input representations
  for other learning algorithms attacking the same problem. We have implemented
  DT-Select on a parallel message-passing MIMD architecture, the Thinking
  Machines CM-5, enabling us to select from pools containing several hundred
  thousand features in reasonable time. We present here some work using this
  approach to produce augmentations of artificial neural network input
  representations for the molecular biology problem of predicting protein
  secondary structures."

}
  

@incollection{towell.ml493,
   author	= "G. Towell and J. Shavlik",
   title	= "Refining Symbolic Knowledge Using Neural Networks",
   booktitle	= "Machine Learning: An Integrated Approach",
   volume	= 4,
   editor	= "R. S. Michalski and G. Tecuci",
   year		= "1993",
   address	= "San Mateo, CA",
   publisher	= "Morgan Kaufmann",
   filename	= "towell.ml4.pdf towell.ml4.ps"
   
}


   -----------------
   TECHNICAL REPORTS
   -----------------


@techreport{fung.tr01
   ,author	= "G. Fung and O. Mangasarian and J. Shavlik"
   ,title	= "Knowledge-based Support Vector Machine Classifiers"
   ,institution	= "Data Mining Institute, University of Wisconsin"
   ,number	= "DMI TR 01-09"
   ,year	= "2001"
   ,note	= "(A shorter version of this paper appears in
                   Advances in Neural Information Processing [NIPS], 2002)"
   ,filename	= "fung.tr01.pdf fung.tr01.ps"
   ,abstract    = "Prior knowledge in the form of multiple polyhedral sets, each
  belonging to one of two categories, is introduced into a reformulation of a linear
  support vector machine classifier.  The resulting formulation leads to a linear program
  that can be solved efficiently.  Real world examples, from DNA sequencing and breast
  cancer prognosis, demonstrate the effectiveness of the proposed method.  Numerical 
  results show improvement in the test set accuracy after the incorporation of prior 
  knowledge into ordinary, data-based linear support vector machine classifiers.  One
  experiment also shows that a linear classifier, based solely on prior knowledge, far
  outperforms the direct application of prior knowledge rules to classify data."
}

@techreport{maclin.tr94
   ,author	= "R. Maclin and J. Shavlik"
   ,title	= "Incorporating Advice into Agents that Learn from
		   Reinforcements"
   ,institution	= "Department of Computer Sciences, University of Wisconsin"
   ,number	= "UW TR 1227"
   ,year	= "1994"
   ,note	= "(A shorter version appears in AAAI-94.)"
   ,filename	= "maclin.tr94.pdf maclin.tr94.ps"
   ,abstract =  "Learning from reinforcements is a promising approach for creating intelligent
  agents.  However, reinforcement learning usually requires a large number of
  training episodes.  We present a system called RATLE that addresses this
  shortcoming by allowing a connectionist Q-learner to accept advice given, at 
  any time and in a natural manner, by an external observer.  In RATLE, the
  advice-giver watches the learner and occasionally makes suggestions, expressed
  as instructions in a simple programming language.  Based on techniques from
  knowledge-based neural networks, RATLE inserts these programs directly into 
  the agent's utility function.  Subsequent reinforcement learning further 
  integrates and refines the advice.  We present empirical evidence that shows 
  our approach leads to statistically-significant gains in expected reward.
  Importantly, the advice improves the expected reward regardless of the stage 
  of training at which it is given."
}
  

   --------------
   WORKING PAPERS
   --------------
@techreport{dimaio.mlrgwp06
   ,author	= "F. DiMaio and J. Shavlik"
   ,filename	= "dimaio.mlrgwp06.pdf"
   ,title	= "Improving the Efficiency of Belief Propagation in Large Highly Connected Graphs"  
   ,institution	= "Department of Computer Sciences, University of Wisconsin"
   ,number	= "Machine Learning Research Group Working Paper 06-1"
   ,year	= "2006"
   ,abstract    = "We describe a part-based object-recognition framework, specialized to mining complex 3D objects from detailed 3D images. Objects are modeled as a collection of parts together with a pairwise potential function. The algorithm's key component is an efficient inference algorithm, based on belief propagation, that finds the optimal layout of parts, given some input image. Belief Propagation (BP) - a message passing method for approximate inference in graphical models - is well suited to this task. However, for large objects with many parts, even BP may be intractable. We present AggBP, a message aggregation scheme for BP, in which groups of messages are approximated as a single message, producing a message update analogous to that of mean-field methods. For objects consisting of N parts, we reduce CPU time and memory requirements from O(N^2) to O(N). We apply AggBP to both real-world and synthetic tasks. First, we use our framework to recognize protein fragments in three-dimensional images. Scaling BP to this task for even average-sized proteins is infeasible without our enhancements. We then use a synthetic ''object generator'' to test our algorithm's ability to locate a wide variety of part-based objects. These experiments show that our improvements result in minimal loss of accuracy, and in some cases produce a more accurate solution than standard BP."
}
@techreport{torrey.mlrgwp06
   ,author	= "L. Torrey and J. Shavlik and T. Walker and R. Maclin"
   ,title	= "Transfer Learning in Robocup"
   ,institution	= "Department of Computer Sciences, University of Wisconsin"
   ,number	= "Machine Learning Research Group Working Paper 06-2"
   ,year	= "2006"
   ,filename	= "torrey.mlrgwp06.pdf"
   ,abstract    = ""
}
@techreport{molla.mlrgwp04
   ,author	= "M. Molla and P. Andreae and J. Shavlik"
   ,title	= "Building Genome Expression Models using Microarray Expression Data and Text"
   ,institution	= "Department of Computer Sciences, University of Wisconsin"
   ,number	= "Machine Learning Research Group Working Paper 04-1"
   ,year	= "2004"
   ,filename	= "molla.mlrgwp04.pdf molla.mlrgwp04.doc"
   ,abstract    = "Microarray expression data is being generated by the gigabyte all over the world with undoubted exponential increases to come. Annotated genomic data is also rapidly pouring into public databases. Our goal is to develop automated ways of combining these two sources of information to produce insight into the operation of cells under various conditions. Our approach is to use machine-learning techniques to identify characteristics of genes that are up-regulated or down-regulated in a particular microarray experiment. We seek models that are both accurate and easy to interpret. This paper explores the effectiveness of two algorithms for this task: PFOIL (a standard machine-learning rule-building algorithm) and GORB (a new rule-building algorithm diviseddevised by us). We use a permutation test to evaluate the statistical significancequality of the learned models. The paper reports on experiments using actual E. coli microarray data, discussing the strengths and weaknesses of the two algorithms and demonstrating the trade-offs between accuracy and comprehensibility."
}

@techreport{craven.mlrgwp99
   ,author	= "M. Craven and J. Shavlik"
   ,title	= "Rule Extraction: Where Do We Go from Here?"
   ,institution	= "Department of Computer Sciences, University of Wisconsin"
   ,number	= "Machine Learning Research Group Working Paper 99-1"
   ,year	= "1999"
   ,filename    = "craven.mlrgwp99.pdf craven.mlrgwp99.ps"
   ,abstract    = "We argue that despite being an actively researched area 
   for nearly a decade, rule-extraction technology has not made as significant
   of an impact as it should have.  A confluence of trends, however, has made 
   the ability to extract comprehensible descriptions from complex learned models
   more important now than ever.  We argue that rule-extraction methods can have 
   a significant impact in the overlapping data-mining, machine-learning and 
   neural-network communities if research is focused on several commonly overlooked 
   issues.  We then briefly describe how we have tried to address these issues in 
   our own work."
}
   
@techreport{craven.mlrgwp95
   ,author	= "M. Craven and J. Shavlik"
   ,title	= "Extracting Tree-Structured Representations of Trained Networks"
   ,institution	= "Department of Computer Sciences, University of Wisconsin"
   ,number	= "Machine Learning Research Group Working Paper 95-1"
   ,year	= "1995"
   
}
  Note: this work has been superseded by craven.nips96.ps

@techreport{opitz.mlrgwp95,
   author	= "D. Opitz and J. Shavlik",
   title	= "Generating Accurate and Diverse Members of a Neural-Network 
                   Ensemble",
   institution	= "Department of Computer Sciences, University of Wisconsin",
   number	= "Machine Learning Research Group Working Paper 95-2",
   year		= "1995",
   note         = "(A version of this paper appears in {Advances in Neural
                   Information Processing, vol. 8,} 1996)",
   filename	= "opitz.mlrgwp95.pdf opitz.mlrgwp95.ps"
   ,abstract = "Neural-network ensembles have been shown to be very accurate classification
   techniques.  Previous work has shown that an effective ensemble should 
   consist of networks that are not only highly correct, but ones that make 
   their errors on different parts of the input space as well.  Most existing 
   techniques, however, only indirectly address the problem of creating such 
   a set of networks.  In this paper we present a technique called ADDEMUP 
   that uses genetic algorithms to directly search for an accurate and diverse 
   set of trained networks.  ADDEMUP works by first creating an initial 
   population, then uses genetic operators to continually create new networks, 
   keeping the set of networks that are as accurate as possible while 
   disagreeing with each other as much as possible.  Experiments on three DNA 
   problems show that ADDEMUP is able to generate a set of trained networks 
   that is more accurate than several existing approaches.  Experiments also 
   show that ADDEMUP is able to effectively incorporate prior knowledge, if 
   available, to improve the quality of its ensemble."

}
  Note: this work has been superseded by opitz.nips96.ps
   

@techreport{cherkauer.mlrgwp95,
   author	= "K. Cherkauer and J. Shavlik",
   title	= "Rapid Quality Estimation of Neural Network Input
                   Representations",
   institution	= "Department of Computer Sciences, University of Wisconsin",
   number	= "Machine Learning Research Group Working Paper 95-3",
   year		= "1995",
   note		= "(A version of this paper appears in {\em Advances in Neural
                   Information Processing, vol. 8,} 1996)",
   
}
  Note: this work has been superseded by cherkauer.nips96.ps


@techreport{gutstein.mlrgwp92,
   author	= "E. Gutstein",
   title	= "Learning from Students to Improve an Intelligent
		   Tutoring System",
   institution	= "Department of Computer Sciences, University of Wisconsin",
   number	= "Machine Learning Research Group Working Paper 92-3",
   year		= "1992",
   filename	= "gutstein.mlrgwp92.pdf gutstein.mlrgwp92.ps"
}