Artificial Intelligence

If you are taking the Breadth Exam: Answer any four (4) of the six Basic questions B1-B6.

If you are taking the Depth Exam: Answer any four (4) of the six Basic questions B1-B6 and any four (4) of the 13 Advanced Depth questions D1-D13.

Breadth (Basic) Questions

B1. Search strategies can be characterized in terms of two factors: "Recovery of Pursuit" and "Scope of Evaluation." Recovery of Pursuit is the degree to which a search strategy allows recovery from disappointing search avenues by returning to previously suspended alternatives. Scope of Evaluation is the number of alternatives considered at each decision point.

(a) Classify hill-climbing, depth-first, beam, and best-first searches according to their relative values along these two dimensions.

(b) Consider the following hybrid search strategy, BF-DF: best-first search is applied initially until a memory limit is reached, and then depth-first search begins from the best leaf node. If it fails, depth-first search begins again from the second-best leaf node, and so on. Comment on the relative advantages and disadvantages of BF-DF over pure best-first and pure depth-first search.
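For reference, a minimal Python sketch of the BF-DF strategy described in B1(b). This is one plausible reading of the question, not a prescribed implementation: the successor function, heuristic h, goal test, memory limit, and depth cutoff are all hypothetical placeholders.

    import heapq
    from itertools import count

    def bf_df(start, successors, h, is_goal, memory_limit=10000):
        # Phase 1: best-first search until the frontier reaches a memory
        # limit. h, successors, is_goal, and memory_limit are placeholders.
        tie = count()  # tiebreaker so the heap never compares nodes directly
        frontier = [(h(start), next(tie), start)]
        while frontier:
            if len(frontier) >= memory_limit:
                break  # memory limit reached: switch to depth-first
            _, _, node = heapq.heappop(frontier)
            if is_goal(node):
                return node
            for child in successors(node):
                heapq.heappush(frontier, (h(child), next(tie), child))
        # Phase 2: depth-first from the best remaining leaf, then the
        # second best, and so on, as the question specifies.
        for _, _, leaf in sorted(frontier):
            result = dfs(leaf, successors, is_goal)
            if result is not None:
                return result
        return None

    def dfs(node, successors, is_goal, depth_limit=50):
        # Plain depth-first search; the depth cutoff makes "failure" of a
        # depth-first probe well defined in infinite spaces.
        if is_goal(node):
            return node
        if depth_limit == 0:
            return None
        for child in successors(node):
            result = dfs(child, successors, is_goal, depth_limit - 1)
            if result is not None:
                return result
        return None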
B2. What is wrong with the following argument?

    People are widely distributed over the earth.
    Socrates is a person.
    Therefore, Socrates is widely distributed over the earth.

How should these facts be represented in first-order predicate logic so that this problem does not arise?

B3. Explain the difference between forward chaining and backward chaining. Give three concrete examples: one in which forward chaining is better, one in which backward chaining is better, and one in which the two are about equally effective.

B4. Consider the task of learning from classified examples. Describe and analyze the similarities and differences between Rumelhart's backpropagation and Michalski's AQ algorithms with respect to this task. Discuss which differences you feel are superficial and which are fundamental.

B5. Characterize GPS, show its structure, and illustrate how it works with a simple example (say, from the blocks world). Next, describe one major deficiency of GPS as a planner that TWEAK overcame. Finally, describe one major shortcoming of TWEAK. As much as possible, give concrete examples.

B6. It is important that AI systems be robust with respect to noisy data, uncertain and changing environments, inference rules that are incorrect in some situations, and so on. Choose two different AI subfields (e.g., planning and natural language) and describe an existing approach from each that addresses some aspect of dealing with imperfect knowledge. Discuss the primary strengths and major weaknesses of each approach you select.

Advanced Depth Questions

D1. Individual neurons have been identified high in the visual system of monkeys that fire selectively to a particular face within 70-200 milliseconds. But neurons are extremely slow computers, taking about 1.5 milliseconds to fire, so the serial depth of this entire perceptual process is only roughly 45-130 at most. The connections between the 20 or more visual areas involved suggest that the minimal serial depth of the total network of neurons through which information flows is roughly 15-30. How can a computer vision program achieve recognition of objects as complex as faces within such a shallow serial depth? Describe the kind of system (including the computer hardware on which it runs) that is most appropriate. Justify this choice, explaining why you think it is best, but also describe its faults and the problems that must be overcome to actually achieve a successful system within these time constraints.

D2. Perceptual recognition systems often match "models" of each different class of object against the input image (or a transformation of the image). Give two contrasting examples of the kind of model used, for each of two different objects. Explain how complex objects (e.g., chair, house), of which there can be many different types (e.g., ladderback, beanbag, wingback), can be handled. Describe an overall structure for such a system whose time complexity remains, for practical purposes, about the same even as the number of different object classes that must be recognized increases to several thousand or so.

D3. Three-dimensional object recognition from a single two-dimensional image requires the solution of two sub-problems: identifying what object we are looking at, and identifying its three-dimensional position and orientation relative to the viewer. These two sub-problems could be solved sequentially in either order, or simultaneously. Furthermore, this could be done hierarchically if the object can be defined in terms of parts and sub-parts. Compare and contrast two algorithms that have been developed which take different approaches to this process. Focus on the major ideas of each that illustrate the key assumptions and strategies used.

D4. (a) Define the concept of gradient space as used in computer vision. (b) Describe how it has been used as a representation in (i) shape-from-shading and (ii) shape-from-contour algorithms. Be specific as to the precise role that gradient space plays in these algorithms.

D5. A great part of current work on connectionism involves backpropagation, and most of that work seems concerned with techniques for speeding it up. Why do you suppose it is so slow? Describe or propose a method for improving its speed. Does speeding up backpropagation mean that a loss of classification accuracy (on future examples) is inevitable? Assuming you are willing to accept slightly reduced future accuracy, briefly describe or propose a second method for reducing training time.

D6. Briefly compare empirical (i.e., similarity-based) machine learning to explanation-based machine learning. What type of performance improvement does each paradigm primarily focus upon? What are the major strengths and weaknesses of each approach? Describe how these two paradigms could be integrated, consider the merits of doing so, and briefly describe an existing approach. What do you feel is the primary research question that must be solved in order to realize the promise of integrated learning? Justify your answer.

D7. Consider the following about learning from examples. Assume that ex is an example whose classification is unknown and that train is a collection of preclassified training examples. In order to classify ex, inductive learning algorithms such as ID3 would consider all of the training examples, whereas exemplar (i.e., instance-based) algorithms would consider only those training examples near ex in feature space. Consider the hypothesis that systems like Quinlan's ID3 would benefit from making their decisions based on a "local" subset of the training examples; that is, rather than using all of train, they would use only the nearest neighbors of ex. Briefly sketch how a "local-ID3" would work. Do you think the hypothesis is promising? Why or why not? Briefly propose an empirical study that would address the hypothesis, and describe what results you believe would be obtained from your experiment. Assume for the moment that some study showed that looking at a local subset of the training data led to increased classification accuracy. What negative traits would "local-ID3" possess?
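To make the premise of D7 concrete, here is a minimal Python sketch of the "local-ID3" idea exactly as the question states it: retrieve the nearest neighbors of ex, then induce a decision tree from just that subset. The Euclidean metric, the neighborhood size k, and the fit_id3 tree inducer are hypothetical placeholders the question leaves open.

    import math

    def euclidean(a, b):
        # Hypothetical distance metric over numeric feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def local_id3_classify(ex, train, fit_id3, k=25):
        # Classify ex using only the k training examples nearest to it.
        # `train` is a list of (feature_vector, class_label) pairs;
        # `fit_id3` stands in for any ID3 implementation and is assumed
        # to return a function mapping a feature vector to a class label.
        neighbors = sorted(train, key=lambda pair: euclidean(pair[0], ex))[:k]
        classify = fit_id3(neighbors)  # a fresh tree is induced per query
        return classify(ex)

Note that even this sketch makes one cost visible: a new tree must be induced for every query example, rather than once for the whole training set.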
D8. Convert the following formulas to clauses and derive the empty clause, [], using paramodulation and binary resolution. Note: not all of the formulas are needed for the proof. (An illustrative conversion of the first formula appears after D13.)

    ∀X ∀Y (X < Y → f(X) < f(Y))
    ¬∀X ∀Y (X < Y → g(X) < g(Y))
    ∀X (f(g(X)) = X)
    ∀X (g(f(X)) = X)
    ∀X ∀Y ∀Z ((X < Y ∧ Y < Z) → X < Z)
    ∀X ∀Y (X = Y ∨ X < Y ∨ Y < X)
    ∀X ¬(X < X)

D9. Consider a search problem; as a simple example, say you want to print out all solutions to the 8-queens problem. Such problems are often easy to program in Prolog, since Prolog handles the searching for you. Also, many such problems exhibit a potential for parallelism; for example, in the queens problem, the solution set neatly partitions into subsets that could be generated in parallel (a sketch of this partitioning appears after D13). So you might wish to program this in a parallel Prolog. Explain the advantages and disadvantages of programming such a search problem in a committed-choice language, such as Parlog, GHC, or Concurrent Prolog.

D10. Most current expert system shells are essentially re-implementations of EMYCIN. As such, what kinds of problems are they good for, what kinds of knowledge formalization do they use, and what kinds of inference can they do? By characterizing the knowledge formalization and the inference in terms of the first-order predicate calculus, indicate what kinds of knowledge they cannot represent and what kinds of inference they cannot do.

D11. Illustrate the kinds of explanation of which contemporary state-of-the-art expert systems are capable, and comment on what is missing compared to how a human expert explains his or her advice to a client and to how a human teacher explains things to a student. Describe some of the research prototypes that have been built to explore improved explanation capabilities, e.g., those of Clancey and Swartout. Why is good explanation capability so important for expert systems?

D12. Discuss the transfer grammar and interlingua approaches to machine translation. Name one system of each type. What relation do the two approaches have to "knowledge-based" machine translation?

D13. Describe the task of anaphora resolution in interactive and non-interactive discourse and text processing systems. Describe the various strategies that have been used or proposed to handle problems of anaphora resolution. How successful have they been, or how successful are they likely to be?
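For reference on D8, here is the standard clause-form conversion applied to the first formula only; it is offered as an illustration of the mechanics, not as the requested proof. Eliminating the implication in ∀X ∀Y (X < Y → f(X) < f(Y)) gives ∀X ∀Y (¬(X < Y) ∨ f(X) < f(Y)), and dropping the universal quantifiers (clause variables are implicitly universal) yields the clause

    ¬(X < Y) ∨ f(X) < f(Y)

The negated second formula behaves differently: ¬∀X ∀Y (X < Y → g(X) < g(Y)) is equivalent to ∃X ∃Y (X < Y ∧ ¬(g(X) < g(Y))), so Skolemization introduces fresh constants, here written a and b, producing the two unit clauses a < b and ¬(g(a) < g(b)).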
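To make D9's parallelism claim concrete, a minimal sketch (in Python rather than Prolog, for uniformity with the other sketches here) showing that the 8-queens solution set partitions by the first queen's column, so each subset could be generated by an independent worker.

    def queens(n=8):
        # Generate all n-queens solutions; a solution is a tuple giving
        # the column of the queen in each successive row.
        def extend(partial):
            row = len(partial)
            if row == n:
                yield tuple(partial)
                return
            for col in range(n):
                if all(col != c and abs(col - c) != row - r
                       for r, c in enumerate(partial)):
                    yield from extend(partial + [col])
        yield from extend([])

    # The solution set partitions cleanly by the first queen's column:
    # each of these subsets is independent of the others.
    partitions = {col: [] for col in range(8)}
    for s in queens():
        partitions[s[0]].append(s)
    print({col: len(sols) for col, sols in partitions.items()})
    # 92 solutions in total, split across 8 independently generable subsets.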