Artificial Intelligence If you are taking the Breadth Exam: Answer any 4 out of the 6 Basic questions B1 - B6. If you are taking the Depth Exam: Answer any 4 out of the 6 Basic questions B1 - B6, and answer any 4 out of the 13 Advanced Depth questions D1 - D13. Breadth (Basic) Questions B1. The bias of a learning algorithm is defined to be all factors, other than the provided training examples, that influence the selection of possible solutions to the learning task at hand. Provide a rough computational complexity argument that justifies the need for bias in a learning algorithm. For two (2) out of the three (3) learning algorithms listed below, describe their biases and the role these biases play. Michalski's AQ Mitchell et al.'s EBG Rumelhart et al.'s Back-Propagation B2. One may attempt to parse English sentences as a context-free grammar -- that is, using productions such as ``S -> NP VP''. Consider the sentences The boy painted the bottom of the barrel with a paintbrush. The boy painted the barrel with green paint after dinner. The boy painted the barrel with the paintbrush with short bristles. Describe a simple context-free grammar which will yield a correct parse (among others) of these sentences. How many (not necessarily semantically correct) parses will be obtained for these sentences? Explain how a language recognition system might disambiguate these and find the correct parses. B3. Explain how alpha-beta search works for the game tic- tac-toe. More specifically, the players take turns putting their mark on a 3 by 3 board, and the first one to have 3 marks in a line (horizontal, vertical, or diagonal) wins. Do your search to depth 2 (you go, your opponent goes). Say you move first, and show in detail how the first move is determined. As a heuristic, use the number of threats by you minus the number of threats by your opponent, where a threat by a player is a line with 1 or 2 of that player's marks on it and none of the opponent's. Explore the center move first. You don't have to draw the entire tree search, but show enough to see explicitly how alpha-beta prunes the min-max tree. How many nodes of the min-max tree are left unsearched by alpha-beta? B4. One of the more important programs in the history of artificial intelligence has been the General Problem Solver (GPS) of Newell and Simon. Describe the main components of GPS, show their recursive structuring, illustrate the application of GPS to some particular problem domain by giving examples for that domain of GPS differences and operators, and describe why and how artificial intelligence scientists have decided that the "weak methods" of very general programs like GPS need to be supplemented. B5. (a) What are hidden units in connectionist networks? Why are they important? Consider both what can be expressed and what can be learned with and without hidden units. (b) Assume you have a training set of N examples and a testing set of M examples. Also assume that a connectionist network is trained, using the training set, until convergence (where convergence is defined as classification correctness on the training set remaining constant for 100,000 epochs). Qualitatively plot, as function of the number of hidden units, predicted classification correctness (at convergence) for both the training and test sets. Explain the form of your two curves. B6. Given an initial description of the world, the occurrence of some events, and some notion of causality, we need to infer what is true after the events occur. To do this, we need some kind of temporal representation and reasoning system. (a) How are basic events represented in the situation calculus? Give an example. (b) How are basic events represented in Allen's temporal logic? Give an example. How does it differ from the situation calculus? (c) What functions do the frame axioms in the situation calculus and the transitivity axioms in Allen's theory serve? Describe briefly. Advanced Depth Questions D1. Some explanation-based learning systems learn control knowledge, while others learn macro-operators. Describe the difference between these two types of acquired knowledge, and indicate the advantages and disadvantages of learning each type. Briefly describe one developed system that learns control knowledge and one that acquires macro-operators. Design an experiment that compares learning macro-operators to learning control knowledge. Describe the dependent and independent variables, and clearly state the experimental hypotheses. D2. Wyatt I. Gott has just received a large database of feature vectors (i.e., lists of attribute-value pairs). Each entry in the database contains the history of some small business over the last year (e.g., number of employees, products, monthly sales, monthly expenses, etc.). Wyatt assigns you the ill-defined task of producing interesting statements about this database. For four (4) of the eight (8) machine learning paradigms listed below, state how you would apply it to your assigned task or argue why doing so would be inappropriate. Be as specific as possible, mentioning specific systems where appropriate. Be sure to discuss issues such as computational tractability, sensitivity to noise and missing feature values, the need for you to provide a small amount of additional information (and any leverage this provides), etc. Similarity-Based Learning from Examples Conceptual Clustering and Discovery Systems Explanation-Based Learning Connectionist Learning Genetic Algorithms Analogical/Case-Based/Exemplar-Based Learning Formal Learnability Theory Integrated Approaches D3. Give a precise description of one-layer and multi-layer connectionist networks - the overall topology, the functions computed, the way learning works. Describe how these have been used, or could best be used, to handle one (1) of the following problems: object recognition; word pronunciation; logical expression evaluation (including but not limited to exclusive-or). Describe how well the approach you just presented worked (or would work), and suggest how it could be extended to address one (1) of the following issues: accommodation of delayed feedback during training choice of a good network topology incorporation of recursion D4. You've just received a call from NASA to help process and analyze images of Neptune received from Voyager. The set of images is a ``mosaic'' of partially overlapping images, each with differing imaging conditions (e.g., lighting and exposure time). Briefly describe procedures that you would try in order to reconstruct and enhance a single composite image from the set. D5. Multiresolution image representations can be used in many ways in computer vision. Two examples are fast feature detection and coarse-to-fine search. (a) Given an N x N image f and an M x M template t, specify how a match measure can be defined. (b) Instead of doing matching based on the given resolutions described in (a), say we wanted to match t at a finer scale. Given an increase in scale of s, briefly describe two alternative ways of changing the scale of f and/or t to achieve the desired result. What is the difference in cost of your two approaches? (c) Describe how coarse-to-fine search has been used in a published algorithm; e.g., for model-based recognition, or for stereo or motion analysis. D6. You are asked to start a project to develop a computer vision system that will guard buildings by taking a TV picture of each person, and admitting them only if the face is recognized. Explain what the problems are, and assess the difficulties and chances of success. Describe two approaches you would explore, and the previous work you would build on. Describe and assess the advantages and the problems with each approach. D7. Describe two contrasting multi-computer architectures that are appropriate for developing fast, real-time vision systems (ones that can recognize and track moving objects imaged in TV frames on the order of 500-by-500 resolution input roughly 30 times a second). Describe how big and fast each architecture is in its best present-day version (this might only be a paper design), and specify how big and fast it would need to be to handle this task. Describe the kind of program you would develop for each architecture, making clear where it is and is not important that the program fit the architecture. D8. In what sense are so-called "expert systems" expert? Present the typical structure and describe the usual capabilities of expert systems, illustrated in terms of two actual examples from the literature, and comment on the extent to which such systems do and do not constitute models of intelligence. D9. The capabilities of any particular expert system could be realized by means of conventional programming. Contrast the system structuring and implementation techniques of expert system technology with those of conventional programming, and comment on the relative advantages and disadvantages of the two approaches. D10. Consider the following clauses: 1. X +o 1 = X 2. X +o i(X) = 1 3. X +o (Y +o Z) = X +o (Y +o Z) 4. a +o a = a 5. - (a = 1) Add the standard equality axioms, and derive [] from these clauses using binary resolution only. Note that without the equality axioms, you cannot derive anything from these by binary resolution. X,Y,Z are variables; a and 1 are constants. D11. In Prolog, answers which follow logically from the program are not always computed. In this question, consider just pure Horn Prolog, and consider the following program: p(X,Y) :- p(X,B). p(X,Y) :- p(B,Y). p(a,b). Say the query is ``?- p(c,d)''. Logically, the answer should be ``yes''. (a) Explain why Prolog will not return this answer under any of the possible 6 ways of ordering the program. (b) Explain how to modify the Prolog search by adding subsumption checks so that this answer is obtained. Your answer should include an explanation of what subsumption is. (c) In view of (b), why don't existing Prolog systems use subsumption checks? (d) Will the modification described in (b) be complete (in the sense that any answer which follows logically from the program will indeed be returned in a finite amount of time)? Explain the reasoning behind your answer. D12. Discuss the use of frames and semantic features in sentence parsing and text analysis. Illustrate your discussion with an example. What techniques might be used to convert a frame-based sentence parser into one that uses semantic features exclusively? D13. Discuss techniques for knowledge-based machine translation in both specialized and general application domains. What levels of success are currently obtainable, and what limitations seem to be imposed by unresolved epistemological problems?