Artificial Intelligence

If you are taking the Breadth Exam: Answer any four (4) of the six Basic questions B1-B6.

If you are taking the Depth Exam: Answer any four (4) of the six Basic questions B1-B6 and any four (4) of the 13 Advanced Depth questions D1-D13.

Breadth (Basic) Questions

B1. Search strategies can be characterized in terms of two factors: "Recovery of Pursuit" and "Scope of Evaluation." Recovery of Pursuit is the degree to which a search strategy allows recovery from disappointing search avenues by returning to previously suspended alternatives. Scope of Evaluation is the number of alternatives considered at each decision point.

(a) Classify hill-climbing, depth-first, beam, and best-first searches according to their relative values along these two dimensions.

(b) Consider the following hybrid search strategy, BF-DF: best-first search is applied initially until a memory limit is reached, and then depth-first search begins from the best leaf node. If it fails, depth-first search begins again from the second-best leaf node, and so on. Comment on the relative advantages and disadvantages of BF-DF over pure best-first and pure depth-first search.
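For reference, a minimal Python sketch of the BF-DF strategy described in B1(b). This is one plausible reading of the question, not a prescribed implementation: the successor function, heuristic h, goal test, memory limit, and depth cutoff are all hypothetical placeholders.

    import heapq
    from itertools import count

    def bf_df(start, successors, h, is_goal, memory_limit=10000):
        # Phase 1: best-first search until the frontier reaches a memory
        # limit. h, successors, is_goal, and memory_limit are placeholders.
        tie = count()  # tiebreaker so the heap never compares nodes directly
        frontier = [(h(start), next(tie), start)]
        while frontier:
            if len(frontier) >= memory_limit:
                break  # memory limit reached: switch to depth-first
            _, _, node = heapq.heappop(frontier)
            if is_goal(node):
                return node
            for child in successors(node):
                heapq.heappush(frontier, (h(child), next(tie), child))
        # Phase 2: depth-first from the best remaining leaf, then the
        # second best, and so on, as the question specifies.
        for _, _, leaf in sorted(frontier):
            result = dfs(leaf, successors, is_goal)
            if result is not None:
                return result
        return None

    def dfs(node, successors, is_goal, depth_limit=50):
        # Plain depth-first search; the depth cutoff makes "failure" of a
        # depth-first probe well defined in infinite spaces.
        if is_goal(node):
            return node
        if depth_limit == 0:
            return None
        for child in successors(node):
            result = dfs(child, successors, is_goal, depth_limit - 1)
            if result is not None:
                return result
        return None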
B2. What is wrong with the following argument?

    People are widely distributed over the earth.
    Socrates is a person.
    Therefore, Socrates is widely distributed over the earth.

How should these facts be represented in first-order predicate logic so that this problem does not arise?

B3. Explain the difference between forward chaining and backward chaining. Give three concrete examples: one in which forward chaining is better, one in which backward chaining is better, and one in which the two are about equally effective.

B4. Consider the task of learning from classified examples. Describe and analyze the similarities and differences between Rumelhart's backpropagation and Michalski's AQ algorithms with respect to this task. Discuss which differences you feel are superficial and which are fundamental.

B5. Characterize GPS, show its structure, and illustrate how it works with a simple example (say, from the blocks world). Next, describe one major deficiency of GPS as a planner that TWEAK overcame. Finally, describe one major shortcoming of TWEAK. As much as possible, give concrete examples.

B6. It is important that AI systems be robust with respect to noisy data, uncertain and changing environments, inference rules that are incorrect in some situations, and so on. Choose two different AI subfields (e.g., planning and natural language) and describe an existing approach from each that addresses some aspect of dealing with imperfect knowledge. Discuss the primary strengths and major weaknesses of each approach you select.

Advanced Depth Questions

D1. Individual neurons have been identified high in the visual system of monkeys that fire selectively to a particular face within 70-200 milliseconds. But neurons are extremely slow computers, taking about 1.5 milliseconds to fire, so the serial depth of this entire perceptual process is only roughly 45-130 at most. The connections between the 20 or more visual areas involved suggest that the minimal serial depth of the total network of neurons through which information flows is roughly 15-30. How can a computer vision program achieve recognition of objects as complex as faces within such a shallow serial depth? Describe the kind of system (including the computer hardware on which it runs) that is most appropriate. Justify this choice, explaining why you think it is best, but also describe its faults and the problems that must be overcome to actually achieve a successful system within these time constraints.

D2. Perceptual recognition systems often match "models" of each different class of object against the input image (or a transformation of the image). Give two contrasting examples of the kind of model used, for each of two different objects. Explain how complex objects (e.g., chair, house), of which there can be many different types (e.g., ladderback, beanbag, wingback), can be handled. Describe an overall structure for such a system whose time complexity remains, for practical purposes, about the same even as the number of different object classes that must be recognized increases to several thousand or so.

D3. Three-dimensional object recognition from a single two-dimensional image requires the solution of two sub-problems: identifying what object we are looking at, and identifying its three-dimensional position and orientation relative to the viewer. These two sub-problems could be solved sequentially in either order, or simultaneously. Furthermore, this could be done hierarchically if the object can be defined in terms of parts and sub-parts. Compare and contrast two algorithms that have been developed which take different approaches to this process. Focus on the major ideas of each that illustrate the key assumptions and strategies used.

D4. (a) Define the concept of gradient space as used in computer vision. (b) Describe how it has been used as a representation in (i) shape-from-shading and (ii) shape-from-contour algorithms. Be specific as to the precise role that gradient space plays in these algorithms.

D5. A great part of current work on connectionism involves backpropagation, and most of that work seems concerned with techniques for speeding it up. Why do you suppose it is so slow? Describe or propose a method for improving its speed. Does speeding up backpropagation mean that a loss of classification accuracy (on future examples) is inevitable? Assuming you are willing to accept slightly reduced future accuracy, briefly describe or propose a second method for reducing training time.

D6. Briefly compare empirical (i.e., similarity-based) machine learning to explanation-based machine learning. What type of performance improvement does each paradigm primarily focus upon? What are the major strengths and weaknesses of each approach? Describe how these two paradigms could be integrated, consider the merits of doing so, and briefly describe an existing approach. What do you feel is the primary research question that must be solved in order to realize the promise of integrated learning? Justify your answer.

D7. Consider the following about learning from examples. Assume that ex is an example whose classification is unknown and that train is a collection of preclassified training examples. In order to classify ex, inductive learning algorithms such as ID3 would consider all of the training examples, whereas exemplar (i.e., instance-based) algorithms would consider only those training examples near ex in feature space. Consider the hypothesis that systems like Quinlan's ID3 would benefit from making their decisions based on a "local" subset of the training examples; that is, rather than using all of train, they would use only the nearest neighbors of ex. Briefly sketch how a "local-ID3" would work. Do you think the hypothesis is promising? Why or why not? Briefly propose an empirical study that would address the hypothesis, and describe what results you believe would be obtained from your experiment. Assume for the moment that some study showed that looking at a local subset of the training data led to increased classification accuracy. What negative traits would "local-ID3" possess?
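To make the premise of D7 concrete, here is a minimal Python sketch of the "local-ID3" idea exactly as the question states it: retrieve the nearest neighbors of ex, then induce a decision tree from just that subset. The Euclidean metric, the neighborhood size k, and the fit_id3 tree inducer are hypothetical placeholders the question leaves open.

    import math

    def euclidean(a, b):
        # Hypothetical distance metric over numeric feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def local_id3_classify(ex, train, fit_id3, k=25):
        # Classify ex using only the k training examples nearest to it.
        # `train` is a list of (feature_vector, class_label) pairs;
        # `fit_id3` stands in for any ID3 implementation and is assumed
        # to return a function mapping a feature vector to a class label.
        neighbors = sorted(train, key=lambda pair: euclidean(pair[0], ex))[:k]
        classify = fit_id3(neighbors)  # a fresh tree is induced per query
        return classify(ex)

Note that even this sketch makes one cost visible: a new tree must be induced for every query example, rather than once for the whole training set.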
D8. Convert the following formulas to clauses and derive the empty clause, [], using paramodulation and binary resolution. Note: not all of the formulas are needed for the proof. (An illustrative conversion of the first formula appears after D13.)

    ∀X ∀Y (X < Y → f(X) < f(Y))
    ¬∀X ∀Y (X < Y → g(X) < g(Y))
    ∀X (f(g(X)) = X)
    ∀X (g(f(X)) = X)
    ∀X ∀Y ∀Z ((X < Y ∧ Y < Z) → X < Z)
    ∀X ∀Y (X = Y ∨ X < Y ∨ Y < X)
    ∀X ¬(X < X)

D9. Consider a search problem; as a simple example, say you want to print out all solutions to the 8-queens problem. Such problems are often easy to program in Prolog, since Prolog handles the searching for you. Also, many such problems exhibit a potential for parallelism; for example, in the queens problem, the solution set neatly partitions into subsets that could be generated in parallel (a sketch of this partitioning appears after D13). So you might wish to program this in a parallel Prolog. Explain the advantages and disadvantages of programming such a search problem in a committed-choice language, such as Parlog, GHC, or Concurrent Prolog.

D10. Most current expert system shells are essentially re-implementations of EMYCIN. As such, what kinds of problems are they good for, what kinds of knowledge formalization do they use, and what kinds of inference can they do? By characterizing the knowledge formalization and the inference in terms of the first-order predicate calculus, indicate what kinds of knowledge they cannot represent and what kinds of inference they cannot do.

D11. Illustrate the kinds of explanation of which contemporary state-of-the-art expert systems are capable, and comment on what is missing compared to how a human expert explains his or her advice to a client and to how a human teacher explains things to a student. Describe some of the research prototypes that have been built to explore improved explanation capabilities, e.g., those of Clancey and Swartout. Why is good explanation capability so important for expert systems?

D12. Discuss the transfer grammar and interlingua approaches to machine translation. Name one system of each type. What relation do the two approaches have to "knowledge-based" machine translation?

D13. Describe the task of anaphora resolution in interactive and non-interactive discourse and text processing systems. Describe the various strategies that have been used or proposed to handle problems of anaphora resolution. How successful have they been, or how successful are they likely to be?
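For reference on D8, here is the standard clause-form conversion applied to the first formula only; it is offered as an illustration of the mechanics, not as the requested proof. Eliminating the implication in ∀X ∀Y (X < Y → f(X) < f(Y)) gives ∀X ∀Y (¬(X < Y) ∨ f(X) < f(Y)), and dropping the universal quantifiers (clause variables are implicitly universal) yields the clause

    ¬(X < Y) ∨ f(X) < f(Y)

The negated second formula behaves differently: ¬∀X ∀Y (X < Y → g(X) < g(Y)) is equivalent to ∃X ∃Y (X < Y ∧ ¬(g(X) < g(Y))), so Skolemization introduces fresh constants, here written a and b, producing the two unit clauses a < b and ¬(g(a) < g(b)).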
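To make D9's parallelism claim concrete, a minimal sketch (in Python rather than Prolog, for uniformity with the other sketches here) showing that the 8-queens solution set partitions by the first queen's column, so each subset could be generated by an independent worker.

    def queens(n=8):
        # Generate all n-queens solutions; a solution is a tuple giving
        # the column of the queen in each successive row.
        def extend(partial):
            row = len(partial)
            if row == n:
                yield tuple(partial)
                return
            for col in range(n):
                if all(col != c and abs(col - c) != row - r
                       for r, c in enumerate(partial)):
                    yield from extend(partial + [col])
        yield from extend([])

    # The solution set partitions cleanly by the first queen's column:
    # each of these subsets is independent of the others.
    partitions = {col: [] for col in range(8)}
    for s in queens():
        partitions[s[0]].append(s)
    print({col: len(sols) for col, sols in partitions.items()})
    # 92 solutions in total, split across 8 independently generable subsets.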