Operating Systems Depth Exam

	UNIVERSITY OF WISCONSIN-MADISON Computer Sciences Department
Operating Systems Depth Exam		Fall 2003

Instructions: There are seven questions on this exam; answer any six of the following questions.

1. Networking Traffic Characteristics:

There are many architectural differences between packet switched networks such as the Internet and circuit switched networks such as the telephone system. However, many of the same ideas for design and provisioning of packet switched networks are borrowed from circuit switching which may or may not make sense based on empirical traffic analysis.

A.: Compare and contrast the characteristics of traffic in packet switched networks with telephone traffic carried over circuit switched networks.
B.: What influence do network protocols such as TCP have on self-similarity?
C.: What are the network engineering implications that arise from self-similarity?

2. Gang Scheduling:

Assume that you have a parallel program consisting of many processes running the same algorithm. Each process runs on a separate workstation or cluster node and communicates with its peer processes via sockets. Gang scheduling attempts to make sure that each process of the entire job is scheduled to run simultaneously and de-scheduled simultaneously.

A.: Why is gang scheduling important to the performance of a parallel program? Give an example of how the program may run poorly and use system resources inefficiently if the system does not use gang scheduling.
B.: Consider a cluster of Unix/Linux computers connected on a local network. How would the latencies incurred by communication over the local networks influence the effectiveness of gang scheduling.
C.: Consider the same cluster environment from part B. How would the use of a general purpose operating system kernel influence the effectiveness of gang scheduling. Be specific about what aspect of the operating system would be positive or negative influences.

3. Sandboxing:

Sandboxing is the ability to execute a program in a way that prevents it from doing any harm. Such a mechanism might allow arbitrary email attachments to be safely opened.

A.: List three different parts of the operating system that a malicious program might attempt to disrupt or harm, and how the malicious program might cause these disruptions.
B.: How might you prevent such harm using common operating system features. You can choose to answer this question based whichever operating system you are most familiar. What are the advantages and disadvantages of this approach.
C.: How would you prevent such hard using a virtual machine? What are the advantages and disadvantages of this approach.

4. Links:

Most versions of Unix support multiple names for the same file. There two different facilities to support this, hard and symbolic links.

A.: Briefly explain how each kind of link is implemented in the Unix kernel.
B.: What are the differences between the two kinds of links from the point of view of the application programmer and the end user? What are the advantages and disadvantages of each kind of link.
C.: Unix does not allow hard links to directories. Why?
D.: Consider a new way of implementing file names. There are no directories. Instead, each volume has a single B-tree index mapping full absolute pathnames (character strings starting with a slash and possibly containing other slashes) to inumbers. Compare this implementation with the implementation you described in part A. What common file system operations are likely to be faster or slower?

5. Working Set:

A.: Describe the Working Set model of program behavior. How is the working set of a process defined? What are the main features of any memory management algorithm inspired by the working set model?
B.: Describe the Working Set (WS) algorithm for memory management. Why is the WS algorithm, as strictly defined, impractical to implement in a real system? What sort of approximations or compromises are commonly used to create "working-set-like" algorithms?
C.: Any useful memory allocation algorithm must have an component that exercises "load control": controlling the number of active processes. Why? What is the WS policy on load control?
D.: The WS-CLOCK algorithm of Carr and Hennessy combines features of the CLOCK (also called "Second Chance") algorithm with features of WS. One difference between WS and WS-CLOCK is that WS-CLOCK only keeps track of the resident portion of the working set. How does this fact interfere with load control? What does WS-CLOCK do to mitigate this problem?

6. The End-to-end Argument:

This question is about the "end-to-end" argument, as described by Saltzer, Reed, and Clark.

A.: Describe the end-to-end argument. What is the main thrust of the argument? Use examples if you like, but make sure to clearly state what you think the end-to-end argument is.
B.: Consider the problem of a "careful" file transfer, where a file residing on the disk of machine A must be transferred to a disk on machine B. What are the steps required to move the file from one disk to another, and (more importantly) where might failures occur?
C.: Still considering the careful file transfer in the context of the end-to-end argument, how might the file transfer application ensure that the file is correctly copied from one disk to the other? Would having a reliable communication protocol such as TCP be helpful?

7. Hardware versus Software RAID:

RAID disk systems are often used in data storage. In this question, we explore some aspects of RAID design.

A.: Many RAID systems are implemented in "hardware", i.e., as a box that connects to your system via SCSI bus. Within the box, there are many specialized hardware components that the RAID architect can take advantage of to improve performance. What are some commonly used hardware components within a RAID, and how are they used to improve performance?
B.: A "software" RAID system is an in-kernel pseudo-device driver that emulates RAID functionality on top of disks that are attached to the machine. To the file system above, it appears as a single, large disk, but internally, the software RAID driver spreads disk requests across the devices. What are some of the major differences between software and hardware RAID? (what advantages and disadvantages does each type of RAID have?)
C.: Finally, consider a distributed RAID system, consisting of numerous server machines with disks, with a client striping data in RAID-5 style across the machines (disks) of the system. When the client needs to update a single block in a stripe, 4 distinct I/Os must occur; this is often called the "small write problem" in RAID-5. Assume a failure occurs in the middle of the RAID-5 "small write" update. What kinds of problems can arise, and how could you design a system that recovers from them?