_U_N_I_V_E_R_S_I_T_Y _O_F _W_I_S_C_O_N_S_I_N-_M_A_D_I_S_O_N _C_O_M_P_U_T_E_R _S_C_I_E_N_C_E_S _D_E_P_A_R_T_M_E_N_T _D_A_T_A_B_A_S_E _S_C_R_E_E_N_I_N_G _E_X_A_M_S _S_P_R_I_N_G _1_9_8_8 _B_R_E_A_D_T_H _E_X_A_M Answer any _f_o_u_r (4) of the following _f_i_v_e (5) ques- tions. You may _n_o_t select questions from the Depth Exam. Before beginning to answer a question, be sure to read it carefully. If you are confused about what the question means, state any assumptions you have made in formulating your answer. Good Luck! _1. _L_o_c_k _M_a_n_a_g_e_m_e_n_t El Cheapo Software (ECS) Corporation is building a multi-user DBMS. Since you're a UW database type, they've retained you as a consultant (at the usual UW rate of $1K/nanosecond) to design their transaction manager. (a) Specify the design (data structures and algorithms) of a 2PL lock manager with deadlock detection that imple- ments the full S/X/IS/IX/SIX hierarchical locking pro- tocol. Since it is extremely important to be able to set and release locks very quickly, your design should be as efficient as possible. In particular, a transac- tion should be able to release all of its locks with a single call to the lock manager. (Be sure to illus- trate your data structure design via a small example.) (b) What extensions/changes would be necessary to extend this lock manager for use in a distributed DBMS? _2. _R_e_c_o_v_e_r_y ECS has chosen Lindsay's page-oriented logging scheme for recovery so that data can be updated in place. However, they're pretty confused about checkpoints, since they thought a log was sufficient to make recovery work. (a) Briefly explain why they need to take checkpoints. (b) Describe two possible checkpointing strategies. One should be optimized for an environment where failures are rare, and the other for an environment where failures are common. For each strategy, explain what should be logged, what should be forced to disk, and when, both during normal system operation and at check- point time. (c) How might a very large buffer pool be used to come up with a strategy that works well in both environments? Briefly describe such a strategy. _3. _R_e_l_a_t_i_o_n_a_l _Q_u_e_r_y _L_a_n_g_u_a_g_e_s Consider the following relational schema: EMP(eno, ename, address, salary, mgr-eno) key (eno) DEPT(dno, dname, floor, budget) key (dno) WORKS-IN(eno, dno) key (eno, dno) Write the following queries in SQL (or in QUEL if you prefer, but stick with one or the other for all five queries). (a) Print the names of employees that work in a department with a budget of more than $100,000 per year. (b) Print the names and addresses of employees whose managers earn more than $75,000 per year. (c) Print the names of employees who work in all of the departments that are located on the 7th floor. (d) Print the average salary of Toy department employees. (e) For each department with at least 20 employees, print its maximum employee salary. _4. _D_a_t_a _M_o_d_e_l_s Describe an algorithm to translate a schema in the E-R (Entity-Relationship) model into an efficient schema in the relational model and into an efficient schema in the network model. The E-R schema contains no fancy primitives, i.e., no weak entity sets, no generalization, and no aggregation. It only contains entity sets, relationship sets, and attri- butes. It also specifies the cardinality ratio of the par- ticipating entity sets for each relationship set. Note: By efficient schema, we mean a schema with as few semantic items as possible. _5. _F_u_n_c_t_i_o_n_a_l _D_e_p_e_n_d_e_n_c_i_e_s A recently converted ex-antitheorist has hired you to teach him about functional dependencies. Follow the steps below to do that. 9 9 (a) Define functional dependencies. (b) Elaborate on the issue of cyclic functional dependen- cies, i.e., A depends on B, B depends on C, ..., Z depends on A. Can we have such dependencies? If not, explain why not. If so, explain what the properties of A,...,Z must be in such cases. (c) Given a set of functional dependencies on a relation R, give an algorithm that produces all the candidate keys of R. 9 9 _D_E_P_T_H _E_X_A_M Begin by taking the breadth exam, following the instructions given there. Then answer _t_h_r_e_e (3) of the fol- lowing _f_o_u_r (4) questions. _6. _D_e_a_d_l_o_c_k_s In a distributed DBMS, an important problem related to concurrency control is the possibility of distributed deadlocks. (a) Describe how IBM's R* system, which uses the algorithm described in the Obermarck paper, detects and resolves global deadlocks. What assumptions is the algorithm based on? (b) Discuss the tradeoffs between using the Obermarck algo- rithm, centralized deadlock detection, and simple timeouts. How often should global deadlock detection be performed in the Obermarck scheme, and can you sug- gest a way to dynamically control the detection inter- val? (c) What problem(s) would the Obermarck algorithm have in a distributed database machine like Gamma, where a typi- cal transaction consists of a set of processes that run in parallel on all nodes? Explain, preferably using an example, and then recommend a better approach. (Hint: Assume that each transaction consists of a master pro- cess that initiates a set of slave processes, waits for them all to finish, and then coordinates the commit protocol.) _7. _A_c_c_e_s_s _M_e_t_h_o_d_s Consider the Grid File data structure for indexing a data file over several attributes. (a) Describe the data structure, and explain how exact- match queries are processed. (b) Describe how the Grid File grows when inserts are per- formed, giving illustrative examples. (c) Describe two possible approaches for organizing the grid directory on disk. Explain the advantages and disadvantages of each approach. (d) 9 9 (i) Describe a way to represent 2-dimensional geometric objects in a Grid File, in such a way that you can easily answer queries of the form Find all objects inside this (given) object?. Explain how you would answer such queries. (ii) Explain whether your representation also lets you answer queries of the form Find all objects that intersect this (given) object? efficiently. _8. _J_o_i_n _T_e_c_h_n_i_q_u_e_s Excluding pipelining, queries that involve multiple joins have traditionally been processed by implementing each join separately, using temporary relations along the way. (a) Extend the nested loops algorithm, without using pipe- lining, to devise an algorithm that processes a multi- way join without using temporary relations. (Show pseudocode for a 4-way join having the following join clause: "R1.A = R2.A and R2.B = R3.B and R3.C = R4.C".) (b) How would the algorithm change if the query was cyclic? (Apply the change to your algorithm from (a) assuming that the query is enhanced with the additional join clause "R4.D = R1.D".) (c) Identify the advantages and disadvantages of the new algorithm as compared to the traditional approach of using multiple 2-way joins and temporaries. _9. _C_o_n_c_u_r_r_e_n_c_y _C_o_n_t_r_o_l Anticipating your overpowering urge to think about serializability, we have obligingly provided the following questions. (a) State a necessary and sufficient condition for serial- izability in a system with only read and write locks. Clearly state all assumptions about the model of tran- sactions and the data they access. (b) Consider a model of transactions in which write locks do _n_o_t allow a transaction to read. Is the condition in (a) still (i) necessary? (ii) sufficient? In each case, if your answer is yes, explain; if not, give a counter-example. (c) Suppose that the set of data items accessed by the transactions forms a tree, and that there is only one lock type, which allows the holder to both read and write. (A lock on an item does _n_o_t lock all descen- dants!) Of course, two transactions cannot simultane- ously hold a lock on the same item. Show that the fol- lowing protocol ensures serializability: (i) A transaction can request its first lock on any data item, and (ii) A transaction can subsequently request a lock on a data item only if it holds a lock on the parent. In particular, the root can never be locked unless it is the first item locked by a transaction. 9 9