INTERPRETING OO7 RESULTS

Many people have urged us to make the OO7 benchmark results easier to use by providing some mechanism for condensing the test results to a single number that can then be used to rank systems. We agree that the benchmark report currently includes far too many numbers, and we are examining ways to condense it to a more manageable size. However, we do not anticipate ever reducing the benchmark to a single number. This is not out of bashfulness or fear of hurting people's feelings; rather, it is a result of our belief that, for a knowledgeable customer, a multiple-number benchmark is far more useful than any single-number benchmark could be.

To illustrate some of the difficulties with a single-number condensation of the OO7 results, here are three ways that have been suggested to us for producing a single number. (We hesitate even to list these rankings here, since we do not want to "bless" any of these ranking methods as the official OO7 single-number result.)

I. Number of first places on benchmark tests. In the technical report we report 105 numbers (tests) for each system; for this measure we simply count the number of times each system had the lowest number for a test. By this measure, the results are:

   1. E/Exodus      61
   2. Versant       22
   3. Ontos         19
   4. Objectivity    3

II. Weighted ranking of places. By this we mean one point for a first place, two points for a second place, three points for a third place, and four points for a fourth. (Here lower is better.) By this measure, the results are:

   1. E/Exodus     173
   2. Ontos        264
   3. Versant      283
   4. Objectivity  330

III. Geometric mean. That is, the 105th root of the product of all the test results. By this measure, the results are:

   1. E/Exodus     14.19
   2. Objectivity  33.88
   3. Versant      33.94

(We couldn't place Ontos in this ranking because at this date we do not have numbers for the medium OO7 database for Ontos.)

Which ranking is correct? All rankings give some information, but none tells the full story. So, what do we expect the reader of the benchmark report to do? Perhaps we are naive, but the way we think the benchmark should be used is for each reader to examine the parts of the benchmark that seem relevant to the reader's workload. For example, if your workload does a lot of pointer-chasing over cached data, look at t1-hot for the small database. If your workload does sparse traversals over data that you expect to be disk resident much of the time, look at t6-cold. If you do a lot of exact-match queries over cached data, look at q1-hot. If you do range queries over disk-resident data, look at q2b and/or q2c. And so forth.

Currently we are working on a multiuser OO7 benchmark. Once again, we expect to release results as a list of numbers rather than as a single number. We also expect that people will immediately begin reporting single-number derivatives of the benchmark results. There is nothing wrong with this; we would just like to state for the record that we are not the ones producing those single-number rankings.

Mike Carey
David DeWitt
Jeff Naughton
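
The following is a minimal sketch of how the three condensations described above can be computed from a table of per-test timings. It is illustrative only: the system names, test names, and timings are placeholder assumptions, not OO7 figures, and ties are broken arbitrarily; in practice each system would contribute the 105 timings from the technical report (lower is better throughout).

# Sketch of the three single-number condensations of a benchmark table.
# All names and numbers below are hypothetical placeholders, not OO7 data.
import math

results = {
    "SystemA": {"t1-hot": 1.2, "t6-cold": 30.5, "q1-hot": 0.8},
    "SystemB": {"t1-hot": 2.0, "t6-cold": 22.1, "q1-hot": 1.1},
    "SystemC": {"t1-hot": 1.5, "t6-cold": 41.0, "q1-hot": 0.7},
}

tests = sorted(next(iter(results.values())))   # test names common to all systems
systems = list(results)

# I.  Number of first places: count how often each system posts the lowest time.
first_places = {s: 0 for s in systems}
for t in tests:
    winner = min(systems, key=lambda s: results[s][t])
    first_places[winner] += 1

# II. Weighted ranking of places: 1 point for first, 2 for second, and so on;
#     the lowest total is best.
rank_points = {s: 0 for s in systems}
for t in tests:
    ordered = sorted(systems, key=lambda s: results[s][t])
    for place, s in enumerate(ordered, start=1):
        rank_points[s] += place

# III. Geometric mean: the n-th root of the product of a system's n timings,
#      computed via logarithms to avoid overflowing a long product.
geo_mean = {
    s: math.exp(sum(math.log(results[s][t]) for t in tests) / len(tests))
    for s in systems
}

for s in systems:
    print(s, first_places[s], rank_points[s], round(geo_mean[s], 2))

Computing the geometric mean through logarithms rather than a direct 105-way product avoids numerical overflow while giving the same result.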