INTERPRETING OO7 RESULTS

Many people have urged us to make the OO7 benchmark results easier to use by providing some mechanism for condensing the test results to a single number that can then be used to rank systems. We agree that the benchmark report currently includes far too many numbers, and we are examining ways to condense it to a more manageable size. However, we do not anticipate ever reducing the benchmark to a single number. This is not out of bashfulness or fear of hurting people's feelings; rather, it is a result of our belief that, for a knowledgeable customer, a multiple-number benchmark is far more useful than any single-number benchmark could be.

To illustrate some of the difficulties with a single-number condensation of the OO7 results, here are three ways that have been suggested to us for producing a single number. (We hesitate even to list these rankings here, since we do not want to "bless" any of these ranking methods as the official OO7 single-number result.)

I. Number of first places on benchmark tests. In the technical report we report 105 numbers (tests) for each system; for this measure we simply count the number of times each system had the lowest number for a test. By this measure, the results are:

   1. E/Exodus      61
   2. Versant       22
   3. Ontos         19
   4. Objectivity    3

II. Weighted ranking of places. By this we mean one point for a first place, two points for a second place, three points for a third place, and four points for a fourth. (Here lower is better.) By this measure, the results are:

   1. E/Exodus     173
   2. Ontos        264
   3. Versant      283
   4. Objectivity  330

III. Geometric mean. That is, the 105th root of the product of all the test results. By this measure, the results are:

   1. E/Exodus     14.19
   2. Objectivity  33.88
   3. Versant      33.94

(We couldn't place Ontos in this ranking because at this date we do not have numbers for the medium OO7 database for Ontos.)

Which ranking is correct? All rankings give some information, but none tells the full story. So, what do we expect the reader of the benchmark report to do? Perhaps we are naive, but the way we think the benchmark should be used is for each reader to examine the parts of the benchmark that seem relevant to the reader's workload. For example, if your workload does a lot of pointer-chasing over cached data, look at t1-hot for the small database. If your workload does sparse traversals over data that you expect to be disk resident much of the time, look at t6-cold. If you do a lot of exact-match queries over cached data, look at q1-hot. If you do range queries over disk-resident data, look at q2b and/or q2c. And so forth.

Currently we are working on a multiuser OO7 benchmark. Once again, we expect to release results as a list of numbers rather than as a single number. We also expect that people will immediately begin reporting single-number derivatives of the benchmark results. There is nothing wrong with this; we would just like to state for the record that we are not the ones producing those single-number rankings.

Mike Carey
David DeWitt
Jeff Naughton
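
The following is a minimal sketch of how the three condensations described above can be computed from a table of per-test timings. It is illustrative only: the system names, test names, and timings are placeholder assumptions, not OO7 figures, and ties are broken arbitrarily; in practice each system would contribute the 105 timings from the technical report (lower is better throughout).

# Sketch of the three single-number condensations of a benchmark table.
# All names and numbers below are hypothetical placeholders, not OO7 data.
import math

results = {
    "SystemA": {"t1-hot": 1.2, "t6-cold": 30.5, "q1-hot": 0.8},
    "SystemB": {"t1-hot": 2.0, "t6-cold": 22.1, "q1-hot": 1.1},
    "SystemC": {"t1-hot": 1.5, "t6-cold": 41.0, "q1-hot": 0.7},
}

tests = sorted(next(iter(results.values())))   # test names common to all systems
systems = list(results)

# I.  Number of first places: count how often each system posts the lowest time.
first_places = {s: 0 for s in systems}
for t in tests:
    winner = min(systems, key=lambda s: results[s][t])
    first_places[winner] += 1

# II. Weighted ranking of places: 1 point for first, 2 for second, and so on;
#     the lowest total is best.
rank_points = {s: 0 for s in systems}
for t in tests:
    ordered = sorted(systems, key=lambda s: results[s][t])
    for place, s in enumerate(ordered, start=1):
        rank_points[s] += place

# III. Geometric mean: the n-th root of the product of a system's n timings,
#      computed via logarithms to avoid overflowing a long product.
geo_mean = {
    s: math.exp(sum(math.log(results[s][t]) for t in tests) / len(tests))
    for s in systems
}

for s in systems:
    print(s, first_places[s], rank_points[s], round(geo_mean[s], 2))

Computing the geometric mean through logarithms rather than a direct 105-way product avoids numerical overflow while giving the same result.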