PostScript available at URLs:
   ftp://ftp.cs.wisc.edu/markhill/Theses/richard_kessler_toc.ps
   ftp://ftp.cs.wisc.edu/markhill/Theses/richard_kessler_body.ps

%T Analysis of Multi-Megabyte Secondary CPU Cache Memories
%A Richard Eugene Kessler
%D July 1991
%I Univ. of Wisconsin
%R Computer Sciences Technical Report #1032
%K Kessler Thesis
%Y
%Y Abstract:
%Y
%Y This dissertation investigates multi-megabyte secondary caches. In a
%Y multi-level cache hierarchy, secondary caches service processor memory
%Y references that cannot be serviced by smaller and faster primary
%Y caches. With faster processors and expanding main memories, a
%Y multi-megabyte cache is increasingly vital because it shields processor
%Y memory references from costly main memory accesses, even when the
%Y processor references a large address space. Multi-megabyte secondary
%Y caches allow processors to execute at the speeds they are capable of,
%Y even when there is a large processor-to-main-memory speed gap.
%Y
%Y This analysis uses a new collection of memory address traces that is
%Y appropriate for multi-megabyte cache simulation. These traces
%Y thoroughly characterize several large workloads, and are long enough
%Y (billions of instructions) to overcome multi-megabyte cache
%Y cold-start. This dissertation includes the first comparison of two
%Y previously proposed trace-sampling techniques that can reduce
%Y long-trace simulation requirements: SET SAMPLING and TIME SAMPLING.
%Y Under a range of conditions, set sampling produces more accurate cache
%Y performance estimates with less trace data than time sampling.
%Y
%Y This dissertation examines many alternative cache designs. It shows
%Y that multi-megabyte secondary caches are extremely useful with large
%Y processor-to-main-memory speed gaps. Furthermore, associativity is
%Y needed for smaller secondary caches, but may not be necessary in
%Y multi-megabyte caches; multi-megabyte cache block sizes should be
%Y larger than for smaller caches, and the block size at which the fixed
%Y (latency) and transfer-time components of the miss penalty are equal
%Y is a good design point.
%Y
%Y Finally, this dissertation introduces and solves the problems caused by
%Y the interaction of virtual memory and real-indexed multi-megabyte
%Y caches. Since the placement of pages in main memory also places data
%Y in the cache, a poor page placement will cause poor cache performance.
%Y This dissertation introduces several new CAREFUL PAGE MAPPING
%Y algorithms to improve the page placement, and shows that they eliminate
%Y 10%-20% of the direct-mapped real-indexed cache misses for the long
%Y traces. In other words, this dissertation develops software techniques
%Y that can make a hardware direct-mapped cache appear about 50% larger.
%Y
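
Two of the techniques named in the abstract can be illustrated with short
sketches. These are not taken from the thesis; they are minimal Python
illustrations under stated assumptions, and every function and parameter
name in them is invented for the example.

First, set sampling: rather than simulating every set of a cache over a
long trace, simulate only a randomly chosen subset of the sets and take
the miss ratio observed in that subset as an estimate of the whole
cache's miss ratio. The sketch assumes a direct-mapped cache and a trace
given as an iterable of byte addresses.

    import random

    def simulate_set_sample(trace, num_sets, block_size,
                            sample_fraction=0.1, seed=0):
        """Estimate a direct-mapped cache's miss ratio by simulating only
        a random sample of its sets (set sampling). Illustrative only."""
        rng = random.Random(seed)
        k = max(1, int(sample_fraction * num_sets))
        sampled = set(rng.sample(range(num_sets), k))
        tags = {}            # set index -> tag currently resident there
        refs = misses = 0
        for addr in trace:
            block = addr // block_size
            index = block % num_sets
            if index not in sampled:
                continue      # ignore references to unsampled sets
            refs += 1
            tag = block // num_sets
            if tags.get(index) != tag:   # cold set or tag mismatch
                misses += 1
                tags[index] = tag
        return misses / refs if refs else 0.0

Time sampling, by contrast, would simulate contiguous slices of the trace
against the full cache; the sketch above shows only the set-sampling side
of the comparison made in the thesis.

Second, careful page mapping: in a real-indexed cache, the physical frame
chosen for a virtual page fixes which cache region the page occupies, so
the page mapper can spread pages across cache "colors" instead of letting
them collide. The toy allocator below uses simple page coloring, with
num_colors assumed to be cache size divided by page size for a
direct-mapped cache; the thesis's actual careful-mapping algorithms are
more elaborate.

    from collections import defaultdict, deque

    def color_of(page_number, num_colors):
        # A page's color = the cache-index bits above the page offset.
        return page_number % num_colors

    def careful_map(virtual_pages, free_frames, num_colors):
        """Toy page-coloring allocator: give each virtual page a frame of
        the matching color so contiguous data spreads across a real-indexed
        direct-mapped cache. Falls back to any free frame when the
        preferred color is exhausted. Illustrative only."""
        by_color = defaultdict(deque)
        for frame in free_frames:
            by_color[color_of(frame, num_colors)].append(frame)
        mapping = {}
        for vp in virtual_pages:
            want = color_of(vp, num_colors)
            if by_color[want]:
                mapping[vp] = by_color[want].popleft()
            else:                         # preferred color exhausted
                for c in range(num_colors):
                    if by_color[c]:
                        mapping[vp] = by_color[c].popleft()
                        break
        return mapping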