This section shows that HAC achieves lower miss rates than the best page-caching and dual-buffering systems in the literature.
We show that HAC achieves lower miss rates than page-caching systems by comparing it to QuickStore [WD94], which is the best page-caching system in the literature. QuickStore uses a CLOCK algorithm to manage the client cache. Like HAC, it uses 8 KB pages and 32-bit pointers in objects in both the client cache and on disk; its database is approximately the same size as HAC's, and it is also clustered using time of creation, as specified in the OO7 benchmark. It uses an interesting pointer-swizzling technique that stores pointers on disk in their swizzled form, together with a mapping object that maps the virtual memory page frame indices of the swizzled pointers to logical page identifiers. When a page is fetched, QuickStore also fetches the page's mapping object and attempts to map the pages that are referenced by the fetched page at the virtual memory addresses specified in the mapping object. If it succeeds, no format conversions are necessary; otherwise, it corrects the pointers in the page to point to a different virtual memory frame. This technique allows QuickStore to use small 32-bit object pointers without severely restricting the maximum database size or introducing any overhead on hit time.
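The swizzling scheme described above can be sketched in a few lines. This is our own toy model, not QuickStore's implementation: class and method names are invented, a "page" is just a list of (frame, offset) swizzled pointers, and frame allocation is reduced to a counter.

```python
# Hypothetical sketch of QuickStore-style swizzling on fetch; all names invented.
class QuickStoreClient:
    def __init__(self):
        self.frame_owner = {}   # virtual frame index -> logical page id
        self.next_free = 1000   # stand-in for a pool of spare virtual frames

    def allocate_frame(self, pid):
        frame = self.next_free
        self.next_free += 1
        self.frame_owner[frame] = pid
        return frame

    def fetch(self, page, mapping):
        """page: list of (frame, offset) swizzled pointers stored on disk.
        mapping: {frame: logical page id}, fetched alongside the page."""
        relocated = {}
        for frame, pid in mapping.items():
            owner = self.frame_owner.get(frame)
            if owner is None:
                self.frame_owner[frame] = pid   # frame free: no conversion needed
            elif owner != pid:
                # frame already taken by another page: must correct pointers
                relocated[frame] = self.allocate_frame(pid)
        return [(relocated.get(f, f), off) for f, off in page]
```

When every referenced frame can be mapped as stored, the page is returned untouched, which is how QuickStore avoids format-conversion overhead in the common case.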
We also compare HAC to a system we created called FPC (fast page caching). FPC is identical to HAC except that it uses a perfect LRU replacement policy to select pages for eviction and always evicts entire pages. We implemented FPC because we wanted to compare the miss rate of HAC and page-caching systems over a wide range of cache sizes and traversals, and only limited experimental results were available for QuickStore. We explain in Section 4.2.3 why experiments showing that HAC outperforms FPC allow us to conclude that it would do even better in a comparison with QuickStore.
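FPC's policy is simple enough to state as code. The sketch below is ours (names invented, capacity counted in whole pages): perfect LRU means every access updates the recency order, and a miss always evicts an entire page.

```python
from collections import OrderedDict

# Minimal sketch of FPC-style cache management: perfect LRU over whole pages.
class FPCCache:
    def __init__(self, capacity):
        self.capacity = capacity          # capacity in pages
        self.pages = OrderedDict()        # page id -> contents, in LRU order
        self.fetches = 0

    def access(self, pid):
        if pid in self.pages:
            self.pages.move_to_end(pid)        # perfect LRU: record every use
        else:
            self.fetches += 1                  # miss: fetch the whole page
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False) # evict the entire LRU page
            self.pages[pid] = True
```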
To show HAC achieves lower miss rates than dual-buffering systems, we compare its miss rate with that of GOM [KK94], the dual-buffering system with the lowest miss rates in the literature. GOM partitions the client cache statically into object and page buffers, each managed using a perfect LRU replacement policy. When there is a cache miss, the missing page is fetched into the page buffer and the least-recently used page P is chosen for eviction. Rather than evicting all the objects in P, GOM copies the recently-used objects in P to the object buffer. If P is refetched, the objects from P that are cached in the object buffer are immediately put back in P. Pages with a large fraction of recently used objects are protected from eviction. To make room for retained objects, the least recently used objects in the object buffer are evicted. To reduce fragmentation, storage is managed using a buddy system. GOM's database is clustered (like ours) using time of creation. GOM uses 96-bit pointers and has 12-byte per-object overheads at the server.
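GOM's retain-on-eviction scheme can be sketched as follows. The simplifications are ours, not the paper's: "recently used" means touched since the page was fetched, both buffer capacities count whole items rather than bytes, and all names are invented.

```python
from collections import OrderedDict

# Rough sketch of GOM-style dual buffering with retention on page eviction.
class GOMCache:
    def __init__(self, page_slots, obj_slots):
        self.page_slots, self.obj_slots = page_slots, obj_slots
        self.pages = OrderedDict()  # pid -> ids of objects used since fetch (LRU order)
        self.objs = OrderedDict()   # (pid, oid) retained in object buffer (LRU order)
        self.fetches = 0

    def access(self, pid, oid):
        if (pid, oid) in self.objs:              # hit on a retained object
            self.objs.move_to_end((pid, oid))
            return
        if pid not in self.pages:                # page miss: fetch it
            self.fetches += 1
            self.pages[pid] = set()
            # objects retained from an earlier eviction go straight back into P
            for key in [k for k in self.objs if k[0] == pid]:
                del self.objs[key]
            if len(self.pages) > self.page_slots:
                victim, used = self.pages.popitem(last=False)
                for o in used:                   # retain the recently used objects
                    self.objs[(victim, o)] = True
                while len(self.objs) > self.obj_slots:
                    self.objs.popitem(last=False)  # evict LRU retained objects
        self.pages.move_to_end(pid)
        self.pages[pid].add(oid)
```

The object-buffer hit in the first branch is where GOM gains over pure page caching: a recently used object can survive its page's eviction and be reused without a refetch.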
Kemper and Kossmann [KK94] show that the cache management strategy used in GOM leads to a lower miss rate than the eager copying strategy used by object-caching systems [KGBW90][WD92][KK90][C+94b] which fetch pages from the server. The eager copying strategy copies objects from the page buffer to the object buffer on first use and copies modified objects back to their home pages when a transaction commits. In contrast to GOM, objects can be accessed only when they are in the object buffer; therefore, this buffer is much larger than the page buffer. Since GOM achieves lower miss rates than these object-caching systems, any miss rate reduction HAC achieves relative to GOM would be larger relative to these systems.
Table 2 shows the number of fetches for HAC, FPC, and QuickStore for cold traversals T6 and T1 of the medium database. The QuickStore results were obtained with a 12 MB client cache and were reported in [WD94]. HAC and FPC used smaller caches, adjusted to account for the size of the indirection table in traversal T1: HAC used a 7.7 MB cache and FPC a 9.4 MB cache. The indirection table is large because the entries in the table are 16 bytes and the objects accessed by this traversal are small: 29 bytes on average. The overhead for HAC is higher (55%) because most objects in the cache are installed in the indirection table, whereas for FPC it is lower (27%) because only approximately half of the objects in the cache are installed. The unrealistically small object sizes of the OO7 database represent a worst case for the indirection-table overhead.
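The quoted overheads follow directly from the entry and object sizes; a back-of-the-envelope check:

```python
# Indirection-table overhead, per the sizes quoted above.
ENTRY_BYTES = 16     # bytes per indirection-table entry
OBJ_BYTES = 29       # average object size in this traversal

hac_overhead = ENTRY_BYTES / OBJ_BYTES        # nearly every cached object installed
fpc_overhead = 0.5 * ENTRY_BYTES / OBJ_BYTES  # only about half installed
# hac_overhead is about 0.55; fpc_overhead is a little over 0.27
```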
Table 2: Misses, Cold traversals, Medium database
HAC and FPC have the same number of misses in traversal T6 because all of these misses are cold cache misses. In traversal T1, HAC has 24% fewer fetches than FPC because object caching gives it 38% fewer capacity misses (both systems incur the same 3662 cold cache misses). QuickStore has a higher miss rate than both systems in both traversals: HAC and FPC have 21% fewer misses than QuickStore in T6 because of the extra misses QuickStore requires to fetch mapping objects, and FPC has 3% fewer fetches than QuickStore in T1, mainly for the same reason, but also because FPC uses perfect LRU whereas QuickStore uses CLOCK.
The miss rate reduction achieved by HAC relative to the two page-caching systems is not very impressive for this particular traversal and cache size because T1 is a traversal with very good clustering (as discussed in Section 4.1.1), and the cache can fit only 55% of the objects accessed by the traversal.
This section presents results of experiments comparing the miss rates of HAC and FPC. We argue that for all traversals with equal or lower clustering quality than T1, any miss rate reductions HAC achieves relative to FPC would be even higher relative to QuickStore. This is true because FPC outperforms QuickStore in traversal T1, and as clustering becomes worse, FPC's performance becomes relatively better. As clustering degrades, FPC's overhead decreases because traversals with worse clustering access a lower fraction of objects per page, resulting in a smaller number of indirection table entries and more space to store objects; QuickStore's overhead remains constant because it always fetches one mapping object per page, regardless of the quality of clustering. For traversals with better clustering than T1, QuickStore might outperform FPC, but we expect these traversals to be very uncommon, as discussed in Section 4.1.1.
The first set of additional experiments ran traversals T6, T1-, T1, and T1+ on the medium database with varying client cache sizes starting both from cold and hot caches. Figure 5 shows the number of misses of HAC and FPC for the hot traversals. The x-axis shows the sum of the client cache size and the indirection table size; since FPC uses less space in the indirection table than HAC, at any particular size, it has more space in the cache than HAC does. We do not present the graphs for the cold traversals because they are similar; the main difference is that they include the constant cost of cold cache misses (506 for T6 and 3662 for the other traversals).
Figure 5: Client cache misses, Hot traversals, Medium database
For the excellent-clustering case of traversal T1+, HAC and FPC have an almost identical miss rate, because HAC behaves like a page-caching system when clustering is very good. This traversal accesses an average fraction of objects per page that is larger than the retention fraction R and, since there is good spatial locality, these objects have identical usage values. Therefore, the threshold value selected by HAC to compact a page will frequently cause all its objects to be discarded.
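The degenerate behavior described above can be illustrated with a toy compaction rule (ours, not the paper's code; we assume objects are retained only when their usage strictly exceeds the threshold):

```python
# Why HAC acts like a page cache under excellent clustering: compaction keeps
# only objects whose usage exceeds the threshold, so a page whose objects all
# share one usage value at or below it is discarded in its entirety.
def retained(usages, threshold):
    return [u for u in usages if u > threshold]

# excellent clustering: all objects on the page used together -> equal usage
assert retained([3, 3, 3, 3], threshold=3) == []      # whole page evicted
# poorer clustering: mixed usage -> only the hot objects are kept
assert retained([5, 1, 5, 1], threshold=3) == [5, 5]
```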
For the other traversals, when the cache is very small or very large, the miss rates of HAC and FPC are similar: in the first case, neither of the two systems is able to retain many useful objects in the cache resulting in a high miss rate for both of them; and if the cache is large enough to hold all the pages used by the traversal, both HAC and FPC perform equally well since there are no cache misses. HAC achieves much lower miss rates than FPC in the middle range because it can cache useful objects without needing to cache their pages; e.g., for the average clustering case (T1-), HAC achieves a maximum reduction in the number of fetches relative to FPC of 11192 (FPC has 11192 fetches and HAC has none).
HAC is space-efficient: it needs only 11% more cache space than the bare minimum to run T1- without cache misses, and it needs only 1% more than the minimum if the number of secondary scan pointers is increased to 23. HAC requires 20 times less memory than FPC to run traversal T6 without cache misses, 2.5 times less memory to run traversal T1- and 62% less memory to run traversal T1; as expected, these gains decrease as the quality of clustering increases.
The second set of experiments ran our dynamic traversals. The results were qualitatively similar to the ones described above: the performance of HAC and FPC was similar when clustering was excellent (i.e., when all operations executed T1+ traversals of the composite parts), and HAC achieved much lower miss rates when most of the operations executed traversals T6, T1- and T1.
Figure 6: Client cache misses, Dynamic traversal
Figure 6 presents miss rates for a dynamic traversal where each operation randomly executes T1- or T1 such that 80% of the object accesses are performed by T1- operations and 20% by T1 operations (above-average clustering quality).
We now compare HAC's performance with that of GOM. Figure 7 presents fetch counts for GOM, HAC, and HAC-BIG running a cold T1 traversal of the small OO7 database with varying client cache sizes. Since GOM partitions the client cache into object and page buffers statically, all data for GOM were obtained by manual tuning of the buffer sizes to achieve "the best possible" [KK94] performance; the sizes of the buffers were tuned for each cache size and for each traversal (e.g., tuning was different for T1 and T2b). The results for GOM were obtained from Kossmann [Kos97].
We introduce HAC-BIG to separate the performance effects of smaller objects and better cache management, which in combination cause HAC to outperform GOM at all cache sizes. HAC-BIG is like HAC except that we padded its objects to be approximately the same size as GOM's (HAC-BIG's database is actually 6% larger than GOM's). The differences between HAC and HAC-BIG result from using smaller objects, and the differences between HAC-BIG and GOM result from the better cache management in HAC-BIG. (Despite using page caching, even FPC has a lower miss rate than GOM for all cache sizes, because it uses small objects.)
Both HAC and HAC-BIG use 4 KB pages like GOM. We conservatively did not correct the cache sizes for HAC and HAC-BIG to account for our 16-byte indirection table entries because GOM uses a resident object table that introduces a 36-byte overhead for each object [Kos97][Kos95].
Figure 7: Client cache misses, Cold T1 Traversal, Small database
The most important point is that HAC-BIG outperforms GOM even though GOM's results required manual tuning of the sizes of the object and page buffers. Of course, in a real system, such tuning would not be possible, and a poor adjustment can hurt performance. By contrast, HAC adapts dynamically to different workloads with no user intervention.
GOM requires a cache size of 4.7 MB to hold all the objects accessed by the traversal whereas HAC-BIG only requires 3.9 MB. The difference is due to two problems with GOM's cache management scheme: storage fragmentation and static partitioning of the cache between page and object buffers (which causes space to be wasted by useless objects contained in cached pages). Note that reducing the size of GOM's page buffer does not necessarily improve performance, because GOM's page buffer is already too small to hold some pages long enough for the traversal to access all their objects of interest. This causes GOM to refetch some pages even when all the objects accessed by the traversal fit in the cache, e.g., for a cache size of 4.7 MB, GOM refetches 11% of the pages.
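One component of the fragmentation mentioned above is internal fragmentation from buddy allocation, which rounds each object up to a power-of-two block. A toy calculation (object sizes here are invented for illustration):

```python
# Internal fragmentation under buddy allocation: each object occupies the
# smallest power-of-two block that fits it, wasting the rounded-up remainder.
def buddy_block(n):
    size = 1
    while size < n:
        size *= 2
    return size

sizes = [72, 100, 150]                        # hypothetical object sizes, bytes
waste = sum(buddy_block(n) - n for n in sizes)
# 72 -> 128, 100 -> 128, 150 -> 256: 190 of 512 allocated bytes are wasted
```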
Finally, GOM incurs overheads that we did not account for in this comparison: it uses perfect LRU, which introduces a large overhead on hit time, and its resident object table entries are 20 bytes larger than ours.
Miguel Castro, Atul Adya, Barbara Liskov, and Andrew Myers
Copyright ©1997 by the Association for Computing Machinery