Next: Replacement in the Background Up: Hybrid Adaptive Caching Previous: Compaction
It is more difficult to achieve low space and time overheads in object-caching systems than in page-caching systems because there are many more objects than pages. Despite this difficulty, our replacement policy achieves low miss rates with low space and time overheads. We keep track of object usage by storing 4 bits per object that indicate both how recently and how frequently the object has been used. This object usage information is used to compute frame usage information, but frame usage is only computed occasionally because its computation is expensive. Frames are selected for compaction based on their last computed usage; objects within the frame are retained or discarded based on how their usage compares to that of their frame.
It has been shown that cache replacement algorithms that take into account both the recency and the frequency of accesses - 2Q [JS94], LRU-K [OOW93], and frequency-based replacement [RD90] - can outperform LRU because they can evict recently used pages that are accessed infrequently. However, the algorithms proposed so far have high space and time overheads.
HAC maintains per-object usage statistics that capture information on both recency and frequency of accesses while incurring very low space overheads. Headers of installed objects contain 4 usage bits. The most significant usage bit is set each time a method is invoked on the object. Besides its low space overhead, this scheme has low impact on hit time; it adds at most two extra instructions and no extra processor cache misses to a method call.
The usage value is decayed periodically by shifting right by one; thus, each usage bit corresponds to a decay period and it is set if the object was accessed in that period. Our scheme considers objects with higher usage (interpreting the usage as a 4-bit integer) as more valuable, i.e., objects that were accessed in more recent periods are more valuable and when the last accesses to two objects occurred in the same period, their value is ordered using the history of accesses in previous periods. Therefore, our scheme acts like LRU but with a bias towards protecting objects that were frequently accessed in the recent past. To further increase this bias and to distinguish objects that have been used in the past from objects that have never been used, we add one to the usage bits before shifting; we found experimentally that this increment reduces miss rates by up to 20% in some workloads. Our decay rule is identical to the one used to decay reference counts in [RD90], but its purpose is quite different.
Usage bits are much cheaper in both space and time than maintaining either an LRU chain or the data structures used by 2Q, LRU-K and frequency-based replacement. For example, the system described in [Kos95] uses a doubly linked list to implement LRU. This imposes an overhead of at least an extra 8 bytes per object for the LRU information, which increases the client cache miss rate and also increases hit times substantially by adding up to three extra processor cache misses per object access.
HAC uses fine-grained concurrency control [Gru97][AGLM95] and some objects in a cached page may be invalidated (when they become stale) while the rest of the page remains valid. We set the usage of invalid objects to 0, which ensures their timely removal from the cache. In contrast, page-caching systems that use fine-grained concurrency control (e.g., adaptive callback locking [CFZ94], which does concurrency control by adaptively locking either objects or pages) waste cache space holding invalid objects because they always cache full pages.
We could implement replacement by evicting the objects with the lowest usage in the cache, but this approach may pick objects from a large number of frames. Since HAC compacts objects retained from these frames, compaction time may be unacceptable, e.g., compacting 126 frames could take up to 1 second in our experimental environment. Therefore, we compute usage values for frames and use these values to select frames to compact and indirectly objects to evict.
Our goals in freeing a frame are to retain hot objects and to free space. The frame usage value reflects these goals. It is a pair <T, H>. T is the threshold: when the frame is discarded, only hot objects, whose usage is greater than T, will be retained. H is the fraction of objects in the frame that are hot at threshold T. We require H to be less than the retention fraction, R, where R is a parameter of our system. T is the minimum usage value that results in an H that meets this constraint.
Figure 3: Usage statistics for frames
Frame usage is illustrated (for R = 2/3) in Figure 3. For frame F1, T = 2 would not be sufficient since this would lead to H = 5/6; therefore we have T = 3. For frame F2, T = 0 provides a small enough value for H.
We use object count as an estimate for the amount of space occupied by the objects because it is expensive to compute this quantity accurately in the HAC prototype; it is a reasonable estimate if the average object size is much smaller than a page.
Selecting a value for the system parameter R involves a tradeoff between retaining hot objects to reduce the miss rate and achieving low replacement overhead by reducing the number of frames whose contents need to be compacted. If R is large, only very cold objects will be discarded but little space may be recovered; if R is small, more space is freed but the miss rate may increase since hot objects may be evicted. We have found experimentally that R = 2/3 works well; Section 4.1.2 describes the sensitivity experiments we used to determine the value of R and other system parameters.
HAC uses a no-steal [GR93] cache management policy, i.e., objects that were modified by a transaction cannot be evicted from the cache until the transaction commits. The frame usage is adjusted accordingly, to take into account the fact that modified objects are retained regardless of their usage value: when computing the usage of a frame, we use the maximum usage value for modified objects rather than their actual usage value.
The goal for replacement is to free the least valuable frame. Frame F is less valuable than frame G if its usage is lower:
F.T < G.T or (F.T = G.T and F.H < G.H)
i.e., either F's hot objects are likely to be less useful than G's, or the hot objects are equally useful but more objects will be evicted from F than from G. For example, in Figure 3, F2 has lower usage than F1.
Although in theory one could determine the least valuable frame by examining all frames, such an approach would be much too expensive. Therefore, HAC selects the victim from among a set of candidates. A few frames are added to this set during each epoch, i.e., at each fetch. A frame's usage is computed when it is added to the set; since this computation is expensive, we retain frames in the candidate set, thus increasing the number of candidates for replacement at later fetches without increasing replacement overhead. Since frame usage information that was calculated long ago is likely to be out of date, we remove entries that have remained in the candidate set for E epochs. We use a value of 20 for E (see Section 4.1.2).
We add new members to the candidate set using a variant of the CLOCK algorithm [Cor69]. We organize the cache as a circular array of frames and maintain a primary scan pointer and N secondary scan pointers into this array; these pointers are equidistant from each other in the array. When a frame needs to be freed, HAC computes the usage value for S contiguous frames starting at the primary pointer and adds those frames to the candidate set; then it increments the primary scan pointer by S. We use a value of S=3 (see Section 4.1.2).
The secondary pointers are used to ensure the timely eviction of uninstalled objects, i.e., objects that have not been used since their page was fetched into the cache. These objects are good candidates for eviction provided they have not been fetched too recently into the cache [RD90][OOW93]. The secondary pointers are used to find frames with a large number of uninstalled objects. For each secondary pointer, HAC determines the number of installed objects in S contiguous frames starting at that scan pointer and then advances the pointer by S. It enters a frame in the candidate set if the fraction of installed objects in the frame is less than R; the thresholds of these frames will always be zero since uninstalled objects have a usage value of zero. In each epoch, at most N x S members are added to the candidate set from the secondary scan pointers. The secondary scan pointers introduce low overhead because HAC keeps track of the number of installed objects in each frame and no scanning through the usage values of objects is needed.
Since scan pointers are equidistant from one another, a newly fetched page will be protected until approximately another 1/((N+1)S) of the frames in the cache has been freed. The choice of N is thus important: a low number of secondary scan pointers will lead to wasting cache space with uninstalled objects but a larger number can increase miss rates by prematurely evicting recently fetched objects. We found experimentally that a value of 2 for N works well (see Section 4.1.2); it protects recently fetched objects from eviction until approximately 1/9 of the frames in the cache have been freed.
Figure 4: Adding new members to the candidate set
Figure 4 shows an example for a 300-page cache. The three scan pointers are spaced 100 frames apart. After scanning, four frames (25, 26, 27, and 127) have been added to the set; the other frames at the secondary pointers were not added since they did not have enough uninstalled objects.
HAC decays object usage when it computes frame usage at the primary scan pointer: it needs to scan the objects at this point, and therefore decaying object usage adds almost no overhead. However, if there are no fetches, usage values will not be decayed and it will be difficult to predict which objects are less likely to be accessed. If this were a problem, we could perform additional decays of object usage when the fetch rate is very low. These decays would not impose a significant overhead if performed infrequently, e.g., every 10 seconds.
The obvious strategy is to free the lowest-usage frame in the candidate set. However, we modify this strategy a little to support the following important optimization. As discussed earlier, HAC relies on the use of an indirection table to achieve low cache replacement overhead. Indirection can increase hit times, because each object access may require dereferencing the indirection entry's pointer to the object, and above all, may introduce an extra cache miss. This overhead is reduced by ensuring the following invariant: an object for which there is a direct pointer in the stack or registers is guaranteed not to move or be evicted. The Theta compiler takes advantage of this invariant by loading the indirection entry's pointer into a local variable and using it repeatedly without the indirection overhead; other compilers could easily do the same. We ran experiments to evaluate the effect of pinning frames referenced by the stack or registers and found it had a negligible effect on miss rates.
To preserve the above invariant, the client scans the stack and registers and conservatively determines the frames that are being referenced from the stack. It frees the lowest-usage frame in the candidate set that is not accessible from the stack or registers; if several frames have the same usage, the frame added to the candidate set most recently is selected, since its usage information is most accurate. To make the selection process fast, the candidate set was designed to have O(log E) cost for removing the lowest-usage frame.
When a frame fills up with compacted objects, we compute its usage and insert it in the candidate set. This is desirable because objects moved to that frame may have low usage values compared to pages that are currently present in the candidate set.
The fact that the usage information for some candidates is old does not cause valuable objects to be discarded. If an object in the frame being compacted has been used since that frame was added to the candidate set, it will be retained, since its usage is greater than the threshold. At worst, old frame-usage information may cause us to recover less space from that frame than expected.
Miguel Castro, Atul Adya, Barbara Liskov, and Andrew Myers
Copyright ©1997 by the Association for Computing Machinery