Abstract
Caching and other latency-tolerating techniques have been quite successful in maintaining high memory system performance for general-purpose processors. However, TLB misses have become a serious bottleneck as working sets grow beyond the capacity of TLBs.
This work presents one of the first attempts to hide TLB miss latency by using preloading techniques. We present results for traditional next-page TLB miss preloading, an approach shown to cut some of the misses. The key contribution of this work, however, is a novel TLB miss prediction algorithm based on the concept of “recency”, which we show can predict over 55% of the TLB misses for the five commercial applications considered.
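As a rough illustration of the next-page preloading baseline mentioned in the abstract, the following sketch counts how many TLB misses would find their translation already waiting in a small preload buffer. It is not the paper's simulator or its recency-based predictor: the function name `simulate_next_page_preload`, the fully associative LRU TLB, the FIFO preload buffer, and the default sizes are all assumptions chosen for illustration.

```python
from collections import OrderedDict


def simulate_next_page_preload(pages, tlb_entries=4, buffer_entries=2):
    """Replay a trace of virtual page numbers through a toy TLB.

    Hypothetical model: a fully associative LRU TLB plus a small FIFO
    preload buffer. On every TLB miss for page p, the translation for
    page p + 1 is preloaded into the buffer.

    Returns (misses, covered), where `covered` counts the misses whose
    translation was already sitting in the preload buffer.
    """
    tlb = OrderedDict()     # least recently used entry first
    buffer = OrderedDict()  # oldest preloaded entry first
    misses = 0
    covered = 0

    for page in pages:
        if page in tlb:
            tlb.move_to_end(page)  # refresh LRU position on a hit
            continue

        misses += 1
        if page in buffer:
            covered += 1           # miss latency hidden by the preload
            del buffer[page]

        # Install the missing translation, evicting the LRU entry if full.
        tlb[page] = True
        if len(tlb) > tlb_entries:
            tlb.popitem(last=False)

        # Next-page preload: fetch the translation for page + 1.
        nxt = page + 1
        if nxt not in tlb and nxt not in buffer:
            buffer[nxt] = True
            if len(buffer) > buffer_entries:
                buffer.popitem(last=False)

    return misses, covered


if __name__ == "__main__":
    sequential = list(range(8))
    strided = [0, 5, 10, 15]
    print(simulate_next_page_preload(sequential))
    print(simulate_next_page_preload(strided))
```

In this toy model a purely sequential trace has almost every miss covered, while a strided trace gets no benefit. That gap is consistent with the abstract's remark that next-page preloading removes only some of the misses.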
Recency-based TLB preloading
ISCA '00: Proceedings of the 27th Annual International Symposium on Computer Architecture