Recency-based TLB preloading

Authors:
Ashley Saulsbury
Sun Microsystems Laboratories, 901 San Antonio Road, Palo Alto, CA

Fredrik Dahlgren
Ericsson Mobile Communications AB, Mobile Phones and Terminals, SE-221 83, Lund, Sweden

Per Stenström
Dept. of Computer Engineering, Chalmers Univ. of Technology, SE-412 96 Gothenburg, Sweden

ACM SIGARCH Computer Architecture News, Volume 28, Issue 2, May 2000, pp. 117–127
https://doi.org/10.1145/342001.339666

Published: 01 May 2000

Abstract

Caching and other latency-tolerating techniques have been quite successful in maintaining high memory system performance for general-purpose processors. However, TLB misses have become a serious bottleneck as working sets grow beyond the capacity of TLBs.

This work presents one of the first attempts to hide TLB miss latency by using preloading techniques. We present results for traditional next-page TLB miss preloading, an approach shown to cut some of the misses. However, a key contribution of this work is a novel TLB miss prediction algorithm based on the concept of “recency”, and we show that it can predict over 55% of the TLB misses for the five commercial applications considered.
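To make the next-page baseline concrete, the following is a minimal sketch of sequential TLB preloading over a trace of virtual page numbers. It is an illustration only, not the authors' simulator: the function name `simulate`, the LRU-managed TLB, the small FIFO preload buffer, and the parameters `tlb_entries` and `buffer_entries` are all assumptions introduced here. On each TLB miss to page `p`, the translation for page `p + 1` is fetched into the preload buffer; a later miss that finds its page in that buffer counts as covered.

```python
from collections import OrderedDict


def simulate(trace, tlb_entries=4, buffer_entries=2, preload=True):
    """Return (misses, covered) for a trace of virtual page numbers.

    Hypothetical model: a fully associative LRU TLB plus a small FIFO
    preload buffer. 'covered' counts TLB misses whose translation was
    already waiting in the preload buffer.
    """
    tlb = OrderedDict()  # page -> None, ordered least to most recently used
    buf = OrderedDict()  # preload buffer, ordered oldest to newest
    misses = 0
    covered = 0

    for page in trace:
        if page in tlb:
            tlb.move_to_end(page)  # TLB hit: refresh LRU position
            continue

        misses += 1
        if page in buf:
            covered += 1  # miss latency hidden by an earlier preload
            del buf[page]

        # Install the translation, evicting the LRU entry if the TLB is full.
        tlb[page] = None
        if len(tlb) > tlb_entries:
            tlb.popitem(last=False)

        # Next-page preloading: on a miss to page p, fetch p + 1.
        if preload:
            nxt = page + 1
            if nxt not in tlb and nxt not in buf:
                buf[nxt] = None
                if len(buf) > buffer_entries:
                    buf.popitem(last=False)

    return misses, covered


if __name__ == "__main__":
    sequential = list(range(8))
    print(simulate(sequential))                 # sequential walk over 8 pages
    print(simulate(sequential, preload=False))  # same trace, no preloading
```

A purely sequential walk is the best case for this scheme, since every miss after the first finds its page already preloaded. A trace that alternates between distant pages gains nothing from it, which is the kind of behaviour a predictor based on reference history rather than address adjacency is meant to address.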

Index Terms

Recency-based TLB preloading
1. Hardware
  1. Hardware validation
  2. Integrated circuits
    1. Semiconductor memory

Published in
ACM SIGARCH Computer Architecture News, Volume 28, Issue 2
Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
May 2000
325 pages
ISSN: 0163-5964
DOI: 10.1145/342001
Chairmen:
Alan Berenbaum, Lucent Technologies, Berkeley Heights, NJ
Joel Emer, Compaq Computer Corp., Palo Alto, CA

ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture
June 2000
327 pages
ISBN: 1581132328
DOI: 10.1145/339647
Chairmen:
Alan Berenbaum, Lucent Technologies
Joel Emer, Compaq Computer Corp.
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- article