research-article

Adaptive set pinning: managing shared caches in chip multiprocessors

Authors:
Shekhar Srikantaiah, Pennsylvania State University, University Park, PA
Mahmut Kandemir, Pennsylvania State University, University Park, PA
Mary Jane Irwin, Pennsylvania State University, University Park, PA
ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, March 2008, pages 135–144
https://doi.org/10.1145/1346281.1346299

Published: 01 March 2008


ABSTRACT

As part of the trend towards Chip Multiprocessors (CMPs) for the next leap in computing performance, many architectures have explored sharing the last level of cache among different processors for a better performance-cost ratio and improved resource allocation. Shared cache management is a design aspect crucial to CMP system performance. This paper first presents a new classification of cache misses for CMPs with shared caches, CII (Compulsory, Inter-processor, and Intra-processor misses), to provide a better understanding of how the memory transactions of different processors interact at the shared cache level. We then propose a novel approach, called set pinning, for eliminating inter-processor misses and reducing intra-processor misses in a shared cache. Furthermore, we show that an adaptive set pinning scheme improves on the benefits of set pinning by significantly reducing the number of off-chip accesses. We perform an extensive analysis of these approaches with the SPEComp 2001 benchmarks using a full system simulator. Our experiments indicate that, compared to the traditional shared cache scheme, set pinning achieves an average improvement of 22.18% in the L2 miss rate, while adaptive set pinning reduces miss rates by an average of 47.94%. The two schemes also improve performance by 7.24% and 17.88%, respectively.
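To make the CII classification concrete, the following is a minimal Python sketch of a shared, set-associative LRU cache that tags every miss. It rests on a simplifying assumption that is not taken from the paper: a repeat miss counts as inter-processor when the access that evicted the block came from a processor other than the one now missing, and as intra-processor otherwise. The class name `SharedCacheCII`, its parameters, and its bookkeeping are hypothetical and for illustration only.

```python
from collections import OrderedDict


class SharedCacheCII:
    """Hypothetical shared set-associative LRU cache that labels each miss
    as compulsory, inter-processor, or intra-processor (CII).

    Assumption (illustrative, not the paper's exact definition): a miss on a
    previously cached block is "inter" if the access that evicted it came
    from a different processor than the one now missing, else "intra".
    """

    def __init__(self, num_sets: int, ways: int, block_size: int = 64):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One LRU-ordered dict per set: block number -> processor that inserted it.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.seen = set()       # blocks that have ever been brought into the cache
        self.evicted_by = {}    # block number -> processor whose access evicted it
        self.stats = {"hit": 0, "compulsory": 0, "inter": 0, "intra": 0}

    def access(self, proc: int, addr: int) -> str:
        block = addr // self.block_size
        cache_set = self.sets[block % self.num_sets]

        if block in cache_set:
            cache_set.move_to_end(block)  # refresh LRU position on a hit
            self.stats["hit"] += 1
            return "hit"

        # Classify the miss before filling the block.
        if block not in self.seen:
            kind = "compulsory"
        elif self.evicted_by[block] == proc:
            kind = "intra"
        else:
            kind = "inter"
        self.stats[kind] += 1
        self.seen.add(block)

        # Evict the least recently used block if the set is full,
        # remembering which processor's access caused the eviction.
        if len(cache_set) >= self.ways:
            victim, _ = cache_set.popitem(last=False)
            self.evicted_by[victim] = proc
        cache_set[block] = proc
        return kind


if __name__ == "__main__":
    # One 2-way set with 1-byte blocks, so every address is its own block.
    cache = SharedCacheCII(num_sets=1, ways=2, block_size=1)
    trace = [(0, 0), (0, 1), (1, 2), (0, 0), (0, 1), (0, 0)]
    for proc, addr in trace:
        print(proc, addr, cache.access(proc, addr))
    print(cache.stats)
```

In the sample trace, processor 1's access to block 2 evicts processor 0's block 0, so processor 0's next reference to block 0 is classified as inter-processor, while the later miss on block 1, which processor 0 evicted itself, is intra-processor. Recording the evicting processor per block is what separates the two non-compulsory classes; a set pinning variant would additionally restrict which processor may evict from a given set, which this sketch does not model.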

Supplemental Material

- 1346299.mp4 (MP4 video, 121.3 MB)
- p135-shekhar-slides.zip: slides from the presentation (ZIP, 23.2 MB)
- 1346299.mp3 (MP3 audio, 8.8 MB)


Index Terms

Adaptive set pinning: managing shared caches in chip multiprocessors
1. General and reference
  1. Cross-computing tools and techniques
    1. Design
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory


Published in
ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
March 2008
352 pages
ISBN:9781595939586
DOI:10.1145/1346281
General Chair: Susan Eggers, University of Washington, USA
Program Chair: James Larus, Microsoft Research, USA
ACM SIGPLAN Notices Volume 43, Issue 3
ASPLOS '08
March 2008
339 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1353536
ACM SIGARCH Computer Architecture News Volume 36, Issue 1
ASPLOS '08
March 2008
339 pages
ISSN:0163-5964
DOI:10.1145/1353534
ACM SIGOPS Operating Systems Review Volume 42, Issue 2
ASPLOS '08
March 2008
339 pages
ISSN:0163-5980
DOI:10.1145/1353535
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CMP
inter-processor
intra-processor
set pinning
shared cache
Acceptance Rates
- ASPLOS XIII paper acceptance rate: 31 of 127 submissions (24%)
- Overall acceptance rate: 535 of 2,713 submissions (20%)
Article Metrics
- 111 total citations
- 1,489 total downloads
- Downloads (last 12 months): 12
- Downloads (last 6 weeks): 3