DOI: 10.1145/1346281.1346299
research-article

Adaptive set pinning: managing shared caches in chip multiprocessors

Published: 01 March 2008

ABSTRACT

As part of the trend towards Chip Multiprocessors (CMPs) for the next leap in computing performance, many architectures have explored sharing the last level of cache among different processors for a better performance-cost ratio and improved resource allocation. Shared cache management is therefore a crucial aspect of CMP design for system performance. This paper first presents a new classification of cache misses for CMPs with shared caches, CII (Compulsory, Inter-processor and Intra-processor misses), to provide a better understanding of how the memory transactions of different processors interact in the shared cache. We then propose a novel approach, called set pinning, for eliminating inter-processor misses and reducing intra-processor misses in a shared cache. Furthermore, we show that an adaptive set pinning scheme improves on the benefits of set pinning by significantly reducing the number of off-chip accesses. We analyze these approaches extensively with SPEComp 2001 benchmarks using a full system simulator. Our experiments indicate that, compared to the traditional shared cache scheme, the set pinning scheme achieves an average improvement of 22.18% in the L2 miss rate, while the adaptive set pinning scheme reduces the miss rate by an average of 47.94%. The two schemes also improve performance by 7.24% and 17.88%, respectively.
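
To make the CII classification concrete, the following is a minimal Python sketch, not the paper's simulator. It assumes definitions that the abstract names but does not spell out: a compulsory miss is the first reference to a block, an inter-processor miss is a miss on a block that was last evicted by a fill from a different processor, and an intra-processor miss is a miss on a block that the referencing processor itself evicted. The function name `classify_misses`, the LRU replacement policy, and the modulo set index are illustrative assumptions.

```python
from collections import OrderedDict


def classify_misses(trace, num_sets=2, ways=2):
    """Count hits and CII miss classes for a shared LRU set-associative cache.

    trace: iterable of (processor_id, block_address) pairs.
    The miss definitions are assumptions inferred from the category names.
    """
    # One OrderedDict per set; key order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    evicted_by = {}  # block -> processor whose fill evicted it most recently
    seen = set()     # blocks referenced at least once
    counts = {"hit": 0, "compulsory": 0, "inter": 0, "intra": 0}

    for proc, block in trace:
        cache_set = sets[block % num_sets]  # assumed simple modulo indexing
        if block in cache_set:
            cache_set.move_to_end(block)    # refresh LRU position
            counts["hit"] += 1
            continue

        if block not in seen:
            counts["compulsory"] += 1       # first-ever reference
        elif evicted_by[block] == proc:
            counts["intra"] += 1            # evicted by the same processor
        else:
            counts["inter"] += 1            # evicted by another processor
        seen.add(block)

        if len(cache_set) >= ways:
            victim, _ = cache_set.popitem(last=False)  # evict the LRU block
            evicted_by[victim] = proc
        cache_set[block] = None

    return counts


# Hypothetical trace on a one-set, one-way cache shared by processors 0 and 1.
example = [(0, 1), (1, 2), (0, 1), (0, 3), (0, 1), (0, 1)]
print(classify_misses(example, num_sets=1, ways=1))
# prints {'hit': 1, 'compulsory': 3, 'inter': 1, 'intra': 1}
```

Under these assumed definitions, the `inter` count is the category that set pinning is described as eliminating, and the `intra` count is the one it is described as reducing.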

      • Published in

        ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
        March 2008
        352 pages
        ISBN: 9781595939586
        DOI: 10.1145/1346281
        • ACM SIGPLAN Notices, Volume 43, Issue 3
          ASPLOS '08
          March 2008
          339 pages
          ISSN: 0362-1340
          EISSN: 1558-1160
          DOI: 10.1145/1353536
        • ACM SIGARCH Computer Architecture News, Volume 36, Issue 1
          ASPLOS '08
          March 2008
          339 pages
          ISSN: 0163-5964
          DOI: 10.1145/1353534
        • ACM SIGOPS Operating Systems Review, Volume 42, Issue 2
          ASPLOS '08
          March 2008
          339 pages
          ISSN: 0163-5980
          DOI: 10.1145/1353535

        Copyright © 2008 ACM

        Publisher: Association for Computing Machinery, New York, NY, United States


        Acceptance Rates

        ASPLOS XIII paper acceptance rate: 31 of 127 submissions, 24%. Overall acceptance rate: 535 of 2,713 submissions, 20%.
