ABSTRACT
As part of the trend towards Chip Multiprocessors (CMPs) for the next leap in computing performance, many architectures have explored sharing the last level of cache among different processors for a better performance-cost ratio and improved resource allocation. Shared cache management is a crucial aspect of CMP design for system performance. This paper first presents a new classification of cache misses for CMPs with shared caches, CII (Compulsory, Inter-processor and Intra-processor misses), which gives a better understanding of how the memory transactions of different processors interact at the level of the shared cache. We then propose a novel approach, called set pinning, for eliminating inter-processor misses and reducing intra-processor misses in a shared cache. Furthermore, we show that an adaptive set pinning scheme improves on set pinning by significantly reducing the number of off-chip accesses. We perform an extensive analysis of these approaches with the SPEComp 2001 benchmarks using a full-system simulator. Our experiments indicate that, compared to the traditional shared cache scheme, set pinning reduces the L2 miss rate by an average of 22.18%, while adaptive set pinning reduces it by an average of 47.94%. The two schemes also improve performance by 7.24% and 17.88%, respectively.
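The abstract names the three CII categories without defining them. The sketch below illustrates one way to count them in a shared cache, assuming the natural reading of the names: a compulsory miss is the first reference to a block, an inter-processor miss re-fetches a block whose most recent eviction was caused by a different processor's miss, and an intra-processor miss re-fetches a block that the requesting processor itself evicted. The `SharedCache` class, LRU replacement, the 64-byte block size, and the modulo set index are illustrative assumptions, not the paper's simulator or its exact definitions.

```python
from collections import OrderedDict


class SharedCache:
    """Shared set-associative cache with LRU replacement that labels every
    miss as compulsory, inter-processor, or intra-processor (CII sketch).

    Assumption: a re-reference miss is attributed to whichever processor's
    miss caused the block's most recent eviction.
    """

    def __init__(self, num_sets, ways, block_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size  # illustrative 64-byte blocks
        # One LRU-ordered dict per set: oldest entry first, most recent last.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.seen = set()       # blocks referenced at least once
        self.evicted_by = {}    # block -> processor whose miss last evicted it
        self.counts = {"hit": 0, "compulsory": 0, "inter": 0, "intra": 0}

    def access(self, cpu, addr):
        """Simulate one reference by processor `cpu`; return its outcome."""
        block = addr // self.block_size
        lines = self.sets[block % self.num_sets]  # simple modulo set index
        if block in lines:
            lines.move_to_end(block)  # hit: mark as most recently used
            self.counts["hit"] += 1
            return "hit"
        if block not in self.seen:
            kind = "compulsory"     # first reference to this block
        elif self.evicted_by[block] == cpu:
            kind = "intra"          # requester evicted its own block earlier
        else:
            kind = "inter"          # another processor's miss evicted it
        self.seen.add(block)
        if len(lines) == self.ways:
            victim, _ = lines.popitem(last=False)  # evict least recently used
            self.evicted_by[victim] = cpu
        lines[block] = cpu  # insert as most recently used, tagged with requester
        self.counts[kind] += 1
        return kind


if __name__ == "__main__":
    # Two processors sharing a 1-set, 2-way cache (block = addr // 64).
    cache = SharedCache(num_sets=1, ways=2)
    trace = [(0, 0), (0, 64), (1, 128), (0, 0), (0, 64), (0, 0)]
    print([cache.access(cpu, addr) for cpu, addr in trace])
    # ['compulsory', 'compulsory', 'compulsory', 'inter', 'intra', 'hit']
    print(cache.counts)
```

In this trace, processor 1's miss on block 2 evicts processor 0's block 0, so processor 0's next reference to block 0 counts as inter-processor; that miss in turn evicts block 1, so processor 0's following reference to block 1 counts as intra-processor. Under the abstract's framing, the inter-processor count is the quantity that set pinning aims to eliminate.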
Adaptive set pinning: managing shared caches in chip multiprocessors (ASPLOS '08)