ABSTRACT
Event tracing is an important tool for understanding the performance of parallel applications. As concurrency increases in leadership-class computing systems, the quantity of performance log data can overload the parallel file system, perturbing the very application being observed. In this work, we present a solution for event tracing at leadership-class scale. We enhance the I/O forwarding system software to aggregate and reorganize log data prior to writing it to the storage system, significantly reducing the burden on the underlying file system for this type of traffic. Furthermore, we augment the I/O forwarding system with a write-buffering capability to limit the impact of artificial perturbations from log data accesses on traced applications. To validate the approach, we modify the Vampir tracing toolset to take advantage of this new capability and show that the approach increases the maximum traced application size fivefold, to more than 200,000 processes.
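To make the aggregation and write-buffering idea concrete, the following minimal C sketch illustrates the general technique: many small per-process trace records are absorbed into a large staging buffer on the forwarding side and reach the file system only as large, contiguous writes. This is an illustration of the concept under stated assumptions, not the paper's actual IOFSL or Vampir code; the log_buffer structure, FLUSH_THRESHOLD, and all function names are hypothetical.

```c
/*
 * Hypothetical sketch of write buffering at an I/O forwarding layer.
 * Small trace records from compute processes are copied into a large
 * staging buffer; the expensive file system write happens only when
 * the buffer fills, so the traced application never blocks on the
 * file system for each individual record.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FLUSH_THRESHOLD (4 * 1024 * 1024)   /* flush in 4 MiB chunks (assumed) */

struct log_buffer {
    unsigned char *data;   /* staging buffer on the forwarding node */
    size_t         used;   /* bytes currently buffered              */
    size_t         cap;    /* total buffer capacity                 */
    FILE          *out;    /* aggregated trace file                 */
};

static int buffer_init(struct log_buffer *b, const char *path, size_t cap)
{
    b->data = malloc(cap);
    b->used = 0;
    b->cap  = cap;
    b->out  = fopen(path, "wb");
    return (b->data && b->out) ? 0 : -1;
}

/* One large, contiguous write replaces many small ones. */
static int buffer_flush(struct log_buffer *b)
{
    if (b->used && fwrite(b->data, 1, b->used, b->out) != b->used)
        return -1;
    b->used = 0;
    return 0;
}

/*
 * Called once per incoming trace record.  The caller returns as soon
 * as the record is copied into the staging buffer.  Records are
 * assumed to be much smaller than the buffer itself.
 */
static int buffer_append(struct log_buffer *b, const void *rec, size_t len)
{
    if (b->used + len > b->cap && buffer_flush(b) != 0)
        return -1;
    memcpy(b->data + b->used, rec, len);
    b->used += len;
    return 0;
}

int main(void)
{
    struct log_buffer buf;
    if (buffer_init(&buf, "trace.agg", FLUSH_THRESHOLD) != 0)
        return 1;

    /* Simulate many small per-process event records. */
    for (int i = 0; i < 100000; i++) {
        char rec[64];
        int n = snprintf(rec, sizeof rec, "rank=%d event=%d\n", i % 512, i);
        if (buffer_append(&buf, rec, (size_t)n) != 0)
            return 1;
    }
    buffer_flush(&buf);   /* drain whatever remains at shutdown */
    fclose(buf.out);
    free(buf.data);
    return 0;
}
```

In a real forwarding deployment the staging buffer would sit on a dedicated I/O node and absorb records arriving over the network from many compute processes, but the core trade (one large sequential write in place of many small scattered ones) is the same as in this single-node sketch.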