Modern scientific software in high-performance computing is often complex, and many parallel applications and libraries depend on several other software packages or libraries. Developers and users of such software often rely on package managers to build it. Package managers depend on humans to codify package constraints (for dependency and version selection), and the dependency graph of a software package can become large (hundreds of vertices). In addition, maintaining these constraints is laborious, and because different people maintain them for different packages, they often become outdated and inconsistent over time. This can cause package builds to fail for certain package configurations. In this talk, we present a methodology that uses historical build results to help a package manager select the best versions of package dependencies, with the aim of improving the likelihood of a successful build. We use a machine learning (ML) model to predict the probability of build outcomes for different package configurations in the Spack package manager. When evaluated on common scientific software stacks, this ML model-based approach achieves a 13% higher success rate in building packages than the default version selection mechanism in Spack.
Slides will be available for download here after the presentation.
Daniel Nichols is a fifth-year Ph.D. student in the Parallel Software and Systems Group at the University of Maryland, advised by Professor Abhinav Bhatele. His research interests lie at the intersection of high-performance computing and machine learning, where he focuses on applying machine learning to computer systems problems that arise in supercomputing. His work enables more efficient use of supercomputers through intelligent job scheduling and resource placement, large language model-guided performance optimization, and machine learning-driven performance modeling. He is the recipient of the 2024 ACM-IEEE CS George Michael Memorial HPC Fellowship.