AI for Datacenter Optimization (ADOPT'22)

In conjunction with the IEEE International Parallel and Distributed Processing Symposium

Artificial intelligence (AI) and machine learning (ML) workloads account for an increasingly large share of the compute workloads in traditional High-Performance Computing (HPC) centers and commercial cloud systems. This has led to a new focus on approaches to optimizing resource usage, allocation, and the deployment of new AI frameworks. With these changes, there is a need to better understand HPC/cloud/datacenter operations, with the goals of developing improved scheduling policies, identifying inefficiencies in resource utilization and energy/power consumption, predicting failures, and identifying policy violations. Simultaneously, there is growing interest in addressing the increasing power requirements of AI training and inference in HPC/cloud environments, with the goal of minimizing the accompanying effects on climate change.

Several publicly available datasets can enable this research, such as the Blue Waters System Monitoring dataset, the Philly traces by Microsoft, the Google Cluster Usage traces, and the Atlas cluster trace repository. In addition, the recently released MIT Supercloud Dataset includes monitoring logs from the MIT Supercloud system, comprising time series of CPU and GPU usage by jobs, memory usage, file system logs, and physical monitoring data.

This workshop will focus on AI/ML approaches to datacenter operations, power/energy modeling, scheduling, optimization, and monitoring tools. In addition to these topics, the workshop solicits papers that address any of the challenges in enabling large-scale AI in HPC/cloud environments, including:

  • Operational insights from deploying and scaling AI/ML workloads in shared systems
  • Parallelization strategies for AI training
  • Optimization strategies, including algorithmic and hardware innovations, for faster training and inference
  • Scheduling strategies for AI and traditional HPC workloads in large-scale environments
  • Approaches and instrumentation for measurement of the climate impact of AI

Submission deadline: March 15th, 2022 

The link for paper submissions will be posted here soon.