AI for Datacenter Optimization (ADOPT’23)

The ADOPT Workshop will be held on May 15, 2023, in conjunction with the IEEE International Parallel and Distributed Processing Symposium (IPDPS) in St. Petersburg, Florida.

Agenda:

8:45am – 9:00am – Opening remarks

9am – 10am – Invited talks

10am – 10:30am – Break

10:30am – 11am – Predicting Hard Disk Drive faults, failures and associated misbehaviors – Christopher Harrison, Hennish Balu, Ines Dutra

11am – 11:30am – Wireless enabled Inter-Chiplet Communication in DNN Hardware Accelerators – Maurizio Palesi, Enrico Russo, Abhijit Das, John Jose

11:30am – noon – Datacenter Challenge update and closing remarks

Call for papers:

Artificial intelligence (AI) and machine learning (ML) workloads account for an increasingly large share of the compute workloads in High-Performance Computing (HPC) centers and commercial cloud systems. The compute requirements of AI have led to a corresponding increase in the energy usage and carbon footprint associated with the training and deployment of these models. The combined effect has highlighted the need for approaches that optimize resource usage, allocation and deployment of new AI frameworks. Addressing this challenge requires a deeper understanding of HPC/cloud/datacenter operations in order to develop improved scheduling policies, identify inefficiencies in resource utilization, lower energy/power consumption and develop efficient approaches to datacenter management. Simultaneously, there is growing interest within HPC/cloud environments in minimizing the accompanying effects on climate change. This workshop will focus on the development and deployment of AI/ML approaches to datacenter operations, power/energy modeling, scheduling, optimization and monitoring tools. In addition to these topics, this workshop solicits papers that address any of the challenges in enabling large-scale AI in HPC/cloud environments, including:

•   Carbon/Energy reduction in AI/ML
•   Instrumentation for measurement of the climate impact of AI
•   Operational insights from deploying and scaling AI/ML workloads in shared systems
•   Parallelization strategies for AI training
•   Optimization strategies including algorithmic and hardware innovations for faster training and inference
•   Scheduling strategies for AI and traditional HPC workloads in large-scale environments

Submission deadlines:

Paper submission deadline – February 14, 2023
Author notification – March 1, 2023
Camera-ready deadline – March 7, 2023

Submission Details: Submissions will be evaluated in a single-blind process. Papers must not exceed 8 single-spaced, double-column pages using 10-point font on 8.5×11 inch pages (excluding references). Manuscripts must use the IEEE Conference style, available here: https://www.ieee.org/conferences/publishing/templates.html. Submitted papers must not be under review at another conference, journal or workshop. One author of each accepted paper is expected to present at the workshop.

Link for paper submissions: https://ssl.linklings.net/conferences/ipdps/

Program Chair

Stephanie Brink, LLNL

Program Committee

Albert Reuther, MIT
Prof. Devesh Tiwari, Northeastern University
Julie Mullen, MIT
Siddharth Samsi, MIT