Monday, 06 March
Fotini Karinou, Microsoft, UK
Laurent Schares, IBM TJ Watson Research Center, USA
Chongjin Xie, Alibaba, USA
Data center workloads are growing continuously as emerging applications call for higher bandwidth, lower latency, and more power-efficient networks. Machine Learning (ML) workloads, in particular, grow exponentially in size every year, and training them today requires clusters of thousands of interconnected accelerators with Tbps-scale I/O bandwidth per node. Serving these emerging applications will require innovation in designing and building networks that can scale AI supercomputers without exploding overall power consumption and cost. This session will discuss emerging trends, including, for example, (1) composable systems in which disaggregated resources (GPUs, CPUs, storage/memory) are co-located as a pool accessed via a local network, and (2) reconfigurable network topologies that provision bandwidth on demand. It will focus on the challenges and opportunities for photonics and will try to address some of the following questions:
- Will optics penetrate composable systems or the accelerator-to-accelerator space?
- Reconfigurable network topologies - what role will they play?
- Ethernet: will it continue to be the driving design paradigm in the AI era?
- CXL or proprietary interconnects: which protocol will dominate chip-scale fabrics?
Rui Wang, Google LLC, USA
Manya Ghobadi, Massachusetts Institute of Technology, USA
Larry Dennison, NVIDIA Corporation, USA
Binzhang Fu, Alibaba Cloud, China
High-Performance Networks for Disaggregated Systems
Ram Huggahalli, Microsoft Azure Hardware Architecture, USA