Monday, 06 March
Fotini Karinou, Microsoft, UK
Laurent Schares, IBM TJ Watson Research Center, USA
Chongjin Xie, Alibaba, USA
Data center workloads are continuously growing due to various emerging applications calling for higher bandwidth, lower latency, and more power-efficient networks. Machine Learning (ML) workloads, in particular, grow exponentially in size every year, and training them today requires clusters of thousands of interconnected accelerators with Tbps-scale I/O bandwidth per node. Serving these emerging applications will require innovation in designing and building networks that can scale AI supercomputers without exploding overall power consumption and cost. This session will discuss emerging trends, including, for example, (1) composable systems with disaggregated resources (GPUs, CPUs, storage/memory) co-located as a pool that is accessed via a local network and (2) reconfigurable network topologies that provision bandwidth on demand. It will focus on the challenges and opportunities for photonics and will try to address some of the following questions:
- Will optics penetrate composable systems or the accelerator-to-accelerator space?
- Reconfigurable network topologies: what role will they play?
- Ethernet: will it continue to be the driving design paradigm in the AI era?
- CXL or proprietary interconnects: what protocol will dominate the chip-scale fabrics?
Binzhang Fu, Alibaba Cloud, China
High-Performance Networks for Disaggregated Systems
Larry Dennison, NVIDIA Corporation, USA
High Performance Networks for AI Training
Manya Ghobadi, Massachusetts Institute of Technology, USA
High-Performance Optical Networks for ML Workloads
Rui Wang, Google LLC, USA
Reconfigurable Topology in Google’s Datacenter Networks
Ram Huggahalli, Microsoft Azure Hardware Architecture, USA
Optical Opportunities in Datacenter Servers – II