Fotini Karinou, Microsoft, UK
Laurent Schares, IBM TJ Watson Research Center, USA
Chongjin Xie, Alibaba, USA
Data center workloads are growing continuously, driven by a variety of emerging applications that call for higher bandwidth, lower latency, and more power-efficient networks. Machine Learning (ML) workloads, in particular, grow exponentially in size every year, and training them today requires clusters of thousands of interconnected accelerators with Tbps-scale I/O bandwidth per node. The new hardware needed to serve these emerging applications will require innovation in how we design and build networks that can scale AI supercomputers without exploding overall power consumption and cost.

This session will discuss emerging trends including, for example, (1) composable systems in which disaggregated resources (GPUs, CPUs, storage/memory) are co-located as a pool accessed via a local network, and (2) reconfigurable network topologies that provision bandwidth on demand. It will focus on the challenges and opportunities for photonics and will address some of the following questions:
- Will optics penetrate into composable systems or the accelerator-to-accelerator space?
- Reconfigurable network topologies: what role will they play?
- Ethernet: will it continue to be the driving design paradigm in the AI era?
- CXL or proprietary interconnects: which protocol will dominate chip-scale fabrics?