Artificial intelligence? Generalized intelligence? Superintelligence? Delivering on this promise requires ever larger models, enormous training datasets, and millions of GPUs. Distributed training enables workloads to be spread over large geographic regions, separated by hundreds or even thousands of kilometers. Novel networks are being introduced to interconnect these compute clusters, resolving power-density problems while introducing new challenges such as restricted bandwidth, network latency, transceiver complexity, and power dissipation.
The key questions to address in this workshop are:
- Which design constraints (latency, total bandwidth, power consumption) limit distributed training the most?
- What is the practical limit on the distance between geo-distributed AI data centers? (See the back-of-envelope latency sketch after this list.)
- In which direction should optical transceiver technologies (DSP, FEC, etc.) evolve?
- How are transport-layer technologies enabling or hindering the design of large training clusters?
- Does hollow-core fiber or optical switching help?
- Can standards keep up?
- Will we see training clusters further evolve to support “Agentic AI”?
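One reason the distance question matters is fiber propagation delay. As a rough illustration (not part of the workshop material): assuming a typical silica-fiber group index of about 1.47, a minimal sketch of the resulting one-way and round-trip delays looks like this; the distances chosen are arbitrary examples.

```python
# Back-of-envelope fiber propagation delay between geo-distributed sites.
# Illustrative assumptions: ~1.47 group index for standard single-mode fiber;
# distances are example values, not figures from the workshop description.

SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum, km/s
FIBER_GROUP_INDEX = 1.47            # typical silica single-mode fiber

def one_way_delay_ms(distance_km: float) -> float:
    """One-way propagation delay over standard fiber, in milliseconds."""
    return distance_km * FIBER_GROUP_INDEX / SPEED_OF_LIGHT_KM_S * 1e3

for km in (100, 500, 1000, 2000):
    print(f"{km:>5} km: one-way {one_way_delay_ms(km):.2f} ms, "
          f"round-trip {2 * one_way_delay_ms(km):.2f} ms")
```

At roughly 5 µs per kilometer one-way, a 1000 km separation adds about 10 ms of round-trip latency before any switching or transceiver processing, which frames the trade-offs the questions above raise.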
Organizers
- Brandon Buscaino, Ciena, Canada
- Sergejs Makovejs, Corning, United Kingdom
- Jeffrey Rahn, Meta, USA
- Jesse Simsarian, Nokia, USA