HKUST, Hong Kong, China
The conference will be streamed live.
The event can be accessed online at: https://hkust-gz-edu-cn.zoom.us/j/93515738063?pwd=alQZnxBIelgzeqoZW3LPC1b9V8V2q5.1
**9:00-9:10 | Opening remarks**
**09:10-09:40 | Invited talk**

*Uno: A One-Stop Solution for Inter- and Intra-Datacenter Congestion Control and Reliable Connectivity* (Link to article) (Presentation slides)
Presented by Tommaso Bonato (Microsoft and ETH Zürich)

**Abstract:** Cloud computing and AI workloads are driving unprecedented demand for efficient communication within and across datacenters. However, the coexistence of intra- and inter-datacenter traffic within datacenters, together with the disparity between intra- and inter-datacenter RTTs, complicates congestion management and traffic routing. In particular, the faster congestion responses of intra-datacenter traffic cause rate unfairness when competing with slower inter-datacenter flows. Additionally, inter-datacenter messages suffer from slow loss recovery and therefore require dedicated reliability mechanisms. Existing solutions overlook these challenges and handle inter- and intra-datacenter congestion with separate control loops or at different granularities. We propose Uno, a unified system for both inter- and intra-datacenter environments that integrates a transport protocol for rapid congestion reaction and fair rate control with a load-balancing scheme combining erasure coding and adaptive routing. Our findings show that Uno significantly improves the completion times of both inter- and intra-datacenter flows compared to state-of-the-art methods such as Gemini.

**Speaker bio:** Tommaso Bonato received his BSc in computer science and engineering from the University of Bologna, Italy, and his MSc in computer science from ETH Zürich. During his studies he interned at the European Space Agency (ESA) and Amazon. He is currently pursuing his PhD at ETH Zürich under the supervision of Professor Torsten Hoefler. Tommaso is also a student researcher at Microsoft, working on networking topics and Ultra Ethernet. His main research interests include networking protocols and topologies, high-performance computing, and AI workloads.
**09:40-10:25 | AI4Net paper**

*Self-supervised Application-level Network Traffic Inversion* (Link to article) (Presentation slides)
Presented by Shaked Leibzirer (Huawei, Tel-Aviv)

**Abstract:** Fine-grained monitoring of network traffic is essential for application-level analysis. This need is particularly critical in campus networks as well as large-scale enterprise and service-provider environments. However, due to resource limitations, traffic data is often collected using packet sampling techniques, which obstruct accurate temporal analysis. Although statistical scaling methods can estimate approximate total traffic volumes, they typically fail to capture fine-grained temporal dynamics. In this work, we propose a hybrid reconstruction framework that combines statistical modeling with an attention-based deep neural network (DNN), allowing recovery of high-resolution application-level traffic from sampled data. We evaluated the proposed method on real-world campus network traces. The results demonstrate that our method outperforms traditional smoothing and interpolation techniques, enabling precise estimation of application-specific traffic patterns.

**Speaker bio:** Shaked Leibzirer is a research scientist at Huawei's Tel-Aviv research center, where he leads R&D in network performance optimization, real-time statistical simulation, and large language model–driven systems. His work spans algorithm design, lightweight task-specific fine-tuning, RAG systems, and the integration of LLMs into network engineering workflows, including a patented innovation in network reliability. Shaked holds an M.Sc. summa cum laude in Mathematics from the Technion, with research in random matrices and simplicial complexes, and has been recognized with multiple awards for both scientific excellence and innovation.
**10:25-10:45 | Coffee break**
**10:45-11:30 | Net4AI paper**

*Latency-Optimal Load Balancing for Distributed MoE Inference* (Link to article)
Presented by German Sviridov (AMD, Singapore)

**Abstract:** Expert parallelism (EP) has emerged as a promising approach for scaling mixture-of-experts (MoE) inference across multiple devices. However, EP introduces imbalanced workloads among devices, resulting in poor performance and suboptimal hardware utilization.

**Speaker bio:** German Sviridov received his PhD from Politecnico di Torino and is currently a Staff Researcher at AMD Singapore. His research interests revolve around datacenter networking and systems for AI.
**11:30-12:30 | Invited talk**

*From Homogeneous to Disaggregated Architectures for Large Model Inference* (Presentation slides)
Presented by Mingxing Zhang (Tsinghua University, Beijing)

**Abstract:** Traditional large model inference architectures have been predominantly GPU-centric, owing to their significant advantages in both compute power and bandwidth. However, as GPU utilization approaches its bottleneck, achieving further cost reductions requires exploring new optimization paths.

**Speaker bio:** Mingxing Zhang is a tenure-track Assistant Professor at Tsinghua University, focusing primarily on memory system research. He is the initiator of the open-source KVCache.AI projects Mooncake and KTransformers, and has authored over 30 papers at prestigious international conferences and journals such as OSDI, SOSP, ASPLOS, HPCA, and EuroSys, with honors including a Best Paper Award at FAST, a SIGSOFT Distinguished Paper Award, and the first OSDI paper from a university in mainland China. He is a recipient of the ChinaSys Rising Star Award, the ChinaSys Outstanding Doctoral Dissertation Award, and the IEEE TCSC Outstanding Dissertation Award, and was selected for the China Association for Science and Technology's Young Talent Support Project. He previously served as Chief Algorithm Technology Expert and Dean of the Innovation Research Institute at Sangfor Technologies, where products he incubated have been deployed to tens of thousands of customers.
**12:30-14:00 | Lunch**
**14:00-14:45 | Net4AI paper**

*SCALE-CCL: A Scalable Collective Communication Library for Wide-Area Distributed Training* (Link to article) (Presentation slides)
Presented by Jiaheng Xiong (Politecnico di Milano)

**Abstract:** Collective Communication Libraries (CCLs) are widely used to coordinate and optimize data exchange among multiple GPUs. As training clusters increasingly span multiple datacenters for scalability and resource pooling, applying CCLs over wide-area networks (WANs) becomes essential. However, existing CCLs are primarily designed for stable, low-latency intra-datacenter environments. In contrast, WANs exhibit dynamic and unpredictable conditions that hinder traditional CCLs. While some recent solutions adopt global optimization techniques, such as Integer Linear Programming (ILP), to improve communication efficiency, these methods assume static topologies and full network visibility — assumptions that are often invalid in WAN environments characterized by link variability, limited observability, and dynamic traffic patterns. To address these challenges, we propose SCALE-CCL, a scalable collective heuristic algorithm for optimizing AllGather operations in WAN environments. SCALE-CCL generates global schedules offline from lightweight aggregated metrics (e.g., sampled bandwidth, queue status, and reception history). Thanks to its sub-second synthesis time, the schedule can be recomputed whenever needed, either before each training round or when significant WAN changes are observed, without incurring noticeable overhead. Compared to the state-of-the-art ILP-based solution TE-CCL and baselines including a shortest-path heuristic (SPH) and NCCL, SCALE-CCL reduces scheduling time by up to four orders of magnitude while keeping the AllGather completion time within a 10% optimality gap in more than 90% of the evaluated scenarios (never exceeding 15%), and it consistently outperforms SPH and NCCL in WAN settings.

**Speaker bio:** Jiaheng Xiong is a first-year Ph.D. student at Politecnico di Milano, supervised by Prof. Francesco Musumeci and Prof. Massimo Tornatore. His research focuses on wide-area distributed training, collective communication optimization, and cross-datacenter network architectures. He works closely with industrial partners on accelerating large-scale AI workloads across geographically distributed datacenters. His broader interests include traffic engineering, networked systems, and the interplay between AI and networking.
**14:45-15:15 | Coffee break**
**15:15-16:00 | Net4AI paper**

*You've got a few GPUs, now what?! --- Experimenting with a Nano-Cluster for Distributed Training of AI Models* (Link to article)
Presented by Giuseppe Aceto (University of Napoli Federico II)

**Abstract:** The rapid growth of deep learning models, particularly large language models (LLMs), has led to a widespread reliance on high-performance datacenters equipped with thousands of GPUs and fast interconnects. However, such infrastructures are often out of reach for academic researchers and institutions with limited computational budgets, and cloud GPU offerings may be unadvisable for privacy or security reasons, or because the total cost is difficult to predict. While the literature focuses almost exclusively on middle-to-hyper-scale clusters, in this work we investigate the feasibility and limitations of distributed training of AI models in constrained environments using a minimal datacenter setup composed of only two servers, each with 128 CPU cores, 512 GB of RAM, and 2 consumer-grade GPUs, interconnected via standard 1 Gigabit Ethernet. We focus on the scalability and the traffic generated during the training of AI architectures (ResNet-18 and GPT-2) of different complexity in terms of the number of parameters and layers, under various parallelization strategies and batch-size configurations. Our experimental results highlight critical bottlenecks in network communication and model synchronization, but also identify viable configurations that offer acceptable training throughput despite the limited hardware. This study aims to provide practical insights into low-cost, multi-GPU, multi-server training setups and inform future efforts to democratize deep learning research.

**Speaker bio:** Giuseppe Aceto is an Associate Professor at the University of Napoli Federico II, where he also obtained his PhD in telecommunication engineering. His teaching includes computer programming and blockchains. His research focuses on monitoring of network performance and security, particularly advanced deep learning architectures for traffic classification and prediction, on which he has co-authored highly cited papers in leading international journals. He is the recipient of several awards, including the 2018 Best Journal Paper Award from IEEE CSIM and the Computer Networks 2020 Best Paper Award.
**16:00-16:40 | Invited talk**

*Debriefing the Open Innovation Platform for UnifiedBus* (Presentation slides)
Presented by Wenjia Wei (Huawei, Nanjing)

**Abstract:** UnifiedBus (UB) is a high-performance interconnect protocol designed specifically for SuperPod networks, unifying IO, memory access, and communication. To empower the academic community to explore innovations in architecture, protocols, and algorithms within this framework, we have introduced five open innovation platforms. In this talk, we present these platforms in detail: the UB Simulation Platform, UB Protocol Verification Platform, Scale-Up Prototype Platform, SuperPod Networking Innovation Platform, and Scale-Out Prototype Platform. By making these tools accessible for rapid verification and experimentation, our goal is to foster an open ecosystem for collaborative research and mutual success in next-generation networking.

**Speaker bio:** Wenjia Wei received his Ph.D. in information and communication engineering from the University of Science and Technology of China (USTC) in 2020. He is currently a Technical Expert at Huawei Technologies. His main interests include network modeling and simulation.
**16:40-17:00 | Wrap-up discussion**
If you have any problems or questions, please contact us via e-mail at: chairs@inet4ai.org