About

I am a third-year Ph.D. student at MMLab, CUHK, advised by Prof. Dahua Lin. My research interests lie in the broad area of MLSys, especially efficient large-scale DNN training and inference. Before joining CUHK, I received my Bachelor's degree in Computer Science from the University of Chinese Academy of Sciences, advised by Prof. Shiguang Shan.

I will be on the job market in 2025. Please feel free to reach out if you have openings in industry or academia.

My detailed CV can be found here.

News


  • [Aug. 2024] Proteus has finally been accepted by TPDS!
  • [Aug. 2024] Started my internship at AWS, Santa Clara!
  • [July 2024] We released a survey on LLM training systems and infrastructure; check it out on arXiv!
  • [July 2024] SKVQ is accepted by COLM 2024. Congratulations to Duanmu!
  • [May 2024] MuxServe is accepted by ICML 2024!
  • [Apr. 2024] I will attend NSDI ‘24 in person at Santa Clara, CA. See you there!

Education


The Chinese University of Hong Kong
Aug. 2021 - July 2025 (Expected)
Ph.D. Candidate in Department of Information Engineering

University of Chinese Academy of Sciences
Sep. 2016 - July 2020
B.E. in Computer Science and Technology

Experience


Catalyst, CMU
Research Intern, Apr. 2022 - May 2023
Advisors: Zhihao Jia, Minjia Zhang, Xupeng Miao
Cost-efficient DNN training and inference.

MMLab, CUHK
Research Assistant, Aug. 2020 - Apr. 2022
Advisors: Dahua Lin, Shengen Yan, Xiuhong Li
Automatic parallelization for DNN training.

MMLab, CUHK
Research Assistant, July 2019 - July 2020
Mentors: Dahua Lin, Xingcheng Zhang
Optimized large-scale data-parallel training performance. With sparse communication and system optimizations, we trained AlexNet in 1 minute on a cluster of 1,000 V100 GPUs using Parrots (a DL framework similar to PyTorch).

Publications


  • Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
    Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Chuanfu Xiao, Xingcheng Zhang, Dahua Lin, and Chao Yang
    arXiv Preprint, 2024
    [Paper]

  • SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
    Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, and Dahua Lin
    In Proceedings of the Conference on Language Modeling (COLM), October 2024.
    [Paper]

  • MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
    Jiangfei Duan, Runyu Lu, Haojie Duanmu, Xiuhong Li, Xingcheng Zhang, Dahua Lin, Ion Stoica, and Hao Zhang
    In Proceedings of the International Conference on Machine Learning (ICML), July 2024.
    [Paper], [Code], [Blog], [Video (Chinese)]

  • Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning
    Chang Chen, Xiuhong Li, Qianchao Zhu, Jiangfei Duan, Peng Sun, Xingcheng Zhang, and Chao Yang
    In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2024.
    Best Paper Award
    [Paper], [Video (Chinese)]

  • SpotServe: Serving Generative Large Language Models on Preemptible Instances
    Xupeng Miao$^{*}$, Chunan Shi$^{*}$, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, and Zhihao Jia
    In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2024.
    Distinguished Artifact Award
    [Paper], [Code]

  • Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances
    Jiangfei Duan$^{*}$, Ziang Song$^{*}$, Xupeng Miao$^{*}$, Xiaoli Xi, Dahua Lin, Harry Xu, Minjia Zhang, and Zhihao Jia
    In Proceedings of the Symposium on Networked Systems Design and Implementation (NSDI), April 2024.
    [Paper], [Code]

  • Proteus: Simulating the Performance of Distributed DNN Training
    Jiangfei Duan, Xiuhong Li, Ping Xu, Xingcheng Zhang, Shengen Yan, Yun Liang, and Dahua Lin
    IEEE Transactions on Parallel and Distributed Systems (TPDS), August 2024.
    [Paper], [Code]

Survey

  • Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
    Jiangfei Duan$^{*}$, Shuo Zhang$^{*}$, Zerui Wang$^{*}$, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, and Peng Sun
    arXiv Preprint, 2024
    [Paper]

Teaching


TA, IERG3050: Simulation and Statistical Analysis, Fall 2021, CUHK
TA, CSCI2100: Data Structures, Spring 2022, CUHK

Services


AEC Member: MLSys 2023, OSDI 2024, ATC 2024

Awards


Best Paper Award, ASPLOS 2024
Distinguished Artifact Award, ASPLOS 2024
Outstanding Graduate of Beijing, 2020
Outstanding Graduate of University of Chinese Academy of Sciences, 2020
Tang Lixin Scholarship, 2019
First-class Academic Scholarship, UCAS (top 5%), 2017, 2018