About

I am a third-year Ph.D. student at MMLab, CUHK, advised by Prof. Dahua Lin. My research interests lie in the broad area of MLSys, especially efficient large-scale DNN training and inference. Before joining CUHK, I received my Bachelor's degree in Computer Science from the University of Chinese Academy of Sciences, advised by Prof. Shiguang Shan.

I will be on the job market in 2025. Please feel free to reach out if you have openings in industry or academia.

My detailed CV can be found here.

News


  • [Aug. 2024] Proteus has finally been accepted by TPDS!
  • [Aug. 2024] Started my internship at AWS, Santa Clara!
  • [July 2024] We released a survey on LLM training systems and infrastructure; check it out on arXiv!
  • [July 2024] SKVQ is accepted by COLM 2024. Congratulations to Duanmu!
  • [May 2024] MuxServe is accepted by ICML 2024!
  • [Apr. 2024] I will attend NSDI ‘24 in person at Santa Clara, CA. See you there!

Education


The Chinese University of Hong Kong
Aug. 2021 - July 2025 (Expected)
Ph.D. Candidate in Department of Information Engineering

University of Chinese Academy of Sciences
Sep. 2016 - July 2020
B.E. in Computer Science and Technology

Experience


Catalyst, CMU
Research Intern, Apr. 2022 - May 2023
Advisors: Zhihao Jia, Minjia Zhang, Xupeng Miao
Cost-efficient DNN training and inference.

MMLab, CUHK
Research Assistant, Aug. 2020 - Apr. 2022
Advisors: Dahua Lin, Shengen Yan, Xiuhong Li
Automatic parallelization for DNN training.

MMLab, CUHK
Research Assistant, July 2019 - July 2020
Mentors: Dahua Lin, Xingcheng Zhang
Optimized large-scale data-parallel training performance. With sparse communication and system optimizations, we trained AlexNet in 1 minute on a cluster of 1,000 V100 GPUs using Parrots (a DL framework similar to PyTorch).

Publications


  • Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
    Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Chuanfu Xiao, Xingcheng Zhang, Dahua Lin, and Chao Yang
    arXiv Preprint, 2024
    [Paper]

  • SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
    Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, and Dahua Lin
    In Proceedings of the Conference on Language Modeling (COLM), October 2024.
    [Paper]

  • MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
    Jiangfei Duan, Runyu Lu, Haojie Duanmu, Xiuhong Li, Xingcheng Zhang, Dahua Lin, Ion Stoica, and Hao Zhang
    In Proceedings of the International Conference on Machine Learning (ICML), July 2024.
    [Paper], [Code], [Blog], [Video (Chinese)]

  • Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning
    Chang Chen, Xiuhong Li, Qianchao Zhu, Jiangfei Duan, Peng Sun, Xingcheng Zhang, and Chao Yang
    In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2024.
    Best Paper Award
    [Paper], [Video (Chinese)]

  • SpotServe: Serving Generative Large Language Models on Preemptible Instances
    Xupeng Miao$^{*}$, Chunan Shi$^{*}$, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, and Zhihao Jia
    In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2024.
    Distinguished Artifact Award
    [Paper], [Code]

  • Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances
    Jiangfei Duan$^{*}$, Ziang Song$^{*}$, Xupeng Miao$^{*}$, Xiaoli Xi, Dahua Lin, Harry Xu, Minjia Zhang, and Zhihao Jia
    In Proceedings of the Symposium on Networked Systems Design and Implementation (NSDI), April 2024.
    [Paper], [Code]

  • Proteus: Simulating the Performance of Distributed DNN Training
    Jiangfei Duan, Xiuhong Li, Ping Xu, Xingcheng Zhang, Shengen Yan, Yun Liang, and Dahua Lin
    IEEE Transactions on Parallel and Distributed Systems (TPDS), August 2024.
    [Paper], [Code]

Survey

  • Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
    Jiangfei Duan$^{*}$, Shuo Zhang$^{*}$, Zerui Wang$^{*}$, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, and Peng Sun
    arXiv Preprint, 2024
    [Paper]

Teaching


TA, IERG3050: Simulation and Statistical Analysis, Fall 2021, CUHK
TA, CSCI2100: Data Structures, Spring 2022, CUHK

Services


AEC Member: MLSys 2023, OSDI 2024, ATC 2024

Awards


Best Paper Award, ASPLOS 2024
Distinguished Artifact Award, ASPLOS 2024
Outstanding Graduate of Beijing, 2020
Outstanding Graduate of University of Chinese Academy of Sciences, 2020
Tang Lixin Scholarship, 2019
First-class Academic Scholarship, UCAS (top 5%), 2017, 2018