About

I am a third-year Ph.D. student at MMLab, CUHK, advised by Prof. Dahua Lin. My research interests lie in the broad area of MLSys, especially efficient large-scale DNN training and inference. Before joining CUHK, I received my Bachelor's degree in Computer Science from the University of Chinese Academy of Sciences, where I was advised by Prof. Shiguang Shan.

I will be on the job market for 2025. Please feel free to reach out if you have openings in industry or academia.

My detailed CV can be found here.

News


  • [May 2024] MuxServe is accepted by ICML 2024!
  • [Apr. 2024] I will attend NSDI '24 in person in Santa Clara, CA. See you there!

Education




The Chinese University of Hong Kong
Aug. 2021 - July 2025 (Expected)
Ph.D. Candidate in the Department of Information Engineering



University of Chinese Academy of Sciences
Sep. 2016 - July 2020
B.E. in Computer Science and Technology

Experience


Catalyst, CMU
Research Intern, Apr. 2022 - May 2023
Advisors: Prof. Zhihao Jia, Dr. Minjia Zhang, Dr. Xupeng Miao
Cost-efficient DNN training and inference.

MMLab, CUHK
Research Assistant, Aug. 2020 - Apr. 2022
Advisors: Prof. Dahua Lin, Prof. Shengen Yan, Prof. Xiuhong Li
Automatic parallelization for DNN training.

MMLab, CUHK
Research Assistant, July 2019 - July 2020
Mentors: Prof. Dahua Lin, Xingcheng Zhang
Optimized large-scale data-parallel training performance. With sparse communication and system optimizations, we trained AlexNet in one minute on a cluster of 1,000 V100 GPUs using Parrots (a deep learning framework similar to PyTorch).

Publications


  • SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
    Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, and Dahua Lin
    arXiv Preprint, 2024
    [Paper]

  • MuxServe: Flexible Multiplexing for Efficient Multiple LLM Serving
    Jiangfei Duan, Runyu Lu, Haojie Duanmu, Xiuhong Li, Xingcheng Zhang, Dahua Lin, Ion Stoica, and Hao Zhang
    In Proceedings of the International Conference on Machine Learning (ICML), July 2024.
    [Paper]

  • Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning
    Chang Chen, Xiuhong Li, Qianchao Zhu, Jiangfei Duan, Peng Sun, Xingcheng Zhang, and Chao Yang
    In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2024.
    Best Paper Award
    [Paper]

  • SpotServe: Serving Generative Large Language Models on Preemptible Instances
    Xupeng Miao*, Chunan Shi*, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, and Zhihao Jia
    In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2024.
    Distinguished Artifact Award
    [Paper], [Code]

  • Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances
    Jiangfei Duan*, Ziang Song*, Xupeng Miao*, Xiaoli Xi, Dahua Lin, Harry Xu, Minjia Zhang, and Zhihao Jia
    In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI), April 2024.
    [Paper], [Code]

  • Proteus: Simulating the Performance of Distributed DNN Training
    Jiangfei Duan, Xiuhong Li, Ping Xu, Xingcheng Zhang, Shengen Yan, Yun Liang, and Dahua Lin
    arXiv Preprint, 2023
    [Paper], [Code]

Teaching


TA, IERG3050: Simulation and Statistical Analysis, Fall 2021, CUHK
TA, CSCI2100: Data Structures, Spring 2022, CUHK

Services


Artifact Evaluation Committee (AEC) Member: MLSys 2023, OSDI 2024, ATC 2024

Awards


Best Paper Award, ASPLOS 2024
Distinguished Artifact Award, ASPLOS 2024
Outstanding Graduate of Beijing, 2020
Outstanding Graduate of University of Chinese Academy of Sciences, 2020
Tang Lixin Scholarship, 2019
First-class Academic Scholarship, UCAS (top 5%), 2017, 2018