Publications

Publications grouped by year, in reverse chronological order. A superscript ¹ after a name marks a co-first author.

2025

  1. ASPLOS
    Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs
    Yixuan Mei, Yonghao Zhuang,  Xupeng Miao,  Juncheng Yang and 2 more authors
    Proceedings of ASPLOS Conference 2025
  2. ASPLOS
    GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
    Byungsoo Jeon, Mengdi Wu, Shiyi Cao,  Sunghyun Kim and 10 more authors
    Proceedings of ASPLOS Conference 2025

2024

  1. NeurIPS
    LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
    Xiaonan Nie, Qibin Liu, Fangcheng Fu,  Shenhan Zhu and 5 more authors
    Proceedings of NeurIPS Conference 2024
  2. SOSP
    Enabling Parallelism Hot Switching for Efficient Training of Large Language Models
    Hao Ge, Fangcheng Fu, Haoyang Li,  Xuanyu Wang and 6 more authors
    Proceedings of SOSP Conference 2024
  3. SC
    Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs
    Mingkuan Xu, Shiyi Cao,  Xupeng Miao,  Umut Acar and 1 more author
    Proceedings of SC Conference 2024
  4. SIGMOD
    Demystifying Data Management for Large Language Models (Tutorial)
    Xupeng Miao, Zhihao Jia, and Bin Cui
    Proceedings of SIGMOD Conference 2024
  5. ASPLOS
    SpotServe: Serving Generative Large Language Models on Preemptible Instances (Distinguished Artifact Award)
    Xupeng Miao, Chunan Shi, Jiangfei Duan,  Xiaoli Xi and 3 more authors
    Proceedings of ASPLOS Conference 2024
  6. ASPLOS
    SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification
    Xupeng Miao, Gabriele Oliaro, Zhihao Zhang,  Xinhao Cheng and 10 more authors
    Proceedings of ASPLOS Conference 2024
  7. ASPLOS
    Optimal Kernel Orchestration for Tensor Programs with Korch
    Muyan Hu, Ashwin Venkatram, Shreyashri Biswas,  Balamurugan Marimuthu and 7 more authors
    Proceedings of ASPLOS Conference 2024
  8. NSDI
    Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances
    Jiangfei Duan¹, Ziang Song¹, Xupeng Miao¹, Xiaoli Xi and 4 more authors
    Proceedings of NSDI Conference 2024
  9. ACL
    Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models (Outstanding Paper Award)
    Zhengxin Zhang, Dan Zhao,  Xupeng Miao,  Gabriele Oliaro and 3 more authors
    Proceedings of ACL Conference 2024
  10. IJCAI
    X-former Elucidator: Reviving Efficient Attention for Long Context Language Modeling
    Xupeng Miao, Shenhan Zhu, Fangcheng Fu,  Ziyu Guo and 4 more authors
    Proceedings of IJCAI Conference 2024
  11. VLDB
    Experimental Analysis of Large-scale Learnable Vector Storage Compression
    Hailin Zhang, Penghao Zhao,  Xupeng Miao,  Yingxia Shao and 3 more authors
    Proc. VLDB Endow. 2024
  12. ICDE
    MFIX: An Efficient and Reliable Index Advisor via Multi-Fidelity Bayesian Optimization
    Zhuo Chang, Xinyi Zhang, Yang Li,  Xupeng Miao and 2 more authors
    Proceedings of ICDE Conference 2024
  13. TKDE
    Improving Automatic Parallel Training via Balanced Memory Workload Optimization
    Yujie Wang, Youhe Jiang,  Xupeng Miao,  Fangcheng Fu and 4 more authors
    IEEE Transactions on Knowledge and Data Engineering 2024
  14. AAAI
    Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference
    Zihao Yu, Haoyang Li, Fangcheng Fu,  Xupeng Miao and 1 more author
    In Proceedings of AAAI Conference 2024
  15. EACL
    Generative Dense Retrieval: Memory Can Be a Burden
    Peiwen Yuan, Xinglin Wang, Shaoxiong Feng,  Boyuan Pan and 4 more authors
    Proceedings of EACL Conference 2024
  16. CSUR
    Distributed Graph Neural Network Training: A Survey
    Yingxia Shao, Hongzheng Li, Xizhi Gu,  Hongbo Yin and 5 more authors
    ACM Computing Surveys 2024

2023

  1. arXiv
    Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
    Xupeng Miao, Gabriele Oliaro, Zhihao Zhang,  Xinhao Cheng and 3 more authors
    arXiv preprint arXiv:2312.15234 2023
  2. OSDI
    EinNet: Optimizing Tensor Programs with Derivation-Based Transformations
    Liyan Zheng, Haojie Wang, Jidong Zhai,  Muyan Hu and 7 more authors
    Proceedings of OSDI Conference 2023
  3. VLDB
    SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training
    Xupeng Miao, Yining Shi, Zhi Yang,  Bin Cui and 1 more author
    Proc. VLDB Endow. 2023
  4. VLDB
    Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
    Xupeng Miao, Yujie Wang, Youhe Jiang,  Chunan Shi and 3 more authors
    Proc. VLDB Endow. 2023
  5. VLDB
    Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent
    Xiaonan Nie, Yi Liu, Fangcheng Fu,  Jinbao Xue and 4 more authors
    Proc. VLDB Endow. (Industry) 2023
  6. SIGMOD
    FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
    Xiaonan Nie,  Xupeng Miao, Zilong Wang,  Jilong Xue and 4 more authors
    Proceedings of SIGMOD Conference 2023
  7. IJCAI
    OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
    Youhe Jiang, Fangcheng Fu,  Xupeng Miao,  Xiaonan Nie and 1 more author
    Proceedings of IJCAI Conference 2023
  8. NeurIPS
    Model-enhanced Vector Index
    Hailin Zhang, Yujing Wang, Qi Chen,  Ruiheng Chang and 15 more authors
    Proceedings of NeurIPS Conference 2023
  9. NeurIPS
    Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference
    Zihao Yu, Haoyang Li, Fangcheng Fu,  Xupeng Miao and 1 more author
    In Proceedings of NeurIPS ML for Systems (MLSys) Workshop 2023
  10. AAAI
    CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
    Ziyu Guo, Renrui Zhang, Longtian Qiu,  Xianzheng Ma and 3 more authors
    Proceedings of AAAI Conference 2023

2022

  1. SCIS
    Hetu: A highly efficient automatic parallel distributed deep learning system
    Xupeng Miao, Xiaonan Nie, Hailin Zhang,  Tong Zhao and 1 more author
    Sci. China Inf. Sci. 2022
  2. VLDB
    HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework (Best Scalable Data Science Paper Award)
    Xupeng Miao, Hailin Zhang, Yining Shi,  Xiaonan Nie and 3 more authors
    Proc. VLDB Endow. 2022
  3. VLDB
    Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Updates
    Fangcheng Fu,  Xupeng Miao, Jiawei Jiang,  Huanran Xue and 1 more author
    Proc. VLDB Endow. 2022
  4. SIGMOD
    HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training
    Xupeng Miao, Yining Shi, Hailin Zhang,  Xin Zhang and 3 more authors
    In Proceedings of SIGMOD Conference 2022
  5. VLDBJ
    P2CG: A Privacy Preserving Collaborative Graph Neural Network Training Framework
    Xupeng Miao, Wentao Zhang, Yuezihan Jiang,  Fangcheng Fu and 5 more authors
    The VLDB Journal 2022
  6. ICDE
    TSPLIT: Fine-grained GPU Memory Management for Efficient DNN Training via Tensor Splitting
    Xiaonan Nie, Xupeng Miao, Zhi Yang, and Bin Cui
    In Proceedings of ICDE Conference 2022
  7. ICDE
    HET-KG: Communication-Efficient Knowledge Graph Embedding Training via Hotness-Aware Cache
    Sicong Dong¹, Xupeng Miao¹, Pengkai Liu, Xin Wang and 2 more authors
    In Proceedings of ICDE Conference 2022
  8. ICDE
    Zoomer: Boosting Retrieval on Web-scale Graphs by Regions of Interest
    Yuezihan Jiang, Yu Cheng, Hanyu Zhao,  Wentao Zhang and 5 more authors
    In Proceedings of ICDE Conference 2022
  9. CIKM
    Scalable Graph Sampling on GPUs with Compressed Graph
    Hongbo Yin, Yingxia Shao,  Xupeng Miao,  Yawen Li and 1 more author
    In Proceedings of CIKM Conference 2022
  10. ICDE Poster
    Lasagne: A Multi-Layer Graph Convolutional Network Framework via Node-aware Deep Architecture (Extended Abstract)
    Xupeng Miao, Wentao Zhang, Yingxia Shao,  Bin Cui and 3 more authors
    In Proceedings of ICDE Conference 2022
  11. Journal of Software (软件学报)
    Graph Neural Network Training Acceleration over Multi-GPUs
    Xupeng Miao, Yujie Wang, Jia Shen,  Yingxia Shao and 1 more author
    In Journal of Software (Chinese) 2022
  12. arXiv
    HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System
    Xiaonan Nie, Pinxue Zhao,  Xupeng Miao,  Tong Zhao and 1 more author
    arXiv preprint arXiv:2203.14685 2022
  13. CVPR
    PointCLIP: Point Cloud Understanding by CLIP
    Renrui Zhang, Ziyu Guo, Wei Zhang,  Kunchang Li and 5 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022
  14. ICML
    OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
    Youhe Jiang, Xupeng Miao, Xiaonan Nie, and Bin Cui
    In Proceedings of ICML Hardware Aware Efficient Training (HAET) Workshop 2022

2021

  1. SIGMOD
    Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce
    Xupeng Miao, Xiaonan Nie, Yingxia Shao,  Zhi Yang and 3 more authors
    In Proceedings of SIGMOD Conference 2021
  2. ICDE Poster
    CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs (Extended Abstract)
    Xupeng Miao, Lingxiao Ma, Zhi Yang,  Yingxia Shao and 3 more authors
    In Proceedings of ICDE Conference 2021
  3. TKDE
    Lasagne: A multi-layer graph convolutional network framework via node-aware deep architecture
    Xupeng Miao, Wentao Zhang, Yingxia Shao,  Bin Cui and 3 more authors
    IEEE Transactions on Knowledge and Data Engineering 2021
  4. SIGKDD
    DeGNN: Improving Graph Neural Networks with Graph Decomposition
    Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang,  Zhichao Han and 16 more authors
    In Proceedings of SIGKDD Conference 2021
  5. SIGKDD
    ROD: Reception-aware Online Distillation for Sparse Graphs
    Wentao Zhang, Yuezihan Jiang, Yang Li,  Zeang Sheng and 5 more authors
    In Proceedings of SIGKDD Conference 2021
  6. VLDBJ
    Memory-aware framework for fast and scalable second-order random walk over billion-edge natural graphs
    Yingxia Shao, Shiyue Huang, Yawen Li,  Xupeng Miao and 2 more authors
    The VLDB Journal 2021
  7. arXiv
    EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
    Xiaonan Nie,  Xupeng Miao, Shijie Cao,  Lingxiao Ma and 6 more authors
    arXiv preprint arXiv:2112.14397 2021

2020

  1. TKDE
    CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs
    Xupeng Miao, Lingxiao Ma, Zhi Yang,  Yingxia Shao and 3 more authors
    IEEE Transactions on Knowledge and Data Engineering 2020
  2. SIGMOD
    Reliable Data Distillation on Graph Convolutional Network
    Wentao Zhang¹, Xupeng Miao¹, Yingxia Shao, Jiawei Jiang and 3 more authors
    In Proceedings of SIGMOD 2020
  3. SIGMOD
    Memory-Aware Framework for Efficient Second-Order Random Walk on Large Graphs
    Yingxia Shao, Shiyue Huang,  Xupeng Miao,  Bin Cui and 1 more author
    In Proceedings of SIGMOD 2020
  4. ICDE
    PSGraph: How Tencent trains extremely large-scale graphs with Spark?
    Jiawei Jiang, Pin Xiao, Lele Yu,  Xiaosen Li and 4 more authors
    In Proceedings of ICDE Conference 2020

2019

  1. SIGMOD
    PS2: Parameter Server on Spark
    Zhipeng Zhang, Bin Cui, Yingxia Shao,  Lele Yu and 2 more authors
    In Proceedings of SIGMOD 2019