publications

2026

Preprint

FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving

Wenyan Chen, Chengzhi Lu, Yanying Lin, and Dmitrii Ustiugov

Preprint, 2026

DOI PDF

Preprint

CodecSight: Leveraging Video Codec Signals for Efficient Streaming VLM Inference

Yulin Zou, Yan Chen, Wenyan Chen, JooYoung Park, Shivaraman Nitin, Tao Luo, and 2 more authors

Preprint, 2026

DOI PDF

EuroSys

High Throughput and Low Latency LLM Serving via Adaptive KV Caching

Wenyan Chen, Chengzhi Lu, Huanle Xu, Kejiang Ye, and Chengzhong Xu

In proceedings of European Conference on Computer Systems, 2026

DOI PDF Slides

INFOCOM

FedSUV: Validity and Utility-guided Client Selection for Federated Learning

Xiaosong Chen, Wenyan Chen, Yuanhang Chen, and Huanle Xu

In proceedings of IEEE International Conference on Computer Communications, 2026

2025

SoCC

FedDance: Efficient Participant Selection for Federated Learning in Highly Dynamic Environments

Yuanhang Chen, Xiaosong Chen, Wenyan Chen, and Huanle Xu

In proceedings of the annual ACM Symposium on Cloud Computing, 2025

DOI PDF

EuroSys

Multiplexing Dynamic Deep Learning Workloads with SLO-awareness in GPU Clusters

Wenyan Chen, Chengzhi Lu, Huanle Xu, Kejiang Ye, and Chengzhong Xu

In proceedings of European Conference on Computer Systems, 2025

DOI PDF Slides

2024

SC

SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing

Chengzhi Lu, Huanle Xu, Yudan Li, Wenyan Chen, Kejiang Ye, and Chengzhong Xu

In proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2024

DOI PDF

CSCWD

EINS: Edge-Cloud Deep Model Inference with Network-Efficiency Schedule in Serverless

Shijie Peng, Yanying Lin, Wenyan Chen, Yingfei Tang, Xu Duan, and Kejiang Ye

In proceedings of the International Conference on Computer Supported Cooperative Work in Design, 2024

DOI PDF

2023

SC

Interference-aware multiplexing for deep learning in GPU clusters: A middleware approach

Wenyan Chen, Zizhao Mo, Huanle Xu, Kejiang Ye, and Chengzhong Xu

In proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2023

DOI PDF Code Slides

2021

CLUSTER

RPTCN: Resource prediction for high-dynamic workloads in clouds based on deep learning

Wenyan Chen, Chengzhi Lu, Kejiang Ye, Yang Wang, and Chengzhong Xu

In proceedings of IEEE International Conference on Cluster Computing, 2021

DOI PDF

2020

HPCC

HySync: Hybrid federated learning with effective synchronization

Guomei Shi, Li Li, Jun Wang, Wenyan Chen, Kejiang Ye, and Chengzhong Xu

In proceedings of IEEE International Conference on High Performance Computing & Communications, 2020

DOI PDF

HDIS

Understanding the workload characteristics in Alibaba: A view from directed acyclic graph analysis

Chengzhi Lu, Wenyan Chen, Kejiang Ye, and Chengzhong Xu

In proceedings of International Conference on High Performance Big Data and Intelligent Systems, 2020

DOI PDF

JCST

Interference analysis of co-located container workloads: A perspective from hardware performance counters

Wenyan Chen, Kejiang Ye, Chengzhi Lu, Dongdai Zhou, and Chengzhong Xu

Journal of Computer Science and Technology, 2020

DOI PDF

2019

HPCC

Co-locating online workload and offline workload in the cloud: An interference analysis

Wenyan Chen, Kejiang Ye, and Chengzhong Xu

In proceedings of IEEE International Conference on High Performance Computing & Communications, 2019

DOI PDF

ICPADS

ADGS: Anomaly detection and localization based on graph similarity in container-based clouds

Chengzhi Lu, Kejiang Ye, Wenyan Chen, and Chengzhong Xu

In proceedings of the IEEE International Conference on Parallel and Distributed Systems, 2019

DOI PDF

2018

ICPADS

How does the workload look like in production cloud? Analysis and clustering of workloads on Alibaba cluster trace

Wenyan Chen, Kejiang Ye, Yang Wang, Guoyao Xu, and Chengzhong Xu

In proceedings of the IEEE International Conference on Parallel and Distributed Systems, 2018

DOI PDF