| 2025 | A Fast Sparse Triangular Solve for Structured-grid Problems on Heterogeneous Processors. Zhengding Hu, Yi Zong, Jingwei Sun, Wei Xue, Guangzhong Sun |
| 2025 | A High-Accuracy Sketch for Measuring Low-Entropy Flows in Distributed AI Training. Jin Wang, Chenye Zhu, Jinbin Hu |
| 2025 | ADAPT: Dynamic Grouping and Cross-Group Aggregation for GC-Efficient Log-Structured Storage in SSD Arrays. Ruisong Zhou, Peng Wang, Chunhua Li, Ke Zhou, Hui Li |
| 2025 | AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs. Sasindu Wijeratne, Rajgopal Kannan, Viktor K. Prasanna |
| 2025 | Accelerating Erasure Coding on Persistent Memory via Adaptive Prefetcher Scheduling. Guanglei Xu, Hai Zhou, Yuchong Hu, Dan Feng, Renzhi Xiao |
| 2025 | Accelerating Multi-Output GBDTs with GPUs. Hanfeng Liu, Xuemei Peng, Zeyi Wen |
| 2025 | Accelerating an Electromagnetic Simulation via Memory-Constrained Task-Based Load Balancing. Jonathan Lifflander, Nicole Slattengren, Philippe P. Pébay, Pierre L. Pebay, Caleb Schilly, Robert A. Pfeiffer, Joseph D. Kotulski |
| 2025 | Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning. Waylon Luo, Jiapeng Zhao, Tong Zhan, Qiang Guan |
| 2025 | Amber: Towards Fast and Space-Efficient Incremental Checkpointing in Large Language Model Training. Zhiqiang Wang, Wenzhe Zhu, Zaigui Zhang, Chaomei Yan, Fan Guo, Yongkun Li, Yinlong Xu |
| 2025 | Architecture-Aware Models of AI Engines for High-Performance Matrix Matrix Multiplication. Elliott D. Binder, Jeffrey Low, Tze Meng Low |
| 2025 | Auto-Stencil: Performance-Driven Stencil Optimization with Hardware Feedback for LLMs. Quan Deng, Lin Gan, Hongkun Yu, Wenlai Zhao, Guangwen Yang |
| 2025 | Automated FPGA Accelerator Generation Framework for Transformers with Dataflow Optimization. Wenqi Lou, Yunji Qin, Zihao Wang, Chao Wang, Lei Gong, Xuehai Zhou |
| 2025 | BMapper: A Scalable and Efficient Framework for Brain Simulations Acceleration on Supercomputers. Yubing Bao, Zhihui Lu, Qiang Duan, Xin Du, Zhongyu Chen, Yicong Zhao, Xiaoyi Li, Yandan Tan, Shuhan Yang, Ziyi Wang, Yang Chen, Yang Xu |
| 2025 | Bridging Cache-Friendliness and Concurrency: A Locality-Optimized In-Memory B-Skiplist. Yicong Luo, Senhe Hao, Brian Wheatman, Prashant Pandey, Helen Xu |
| 2025 | COF: Cycle and transmission co-mapping framework for CNN mapping in PIM architecture. Xianfa Zhou, Tun Li, Yuhuan Xia, Ruiyu Zhang |
| 2025 | Carbon-Aware Workflow Scheduling with Fixed Mapping and Deadline Constraint. Dominik Schweisgut, Anne Benoit, Yves Robert, Henning Meyerhenke |
| 2025 | CompreGel: Efficient Distributed Graph Propagation via Error-Bounded Lossy Message Compression. Tianhao Wu, Da Yan, Qihao Cheng, Lyuheng Yuan, Sheng Di, Jiao Han, Zhongyi Huang, Ji Cheng |
| 2025 | CoreTuner: Predicting and Scheduling Framework for Optimizing the Joint Allocation of CPU and GPU in Training Cluster. Hao Dong, Yuehao Xu, Xiaohui Wang, Xinhua Ji, Zhijun Ding |
| 2025 | Cross-Architecture Performance Analysis Using the RAJA Performance Suite. Dewi Yokelson, Stephanie Brink, Jason Burmark, Michael McKinsey, Befikir Bogale, Ian Lumsden, Michela Taufer, Tom Scogland, Olga Pearce |
| 2025 | Cycle-Aware Parallel Optimization for Mitigating ZZ Crosstalk on Quantum Hardware. Jiayi Zhong, Yuxin Deng |
| 2025 | Deadline-Aware Scheduling of Mixed-Criticality Tasks. Maxime Gonthier, Kyle Chard, Ian T. Foster, Loris Marchal, Frédéric Vivien |
| 2025 | Decision Shuffle: Efficient Pre-scheduling System for Push-based Shuffle in DAG Computing Frameworks. Shihao Zhang, Chi Zhang, Chentao Wu, Jie Li, Minyi Guo, Hui Li, Liqiang Zhang |
| 2025 | Design and Optimization of GPU-Aware MPI Allreduce Using Direct Sendrecv Communication. Chen-Chun Chen, Jinghan Yao, Hari Subramoni, Dhabaleswar K. Panda |
| 2025 | Design of Interposer Interconnection Network Based on High-Radix Interposer Routers. Xue Xiao, Yi Dai, Yanqiang Sun, Jianmin Zhang, Tiejun Li |
| 2025 | ESC: Effective Submanifold Convolution using Tensor Cores. Xuezhu Wang, Hailong Yang, Xin You, Yufan Xu, Xiaoyan Liu, Siqi Wang, Kaige Zhang, Mingzhen Li, Zhongzhi Luan, Yi Liu, Depei Qian |
| 2025 | Efficient Construction of Large Search Spaces for Auto-Tuning. Floris-Jan Willemsen, Rob V. van Nieuwpoort, Ben van Werkhoven |
| 2025 | Efficient Cross-Datacenter Congestion Control with Fast Control Loops. Baosen Zhao, Jianan Sun, Xu Zhou, Wanghong Yang, Wenji Du, Fukang Chen, Yongmao Ren, Stefan Schmid |
| 2025 | Efficient Parallel Algorithms for Dynamic Percolation Centrality. Prajjwal Nijhara, Lokesh Venkatachalam, Agam Harpreet Singh, Athreya Chandramouli, Sayantan Jana, Kishore Kothapalli, Dip Sankar Banerjee |
| 2025 | FLEX: Leveraging FPGA-CPU Synergy for Mixed-Cell-Height Legalization Acceleration. Xingyu Liu, Jiawei Liang, Linfeng Du, Yipu Zhang, Chaofang Ma, Hanwei Fan, Jiang Xu, Wei Zhang |
| 2025 | Fast Exact Diameter Computation of Sparse Graphs. Cameron Bradley, Anju Mongandampulath Akathoott, Martin Burtscher |
| 2025 | Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores. Brian Curless, Michael Gowanlock |
| 2025 | FedWCM: Unleashing the Potential of Momentum-based Federated Learning in Long-Tailed Scenarios. Tianle Li, Yongzhi Huang, Linshan Jiang, Qipeng Xie, Chang Liu, Wenfeng Du, Lu Wang, Kaishun Wu |
| 2025 | HHOTuner: Efficient Performance Tuning with Harris Hawks Optimization. Akash Dutta, Ali Jannesari |
| 2025 | HMGraph: Boosting GNN Training on Hierarchical Memory via Coordinated Cache. Lizhi Zhang, Menghan Jia, Zhiquan Lai, Qiao Li, Yiming Zhang, Dongsheng Li |
| 2025 | HeatList: The Case for Retrofitting In-memory Range Index with Hotspot Awareness. Junru Shen, Miao Cai, Kangyue Gao, Baoliu Ye, Guo Cheng |
| 2025 | Heterogeneity-aware Federated Edge Learning via UAV Sampling and D2D Communications. Yanfeng Lu, Tao Wu, Chao Chang, Hongjun Wang, Mingxing Ke, Jian Wang |
| 2025 | Heterogeneity-aware Task Scheduling based on Personalized Federated Reinforcement Learning. Xin Yong, Li Yan, Zhuozhao Li |
| 2025 | IRIS-MASH: Efficient Multi-device Asynchronous Multi-Stream Heterogeneous Computing. Narasinga Rao Miniskar, Aaron R. Young, Mohammad Alaul Haque Monil, Kazi Asifuzzaman, Beau Johnston, Keita Teranishi, Jeffrey S. Vetter |
| 2025 | It Takes Two: Accelerating Accurate Federated Learning through Pipelined Intra-Batch Data Sampling and Training. Chenghao Nu, Zhe Zhang, Ye Li, Yanchao Zhao |
| 2025 | Joint Prediction and Matching for Computing Resource Exchange Platforms. Da Huo, Zhenzhe Zheng, Xiaoyao Huang, Hao Chen, Jianfeng Hu, Zhiyong Yan, Fan Wu, Jie Wu |
| 2025 | Joint Task Scheduling and Resource Allocation in Cloud-Edge Collaborative Computing Systems. Boyu Du, Jingya Zhou, Jin Wang, Jiangwei Wang, Zhijun Li |
| 2025 | LLaMCAT: Optimizing Large Language Model Inference with Cache Arbitration and Throttling. Zhongchun Zhou, Chengtao Lai, Wei Zhang |
| 2025 | Leave No One Behind: Fair and Efficient Tiered Memory Management for Multi-Applications. Wenda Tang, Yiduo Wang, Yanwen Wang, Jie Wu |
| 2025 | Lias: Leveraging Performance Counters for Interference Quantification and Mitigation in Multi-processor Systems. Yangfan Qiao, Zhuozhao Li |
| 2025 | MixLoRA: An Efficient Multi-Tenant Framework for Concurrently Serving Diverse LoRA Models in Large Language Models. Ronghuai Chen, Ce Yu, Hao Fu, Xiaoteng Hu, Bin Yang |
| 2025 | Multiprocessor Scheduling with Memory Constraints: Fundamental Properties and Finding Optimal Solutions. Pál András Papp, Toni Böhnlein, Albert-Jan Nicholas Yzelman |
| 2025 | OVERT: Orchestrating Vector-Scalar Execution for Efficient SpMV on Modern CPUs. Kelun Lei, Hailong Yang, Kaige Zhang, Shaokang Du, Marc Casas, Yufan Xu, Zhongzhi Luan, Yi Liu, Depei Qian |
| 2025 | One GPU, Many Ranks: Enabling Performance and Energy-Efficient In-Transit Visualization via Resource Sharing. Matheus Costa, Philippe O. A. Navaux, Silvio Rizzi, Arthur Francisco Lorenzon |
| 2025 | Optimizing Direct Convolutions on High-Performance Multi-Core DSPs. Pengyu Wang, Xiaotian Chen, Jianbin Fang, Peng Zhang, Yonggang Che, Chun Huang, Jie Ren |
| 2025 | Optimizing Incomplete Cholesky Factorization on MIMD Many-core Architecture. Yongzhen Shi, Qinglin Wang, Jie Liu, Lian Wang, Zhiyan Liu, Bingwei Wang, Feiming Liu, Xiangdong Pei |
| 2025 | Optimizing NumPy with SVE Acceleration on ARM Architectures. Kuldeep Pal, Aniket P. Garade, Deepika H. V, Haribabu P, S. A. Kumar, S. D. Sudarsan |
| 2025 | Origami: Efficient ML-Driven Metadata Load Balancing for Distributed File Systems. Yiduo Wang, Wenda Tang, Linghang Meng, Liang Li, Jie Wu |
| 2025 | P3P-Fed: Peer-to-Peer Personalized Federated Learning with DHT-based Local Clustering. Sooho Jang, Ahyeon Lim, Yuchan Lee, Sookwang Lee, Jaehwan Lee |
| 2025 | PISCES: Push-Pull Hybrid Optimization for Graph Pattern Matching. Changjie Xu, Ke Meng, Zhiheng Lin, Guangming Tan |
| 2025 | PTWalker: Cache-Efficient Random Walks via Alternating Dual-Subgraph Walker Updating. Shuai Lin, Rui Wang, Zaigui Zhang, Long Deng, Wenzhe Zhu, Yongkun Li, Yinlong Xu |
| 2025 | ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks. Joshua Hoke Davis, Daniel Nichols, Ishan Khillan, Abhinav Bhatele |
| 2025 | ParaCOSM: A Parallel Framework for Continuous Subgraph Matching. Haibin Lai, Sicheng Zhou, Site Fan, Zhuozhao Li |
| 2025 | Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision. Evelyne Ringoot, Rabab Alomairy, Valentin Churavy, Alan Edelman |
| 2025 | Pisces: Towards Adaptive and Fair Congestion Control via Multi-Agent Meta-Reinforcement Learning. He Bai, Hui Li, Jianming Que, Minglong Zhang, Zhiqiang Hu, Ximing Xu, Bing Lin, Runhuai Huang, Junyang Qiu, Shaowen Deng |
| 2025 | Power Capping of GPU Servers for Machine Learning Inference Optimization. Yuan Ma, Srinivasan Subramaniyan, Xiaorui Wang |
| 2025 | Proceedings of the 54th International Conference on Parallel Processing, ICPP 2025, San Diego, CA, USA, September 8-11, 2025 |
| 2025 | Q-GEAR: Improving quantum simulation framework. Ziqing Guo, Jan Balewski, Ziwen Pan |
| 2025 | Revisiting Multi-threaded Compaction in LSM-trees: Enabling Compaction Pipelining. Hongsu Byun, Honghyeon Yoo, Sungyong Park |
| 2025 | SINA: Accelerating Time Synchronization in Large-Scale Network Simulation Using In-Network Allreduce. Dinghuang Hu, Dezun Dong, Xiangke Liao |
| 2025 | SYgraph: A Portable Heterogeneous Graph Analytics Framework for GPUs. Antonio De Caro, Gennaro Cordasco, Biagio Cosenza |
| 2025 | Scaling Distributed Graph Processing to Hundreds of GPUs. George M. Slota, Michael Mandulak |
| 2025 | Scheduling based on Block Features for Concurrent Inference with Unseen DNN Models on GPU. Diaohan Luo, Zhen Tang, Heran Gao, Yuewen Wu, Heng Wu, Xi Han, Wenbo Zhang |
| 2025 | SmartBlock: Adaptive Block Floating Point Quantization for Efficient DNN Acceleration. Xin Ju, Jingkui Yang, Mei Wen, Jun He, Jing Feng, Minjin Tang, Zhaoyun Chen, Yang Shi |
| 2025 | Solving Extended Flexible Job Shop Scheduling Problems with Deep Reinforcement Learning. Haonan Jiang, Yusen Li, Xiaoguang Liu, Gang Wang, Xuebo Zhang |
| 2025 | SpeedSketch: An Ultra-Fast Sketch Generation and Delta Encoding Framework for Delta Compression. Fengkui Yang, Yuanzhang Wang, Chunhua Li, Ke Zhou, Hui Li |
| 2025 | SpiderCache: Semantic-Aware Caching Strategy for DNN Training. Zesong Wang, Peng Fang, Fang Wang, Hong Jiang, Yimin Lu, Zhan Shi, Dan Feng |
| 2025 | TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks. Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Chencan Wu, Yong Li, Xiaokui Xiao, Wei Lin, Jialin Li |
| 2025 | TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference. Hongbin Zhang, Taosheng Wei, Zhenyi Zheng, Jiangsu Du, Zhiguang Chen, Yutong Lu |
| 2025 | Thievory: Graph Processing with Multi-GPU Memory Stealing. João Brotas, Ricardo Nobre, Aleksandar Ilic |
| 2025 | VES: Vectorized Sparse General Matrix-Matrix Multiplication on Multi-Core DSPs. Chuhe Hong, Qinglin Wang, Xing Peng, Gencheng Liu, Qingyang Zhang, Xinhai Chen, Jie Liu |
| 2025 | ViReC: The Virtual Register Context Architecture for Efficient Near-Memory Multithreading. Matthew Barondeau, Sophia Jiang, Jonathan Beard, Andreas Gerstlauer |
| 2025 | WinRS: Accelerate Winograd Backward-Filter Convolution with Tiny Workspace. Zhiyi Zhang, Junshi Chen, Jingwei Sun, Pengfei Zhang, Zhuopin Xu, Jun Shi, Qi Wang |
| 2025 | ZTP: A Scalable and Lightweight Privacy-Preserving Blockchain via Scale-Free Quorums and Geometric Fragmentation. Abdullah Al-Mamun, Dongfang Zhao, Gagan Agrawal, Ahmed Aleroud, Mohamed I. Ibrahem |
| 2025 | pyGinkgo: A Sparse Linear Algebra Operator Framework for Python. Keshvi Tuteja, Gregor Olenik, Roman Mishchuk, Yu-Hsiang Tsai, Markus Götz, Achim Streit, Hartwig Anzt, Charlotte Debus |