| 2025 | A General and Scalable GCN Training Framework on CPU Supercomputers. Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Nikoli Dryden, Lingqi Zhang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib |
| 2025 | AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access Correlations. Fulin Nan, Ronglong Wu, Zhirong Shen, Jiahui Yang, Li Cheng, Zheng Chen, Yiming Zhang, Jiwu Shu |
| 2025 | ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training. Yuhang Liang, Xinyi Li, Jie Ren, Ang Li, Bo Fang, Jieyang Chen |
| 2025 | Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores. Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi |
| 2025 | Accelerating GNNs on GPU Sparse Tensor Cores through N:M Sparsity-Oriented Graph Reordering. Jou-An Chen, Hsin-Hsuan Sung, Ruifeng Zhang, Ang Li, Xipeng Shen |
| 2025 | Adaptive Parallel Training for Graph Neural Networks. Kaihao Ma, Renjie Liu, Xiao Yan, Zhenkun Cai, Xiang Song, Minjie Wang, Yichao Li, James Cheng |
| 2025 | Aggregating Funnels for Faster Fetch&Add and Queues. Younghun Roh, Yuanhao Wei, Eric Ruppert, Panagiota Fatourou, Siddhartha Jayanti, Julian Shun |
| 2025 | An AI-Enhanced 1km-Resolution Seamless Global Weather and Climate Model to Achieve Year-Scale Simulation Speed using 34 Million Cores. Xiaohui Duan, Yi Zhang, Kai Xu, Haohuan Fu, Bin Yang, Yiming Wang, Yilun Han, Siyuan Chen, Zhuangzhuang Zhou, Chenyu Wang, Dongqiang Huang, Huihai An, Xiting Ju, Haopeng Huang, Zhuang Liu, Wei Xue, Weiguo Liu, Bowen Yan, Jianye Hou, Maoxue Yu, Wenguang Chen, Jian Li, Zhao Jing, Hailong Liu, Lixin Wu |
| 2025 | Balanced Allocations over Efficient Queues: A Fast Relaxed FIFO Queue. Kåre von Geijer, Philippas Tsigas, Elias Johansson, Sebastian Hermansson |
| 2025 | BerryBees: Breadth First Search by Bit-Tensor-Cores. Yuyao Niu, Marc Casas |
| 2025 | Big Atomics and Fast Hash Tables. Daniel Anderson, Guy E. Blelloch, Siddhartha V. Jayanti |
| 2025 | Boost Lock-free Queue and Stack with Batching. Ao Li, WenHai Li, Yuan Chen, Lingfeng Deng |
| 2025 | COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers. Baixi Sun, Weijin Liu, J. Gregory Pauloski, Jiannan Tian, Jinda Jia, Daoce Wang, Boyuan Zhang, Mingkai Zheng, Sheng Di, Sian Jin, Zhao Zhang, Xiaodong Yu, Kamil A. Iskra, Pete Beckman, Guangming Tan, Dingwen Tao |
| 2025 | Crystality: A Programming Model for Smart Contracts on Parallel EVMs. Hao Wang, Minghao Pan, Jiaping Wang |
| 2025 | DORADD: Deterministic Parallel Execution in the Era of Microsecond-Scale Computing. Zhengqing Liu, Musa Unal, Matthew J. Parkinson, Marios Kogias |
| 2025 | EVeREST: An Effective and Versatile Runtime Energy Saving Tool for GPUs. Anna Yue, Pen-Chung Yew, Sanyam Mehta |
| 2025 | Effectively Virtual Page Prefetching via Spatial-Temporal Patterns for Memory-intensive Cloud Applications. Yun Wang, Liang Chen, Tianmai Deng, Ben Luo, Yibin Shen, Zhixiang Wei, Yixiao Xu, Minglang Huang, Zhengwei Qi |
| 2025 | Fairer and More Scalable Reader-Writer Locks by Optimizing Queue Management. Takashi Hoshino, Kenjiro Taura |
| 2025 | FastBWA: Practical and Cost-Efficient Genome Sequence Alignment Pipeline. Zhonghai Zhang, Yewen Li, Ke Meng, Chunming Zhang, Guangming Tan |
| 2025 | FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units. Haozhi Han, Kun Li, Wei Cui, Donglin Bai, Yiwei Zhang, Liang Yuan, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang |
| 2025 | FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores. Jinliang Shi, Shigang Li, Youxuan Xu, Rongtian Fu, Xueying Wang, Tong Wu |
| 2025 | FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property. Runxin Zhong, Yuyang Jin, Chen Zhang, Kinman Lei, Shuangyu Li, Jidong Zhai |
| 2025 | Frontier-guided Graph Reordering. Xinmiao Zhang, Cheng Liu, Shengwen Liang, Chenwei Xiong, Yu Zhang, Lei Zhang, Huawei Li, Xiaowei Li |
| 2025 | GLumin: Fast Connectivity Check Based on LUTs For Efficient Graph Pattern Mining. Weichen Cao, Ke Meng, Zhiheng Lin, Guangming Tan |
| 2025 | Harnessing Inter-GPU Shared Memory for Seamless MoE Communication-Computation Fusion. Hulin Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng |
| 2025 | Helios: Efficient Distributed Dynamic Graph Sampling for Online GNN Inference. Jie Sun, Zuocheng Shi, Li Su, Wenting Shen, Zeke Wang, Yong Li, Wenyuan Yu, Wei Lin, Fei Wu, Bingsheng He, Jingren Zhou |
| 2025 | High-performance Visual Semantics Compression for AI-Driven Science. Boyuan Zhang, Luanzheng Guo, Jiannan Tian, Jinyang Liu, Daoce Wang, Fanjiang Ye, Chengming Zhang, Jan Strube, Nathan R. Tallent, Dingwen Tao |
| 2025 | Improving Tridiagonalization Performance on GPU Architectures. Hansheng Wang, Zhekai Duan, Zitian Zhao, Siqi Wu, Saiqi Zheng, Qiao Li, Xu Jiang, Shaoshuai Zhang |
| 2025 | Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers. Yiwei Zhang, Kun Li, Liang Yuan, Haozhi Han, Yunquan Zhang, Ting Cao, Mao Yang |
| 2025 | LibRTS: A Spatial Indexing Library by Ray Tracing. Liang Geng, Rubao Lee, Xiaodong Zhang |
| 2025 | MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models. Elias Frantar, Roberto L. Castro, Jiale Chen, Torsten Hoefler, Dan Alistarh |
| 2025 | Magneto: Accelerating Parallel Structures in DNNs via Co-Optimization of Operators. Zhanyuan Di, Leping Wang, Ziyi Ren, En Shao, Jie Zhao, Siyuan Feng, Dingwen Tao, Guangming Tan, Ninghui Sun |
| 2025 | Mario: Near Zero-cost Activation Checkpointing in Pipeline Parallelism. Weijian Liu, Mingzhen Li, Guangming Tan, Weile Jia |
| 2025 | Minimizing speculation overhead in a parallel recognizer for regular texts. Angelo Borsotti, Luca Breveglieri, Angelo Morzenti, Stefano Crespi-Reghizzi |
| 2025 | PANNS: Enhancing Graph-based Approximate Nearest Neighbor Search through Recency-aware Construction and Parameterized Search. Xizhe Yin, Chao Gao, Zhijia Zhao, Rajiv Gupta |
| 2025 | Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra. Julian Bellavita, Thomas Pasquali, Laura Del Rio Martin, Flavio Vella, Giulia Guidi |
| 2025 | Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2025, Las Vegas, NV, USA, March 1-5, 2025 |
| 2025 | Publish on Ping: A Better Way to Publish Reservations in Memory Reclamation for Concurrent Data Structures. Ajay Singh, Trevor Brown |
| 2025 | RT-BarnesHut: Accelerating Barnes-Hut Using Ray-Tracing Hardware. Vani Nagarajan, Rohan Gangaraju, Kirshanthan Sundararajah, Artem Pelenitsyn, Milind Kulkarni |
| 2025 | Reciprocating Locks. Dave Dice, Alex Kogan |
| 2025 | SBMGT: Scaling Bayesian Multinomial Group Testing. Weicong Chen, Hao Qi, Curtis Tatsuoka, Xiaoyi Lu |
| 2025 | SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs. Yongkang Zhang, Haoxuan Yu, Chenxia Han, Cheng Wang, Baotong Lu, Yunzhe Li, Zhifeng Jiang, Yang Li, Xiaowen Chu, Huaicheng Li |
| 2025 | Semi-StructMG: A Fast and Scalable Semi-Structured Algebraic Multigrid. Yi Zong, Chensong Zhang, Longjiang Mu, Jianchun Wang, Jian Sun, Xiaowen Xu, Xinliang Wang, Peinan Yu, Wei Xue |
| 2025 | Setting a Course for Post-Moore Software Performance. Charles E. Leiserson |
| 2025 | Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm. Zhibin Wang, Xi Lin, Xue Li, Pinhuan Wang, Ziheng Meng, Hang Liu, Chen Tian, Sheng Zhong |
| 2025 | TensorMD: Molecular Dynamics Simulation with Ab Initio Accuracy of 50 Billion Atoms. Yucheng Ouyang, Ying Liu, Honghui Shang, Zhenchuan Chen, Jiahao Shan, Huimin Cui, Xiaobing Feng, Xin Chen, Xingyu Gao, Lifang Wang, Haifeng Song, Rongfen Lin, Fang Li |
| 2025 | Transactional Data Structures with Orthogonal Metadata. Yaodong Sheng, Ahmed Hassan, Michael F. Spear |
| 2025 | Triangle Counting on Tensor Cores. Yuang Chen, Jeffrey Xu Yu |
| 2025 | TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs. Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Franck Cappello, Zizhong Chen |
| 2025 | WaterWise: Co-optimizing Carbon- and Water-Footprint Toward Environmentally Sustainable Cloud Computing. Yankai Jiang, Rohan Basu Roy, Raghavendra Kanakagiri, Devesh Tiwari |
| 2025 | WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training. Junfeng Lin, Ziming Liu, Yang You, Jun Wang, Weihao Zhang, Rong Zhao |