PPoPP B

51 papers

Year Title / Authors
2025 A General and Scalable GCN Training Framework on CPU Supercomputers.
Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Nikoli Dryden, Lingqi Zhang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib
2025 AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access Correlations.
Fulin Nan, Ronglong Wu, Zhirong Shen, Jiahui Yang, Li Cheng, Zheng Chen, Yiming Zhang, Jiwu Shu
2025 ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training.
Yuhang Liang, Xinyi Li, Jie Ren, Ang Li, Bo Fang, Jieyang Chen
2025 Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores.
Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi
2025 Accelerating GNNs on GPU Sparse Tensor Cores through N:M Sparsity-Oriented Graph Reordering.
Jou-An Chen, Hsin-Hsuan Sung, Ruifeng Zhang, Ang Li, Xipeng Shen
2025 Adaptive Parallel Training for Graph Neural Networks.
Kaihao Ma, Renjie Liu, Xiao Yan, Zhenkun Cai, Xiang Song, Minjie Wang, Yichao Li, James Cheng
2025 Aggregating Funnels for Faster Fetch&Add and Queues.
Younghun Roh, Yuanhao Wei, Eric Ruppert, Panagiota Fatourou, Siddhartha Jayanti, Julian Shun
2025 An AI-Enhanced 1km-Resolution Seamless Global Weather and Climate Model to Achieve Year-Scale Simulation Speed using 34 Million Cores.
Xiaohui Duan, Yi Zhang, Kai Xu, Haohuan Fu, Bin Yang, Yiming Wang, Yilun Han, Siyuan Chen, Zhuangzhuang Zhou, Chenyu Wang, Dongqiang Huang, Huihai An, Xiting Ju, Haopeng Huang, Zhuang Liu, Wei Xue, Weiguo Liu, Bowen Yan, Jianye Hou, Maoxue Yu, Wenguang Chen, Jian Li, Zhao Jing, Hailong Liu, Lixin Wu
2025 Balanced Allocations over Efficient Queues: A Fast Relaxed FIFO Queue.
Kåre von Geijer, Philippas Tsigas, Elias Johansson, Sebastian Hermansson
2025 BerryBees: Breadth First Search by Bit-Tensor-Cores.
Yuyao Niu, Marc Casas
2025 Big Atomics and Fast Hash Tables.
Daniel Anderson, Guy E. Blelloch, Siddhartha V. Jayanti
2025 Boost Lock-free Queue and Stack with Batching.
Ao Li, WenHai Li, Yuan Chen, Lingfeng Deng
2025 COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers.
Baixi Sun, Weijin Liu, J. Gregory Pauloski, Jiannan Tian, Jinda Jia, Daoce Wang, Boyuan Zhang, Mingkai Zheng, Sheng Di, Sian Jin, Zhao Zhang, Xiaodong Yu, Kamil A. Iskra, Pete Beckman, Guangming Tan, Dingwen Tao
2025 Crystality: A Programming Model for Smart Contracts on Parallel EVMs.
Hao Wang, Minghao Pan, Jiaping Wang
2025 DORADD: Deterministic Parallel Execution in the Era of Microsecond-Scale Computing.
Zhengqing Liu, Musa Unal, Matthew J. Parkinson, Marios Kogias
2025 EVeREST: An Effective and Versatile Runtime Energy Saving Tool for GPUs.
Anna Yue, Pen-Chung Yew, Sanyam Mehta
2025 Effectively Virtual Page Prefetching via Spatial-Temporal Patterns for Memory-intensive Cloud Applications.
Yun Wang, Liang Chen, Tianmai Deng, Ben Luo, Yibin Shen, Zhixiang Wei, Yixiao Xu, Minglang Huang, Zhengwei Qi
2025 Fairer and More Scalable Reader-Writer Locks by Optimizing Queue Management.
Takashi Hoshino, Kenjiro Taura
2025 FastBWA: Practical and Cost-Efficient Genome Sequence Alignment Pipeline.
Zhonghai Zhang, Yewen Li, Ke Meng, Chunming Zhang, Guangming Tan
2025 FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units.
Haozhi Han, Kun Li, Wei Cui, Donglin Bai, Yiwei Zhang, Liang Yuan, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang
2025 FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores.
Jinliang Shi, Shigang Li, Youxuan Xu, Rongtian Fu, Xueying Wang, Tong Wu
2025 FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property.
Runxin Zhong, Yuyang Jin, Chen Zhang, Kinman Lei, Shuangyu Li, Jidong Zhai
2025 Frontier-guided Graph Reordering.
Xinmiao Zhang, Cheng Liu, Shengwen Liang, Chenwei Xiong, Yu Zhang, Lei Zhang, Huawei Li, Xiaowei Li
2025 GLumin: Fast Connectivity Check Based on LUTs For Efficient Graph Pattern Mining.
Weichen Cao, Ke Meng, Zhiheng Lin, Guangming Tan
2025 Harnessing Inter-GPU Shared Memory for Seamless MoE Communication-Computation Fusion.
Hulin Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng
2025 Helios: Efficient Distributed Dynamic Graph Sampling for Online GNN Inference.
Jie Sun, Zuocheng Shi, Li Su, Wenting Shen, Zeke Wang, Yong Li, Wenyuan Yu, Wei Lin, Fei Wu, Bingsheng He, Jingren Zhou
2025 High-performance Visual Semantics Compression for AI-Driven Science.
Boyuan Zhang, Luanzheng Guo, Jiannan Tian, Jinyang Liu, Daoce Wang, Fanjiang Ye, Chengming Zhang, Jan Strube, Nathan R. Tallent, Dingwen Tao
2025 Improving Tridiagonalization Performance on GPU Architectures.
Hansheng Wang, Zhekai Duan, Zitian Zhao, Siqi Wu, Saiqi Zheng, Qiao Li, Xu Jiang, Shaoshuai Zhang
2025 Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers.
Yiwei Zhang, Kun Li, Liang Yuan, Haozhi Han, Yunquan Zhang, Ting Cao, Mao Yang
2025 LibRTS: A Spatial Indexing Library by Ray Tracing.
Liang Geng, Rubao Lee, Xiaodong Zhang
2025 MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models.
Elias Frantar, Roberto L. Castro, Jiale Chen, Torsten Hoefler, Dan Alistarh
2025 Magneto: Accelerating Parallel Structures in DNNs via Co-Optimization of Operators.
Zhanyuan Di, Leping Wang, Ziyi Ren, En Shao, Jie Zhao, Siyuan Feng, Dingwen Tao, Guangming Tan, Ninghui Sun
2025 Mario: Near Zero-cost Activation Checkpointing in Pipeline Parallelism.
Weijian Liu, Mingzhen Li, Guangming Tan, Weile Jia
2025 Minimizing speculation overhead in a parallel recognizer for regular texts.
Angelo Borsotti, Luca Breveglieri, Angelo Morzenti, Stefano Crespi-Reghizzi
2025 PANNS: Enhancing Graph-based Approximate Nearest Neighbor Search through Recency-aware Construction and Parameterized Search.
Xizhe Yin, Chao Gao, Zhijia Zhao, Rajiv Gupta
2025 Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra.
Julian Bellavita, Thomas Pasquali, Laura Del Rio Martin, Flavio Vella, Giulia Guidi
2025 Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2025, Las Vegas, NV, USA, March 1-5, 2025
2025 Publish on Ping: A Better Way to Publish Reservations in Memory Reclamation for Concurrent Data Structures.
Ajay Singh, Trevor Brown
2025 RT-BarnesHut: Accelerating Barnes-Hut Using Ray-Tracing Hardware.
Vani Nagarajan, Rohan Gangaraju, Kirshanthan Sundararajah, Artem Pelenitsyn, Milind Kulkarni
2025 Reciprocating Locks.
Dave Dice, Alex Kogan
2025 SBMGT: Scaling Bayesian Multinomial Group Testing.
Weicong Chen, Hao Qi, Curtis Tatsuoka, Xiaoyi Lu
2025 SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs.
Yongkang Zhang, Haoxuan Yu, Chenxia Han, Cheng Wang, Baotong Lu, Yunzhe Li, Zhifeng Jiang, Yang Li, Xiaowen Chu, Huaicheng Li
2025 Semi-StructMG: A Fast and Scalable Semi-Structured Algebraic Multigrid.
Yi Zong, Chensong Zhang, Longjiang Mu, Jianchun Wang, Jian Sun, Xiaowen Xu, Xinliang Wang, Peinan Yu, Wei Xue
2025 Setting a Course for Post-Moore Software Performance.
Charles E. Leiserson
2025 Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm.
Zhibin Wang, Xi Lin, Xue Li, Pinhuan Wang, Ziheng Meng, Hang Liu, Chen Tian, Sheng Zhong
2025 TensorMD: Molecular Dynamics Simulation with Ab Initio Accuracy of 50 Billion Atoms.
Yucheng Ouyang, Ying Liu, Honghui Shang, Zhenchuan Chen, Jiahao Shan, Huimin Cui, Xiaobing Feng, Xin Chen, Xingyu Gao, Lifang Wang, Haifeng Song, Rongfen Lin, Fang Li
2025 Transactional Data Structures with Orthogonal Metadata.
Yaodong Sheng, Ahmed Hassan, Michael F. Spear
2025 Triangle Counting on Tensor Cores.
Yuang Chen, Jeffrey Xu Yu
2025 TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs.
Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Franck Cappello, Zizhong Chen
2025 WaterWise: Co-optimizing Carbon- and Water-Footprint Toward Environmentally Sustainable Cloud Computing.
Yankai Jiang, Rohan Basu Roy, Raghavendra Kanakagiri, Devesh Tiwari
2025 WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training.
Junfeng Lin, Ziming Liu, Yang You, Jun Wang, Weihao Zhang, Rong Zhao