| 2024 | A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs. Jinchen Xu, Guanghui Song, Bei Zhou, Fei Li, Jiangwei Hao, Jie Zhao |
| 2024 | A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs. Meng Pang, Xiang Fei, Peng Qu, Youhui Zhang, Zhaolin Li |
| 2024 | AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping. Seongyeon Park, Junguk Hong, Jaeyong Song, Hajin Kim, Youngsok Kim, Jinho Lee |
| 2024 | Are Your Epochs Too Epic? Batch Free Can Be Harmful. Daewoo Kim, Trevor Brown, Ajay Singh |
| 2024 | Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication. Lukas Gianinazzi, Alexandros Nikolaos Ziogas, Langwen Huang, Piotr Luczynski, Saleh Ashkboosh, Florian Scheidl, Armon Carigiet, Chio Ge, Nabil Abubaker, Maciej Besta, Tal Ben-Nun, Torsten Hoefler |
| 2024 | CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers. Brian Wheatman, Randal C. Burns, Aydin Buluç, Helen Xu |
| 2024 | ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores. Yuetao Chen, Kun Li, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang |
| 2024 | Exploiting Fine-Grained Redundancy in Set-Centric Graph Pattern Mining. Zhiheng Lin, Ke Meng, Chaoyang Shui, Kewei Zhang, Junmin Xiao, Guangming Tan |
| 2024 | Extreme-scale Direct Numerical Simulation of Incompressible Turbulence on the Heterogeneous Many-core System. Jiabin Xie, Guangnan Feng, Han Huang, Junxuan Feng, Zhiguang Chen, Yutong Lu |
| 2024 | Fast American Option Pricing using Nonlinear Stencils. Zafar Ahmad, Reilly Browne, Rezaul Chowdhury, Rathish Das, Yushen Huang, Yimin Zhu |
| 2024 | Fast Kronecker Matrix-Matrix Multiplication on GPUs. Abhinav Jangda, Mohit Yadav |
| 2024 | FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters. Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Jiarui Fang, Tian Zheng, Ruidong Wu, Xiwen Zhang, Jian Peng, Yang You |
| 2024 | Gallatin: A General-Purpose GPU Memory Manager. Hunter McCoy, Prashant Pandey |
| 2024 | GraphCube: Interconnection Hierarchy-aware Graph Processing. Xinbiao Gan, Guang Wu, Shenghao Qiu, Feng Xiong, Jiaqi Si, Jianbin Fang, Dezun Dong, Chunye Gong, Tiejun Li, Zheng Wang |
| 2024 | INFINEL: An efficient GPU-based processing method for unpredictable large output graph queries. Sungwoo Park, Seyeon Oh, Min-Soo Kim |
| 2024 | Language-Agnostic Static Deadlock Detection for Futures. Stefan K. Muller |
| 2024 | Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference. Jiangsu Du, Jinhui Wei, Jiazhi Jiang, Shenggan Cheng, Dan Huang, Zhiguang Chen, Yutong Lu |
| 2024 | Locks as a Resource: Fairly Scheduling Lock Occupation with CFL. Jonggyu Park, Young Ik Eom |
| 2024 | Memory Bounds for Concurrent Bounded Queues. Vitaly Aksenov, Nikita Koval, Petr Kuznetsov, Anton Paramonov |
| 2024 | OsirisBFT: Say No to Task Replication for Scalable Byzantine Fault Tolerant Analytics. Kasra Jamshidi, Keval Vora |
| 2024 | POSTER: Accelerating High-Precision Integer Multiplication used in Cryptosystems with GPUs. Zhuoran Ji, Zhaorui Zhang, Jiming Xu, Lei Ju |
| 2024 | POSTER: Enabling Extreme-Scale Phase Field Simulation with In-situ Feature Extraction. Zhichen Feng, Jialin Li, Yaqian Gao, Shaobo Tian, Huang Ye, Jian Zhang |
| 2024 | POSTER: FineCo: Fine-grained Heterogeneous Resource Management for Concurrent DNN Inferences. Lixian Ma, Haoruo Chen, En Shao, Leping Wang, Quan Chen, Guangming Tan |
| 2024 | POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization. Juntao Zhao, Borui Wan, Chuan Wu, Yanghua Peng, Haibin Lin |
| 2024 | POSTER: OCToPus: Semantic-aware Concurrency Control for Blockchain Transactions. dePaul Miller, Henry F. Korth, Roberto Palmieri |
| 2024 | POSTER: Optimizing Collective Communications with Error-bounded Lossy Compression for GPU Clusters. Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur |
| 2024 | POSTER: Optimizing Sparse Tensor Contraction with Revisiting Hash Table Design. Guofeng Feng, Weile Jia, Ninghui Sun, Guangming Tan, Jiajia Li |
| 2024 | POSTER: ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters. Shunde Li, Junyu Gu, Jue Wang, Tiechui Yao, Zhiqiang Liang, Yumeng Shi, Shigang Li, Weiting Xi, Shushen Li, Chunbao Zhou, Yangang Wang, Xuebin Chi |
| 2024 | POSTER: Pattern-Aware Sparse Communication for Scalable Recommendation Model Training. Jiaao He, Shengqi Chen, Jidong Zhai |
| 2024 | POSTER: RELAX: Durable Data Structures with Swift Recovery. Almog Zur, Nachshon Cohen, Michal Friedman, Erez Petrank |
| 2024 | POSTER: RadiK: Scalable Radix Top-K Selection on GPUs. Yifei Li, Bole Zhou, Jiejing Zhang, Xuechao Wei, Yinghan Li, Yingda Chen |
| 2024 | POSTER: StructMG: A Fast and Scalable Structured Multigrid. Yi Zong, Xinliang Wang, Haopeng Huang, Chensong Zhang, Xiaowen Xu, Jian Sun, Bowen Yan, Qin Wang, Sicong Li, Zhaohui Ding, Wei Xue |
| 2024 | Parallel Integer Sort: Theory and Practice. Xiaojun Dong, Laxman Dhulipala, Yan Gu, Yihan Sun |
| 2024 | Parallel k-Core Decomposition with Batched Updates and Asynchronous Reads. Quanquan C. Liu, Julian Shun, Igor Zablotchi |
| 2024 | ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms. Magdalen Dobson Manohar, Zheqi Shen, Guy E. Blelloch, Laxman Dhulipala, Yan Gu, Harsha Vardhan Simhadri, Yihan Sun |
| 2024 | Practical Hardware Transactional vEB Trees. Mohammad Khalaji, Trevor Brown, Khuzaima Daudjee, Vitaly Aksenov |
| 2024 | Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2024, Edinburgh, United Kingdom, March 2-6, 2024. Michel Steuwer, I-Ting Angelina Lee, Milind Chabbi |
| 2024 | Pure: Evolving Message Passing To Better Leverage Shared Memory Within Nodes. James Psota, Armando Solar-Lezama |
| 2024 | Recurrence Analysis for Automatic Parallelization of Subscripted Subscripts. Akshay Bhosale, Rudolf Eigenmann |
| 2024 | Scaling Up Transactions with Slower Clocks. Pedro Ramalhete, Andreia Correia |
| 2024 | Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips. Ismet Dagli, Mehmet E. Belviranli |
| 2024 | Sparsity in Deep Neural Nets (Keynote). Nir Shavit |
| 2024 | Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU. Xiaoyan Liu, Xuegui Zheng, Hailong Yang, Zhongzhi Luan, Depei Qian |
| 2024 | Towards Scalable Unstructured Mesh Computations on Shared Memory Many-Cores. Haozhong Qiu, Chuanfu Xu, Jianbin Fang, Liang Deng, Jian Zhang, Qingsong Wang, Yue Ding, Zhe Dai, Yonggang Che, Shizhao Chen, Jie Liu |
| 2024 | Training one DeePMD Model in Minutes: a Step towards Online Learning. Siyu Hu, Tong Zhao, Qiuchen Sha, Enji Li, Xiangyu Meng, Liping Liu, Lin-Wang Wang, Guangming Tan, Weile Jia |
| 2024 | VERLIB: Concurrent Versioned Pointers. Guy E. Blelloch, Yuanhao Wei |