PPoPP B

46 papers

Year Title / Authors
2024 A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs.
Jinchen Xu, Guanghui Song, Bei Zhou, Fei Li, Jiangwei Hao, Jie Zhao
2024 A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs.
Meng Pang, Xiang Fei, Peng Qu, Youhui Zhang, Zhaolin Li
2024 AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping.
Seongyeon Park, Junguk Hong, Jaeyong Song, Hajin Kim, Youngsok Kim, Jinho Lee
2024 Are Your Epochs Too Epic? Batch Free Can Be Harmful.
Daewoo Kim, Trevor Brown, Ajay Singh
2024 Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication.
Lukas Gianinazzi, Alexandros Nikolaos Ziogas, Langwen Huang, Piotr Luczynski, Saleh Ashkboosh, Florian Scheidl, Armon Carigiet, Chio Ge, Nabil Abubaker, Maciej Besta, Tal Ben-Nun, Torsten Hoefler
2024 CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers.
Brian Wheatman, Randal C. Burns, Aydin Buluç, Helen Xu
2024 ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores.
Yuetao Chen, Kun Li, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang
2024 Exploiting Fine-Grained Redundancy in Set-Centric Graph Pattern Mining.
Zhiheng Lin, Ke Meng, Chaoyang Shui, Kewei Zhang, Junmin Xiao, Guangming Tan
2024 Extreme-scale Direct Numerical Simulation of Incompressible Turbulence on the Heterogeneous Many-core System.
Jiabin Xie, Guangnan Feng, Han Huang, Junxuan Feng, Zhiguang Chen, Yutong Lu
2024 Fast American Option Pricing using Nonlinear Stencils.
Zafar Ahmad, Reilly Browne, Rezaul Chowdhury, Rathish Das, Yushen Huang, Yimin Zhu
2024 Fast Kronecker Matrix-Matrix Multiplication on GPUs.
Abhinav Jangda, Mohit Yadav
2024 FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters.
Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Jiarui Fang, Tian Zheng, Ruidong Wu, Xiwen Zhang, Jian Peng, Yang You
2024 Gallatin: A General-Purpose GPU Memory Manager.
Hunter McCoy, Prashant Pandey
2024 GraphCube: Interconnection Hierarchy-aware Graph Processing.
Xinbiao Gan, Guang Wu, Shenghao Qiu, Feng Xiong, Jiaqi Si, Jianbin Fang, Dezun Dong, Chunye Gong, Tiejun Li, Zheng Wang
2024 INFINEL: An efficient GPU-based processing method for unpredictable large output graph queries.
Sungwoo Park, Seyeon Oh, Min-Soo Kim
2024 Language-Agnostic Static Deadlock Detection for Futures.
Stefan K. Muller
2024 Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference.
Jiangsu Du, Jinhui Wei, Jiazhi Jiang, Shenggan Cheng, Dan Huang, Zhiguang Chen, Yutong Lu
2024 Locks as a Resource: Fairly Scheduling Lock Occupation with CFL.
Jonggyu Park, Young Ik Eom
2024 Memory Bounds for Concurrent Bounded Queues.
Vitaly Aksenov, Nikita Koval, Petr Kuznetsov, Anton Paramonov
2024 OsirisBFT: Say No to Task Replication for Scalable Byzantine Fault Tolerant Analytics.
Kasra Jamshidi, Keval Vora
2024 POSTER: Accelerating High-Precision Integer Multiplication used in Cryptosystems with GPUs.
Zhuoran Ji, Zhaorui Zhang, Jiming Xu, Lei Ju
2024 POSTER: Enabling Extreme-Scale Phase Field Simulation with In-situ Feature Extraction.
Zhichen Feng, Jialin Li, Yaqian Gao, Shaobo Tian, Huang Ye, Jian Zhang
2024 POSTER: FineCo: Fine-grained Heterogeneous Resource Management for Concurrent DNN Inferences.
Lixian Ma, Haoruo Chen, En Shao, Leping Wang, Quan Chen, Guangming Tan
2024 POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.
Juntao Zhao, Borui Wan, Chuan Wu, Yanghua Peng, Haibin Lin
2024 POSTER: OCToPus: Semantic-aware Concurrency Control for Blockchain Transactions.
dePaul Miller, Henry F. Korth, Roberto Palmieri
2024 POSTER: Optimizing Collective Communications with Error-bounded Lossy Compression for GPU Clusters.
Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur
2024 POSTER: Optimizing Sparse Tensor Contraction with Revisiting Hash Table Design.
Guofeng Feng, Weile Jia, Ninghui Sun, Guangming Tan, Jiajia Li
2024 POSTER: ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters.
Shunde Li, Junyu Gu, Jue Wang, Tiechui Yao, Zhiqiang Liang, Yumeng Shi, Shigang Li, Weiting Xi, Shushen Li, Chunbao Zhou, Yangang Wang, Xuebin Chi
2024 POSTER: Pattern-Aware Sparse Communication for Scalable Recommendation Model Training.
Jiaao He, Shengqi Chen, Jidong Zhai
2024 POSTER: RELAX: Durable Data Structures with Swift Recovery.
Almog Zur, Nachshon Cohen, Michal Friedman, Erez Petrank
2024 POSTER: RadiK: Scalable Radix Top-K Selection on GPUs.
Yifei Li, Bole Zhou, Jiejing Zhang, Xuechao Wei, Yinghan Li, Yingda Chen
2024 POSTER: StructMG: A Fast and Scalable Structured Multigrid.
Yi Zong, Xinliang Wang, Haopeng Huang, Chensong Zhang, Xiaowen Xu, Jian Sun, Bowen Yan, Qin Wang, Sicong Li, Zhaohui Ding, Wei Xue
2024 Parallel Integer Sort: Theory and Practice.
Xiaojun Dong, Laxman Dhulipala, Yan Gu, Yihan Sun
2024 Parallel k-Core Decomposition with Batched Updates and Asynchronous Reads.
Quanquan C. Liu, Julian Shun, Igor Zablotchi
2024 ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms.
Magdalen Dobson Manohar, Zheqi Shen, Guy E. Blelloch, Laxman Dhulipala, Yan Gu, Harsha Vardhan Simhadri, Yihan Sun
2024 Practical Hardware Transactional vEB Trees.
Mohammad Khalaji, Trevor Brown, Khuzaima Daudjee, Vitaly Aksenov
2024 Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2024, Edinburgh, United Kingdom, March 2-6, 2024
Michel Steuwer, I-Ting Angelina Lee, Milind Chabbi
2024 Pure: Evolving Message Passing To Better Leverage Shared Memory Within Nodes.
James Psota, Armando Solar-Lezama
2024 Recurrence Analysis for Automatic Parallelization of Subscripted Subscripts.
Akshay Bhosale, Rudolf Eigenmann
2024 Scaling Up Transactions with Slower Clocks.
Pedro Ramalhete, Andreia Correia
2024 Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips.
Ismet Dagli, Mehmet E. Belviranli
2024 Sparsity in Deep Neural Nets (Keynote).
Nir Shavit
2024 Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU.
Xiaoyan Liu, Xuegui Zheng, Hailong Yang, Zhongzhi Luan, Depei Qian
2024 Towards Scalable Unstructured Mesh Computations on Shared Memory Many-Cores.
Haozhong Qiu, Chuanfu Xu, Jianbin Fang, Liang Deng, Jian Zhang, Qingsong Wang, Yue Ding, Zhe Dai, Yonggang Che, Shizhao Chen, Jie Liu
2024 Training one DeePMD Model in Minutes: a Step towards Online Learning.
Siyu Hu, Tong Zhao, Qiuchen Sha, Enji Li, Xiangyu Meng, Liping Liu, Lin-Wang Wang, Guangming Tan, Weile Jia
2024 VERLIB: Concurrent Versioned Pointers.
Guy E. Blelloch, Yuanhao Wei