| 2021 | A fast work-efficient SSSP algorithm for GPUs. Kai Wang, Don Fussell, Calvin Lin |
| 2021 | A lock-free relaxed concurrent queue for fast work distribution. Giorgos Kappes, Stergios V. Anastasiadis |
| 2021 | A more pragmatic implementation of the lock-free, ordered, linked list. Jesper Larsson Träff, Manuel Pöter |
| 2021 | A novel memory-efficient deep learning training framework via error-bounded lossy compression. Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao |
| 2021 | Advanced synchronization techniques for task-based runtime systems. David Álvarez, Kevin Sala, Marcos Maroñas, Aleix Roca, Vicenç Beltran |
| 2021 | An efficient uncertain graph processing framework for heterogeneous architectures. Heng Zhang, Lingda Li, Donglin Zhuang, Rui Liu, Shuang Song, Dingwen Tao, Yanjun Wu, Shuaiwen Leon Song |
| 2021 | An ownership policy and deadlock detector for promises. Caleb Voss, Vivek Sarkar |
| 2021 | ApproxTuner: a compiler and runtime system for adaptive approximations. Hashim Sharif, Yifan Zhao, Maria Kotsifakou, Akash Kothari, Ben Schreiber, Elizabeth Wang, Yasmin Sarita, Nathan Zhao, Keyur Joshi, Vikram S. Adve, Sasa Misailovic, Sarita V. Adve |
| 2021 | Are dynamic memory managers on GPUs slow?: a survey and benchmarks. Martin Winter, Mathias Parger, Daniel Mlakar, Markus Steinberger |
| 2021 | Asynchrony versus bulk-synchrony for a generalized N-body problem from genomics. Marquita Ellis, Aydin Buluç, Katherine A. Yelick |
| 2021 | BiPart: a parallel and deterministic hypergraph partitioner. Sepideh Maleki, Udit Agarwal, Martin Burtscher, Keshav Pingali |
| 2021 | Bundled references: an abstraction for highly-concurrent linearizable range queries. Jacob Nelson, Ahmed Hassan, Roberto Palmieri |
| 2021 | Compiler support for near data computing. Mahmut Taylan Kandemir, Jihyun Ryoo, Xulong Tang, Mustafa Karaköy |
| 2021 | Constant-time snapshots with applications to concurrent data structures. Yuanhao Wei, Naama Ben-David, Guy E. Blelloch, Panagiota Fatourou, Eric Ruppert, Yihan Sun |
| 2021 | Corder: cache-aware reordering for optimizing graph analytics. Yuang Chen, Yeh-Ching Chung |
| 2021 | DAPPLE: a pipelined data parallel approach for training large models. Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin |
| 2021 | DFOGraph: an I/O- and communication-efficient system for distributed fully-out-of-core graph processing. Jiping Yu, Wei Qin, Xiaowei Zhu, Zhenbo Sun, Jianqiang Huang, Xiaohan Li, Wenguang Chen |
| 2021 | Dynamic scaling for low-precision learning. Ruobing Han, Min Si, James Demmel, Yang You |
| 2021 | EGEMM-TC: accelerating scientific computing on tensor cores with extended precision. Boyuan Feng, Yuke Wang, Guoyang Chen, Weifeng Zhang, Yuan Xie, Yufei Ding |
| 2021 | Efficient algorithms for persistent transactional memory. Pedro Ramalhete, Andreia Correia, Pascal Felber |
| 2021 | Efficiently reclaiming memory in concurrent search data structures while bounding wasted memory. Daniel Solomon, Adam Morrison |
| 2021 | Efficiently running SpMV on long vector architectures. Constantino Gómez, Filippo Mantovani, Erich Focht, Marc Casas |
| 2021 | Exploring deep reuse in winograd CNN inference. Ruofan Wu, Feng Zhang, Zhen Zheng, Xiaoyong Du, Xipeng Shen |
| 2021 | Extending MapReduce framework with locality keys. Yifeng Chen, Bei Wang, Xiaolin Wang |
| 2021 | Extracting clean performance models from tainted programs. Marcin Copik, Alexandru Calotoiu, Tobias Grosser, Nicolas Wicki, Felix Wolf, Torsten Hoefler |
| 2021 | FFT blitz: the tensor cores strike back. Sultan Durrani, Muhammad Saad Chughtai, Abdul Dakkak, Wen-Mei Hwu, Lawrence Rauchwerger |
| 2021 | GPTune: multitask learning for autotuning exascale applications. Yang Liu, Wissam M. Sid-Lakhdar, Osni Marques, Xinran Zhu, Chang Meng, James Weldon Demmel, Xiaoye S. Li |
| 2021 | I/O lower bounds for auto-tuning of convolutions in CNNs. Xiaoyang Zhang, Junmin Xiao, Guangming Tan |
| 2021 | Improving communication by optimizing on-node data movement with data layout. Tuowen Zhao, Mary W. Hall, Hans Johansen, Samuel Williams |
| 2021 | In-situ workflow auto-tuning through combining component models. Tong Shu, Yanfei Guo, Justin M. Wozniak, Xiaoning Ding, Ian T. Foster, Tahsin M. Kurç |
| 2021 | Investigating the semantics of futures in transactional memory systems. Jingna Zeng, Shady Issa, Paolo Romano, Luís E. T. Rodrigues, Seif Haridi |
| 2021 | Lightweight preemptive user-level threads. Shumpei Shiina, Shintaro Iwasaki, Kenjiro Taura, Pavan Balaji |
| 2021 | Modernizing parallel code with pattern analysis. Roberto Castañeda Lozano, Murray Cole, Björn Franke |
| 2021 | NBR: neutralization based reclamation. Ajay Singh, Trevor Brown, Ali José Mashtizadeh |
| 2021 | On group mutual exclusion for dynamic systems. Shreyas Gokhale, Sahil Dhoked, Neeraj Mittal |
| 2021 | On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization. Grzegorz Kwasniewski, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Timo Schneider, Maciej Besta, Torsten Hoefler |
| 2021 | OrcGC: automatic lock-free memory reclamation. Andreia Correia, Pedro Ramalhete, Pascal Felber |
| 2021 | PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Virtual Event, Republic of Korea, February 27 - March 3, 2021. Jaejin Lee, Erez Petrank |
| 2021 | Parallel binary code analysis. Xiaozhu Meng, Jonathon M. Anderson, John M. Mellor-Crummey, Mark W. Krentel, Barton P. Miller, Srdan Milakovic |
| 2021 | Reasoning about recursive tree traversals. Yanjun Wang, Jinwei Liu, Dalin Zhang, Xiaokang Qiu |
| 2021 | Scaling implicit parallelism via dynamic control replication. Michael Bauer, Wonchan Lee, Elliott Slaughter, Zhihao Jia, Mario Di Renzo, Manolis Papadakis, Galen M. Shipman, Patrick S. McCormick, Michael Garland, Alex Aiken |
| 2021 | ShadowVM: accelerating data plane for data analytics with bare metal CPUs and GPUs. Zhifang Li, Mingcong Han, Shangwei Wu, Chuliang Weng |
| 2021 | Simplifying low-level GPU programming with GAS. Da Yan, Wei Wang, Xiaowen Chu |
| 2021 | Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory. Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li, Jiajia Li |
| 2021 | Synthesizing optimal collective algorithms. Zixian Cai, Zhengyang Liu, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi |
| 2021 | TurboTransformers: an efficient GPU serving system for transformer models. Jiarui Fang, Yang Yu, Chengduo Zhao, Jie Zhou |
| 2021 | Understanding a program's resiliency through error propagation. Zhimin Li, Harshitha Menon, Kathryn M. Mohror, Peer-Timo Bremer, Yarden Livnat, Valerio Pascucci |
| 2021 | Understanding and bridging the gaps in current GNN performance optimizations. Kezhao Huang, Jidong Zhai, Zhen Zheng, Youngmin Yi, Xipeng Shen |
| 2021 | Verifying C11-style weak memory libraries. Sadegh Dalvandi, Brijesh Dongol |