| 2020 | GPU initiated OpenSHMEM: correct and efficient intra-kernel networking for dGPUs. Khaled Hamidouche, Michael LeBeane |
| 2020 | A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs. Peng Jiang, Changwan Hong, Gagan Agrawal |
| 2020 | A parallel sparse tensor benchmark suite on CPUs and GPUs. Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Catherine Olschanowsky, Kevin J. Barker |
| 2020 | A supernodal all-pairs shortest path algorithm. Piyush Sao, Ramakrishnan Kannan, Prasun Gera, Richard W. Vuduc |
| 2020 | A tool for top-down performance analysis of GPU-accelerated applications. Keren Zhou, Mark Krentel, John M. Mellor-Crummey |
| 2020 | A wait-free universal construction for large objects. Andreia Correia, Pedro Ramalhete, Pascal Felber |
| 2020 | ArcherGear: data race equivalencing for expeditious HPC debugging. Samuel Thayer, Ganesh Gopalakrishnan, Ian Briggs, Michael Bentley, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee |
| 2020 | Breaking master-slave model between host and FPGAs. Jaume Bosch, Miquel Vidal, Antonio Filgueras, Carlos Álvarez, Daniel Jiménez-González, Xavier Martorell, Eduard Ayguadé |
| 2020 | Detecting and reproducing error-code propagation bugs in MPI implementations. Daniel DeFreez, Antara Bhowmick, Ignacio Laguna, Cindy Rubio-González |
| 2020 | ELDA: LDA made efficient via algorithm-system codesign. Shilong Wang, Da Li, Hengyong Yu, Hang Liu |
| 2020 | Fast concurrent data sketches. Arik Rinberg, Alexander Spiegelman, Edward Bortnikov, Eshcar Hillel, Idit Keidar, Lee Rhodes, Hadar Serviansky |
| 2020 | Functional faults. Gali Sheffi, Erez Petrank |
| 2020 | Identifying scalability bottlenecks for large-scale parallel programs with graph analysis. Yuyang Jin, Haojie Wang, Xiongchao Tang, Torsten Hoefler, Xu Liu, Jidong Zhai |
| 2020 | Increasing the parallelism of graph coloring via shortcutting. Ghadeer Alabandi, Evan Powers, Martin Burtscher |
| 2020 | Kite: efficient and available release consistency for the datacenter. Vasilis Gavrielatos, Antonios Katsarakis, Vijay Nagarajan, Boris Grot, Arpit Joshi |
| 2020 | MatRox: modular approach for improving data locality in hierarchical (Mat)rix App(Rox)imation. Bangtian Liu, Kazem Cheshmi, Saeed Soori, Michelle Mills Strout, Maryam Mehri Dehnavi |
| 2020 | Neighbor-list-free molecular dynamics on Sunway TaihuLight supercomputer. Xiaohui Duan, Ping Gao, Meng Zhang, Tingjian Zhang, Hongsong Meng, Yuxuan Li, Bertil Schmidt, Haohuan Fu, Lin Gan, Wei Xue, Guangwen Yang, Weiguo Liu |
| 2020 | Nesting and composition in transactional data structure libraries. Gal Assa, Hagar Meir, Guy Golan-Gueta, Idit Keidar, Alexander Spiegelman |
| 2020 | No barrier in the road: a comprehensive study and optimization of ARM barriers. Nian Liu, Binyu Zang, Haibo Chen |
| 2020 | Non-blocking interpolation search trees with doubly-logarithmic running time. Trevor Brown, Aleksandar Prokopec, Dan Alistarh |
| 2020 | Nonblocking persistent software transactional memory. H. Alan Beadle, Wentao Cai, Haosen Wen, Michael L. Scott |
| 2020 | Oak: a scalable off-heap allocated key-value map. Hagar Meir, Dmitry Basin, Edward Bortnikov, Anastasia Braginsky, Yonatan Gottesman, Idit Keidar, Eran Meir, Gali Sheffi, Yoav Zuriel |
| 2020 | On the fly MHP analysis. Sonali Saha, V. Krishna Nandivada |
| 2020 | Optimizing GPU programs by partial evaluation. Aleksey Tyurin, Daniil Berezun, Semyon V. Grigorev |
| 2020 | Optimizing batched Winograd convolution on GPUs. Da Yan, Wei Wang, Xiaowen Chu |
| 2020 | Overlapping host-to-device copy and computation using hidden unified memory. Jaehoon Jung, Daeyoung Park, Youngdong Do, Jungho Park, Jaejin Lee |
| 2020 | PLUM: static parallel program locality analysis under uniform multiplexing. Fangzhou Liu, Dong Chen, Wesley Smith, Chen Ding |
| 2020 | PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, California, USA, February 22-26, 2020. Rajiv Gupta, Xipeng Shen |
| 2020 | Parallel and distributed bounded model checking of multi-threaded programs. Omar Inverso, Catia Trubiani |
| 2020 | Parallel determinacy race detection for futures. Yifan Xu, Kyle Singer, I-Ting Angelina Lee |
| 2020 | Practical parallel hypergraph algorithms. Julian Shun |
| 2020 | Reflector: a fine-grained I/O tracker for HPC systems. Abdullah Al-Mamun, Jialin Liu, Tonglin Li, Quincey Koziol, Zhongyi Zhai, Junyan Qian, Haoting Shen, Dongfang Zhao |
| 2020 | Restricted memory-friendly lock-free bounded queues. Nikita Koval, Vitaly Aksenov |
| 2020 | Revisiting Linpack algorithm on large-scale CPU-GPU heterogeneous systems. Chaoyang Shui, Xianzhi Yu, Yujin Yan, Yinshan Wang, Ke Meng, Guangming Tan |
| 2020 | Scalable top-k retrieval with Sparta. Gali Sheffi, Dmitry Basin, Edward Bortnikov, David Carmel, Idit Keidar |
| 2020 | Scaling concurrent queues by using HTM to profit from failed atomic operations. Or Ostrovsky, Adam Morrison |
| 2020 | Scaling out speculative execution of finite-state machines with parallel merge. Yang Xia, Peng Jiang, Gagan Agrawal |
| 2020 | Taming unbalanced training workloads in deep learning with partial collective operations. Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler |
| 2020 | Testing concurrency on the JVM with lincheck. Nikita Koval, Maria Sokolova, Alexander Fedorov, Dan Alistarh, Dmitry Tsitelov |
| 2020 | Understand the overheads of storage data structures on persistent memory. Abdullah Al Raqibul Islam, Dong Dai |
| 2020 | Understanding and optimizing persistent memory allocation. Wentao Cai, Haosen Wen, H. Alan Beadle, Mohammad Hedayati, Michael L. Scott |
| 2020 | Universal wait-free memory reclamation. Ruslan Nikolaev, Binoy Ravindran |
| 2020 | Using sample-based time series data for automated diagnosis of scalability losses in parallel programs. Lai Wei, John M. Mellor-Crummey |
| 2020 | XIndex: a scalable learned index for multicore data storage. Chuzhe Tang, Youyun Wang, Zhiyuan Dong, Gansen Hu, Zhaoguo Wang, Minjie Wang, Haibo Chen |
| 2020 | YewPar: skeletons for exact combinatorial search. Blair Archibald, Patrick Maier, Rob Stewart, Phil Trinder |
| 2020 | spECK: accelerating GPU sparse matrix-matrix multiplication through lightweight analysis. Mathias Parger, Martin Winter, Daniel Mlakar, Markus Steinberger |
| 2020 | waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data. Jiannan Tian, Sheng Di, Chengming Zhang, Xin Liang, Sian Jin, Dazhao Cheng, Dingwen Tao, Franck Cappello |