| 2019 | A GPU memory efficient speed-up scheme for training ultra-deep neural networks: poster. Jinrong Guo, Wantao Liu, Wang Wang, Qu Lu, Songlin Hu, Jizhong Han, Ruixuan Li |
| 2019 | A coordinated tiling and batching framework for efficient GEMM on GPUs. Xiuhong Li, Yun Liang, Shengen Yan, Liancheng Jia, Yinghan Li |
| 2019 | A distributed hypervisor for resource aggregation: poster. Yubin Chen, Zhuocheng Ding, Jin Zhang, Yun Wang, Zhengwei Qi, Haibing Guan |
| 2019 | A pattern based algorithmic autotuner for graph processing on GPUs. Ke Meng, Jiajia Li, Guangming Tan, Ninghui Sun |
| 2019 | A round-efficient distributed betweenness centrality algorithm. Loc Hoang, Matteo Pontecorvi, Roshan Dathathri, Gurbinder Gill, Bozhi You, Keshav Pingali, Vijaya Ramachandran |
| 2019 | A specialized B-tree for concurrent datalog evaluation. Herbert Jordan, Pavle Subotic, David Zhao, Bernhard Scholz |
| 2019 | Accelerating distributed stochastic gradient descent with adaptive periodic parameter averaging: poster. Peng Jiang, Gagan Agrawal |
| 2019 | Adaptive sparse matrix-matrix multiplication on the GPU. Martin Winter, Daniel Mlakar, Rhaleb Zayer, Hans-Peter Seidel, Markus Steinberger |
| 2019 | Adaptive sparse tiling for sparse matrix multiplication. Changwan Hong, Aravind Sukumaran-Rajam, Israt Nisa, Kunal Singh, P. Sadayappan |
| 2019 | Automated multi-dimensional elasticity for streaming runtimes: poster. Xiang Ni, Scott Schneider, Raju Pavuluri, Jonathan Kaus, Kun-Lung Wu |
| 2019 | BASMAT: bottleneck-aware sparse matrix-vector multiplication auto-tuning on GPGPUs. Athena Elafrou, Georgios I. Goumas, Nectarios Koziris |
| 2019 | Beyond human-level accuracy: computational challenges in deep learning. Joel Hestness, Newsha Ardalani, Gregory F. Diamos |
| 2019 | Blockchain abstract data type: poster. Emmanuelle Anceaume, Antonella Del Pozzo, Romaric Ludinard, Maria Potop-Butucaru, Sara Tucci Piergiovanni |
| 2019 | Building parallel programming language constructs in the AbleC extensible C compiler framework: a PPoPP tutorial. Travis Carlson, Eric Van Wyk |
| 2019 | Checking linearizability using hitting families. Burcu Kulahcioglu Ozkan, Rupak Majumdar, Filip Niksic |
| 2019 | Compiler-assisted adaptive program scheduling in big.LITTLE systems: poster. Marcelo Novaes, Vinicius Petrucci, Abdoulaye Gamatié, Fernando Magno Quintão Pereira |
| 2019 | Corrected trees for reliable group communication. Martin Küttler, Maksym Planeta, Jan Bierbaum, Carsten Weinhold, Hermann Härtig, Amnon Barak, Torsten Hoefler |
| 2019 | Creating repeatable, reusable experimentation pipelines with popper: tutorial. Ivo Jimenez, Jay F. Lofstead, Carlos Maltzahn |
| 2019 | CuLDA_CGS: solving large-scale LDA problems on GPUs. Xiaolong Xie, Yun Liang, Xiuhong Li, Wei Tan |
| 2019 | Data-flow/dependence profiling for structured transformations. Fabian Gruber, Manuel Selva, Diogo Sampaio, Christophe Guillon, Antoine Moynault, Louis-Noël Pouchet, Fabrice Rastello |
| 2019 | Efficient race detection with futures. Robert Utterback, Kunal Agrawal, Jeremy T. Fineman, I-Ting Angelina Lee |
| 2019 | Encapsulated open nesting for STM: fine-grained higher-level conflict detection. Martin Bättig, Thomas R. Gross |
| 2019 | Engineering a high-performance GPU B-Tree. Muhammad A. Awad, Saman Ashkiani, Rob Johnson, Martin Farach-Colton, John D. Owens |
| 2019 | Exploiting the input sparsity to accelerate deep neural networks: poster. Xiao Dong, Lei Liu, Guangli Li, Jiansong Li, Peng Zhao, Xueying Wang, Xiaobing Feng |
| 2019 | GOPipe: a granularity-oblivious programming framework for pipelined stencil executions on GPU. Chanyoung Oh, Zhen Zheng, Xipeng Shen, Jidong Zhai, Youngmin Yi |
| 2019 | GPOP: a cache and memory-efficient framework for graph processing over partitions. Kartik Lakhotia, Rajgopal Kannan, Sourav Pati, Viktor K. Prasanna |
| 2019 | GPU-based 3D cryo-EM reconstruction with key-value streams: poster. Kunpeng Wang, Shizhen Xu, Hongkun Yu, Haohuan Fu, Guangwen Yang |
| 2019 | Harmonia: a high throughput B+tree for GPUs. Zhaofeng Yan, Yuzhe Lin, Lu Peng, Weihua Zhang |
| 2019 | High performance distributed deep learning: a beginner's guide. Dhabaleswar K. Panda, Ammar Ahmad Awan, Hari Subramoni |
| 2019 | High-throughput image alignment for connectomics using frugal snap judgments: poster. Tim Kaler, Brian Wheatman, Sarah Wooders |
| 2019 | Implementing parallel and concurrent tree structures. Yihan Sun, Guy E. Blelloch |
| 2019 | Incremental flattening for nested data parallelism. Troels Henriksen, Frederik Thorøe, Martin Elsman, Cosmin E. Oancea |
| 2019 | LOFT: lock-free transactional data structures. Avner Elizarov, Guy Golan-Gueta, Erez Petrank |
| 2019 | Leveraging hardware TM in Haskell. Ryan Yates, Michael L. Scott |
| 2019 | Lightweight hardware transactional memory profiling. Qingsen Wang, Pengfei Su, Milind Chabbi, Xu Liu |
| 2019 | Lock-free channels for programming via communicating sequential processes: poster. Nikita Koval, Dan Alistarh, Roman Elizarov |
| 2019 | Making concurrent algorithms detectable: poster. Naama Ben-David, Guy E. Blelloch, Michal Friedman, Yuanhao Wei |
| 2019 | Managing application parallelism via parallel efficiency regulation: poster. Sharanyan Srikanthan, Princeton Ferro, Sayak Chakraborti, Sandhya Dwarkadas |
| 2019 | Modular transactions: bounding mixed races in space and time. Brijesh Dongol, Radha Jagadeesan, James Riely |
| 2019 | Optimizing GPU programs by register demotion: poster. Putt Sakdhnagool, Amit Sabne, Rudolf Eigenmann |
| 2019 | Optimizing computation-communication overlap in asynchronous task-based programs: poster. Emilio Castillo, Nikhil Jain, Marc Casas, Miquel Moretó, Martin Schulz, Ramón Beivide, Mateo Valero, Abhinav Bhatele |
| 2019 | Optimizing graph processing on GPUs using approximate computing: poster. Somesh Singh, Rupesh Nasre |
| 2019 | Performance portable C++ programming with RAJA. David Beckingsale, Richard D. Hornung, Tom Scogland, Arturo Vargas |
| 2019 | Proactive work stealing for futures. Kyle Singer, Yifan Xu, I-Ting Angelina Lee |
| 2019 | Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019, Washington, DC, USA, February 16-20, 2019. Jeffrey K. Hollingsworth, Idit Keidar |
| 2019 | Processing transactions in a predefined order. Mohamed M. Saad, Masoomeh Javidi Kishi, Shihao Jing, Sandeep Hans, Roberto Palmieri |
| 2019 | Profiling based out-of-core hybrid method for large neural networks: poster. Yuki Ito, Haruki Imai, Tung D. Le, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo |
| 2019 | Programming quantum computers: a primer with IBM Q and D-Wave exercises. Frank Mueller, Greg Byrd, Patrick Dreher |
| 2019 | Provably and practically efficient granularity control. Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, Mike Rainey |
| 2019 | QTLS: high-performance TLS asynchronous offload framework with Intel® QuickAssist technology. Xiaokang Hu, Changzheng Wei, Jian Li, Brian Will, Ping Yu, Lu Gong, Haibing Guan |
| 2019 | S-EnKF: co-designing for scalable ensemble Kalman filter. Junmin Xiao, Shijie Wang, Weiqiang Wan, Xuehai Hong, Guangming Tan |
| 2019 | SEP-graph: finding shortest execution paths for graph processing under a hybrid framework on GPU. Hao Wang, Liang Geng, Rubao Lee, Kaixi Hou, Yanfeng Zhang, Xiaodong Zhang |
| 2019 | Scheduling HPC workloads on heterogeneous-ISA architectures: poster. Mohamed Lamine Karaoui, Anthony Carno, Robert Lyerly, Sang-Hoon Kim, Pierre Olivier, Changwoo Min, Binoy Ravindran |
| 2019 | Semantics-aware scheduling policies for synchronization determinism. Qi Zhao, Zhengyi Qiu, Guoliang Jin |
| 2019 | Stretching the capacity of hardware transactional memory in IBM POWER architectures. Ricardo Filipe, Shady Issa, Paolo Romano, João Barreto |
| 2019 | T-thinker: a task-centric distributed framework for compute-intensive divide-and-conquer algorithms. Da Yan, Guimu Guo, Md Mashiur Rahman Chowdhury, M. Tamer Özsu, John C. S. Lui, Weida Tan |
| 2019 | Throughput-oriented GPU memory allocation. Isaac Gelado, Michael Garland |
| 2019 | Toward efficient architecture-independent algorithms for dynamic programs: poster. Mohammad Mahdi Javanmard, Pramod Ganapathi, Rathish Das, Zafar Ahmad, Stephen L. Tschudi, Rezaul Chowdhury |
| 2019 | Transitive joins: a sound and efficient online deadlock-avoidance policy. Caleb Voss, Tiago Cogumbreiro, Vivek Sarkar |
| 2019 | VEBO: a vertex- and edge-balanced ordering heuristic to load balance parallel graph processing. Jiawen Sun, Hans Vandierendonck, Dimitrios S. Nikolopoulos |
| 2019 | Verifying C11 programs operationally. Simon Doherty, Brijesh Dongol, Heike Wehrheim, John Derrick |