| 2016 | A high-performance parallel algorithm for nonnegative matrix factorization. Ramakrishnan Kannan, Grey Ballard, Haesun Park |
| 2016 | A programming system for future proofing performance critical libraries. Li-Wen Chang, Izzat El Hajj, Hee-Seok Kim, Juan Gómez-Luna, Abdul Dakkak, Wen-mei W. Hwu |
| 2016 | A scalable lock-free hash table with open addressing. Jesper Puge Nielsen, Sven Karlsson |
| 2016 | A wait-free queue as fast as fetch-and-add. Chaoran Yang, John M. Mellor-Crummey |
| 2016 | AUTOGEN: automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs. Rezaul Alam Chowdhury, Pramod Ganapathi, Jesmin Jahan Tithi, Charles Bachmeier, Bradley C. Kuszmaul, Charles E. Leiserson, Armando Solar-Lezama, Yuan Tang |
| 2016 | Adding approximate counters. Guy L. Steele Jr., Jean-Baptiste Tristan |
| 2016 | Affinity-aware work-stealing for integrated CPU-GPU processors. Naila Farooqui, Rajkishore Barik, Brian T. Lewis, Tatiana Shpeisman, Karsten Schwan |
| 2016 | An interval constrained memory allocator for the Givy GAS runtime. François Gindraud, Fabrice Rastello, Albert Cohen, François Broquedis |
| 2016 | Articulation points guided redundancy elimination for betweenness centrality. Lei Wang, Fan Yang, Liangji Zhuang, Huimin Cui, Fang Lv, Xiaobing Feng |
| 2016 | Be my guest: MCS lock now welcomes guests. Tianzheng Wang, Milind Chabbi, Hideaki Kimura |
| 2016 | Benchmarking weak memory models. Carl G. Ritson, Scott Owens |
| 2016 | CUDA acceleration for Xen virtual machines in InfiniBand clusters with rCUDA. Javier Prades, Carlos Reaño, Federico Silla |
| 2016 | Causal consistency: beyond memory. Matthieu Perrin, Achour Mostéfaoui, Claude Jard |
| 2016 | Coarse grain parallelization of deep neural networks. Marc González Tallada |
| 2016 | Concurrent hash tables: fast and general?! Tobias Maier, Peter Sanders, Roman Dementiev |
| 2016 | Contention-conscious, locality-preserving locks. Milind Chabbi, John M. Mellor-Crummey |
| 2016 | DSMR: a shared and distributed memory algorithm for single-source shortest path problem. Saeed Maleki, Donald Nguyen, Andrew Lenharth, María Jesús Garzarán, David A. Padua, Keshav Pingali |
| 2016 | Data-centric combinatorial optimization of parallel code. Hao Luo, Guoyang Chen, Pengcheng Li, Chen Ding, Xipeng Shen |
| 2016 | Declarative coordination of graph-based parallel programs. Flávio Cruz, Ricardo Rocha, Seth Copen Goldstein |
| 2016 | Distributed Halide. Tyler Denniston, Shoaib Kamil, Saman P. Amarasinghe |
| 2016 | DomLock: a new multi-granularity locking technique for hierarchies. Saurabh Kalikar, Rupesh Nasre |
| 2016 | Drinking from both glasses: combining pessimistic and optimistic tracking of cross-thread dependences. Man Cao, Minjia Zhang, Aritra Sengupta, Michael D. Bond |
| 2016 | ESTIMA: extrapolating scalability of in-memory applications. Georgios Chatzopoulos, Aleksandar Dragojevic, Rachid Guerraoui |
| 2016 | Effect of portable fine-grained locality on energy efficiency and performance in concurrent search trees. Ibrahim Umar, Otto J. Anshus, Phuong Hoai Ha |
| 2016 | Efficient distributed workstealing via matchmaking. Hrushit Parikh, Vinit Deodhar, Ada Gavrilovska, Santosh Pande |
| 2016 | Exploiting accelerators for efficient high dimensional similarity search. Sandeep R. Agrawal, Christopher M. Dee, Alvin R. Lebeck |
| 2016 | GPU multisplit. Saman Ashkiani, Andrew A. Davidson, Ulrich Meyer, John D. Owens |
| 2016 | Generic messages: capability-based shared memory parallelism for event-loop systems. Luca Salucci, Daniele Bonetta, Stefan Marr, Walter Binder |
| 2016 | Grain graphs: OpenMP performance analysis made easy. Ananya Muddukrishna, Peter A. Jonsson, Artur Podobas, Mats Brorsson |
| 2016 | Gunrock: a high-performance graph processing library on the GPU. Yangzihao Wang, Andrew A. Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, John D. Owens |
| 2016 | High performance model based image reconstruction. Xiao Wang, Amit Sabne, Sherman J. Kisner, Anand Raghunathan, Charles A. Bouman, Samuel P. Midkiff |
| 2016 | Hybrid CPU-GPU scheduling and execution of tree traversals. Jianqiao Liu, Nikhil Hegde, Milind Kulkarni |
| 2016 | Improving efficacy of internal binary search trees using local recovery. Arunmoezhi Ramachandran, Neeraj Mittal |
| 2016 | Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing. Tiziano De Matteis, Gabriele Mencagli |
| 2016 | Lease/release: architectural support for scaling contended data structures. Syed Kamran Haider, William Hasenplaugh, Dan Alistarh |
| 2016 | Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format. Duane Merrill, Michael Garland |
| 2016 | Multi-core on-the-fly SCC decomposition. Vincent Bloemen, Alfons Laarman, Jaco van de Pol |
| 2016 | NUMA-aware scheduling and memory allocation for data-flow task-parallel applications. Andi Drebes, Antoniu Pop, Karine Heydemann, Nathalie Drach, Albert Cohen |
| 2016 | OPR: deterministic group replay for one-sided communication. Xuehai Qian, Koushik Sen, Paul Hargrove, Costin Iancu |
| 2016 | On designing NUMA-aware concurrency control for scalable transactional memory. Mohamed Mohamedin, Roberto Palmieri, Sebastiano Peluso, Binoy Ravindran |
| 2016 | On ordering transaction commit. Mohamed M. Saad, Roberto Palmieri, Binoy Ravindran |
| 2016 | Optimistic concurrency with OPTIK. Rachid Guerraoui, Vasileios Trigonakis |
| 2016 | Parallel type-checking with Haskell using saturating LVars and stream generators. Ryan R. Newton, Ömer S. Agacan, Peter P. Fogg, Sam Tobin-Hochstadt |
| 2016 | Preemption-aware planning on big-data systems. Marco Rabozzi, Matteo Mazzucchelli, Roberto Cordone, Giovanni Matteo Fumarola, Marco D. Santambrogio |
| 2016 | Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, Barcelona, Spain, March 12-16, 2016. Rafael Asenjo, Tim Harris |
| 2016 | Production-guided concurrency debugging. Nuno Machado, Brandon Lucia, Luís E. T. Rodrigues |
| 2016 | Refined transactional lock elision. Dave Dice, Alex Kogan, Yossi Lev |
| 2016 | SPIRIT: a runtime system for distributed irregular tree applications. Nikhil Hegde, Jianqiao Liu, Milind Kulkarni |
| 2016 | Samsara parallel: a non-BSP parallel-in-time model. Yifeng Chen, Kun Huang, Bei Wang, Guohui Li, Xiang Cui |
| 2016 | Scalable adaptive NUMA-aware lock: combining local locking and remote locking for efficient concurrency. Mingzhe Zhang, Francis C. M. Lau, Cho-Li Wang, Luwei Cheng, Haibo Chen |
| 2016 | The virtues of conflict: analysing modern concurrency. Ganesh Narayanaswamy, Saurabh Joshi, Daniel Kroening |
| 2016 | Tidex: a mutual exclusion lock. Pedro Ramalhete, Andreia Correia |
| 2016 | Unifying fixed code and fixed data mapping of load-imbalanced pipelined loops. Aristeidis Mastoras, Thomas R. Gross |
| 2016 | User-assisted storage reuse determination for dynamic task graphs. Mehmet Can Kurt, Bin Ren, Sriram Krishnamoorthy, Gagan Agrawal |
| 2016 | Verification of MPI Java programs using software model checking. Waqas ur Rehman, Muhammad Sohaib Ayub, Junaid Haroon Siddiqui |
| 2016 | Work stealing for interactive services to meet target latency. Jing Li, Kunal Agrawal, Sameh Elnikety, Yuxiong He, I-Ting Angelina Lee, Chenyang Lu, Kathryn S. McKinley |