| 2015 | A collection-oriented programming model for performance portability. Saurav Muralidharan, Michael Garland, Bryan Catanzaro, Albert Sidelnik, Mary W. Hall |
| 2015 | A framework for practical parallel fast matrix multiplication. Austin R. Benson, Grey Ballard |
| 2015 | A hierarchical approach to reducing communication in parallel graph algorithms. Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger |
| 2015 | A library for portable and composable data locality optimizations for NUMA systems. Zoltan Majó, Thomas R. Gross |
| 2015 | A parallel algorithm for global states enumeration in concurrent systems. Yen-Jung Chang, Vijay K. Garg |
| 2015 | A programming model and runtime system for significance-aware energy-efficient computing. Vassilis Vassiliadis, Konstantinos Parasyris, Charalambos Chalios, Christos D. Antonopoulos, Spyros Lalis, Nikolaos Bellas, Hans Vandierendonck, Dimitrios S. Nikolopoulos |
| 2015 | An OpenACC-based unified programming model for multi-accelerator systems. Jungwon Kim, Seyong Lee, Jeffrey S. Vetter |
| 2015 | Are web applications ready for parallelism? Cosmin Radoi, Stephan Herhut, Jaswanth Sreeram, Danny Dig |
| 2015 | Automatic scalable atomicity via semantic locking. Guy Golan-Gueta, G. Ramalingam, Mooly Sagiv, Eran Yahav |
| 2015 | Barrier elision for production parallel programs. Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushik Sen, John M. Mellor-Crummey, Costin Iancu |
| 2015 | CASTLE: fast concurrent internal binary search tree using edge-based locking. Arunmoezhi Ramachandran, Neeraj Mittal |
| 2015 | Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency. Yuan Tang, Ronghui You, Haibin Kan, Jesmin Jahan Tithi, Pramod Ganapathi, Rezaul Alam Chowdhury |
| 2015 | Combining phase identification and statistic modeling for automated parallel benchmark generation. Ye Jin, Mingliang Liu, Xiaosong Ma, Qing Liu, Jeremy Logan, Norbert Podhorszki, Jong Youl Choi, Scott Klasky |
| 2015 | Decoupled load balancing. Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Nancy M. Amato |
| 2015 | Diagnosing the causes and severity of one-sided message contention. Nathan R. Tallent, Abhinav Vishnu, Hubertus Van Dam, Jeff Daily, Darren J. Kerbyson, Adolfy Hoisie |
| 2015 | Distributed memory code generation for mixed Irregular/Regular computations. Mahesh Ravishankar, Roshan Dathathri, Venmugil Elango, Louis-Noël Pouchet, J. Ramanujam, Atanas Rountev, P. Sadayappan |
| 2015 | Dynamic deadlock verification for general barrier synchronisation. Tiago Cogumbreiro, Raymond Hu, Francisco Martins, Nobuko Yoshida |
| 2015 | Efficient and reasonable object-oriented concurrency. Scott West, Sebastian Nanz, Bertrand Meyer |
| 2015 | Fence placement for legacy data-race-free programs via synchronization read detection. Andrew J. McPherson, Vijay Nagarajan, Susmit Sarkar, Marcelo Cintra |
| 2015 | GStream: a graph streaming processing method for large-scale graphs on GPUs. Hyunseok Seo, Jinwook Kim, Min-Soo Kim |
| 2015 | Gunrock: a high-performance graph processing library on the GPU. Yangzihao Wang, Andrew A. Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, John D. Owens |
| 2015 | High performance locks for multi-level NUMA systems. Milind Chabbi, Michael W. Fagan, John M. Mellor-Crummey |
| 2015 | JAWS: a JavaScript framework for adaptive CPU-GPU work sharing. Xianglan Piao, Channoh Kim, Younghwan Oh, Huiying Li, Jincheon Kim, Hanjun Kim, Jae W. Lee |
| 2015 | Low-overhead software transactional memory with progress guarantees and strong semantics. Minjia Zhang, Jipeng Huang, Man Cao, Michael D. Bond |
| 2015 | MPI+Threads: runtime contention and remedies. Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji, Satoshi Matsuoka |
| 2015 | More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms. Vincent Gramoli |
| 2015 | NUMA-aware graph-structured analytics. Kaiyuan Zhang, Rong Chen, Haibo Chen |
| 2015 | On optimizing machine learning workloads via kernel fusion. Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan |
| 2015 | Optimization of asynchronous graph processing on GPU with hybrid coloring model. Xuanhua Shi, Junling Liang, Sheng Di, Bingsheng He, Hai Jin, Lu Lu, Zhixiang Wang, Xuan Luo, Jianlong Zhong |
| 2015 | PLUTO+: near-complete modeling of affine transformations for parallelism and locality. Aravind Acharya, Uday Bondhugula |
| 2015 | Performance implications of dynamic memory allocators on transactional memory systems. Alexandro Baldassin, Edson Borin, Guido Araujo |
| 2015 | Predicate RCU: an RCU for scalable concurrent updates. Maya Arbel, Adam Morrison |
| 2015 | Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, USA, February 7-11, 2015. Albert Cohen, David Grove |
| 2015 | SYNC or ASYNC: time to fuse for distributed graph-parallel computation. Chenning Xie, Rong Chen, Haibing Guan, Binyu Zang, Haibo Chen |
| 2015 | Scalable and efficient implementation of 3d unstructured meshes computation: a case study on matrix assembly. Loïc Thébault, Eric Petit, Quang Dinh |
| 2015 | Section based program analysis to reduce overhead of detecting unsynchronized thread communication. Madan Mohan Das, Gabriel Southern, Jose Renau |
| 2015 | SemCache++: semantics-aware caching for efficient multi-GPU offloading. Nabeel AlSaber, Milind Kulkarni |
| 2015 | Software partitioning of hardware transactions. Lingxiang Xiang, Michael L. Scott |
| 2015 | Static/Dynamic validation of MPI collective communications in multi-threaded context. Emmanuelle Saillard, Patrick Carribault, Denis Barthou |
| 2015 | The SprayList: a scalable relaxed priority queue. Dan Alistarh, Justin Kopinsky, Jerry Li, Nir Shavit |
| 2015 | The lazy happens-before relation: better partial-order reduction for systematic concurrency testing. Paul Thomson, Alastair F. Donaldson |
| 2015 | The lock-free k-LSM relaxed priority queue. Martin Wimmer, Jakob Gruber, Jesper Larsson Träff, Philippas Tsigas |
| 2015 | Tiles: a new language mechanism for heterogeneous parallelism. Yifeng Chen, Xiang Cui, Hong Mei |
| 2015 | Towards batched linear solvers on accelerated hardware platforms. Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, Jack J. Dongarra |
| 2015 | VirtCL: a framework for OpenCL device abstraction and management. Yi-Ping You, Hen-Jung Wu, Yeh-Ning Tsai, Yen-Ting Chao |