| 2012 | A GPU implementation of inclusion-based points-to analysis. Mario Méndez-Lojo, Martin Burtscher, Keshav Pingali |
| 2012 | A hybrid approach of OpenMP for clusters. Okwan Kwon, Fahed Jubair, Rudolf Eigenmann, Samuel P. Midkiff |
| 2012 | A lock-free, array-based priority queue. Yujie Liu, Michael F. Spear |
| 2012 | A methodology for creating fast wait-free data structures. Alex Kogan, Erez Petrank |
| 2012 | A performance analysis framework for identifying potential benefits in GPGPU applications. Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, Richard W. Vuduc |
| 2012 | A speculation-friendly binary search tree. Tyler Crain, Vincent Gramoli, Michel Raynal |
| 2012 | A work-stealing scheduler for X10's task parallelism with suspension. Olivier Tardieu, Haichuan Wang, Haibo Lin |
| 2012 | Adapting the polyhedral model as a framework for efficient speculative parallelization. Alexandra Jimborean, Philippe Clauss, Benoît Pradelle, Luis Mastrangelo, Vincent Loechner |
| 2012 | Algorithm-based fault tolerance for dense matrix factorizations. Peng Du, Aurélien Bouteiller, George Bosilca, Thomas Hérault, Jack J. Dongarra |
| 2012 | An infrastructure for dynamic optimization of parallel programs. Albert Noll, Thomas R. Gross |
| 2012 | An overview of CMPI: network performance aware MPI in the cloud. Yifan Gong, Bingsheng He, Jianlong Zhong |
| 2012 | An overview of Medusa: simplified graph processing on GPUs. Jianlong Zhong, Bingsheng He |
| 2012 | Automatic communication optimizations through memory reuse strategies. Muthu Manikandan Baskaran, Nicolas Vasilache, Benoît Meister, Richard Lethin |
| 2012 | Automatic datatype generation and optimization. Fredrik Kjolstad, Torsten Hoefler, Marc Snir |
| 2012 | BDDT: block-level dynamic dependence analysis for deterministic task-based parallelism. George Tzenakis, Angelos Papatriantafyllou, John Kesapides, Polyvios Pratikakis, Hans Vandierendonck, Dimitrios S. Nikolopoulos |
| 2012 | CPHASH: a cache-partitioned hash table. Zviad Metreveli, Nickolai Zeldovich, M. Frans Kaashoek |
| 2012 | Collective algorithms for sub-communicators. Anshul Mittal, Nikhil Jain, Thomas George, Yogish Sabharwal, Sameer Kumar |
| 2012 | Communication avoiding successive band reduction. Grey Ballard, James Demmel, Nicholas Knight |
| 2012 | Communication-centric optimizations by dynamically detecting collective operations. Torsten Hoefler, Timo Schneider |
| 2012 | Concurrent breakpoints. Chang-Seo Park, Koushik Sen |
| 2012 | Concurrent tries with efficient non-blocking snapshots. Aleksandar Prokopec, Nathan Grasso Bronson, Phil Bagwell, Martin Odersky |
| 2012 | DOJ: dynamically parallelizing object-oriented programs. Yong Hun Eom, Stephen Yang, James Christopher Jenista, Brian Demsky |
| 2012 | Deterministic parallel random-number generation for dynamic-multithreading platforms. Charles E. Leiserson, Tao B. Schardl, Jim Sukha |
| 2012 | Efficient SIMD code generation for irregular kernels. Seonggun Kim, Hwansoo Han |
| 2012 | Efficient deadlock avoidance for streaming computation with filtering. Jeremy D. Buhler, Kunal Agrawal, Peng Li, Roger D. Chamberlain |
| 2012 | Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors. Sara S. Baghsorkhi, Isaac Gelado, Matthieu Delahaye, Wen-mei W. Hwu |
| 2012 | Establishing a Miniapp as a programmability proxy. Andrew Stone, John M. Dennis, Michelle Strout |
| 2012 | Extending a C-like language for portable SIMD programming. Roland Leißa, Sebastian Hack, Ingo Wald |
| 2012 | Faster topology-aware collective algorithms through non-minimal communication. Paul Sack, William Gropp |
| 2012 | FlexBFS: a parallelism-aware implementation of breadth-first search on GPU. Gu Liu, Hong An, Wenting Han, Xiaoqiang Li, Tao Sun, Wei Zhou, Xuechao Wei, Xulong Tang |
| 2012 | GKLEE: concolic verification and test generation for GPUs. Guodong Li, Peng Li, Geoffrey Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, Sreeranga P. Rajan |
| 2012 | GPU-based NFA implementation for memory efficient high speed regular expression matching. Yuan Zu, Ming Yang, Zhonghu Xu, Lin Wang, Xin Tian, Kunyang Peng, Qunfeng Dong |
| 2012 | Internally deterministic parallel algorithms can be fast. Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Julian Shun |
| 2012 | LHlf: lock-free linear hashing (poster paper). Donghui Zhang, Per-Åke Larson |
| 2012 | Lock cohorting: a general technique for designing NUMA locks. David Dice, Virendra J. Marathe, Nir Shavit |
| 2012 | Mechanizing the expert dense linear algebra developer. Bryan Marker, Andy Terrel, Jack Poulson, Don S. Batory, Robert A. van de Geijn |
| 2012 | NDetermin: inferring nondeterministic sequential specifications for parallelism correctness. Jacob Burnim, Tayfun Elmas, George C. Necula, Koushik Sen |
| 2012 | OpenCL as a unified programming model for heterogeneous CPU/GPU clusters. Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee |
| 2012 | OpenMP-style parallelism in data-centered multicore computing with R. Lei Jiang, Pragneshkumar B. Patel, George Ostrouchov, Ferdinand Jamitzky |
| 2012 | Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA. Christophe Alias, Alain Darte, Alexandru Plesco |
| 2012 | PARRAY: a unifying array representation for heterogeneous parallelism. Yifeng Chen, Xiang Cui, Hong Mei |
| 2012 | Performance analysis of parallel constraint-based local search. Yves Caniou, Daniel Diaz, Florian Richoux, Philippe Codognet, Salvador Abreu |
| 2012 | Portable parallel performance from sequential, productive, embedded domain-specific languages. Shoaib Kamil, Derrick Coetzee, Scott Beamer, Henry Cook, Ekaterina Gonina, Jonathan Harper, Jeffrey Morlan, Armando Fox |
| 2012 | Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, February 25-29, 2012. J. Ramanujam, P. Sadayappan |
| 2012 | Programming parallel embedded and consumer applications in OpenMP superscalar. Michael Andersch, Chi Ching Chi, Ben H. H. Juurlink |
| 2012 | RACECAR: a heuristic for automatic function specialization on multi-core heterogeneous systems. John Robert Wernsing, Greg Stitt |
| 2012 | Revisiting the combining synchronization technique. Panagiota Fatourou, Nikolaos D. Kallimanis |
| 2012 | S: a scripting language for high-performance RESTful web services. Daniele Bonetta, Achille Peternier, Cesare Pautasso, Walter Binder |
| 2012 | Scalable GPU graph traversal. Duane Merrill, Michael Garland, Andrew S. Grimshaw |
| 2012 | Scalable framework for mapping streaming applications onto multi-GPU systems. Huynh Phung Huynh, Andrei Hagiescu, Weng-Fai Wong, Rick Siow Mong Goh |
| 2012 | Scalable parallel debugging with statistical assertions. Minh Ngoc Dinh, David Abramson, Chao Jin, Andrew Gontarek, Bob Moench, Luiz De Rose |
| 2012 | Scalable parallel minimum spanning forest computation. Sadegh Nobari, Thanh-Tung Cao, Panagiotis Karras, Stéphane Bressan |
| 2012 | Speculative parallelization on GPGPUs. Min Feng, Rajiv Gupta, Laxmi N. Bhuyan |
| 2012 | Synchronization views for event-loop actors. Joeri De Koster, Stefan Marr, Theo D'Hondt |
| 2012 | The boat hull model: adapting the roofline model to enable performance prediction for parallel computing. Cedric Nugteren, Henk Corporaal |
| 2012 | Using GPU's to accelerate stencil-based computation kernels for the development of large scale scientific applications on heterogeneous systems. Jian Tao, Marek Blazewicz, Steven R. Brandt |
| 2012 | Verification of software barriers. Alexander Malkis, Anindya Banerjee |
| 2012 | Wait-free linked-lists. Shahar Timnat, Anastasia Braginsky, Alex Kogan, Erez Petrank |