| 2012 | An analysis of computational workloads for the ORNL Jaguar system. Wayne Joubert, Shi-Quan Su |
| 2012 | An efficient work-distribution strategy for gridding radio-telescope data on GPUs. John W. Romein |
| 2012 | An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs. Jiajia Li, Xingjian Li, Guangming Tan, Mingyu Chen, Ninghui Sun |
| 2012 | Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors. Nishkam Ravi, Yi Yang, Tao Bao, Srimat T. Chakradhar |
| 2012 | Blue Gene/Q: design for sustained multi-petaflop computing. Michael Gschwind |
| 2012 | CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures. Quan Chen, Minyi Guo, Zhiyi Huang |
| 2012 | CRQ-based fair scheduling on composable multicore architectures. Tao Sun, Hong An, Tao Wang, Haibo Zhang, Xiufeng Sui |
| 2012 | CVP: an energy-efficient indirect branch prediction with compiler-guided value pattern. Mingxing Tan, Xianhua Liu, Tong Tong, Xu Cheng |
| 2012 | Channel borrowing: an energy-efficient nanophotonic crossbar architecture with light-weight arbitration. Yi Xu, Jun Yang, Rami G. Melhem |
| 2012 | Characterizing and improving the use of demand-fetched caches in GPUs. Wenhao Jia, Kelly A. Shaw, Margaret Martonosi |
| 2012 | Collective algorithms for sub-communicators. Anshul Mittal, Nikhil Jain, Thomas George, Yogish Sabharwal, Sameer Kumar |
| 2012 | Composable, non-blocking collective operations on power7 IH. Gabriel Ilie Tanase, Gheorghe Almási, Hanhong Xue, Charles Archer |
| 2012 | Congestion avoidance on manycore high performance computing systems. Miao Luo, Dhabaleswar K. Panda, Khaled Z. Ibrahim, Costin Iancu |
| 2012 | Data-driven fault tolerance for work stealing computations. Wenjing Ma, Sriram Krishnamoorthy |
| 2012 | Distributed replay protocol for distributed uniprocessors. Mengjie Mao, Hong An, Bobin Deng, Tao Sun, Xuechao Wei, Wei Zhou, Wenting Han |
| 2012 | Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems. Fengguang Song, Stanimire Tomov, Jack J. Dongarra |
| 2012 | Enhancing the performance of assisted execution runtime systems through hardware/software techniques. Gokcen Kestor, Roberto Gioiosa, Osman S. Unsal, Adrián Cristal, Mateo Valero |
| 2012 | Exploiting communication and packaging locality for cost-effective large scale networks. Keith D. Underwood, Eric Borch |
| 2012 | Fast loop-level data dependence profiling. Hongtao Yu, Zhiyuan Li |
| 2012 | Fault resilience of the algebraic multi-grid solver. Marc Casas-Guix, Bronis R. de Supinski, Greg Bronevetsky, Martin Schulz |
| 2012 | Fault tolerant preconditioned conjugate gradient for sparse linear system solution. Manu Shantharam, Sowmyalatha Srinivasmurthy, Padma Raghavan |
| 2012 | GPU merge path: a GPU merging algorithm. Oded Green, Robert McColl, David A. Bader |
| 2012 | Hardware support for enforcing isolation in lock-based parallel programs. Paruj Ratanaworabhan, Martin Burtscher, Darko Kirovski, Benjamin G. Zorn |
| 2012 | HiRe: using hint & release to improve synchronization of speculative threads. Liang Han, Xiaowei Jiang, Wei Liu, Youfeng Wu, James Tuck |
| 2012 | High performance supercomputers: should the individual processor be more than a brick? Yale N. Patt |
| 2012 | High-performance code generation for stencil computations on GPU architectures. Justin Holewinski, Louis-Noël Pouchet, P. Sadayappan |
| 2012 | International Conference on Supercomputing, ICS'12, Venice, Italy, June 25-29, 2012. Utpal Banerjee, Kyle A. Gallivan, Gianfranco Bilardi, Manolis Katevenis |
| 2012 | Locality & utility co-optimization for practical capacity management of shared last level caches. Dongyuan Zhan, Hong Jiang, Sharad C. Seth |
| 2012 | Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities. Nagendra Dwarakanath Gulur, R. Manikantan, Mahesh Mehendale, R. Govindarajan |
| 2012 | On the communication complexity of 3D FFTs and its implications for Exascale. Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Kartik Iyer, P.-K. Yeung, Richard W. Vuduc |
| 2012 | One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation. Ziyu Guo, Bo Wu, Xipeng Shen |
| 2012 | Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture. Janani Mukundan, Saugata Ghose, Robert Karmazin, Engin Ipek, José F. Martínez |
| 2012 | Quantifying the effectiveness of load balance algorithms. Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Nancy M. Amato |
| 2012 | SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters. Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee |
| 2012 | Space-round tradeoffs for MapReduce computations. Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal |
| 2012 | Sparse matrix-vector multiply on the HICAMP architecture. John P. Stevenson, Amin Firoozshahian, Alex Solomatnikov, Mark Horowitz, David R. Cheriton |
| 2012 | UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique. Somayeh Sardashti, David A. Wood |
| 2012 | Unified memory optimizing architecture: memory subsystem control with a unified predictor. Yasuo Ishii, Mary Inaba, Kei Hiraki |
| 2012 | clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs. Bor-Yiing Su, Kurt Keutzer |