| 2012 | A low-overhead dynamic optimization framework for multicores. Christopher W. Fletcher, Rachael Harding, Omer Khan, Srinivas Devadas |
| 2012 | A software memory partition approach for eliminating bank-level interference in multicore systems. Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu |
| 2012 | A yoke of oxen and a thousand chickens for heavy lifting graph processing. Abdullah Gharaibeh, Lauro Beltrão Costa, Elizeu Santos-Neto, Matei Ripeanu |
| 2012 | APCR: an adaptive physical channel regulator for on-chip interconnects. Lei Wang, Poornachandran Kumar, Ki Hwan Yum, Eun Jung Kim |
| 2012 | Acceleration of bulk memory operations in a heterogeneous multicore architecture. Jong-Hyuk Lee, Ziyi Liu, Xiaonan Tian, Dong Hyuk Woo, Weidong Shi, Dainis Boumber, Yonghong Yan, Kyeong-An Kwon |
| 2012 | Application-aware prefetch prioritization in on-chip networks. Nachiappan Chidambaram Nachiappan, Asit K. Mishra, Mahmut T. Kandemir, Anand Sivasubramaniam, Onur Mutlu, Chita R. Das |
| 2012 | Application-to-core mapping policies to reduce memory interference in multi-core systems. Reetuparna Das, Rachata Ausavarungnirun, Onur Mutlu, Akhilesh Kumar, Mani Azimi |
| 2012 | Auto-parallelizing stateful distributed streaming applications. Scott Schneider, Martin Hirzel, Bugra Gedik, Kun-Lung Wu |
| 2012 | Bandwidth bandit: quantitative characterization of memory contention. David Eklov, Nikos Nikoleris, David Black-Schaffer, Erik Hagersten |
| 2012 | Base-delta-immediate compression: practical data compression for on-chip caches. Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry |
| 2012 | Boost.SIMD: generic programming for portable SIMDization. Pierre Estérie, Mathias Gaunard, Joel Falcou, Jean-Thierry Lapresté, Brigitte Rozoy |
| 2012 | Branch and data herding: reducing control and memory divergence for error-tolerant GPU applications. John Sartori, Rakesh Kumar |
| 2012 | Chrysalis analysis: incorporating synchronization arcs in dataflow-analysis-based parallel monitoring. Michelle L. Goodstein, Shimin Chen, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry |
| 2012 | Coalition threading: combining traditional and non-traditional parallelism to maximize scalability. Md. Kamruzzaman, Steven Swanson, Dean M. Tullsen |
| 2012 | Compiling to avoid communication. Kathy Yelick |
| 2012 | Complexity-effective multicore coherence. Alberto Ros, Stefanos Kaxiras |
| 2012 | Database analytics acceleration using FPGAs. Bharat Sukhwani, Hong Min, Mathew Thoennes, Parijat Dube, Balakrishna Iyer, Bernard Brezzo, Donna Dillenberger, Sameh W. Asaad |
| 2012 | Design of a storage processing unit. Peng Li, Kevin Gomez, David J. Lilja |
| 2012 | Efficient techniques for predicting cache sharing and throughput. Andreas Sandberg, David Black-Schaffer, Erik Hagersten |
| 2012 | Energy-efficient cache partitioning for future CMPs. Karthik T. Sundararajan, Timothy M. Jones, Nigel P. Topham |
| 2012 | Energy-efficient workload mapping in heterogeneous systems with multiple types of resources. Cong Liu |
| 2012 | Enhancing performance optimization of multicore chips and multichip nodes with data structure metrics. Ashay Rane, James C. Browne |
| 2012 | Evaluation of Blue Gene/Q hardware support for transactional memories. Amy Wang, Matthew Gaudet, Peng Wu, José Nelson Amaral, Martin Ohmacht, Christopher Barton, Raúl Silvera, Maged M. Michael |
| 2012 | Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme. Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil |
| 2012 | Fine-grained parallel traversals of irregular data structures. Bin Ren, Gagan Agrawal, James R. Larus, Todd Mytkowicz, Tomi Poutanen, Wolfram Schulte |
| 2012 | HaLock: hardware-assisted lock contention detection in multithreaded applications. Yongbing Huang, Zehan Cui, Licheng Chen, Wenli Zhang, Yungang Bao, Mingyu Chen |
| 2012 | Hardware acceleration in the IBM PowerEN processor: architecture and performance. Anil Krishna, Timothy Heil, Nicholas Lindberg, Farnaz Toussi, Steven VanderWiel |
| 2012 | Hardware prefetchers for emerging parallel applications. Biswabandan Panda, Shankar Balachandran |
| 2012 | High-performance analysis of filtered semantic graphs. Aydin Buluç, Armando Fox, John R. Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams |
| 2012 | Inference and declaration of independence: impact on deterministic task parallelism. Foivos S. Zakkak, Dimitrios Chasapis, Polyvios Pratikakis, Angelos Bilas, Dimitrios S. Nikolopoulos |
| 2012 | Integrating nanophotonics in GPU microarchitecture. Nilanjan Goswami, Zhongqi Li, Ajit Verma, Ramkumar Shankar, Tao Li |
| 2012 | International Conference on Parallel Architectures and Compilation Techniques, PACT '12, Minneapolis, MN, USA - September 19 - 23, 2012. Pen-Chung Yew, Sangyeun Cho, Luiz DeRose, David J. Lilja |
| 2012 | Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches. Mainak Chaudhuri, Jayesh Gaur, Nithiyanandan Bashyam, Sreenivas Subramoney, Joseph Nuzman |
| 2012 | Layout-oblivious optimization for matrix computations. Huimin Cui, Qing Yi, Jingling Xue, Xiaobing Feng |
| 2012 | Linearly compressed pages: a main memory compression framework with low complexity and low latency. Gennady Pekhimenko, Todd C. Mowry, Onur Mutlu |
| 2012 | Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads. Vijay Sathish, Michael J. Schulte, Nam Sung Kim |
| 2012 | LumiNOC: a power-efficient, high-performance, photonic network-on-chip for future parallel architectures. Cheng Li, Mark Browning, Paul V. Gratz, Samuel Palermo |
| 2012 | MaSiF: machine learning guided auto-tuning of parallel skeletons. Alexander Collins, Christian Fensch, Hugh Leather |
| 2012 | Making data prefetch smarter: adaptive prefetching on POWER7. Víctor Jiménez, Roberto Gioiosa, Francisco J. Cazorla, Alper Buyuktosunoglu, Pradip Bose, Francis P. O'Connell |
| 2012 | Making it practical and effective: fast and precise may-happen-in-parallel analysis. Congming Chen, Wei Huo, Xiaobing Feng |
| 2012 | Many-thread aware instruction-level parallelism: architecting shader cores for GPU computing. Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, Huiyang Zhou |
| 2012 | Mileage-based contention management in transactional memory. Woojin Choi, Lihang Zhao, Jeff Draper |
| 2012 | Multi2Sim: a simulation framework for CPU-GPU computing. Rafael Ubal, Byunghyun Jang, Perhaad Mistry, Dana Schaa, David R. Kaeli |
| 2012 | Off-chip access localization for NoC-based multicores. Wei Ding, Mahmut T. Kandemir, Yuanrui Zhang, Emre Kultursay |
| 2012 | Optimal bypass monitor for high performance last-level caches. Lingda Li, Dong Tong, Zichao Xie, Junlin Lu, Xu Cheng |
| 2012 | Optimizing datacenter power with memory system levers for guaranteed quality-of-service. Kshitij Sudan, Sadagopan Srinivasan, Rajeev Balasubramonian, Ravi R. Iyer |
| 2012 | PEPON: performance-aware hierarchical power budgeting for NoC based multicores. Akbar Sharifi, Asit K. Mishra, Shekhar Srikantaiah, Mahmut T. Kandemir, Chita R. Das |
| 2012 | PGCapping: exploiting power gating for power capping and core lifetime balancing in CMPs. Kai Ma, Xiaorui Wang |
| 2012 | PS-Dir: a scalable two-level directory cache. Joan J. Valls, Alberto Ros, Julio Sahuquillo, María Engracia Gómez, José Duato |
| 2012 | Phase-based scheduling and thread migration for heterogeneous multicore processors. Lina Sawalha, Ronald D. Barnes |
| 2012 | Pointy: a hybrid pointer prefetcher for managed runtime systems. Ioana Burcea, Livio Soares, Andreas Moshovos |
| 2012 | Power-aware multi-core simulation for early design stage hardware/software co-optimization. Wim Heirman, Souradip Sarkar, Trevor E. Carlson, Ibrahim Hur, Lieven Eeckhout |
| 2012 | Power-efficient computing for compute-intensive GPGPU applications. Syed Zohaib Gilani, Nam Sung Kim, Michael J. Schulte |
| 2012 | Power-efficient time-sensitive mapping in heterogeneous systems. Cong Liu, Jian Li, Wei Huang, Juan Rubio, Evan Speight, Xiaozhu Lin |
| 2012 | Practically private: enabling high performance CMPs through compiler-assisted data classification. Yong Li, Rami G. Melhem, Alex K. Jones |
| 2012 | Probabilistic diagnosis of performance faults in large-scale parallel applications. Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin |
| 2012 | RISE: improving the streaming processors reliability against soft errors in GPGPUs. Jingweijia Tan, Xin Fu |
| 2012 | ReCaP: a region-based cure for the common cold cache. Jason Zebchuk, Harold W. Cain, Vijayalakshmi Srinivasan, Andreas Moshovos |
| 2012 | Riposte: a trace-driven compiler and parallel VM for vector code in R. Justin Talbot, Zachary DeVito, Pat Hanrahan |
| 2012 | Runtime detection and optimization of collective communication patterns. Torsten Hoefler, Timo Schneider |
| 2012 | Sandboxing transactional memory. Luke Dalessandro, Michael L. Scott |
| 2012 | Scalability-based manycore partitioning. Hiroshi Sasaki, Teruo Tanimoto, Koji Inoue, Hiroshi Nakamura |
| 2012 | Shared memory multiplexing: a novel way to improve GPGPU throughput. Yi Yang, Ping Xiang, Mike Mantor, Norm Rubin, Huiyang Zhou |
| 2012 | SkipCache: miss-rate aware cache management. Kanakagiri Raghavendra, Tripti S. Warrier, Madhu Mutyam |
| 2012 | Speculative dynamic vectorization for HW/SW co-designed processors. Rakesh Kumar, Alejandro Martínez, Antonio González |
| 2012 | Speculative parallelization needs rigor: probabilistic analysis for optimal speculation of finite-state machine applications. Zhijia Zhao, Bo Wu, Xipeng Shen |
| 2012 | Strategies based on green policies to the grid resource allocation. Fábio Coutinho, Luís Alfredo V. de Carvalho |
| 2012 | Supporting stateful tasks in a dataflow graph. Vladimir Gajinov, Srdjan Stipic, Osman S. Unsal, Tim Harris, Eduard Ayguadé, Adrián Cristal |
| 2012 | System-level power-performance efficiency modeling for emergent GPU architectures. Shuaiwen Song, Kirk W. Cameron |
| 2012 | TMNOC: a case of HTM and NoC co-design for increased energy efficiency and concurrency. Lihang Zhao, Woojin Choi, Jeffrey T. Draper |
| 2012 | The changing role of supercomputing. Peter J. Ungaro |
| 2012 | The evicted-address filter: a unified mechanism to address both cache pollution and thrashing. Vivek Seshadri, Onur Mutlu, Michael A. Kozuch, Todd C. Mowry |
| 2012 | Top500 versus sustained performance: the top problems with the Top500 list - and what to do about them. William T. C. Kramer |
| 2012 | Transactional event profiling in a best-effort hardware transactional memory system. Matthew Gaudet, José Nelson Amaral |
| 2012 | Transactional prefetching: narrowing the window of contention in hardware transactional memory. Anurag Negi, Adrià Armejach, Adrián Cristal, Osman S. Unsal, Per Stenström |
| 2012 | Transparent runtime deadlock elimination. Hari K. Pyla, Srinidhi Varadarajan |
| 2012 | Using combined profiling to decide when thread level speculation is profitable. Arnamoy Bhattacharyya |
| 2012 | Visualizing transactional memory. Justin Emile Gottschlich, Maurice Herlihy, Gilles Pokam, Jeremy G. Siek |
| 2012 | Workload and power budget partitioning for single-chip heterogeneous processors. Hao Wang, Vijay Sathish, Ripudaman Singh, Michael J. Schulte, Nam Sung Kim |
| 2012 | XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems. Ronald G. Dreslinski, Thomas Manville, Korey Sewell, Reetuparna Das, Nathaniel Ross Pinckney, Sudhir Satpathy, David T. Blaauw, Dennis Sylvester, Trevor N. Mudge |