| 2014 | A run-time power manager exploiting software parallelism. Simon Holmbacka, Sébastien Lafond, Johan Lilius |
| 2014 | A runtime support mechanism for fast mode switching of a self-morphing core for power efficiency. Sudarshan Srinivasan, Nithesh Kurella, Israel Koren, Rance Rodrigues, Sandip Kundu |
| 2014 | ADHA: automatic data layout framework for heterogeneous architectures. Deepak Majeti, Kuldeep S. Meel, Rajkishore Barik, Vivek Sarkar |
| 2014 | ATCache: reducing DRAM cache latency via a small SRAM tag cache. Cheng-Chieh Huang, Vijay Nagarajan |
| 2014 | Active learning accelerated automatic heuristic construction for parallel program mapping. William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather |
| 2014 | Adaptive heterogeneous scheduling for integrated GPUs. Rashid Kaleem, Rajkishore Barik, Tatiana Shpeisman, Brian T. Lewis, Chunling Hu, Keshav Pingali |
| 2014 | An event-based language for dynamic binary translation frameworks. Serguei Makarov, Angela Demke Brown, Ashvin Goel |
| 2014 | ArrayTool: a lightweight profiler to guide array regrouping. Xu Liu, Kamal Sharma, John M. Mellor-Crummey |
| 2014 | Automatic execution of single-GPU computations across multiple GPUs. Javier Cabezas, Lluís Vilanova, Isaac Gelado, Thomas B. Jablin, Nacho Navarro, Wen-mei W. Hwu |
| 2014 | Automatic optimization of thread-coarsening for graphics processors. Alberto Magni, Christophe Dubach, Michael F. P. O'Boyle |
| 2014 | Automatic parallelism through macro dataflow in high-level array languages. Pushkar Ratnalikar, Arun Chauhan |
| 2014 | Bitwise data parallelism in regular expression matching. Robert D. Cameron, Thomas C. Shermer, Arrvindh Shriraman, Kenneth S. Herdy, Dan Lin, Benjamin R. Hull, Meng Lin |
| 2014 | Bounded memory scheduling of dynamic task graphs. Dragos Sbirlea, Zoran Budimlic, Vivek Sarkar |
| 2014 | CAWS: criticality-aware warp scheduling for GPGPU workloads. Shin-Ying Lee, Carole-Jean Wu |
| 2014 | COLORIS: a dynamic cache partitioning system using page coloring. Ying Ye, Richard West, Zhuoqun Cheng, Ye Li |
| 2014 | Coarrays in GNU Fortran. Alessandro Fanfarillo, Tobias Burnus, Valeria Cardellini, Salvatore Filippone, Dan Nagle, Damian W. I. Rouson |
| 2014 | Compiler support for selective page migration in NUMA architectures. Guilherme Piccoli, Henrique Nazaré Santos, Raphael Ernani Rodrigues, Christiane Pousa, Edson Borin, Fernando Magno Quintão Pereira |
| 2014 | Consolidated conflict detection for hardware transactional memory. Lihang Zhao, Jeffrey T. Draper |
| 2014 | Cooperative cache scrubbing. Jennifer B. Sartor, Wim Heirman, Stephen M. Blackburn, Lieven Eeckhout, Kathryn S. McKinley |
| 2014 | D Davoud Anoushe Jamshidi, Mehrzad Samadi, Scott A. Mahlke |
| 2014 | Data remapping for an energy efficient burst chop in DRAM memory systems. Sudharsan Jagathrakshakan, Venkata Kalyan Tavva, Madhu Mutyam |
| 2014 | Data-reuse optimizations for pipelined tiling with parametric tile sizes. Alexandre Isoard |
| 2014 | DeSTM: harnessing determinism in STMs for application development. Kaushik Ravichandran, Ada Gavrilovska, Santosh Pande |
| 2014 | Design for scalability in enterprise SSDs. Arash Tavakkol, Mohammad Arjomand, Hamid Sarbazi-Azad |
| 2014 | Design of a hybrid MPI-CUDA benchmark suite for CPU-GPU clusters. Tejaswi Agarwal, Michela Becchi |
| 2014 | Domain-specific models for innovation in analytics. Bob Blainey |
| 2014 | EFetch: optimizing instruction fetch for event-driven web applications. Gaurav Chadha, Scott A. Mahlke, Satish Narayanasamy |
| 2014 | From petascale to the pocket: Adaptively scaling parallel programs for mobile SoCs. Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger |
| 2014 | Graph-based performance accounting for chip multiprocessor memory systems. Magnus Jahre |
| 2014 | Heterogeneous microarchitectures trump voltage scaling for low-power cores. Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Ronald G. Dreslinski, Thomas F. Wenisch, Scott A. Mahlke |
| 2014 | ILP and TLP in shared memory applications: a limit study. Ehsan Fatehi, Paul Gratz |
| 2014 | Improving performance of streaming applications with filtering and control messages. Peng Li, Jeremy Buhler |
| 2014 | International Conference on Parallel Architectures and Compilation, PACT '14, Edmonton, AB, Canada, August 24-27, 2014. José Nelson Amaral, Josep Torrellas |
| 2014 | Internet of mobile things: challenges and opportunities. Klara Nahrstedt |
| 2014 | Invyswell: a hybrid transactional memory for Haswell's restricted transactional memory. Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy |
| 2014 | KLA: a new algorithmic paradigm for parallel graph computations. Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger |
| 2014 | LCA: a memory link and cache-aware co-scheduling approach for CMPs. Alexandros-Herodotos Haritatos, Georgios I. Goumas, Nikos Anastopoulos, Konstantinos Nikas, Kornilios Kourtis, Nectarios Koziris |
| 2014 | Locality-aware memory association for multi-target worksharing in OpenMP. Thomas R. W. Scogland, Wu-chun Feng |
| 2014 | Measuring flexibility in single-ISA heterogeneous processors. Erik Tomusk, Christophe Dubach, Michael F. P. O'Boyle |
| 2014 | Memory scheduling towards high-throughput cooperative heterogeneous computing. Hao Wang, Ripudaman Singh, Michael J. Schulte, Nam Sung Kim |
| 2014 | OpenTuner: an extensible framework for program autotuning. Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O'Reilly, Saman P. Amarasinghe |
| 2014 | Optimizing stencil code via locality of computation. Yulong Luo, Guangming Tan |
| 2014 | PATS: pattern aware scheduling and power gating for GPGPUs. Qiumin Xu, Murali Annavaram |
| 2014 | PEMOGEN: automatic adaptive performance modeling during program runtime. Arnamoy Bhattacharyya, Torsten Hoefler |
| 2014 | Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels. Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil |
| 2014 | Processing big data graphs on memory-restricted systems. Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger |
| 2014 | Protection and utilization in shared cache through rationing. Raj Parihar, Jacob Brock, Chen Ding, Michael C. Huang |
| 2014 | RCS: runtime resource and core scaling for power-constrained multi-core processors. Hamid Reza Ghasemi, Nam Sung Kim |
| 2014 | Realm: an event-based low-level runtime for distributed memory architectures. Sean Treichler, Michael Bauer, Alex Aiken |
| 2014 | Rollback-free value prediction with approximate loads. Bradley Thwaites, Gennady Pekhimenko, Hadi Esmaeilzadeh, Amir Yazdanbakhsh, Onur Mutlu, Jongse Park, Girish Mururu, Todd C. Mowry |
| 2014 | SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling. Bo Wu, Guoyang Chen, Dong Li, Xipeng Shen, Jeffrey S. Vetter |
| 2014 | SQRL: hardware accelerator for collecting software data structures. Snehasish Kumar, Arrvindh Shriraman, Vijayalakshmi Srinivasan, Dan Lin, Jordon Phillips |
| 2014 | Shuffling: a framework for lock contention aware thread scheduling for multicore multiprocessor systems. Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan |
| 2014 | SpongeDirectory: flexible sparse directories utilizing multi-level memristors. Lunkai Zhang, Dmitri B. Strukov, Hebatallah Saadeldeen, Dongrui Fan, Mingzhe Zhang, Diana Franklin |
| 2014 | Stratified sampling for even workload partitioning. Jeeva Paudel, José Nelson Amaral |
| 2014 | Tiling and optimizing time-iterated computations on periodic domains. Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, Nicolas Vasilache |
| 2014 | Trading cache hit rate for memory performance. Wei Ding, Mahmut T. Kandemir, Diana R. Guttman, Adwait Jog, Chita R. Das, Praveen Yedlapalli |
| 2014 | Using STT-RAM to enable energy-efficient near-threshold chip multiprocessors. Xiang Pan, Radu Teodorescu |
| 2014 | VAST: the illusion of a large memory space for GPUs. Janghaeng Lee, Mehrzad Samadi, Scott A. Mahlke |
| 2014 | Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs. Rahul Garg, Laurie J. Hendren |
| 2014 | Versatile and scalable parallel histogram construction. Wookeun Jung, Jongsoo Park, Jaejin Lee |
| 2014 | Virtues and limitations of commodity hardware transactional memory. Nuno Diegues, Paolo Romano, Luís E. T. Rodrigues |
| 2014 | Warp-aware trace scheduling for GPUs. James A. Jablin, Thomas B. Jablin, Onur Mutlu, Maurice Herlihy |
| 2014 | What is the cost of weak determinism? Cedomir Segulja, Tarek S. Abdelrahman |
| 2014 | XStream: cross-core spatial streaming based MLC prefetchers for parallel applications in CMPs. Biswabandan Panda, Shankar Balachandran |
| 2014 | kMAF: automatic kernel-level management of thread and data affinity. Matthias Diener, Eduardo Henrique Molina da Cruz, Philippe Olivier Alexandre Navaux, Anselm Busse, Hans-Ulrich Heiß |