| 2016 | A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs. Nitin Chugh, Vinay Vasista, Suresh Purini, Uday Bondhugula |
| 2016 | A Static Cut-off for Task Parallel Programs. Shintaro Iwasaki, Kenjiro Taura |
| 2016 | Accelerating Linked-list Traversal Through Near-Data Processing. Byungchul Hong, Gwangsun Kim, Jung Ho Ahn, Yongkee Kwon, Hongsik Kim, John Kim |
| 2016 | Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading. Zhen Jia, Chao Xue, Guancheng Chen, Jianfeng Zhan, Lixin Zhang, Yonghua Lin, H. Peter Hofstee |
| 2016 | Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs. Gwangsun Kim, Jiyun Jeong, John Kim, Mark Stephenson |
| 2016 | Big Data Analytics on Flash Storage with Accelerators. Arvind |
| 2016 | Bridging the Semantic Gaps of GPU Acceleration for Scale-out CNN-based Big Data Processing: Think Big, See Small. Mingcong Song, Yang Hu, Yunlong Xu, Chao Li, Huixiang Chen, Jingling Yuan, Tao Li |
| 2016 | CAF: Core to Core Communication Acceleration Framework. Yipeng Wang, Ren Wang, Andrew Herdrich, James Tsai, Yan Solihin |
| 2016 | Characterizing and Optimizing the Performance of Multithreaded Programs Under Interference. Yong Zhao, Jia Rao, Qing Yi |
| 2016 | Combating the Reliability Challenge of GPU Register File at Low Supply Voltage. Jingweijia Tan, Shuaiwen Leon Song, Kaige Yan, Xin Fu, Andrés Márquez, Darren J. Kerbyson |
| 2016 | EXCITE-VM: Extending the Virtual Memory System to Support Snapshot Isolation Transactions. Heiner Litz, Benjamin Braun, David R. Cheriton |
| 2016 | Energy Aware Persistence: Reducing Energy Overheads of Memory-based Persistence in NVMs. Sudarsun Kannan, Moinuddin K. Qureshi, Ada Gavrilovska, Karsten Schwan |
| 2016 | Fusion of Parallel Array Operations. Mads Ruben Burgdorff Kristensen, Simon Andreas Frimann Lund, Troels Blum, James Avery |
| 2016 | Greater Performance and Better Efficiency: Predicated Execution has shown us the way. Yale N. Patt |
| 2016 | Hash Map Inlining. Dibakar Gope, Mikko H. Lipasti |
| 2016 | Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding. Bruno Bodin, Luigi Nardi, M. Zeeshan Zia, Harry Wagstaff, Govind Sreekar Shenoy, Murali Krishna Emani, John Mawer, Christos Kotselidis, Andy Nisbet, Mikel Luján, Björn Franke, Paul H. J. Kelly, Michael F. P. O'Boyle |
| 2016 | MicroSpec: Speculation-Centric Fine-Grained Parallelization for FSM Computations. Junqiao Qiu, Zhijia Zhao, Bin Ren |
| 2016 | OAWS: Memory Occlusion Aware Warp Scheduling. Bin Wang, Yue Zhu, Weikuan Yu |
| 2016 | Online Scalability Characterization of Data-Parallel Programs on Many Cores. Younghyun Cho, Surim Oh, Bernhard Egger |
| 2016 | Optimizing Indirect Memory References with milk. Vladimir Kiriansky, Yunming Zhang, Saman P. Amarasinghe |
| 2016 | POSTER: An Integrated Vector-Scalar Design on an In-order ARM Core. Milan Stanic, Oscar Palomar, Timothy Hayes, Ivan Ratkovic, Osman S. Unsal, Adrián Cristal, Mateo Valero |
| 2016 | POSTER: An Optimization of Dataflow Architectures for Scientific Applications. Xiaowei Shen, Xiaochun Ye, Xu Tan, Da Wang, Zhimin Zhang, Dongrui Fan, Zhimin Tang |
| 2016 | POSTER: Collective Dynamic Parallelism for Directive Based GPU Programming Languages and Compilers. Guray Ozen, Eduard Ayguadé, Jesús Labarta |
| 2016 | POSTER: Easy PRAM-based High-Performance Parallel Programming with ICE. Fady Ghanim, Rajeev Barua, Uzi Vishkin |
| 2016 | POSTER: Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics. Alberto Ros, Carl Leonardsson, Christos Sakalis, Stefanos Kaxiras |
| 2016 | POSTER: Exploiting Asymmetric Multi-Core Processors with Flexible System Software. Kallia Chronaki, Miquel Moretó, Marc Casas, Alejandro Rico, Rosa M. Badia, Eduard Ayguadé, Jesús Labarta, Mateo Valero |
| 2016 | POSTER: Fault-tolerant Execution on COTS Multi-core Processors with Hardware Transactional Memory Support. Florian Haas, Sebastian Weis, Theo Ungerer, Gilles Pokam, Youfeng Wu |
| 2016 | POSTER: Firestorm: Operating Systems for Power-Constrained Architectures. Sankaralingam Panneerselvam, Michael M. Swift |
| 2016 | POSTER: Fly-Over: A Light-Weight Distributed Power-Gating Mechanism For Energy-Efficient Networks-on-Chip. Rahul Boyapati, Jiayi Huang, Ningyuan Wang, Kyung Hoon Kim, Ki Hwan Yum, Eun Jung Kim |
| 2016 | POSTER: Hybrid Data Dependence Analysis for Loop Transformations. Diogo Nunes Sampaio, Alain Ketterlin, Louis-Noël Pouchet, Fabrice Rastello |
| 2016 | POSTER: Pagoda: A Runtime System to Maximize GPU Utilization in Data Parallel Tasks with Limited Parallelism. Tsung Tai Yeh, Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann, Timothy G. Rogers |
| 2016 | POSTER: SILC-FM: Subblocked InterLeaved Cache-Like Flat Memory Organization. Jee Ho Ryoo, Mitesh R. Meswani, Reena Panda, Lizy K. John |
| 2016 | POSTER: hVISC: A Portable Abstraction for Heterogeneous Parallel Systems. Prakalp Srivastava, Maria Kotsifakou, Matthew D. Sinclair, Rakesh Komuravelli, Vikram S. Adve, Sarita V. Adve |
| 2016 | POSTER: ξ-TAO: A Cache-centric Execution Model and Runtime for Deep Parallel Multicore Topologies. Miquel Pericàs |
| 2016 | Power Tuning HPC Jobs on Power-Constrained Systems. Neha Gholkar, Frank Mueller, Barry Rountree |
| 2016 | Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT 2016, Haifa, Israel, September 11-15, 2016. Ayal Zaks, Bilha Mendelson, Lawrence Rauchwerger, Wen-mei W. Hwu |
| 2016 | Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling. Paul Caheny, Marc Casas, Miquel Moretó, Hervé Gloaguen, Maxime Saintes, Eduard Ayguadé, Jesús Labarta, Mateo Valero |
| 2016 | Reduction Drawing: Language Constructs and Polyhedral Compilation for Reductions on GPU. Chandan Reddy, Michael Kruse, Albert Cohen |
| 2016 | Resource Conscious Reuse-Driven Tiling for GPUs. Prashant Singh Rawat, Changwan Hong, Mahesh Ravishankar, Vinod Grover, Louis-Noël Pouchet, Atanas Rountev, P. Sadayappan |
| 2016 | Rinnegan: Efficient Resource Use in Heterogeneous Architectures. Sankaralingam Panneerselvam, Michael M. Swift |
| 2016 | Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management. Andi Drebes, Antoniu Pop, Karine Heydemann, Albert Cohen, Nathalie Drach |
| 2016 | Scaling Data Analytics with Moore's Law. Kunle Olukotun |
| 2016 | Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities. Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Chita R. Das |
| 2016 | Sparso: Context-driven Optimizations of Sparse Linear Algebra. Hongbo Rong, Jongsoo Park, Lingxiang Xiang, Todd A. Anderson, Mikhail Smelyanskiy |
| 2016 | Speculatively Exploiting Cross-Invocation Parallelism. Jialu Huang, Prakash Prabhu, Thomas B. Jablin, Soumyadeep Ghosh, Sotiris Apostolakis, Jae W. Lee, David I. August |
| 2016 | Student Research Poster: A Low Complexity Cache Sharing Mechanism to Address System Fairness. Vicent Selfa, Julio Sahuquillo, Salvador Petit, María Engracia Gómez |
| 2016 | Student Research Poster: A Scalable General Purpose System for Large-Scale Graph Processing. Jiawen Sun |
| 2016 | Student Research Poster: Compiling Boolean Circuits to Non-deterministic Branching Programs to be Implemented by Light Switching Circuits. Vladislav Tartakovsky |
| 2016 | Student Research Poster: From Processing-in-Memory to Processing-in-Storage. Roman Kaplan |
| 2016 | Student Research Poster: Network Controller Emulation on a Sidecore for Unmodified Virtual Machines. Arthur Kiyanovski |
| 2016 | Student Research Poster: Slack-Aware Shared Bandwidth Management in GPUs. Saumay Dublish |
| 2016 | Student Research Poster: Software Out-of-Order Execution for In-Order Architectures. Kim-Anh Tran |
| 2016 | Tardis 2.0: Optimized Time Traveling Coherence for Relaxed Consistency Models. Xiangyao Yu, Hongzhe Liu, Ethan Zou, Srinivas Devadas |
| 2016 | Vectorization of Multibyte Floating Point Data Formats. Andrew Anderson, David Gregg |
| 2016 | WearCore: A Core for Wearable Workloads. Sanyam Mehta, Josep Torrellas |
| 2016 | μC-States: Fine-grained GPU Datapath Power Management. Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, Mahmut T. Kandemir, Gabriel H. Loh, Onur Mutlu, Chita R. Das |