| 2016 | AEQUITAS: Coordinated Energy Management Across Parallel Applications. Haris Ribic, Yu David Liu |
| 2016 | BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing. Linnan Wang, Wei Wu, Zenglin Xu, Jianxiong Xiao, Yi Yang |
| 2016 | Balanced Hashing and Efficient GPU Sparse General Matrix-Matrix Multiplication. Pham Nguyen Quang Anh, Rui Fan, Yonggang Wen |
| 2016 | Barrier-Aware Warp Scheduling for Throughput Processors. Yuxi Liu, Zhibin Yu, Lieven Eeckhout, Vijay Janapa Reddi, Yingwei Luo, Xiaolin Wang, Zhenlin Wang, Cheng-Zhong Xu |
| 2016 | Coherence-Free Multiview: Enabling Reference-Discerning Data Placement on GPU. Guoyang Chen, Xipeng Shen |
| 2016 | CuMAS: Data Transfer Aware Multi-Application Scheduling for Shared GPUs. Mehmet E. Belviranli, Farzad Khorasani, Laxmi N. Bhuyan, Rajiv Gupta |
| 2016 | DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem. Saeed Maleki, Donald Nguyen, Andrew Lenharth, María Jesús Garzarán, David A. Padua, Keshav Pingali |
| 2016 | Efficient Timestamp-Based Cache Coherence Protocol for Many-Core Architectures. Yuan Yao, Guanhua Wang, Zhiguo Ge, Tulika Mitra, Wenzhi Chen, Naxin Zhang |
| 2016 | Exploiting Dynamic Reuse Probability to Manage Shared Last-level Caches in CPU-GPU Heterogeneous Processors. Siddharth Rai, Mainak Chaudhuri |
| 2016 | Exploiting Private Local Memories to Reduce the Opportunity Cost of Accelerator Integration. Emilio G. Cota, Paolo Mantovani, Luca P. Carloni |
| 2016 | Fairness-oriented OS Scheduling Support for Multicore Systems. Changdae Kim, Jaehyuk Huh |
| 2016 | Fast Multiplication in Binary Fields on GPUs via Register Cache. Eli Ben-Sasson, Matan Hamilis, Mark Silberstein, Eran Tromer |
| 2016 | GCaR: Garbage Collection aware Cache Management with Improved Performance for Flash-based SSDs. Suzhen Wu, Yanping Lin, Bo Mao, Hong Jiang |
| 2016 | Galaxyfly: A Novel Family of Flexible-Radix Low-Diameter Topologies for Large-Scales Interconnection Networks. Fei Lei, Dezun Dong, Xiangke Liao, Xing Su, Cunlu Li |
| 2016 | Graph Prefetching Using Data Structure Knowledge. Sam Ainsworth, Timothy M. Jones |
| 2016 | GreenGear: Leveraging and Managing Server Heterogeneity for Improving Energy Efficiency in Green Data Centers. Xu Zhou, Haoran Cai, Qiang Cao, Hong Jiang, Lei Tian, Changsheng Xie |
| 2016 | HOPE: Enabling Efficient Service Orchestration in Software-Defined Data Centers. Yang Hu, Chao Li, Longjun Liu, Tao Li |
| 2016 | High Performance Design for HDFS with Byte-Addressability of NVM and RDMA. Nusrat Sharmin Islam, Md. Wasi-ur-Rahman, Xiaoyi Lu, Dhabaleswar K. Panda |
| 2016 | Hybrid CPU-GPU scheduling and execution of tree traversals. Jianqiao Liu, Nikhil Hegde, Milind Kulkarni |
| 2016 | Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication. Konstantina Mitropoulou, Vasileios Porpodas, Xiaochun Zhang, Timothy M. Jones |
| 2016 | Mini-Ckpts: Surviving OS Failures in Persistent Memory. David Fiala, Frank Mueller, Kurt B. Ferreira, Christian Engelmann |
| 2016 | Noise Aware Scheduling in Data Centers. Hameedah Sultan, Arpit Katiyar, Smruti R. Sarangi |
| 2016 | Optimizing Sparse Matrix-Vector Multiplication for Large-Scale Data Analytics. Daniele Buono, Fabrizio Petrini, Fabio Checconi, Xing Liu, Xinyu Que, Chris Long, Tai-Ching Tuan |
| 2016 | Origami: Folding Warps for Energy Efficient GPUs. Mohammad Abdel-Majeed, Daniel Wong, Justin Kuang, Murali Annavaram |
| 2016 | Parallel Transposition of Sparse Data Structures. Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng |
| 2016 | Peruse and Profit: Estimating the Accelerability of Loops. Snehasish Kumar, Vijayalakshmi Srinivasan, Amirali Sharifian, Nick Sumner, Arrvindh Shriraman |
| 2016 | Polly-ACC: Transparent compilation to heterogeneous hardware. Tobias Grosser, Torsten Hoefler |
| 2016 | Prefetching Techniques for Near-memory Throughput Processors. Reena Panda, Yasuko Eckert, Nuwan Jayasena, Onur Kayiran, Michael Boyer, Lizy Kurian John |
| 2016 | Proceedings of the 2016 International Conference on Supercomputing, ICS 2016, Istanbul, Turkey, June 1-3, 2016. Ozcan Ozturk, Kemal Ebcioglu, Mahmut T. Kandemir, Onur Mutlu |
| 2016 | Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks. Patrick Judd, Jorge Albericio, Tayler H. Hetherington, Tor M. Aamodt, Natalie D. Enright Jerger, Andreas Moshovos |
| 2016 | Replichard: Towards Tradeoff between Consistency and Performance for Metadata. Zhiying Li, Ruini Xue, Lixiang Ao |
| 2016 | Reusing Data Reorganization for Efficient SIMD Parallelization of Adaptive Irregular Applications. Peng Jiang, Linchuan Chen, Gagan Agrawal |
| 2016 | Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes. Dimitrios Chasapis, Marc Casas, Miquel Moretó, Martin Schulz, Eduard Ayguadé, Jesús Labarta, Mateo Valero |
| 2016 | SARVAVID: A Domain Specific Language for Developing Scalable Computational Genomics Applications. Kanak Mahadik, Christopher Wright, Jinyi Zhang, Milind Kulkarni, Saurabh Bagchi, Somali Chaterji |
| 2016 | SFU-Driven Transparent Approximation Acceleration on GPUs. Ang Li, Shuaiwen Leon Song, Mark Wijtvliet, Akash Kumar, Henk Corporaal |
| 2016 | SReplay: Deterministic Sub-Group Replay for One-Sided Communication. Xuehai Qian, Koushik Sen, Paul Hargrove, Costin Iancu |
| 2016 | Scheduling Tasks with Mixed Timing Constraints in GPU-Powered Real-Time Systems. Yunlong Xu, Rui Wang, Tao Li, Mingcong Song, Lan Gao, Zhongzhi Luan, Depei Qian |
| 2016 | Simulation and Analysis Engine for Scale-Out Workloads. Nadav Chachmon, Daniel Richins, Robert S. Cohn, Magnus Christensson, Wenzhi Cui, Vijay Janapa Reddi |
| 2016 | Tag-Split Cache for Efficient GPGPU Cache Utilization. Lingda Li, Ari B. Hayes, Shuaiwen Leon Song, Eddy Z. Zhang |
| 2016 | TokenTLB: A Token-Based Page Classification Approach. Albert Esteve, Alberto Ros, Antonio Robles, María Engracia Gómez, José Duato |
| 2016 | Towards an Adaptive Multi-Power-Source Datacenter. Longjun Liu, Hongbin Sun, Chao Li, Yang Hu, Nanning Zheng, Tao Li |
| 2016 | TurboTiling: Leveraging Prefetching to Boost Performance of Tiled Codes. Sanyam Mehta, Rajat Garg, Nishad Trivedi, Pen-Chung Yew |
| 2016 | Variation Among Processors Under Turbo Boost in HPC Systems. Bilge Acun, Phil Miller, Laxmikant V. Kalé |
| 2016 | Write-Aware Management of NVM-based Memory Extensions. Amro Awad, Sergey Blagodurov, Yan Solihin |