| 2016 | 2016 IEEE International Symposium on High Performance Computer Architecture, HPCA 2016, Barcelona, Spain, March 12-16, 2016 |
| 2016 | A case for toggle-aware compression for GPU systems. Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C. Mowry, Stephen W. Keckler |
| 2016 | A complete key recovery timing attack on a GPU. Zhen Hang Jiang, Yunsi Fei, David R. Kaeli |
| 2016 | A large-scale study of soft-errors on GPUs in the field. Bin Nie, Devesh Tiwari, Saurabh Gupta, Evgenia Smirni, James H. Rogers |
| 2016 | A low power software-defined-radio baseband processor for the Internet of Things. Yajing Chen, Shengshuo Lu, Hun-Seok Kim, David T. Blaauw, Ronald G. Dreslinski, Trevor N. Mudge |
| 2016 | A low-power hybrid reconfigurable architecture for resistive random-access memories. Miguel Angel Lastras-Montaño, Amirali Ghofrani, Kwang-Ting Cheng |
| 2016 | A market approach for handling power emergencies in multi-tenant data center. Mohammad A. Islam, Xiaoqi Ren, Shaolei Ren, Adam Wierman, Xiaorui Wang |
| 2016 | A performance analysis framework for optimizing OpenCL applications on FPGAs. Zeke Wang, Bingsheng He, Wei Zhang, Shunning Jiang |
| 2016 | Amdahl's law for lifetime reliability scaling in heterogeneous multicore processors. William J. Song, Saibal Mukhopadhyay, Sudhakar Yalamanchili |
| 2016 | Approximating warps with intra-warp operand value similarity. Daniel Wong, Nam Sung Kim, Murali Annavaram |
| 2016 | Atomic persistence for SCM with a non-intrusive backend controller. Kshitij A. Doshi, Ellis Giles, Peter J. Varman |
| 2016 | Best-offset hardware prefetching. Pierre Michaud |
| 2016 | CATalyst: Defeating last-level cache side channel attacks in cloud computing. Fangfei Liu, Qian Ge, Yuval Yarom, Frank McKeen, Carlos V. Rozas, Gernot Heiser, Ruby B. Lee |
| 2016 | Cache QoS: From concept to reality in the Intel® Xeon® processor E5-2600 v3 product family. Andrew Herdrich, Edwin Verplanke, Priya Autee, Ramesh Illikkal, Chris Gianos, Ronak Singhal, Ravi R. Iyer |
| 2016 | ChargeCache: Reducing DRAM latency by exploiting row access locality. Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu |
| 2016 | CompEx: Compression-expansion coding for energy, latency, and lifetime improvements in MLC/TLC NVM. Poovaiah M. Palangappa, Kartik Mohanram |
| 2016 | Core tunneling: Variation-aware voltage noise mitigation in GPUs. Renji Thomas, Kristin Barber, Naser Sedaghati, Li Zhou, Radu Teodorescu |
| 2016 | Cost effective physical register sharing. Arthur Perais, André Seznec |
| 2016 | DUANG: Fast and lightweight page migration in asymmetric memory systems. Hao Wang, Jie Zhang, Sharmila Shridhar, Gieseo Park, Myoungsoo Jung, Nam Sung Kim |
| 2016 | DVFS for NoCs in CMPs: A thread voting approach. Yuan Yao, Zhonghai Lu |
| 2016 | Design and implementation of a mobile storage leveraging the DRAM interface. Sungyong Seo, Youngjin Cho, Youngkwang Yoo, Otae Bae, Jaegeun Park, Heehyun Nam, Sunmi Lee, Yongmyung Lee, Seungdo Chae, MoonSang Kwon, Jin-Hyeok Choi, Sangyeun Cho, Jaeheon Jeong, Duckhyun Chang |
| 2016 | Efficient GPU hardware transactional memory through early conflict resolution. Sui Chen, Lu Peng |
| 2016 | Efficient footprint caching for Tagless DRAM Caches. Hakbeom Jang, Yongjun Lee, Jongwon Kim, Youngsok Kim, Jangwoo Kim, Jinkyu Jeong, Jae W. Lee |
| 2016 | Efficient synthetic traffic models for large, complex SoCs. Jieming Yin, Onur Kayiran, Matthew Poremba, Natalie D. Enright Jerger, Gabriel H. Loh |
| 2016 | Energy-efficient address translation. Vasileios Karakostas, Jayneel Gandhi, Adrián Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, Osman S. Unsal |
| 2016 | HRL: Efficient and flexible reconfigurable logic for near-data processing. Mingyu Gao, Christos Kozyrakis |
| 2016 | Improving smartphone user experience by balancing performance and energy with probabilistic QoS guarantee. Benjamin Gaudette, Carole-Jean Wu, Sarma B. K. Vrudhula |
| 2016 | LASER: Light, Accurate Sharing dEtection and Repair. Liang Luo, Akshitha Sriraman, Brooke Fugate, Shiliang Hu, Gilles Pokam, Chris J. Newburn, Joseph Devietti |
| 2016 | Lattice priority scheduling: Low-overhead timing-channel protection for a shared memory controller. Andrew Ferraiuolo, Yao Wang, Danfeng Zhang, Andrew C. Myers, G. Edward Suh |
| 2016 | LiveSim: Going live with microarchitecture simulation. Sina Hassani, Gabriel Southern, Jose Renau |
| 2016 | Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM. Kevin K. Chang, Prashant J. Nair, Donghyuk Lee, Saugata Ghose, Moinuddin K. Qureshi, Onur Mutlu |
| 2016 | MaPU: A novel mathematical computing architecture. Donglin Wang, Xueliang Du, Leizu Yin, Chen Lin, Hong Ma, Weili Ren, Huijuan Wang, Xingang Wang, Shaolin Xie, Lei Wang, Zijun Liu, Tao Wang, Zhonghua Pu, Guangxin Ding, Mengchen Zhu, Lipeng Yang, Ruoshan Guo, Zhiwei Zhang, Xiao Lin, Jie Hao, Yongyong Yang, Wenqin Sun, Fabiao Zhou, NuoZhou Xiao, Qian Cui, Xiaoqin Wang |
| 2016 | McVerSi: A test generation framework for fast memory consistency verification in simulation. Marco Elver, Vijay Nagarajan |
| 2016 | Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning. Mahdi Nazm Bojnordi, Engin Ipek |
| 2016 | Minimal disturbance placement and promotion. Elvira Teran, Yingying Tian, Zhe Wang, Daniel A. Jiménez |
| 2016 | Mobile CPU's rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction. Matthew Halpern, Yuhao Zhu, Vijay Janapa Reddi |
| 2016 | Modeling cache performance beyond LRU. Nathan Beckmann, Daniel Sánchez |
| 2016 | Parity Helix: Efficient protection for single-dimensional faults in multi-dimensional memory systems. Xun Jian, Vilas Sridharan, Rakesh Kumar |
| 2016 | PleaseTM: Enabling transaction conflict management in requester-wins hardware transactional memory. Sunjae Park, Milos Prvulovic, Christopher J. Hughes |
| 2016 | Predicting the memory bandwidth and optimal core allocations for multi-threaded applications on large-scale NUMA machines. Wei Wang, Jack W. Davidson, Mary Lou Soffa |
| 2016 | Pushing the limits of accelerator efficiency while retaining programmability. Tony Nowatzki, Vinay Gangadhar, Karthikeyan Sankaralingam, Greg Wright |
| 2016 | RADAR: Runtime-assisted dead region management for last-level caches. Madhavan Manivannan, Vassilis Papaefstathiou, Miquel Pericàs, Per Stenström |
| 2016 | Restore truncation for performance improvement in future DRAM systems. Xianwei Zhang, Youtao Zhang, Bruce R. Childers, Jun Yang |
| 2016 | Revisiting virtual L1 caches: A practical design using dynamic synonym remapping. Hongil Yoon, Gurindar S. Sohi |
| 2016 | SCsafe: Logging sequential consistency violations continuously and precisely. Yuelu Duan, David A. Koufaty, Josep Torrellas |
| 2016 | SLaC: Stage laser control for a flattened butterfly network. Yigit Demir, Nikos Hardavellas |
| 2016 | ScalCore: Designing a core for voltage scalability. Bhargava Gopireddy, Choungki Song, Josep Torrellas, Nam Sung Kim, Aditya Agrawal, Asit K. Mishra |
| 2016 | Selective GPU caches to eliminate CPU-GPU HW cache coherence. Neha Agarwal, David W. Nellans, Eiman Ebrahimi, Thomas F. Wenisch, John Danskin, Stephen W. Keckler |
| 2016 | Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing. Zhenning Wang, Jun Yang, Rami G. Melhem, Bruce R. Childers, Youtao Zhang, Minyi Guo |
| 2016 | SizeCap: Efficiently handling power surges in fuel cell powered data centers. Yang Li, Di Wang, Saugata Ghose, Jie Liu, Sriram Govindan, Sean James, Eric Peterson, John Siegler, Rachata Ausavarungnirun, Onur Mutlu |
| 2016 | Software transparent dynamic binary translation for coarse-grain reconfigurable architectures. Matthew A. Watkins, Tony Nowatzki, Anthony Carno |
| 2016 | Symbiotic job scheduling on the IBM POWER8. Josué Feliu, Stijn Eyerman, Julio Sahuquillo, Salvador Petit |
| 2016 | TABLA: A unified template-based framework for accelerating statistical machine learning. Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kyung Kim, Hadi Esmaeilzadeh |
| 2016 | The runahead network-on-chip. Zimo Li, Joshua San Miguel, Natalie D. Enright Jerger |
| 2016 | Towards high performance paged memory for GPUs. Tianhao Zheng, David W. Nellans, Arslan Zulfiqar, Mark Stephenson, Stephen W. Keckler |
| 2016 | Venice: Exploring server architectures for effective resource sharing. Jianbo Dong, Rui Hou, Michael C. Huang, Tao Jiang, Boyan Zhao, Sally A. McKee, Haibin Wang, Xiaosong Cui, Lixin Zhang |
| 2016 | Warped-preexecution: A GPU pre-execution approach for improving latency hiding. Keunsoo Kim, Sangpil Lee, Myung Kuk Yoon, Gunjae Koo, Won Woo Ro, Murali Annavaram |
| 2016 | iPAWS: Instruction-issue pattern-based adaptive warp scheduling for GPGPUs. Minseok Lee, Gwangsun Kim, John Kim, Woong Seo, Yeon-Gon Cho, Soojung Ryu |