HPCA A*

58 papers

Year | Title / Authors
2016 | 2016 IEEE International Symposium on High Performance Computer Architecture, HPCA 2016, Barcelona, Spain, March 12-16, 2016
2016 | A case for toggle-aware compression for GPU systems.
Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C. Mowry, Stephen W. Keckler
2016 | A complete key recovery timing attack on a GPU.
Zhen Hang Jiang, Yunsi Fei, David R. Kaeli
2016 | A large-scale study of soft-errors on GPUs in the field.
Bin Nie, Devesh Tiwari, Saurabh Gupta, Evgenia Smirni, James H. Rogers
2016 | A low power software-defined-radio baseband processor for the Internet of Things.
Yajing Chen, Shengshuo Lu, Hun-Seok Kim, David T. Blaauw, Ronald G. Dreslinski, Trevor N. Mudge
2016 | A low-power hybrid reconfigurable architecture for resistive random-access memories.
Miguel Angel Lastras-Montaño, Amirali Ghofrani, Kwang-Ting Cheng
2016 | A market approach for handling power emergencies in multi-tenant data center.
Mohammad A. Islam, Xiaoqi Ren, Shaolei Ren, Adam Wierman, Xiaorui Wang
2016 | A performance analysis framework for optimizing OpenCL applications on FPGAs.
Zeke Wang, Bingsheng He, Wei Zhang, Shunning Jiang
2016 | Amdahl's law for lifetime reliability scaling in heterogeneous multicore processors.
William J. Song, Saibal Mukhopadhyay, Sudhakar Yalamanchili
2016 | Approximating warps with intra-warp operand value similarity.
Daniel Wong, Nam Sung Kim, Murali Annavaram
2016 | Atomic persistence for SCM with a non-intrusive backend controller.
Kshitij A. Doshi, Ellis Giles, Peter J. Varman
2016 | Best-offset hardware prefetching.
Pierre Michaud
2016 | CATalyst: Defeating last-level cache side channel attacks in cloud computing.
Fangfei Liu, Qian Ge, Yuval Yarom, Frank McKeen, Carlos V. Rozas, Gernot Heiser, Ruby B. Lee
2016 | Cache QoS: From concept to reality in the Intel® Xeon® processor E5-2600 v3 product family.
Andrew Herdrich, Edwin Verplanke, Priya Autee, Ramesh Illikkal, Chris Gianos, Ronak Singhal, Ravi R. Iyer
2016 | ChargeCache: Reducing DRAM latency by exploiting row access locality.
Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu
2016 | CompEx: Compression-expansion coding for energy, latency, and lifetime improvements in MLC/TLC NVM.
Poovaiah M. Palangappa, Kartik Mohanram
2016 | Core tunneling: Variation-aware voltage noise mitigation in GPUs.
Renji Thomas, Kristin Barber, Naser Sedaghati, Li Zhou, Radu Teodorescu
2016 | Cost effective physical register sharing.
Arthur Perais, André Seznec
2016 | DUANG: Fast and lightweight page migration in asymmetric memory systems.
Hao Wang, Jie Zhang, Sharmila Shridhar, Gieseo Park, Myoungsoo Jung, Nam Sung Kim
2016 | DVFS for NoCs in CMPs: A thread voting approach.
Yuan Yao, Zhonghai Lu
2016 | Design and implementation of a mobile storage leveraging the DRAM interface.
Sungyong Seo, Youngjin Cho, Youngkwang Yoo, Otae Bae, Jaegeun Park, Heehyun Nam, Sunmi Lee, Yongmyung Lee, Seungdo Chae, MoonSang Kwon, Jin-Hyeok Choi, Sangyeun Cho, Jaeheon Jeong, Duckhyun Chang
2016 | Efficient GPU hardware transactional memory through early conflict resolution.
Sui Chen, Lu Peng
2016 | Efficient footprint caching for Tagless DRAM Caches.
Hakbeom Jang, Yongjun Lee, Jongwon Kim, Youngsok Kim, Jangwoo Kim, Jinkyu Jeong, Jae W. Lee
2016 | Efficient synthetic traffic models for large, complex SoCs.
Jieming Yin, Onur Kayiran, Matthew Poremba, Natalie D. Enright Jerger, Gabriel H. Loh
2016 | Energy-efficient address translation.
Vasileios Karakostas, Jayneel Gandhi, Adrián Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, Osman S. Unsal
2016 | HRL: Efficient and flexible reconfigurable logic for near-data processing.
Mingyu Gao, Christos Kozyrakis
2016 | Improving smartphone user experience by balancing performance and energy with probabilistic QoS guarantee.
Benjamin Gaudette, Carole-Jean Wu, Sarma B. K. Vrudhula
2016 | LASER: Light, Accurate Sharing dEtection and Repair.
Liang Luo, Akshitha Sriraman, Brooke Fugate, Shiliang Hu, Gilles Pokam, Chris J. Newburn, Joseph Devietti
2016 | Lattice priority scheduling: Low-overhead timing-channel protection for a shared memory controller.
Andrew Ferraiuolo, Yao Wang, Danfeng Zhang, Andrew C. Myers, G. Edward Suh
2016 | LiveSim: Going live with microarchitecture simulation.
Sina Hassani, Gabriel Southern, Jose Renau
2016 | Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM.
Kevin K. Chang, Prashant J. Nair, Donghyuk Lee, Saugata Ghose, Moinuddin K. Qureshi, Onur Mutlu
2016 | MaPU: A novel mathematical computing architecture.
Donglin Wang, Xueliang Du, Leizu Yin, Chen Lin, Hong Ma, Weili Ren, Huijuan Wang, Xingang Wang, Shaolin Xie, Lei Wang, Zijun Liu, Tao Wang, Zhonghua Pu, Guangxin Ding, Mengchen Zhu, Lipeng Yang, Ruoshan Guo, Zhiwei Zhang, Xiao Lin, Jie Hao, Yongyong Yang, Wenqin Sun, Fabiao Zhou, NuoZhou Xiao, Qian Cui, Xiaoqin Wang
2016 | McVerSi: A test generation framework for fast memory consistency verification in simulation.
Marco Elver, Vijay Nagarajan
2016 | Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning.
Mahdi Nazm Bojnordi, Engin Ipek
2016 | Minimal disturbance placement and promotion.
Elvira Teran, Yingying Tian, Zhe Wang, Daniel A. Jiménez
2016 | Mobile CPU's rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction.
Matthew Halpern, Yuhao Zhu, Vijay Janapa Reddi
2016 | Modeling cache performance beyond LRU.
Nathan Beckmann, Daniel Sánchez
2016 | Parity Helix: Efficient protection for single-dimensional faults in multi-dimensional memory systems.
Xun Jian, Vilas Sridharan, Rakesh Kumar
2016 | PleaseTM: Enabling transaction conflict management in requester-wins hardware transactional memory.
Sunjae Park, Milos Prvulovic, Christopher J. Hughes
2016 | Predicting the memory bandwidth and optimal core allocations for multi-threaded applications on large-scale NUMA machines.
Wei Wang, Jack W. Davidson, Mary Lou Soffa
2016 | Pushing the limits of accelerator efficiency while retaining programmability.
Tony Nowatzki, Vinay Gangadhar, Karthikeyan Sankaralingam, Greg Wright
2016 | RADAR: Runtime-assisted dead region management for last-level caches.
Madhavan Manivannan, Vassilis Papaefstathiou, Miquel Pericàs, Per Stenström
2016 | Restore truncation for performance improvement in future DRAM systems.
Xianwei Zhang, Youtao Zhang, Bruce R. Childers, Jun Yang
2016 | Revisiting virtual L1 caches: A practical design using dynamic synonym remapping.
Hongil Yoon, Gurindar S. Sohi
2016 | SCsafe: Logging sequential consistency violations continuously and precisely.
Yuelu Duan, David A. Koufaty, Josep Torrellas
2016 | SLaC: Stage laser control for a flattened butterfly network.
Yigit Demir, Nikos Hardavellas
2016 | ScalCore: Designing a core for voltage scalability.
Bhargava Gopireddy, Choungki Song, Josep Torrellas, Nam Sung Kim, Aditya Agrawal, Asit K. Mishra
2016 | Selective GPU caches to eliminate CPU-GPU HW cache coherence.
Neha Agarwal, David W. Nellans, Eiman Ebrahimi, Thomas F. Wenisch, John Danskin, Stephen W. Keckler
2016 | Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing.
Zhenning Wang, Jun Yang, Rami G. Melhem, Bruce R. Childers, Youtao Zhang, Minyi Guo
2016 | SizeCap: Efficiently handling power surges in fuel cell powered data centers.
Yang Li, Di Wang, Saugata Ghose, Jie Liu, Sriram Govindan, Sean James, Eric Peterson, John Siegler, Rachata Ausavarungnirun, Onur Mutlu
2016 | Software transparent dynamic binary translation for coarse-grain reconfigurable architectures.
Matthew A. Watkins, Tony Nowatzki, Anthony Carno
2016 | Symbiotic job scheduling on the IBM POWER8.
Josué Feliu, Stijn Eyerman, Julio Sahuquillo, Salvador Petit
2016 | TABLA: A unified template-based framework for accelerating statistical machine learning.
Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kyung Kim, Hadi Esmaeilzadeh
2016 | The runahead network-on-chip.
Zimo Li, Joshua San Miguel, Natalie D. Enright Jerger
2016 | Towards high performance paged memory for GPUs.
Tianhao Zheng, David W. Nellans, Arslan Zulfiqar, Mark Stephenson, Stephen W. Keckler
2016 | Venice: Exploring server architectures for effective resource sharing.
Jianbo Dong, Rui Hou, Michael C. Huang, Tao Jiang, Boyan Zhao, Sally A. McKee, Haibin Wang, Xiaosong Cui, Lixin Zhang
2016 | Warped-preexecution: A GPU pre-execution approach for improving latency hiding.
Keunsoo Kim, Sangpil Lee, Myung Kuk Yoon, Gunjae Koo, Won Woo Ro, Murali Annavaram
2016 | iPAWS: Instruction-issue pattern-based adaptive warp scheduling for GPGPUs.
Minseok Lee, Gwangsun Kim, John Kim, Woong Seo, Yeon-Gon Cho, Soojung Ryu