| 2019 | A communication-avoiding 3D sparse triangular solver. Piyush Sao, Ramakrishnan Kannan, Xiaoye Sherry Li, Richard W. Vuduc |
| 2019 | A scalable framework for adaptive computational general relativity on heterogeneous clusters. Milinda Fernando, David Neilsen, Eric W. Hirschmann, Hari Sundar |
| 2019 | AMPT-GA: automatic mixed precision floating point tuning for GPU applications. Pradeep V. Kotipalli, Ranvijay Singh, Paul Wood, Ignacio Laguna, Saurabh Bagchi |
| 2019 | Accelerating reduction and scan using tensor core units. Abdul Dakkak, Cheng Li, Jinjun Xiong, Isaac Gelado, Wen-mei W. Hwu |
| 2019 | Address-stride assisted approximate load value prediction in GPUs. Haonan Wang, Mohamed Assem Ibrahim, Sparsh Mittal, Adwait Jog |
| 2019 | An online quality management framework for approximate communication in network-on-chips. Yuechen Chen, Ahmed Louri |
| 2019 | Automatic construct selection and variable classification in OpenMP. Mohammad Norouzi Arab, Felix Wolf, Ali Jannesari |
| 2019 | Avalon: towards QoS awareness and improved utilization through multi-resource management in datacenters. Quan Chen, Zhenning Wang, Jingwen Leng, Chao Li, Wenli Zheng, Minyi Guo |
| 2019 | BonVoision: leveraging spatial data smoothness for recovery from memory soft errors. Bo Fang, Hassan Halawa, Karthik Pattabiraman, Matei Ripeanu, Sriram Krishnamoorthy |
| 2019 | Can we trust profiling results?: understanding and fixing the inaccuracy in modern profilers. Hao Xu, Qingsen Wang, Shuang Song, Lizy Kurian John, Xu Liu |
| 2019 | Deep reuse: streamline CNN inference on the fly via coarse-grained computation reuse. Lin Ning, Xipeng Shen |
| 2019 | DeepHiR: improving high-radix router throughput with deep hybrid memory buffer microarchitecture. Cunlu Li, Dezun Dong, Xiangke Liao, John Kim, Changhyun Kim |
| 2019 | Diligent TLBs: a mechanism for exploiting heterogeneity in TLB miss behavior. Hussein Elnawawy, Rangeen Basu Roy Chowdhury, Amro Awad, Gregory T. Byrd |
| 2019 | Dynamically linked MSHRs for adaptive miss handling in GPUs. Yongbin Gu, Lizhong Chen |
| 2019 | Efficient GPU tree walks for effective distributed n-body simulations. Jianqiao Liu, Michael P. Robson, Thomas Quinn, Milind Kulkarni |
| 2019 | Efficient and effective sparse tensor reordering. Jiajia Li, Bora Uçar, Ümit V. Çatalyürek, Jimeng Sun, Kevin J. Barker, Richard W. Vuduc |
| 2019 | Efficient hierarchical online-autotuning: a case study on polyhedral accelerator mapping. Philip Pfaffe, Tobias Grosser, Martin Peter Tillmann |
| 2019 | Efficient thread/page/parallelism autotuning for NUMA systems. Mihail Popov, Alexandra Jimborean, David Black-Schaffer |
| 2019 | Full-stack optimization for accelerating CNNs using powers-of-two weights with FPGA validation. Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong |
| 2019 | GPU road network graph contraction and SSSP query. Roozbeh Karimi, David M. Koppelman, Chris J. Michael |
| 2019 | GPU snapshot: checkpoint offloading for GPU-dense systems. Kyushick Lee, Michael B. Sullivan, Siva Kumar Sastry Hari, Timothy Tsai, Stephen W. Keckler, Mattan Erez |
| 2019 | GPUGuard: mitigating contention based side and covert channel attacks on GPUs. Qiumin Xu, Hoda Naghibijouybari, Shibo Wang, Nael B. Abu-Ghazaleh, Murali Annavaram |
| 2019 | GreenMM: energy efficient GPU matrix multiplication through undervolting. Hadi Zamani, Yuanlai Liu, Devashree Tripathy, Laxmi N. Bhuyan, Zizhong Chen |
| 2019 | HYPHA: a framework based on separation of parallelisms to accelerate persistent homology matrix reduction. Simon Zhang, Mengbai Xiao, Chengxin Guo, Liang Geng, Hao Wang, Xiaodong Zhang |
| 2019 | Henosis: workload-driven small array consolidation and placement for HDF5 applications on heterogeneous data stores. Donghe Kang, Vedang Patel, Ashwati Nair, Spyros Blanas, Yang Wang, Srinivasan Parthasarathy |
| 2019 | Hybrid CPU/GPU clustering in shared memory on the billion point scale. Michael Gowanlock |
| 2019 | IA-SpGEMM: an input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication. Zhen Xie, Guangming Tan, Weifeng Liu, Ninghui Sun |
| 2019 | Laius: towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters. Wei Zhang, Weihao Cui, Kaihua Fu, Quan Chen, Daniel Edward Mawhirter, Bo Wu, Chao Li, Minyi Guo |
| 2019 | Least squares solvers for distributed-memory machines with GPU accelerators. Jakub Kurzak, Mark Gates, Ali Charara, Asim YarKhan, Jack J. Dongarra |
| 2019 | Multi-criteria partitioning of multi-block structured grids. Hengjie Wang, Aparna Chandramowlishwaran |
| 2019 | O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning. Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Wei Wu, Ang Li, Martin C. Herbordt |
| 2019 | On optimizing distributed non-negative Tucker decomposition. Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal |
| 2019 | Optimizing computation-communication overlap in asynchronous task-based programs. Emilio Castillo, Nikhil Jain, Marc Casas, Miquel Moretó, Martin Schulz, Ramón Beivide, Mateo Valero, Abhinav Bhatele |
| 2019 | Optimizing the linear fascicle evaluation algorithm for many-core systems. Karan Aggarwal, Uday Bondhugula |
| 2019 | Parallelizing cryo-EM 3D reconstruction on GPU cluster with a partitioned and streamed model. Kunpeng Wang, Shizhen Xu, Haohuan Fu, Hongkun Yu, Wenlai Zhao, Guangwen Yang |
| 2019 | Performance optimization of reactive molecular dynamics simulations with dynamic charge distribution models on distributed memory platforms. Kurt A. O'Hearn, Abdullah Alperen, Hasan Metin Aktulga |
| 2019 | Power efficient job scheduling by predicting the impact of processor manufacturing variability. Dimitrios Chasapis, Miquel Moretó, Martin Schulz, Barry Rountree, Mateo Valero, Marc Casas |
| 2019 | Proceedings of the ACM International Conference on Supercomputing, ICS 2019, Phoenix, AZ, USA, June 26-28, 2019. Rudolf Eigenmann, Chen Ding, Sally A. McKee |
| 2019 | QoSMT: supporting precise performance control for simultaneous multithreading architecture. Xin Jin, Yaoyang Zhou, Bowen Huang, Zihao Yu, Xusheng Zhan, Huizhe Wang, Sa Wang, Ningmei Yu, Ninghui Sun, Yungang Bao |
| 2019 | RFAcc: a 3D ReRAM associative array based random forest accelerator. Lei Zhao, Quan Deng, Youtao Zhang, Jun Yang |
| 2019 | SDC: a software defined cache for efficient data indexing. Fan Ni, Song Jiang, Hong Jiang, Jian Huang, Xingbo Wu |
| 2019 | Software combining to mitigate multithreaded MPI contention. Abdelhalim Amer, Charles Archer, Michael Blocksome, Chongxiao Cao, Michael Chuvelev, Hajime Fujita, Maria Garzaran, Yanfei Guo, Jeff R. Hammond, Shintaro Iwasaki, Kenneth J. Raffenetti, Mikhail Shiryaev, Min Si, Kenjiro Taura, Sagar Thapaliya, Pavan Balaji |
| 2019 | TSM2: optimizing tall-and-skinny matrix-matrix multiplication on GPUs. Jieyang Chen, Nan Xiong, Xin Liang, Dingwen Tao, Sihuan Li, Kaiming Ouyang, Kai Zhao, Nathan DeBardeleben, Qiang Guan, Zizhong Chen |
| 2019 | The anatomy of efficient FFT and Winograd convolutions on modern CPUs. Aleksandar Zlateski, Zhen Jia, Kai Li, Frédo Durand |
| 2019 | Using performance models to understand scalable Krylov solver performance at scale for structured grid problems. Paul R. Eller, Torsten Hoefler, William Gropp |
| 2019 | WCCV: improving the vectorization of IF-statements with warp-coherent conditions. Huihui Sun, Florian Fey, Jie Zhao, Sergei Gorlatch |