| 2020 | A coordinate-oblivious index for high-dimensional distance similarity searches on the GPU. Brian Donnelly, Michael Gowanlock |
| 2020 | A scalable framework for solving fractional diffusion equations. Max Carlson, Robert M. Kirby, Hari Sundar |
| 2020 | AMOEBA: a coarse grained reconfigurable architecture for dynamic GPU scaling. Xianwei Cheng, Hui Zhao, Mahmut T. Kandemir, Beilei Jiang, Gayatri Mehta |
| 2020 | Accelerating relax-ordered task-parallel workloads using multi-level dependency checking. Masab Ahmad, Mohsin Shan, Akif Rehman, Omer Khan |
| 2020 | AutoParBench: a unified test framework for OpenMP-based parallelizers. Gleison Souza Diniz Mendonca, Chunhua Liao, Fernando Magno Quintão Pereira |
| 2020 | Bundlefly: a low-diameter topology for multicore fiber. Fei Lei, Dezun Dong, Xiangke Liao, José Duato |
| 2020 | BurstZ: a bandwidth-efficient scientific computing accelerator platform for large-scale data. Gongjin Sun, Seongyoung Kang, Sang-Woo Jun |
| 2020 | CFDNet: a deep learning-based accelerator for fluid simulations. Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran |
| 2020 | CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks. Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden Kwok-Hay So, Martin C. Herbordt, Ang Li, Yanzhi Wang |
| 2020 | Characterization and identification of HPC applications at leadership computing facility. Zhengchun Liu, Ryan Lewis, Rajkumar Kettimuthu, Kevin Harms, Philip H. Carns, Nageswara S. V. Rao, Ian T. Foster, Michael E. Papka |
| 2020 | Chunking loops with non-uniform workloads. Indu K. Prabhu, V. Krishna Nandivada |
| 2020 | CodeSeer: input-dependent code variants selection via machine learning. Tao Wang, Nikhil Jain, David Böhme, David Beckingsale, Frank Mueller, Todd Gamblin |
| 2020 | Compiler aided checkpointing using crash-consistent data structures in NVMM systems. Tyler Coy, Shuibing He, Bin Ren, Xuechen Zhang |
| 2020 | Efficient parallel algorithms for betweenness- and closeness-centrality in dynamic graphs. Kshitij Shukla, Sai Charan Regunta, Sai Harsh Tondomker, Kishore Kothapalli |
| 2020 | End-to-end performance modeling of distributed GPU applications. Jaemin Choi, David F. Richards, Laxmikant V. Kalé, Abhinav Bhatele |
| 2020 | Fast distributed bandits for online recommendation systems. Kanak Mahadik, Qingyun Wu, Shuai Li, Amit Sabne |
| 2020 | Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles. Yehia Arafa, Abdel-Hameed A. Badawy, Gopinath Chennupati, Atanu Barai, Nandakishore Santhi, Stephan J. Eidenbenz |
| 2020 | Fuzzy fairness controller for NVMe SSDs. Shivani Tripathy, Debiprasanna Sahoo, Manoranjan Satpathy, Madhu Mutyam |
| 2020 | Global link arrangement for practical Dragonfly. Zaid Salamah A. Alzaid, Saptarshi Bhowmik, Xin Yuan, Michael Lang |
| 2020 | Graptor: efficient pull and push style vectorized graph processing. Hans Vandierendonck |
| 2020 | How I learned to stop worrying about user-visible endpoints and love MPI. Rohit Zambre, Aparna Chandramowlishwaran, Pavan Balaji |
| 2020 | ICS '20: 2020 International Conference on Supercomputing, Barcelona, Spain, June 2020. Eduard Ayguadé, Wen-mei W. Hwu, Rosa M. Badia, H. Peter Hofstee |
| 2020 | Identifying and (automatically) remedying performance problems in CPU/GPU applications. Benjamin Welton, Barton P. Miller |
| 2020 | Leveraging intra-page update diversity for mitigating write amplification in SSDs. Imran Fareed, Mincheol Kang, Wonyoung Lee, Soontae Kim |
| 2020 | MKPipe: a compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA. Ji Liu, Abdullah-Al Kafi, Xipeng Shen, Huiyang Zhou |
| 2020 | Mapping and scheduling HPC applications for optimizing I/O. Jesús Carretero, Emmanuel Jeannot, Guillaume Pallez, David E. Singh, Nicolas Vidal |
| 2020 | Modeling and optimizing NUMA effects and prefetching with machine learning. Isaac Sánchez Barrera, David Black-Schaffer, Marc Casas, Miquel Moretó, Anastasiia Stupnikova, Mihail Popov |
| 2020 | NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. D. K. Panda |
| 2020 | Optimizing supercompilers for supercomputers. Michael Wolfe |
| 2020 | Parallelizing pruned landmark labeling: dealing with dependencies in graph algorithms. Ruoming Jin, Zhen Peng, Wendell Wu, Feodor F. Dragan, Gagan Agrawal, Bin Ren |
| 2020 | Post-Moore server architecture. Babak Falsafi |
| 2020 | RICH: implementing reductions in the cache hierarchy. Vladimir Dimic, Miquel Moretó, Marc Casas, Jan Ciesko, Mateo Valero |
| 2020 | SB-Fetch: synchronization aware hardware prefetching for chip multiprocessors. Laith M. AlBarakat, Paul V. Gratz, Daniel A. Jiménez |
| 2020 | Snug: architectural support for relaxed concurrent priority queueing in chip multiprocessors. Azin Heidarshenas, Tanmay Gangwani, Serif Yesil, Adam Morrison, Josep Torrellas |
| 2020 | Sparse-TPU: adapting systolic arrays for sparse matrices. Xin He, Subhankar Pal, Aporva Amarnath, Siying Feng, Dong-Hyeon Park, Austin Rovinski, Haojie Ye, Kuan-Yu Chen, Ronald G. Dreslinski, Trevor N. Mudge |
| 2020 | TensorSVM: accelerating kernel machines with tensor engine. Shaoshuai Zhang, Ruchi Shah, Panruo Wu |
| 2020 | Tools for top-down performance analysis of GPU-accelerated applications. Keren Zhou, Mark W. Krentel, John M. Mellor-Crummey |
| 2020 | Tuning applications for efficient GPU offloading to in-memory processing. Yudong Wu, Mingyao Shen, Yi-Hui Chen, Yuanyuan Zhou |
| 2020 | V-Combiner: speeding-up iterative graph processing on a shared-memory platform with vertex merging. Azin Heidarshenas, Serif Yesil, Dimitrios Skarlatos, Sasa Misailovic, Adam Morrison, Josep Torrellas |
| 2020 | Wavefront parallelization of recurrent neural networks on multi-core architectures. Robin Kumar Sharma, Marc Casas |
| 2020 | What every scientific programmer should know about compiler optimizations? Jialiang Tan, Shuyin Jiao, Milind Chabbi, Xu Liu |
| 2020 | cuRipples: influence maximization on multi-GPU systems. Marco Minutoli, Maurizio Drocco, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman |