| 2021 | A Hierarchical Task Scheduler for Heterogeneous Computing. Narasinga Rao Miniskar, Frank Liu, Aaron R. Young, Dwaipayan Chakraborty, Jeffrey S. Vetter |
| 2021 | A Performance Analysis of Modern Parallel Programming Models Using a Compute-Bound Application. Andrei Poenaru, Wei-Chen Lin, Simon McIntosh-Smith |
| 2021 | A Tunable Implementation of Quality-of-Service Classes for HPC Networks. Kevin A. Brown, Neil McGlohon, Sudheer Chunduri, Eric Borch, Robert B. Ross, Christopher D. Carothers, Kevin Harms |
| 2021 | Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact. Ayesha Afzal, Georg Hager, Gerhard Wellein |
| 2021 | Artemis: Automatic Runtime Tuning of Parallel Execution Parameters Using Machine Learning. Chad Wood, Giorgis Georgakoudis, David Beckingsale, David Poliakoff, Alfredo Giménez, Kevin A. Huck, Allen D. Malony, Todd Gamblin |
| 2021 | Auto-Precision Scaling for Distributed Deep Learning. Ruobing Han, James Demmel, Yang You |
| 2021 | BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs. Mohammadreza Bayatpour, Nick Sarkauskas, Hari Subramoni, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda |
| 2021 | COSTA: Communication-Optimal Shuffle and Transpose Algorithm with Process Relabeling. Marko Kabic, Simon Pintarelli, Anton Kozhevnikov, Joost VandeVondele |
| 2021 | Characterizing Containerized HPC Applications Performance at Petascale on CPU and GPU Architectures. Amit Ruhela, Stephen Lien Harrell, Richard Todd Evans, Gregory J. Zynda, John M. Fonner, Matt Vaughn, Tommy Minyard, John Cazes |
| 2021 | Designing a ROCm-Aware MPI Library for AMD GPUs: Early Experiences. Kawthar Shafie Khorassani, Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda |
| 2021 | Distributed Sparse Block Grids on GPUs. Pietro Incardona, Tommaso Bianucci, Ivo F. Sbalzarini |
| 2021 | Enabling AI-Accelerated Multiscale Modeling of Thrombogenesis at Millisecond and Molecular Resolutions on Supercomputers. Yicong Zhu, Peng Zhang, Changnian Han, Guojing Cong, Yuefan Deng |
| 2021 | Evaluation of the NEC Vector Engine for Legacy CFD Codes. Keith Obenschain, Yu Yu Khine, Raghunandan Mathur, Gopal Patnaik, Robert Rosenberg |
| 2021 | FPGA Acceleration of Number Theoretic Transform. Tian Ye, Yang Yang, Sanmukh R. Kuppannagari, Rajgopal Kannan, Viktor K. Prasanna |
| 2021 | HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads. Pouya Fotouhi, Marjan Fariborz, Roberto Proietti, Jason Lowe-Power, Venkatesh Akella, S. J. Ben Yoo |
| 2021 | High Performance Computing - 36th International Conference, ISC High Performance 2021, Virtual Event, June 24 - July 2, 2021, Proceedings. Bradford L. Chamberlain, Ana Lucia Varbanescu, Hatem Ltaief, Piotr Luszczek |
| 2021 | Lessons Learned from Accelerating Quicksilver on Programmable Integrated Unified Memory Architecture (PIUMA) and How That's Different from CPU. Jesmin Jahan Tithi, Fabrizio Petrini, David F. Richards |
| 2021 | Microarchitecture of a Configurable High-Radix Router for the Post-Moore Era. Yi Dai, Kai Lu, Junsheng Chang, Xingyun Qi, Jijun Cao, Jianmin Zhang |
| 2021 | Optimizing GPU-Enhanced HPC System and Cloud Procurements for Scientific Workloads. Richard Todd Evans, Matthew Cawood, Stephen Lien Harrell, Lei Huang, Si Liu, Chun-Yaung Lu, Amit Ruhela, Yinzhi Wang, Zhao Zhang |
| 2021 | Performance of the Supercomputer Fugaku for Breadth-First Search in Graph500 Benchmark. Masahiro Nakao, Koji Ueno, Katsuki Fujisawa, Yuetsu Kodama, Mitsuhisa Sato |
| 2021 | Proctor: A Semi-Supervised Performance Anomaly Diagnosis Framework for Production HPC Systems. Burak Aksar, Yijia Zhang, Emre Ates, Benjamin Schwaller, Omar Aaziz, Vitus J. Leung, Jim M. Brandt, Manuel Egele, Ayse K. Coskun |
| 2021 | Scalability of Streaming Anomaly Detection in an Unbounded Key Space Using Migrating Threads. Brian A. Page, Peter M. Kogge |
| 2021 | Ubiquitous Performance Analysis. David Böhme, Pascal Aschwanden, Olga Pearce, Kenneth Weiss, Matthew P. LeGendre |
| 2021 | Under the Hood of SYCL - An Initial Performance Analysis with An Unstructured-Mesh CFD Application. István Z. Reguly, Andrew M. B. Owenson, Archie Powell, Stephen A. Jarvis, Gihan R. Mudalige |
| 2021 | iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs. Luk Burchard, Johannes Moe, Daniel Thilo Schroeder, Konstantin Pogorelov, Johannes Langguth |