| 2021 | 12 Ways to Fool the Masses with Irreproducible Results. Lorena A. Barba |
| 2021 | 35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021, Portland, OR, USA, May 17-21, 2021 |
| 2021 | A Hybrid Scheduling Scheme for Parallel Loops. Aaron Handleman, Arthur G. Rattew, I-Ting Angelina Lee, Tao B. Schardl |
| 2021 | A Multi-GPU Design for Large Size Cryo-EM 3D Reconstruction. Zihao Wang, Xiaohua Wan, Zhiyong Liu, Qianshuo Fan, Fa Zhang, Guangming Tan |
| 2021 | A Tale of Two C's: Convergence and Composability. Ilkay Altintas |
| 2021 | ARBALEST: Dynamic Detection of Data Mapping Issues in Heterogeneous OpenMP Applications. Lechen Yu, Joachim Protze, Oscar R. Hernandez, Vivek Sarkar |
| 2021 | Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths. Edward Hutter, Edgar Solomonik |
| 2021 | Accelerating Multigrid-based Hierarchical Scientific Data Refactoring on GPUs. Jieyang Chen, Lipeng Wan, Xin Liang, Ben Whitney, Qing Liu, David Pugmire, Nicholas Thompson, Jong Youl Choi, Matthew Wolf, Todd S. Munson, Ian T. Foster, Scott Klasky |
| 2021 | Accelerating non-power-of-2 size Fourier transforms with GPU Tensor Cores. Louis Pisha, Lukasz Ligowski |
| 2021 | Adaptive Spatially Aware I/O for Multiresolution Particle Data Layouts. Will Usher, Xuan Huang, Steve Petruzza, Sidharth Kumar, Stuart R. Slattery, Samuel Temple Reeve, Feng Wang, Chris R. Johnson, Valerio Pascucci |
| 2021 | AlphaR: Learning-Powered Resource Management for Irregular, Dynamic Microservice Graph. Xiaofeng Hou, Chao Li, Jiacheng Liu, Lu Zhang, Shaolei Ren, Jingwen Leng, Quan Chen, Minyi Guo |
| 2021 | An In-Depth Analysis of Distributed Training of Deep Neural Networks. Yun-Yong Ko, Kibong Choi, Jiwon Seo, Sang-Wook Kim |
| 2021 | Arbitration Policies for On-Demand User-Level I/O Forwarding on HPC Platforms. Jean Luca Bez, Alberto Miranda, Ramon Nou, Francieli Zanon Boito, Toni Cortes, Philippe O. A. Navaux |
| 2021 | Argus: Efficient Job Scheduling in RDMA-assisted Big Data Processing. Sijie Wu, Hanhua Chen, Yonghui Wang, Hai Jin |
| 2021 | Astra: Autonomous Serverless Analytics with Cost-Efficiency and QoS-Awareness. Jananie Jarachanthan, Li Chen, Fei Xu, Bo Li |
| 2021 | AuTraScale: An Automated and Transfer Learning Solution for Streaming System Auto-Scaling. Liang Zhang, Wenli Zheng, Chao Li, Yao Shen, Minyi Guo |
| 2021 | Automatic Graph Partitioning for Very Large-scale Deep Learning. Masahiro Tanaka, Kenjiro Taura, Toshihiro Hanawa, Kentaro Torisawa |
| 2021 | BiPS: Hotness-aware Bi-tier Parameter Synchronization for Recommendation Models. Qiming Zheng, Quan Chen, Kaihao Bai, Huifeng Guo, Yong Gao, Xiuqiang He, Minyi Guo |
| 2021 | Byzantine Agreement with Unknown Participants and Failures. Pankaj Khanchandani, Roger Wattenhofer |
| 2021 | Byzantine Dispersion on Graphs. Anisur Rahaman Molla, Kaushik Mondal, William K. Moses Jr. |
| 2021 | CAGC: A Content-aware Garbage Collection Scheme for Ultra-Low Latency Flash-based SSDs. Suzhen Wu, Chunfeng Du, Haijun Li, Hong Jiang, Zhirong Shen, Bo Mao |
| 2021 | CBNet: Minimizing Adjustments in Concurrent Demand-Aware Tree Networks. Otávio Augusto de Oliveira Souza, Olga Goussevskaia, Stefan Schmid |
| 2021 | CTXBack: Enabling Low Latency GPU Context Switching via Context Flashback. Zhuoran Ji, Cho-Li Wang |
| 2021 | Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core Architectures. Weiling Yang, Jianbin Fang, Dezun Dong |
| 2021 | Code Generation for Room Acoustics Simulations with Complex Boundary Conditions. Larisa Stoltzfus, Brian Hamilton, Michel Steuwer, Lu Li, Christophe Dubach |
| 2021 | Combining XOR and Partner Checkpointing for Resilient Multilevel Checkpoint/Restart. Masoud Gholami, Florian Schintke |
| 2021 | Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale. Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad |
| 2021 | Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence. Karl Bäckström, Ivan Walulya, Marina Papatriantafilou, Philippas Tsigas |
| 2021 | Cori: Dancing to the Right Beat of Periodic Data Movements over Hybrid Memory Systems. Thaleia Dimitra Doudali, Daniel Zahka, Ada Gavrilovska |
| 2021 | Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC Monitoring Data. Alessio Netti, Daniele Tafani, Michael Ott, Martin Schulz |
| 2021 | Covirt: Lightweight Fault Isolation and Resource Protection for Co-Kernels. Nicholas Gordon, John R. Lange |
| 2021 | DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime. Alberto Parravicini, Arnaud Delamare, Marco Arnaboldi, Marco D. Santambrogio |
| 2021 | DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions. Yuke Wang, Boyuan Feng, Yufei Ding |
| 2021 | DUET: A Compiler-Runtime Subgraph Scheduling Approach for Tensor Programs on a Coupled CPU-GPU Architecture. Minjia Zhang, Zehua Hu, Mingqin Li |
| 2021 | Dancing in the Dark: Profiling for Tiered Memory. Jinyoung Choi, Sergey Blagodurov, Hung-Wei Tseng |
| 2021 | Decentralized Low-Latency Task Scheduling for Ad-Hoc Computing. Janick Edinger, Martin Breitbach, Niklas Gabrisch, Dominik Schäfer, Christian Becker, Amr Rizk |
| 2021 | Deep Reinforcement Agent for Scheduling in HPC. Yuping Fan, Zhiling Lan, J. Taylor Childers, Paul Rich, William E. Allcock, Michael E. Papka |
| 2021 | Demystifying GPU Reliability: Comparing and Combining Beam Experiments, Fault Simulation, and Profiling. Fernando Fernandes dos Santos, Siva Kumar Sastry Hari, Pedro Martins Basso, Luigi Carro, Paolo Rech |
| 2021 | Demystifying GPU UVM Cost with Deep Runtime and Workload Analysis. Tyler N. Allen, Rong Ge |
| 2021 | Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters. Qinghua Zhou, C. Chu, N. S. Kumar, Pouya Kousha, Seyedeh Mahdieh Ghazimirsaeed, Hari Subramoni, Dhabaleswar K. Panda |
| 2021 | Detecting Malicious Model Updates from Federated Learning on Conditional Variational Autoencoder. Zhipin Gu, Yuexiang Yang |
| 2021 | Distributed Training of Embeddings using Graph Analytics. Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi |
| 2021 | Distributed-Memory k-mer Counting on GPUs. Israt Nisa, Prashant Pandey, Marquita Ellis, Leonid Oliker, Aydin Buluç, Katherine A. Yelick |
| 2021 | Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure. Thomas Hérault, Yves Robert, George Bosilca, Robert J. Harrison, Cannada A. Lewis, Edward F. Valeev, Jack J. Dongarra |
| 2021 | EAGLE: Expedited Device Placement with Automatic Grouping for Large Models. Hao Lan, Li Chen, Baochun Li |
| 2021 | Efficient Algorithms for Encrypted All-gather Operation. Mehran Sadeghi Lahijani, Abu Naser, Cong Wu, Mohsen Gavahi, Viet Tung Hoang, Zhi Wang, Xin Yuan |
| 2021 | Efficient Distributed Algorithms in the k-machine model via PRAM Simulations. John Augustine, Kishore Kothapalli, Gopal Pandurangan |
| 2021 | Efficient Video Captioning on Heterogeneous System Architectures. Horng-Ruey Huang, Ding-Yong Hong, Jan-Jan Wu, Pangfeng Liu, Wei-Chung Hsu |
| 2021 | Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree. Linjian Ma, Edgar Solomonik |
| 2021 | Euler Meets GPU: Practical Graph Algorithms with Theoretical Guarantees. Adam Polak, Adrian Siwiec, Michal Stobierski |
| 2021 | Extending Sparse Tensor Accelerators to Support Multiple Compression Formats. Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon Euhyun Moon, Sivasankaran Rajamanickam, Tushar Krishna |
| 2021 | Extremely Fast and Energy Efficient One-way Wave Equation Migration on GPU-based heterogeneous architecture. Long Qu, Loris Lucido, Marie Bonnasse-Gahot, Pascal Vezolle, Diego Klahr |
| 2021 | F-Write: Fast RDMA-supported Writes in Erasure-coded In-memory Clusters. Bin Xu, Jianzhong Huang, Qiang Cao, Xiao Qin, Ping Xie |
| 2021 | Facilitating Data Discovery for Large-scale Science Facilities using Knowledge Networks. Yubo Qin, Ivan Rodero, Manish Parashar |
| 2021 | Finer-LRU: A Scalable Page Management Scheme for HPC Manycore Architectures. Jiwoo Bang, Chungyong Kim, Sunggon Kim, Qichen Chen, Cheongjun Lee, Eun-Kyu Byun, Jaehwan Lee, Hyeonsang Eom |
| 2021 | From Parallelization to Customization - Challenges and Opportunities. Jason Cong |
| 2021 | FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks. Md. Khaledur Rahman, Majedul Haque Sujon, Ariful Azad |
| 2021 | High Performance Streaming Tensor Decomposition. Yongseok Soh, Patrick Flick, Xing Liu, Shaden Smith, Fabio Checconi, Fabrizio Petrini, Jee W. Choi |
| 2021 | High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers. Kamalakkannan Kamalavasan, Gihan R. Mudalige, István Z. Reguly, Suhaib A. Fahmy |
| 2021 | High-Level Synthesis of Parallel Specifications Coupling Static and Dynamic Controllers. Vito Giovanni Castellana, Antonino Tumeo, Fabrizio Ferrandi |
| 2021 | High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection. Martin Karp, Artur Podobas, Niclas Jansson, Tobias Kenter, Christian Plessl, Philipp Schlatter, Stefano Markidis |
| 2021 | Improving checkpointing intervals by considering individual job failure probabilities. Alvaro Frank, Manuel Baumgartner, Reza Salkhordeh, André Brinkmann |
| 2021 | Interpreting Write Performance of Supercomputer I/O Systems with Regression Models. Bing Xie, Zilong Tan, Philip H. Carns, Jeffrey S. Chase, Kevin Harms, Jay F. Lofstead, Sarp Oral, Sudharshan S. Vazhkudai, Feiyi Wang |
| 2021 | Introducing Application Awareness Into a Unified Power Management Stack. Daniel C. Wilson, Siddhartha Jana, Aniruddha Marathe, Stephanie Brink, Christopher M. Cantalupo, Diana R. Guttman, Brad Geltz, Lowren H. Lawson, Asma H. Al-Rawi, Ali Mohammad, Fuat Keceli, Federico Ardanaz, Jonathan M. Eastep, Ayse K. Coskun |
| 2021 | Is Asymptotic Cost Analysis Useful in Developing Practical Parallel Algorithms? Guy E. Blelloch |
| 2021 | Jigsaw: A Slice-and-Dice Approach to Non-uniform FFT Acceleration for MRI Image Reconstruction. Brendan L. West, Jeffrey A. Fessler, Thomas F. Wenisch |
| 2021 | Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems. Qinglei Cao, Yu Pei, Kadir Akbudak, George Bosilca, Hatem Ltaief, David E. Keyes, Jack J. Dongarra |
| 2021 | Lightweight Function Monitors for Fine-Grained Management in Large Scale Python Applications. Tim Shaffer, Zhuozhao Li, Ben Tovar, Yadu N. Babuji, T. J. Dasso, Zoe Surma, Kyle Chard, Ian T. Foster, Douglas Thain |
| 2021 | Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka |
| 2021 | Max-Stretch Minimization on an Edge-Cloud Platform. Anne Benoit, Redouane Elghazi, Yves Robert |
| 2021 | MultiLogVC: Efficient Out-of-Core Graph Processing Framework for Flash Storage. Kiran Kumar Matam, Hanieh Hashemi, Murali Annavaram |
| 2021 | Multiplicative Weights Algorithms for Parallel Automated Software Repair. Joseph Renzullo, Westley Weimer, Stephanie Forrest |
| 2021 | NVMe-CR: A Scalable Ephemeral Storage Runtime for Checkpoint/Restart with NVMe-over-Fabrics. Shashank Gugnani, Tianxi Li, Xiaoyi Lu |
| 2021 | Noise-Resilient Empirical Performance Modeling with Deep Neural Networks. Marcus Ritter, Alexander Geiß, Johannes Wehrstein, Alexandru Calotoiu, Thorsten Reimann, Torsten Hoefler, Felix Wolf |
| 2021 | Nowa: A Wait-Free Continuation-Stealing Concurrency Platform. Florian Schmaus, Nicolas Pfeiffer, Wolfgang Schröder-Preikschat, Timo Hönig, Jörg Nolte |
| 2021 | Optimal Task Assignment for Heterogeneous Federated Learning Devices. Laércio Lima Pilla |
| 2021 | Optimizing Memory-Compute Colocation for Irregular Applications on a Migratory Thread Architecture. Thomas B. Rolinger, Christopher D. Krieger, Alan Sussman |
| 2021 | Optimizing Performance for Open-Channel SSDs in Cloud Storage System. Xiaoyi Zhang, Feng Zhu, Shu Li, Kun Wang, Wei Xu, Dengcai Xu |
| 2021 | PALM: Progress- and Locality-Aware Adaptive Task Migration for Efficient Thread Packing. Jinsu Park, Seongbeom Park, Myeonggyun Han, Woongki Baek |
| 2021 | Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly. Giulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine A. Yelick, Aydin Buluç |
| 2021 | Pase: Parallelization Strategies for Efficient DNN Training. Venmugil Elango |
| 2021 | Performance Analysis of Scientific Computing Workloads on General Purpose TEEs. Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert |
| 2021 | Performance Evaluation of Adaptive Routing on Dragonfly-based Production Systems. Sudheer Chunduri, Kevin Harms, Taylor L. Groves, Peter Mendygral, Justs Zarins, Michèle Weiland, Yasaman Ghadar |
| 2021 | Performance-Portable Graph Coarsening for Efficient Multilevel Graph Analysis. Michael S. Gilbert, Seher Acer, Erik G. Boman, Kamesh Madduri, Sivasankaran Rajamanickam |
| 2021 | Plex: Scaling Parallel Lexing with Backtrack-Free Prescanning. Le Li, Shigeyuki Sato, Qiheng Liu, Kenjiro Taura |
| 2021 | QPR: Quantizing PageRank with Coherent Shared Memory Accelerators. Abdullah T. Mughrabi, Mohannad Ibrahim, Gregory T. Byrd |
| 2021 | QoS-Aware and Resource Efficient Microservice Deployment in Cloud-Edge Continuum. Kaihua Fu, Wei Zhang, Quan Chen, Deze Zeng, Xin Peng, Wenli Zheng, Minyi Guo |
| 2021 | RVMA: Remote Virtual Memory Access. Ryan E. Grant, Michael J. Levenhagen, Matthew G. F. Dosanjh, Patrick M. Widener |
| 2021 | Rack-Scaling: An efficient rack-based redistribution method to accelerate the scaling of cloud disk arrays. Zhehan Lin, Hanchen Guo, Chentao Wu, Jie Li, Guangtao Xue, Minyi Guo |
| 2021 | Rank Position Forecasting in Car Racing. Bo Peng, Jiayu Li, Selahattin Akkas, Takuya Araki, Ohno Yoshiyuki, Judy Qiu |
| 2021 | Redesigning Peridigm on SIMT Accelerators for High-performance Peridynamics Simulations. Xinyuan Li, Huang Ye, Jian Zhang |
| 2021 | Revisiting Huffman Coding: Toward Extreme Performance on Modern GPU Architectures. Jiannan Tian, Cody Rivera, Sheng Di, Jieyang Chen, Xin Liang, Dingwen Tao, Franck Cappello |
| 2021 | SNOW Revisited: Understanding When Ideal READ Transactions Are Possible. Kishori M. Konwar, Wyatt Lloyd, Haonan Lu, Nancy A. Lynch |
| 2021 | SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC. George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Dilip Vasudevan, Anastasiia Butko |
| 2021 | SUPER: SUb-Graph Parallelism for TransformERs. Arpan Jain, Tim Moon, Tom Benson, Hari Subramoni, Sam Adé Jacobs, Dhabaleswar K. Panda, Brian Van Essen |
| 2021 | SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data Services. Srinivasan Ramesh, Allen D. Malony, Philip H. Carns, Robert B. Ross, Matthieu Dorier, Jérome Soumagne, Shane Snyder |
| 2021 | Scalable Epidemiological Workflows to Support COVID-19 Planning and Response. Dustin Machi, Parantapa Bhattacharya, Stefan Hoops, Jiangzhuo Chen, Henning S. Mortveit, Srinivasan Venkatramanan, Bryan L. Lewis, Mandy L. Wilson, Arindam Fadikar, Tom Maiden, Christopher L. Barrett, Madhav V. Marathe |
| 2021 | Scaling Out a Combinatorial Algorithm for Discovering Carcinogenic Gene Combinations to Thousands of GPUs. Sajal Dash, Qais Al-Hajri, Wu-chun Feng, Harold R. Garner, Ramu Anandakrishnan |
| 2021 | Scaling Sparse Matrix Multiplication on CPU-GPU Nodes. Yang Xia, Peng Jiang, Gagan Agrawal, Rajiv Ramnath |
| 2021 | Speculative Parallel Reverse Cuthill-McKee Reordering on Multi- and Many-core Architectures. Daniel Mlakar, Martin Winter, Mathias Parger, Markus Steinberger |
| 2021 | Spray: Sparse Reductions of Arrays in OPENMP. Jan Hückelheim, Johannes Doerfert |
| 2021 | Systemic Assessment of Node Failures in HPC Production Platforms. Anwesha Das, Frank Mueller, Barry Rountree |
| 2021 | Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sources. George Bisbas, Fabio Luporini, Mathias Louboutin, Rhodri Nelson, Gerard J. Gorman, Paul H. J. Kelly |
| 2021 | TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs. Yuyao Niu, Zhengyang Lu, Meichen Dong, Zhou Jin, Weifeng Liu, Guangming Tan |
| 2021 | Towards Internet-Scale Convolutional Root-Cause Analysis with DIAGNET. Loïck Bonniot, Christoph Neumann, François Taïani |
| 2021 | Towards Practical Cloud Offloading for Low-cost Ground Vehicle Workloads. Yuan Xu, Tianwei Zhang, Jimin Han, Sa Wang, Yungang Bao |
| 2021 | Transparent I/O-Aware GPU Virtualization for Efficient Resource Consolidation. Nelson Mimura Gonzalez, Tonia Elengikal |
| 2021 | Virtual-Link: A Scalable Multi-Producer Multi-Consumer Message Queue Architecture for Cross-Core Communication. Qinzhe Wu, Jonathan Beard, Ashen Ekanayake, Andreas Gerstlauer, Lizy K. John |
| 2021 | xBGAS: A Global Address Space Extension on RISC-V for High Performance Computing. Xi Wang, John D. Leidel, Brody Williams, Alan Ehret, Miguel Mark, Michel A. Kinsy, Yong Chen |
| 2021 | zMesh: Exploring Application Characteristics to Improve Lossy Compression Ratio for Adaptive Mesh Refinement. Huizhang Luo, Junqi Wang, Qing Liu, Jieyang Chen, Scott Klasky, Norbert Podhorszki |