| 2020 | A 1024-member ensemble data assimilation with 3.5-km mesh global weather simulations. Hisashi Yashiro, Koji Terasaki, Yuta Kawai, Shuhei Kudo, Takemasa Miyoshi, Toshiyuki Imamura, Kazuo Minami, Hikaru Inoue, Tatsuo Nishiki, Takayuki Saji, Masaki Satoh, Hirofumi Tomita |
| 2020 | A hierarchical and load-aware design for large message neighborhood collectives. S. Mahdieh Ghazimirsaeed, Qinghua Zhou, Amit Ruhela, Mohammadreza Bayatpour |
| 2020 | A parallel framework for constraint-based Bayesian network learning via Markov blanket discovery. Ankit Srivastava, Sriram P. Chockalingam, Srinivas Aluru |
| 2020 | A performance-portable nonhydrostatic atmospheric dycore for the Energy Exascale Earth System Model running at cloud-resolving resolutions. Luca Bertagna, Oksana Guba, Mark A. Taylor, James G. Foucar, Jeff Larkin, Andrew M. Bradley, Sivasankaran Rajamanickam, Andrew G. Salinger |
| 2020 | A submatrix-based method for approximate matrix function evaluation in the quantum chemistry code CP2K. Michael Lass, Robert Schade, Thomas D. Kühne, Christian Plessl |
| 2020 | ANT-man: towards agile power management in the microservice era. Xiaofeng Hou, Chao Li, Jiacheng Liu, Lu Zhang, Yang Hu, Minyi Guo |
| 2020 | Accelerating large-scale excited-state GW calculations on leadership HPC systems. Mauro Del Ben, Charlene Yang, Zhenglu Li, Felipe H. da Jornada, Steven G. Louie, Jack Deslippe |
| 2020 | Accelerating sparse DNN models without hardware-support via tile-wise sparsity. Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu |
| 2020 | Acceleration of fusion plasma turbulence simulations using the mixed-precision communication-avoiding Krylov method. Yasuhiro Idomura, Takuya Ina, Yussuf Ali, Toshiyuki Imamura |
| 2020 | Alias-free, matrix-free, Ammar Hakim, James Juno |
| 2020 | Alita: comprehensive performance isolation through bias resource management for public clouds. Quan Chen, Shuai Xue, Shang Zhao, Shanpei Chen, Yihao Wu, Yu Xu, Zhuo Song, Tao Ma, Yong Yang, Minyi Guo |
| 2020 | An efficient and non-intrusive GPU scheduling framework for deep learning training systems. Shaoqi Wang, Oscar J. Gonzalez, Xiaobo Zhou, Thomas Williams, Brian D. Friedman, Martin Havemann, Thomas Y. C. Woo |
| 2020 | An in-depth analysis of the Slingshot interconnect. Daniele De Sensi, Salvatore Di Girolamo, Kim H. McMahon, Duncan Roweth, Torsten Hoefler |
| 2020 | Architecture and performance studies of 3D-Hyper-FleX-LION for reconfigurable all-to-all HPC networks. Gengchen Liu, Roberto Proietti, Marjan Fariborz, Pouya Fotouhi, Xian Xiao, S. J. Ben Yoo |
| 2020 | BORA: a bag optimizer for robotic analysis. Jian Zhang, Tao Xie, Yuzhuo Jing, Yanjie Song, Guanzhou Hu, Si Chen, Shu Yin |
| 2020 | Batch: machine learning inference serving on serverless platforms with adaptive batching. Ahsan Ali, Riccardo Pinciroli, Feng Yan, Evgenia Smirni |
| 2020 | BiQGEMM: matrix multiplication with lookup table for binary-coding-based quantized DNNs. Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee |
| 2020 | C-SAW: a framework for graph sampling and random walk on GPUs. Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S. Li, Hang Liu |
| 2020 | CAB-MPI: exploring interprocess work-stealing towards balanced MPI communication. Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, Pavan Balaji |
| 2020 | CCAMP: an integrated translation and optimization framework for OpenACC and OpenMP. Jacob Lambert, Seyong Lee, Jeffrey S. Vetter, Allen D. Malony |
| 2020 | CRAC: checkpoint-restart architecture for CUDA with streams and UVM. Twinkle Jain, Gene Cooperman |
| 2020 | Cell-list based molecular dynamics on many-core processors: a case study on Sunway TaihuLight supercomputer. Xiaohui Duan, Ping Gao, Meng Zhang, Tingjian Zhang, Hongsong Meng, Yuxuan Li, Bertil Schmidt, Haohuan Fu, Lin Gan, Wei Xue, Weiguo Liu, Guangwen Yang |
| 2020 | Chronicles of Astra: challenges and lessons from the first petascale Arm supercomputer. Kevin T. Pedretti, Andrew J. Younge, Simon D. Hammond, James H. Laros III, Matthew L. Curry, Michael J. Aguilar, Robert J. Hoekstra, Ron Brightwell |
| 2020 | Co-design for A64FX manycore processor and "Fugaku". Mitsuhisa Sato, Yutaka Ishikawa, Hirofumi Tomita, Yuetsu Kodama, Tetsuya Odajima, Miwako Tsuji, Hisashi Yashiro, Masaki Aoki, Naoyuki Shida, Ikuo Miyoshi, Kouichi Hirai, Atsushi Furuya, Akira Asato, Kuniki Morita, Toshiyuki Shimizu |
| 2020 | Compiler-based timing for extremely fine-grain preemptive parallelism. Souradip Ghosh, Michael Cuevas, Simone Campanoni, Peter A. Dinda |
| 2020 | Compiling generalized histograms for GPU. Troels Henriksen, Sune Hellfritzsch, Ponnuswamy Sadayappan, Cosmin E. Oancea |
| 2020 | Convolutional neural network training with distributed K-FAC. J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian T. Foster |
| 2020 | Cost-aware prediction of uncorrected DRAM errors in the field. Isaac Boixaderas, Darko Zivanovic, Sergi Moré, Javier Bartolome, David Vicente, Marc Casas, Paul M. Carpenter, Petar Radojkovic, Eduard Ayguadé |
| 2020 | Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters. Ang Li, Omer Subasi, Xiu Yang, Sriram Krishnamoorthy |
| 2020 | Distributed many-to-many protein sequence alignment using sparse matrices. Oguz Selvitopi, Saliya Ekanayake, Giulia Guidi, Georgios A. Pavlopoulos, Ariful Azad, Aydin Buluç |
| 2020 | Distributed-memory DMRG via sparse and dense parallel tensor contractions. Ryan Levy, Edgar Solomonik, Bryan K. Clark |
| 2020 | Distributed-memory parallel symmetric nonnegative matrix factorization. Srinivas Eswar, Koby Hayashi, Grey Ballard, Ramakrishnan Kannan, Richard W. Vuduc, Haesun Park |
| 2020 | DrCCTProf: a fine-grained call path profiler for ARM-based clusters. Qidong Zhao, Xu Liu, Milind Chabbi |
| 2020 | Efficient 2D tensor network simulation of quantum systems. Yuchen Pang, Tianyi Hao, Annika Dugad, Yiqing Zhou, Edgar Solomonik |
| 2020 | Efficient tiled sparse matrix multiplication through matrix signatures. Süreyya Emre Kurt, Aravind Sukumaran-Rajam, Fabrice Rastello, P. Sadayappan |
| 2020 | Evaluation of a minimally synchronous algorithm for 2:1 octree balance. Hansol Suh, Tobin Isaac |
| 2020 | Experimental evaluation of NISQ quantum computers: error measurement, characterization, and implications. Tirthak Patel, Abhay Potharaju, Baolin Li, Rohan Basu Roy, Devesh Tiwari |
| 2020 | Fast stencil-code computation on a wafer-scale processor. Kamil Rocki, Dirk Van Essendelft, Ilya Sharapov, Robert Schreiber, Michael Morrison, Vladimir Kibardin, Andrey Portnoy, Jean-Francois Dietiker, Madhava Syamlal, Michael James |
| 2020 | FatPaths: routing in supercomputers and data centers when shortest paths fall short. Maciej Besta, Marcel Schneider, Marek Konieczny, Karolina Cynk, Erik Henriksson, Salvatore Di Girolamo, Ankit Singla, Torsten Hoefler |
| 2020 | FeatGraph: a flexible and efficient backend for graph neural network systems. Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang |
| 2020 | Foresight: analysis that matters for data reduction. Pascal Grosset, Christopher M. Biwer, Jesus Pulido, Arvind T. Mohan, Ayan Biswas, John Patchett, Terece L. Turton, David H. Rogers, Daniel Livescu, James P. Ahrens |
| 2020 | GE-SpMM: general-purpose sparse matrix-matrix multiplication on GPUs for graph neural networks. Guyue Huang, Guohao Dai, Yu Wang, Huazhong Yang |
| 2020 | GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training. Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, Anil Parwani |
| 2020 | GPU lifetimes on Titan supercomputer: survival analysis and reliability. George Ostrouchov, Don Maxwell, Rizwan A. Ashraf, Christian Engelmann, Mallikarjun Shankar, James H. Rogers |
| 2020 | GPU-trident: efficient modeling of error propagation in GPU programs. Abdul Rehman Anwer, Guanpeng Li, Karthik Pattabiraman, Michael B. Sullivan, Timothy Tsai, Siva Kumar Sastry Hari |
| 2020 | GVProf: a value profiler for GPU-based clusters. Keren Zhou, Yueming Hao, John M. Mellor-Crummey, Xiaozhu Meng, Xu Liu |
| 2020 | GraphPi: high performance graph pattern matching through effective redundancy elimination. Tianhui Shi, Mingshu Zhai, Yi Xu, Jidong Zhai |
| 2020 | HPC I/O throughput bottleneck analysis with explainable local models. Mihailo Isakov, Eliakin Del Rosario, Sandeep Madireddy, Prasanna Balaprakash, Philip H. Carns, Robert B. Ross, Michel A. Kinsy |
| 2020 | Herring: rethinking the parameter server at scale for the cloud. Indu Thangakrishnan, Derya Cavdar, Can Karakus, Piyush Ghai, Yauheni Selivonchyk, Cory Pruce |
| 2020 | High-performance parallel graph coloring with strong guarantees on work, depth, and quality. Maciej Besta, Armon Carigiet, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Torsten Hoefler |
| 2020 | INEC: fast and coherent in-network erasure coding. Haiyang Shi, Xiaoyi Lu |
| 2020 | Improving all-to-many personalized communication in two-phase I/O. Qiao Kang, Robert B. Ross, Robert Latham, Sunwoo Lee, Ankit Agrawal, Alok N. Choudhary, Wei-keng Liao |
| 2020 | Iris: allocation banking and identity and access management for the exascale era. Gabor Torok, Mark R. Day, Rebecca Hartman-Baker, Cory Snavely |
| 2020 | Job characteristics on large-scale systems: long-term analysis, quantification, and implications. Tirthak Patel, Zhengchun Liu, Raj Kettimuthu, Paul Rich, William E. Allcock, Devesh Tiwari |
| 2020 | Kraken: memory-efficient continual learning for large-scale real-time recommendations. Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, Hongbo Ao, Wanhong Xu, Jiwu Shu |
| 2020 | Live forensics for HPC systems: a case study on distributed storage systems. Saurabh Jha, Shengkun Cui, Subho S. Banerjee, Tianyin Xu, Jeremy Enos, Mike Showerman, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer |
| 2020 | Massive parallelization for finding shortest lattice vectors based on Ubiquity Generator framework. Nariaki Tateiwa, Yuji Shinano, Satoshi Nakamura, Akihiro Yoshida, Shizuo Kaji, Masaya Yasuda, Katsuki Fujisawa |
| 2020 | MeshfreeFlowNet: a physics-constrained deep continuous space-time super-resolution framework. Chiyu Max Jiang, Soheil Esmaeilzadeh, Kamyar Azizzadenesheli, Karthik Kashinath, Mustafa Mustafa, Hamdi A. Tchelepi, Philip Marcus, Prabhat, Anima Anandkumar |
| 2020 | Metis: learning to schedule long-running applications in shared container clusters at scale. Luping Wang, Qizhen Weng, Wei Wang, Chen Chen, Bo Li |
| 2020 | MoHA: a composable system for efficient in-situ analytics on heterogeneous HPC systems. Haoyuan Xing, Gagan Agrawal, Rajiv Ramnath |
| 2020 | Multi-node multi-GPU diffeomorphic image registration for large-scale imaging problems. Malte Brunn, Naveen Himthani, George Biros, Miriam Mehl, Andreas Mang |
| 2020 | Newton-ADMM: a distributed GPU-accelerated optimizer for multiclass classification problems. Chih-Hao Fang, Sudhir B. Kylasa, Fred Roosta, Michael W. Mahoney, Ananth Grama |
| 2020 | OMPRacer: a scalable and precise static race detector for OpenMP programs. Bradley Swain, Yanze Li, Peiming Liu, Ignacio Laguna, Giorgis Georgakoudis, Jeff Huang |
| 2020 | Optimizing deep learning recommender systems training on CPU cluster architectures. Dhiraj D. Kalamkar, Evangelos Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, Alexander Heinecke |
| 2020 | Pencil: a pipelined algorithm for distributed stencils. Hengjie Wang, Aparna Chandramowlishwaran |
| 2020 | Petascale XCT: 3D image reconstruction with hierarchical communications on multi-GPU nodes. Mert Hidayetoglu, Tekin Bicer, Simon Garcia De Gonzalo, Bin Ren, Vincent De Andrade, Doga Gürsoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu |
| 2020 | Preempt: scalable epidemic interventions using submodular optimization on multi-GPU systems. Marco Minutoli, Prathyush Sambaturu, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman, Anil Vullikanti |
| 2020 | Preparing nuclear astrophysics for exascale. Max P. Katz, Ann S. Almgren, Maria Barrios Sazo, Kiran Eiden, Kevin Gott, Alice Harpole, Jean M. Sexton, Donald E. Willcox, Weiqun Zhang, Michael Zingale |
| 2020 | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020, Virtual Event / Atlanta, Georgia, USA, November 9-19, 2020. Christine Cuicchi, Irene Qualters, William T. Kramer |
| 2020 | Processing full-scale Square Kilometre Array data on the Summit supercomputer. Ruonan Wang, Rodrigo Tobar, Markus Dolensky, Tao An, Andreas Wicenec, Chen Wu, Fred Dulwich, Norbert Podhorszki, Valentine Anantharaj, Eric Suchyta, Bao-qiang Lao, Scott Klasky |
| 2020 | Pushing the limit of molecular dynamics with Weile Jia, Han Wang, Mohan Chen, Denghui Lu, Lin Lin, Roberto Car, Weinan E, Linfeng Zhang |
| 2020 | RDMP-KV: designing remote direct memory persistence based key-value stores with PMEM. Tianxi Li, Dipti Shankar, Shashank Gugnani, Xiaoyi Lu |
| 2020 | RLScheduler: an automated HPC batch job scheduler using reinforcement learning. Di Zhang, Dong Dai, Youbiao He, Forrest Sheng Bao, Bing Xie |
| 2020 | Recurrent neural network architecture search for geophysical emulation. Romit Maulik, Romain Egele, Bethany Lusch, Prasanna Balaprakash |
| 2020 | Reducing communication in graph neural network training. Alok Tripathy, Katherine A. Yelick, Aydin Buluç |
| 2020 | Rocket: efficient and scalable all-pairs computations on heterogeneous platforms. Stijn Heldens, Pieter Hijma, Ben van Werkhoven, Jason Maassen, Henri E. Bal, Rob van Nieuwpoort |
| 2020 | Runtime-guided ECC protection using online estimation of memory vulnerability. Luc Jaulmes, Miquel Moretó, Mateo Valero, Mattan Erez, Marc Casas |
| 2020 | SEFEE: lightweight storage error forecasting in large-scale enterprise storage systems. Amirhessam Yazdi, Xing Lin, Lei Yang, Feng Yan |
| 2020 | ScalAna: automating scaling loss detection with graph analysis. Yuyang Jin, Haojie Wang, Teng Yu, Xiongchao Tang, Torsten Hoefler, Xu Liu, Jidong Zhai |
| 2020 | Scalable heterogeneous execution of a coupled-cluster model with perturbative triples. Jinsung Kim, Ajay Panyala, Bo Peng, Karol Kowalski, P. Sadayappan, Sriram Krishnamoorthy |
| 2020 | Scalable knowledge graph analytics at 136 petaflop/s. Ramakrishnan Kannan, Piyush Sao, Hao Lu, Drahomira Herrmannova, Vijay Thakkar, Robert M. Patton, Richard W. Vuduc, Thomas E. Potok |
| 2020 | Scalable yet rigorous floating-point error analysis. Arnab Das, Ian Briggs, Ganesh Gopalakrishnan, Sriram Krishnamoorthy, Pavel Panchekha |
| 2020 | Scaling distributed deep learning workloads beyond the memory capacity with KARMA. Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka |
| 2020 | Scaling the Hartree-Fock matrix build on Summit. Giuseppe M. J. Barca, David L. Poole, Jorge L. Galvez Vallejo, Melisa Alkan, Colleen Bertoni, Alistair P. Rendell, Mark S. Gordon |
| 2020 | SegAlign: a scalable GPU-based whole genome aligner. Sneha D. Goenka, Yatish Turakhia, Benedict Paten, Mark Horowitz |
| 2020 | Smart-PGSim: using neural network to accelerate AC-OPF power grid simulation. Wenqian Dong, Zhen Xie, Gokcen Kestor, Dong Li |
| 2020 | SpTFS: sparse tensor format selection for MTTKRP via deep learning. Qingxiao Sun, Yi Liu, Ming Dun, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian |
| 2020 | Sparse GPU kernels for deep learning. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen |
| 2020 | Speeding up SpMV for power-law graph analytics by enhancing locality & vectorization. Serif Yesil, Azin Heidarshenas, Adam Morrison, Josep Torrellas |
| 2020 | TAGO: rethinking routing design in high performance reconfigurable networks. Min Yee Teh, Yu-Han Hung, George Michelogiannakis, Shijia Yan, Madeleine Glick, John Shalf, Keren Bergman |
| 2020 | TOSS-2020: a commodity software stack for HPC. Edgar A. León, Trent D'Hooge, Nathan Hanford, Ian Karlin, Ramesh Pankajakshan, Jim Foraker, Chris Chambreau, Matthew L. Leininger |
| 2020 | Taming I/O variation on QoS-less HPC storage: what can applications do? Zhenbo Qiao, Qing Liu, Norbert Podhorszki, Scott Klasky, Jieyang Chen |
| 2020 | Task bench: a parameterized benchmark for evaluating parallel runtime performance. Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Qinglei Cao, George Bosilca, Seema Mirchandaney, Wonchan Lee, Sean Treichler, Patrick S. McCormick, Alex Aiken |
| 2020 | Term quantization: furthering quantization at run time. Hsiang-Tsung Kung, Bradley McDanel, Sai Qian Zhang |
| 2020 | Toward realization of numerical towing-tank tests by wall-resolved large eddy simulation based on 32 billion grid finite-element computation. Chisachi Kato, Yoshinobu Yamade, Katsuhiro Nagano, Kiyoshi Kumahata, Kazuo Minami, Tatsuo Nishikawa |
| 2020 | Tuning floating-point precision using dynamic program information and temporal locality. Hugo Brunie, Costin Iancu, Khaled Z. Ibrahim, Philip Brisk, Brandon Cook |
| 2020 | Veritas: accurately estimating the correct output on noisy intermediate-scale quantum computers. Tirthak Patel, Devesh Tiwari |
| 2020 | Waiting game: optimally provisioning fixed resources for cloud-enabled schedulers. Pradeep Ambati, Noman Bashir, David Irwin, Prashant J. Shenoy |
| 2020 | ZeRO: memory optimizations toward training trillion parameter models. Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He |
| 2020 | ZeroSpy: exploring software inefficiency with redundant zeros. Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian, Xu Liu |
| 2020 | fBLAS: streaming linear algebra on FPGA. Tiziano De Matteis, Johannes de Fine Licht, Torsten Hoefler |
| 2020 | pLiner: isolating lines of floating-point code for compiler-induced variability. Hui Guo, Ignacio Laguna, Cindy Rubio-González |