| 2019 | A constraint-based approach to automatic data partitioning for distributed memory execution. Wonchan Lee, Manolis Papadakis, Elliott Slaughter, Alex Aiken |
| 2019 | A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations. Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier, Torsten Hoefler |
| 2019 | A large-scale study of MPI usage in open-source HPC applications. Ignacio Laguna, Ryan J. Marshall, Kathryn M. Mohror, Martin Ruefenacht, Anthony Skjellum, Nawrin Sultana |
| 2019 | A massively parallel infrastructure for adaptive multiscale simulations: modeling RAS initiation pathway for cancer. Francesco Di Natale, Harsh Bhatia, Timothy S. Carpenter, Chris Neale, Sara Kokkila Schumacher, Tomas Oppelstrup, Liam Stanton, Xiaohua Zhang, Shiv Sundram, Thomas R. W. Scogland, Gautham Dharuman, Michael P. Surh, Yue Yang, Claudia Misale, Lars Schneidenbach, Carlos H. A. Costa, Changhoan Kim, Bruce D'Amora, Sandrasegaram Gnanakaran, Dwight V. Nissley, Frederick H. Streitz, Felice C. Lightstone, Peer-Timo Bremer, James N. Glosli, Helgi I. Ingólfsson |
| 2019 | A versatile software systolic execution model for GPU memory-bound kernels. Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka |
| 2019 | Adaptive neural network-based approximation to accelerate Eulerian fluid simulation. Wenqian Dong, Jie Liu, Zhen Xie, Dong Li |
| 2019 | Addressing data resiliency for staging based scientific workflows. Shaohua Duan, Pradeep Subedi, Philip E. Davis, Manish Parashar |
| 2019 | Almost deterministic work stealing. Shumpei Shiina, Kenjiro Taura |
| 2019 | An early evaluation of Intel's Optane DC persistent memory module and its impact on high-performance scientific applications. Michèle Weiland, Holger Brunst, Tiago Quintino, Nick Johnson, Olivier Iffrig, Simon D. Smart, Christian Herold, Antonino Bonanni, Adrian Jackson, Mark Parsons |
| 2019 | An efficient mixed-mode representation of sparse tensors. Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Prashant Singh Rawat, Sriram Krishnamoorthy, P. Sadayappan |
| 2019 | An evaluation of the CORAL interconnects. Christopher Zimmer, Scott Atchley, Ramesh Pankajakshan, Brian E. Smith, Ian Karlin, Matthew L. Leininger, Adam Bertsch, Brian S. Ryujin, Jason Burmark, André Walker-Loud, Michael A. Clark, Olga Pearce |
| 2019 | Analytical cache modeling and tilesize optimization for tensor contractions. Rui Li, Aravind Sukumaran-Rajam, Richard Veras, Tze Meng Low, Fabrice Rastello, Atanas Rountev, P. Sadayappan |
| 2019 | Assessing the impact of timing errors on HPC applications. Chun-Kai Chang, Wenqi Yin, Mattan Erez |
| 2019 | AutoFFT: a template-based FFT codes auto-generation framework for ARM and X86 CPUs. Zhihao Li, Haipeng Jia, Yunquan Zhang, Tun Chen, Liang Yuan, Luning Cao, Xiao Wang |
| 2019 | BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets. Ang Li, Tong Geng, Tianqi Wang, Martin C. Herbordt, Shuaiwen Leon Song, Kevin J. Barker |
| 2019 | Bandwidth steering in HPC using silicon nanophotonics. George Michelogiannakis, Yiwen Shen, Min Yee Teh, Xiang Meng, Benjamin Aivazi, Taylor L. Groves, John Shalf, Madeleine Glick, Manya Ghobadi, Larry Dennison, Keren Bergman |
| 2019 | CARE: compiler-assisted recovery from soft failures. Chao Chen, Greg Eisenhauer, Santosh Pande, Qiang Guan |
| 2019 | Channel and filter parallelism for large-scale CNN training. Nikoli Dryden, Naoya Maruyama, Tim Moon, Tom Benson, Marc Snir, Brian Van Essen |
| 2019 | Code generation for massively parallel phase-field simulations. Martin Bauer, Johannes Hötzer, Dominik Ernst, Julian Hammer, Marco Seiz, Henrik Hierl, Jan Hönig, Harald Köstler, Gerhard Wellein, Britta Nestler, Ulrich Rüde |
| 2019 | ComDetective: a lightweight communication detection tool for threads. Muhammad Aditya Sasongko, Milind Chabbi, Palwisha Akhtar, Didem Unat |
| 2019 | Compiler assisted hybrid implicit and explicit GPU memory management under unified address space. Lingda Li, Barbara M. Chapman |
| 2019 | Conflict-free symmetric sparse matrix-vector multiplication on multicore architectures. Athena Elafrou, Georgios I. Goumas, Nectarios Koziris |
| 2019 | Consensus equilibrium framework for super-resolution and extreme-scale CT reconstruction. Xiao Wang, Venkatesh Sridhar, Zahra Ronaghi, Rollin C. Thomas, Jack Deslippe, Dilworth Parkinson, Gregery T. Buzzard, Samuel P. Midkiff, Charles A. Bouman, Simon K. Warfield |
| 2019 | D2P: from recursive formulations to distributed-memory codes. Nikhil Hegde, Qifan Chang, Milind Kulkarni |
| 2019 | Diogenes: looking for an honest CPU/GPU performance measurement tool. Benjamin Welton, Barton P. Miller |
| 2019 | Distributed enhanced suffix arrays: efficient algorithms for construction and querying. Patrick Flick, Srinivas Aluru |
| 2019 | End-to-end I/O portfolio for the Summit supercomputing ecosystem. Sarp Oral, Sudharshan S. Vazhkudai, Feiyi Wang, Christopher Zimmer, Christopher Brumgard, Jesse Hanley, George Markomanolis, Ross G. Miller, Dustin Leverman, Scott Atchley, Verónica G. Vergara Larrea |
| 2019 | Etalumis: bringing probabilistic programming to scientific simulators at scale. Atilim Günes Baydin, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence Meadows, Jialin Liu, Andreas Munk, Saeid Naderiparizi, Bradley Gram-Hansen, Gilles Louppe, Mingfei Ma, Xiaohui Zhao, Philip H. S. Torr, Victor W. Lee, Kyle Cranmer, Prabhat, Frank Wood |
| 2019 | Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs. Tuowen Zhao, Protonu Basu, Samuel Williams, Mary W. Hall, Hans Johansen |
| 2019 | FT-iSort: efficient fault tolerance for introsort. Sihuan Li, Hongbo Li, Xin Liang, Jieyang Chen, Elisabeth Giem, Kaiming Ouyang, Kai Zhao, Sheng Di, Franck Cappello, Zizhong Chen |
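| 2019 | Fast, scalable and accurate finite-element based ab initio calculations using mixed precision computing: 46 PFLOPS simulation of a metallic dislocation system. Sambit Das, Phani Motamarri, Vikram Gavini, Bruno Turcksin, Ying Wai Li, Brent Leback |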
| 2019 | From facility to application sensor data: modular, continuous and holistic monitoring with DCDB. Alessio Netti, Micha Müller, Axel Auweter, Carla Guillén, Michael Ott, Daniele Tafani, Martin Schulz |
| 2019 | From Piz Daint to the stars: simulation of stellar mergers using high-level abstractions. Gregor Daiß, Parsa Amini, John Biddiscombe, Patrick Diehl, Juhan Frank, Kevin A. Huck, Hartmut Kaiser, Dominic Marcello, David Pfander, Dirk Pflüger |
| 2019 | Full-state quantum circuit simulation by using data compression. Xin-Chuan Wu, Sheng Di, Emma Maitreyee Dasgupta, Franck Cappello, Hal Finkel, Yuri Alexeev, Frederic T. Chong |
| 2019 | Fully integrated FPGA molecular dynamics simulations. Chen Yang, Tong Geng, Tianqi Wang, Rushi Patel, Qingqing Xiong, Ahmed Sanaullah, Chunshu Wu, Jiayi Sheng, Charles Lin, Vipin Sachdeva, Woody Sherman, Martin C. Herbordt |
| 2019 | GPCNeT: designing a benchmark suite for inducing and measuring contention in HPC networks. Sudheer Chunduri, Taylor L. Groves, Peter Mendygral, Brian Austin, Jacob Balma, Krishna Kandalla, Kalyan Kumaran, Glenn K. Lockwood, Scott Parker, Steven Warren, Nathan Wichmann, Nicholas J. Wright |
| 2019 | GPU acceleration of extreme scale pseudo-spectral simulations of turbulence using asynchronism. Kiran Ravikumar, David Appelhans, P. K. Yeung |
| 2019 | GraphM: an efficient storage system for high throughput of concurrent graph processing. Jin Zhao, Yu Zhang, Xiaofei Liao, Ligang He, Bingsheng He, Hai Jin, Haikun Liu, Yicheng Chen |
| 2019 | Hatchet: pruning the overgrowth in parallel profiles. Abhinav Bhatele, Stephanie Brink, Todd Gamblin |
| 2019 | High performance Monte Carlo simulation of Ising model on TPU clusters. Kun Yang, Yi-Fan Chen, Georgios Roumpos, Chris Colby, John R. Anderson |
| 2019 | HyperX topology: first at-scale implementation and comparison to the fat-tree. Jens Domke, Satoshi Matsuoka, Ivan R. Ivanov, Yuki Tsushima, Tomoya Yuki, Akihiro Nomura, Shin'ichi Miura, Nic McDonald, Dennis Lee Floyd, Nicolas Dubé |
| 2019 | INCA: in-network compute assistance. Whit Schonbein, Ryan E. Grant, Matthew G. F. Dosanjh, Dorian C. Arnold |
| 2019 | LPCC: hierarchical persistent client caching for Lustre. Yingjin Qian, Xi Li, Shuichi Ihara, Andreas Dilger, Carlos Thomaz, Shilong Wang, Wen Cheng, Chunyan Li, Lingfang Zeng, Fang Wang, Dan Feng, Tim Süß, André Brinkmann |
| 2019 | Large-batch training for LSTM and beyond. Yang You, Jonathan Hseu, Chris Ying, James Demmel, Kurt Keutzer, Cho-Jui Hsieh |
| 2019 | Legate NumPy: accelerated and distributed array computing. Michael Bauer, Michael Garland |
| 2019 | Local-global merge tree computation with local exchanges. Arnur Nigmetov, Dmitriy Morozov |
| 2019 | MIQS: metadata indexing and querying service for self-describing file formats. Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen |
| 2019 | MemXCT: memory-centric X-ray CT reconstruction with massive parallelization. Mert Hidayetoglu, Tekin Biçer, Simon Garcia De Gonzalo, Bin Ren, Doga Gürsoy, Rajkumar Kettimuthu, Ian T. Foster, Wen-mei W. Hwu |
| 2019 | Mitigating network noise on Dragonfly networks through application-aware routing. Daniele De Sensi, Salvatore Di Girolamo, Torsten Hoefler |
| 2019 | Moment representation in the lattice Boltzmann method on massively parallel hardware. Madhurima Vardhan, John Gounley, Luiz Hegele, Erik W. Draeger, Amanda Randles |
| 2019 | Near-memory data transformation for efficient sparse matrix multi-vector multiplication. Daichi Fujiki, Niladrish Chatterjee, Donghyuk Lee, Mike O'Connor |
| 2019 | Network-accelerated non-contiguous memory transfers. Salvatore Di Girolamo, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej Besta, Luca Benini, Duncan Roweth, Torsten Hoefler |
| 2019 | OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway TaihuLight. Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, Zhiqiang Wei |
| 2019 | Optimizing the data movement in quantum transport simulations via data-centric parallel programming. Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier, Torsten Hoefler |
| 2019 | Parallel transport time-dependent density functional theory calculations with hybrid functional on Summit. Weile Jia, Lin-Wang Wang, Lin Lin |
| 2019 | Performance optimality or reproducibility: that is the question. Tapasya Patki, Jayaraman J. Thiagarajan, Alexis Ayala, Tanzima Z. Islam |
| 2019 | Pinpointing performance inefficiencies via lightweight variance profiling. Pengfei Su, Shuyin Jiao, Milind Chabbi, Xu Liu |
| 2019 | Practical and efficient incremental adaptive routing for HyperX networks. Nic McDonald, Mikhail Isaev, Adriana Flores, Al Davis, John Kim |
| 2019 | Predicting faults in high performance computing systems: an in-depth survey of the state-of-the-practice. David Jauk, Dai Yang, Martin Schulz |
| 2019 | Preparation and optimization of a diverse workload for a large-scale heterogeneous system. Ian Karlin, Yoonho Park, Bronis R. de Supinski, Peng Wang, Bert Still, David Beckingsale, Robert Blake, Tong Chen, Guojing Cong, Carlos H. A. Costa, Johann Dahm, Giacomo Domeniconi, Thomas Epperly, Aaron Fisher, Sara Kokkila Schumacher, Steven H. Langer, Hai Le, Eun Kyung Lee, Naoya Maruyama, Xinyu Que, David F. Richards, Björn Sjögreen, Jonathan Wong, Carol S. Woodward, Ulrike Meier Yang, Xiaohua Zhang, Bob Anderson, David Appelhans, Levi Barnes, Peter D. Barnes Jr., Sorin Bastea, David Böhme, Jamie A. Bramwell, James M. Brase, José R. Brunheroto, Barry Chen, Charway R. Cooper, Tony Degroot, Robert D. Falgout, Todd Gamblin, David J. Gardner, James N. Glosli, John A. Gunnels, Max P. Katz, Tzanio V. Kolev, I-Feng W. Kuo, Matthew P. LeGendre, Ruipeng Li, Pei-Hung Lin, Shelby Lockhart, Kathleen McCandless, Claudia Misale, Jaime H. Moreno, Rob Neely, Jarom Nelson, Rao Nimmakayala, Kathryn M. O'Brien, Kevin O'Brien, Ramesh Pankajakshan, Roger Pearce, Slaven Peles, Phil Regier, Steven C. Rennich, Martin Schulz, Howard Scott, James C. Sexton, Kathleen Shoga, Shiv Sundram, Guillaume Thomas-Collignon, Brian Van Essen, Alexey Voronin, Bob Walkup, Lu Wang, Chris Ward, Hui-Fang Wen, Daniel A. White, Christopher Young, Cyril Zeller, Edward Zywicz |
| 2019 | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019, Denver, Colorado, USA, November 17-19, 2019. Michela Taufer, Pavan Balaji, Antonio J. Peña |
| 2019 | PruneTrain: fast neural network training by dynamic sparse model reconfiguration. Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez |
| 2019 | Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication. Grzegorz Kwasniewski, Marko Kabic, Maciej Besta, Joost VandeVondele, Raffaele Solcà, Torsten Hoefler |
| 2019 | Regularizing irregularly sparse point-to-point communications. Oguz Selvitopi, Cevdet Aykanat |
| 2019 | Replication is more efficient than you think. Anne Benoit, Thomas Hérault, Valentin Le Fèvre, Yves Robert |
| 2019 | Revisiting I/O behavior in large-scale storage systems: the expected and the unexpected. Tirthak Patel, Suren Byna, Glenn K. Lockwood, Devesh Tiwari |
| 2019 | SLATE: design of a modern distributed and accelerated linear algebra library. Mark Gates, Jakub Kurzak, Ali Charara, Asim YarKhan, Jack J. Dongarra |
| 2019 | SSD failures in the field: symptoms, causes, and prediction models. Jacob Alter, Ji Xue, Alma Dimnaku, Evgenia Smirni |
| 2019 | SW_GROMACS: accelerate GROMACS on Sunway TaihuLight. Tingjian Zhang, Yuxuan Li, Ping Gao, Qi Shao, Mingshan Shao, Meng Zhang, Jinxiao Zhang, Xiaohui Duan, Zhao Liu, Lin Gan, Haohuan Fu, Wei Xue, Weiguo Liu, Guangwen Yang |
| 2019 | Scalable generation of graphs for benchmarking HPC community-detection algorithms. George M. Slota, Jonathan W. Berry, Simon D. Hammond, Stephen L. Olivier, Cynthia A. Phillips, Sivasankaran Rajamanickam |
| 2019 | Scalable reinforcement-learning-based neural architecture search for cancer deep learning research. Prasanna Balaprakash, Romain Egele, Misha Salim, Stefan M. Wild, Venkatram Vishwanath, Fangfang Xia, Tom Brettin, Rick Stevens |
| 2019 | Scalable simulation of realistic volume fraction red blood cell flows through vascular networks. Libin Lu, Matthew J. Morse, Abtin Rahimian, Georg Stadler, Denis Zorin |
| 2019 | Semantic query transformations for increased parallelization in distributed knowledge graph query processing. HyeongSik Kim, Abhisha Bhattacharyya, Kemafor Anyanwu |
| 2019 | Significantly improving lossy compression quality based on an optimized hybrid prediction model. Xin Liang, Sheng Di, Sihuan Li, Dingwen Tao, Bogdan Nicolae, Zizhong Chen, Franck Cappello |
| 2019 | Slack squeeze coded computing for adaptive straggler mitigation. Krishna Giri Narra, Zhifeng Lin, Mehrdad Kiamari, Salman Avestimehr, Murali Annavaram |
| 2019 | Slim graph: practical lossy graph compression for approximate graph processing, storage, and analytics. Maciej Besta, Simon Weber, Lukas Gianinazzi, Robert Gerstenberger, Andrey Ivanov, Yishai Oltchik, Torsten Hoefler |
| 2019 | Solving PDEs in space-time: 4D tree-based adaptivity, mesh-free and matrix-free approaches. Masado Ishii, Milinda Fernando, Kumar Saurabh, Biswajit Khara, Baskar Ganapathysubramanian, Hari Sundar |
| 2019 | SparCML: high-performance sparse communication for machine learning. Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler |
| 2019 | Spread-n-share: improving application performance and cluster throughput with resource-aware job placement. Xiongchao Tang, Haojie Wang, Xiaosong Ma, Nosayba El-Sayed, Jidong Zhai, Wenguang Chen, Ashraf Aboulnaga |
| 2019 | Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures. Tal Ben-Nun, Johannes de Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler |
| 2019 | Streaming message interface: high-performance distributed memory programming on reconfigurable hardware. Tiziano De Matteis, Johannes de Fine Licht, Jakub Beránek, Torsten Hoefler |
| 2019 | Swift machine learning model serving scheduling: a region based reinforcement learning approach. Heyang Qin, Syed Zawad, Yanqi Zhou, Lei Yang, Dongfang Zhao, Feng Yan |
| 2019 | Topology-custom UGAL routing on Dragonfly. Md. Shafayat Rahman, Saptarshi Bhowmik, Yevgeniy Ryasnianskiy, Xin Yuan, Michael Lang |
| 2019 | TriEC: tripartite graph based erasure coding NIC offload. Haiyang Shi, Xiaoyi Lu |
| 2019 | Uncore power scavenger: a runtime for uncore power conservation on HPC systems. Neha Gholkar, Frank Mueller, Barry Rountree |
| 2019 | Understanding congestion in high performance interconnection networks using sampling. Philip Taffet, John M. Mellor-Crummey |
| 2019 | Understanding priority-based scheduling of graph algorithms on a shared-memory platform. Serif Yesil, Azin Heidarshenas, Adam Morrison, Josep Torrellas |
| 2019 | iFDK: a scalable framework for instant high-resolution image reconstruction. Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka |