| 2016 | 10M-core scalable fully-implicit solver for nonhydrostatic atmospheric dynamics. Chao Yang, Wei Xue, Haohuan Fu, Hongtao You, Xinliang Wang, Yulong Ao, Fangfang Liu, Lin Gan, Ping Xu, Lanning Wang, Guangwen Yang, Weimin Zheng |
| 2016 | A PCIe congestion-aware performance model for densely populated accelerator servers. Maxime Martinasso, Grzegorz Kwasniewski, Sadaf R. Alam, Thomas C. Schulthess, Torsten Hoefler |
| 2016 | A data driven scheduling approach for power management on HPC systems. Sean Wallace, Xu Yang, Venkatram Vishwanath, William E. Allcock, Susan Coghlan, Michael E. Papka, Zhiling Lan |
| 2016 | A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment. Samyam Rajbhandari, Jinsung Kim, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, Robert J. Harrison, P. Sadayappan |
| 2016 | A highly effective global surface wave numerical simulation with ultra-high resolution. Fangli Qiao, Wei Zhao, Xunqiang Yin, Xiaomeng Huang, Xin Liu, Qi Shu, Guansuo Wang, Zhenya Song, Xinfang Li, Haixing Liu, Guangwen Yang, Yeli Yuan |
| 2016 | A machine learning framework for performance coverage analysis of proxy applications. Tanzima Z. Islam, Jayaraman J. Thiagarajan, Abhinav Bhatele, Martin Schulz, Todd Gamblin |
| 2016 | A multi-faceted approach to job placement for improved performance on extreme-scale systems. Christopher Zimmer, Saurabh Gupta, Scott Atchley, Sudharshan S. Vazhkudai, Carl Albing |
| 2016 | A parallel algorithm for finding all pairs Sriram P. Chockalingam, Sharma V. Thankachan, Srinivas Aluru |
| 2016 | A parallel arbitrary-order accurate AMR algorithm for the scalar advection-diffusion equation. Arash Bakhtiari, Dhairya Malhotra, Amir Raoofy, Miriam Mehl, Hans-Joachim Bungartz, George Biros |
| 2016 | Accelerating lattice QCD multigrid on GPUs using fine-grained parallelization. Michael A. Clark, Bálint Joó, Alexei Strelchenko, Michael Cheng, Arjun Singh Gambhir, Richard C. Brower |
| 2016 | An efficient and scalable algorithmic method for generating large-scale random graphs. Md. Maksudul Alam, Maleq Khan, Anil Vullikanti, Madhav V. Marathe |
| 2016 | An ephemeral burst-buffer file system for scientific applications. Teng Wang, Kathryn M. Mohror, Adam Moody, Kento Sato, Weikuan Yu |
| 2016 | An exploration of optimization algorithms for high performance tensor completion. Shaden Smith, Jongsoo Park, George Karypis |
| 2016 | Automating wavefront parallelization for sparse matrix computations. Anand Venkat, Mahdi Soltan Mohammadi, Jongsoo Park, Hongbo Rong, Rajkishore Barik, Michelle Mills Strout, Mary W. Hall |
| 2016 | Block iterative methods and recycling for improved scalability of linear solvers. Pierre Jolivet, Pierre-Henri Tournier |
| 2016 | Caliper: performance introspection for HPC software stacks. David Böhme, Todd Gamblin, David Beckingsale, Peer-Timo Bremer, Alfredo Giménez, Matthew P. LeGendre, Olga Pearce, Martin Schulz |
| 2016 | Characterizing parallel scientific applications on commodity clusters: an empirical study of a tapered fat-tree. Edgar A. León, Ian Karlin, Abhinav Bhatele, Steven H. Langer, Chris Chambreau, Louis H. Howell, Trent D'Hooge, Matthew L. Leininger |
| 2016 | Compiler-directed lightweight checkpointing for fine-grained guaranteed soft error recovery. Qingrui Liu, Changhee Jung, Dongyoon Lee, Devesh Tiwari |
| 2016 | DAOS and friends: a proposal for an exascale storage system. Jay F. Lofstead, Ivo Jimenez, Carlos Maltzahn, Quincey Koziol, John Bent, Eric Barton |
| 2016 | DCA: a DRAM-cache-aware DRAM controller. Cheng-Chieh Huang, Vijay Nagarajan, Arpit Joshi |
| 2016 | Daino: a high-level framework for parallel and efficient AMR on GPUs. Mohamed Wahib, Naoya Maruyama, Takayuki Aoki |
| 2016 | Designing MPI library with on-demand paging (ODP) of InfiniBand: challenges and benefits. Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Hari Subramoni, Jie Zhang, Dhabaleswar K. Panda |
| 2016 | Designing scalable Arif M. Khan, Alex Pothen, Md. Mostofa Ali Patwary, Mahantesh Halappanavar, Nadathur Rajagopalan Satish, Narayanan Sundaram, Pradeep Dubey |
| 2016 | Development effort estimation in HPC. Sandra Wienke, Julian Miller, Martin Schulz, Matthias S. Müller |
| 2016 | Distributed-memory large deformation diffeomorphic 3D image registration. Andreas Mang, Amir Gholami, George Biros |
| 2016 | Efficient Delaunay tessellation through K-D tree decomposition. Dmitriy Morozov, Tom Peterka |
| 2016 | Elastic multi-resource fairness: balancing fairness and efficiency in coupled CPU-GPU architectures. Shanjiang Tang, Bingsheng He, Shuhao Zhang, Zhaojie Niu |
| 2016 | Enabling efficient preemption for SIMT architectures with lightweight context switching. Zhen Lin, Lars Nyland, Huiyang Zhou |
| 2016 | Enhanced MPSM3 for applications to quantum biological simulations. A. Pozdneev, Valéry Weber, Teodoro Laino, Constantine Bekas, Alessandro Curioni |
| 2016 | Enhancing InfiniBand with OpenFlow-style SDN capability. Jason Lee, Zhou Tong, Karthik Achalkar, Xin Yuan, Michael Lang |
| 2016 | Evaluating HPC networks via simulation of parallel workloads. Nikhil Jain, Abhinav Bhatele, Sam White, Todd Gamblin, Laxmikant V. Kalé |
| 2016 | Evaluating and optimizing OpenCL kernels for high performance computing with FPGAs. Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, Satoshi Matsuoka |
| 2016 | Exploring the potentials of parallel garbage collection in SSDs for enterprise storage systems. Narges Shahidi, Mohammad Arjomand, Myoungsoo Jung, Mahmut T. Kandemir, Chita R. Das, Anand Sivasubramaniam |
| 2016 | Extended task queuing: active messages for heterogeneous systems. Michael LeBeane, Brandon Potter, Abhisek Pan, Alexandru Dutu, Vinay Agarwala, Wonchan Lee, Deepak Majeti, Bibek Ghimire, Eric Van Tassell, Samuel Wasmundt, Brad Benton, Maurício Breternitz, Michael L. Chu, Mithuna Thottethodi, Lizy K. John, Steven K. Reinhardt |
| 2016 | Extreme scale plasma turbulence simulations on top supercomputers worldwide. William M. Tang, Bei Wang, Stéphane Ethier, Grzegorz Kwasniewski, Torsten Hoefler, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, Carlos Rosales-Fernandez, Timothy J. Williams |
| 2016 | Extreme-scale phase field simulations of coarsening dynamics on the Sunway TaihuLight supercomputer. Jian Zhang, Chunbao Zhou, Yangang Wang, Lili Ju, Qiang Du, Xuebin Chi, Dongsheng Xu, Dexun Chen, Yong Liu, Zhao Liu |
| 2016 | Failure detection and propagation in HPC systems. George Bosilca, Aurélien Bouteiller, Amina Guermouche, Thomas Hérault, Yves Robert, Pierre Sens, Jack J. Dongarra |
| 2016 | Flexfly: enabling a reconfigurable dragonfly through silicon photonics. Ke Wen, Payman Samadi, Sébastien Rumley, Christine P. Chen, Yiwen Shen, Meisam Bahadori, Keren Bergman, Jeremiah J. Wilke |
| 2016 | FlipBack: automatic targeted protection against silent data corruption. Xiang Ni, Laxmikant V. Kalé |
| 2016 | G-store: high-performance graph store for trillion-edge processing. Pradeep Kumar, H. Howie Huang |
| 2016 | Granularity and the cost of error recovery in resilient AMR scientific applications. Anshu Dubey, Hajime Fujita, Daniel T. Graves, Andrew A. Chien, Devesh Tiwari |
| 2016 | Graph colouring as a challenge problem for dynamic graph processing on distributed systems. Scott Sallinen, Keita Iwabuchi, Suraj Poudel, Maya B. Gokhale, Matei Ripeanu, Roger A. Pearce |
| 2016 | GreenLA: green linear algebra software for GPU-accelerated heterogeneous computing. Jieyang Chen, Li Tan, Panruo Wu, Dingwen Tao, Hongbo Li, Xin Liang, Sihuan Li, Rong Ge, Laxmi N. Bhuyan, Zizhong Chen |
| 2016 | HARP: predictive transfer optimization based on historical analysis and real-time probing. Engin Arslan, Kemal Guner, Tevfik Kosar |
| 2016 | High performance emulation of quantum circuits. Thomas Häner, Damian S. Steiger, Mikhail Smelyanskiy, Matthias Troyer |
| 2016 | High-frequency nonlinear earthquake simulations on petascale heterogeneous supercomputers. Daniel Roten, Yifeng Cui, Kim B. Olsen, Steven M. Day, Kyle Withers, William H. Savran, Peng Wang, Dawei Mu |
| 2016 | Improving application resilience to memory errors with lightweight compression. Scott Levy, Kurt B. Ferreira, Patrick G. Bridges |
| 2016 | Increasing molecular dynamics simulation rates with an 8-fold increase in electrical power efficiency. W. Michael Brown, Andrey Semin, Michael Hebenstreit, Sergey Khvostov, Karthik Raman, Steven J. Plimpton |
| 2016 | LIBXSMM: accelerating small matrix multiplications by runtime code generation. Alexander Heinecke, Greg Henry, Maxwell Hutchinson, Hans Pabst |
| 2016 | MUSA: a multi-level simulation approach for next-generation HPC machines. Thomas Grass, César Allande, Adrià Armejach, Alejandro Rico, Eduard Ayguadé, Jesús Labarta, Mateo Valero, Marc Casas, Miquel Moretó |
| 2016 | Measuring and understanding throughput of network topologies. Sangeetha Abdu Jyothi, Ankit Singla, Brighten Godfrey, Alexandra Kolla |
| 2016 | Merge-based parallel sparse matrix-vector multiplication. Duane Merrill, Michael Garland |
| 2016 | MetaMorph: a library framework for interoperable kernels on multi- and many-core clusters. Ahmed E. Helal, Paul Sathre, Wu-chun Feng |
| 2016 | Modeling dilute solutions using first-principles molecular dynamics: computing more than a million atoms with over a million cores. Jean-Luc Fattebert, Daniel Osei-Kuffuor, Erik W. Draeger, Tadashi Ogitsu, William D. Krauss |
| 2016 | Multi-resource fair sharing for datacenter jobs with placement constraints. Wei Wang, Baochun Li, Ben Liang, Jun Li |
| 2016 | Optimal execution of co-analysis for large-scale molecular dynamics simulations. Preeti Malakar, Venkatram Vishwanath, Christopher Knight, Todd S. Munson, Michael E. Papka |
| 2016 | Optimizing memory efficiency for deep convolutional neural networks on GPUs. Chao Li, Yi Yang, Min Feng, Srimat T. Chakradhar, Huiyang Zhou |
| 2016 | PFEAST: a high performance sparse eigenvalue solver using distributed-memory linear solvers. James Kestyn, Vasileios Kalantzis, Eric Polizzi, Yousef Saad |
| 2016 | PIPES: a language and compiler for task-based programming on distributed-memory clusters. Martin Kong, Louis-Noël Pouchet, P. Sadayappan, Vivek Sarkar |
| 2016 | Performance analysis, design considerations, and applications of extreme-scale Utkarsh Ayachit, Andrew C. Bauer, Earl P. N. Duque, Greg Eisenhauer, Nicola J. Ferrier, Junmin Gu, Kenneth E. Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, Dmitriy Morozov, Patrick O'Leary, Reetesh Ranjan, Michel E. Rasquin, Christopher P. Stone, Venkatram Vishwanath, Gunther H. Weber, Brad Whitlock, Matthew Wolf, K. John Wu, E. Wes Bethel |
| 2016 | Performance modeling of in situ rendering. Matthew Larsen, Cyrus Harrison, James Kress, David Pugmire, Jeremy S. Meredith, Hank Childs |
| 2016 | Perilla: metadata-based optimizations of an asynchronous runtime for adaptive mesh refinement. Tan Nguyen, Didem Unat, Weiqun Zhang, Ann S. Almgren, Muhammed Nufail Farooqi, John Shalf |
| 2016 | Pinpointing scale-dependent integer overflow bugs in large-scale parallel applications. Ignacio Laguna, Martin Schulz |
| 2016 | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016, Salt Lake City, UT, USA, November 13-18, 2016. John West, Cherri M. Pancake |
| 2016 | Real-time synthesis of compression algorithms for scientific data. Martin Burtscher, Hari Mukka, Annie Yang, Farbod Hesaaraki |
| 2016 | Refactoring and optimizing the community atmosphere model (CAM) on the Sunway TaihuLight supercomputer. Haohuan Fu, Junfeng Liao, Wei Xue, Lanning Wang, Dexun Chen, Long Gu, Jinxiu Xu, Nan Ding, Xinliang Wang, Conghui He, Shizhen Xu, Yishuang Liang, Jiarui Fang, Yuanchao Xu, Weijie Zheng, Jingheng Xu, Zhen Zheng, Wanjing Wei, Xu Ji, He Zhang, Bingwei Chen, Kaiwei Li, Xiaomeng Huang, Wenguang Chen, Guangwen Yang |
| 2016 | Reliable and efficient performance monitoring in Linux. Maria Dimakopoulou, Stéphane Eranian, Nectarios Koziris, Nicholas Bambos |
| 2016 | SERF: efficient scheduling for fast deep neural network serving via judicious parallelism. Feng Yan, Yuxiong He, Olatunji Ruwase, Evgenia Smirni |
| 2016 | Scalable non-blocking preconditioned conjugate gradient methods. Paul R. Eller, William Gropp |
| 2016 | Scalemine: scalable parallel frequent subgraph mining in a single large graph. Ehab Abdelhamid, Ibrahim Abdelaziz, Panos Kalnis, Zuhair Khayyat, Fuad T. Jamour |
| 2016 | Scheduling-aware routing for supercomputers. Jens Domke, Torsten Hoefler |
| 2016 | Server-side log data analytics for I/O workload characterization and coordination on large shared storage systems. Yang Liu, Raghul Gunasekaran, Xiaosong Ma, Sudharshan S. Vazhkudai |
| 2016 | Simulation and performance analysis of the ECMWF tape library system. Markus Mäsker, Lars Nagel, Tim Süß, André Brinkmann, Lennart Sorth |
| 2016 | Simulations of below-ground dynamics of fungi: 1.184 Pflops attained by automated generation and autotuning of temporal blocking codes. Takayuki Muranushi, Hideyuki Hotta, Junichiro Makino, Seiya Nishizawa, Hirofumi Tomita, Keigo Nitadori, Masaki Iwasawa, Natsuki Hosono, Yutaka Maruyama, Hikaru Inoue, Hisashi Yashiro, Yoshifumi Nakamura |
| 2016 | Strassen's algorithm reloaded. Jianyu Huang, Tyler M. Smith, Greg M. Henry, Robert A. van de Geijn |
| 2016 | The Mont-Blanc prototype: an alternative approach for HPC systems. Nikola Rajovic, Alejandro Rico, Filippo Mantovani, Daniel Ruiz, Josep Oriol Vilarrubi, Constantino Gómez, Luna Backes, Diego Nieto, Harald Servat, Xavier Martorell, Jesús Labarta, Eduard Ayguadé, Chris Adeniyi-Jones, Said Derradji, Hervé Gloaguen, Piero Lanucara, Nico Sanna, Jean-François Méhaut, Kevin Pouget, Brice Videau, Eric Boyer, Momme Allalen, Axel Auweter, David Brayford, Daniele Tafani, Volker Weinberg, Dirk Brömmel, René Halver, Jan H. Meinke, Ramón Beivide, Mariano Benito, Enrique Vallejo, Mateo Valero, Alex Ramírez |
| 2016 | The vectorization of the Tersoff multi-body potential: an exercise in performance portability. Markus Höhnerbach, Ahmed E. Ismail, Paolo Bientinesi |
| 2016 | Towards green aviation with Python at petascale. Peter E. Vincent, Freddie D. Witherden, Brian C. Vermeire, Jin Seok Park, Arvind Iyer |
| 2016 | Transient guarantees: maximizing the value of idle cloud capacity. Supreeth Shastri, Amr Rizk, David Irwin |
| 2016 | Translating OpenMP device constructs to OpenCL using unnecessary data transfer elimination. Junghyun Kim, Yong-Jun Lee, Jung-Ho Park, Jaejin Lee |
| 2016 | TrueNorth ecosystem for brain-inspired computing: scalable systems, software, and applications. Jun Sawada, Filipp Akopyan, Andrew S. Cassidy, Brian Taba, Michael V. DeBole, Pallab Datta, Rodrigo Alvarez-Icaza, Arnon Amir, John V. Arthur, Alexander Andreopoulos, Rathinakumar Appuswamy, Heinz Baier, Davis Barch, David J. Berg, Carmelo di Nolfo, Steven K. Esser, Myron Flickner, Thomas A. Horvath, Bryan L. Jackson, Jeff Kusnitz, Scott Lekuch, Michael Mastro, Timothy Melano, Paul A. Merolla, Steven E. Millman, Tapan K. Nayak, Norm Pass, Hartmut E. Penner, William P. Risk, Kai Schleupen, Benjamin G. Shaw, Hayley Wu, Brian Giera, Adam T. Moody, T. Nathan Mundhenk, Brian Van Essen, Eric X. Wang, David P. Widemann, Qing Wu, William E. Murphy, Jamie K. Infantolino, James A. Ross, Dale R. Shires, Manuel M. Vindiola, Raju Namburu, Dharmendra S. Modha |
| 2016 | Týr: blob storage meets built-in transactions. Pierre Matri, Alexandru Costan, Gabriel Antoniu, Jesús Montes, María S. Pérez |
| 2016 | Understanding error propagation in GPGPU applications. Guanpeng Li, Karthik Pattabiraman, Chen-Yong Cher, Pradip Bose |
| 2016 | Understanding performance interference in next-generation HPC systems. Oscar H. Mondragon, Patrick G. Bridges, Scott Levy, Kurt B. Ferreira, Patrick M. Widener |
| 2016 | Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer. Leonardo Bautista-Gomez, Ferad Zyulkyarov, Osman S. Unsal, Simon McIntosh-Smith |
| 2016 | Watch out for the bully!: job interference study on dragonfly network. Xu Yang, John Jenkins, Misbah Mubarak, Robert B. Ross, Zhiling Lan |
| 2016 | ZNN Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung |
| 2016 | dCUDA: hardware supported overlap of computation and communication. Tobias Gysi, Jeremia Bär, Torsten Hoefler |