SC - RankMe

88 papers

Year	Title / Authors
2016	10M-core scalable fully-implicit solver for nonhydrostatic atmospheric dynamics. Chao Yang, Wei Xue, Haohuan Fu, Hongtao You, Xinliang Wang, Yulong Ao, Fangfang Liu, Lin Gan, Ping Xu, Lanning Wang, Guangwen Yang, Weimin Zheng
2016	A PCIe congestion-aware performance model for densely populated accelerator servers. Maxime Martinasso, Grzegorz Kwasniewski, Sadaf R. Alam, Thomas C. Schulthess, Torsten Hoefler
2016	A data driven scheduling approach for power management on HPC systems. Sean Wallace, Xu Yang, Venkatram Vishwanath, William E. Allcock, Susan Coghlan, Michael E. Papka, Zhiling Lan
2016	A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment. Samyam Rajbhandari, Jinsung Kim, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, Robert J. Harrison, P. Sadayappan
2016	A highly effective global surface wave numerical simulation with ultra-high resolution. Fangli Qiao, Wei Zhao, Xunqiang Yin, Xiaomeng Huang, Xin Liu, Qi Shu, Guansuo Wang, Zhenya Song, Xinfang Li, Haixing Liu, Guangwen Yang, Yeli Yuan
2016	A machine learning framework for performance coverage analysis of proxy applications. Tanzima Z. Islam, Jayaraman J. Thiagarajan, Abhinav Bhatele, Martin Schulz, Todd Gamblin
2016	A multi-faceted approach to job placement for improved performance on extreme-scale systems. Christopher Zimmer, Saurabh Gupta, Scott Atchley, Sudharshan S. Vazhkudai, Carl Albing
2016	A parallel algorithm for finding all pairs Sriram P. Chockalingam, Sharma V. Thankachan, Srinivas Aluru
2016	A parallel arbitrary-order accurate AMR algorithm for the scalar advection-diffusion equation. Arash Bakhtiari, Dhairya Malhotra, Amir Raoofy, Miriam Mehl, Hans-Joachim Bungartz, George Biros
2016	Accelerating lattice QCD multigrid on GPUs using fine-grained parallelization. Michael A. Clark, Bálint Joó, Alexei Strelchenko, Michael Cheng, Arjun Singh Gambhir, Richard C. Brower
2016	An efficient and scalable algorithmic method for generating large-scale random graphs. Md. Maksudul Alam, Maleq Khan, Anil Vullikanti, Madhav V. Marathe
2016	An ephemeral burst-buffer file system for scientific applications. Teng Wang, Kathryn M. Mohror, Adam Moody, Kento Sato, Weikuan Yu
2016	An exploration of optimization algorithms for high performance tensor completion. Shaden Smith, Jongsoo Park, George Karypis
2016	Automating wavefront parallelization for sparse matrix computations. Anand Venkat, Mahdi Soltan Mohammadi, Jongsoo Park, Hongbo Rong, Rajkishore Barik, Michelle Mills Strout, Mary W. Hall
2016	Block iterative methods and recycling for improved scalability of linear solvers. Pierre Jolivet, Pierre-Henri Tournier
2016	Caliper: performance introspection for HPC software stacks. David Böhme, Todd Gamblin, David Beckingsale, Peer-Timo Bremer, Alfredo Giménez, Matthew P. LeGendre, Olga Pearce, Martin Schulz
2016	Characterizing parallel scientific applications on commodity clusters: an empirical study of a tapered fat-tree. Edgar A. León, Ian Karlin, Abhinav Bhatele, Steven H. Langer, Chris Chambreau, Louis H. Howell, Trent D'Hooge, Matthew L. Leininger
2016	Compiler-directed lightweight checkpointing for fine-grained guaranteed soft error recovery. Qingrui Liu, Changhee Jung, Dongyoon Lee, Devesh Tiwari
2016	DAOS and friends: a proposal for an exascale storage system. Jay F. Lofstead, Ivo Jimenez, Carlos Maltzahn, Quincey Koziol, John Bent, Eric Barton
2016	DCA: a DRAM-cache-aware DRAM controller. Cheng-Chieh Huang, Vijay Nagarajan, Arpit Joshi
2016	Daino: a high-level framework for parallel and efficient AMR on GPUs. Mohamed Wahib, Naoya Maruyama, Takayuki Aoki
2016	Designing MPI library with on-demand paging (ODP) of InfiniBand: challenges and benefits. Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Hari Subramoni, Jie Zhang, Dhabaleswar K. Panda
2016	Designing scalable Arif M. Khan, Alex Pothen, Md. Mostofa Ali Patwary, Mahantesh Halappanavar, Nadathur Rajagopalan Satish, Narayanan Sundaram, Pradeep Dubey
2016	Development effort estimation in HPC. Sandra Wienke, Julian Miller, Martin Schulz, Matthias S. Müller
2016	Distributed-memory large deformation diffeomorphic 3D image registration. Andreas Mang, Amir Gholami, George Biros
2016	Efficient Delaunay tessellation through K-D tree decomposition. Dmitriy Morozov, Tom Peterka
2016	Elastic multi-resource fairness: balancing fairness and efficiency in coupled CPU-GPU architectures. Shanjiang Tang, Bingsheng He, Shuhao Zhang, Zhaojie Niu
2016	Enabling efficient preemption for SIMT architectures with lightweight context switching. Zhen Lin, Lars Nyland, Huiyang Zhou
2016	Enhanced MPSM3 for applications to quantum biological simulations. A. Pozdneev, Valéry Weber, Teodoro Laino, Constantine Bekas, Alessandro Curioni
2016	Enhancing InfiniBand with OpenFlow-style SDN capability. Jason Lee, Zhou Tong, Karthik Achalkar, Xin Yuan, Michael Lang
2016	Evaluating HPC networks via simulation of parallel workloads. Nikhil Jain, Abhinav Bhatele, Sam White, Todd Gamblin, Laxmikant V. Kalé
2016	Evaluating and optimizing OpenCL kernels for high performance computing with FPGAs. Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, Satoshi Matsuoka
2016	Exploring the potentials of parallel garbage collection in SSDs for enterprise storage systems. Narges Shahidi, Mohammad Arjomand, Myoungsoo Jung, Mahmut T. Kandemir, Chita R. Das, Anand Sivasubramaniam
2016	Extended task queuing: active messages for heterogeneous systems. Michael LeBeane, Brandon Potter, Abhisek Pan, Alexandru Dutu, Vinay Agarwala, Wonchan Lee, Deepak Majeti, Bibek Ghimire, Eric Van Tassell, Samuel Wasmundt, Brad Benton, Maurício Breternitz, Michael L. Chu, Mithuna Thottethodi, Lizy K. John, Steven K. Reinhardt
2016	Extreme scale plasma turbulence simulations on top supercomputers worldwide. William M. Tang, Bei Wang, Stéphane Ethier, Grzegorz Kwasniewski, Torsten Hoefler, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, Carlos Rosales-Fernandez, Timothy J. Williams
2016	Extreme-scale phase field simulations of coarsening dynamics on the Sunway TaihuLight supercomputer. Jian Zhang, Chunbao Zhou, Yangang Wang, Lili Ju, Qiang Du, Xuebin Chi, Dongsheng Xu, Dexun Chen, Yong Liu, Zhao Liu
2016	Failure detection and propagation in HPC systems. George Bosilca, Aurélien Bouteiller, Amina Guermouche, Thomas Hérault, Yves Robert, Pierre Sens, Jack J. Dongarra
2016	Flexfly: enabling a reconfigurable dragonfly through silicon photonics. Ke Wen, Payman Samadi, Sébastien Rumley, Christine P. Chen, Yiwen Shen, Meisam Bahadori, Keren Bergman, Jeremiah J. Wilke
2016	FlipBack: automatic targeted protection against silent data corruption. Xiang Ni, Laxmikant V. Kalé
2016	G-store: high-performance graph store for trillion-edge processing. Pradeep Kumar, H. Howie Huang
2016	Granularity and the cost of error recovery in resilient AMR scientific applications. Anshu Dubey, Hajime Fujita, Daniel T. Graves, Andrew A. Chien, Devesh Tiwari
2016	Graph colouring as a challenge problem for dynamic graph processing on distributed systems. Scott Sallinen, Keita Iwabuchi, Suraj Poudel, Maya B. Gokhale, Matei Ripeanu, Roger A. Pearce
2016	GreenLA: green linear algebra software for GPU-accelerated heterogeneous computing. Jieyang Chen, Li Tan, Panruo Wu, Dingwen Tao, Hongbo Li, Xin Liang, Sihuan Li, Rong Ge, Laxmi N. Bhuyan, Zizhong Chen
2016	HARP: predictive transfer optimization based on historical analysis and real-time probing. Engin Arslan, Kemal Guner, Tevfik Kosar
2016	High performance emulation of quantum circuits. Thomas Häner, Damian S. Steiger, Mikhail Smelyanskiy, Matthias Troyer
2016	High-frequency nonlinear earthquake simulations on petascale heterogeneous supercomputers. Daniel Roten, Yifeng Cui, Kim B. Olsen, Steven M. Day, Kyle Withers, William H. Savran, Peng Wang, Dawei Mu
2016	Improving application resilience to memory errors with lightweight compression. Scott Levy, Kurt B. Ferreira, Patrick G. Bridges
2016	Increasing molecular dynamics simulation rates with an 8-fold increase in electrical power efficiency. W. Michael Brown, Andrey Semin, Michael Hebenstreit, Sergey Khvostov, Karthik Raman, Steven J. Plimpton
2016	LIBXSMM: accelerating small matrix multiplications by runtime code generation. Alexander Heinecke, Greg Henry, Maxwell Hutchinson, Hans Pabst
2016	MUSA: a multi-level simulation approach for next-generation HPC machines. Thomas Grass, César Allande, Adrià Armejach, Alejandro Rico, Eduard Ayguadé, Jesús Labarta, Mateo Valero, Marc Casas, Miquel Moretó
2016	Measuring and understanding throughput of network topologies. Sangeetha Abdu Jyothi, Ankit Singla, Brighten Godfrey, Alexandra Kolla
2016	Merge-based parallel sparse matrix-vector multiplication. Duane Merrill, Michael Garland
2016	MetaMorph: a library framework for interoperable kernels on multi- and many-core clusters. Ahmed E. Helal, Paul Sathre, Wu-chun Feng
2016	Modeling dilute solutions using first-principles molecular dynamics: computing more than a million atoms with over a million cores. Jean-Luc Fattebert, Daniel Osei-Kuffuor, Erik W. Draeger, Tadashi Ogitsu, William D. Krauss
2016	Multi-resource fair sharing for datacenter jobs with placement constraints. Wei Wang, Baochun Li, Ben Liang, Jun Li
2016	Optimal execution of co-analysis for large-scale molecular dynamics simulations. Preeti Malakar, Venkatram Vishwanath, Christopher Knight, Todd S. Munson, Michael E. Papka
2016	Optimizing memory efficiency for deep convolutional neural networks on GPUs. Chao Li, Yi Yang, Min Feng, Srimat T. Chakradhar, Huiyang Zhou
2016	PFEAST: a high performance sparse eigenvalue solver using distributed-memory linear solvers. James Kestyn, Vasileios Kalantzis, Eric Polizzi, Yousef Saad
2016	PIPES: a language and compiler for task-based programming on distributed-memory clusters. Martin Kong, Louis-Noël Pouchet, P. Sadayappan, Vivek Sarkar
2016	Performance analysis, design considerations, and applications of extreme-scale Utkarsh Ayachit, Andrew C. Bauer, Earl P. N. Duque, Greg Eisenhauer, Nicola J. Ferrier, Junmin Gu, Kenneth E. Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, Dmitriy Morozov, Patrick O'Leary, Reetesh Ranjan, Michel E. Rasquin, Christopher P. Stone, Venkatram Vishwanath, Gunther H. Weber, Brad Whitlock, Matthew Wolf, K. John Wu, E. Wes Bethel
2016	Performance modeling of in situ rendering. Matthew Larsen, Cyrus Harrison, James Kress, David Pugmire, Jeremy S. Meredith, Hank Childs
2016	Perilla: metadata-based optimizations of an asynchronous runtime for adaptive mesh refinement. Tan Nguyen, Didem Unat, Weiqun Zhang, Ann S. Almgren, Muhammed Nufail Farooqi, John Shalf
2016	Pinpointing scale-dependent integer overflow bugs in large-scale parallel applications. Ignacio Laguna, Martin Schulz
2016	Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016, Salt Lake City, UT, USA, November 13-18, 2016. John West, Cherri M. Pancake
2016	Real-time synthesis of compression algorithms for scientific data. Martin Burtscher, Hari Mukka, Annie Yang, Farbod Hesaaraki
2016	Refactoring and optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight supercomputer. Haohuan Fu, Junfeng Liao, Wei Xue, Lanning Wang, Dexun Chen, Long Gu, Jinxiu Xu, Nan Ding, Xinliang Wang, Conghui He, Shizhen Xu, Yishuang Liang, Jiarui Fang, Yuanchao Xu, Weijie Zheng, Jingheng Xu, Zhen Zheng, Wanjing Wei, Xu Ji, He Zhang, Bingwei Chen, Kaiwei Li, Xiaomeng Huang, Wenguang Chen, Guangwen Yang
2016	Reliable and efficient performance monitoring in Linux. Maria Dimakopoulou, Stéphane Eranian, Nectarios Koziris, Nicholas Bambos
2016	SERF: efficient scheduling for fast deep neural network serving via judicious parallelism. Feng Yan, Yuxiong He, Olatunji Ruwase, Evgenia Smirni
2016	Scalable non-blocking preconditioned conjugate gradient methods. Paul R. Eller, William Gropp
2016	Scalemine: scalable parallel frequent subgraph mining in a single large graph. Ehab Abdelhamid, Ibrahim Abdelaziz, Panos Kalnis, Zuhair Khayyat, Fuad T. Jamour
2016	Scheduling-aware routing for supercomputers. Jens Domke, Torsten Hoefler
2016	Server-side log data analytics for I/O workload characterization and coordination on large shared storage systems. Yang Liu, Raghul Gunasekaran, Xiaosong Ma, Sudharshan S. Vazhkudai
2016	Simulation and performance analysis of the ECMWF tape library system. Markus Mäsker, Lars Nagel, Tim Süß, André Brinkmann, Lennart Sorth
2016	Simulations of below-ground dynamics of fungi: 1.184 PFLOPS attained by automated generation and autotuning of temporal blocking codes. Takayuki Muranushi, Hideyuki Hotta, Junichiro Makino, Seiya Nishizawa, Hirofumi Tomita, Keigo Nitadori, Masaki Iwasawa, Natsuki Hosono, Yutaka Maruyama, Hikaru Inoue, Hisashi Yashiro, Yoshifumi Nakamura
2016	Strassen's algorithm reloaded. Jianyu Huang, Tyler M. Smith, Greg M. Henry, Robert A. van de Geijn
2016	The Mont-Blanc prototype: an alternative approach for HPC systems. Nikola Rajovic, Alejandro Rico, Filippo Mantovani, Daniel Ruiz, Josep Oriol Vilarrubi, Constantino Gómez, Luna Backes, Diego Nieto, Harald Servat, Xavier Martorell, Jesús Labarta, Eduard Ayguadé, Chris Adeniyi-Jones, Said Derradji, Hervé Gloaguen, Piero Lanucara, Nico Sanna, Jean-François Méhaut, Kevin Pouget, Brice Videau, Eric Boyer, Momme Allalen, Axel Auweter, David Brayford, Daniele Tafani, Volker Weinberg, Dirk Brömmel, René Halver, Jan H. Meinke, Ramón Beivide, Mariano Benito, Enrique Vallejo, Mateo Valero, Alex Ramírez
2016	The vectorization of the Tersoff multi-body potential: an exercise in performance portability. Markus Höhnerbach, Ahmed E. Ismail, Paolo Bientinesi
2016	Towards green aviation with Python at petascale. Peter E. Vincent, Freddie D. Witherden, Brian C. Vermeire, Jin Seok Park, Arvind Iyer
2016	Transient guarantees: maximizing the value of idle cloud capacity. Supreeth Shastri, Amr Rizk, David Irwin
2016	Translating OpenMP device constructs to OpenCL using unnecessary data transfer elimination. Junghyun Kim, Yong-Jun Lee, Jung-Ho Park, Jaejin Lee
2016	TrueNorth ecosystem for brain-inspired computing: scalable systems, software, and applications. Jun Sawada, Filipp Akopyan, Andrew S. Cassidy, Brian Taba, Michael V. DeBole, Pallab Datta, Rodrigo Alvarez-Icaza, Arnon Amir, John V. Arthur, Alexander Andreopoulos, Rathinakumar Appuswamy, Heinz Baier, Davis Barch, David J. Berg, Carmelo di Nolfo, Steven K. Esser, Myron Flickner, Thomas A. Horvath, Bryan L. Jackson, Jeff Kusnitz, Scott Lekuch, Michael Mastro, Timothy Melano, Paul A. Merolla, Steven E. Millman, Tapan K. Nayak, Norm Pass, Hartmut E. Penner, William P. Risk, Kai Schleupen, Benjamin G. Shaw, Hayley Wu, Brian Giera, Adam T. Moody, T. Nathan Mundhenk, Brian Van Essen, Eric X. Wang, David P. Widemann, Qing Wu, William E. Murphy, Jamie K. Infantolino, James A. Ross, Dale R. Shires, Manuel M. Vindiola, Raju Namburu, Dharmendra S. Modha
2016	Týr: blob storage meets built-in transactions. Pierre Matri, Alexandru Costan, Gabriel Antoniu, Jesús Montes, María S. Pérez
2016	Understanding error propagation in GPGPU applications. Guanpeng Li, Karthik Pattabiraman, Chen-Yong Cher, Pradip Bose
2016	Understanding performance interference in next-generation HPC systems. Oscar H. Mondragon, Patrick G. Bridges, Scott Levy, Kurt B. Ferreira, Patrick M. Widener
2016	Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer. Leonardo Bautista-Gomez, Ferad Zyulkyarov, Osman S. Unsal, Simon McIntosh-Smith
2016	Watch out for the bully!: job interference study on dragonfly network. Xu Yang, John Jenkins, Misbah Mubarak, Robert B. Ross, Zhiling Lan
2016	ZNN Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung
2016	dCUDA: hardware supported overlap of computation and communication. Tobias Gysi, Jeremia Bär, Torsten Hoefler