| 2009 | 42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence. Tsuyoshi Hamada, Tetsu Narumi, Rio Yokota, Kenji Yasuoka, Keigo Nitadori, Makoto Taiji |
| 2009 | A 32x32x32, spatially distributed 3D FFT in four microseconds on Anton. Cliff Young, Joseph A. Bank, Ron O. Dror, J. P. Grossman, John K. Salmon, David E. Shaw |
| 2009 | A case for integrated processor-cache partitioning in chip multiprocessors. Shekhar Srikantaiah, Reetuparna Das, Asit K. Mishra, Chita R. Das, Mahmut T. Kandemir |
| 2009 | A configurable algorithm for parallel image-compositing applications. Tom Peterka, David Goodell, Robert B. Ross, Han-Wei Shen, Rajeev Thakur |
| 2009 | A design methodology for domain-optimized power-efficient supercomputing. Marghoob Mohiyuddin, Mark Murphy, Leonid Oliker, John Shalf, John Wawrzynek, Samuel Williams |
| 2009 | A massively parallel adaptive fast-multipole method on heterogeneous architectures. Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen, Rahul S. Sampath, Aashay Shringarpure, Richard W. Vuduc, Lexing Ying, Denis Zorin, George Biros |
| 2009 | A microdriver architecture for error correcting codes inside the Linux kernel. André Brinkmann, Dominic Eschweiler |
| 2009 | A scalable method for ab initio computation of free energies in nanoscale systems. Markus Eisenbach, C.-G. Zhou, Donald M. C. Nicholson, G. Brown, Jeffrey M. Larkin, Thomas C. Schulthess |
| 2009 | Adaptive and scalable metadata management to support a trillion files. Jing Xing, Jin Xiong, Ninghui Sun, Jie Ma |
| 2009 | Age based scheduling for asymmetric multiprocessors. Nagesh B. Lakshminarayana, Jaekyu Lee, Hyesoon Kim |
| 2009 | Allocator implementations for network-on-chip routers. Daniel U. Becker, William J. Dally |
| 2009 | Auto-tuning 3-D FFT library for CUDA GPUs. Akira Nukada, Satoshi Matsuoka |
| 2009 | Automating the generation of composed linear algebra kernels. Geoffrey Belter, Elizabeth R. Jessup, Ian Karlin, Jeremy G. Siek |
| 2009 | Autotuning multigrid with PetaBricks. Cy P. Chan, Jason Ansel, Yee Lok Wong, Saman P. Amarasinghe, Alan Edelman |
| 2009 | Beyond homogeneous decomposition: scaling long-range forces on Massively Parallel Systems. David F. Richards, James N. Glosli, Bor Chan, Milo R. Dorr, Erik W. Draeger, Jean-Luc Fattebert, William D. Krauss, Thomas E. Spelce, Frederick H. Streitz, Michael P. Surh, John A. Gunnels |
| 2009 | Compact multi-dimensional kernel extraction for register tiling. Lakshminarayanan Renganarayanan, Uday Bondhugula, Salem Derisavi, Alexandre E. Eichenberger, Kevin O'Brien |
| 2009 | Comparative study of one-sided factorizations with multiple software packages on multi-core hardware. Emmanuel Agullo, Bilel Hadri, Hatem Ltaief, Jack J. Dongarra |
| 2009 | Cray award: My adventures in parallel computing. Kenichi Miura |
| 2009 | Diagnosing performance bottlenecks in emerging petascale applications. Nathan R. Tallent, John M. Mellor-Crummey, Laksono Adhianto, Michael W. Fagan, Mark Krentel |
| 2009 | Dynamic storage cache allocation in multi-server architectures. Ramya Prabhakar, Shekhar Srikantaiah, Christina M. Patrick, Mahmut T. Kandemir |
| 2009 | Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems. Fengguang Song, Asim YarKhan, Jack J. Dongarra |
| 2009 | Early performance evaluation of a "Nehalem" cluster using scientific and engineering applications. Subhash Saini, Andrey Naraikin, Rupak Biswas, David Barkai, Timothy Sandstrom |
| 2009 | Efficient band approximation of Gram matrices for large scale kernel methods on GPUs. Mohamed E. Hussein, Wael Abd-Almageed |
| 2009 | Enabling high-fidelity neutron transport simulations on petascale architectures. Dinesh K. Kaushik, Micheal Smith, Allan B. Wollaber, Barry F. Smith, Andrew R. Siegel, Won Sik Yang |
| 2009 | Enabling software management for multicore caches with a lightweight hardware support. Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang, P. Sadayappan |
| 2009 | Evaluating similarity-based trace reduction techniques for scalable performance analysis. Kathryn M. Mohror, Karen L. Karavanic |
| 2009 | Evaluating the impact of inaccurate information in utility-based scheduling. Alvin AuYoung, Amin Vahdat, Alex C. Snoeren |
| 2009 | FACT: fast communication trace collection for parallel applications through program slicing. Jidong Zhai, Tianwei Sheng, Jiangzhou He, Wenguang Chen, Weimin Zheng |
| 2009 | FALCON: a system for reliable checkpoint recovery in shared grid environments. Tanzima Zerin Islam, Saurabh Bagchi, Rudolf Eigenmann |
| 2009 | Fernbach award. Roberto Car, Michele Parrinello |
| 2009 | Flexible cache error protection using an ECC FIFO. Doe Hyun Yoon, Mattan Erez |
| 2009 | Future scaling of processor-memory interfaces. Jung Ho Ahn, Norman P. Jouppi, Christos Kozyrakis, Jacob Leverich, Robert S. Schreiber |
| 2009 | GridBot: execution of bags of tasks in multiple grids. Mark Silberstein, Artyom Sharov, Dan Geiger, Assaf Schuster |
| 2009 | HyperX: topology, routing, and packaging of efficient large-scale networks. Jung Ho Ahn, Nathan L. Binkert, Al Davis, Moray McLaren, Robert S. Schreiber |
| 2009 | I/O performance challenges at leadership scale. Samuel Lang, Philip H. Carns, Robert Latham, Robert B. Ross, Kevin Harms, William E. Allcock |
| 2009 | Implementing sparse matrix-vector multiplication on throughput-oriented processors. Nathan Bell, Michael Garland |
| 2009 | Improving GridFTP performance using the Phoebus session layer. Ezra Kissel, D. Martin Swany, Aaron Brown |
| 2009 | Increasing memory miss tolerance for SIMD cores. David Tarjan, Jiayuan Meng, Kevin Skadron |
| 2009 | Indexing genomic sequences on the IBM Blue Gene. Amol Ghoting, Konstantin Makarychev |
| 2009 | Instruction-level simulation of a cluster at scale. Edgar A. León, Rolf Riesen, Arthur B. Maccabe, Patrick G. Bridges |
| 2009 | Kennedy award: Laying the groundwork for success in the information age. Francine Berman |
| 2009 | Leveraging 3D PCRAM technologies to reduce checkpoint overhead for future exascale systems. Xiangyu Dong, Naveen Muralimanohar, Norman P. Jouppi, Richard Kaufmann, Yuan Xie |
| 2009 | Liquid water: obtaining the right answer for the right reasons. Edoardo Aprà, Alistair P. Rendell, Robert J. Harrison, Vinod Tipparaju, Wibe A. de Jong, Sotiris S. Xantheas |
| 2009 | Machine learning-based prefetch optimization for data center applications. Shih-Wei Liao, Tzu-Han Hung, Donald Nguyen, Chinyen Chou, Chia-Heng Tu, Hucheng Zhou |
| 2009 | Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors. Kamesh Madduri, Samuel Williams, Stéphane Ethier, Leonid Oliker, John Shalf, Erich Strohmaier, Katherine A. Yelick |
| 2009 | Millisecond-scale molecular dynamics simulations on Anton. David E. Shaw, Ron O. Dror, John K. Salmon, J. P. Grossman, Kenneth M. Mackenzie, Joseph A. Bank, Cliff Young, Martin M. Deneroff, Brannon Batson, Kevin J. Bowers, Edmond Chow, Michael P. Eastwood, Doug Ierardi, John L. Klepeis, Jeffrey Kuskin, Richard H. Larson, Kresten Lindorff-Larsen, Paul Maragakis, Mark A. Moraes, Stefano Piana, Yibing Shan, Brian Towles |
| 2009 | Minimizing communication in sparse matrix solvers. Marghoob Mohiyuddin, Mark Hoemmen, James Demmel, Katherine A. Yelick |
| 2009 | Multi-core acceleration of chemical kinetics for simulation and prediction. John C. Linford, John Michalakes, Manish Vachharajani, Adrian Sandu |
| 2009 | On the design of scalable, self-configuring virtual networks. David Isaac Wolinsky, Yonggang Liu, Pierre St. Juste, Girish Venkatasubramanian, Renato J. O. Figueiredo |
| 2009 | Opening address: The rise of the 3D internet - advancements in collaborative and immersive sciences. Justin R. Rattner |
| 2009 | Optimal real number codes for fault tolerant matrix operations. Zizhong Chen |
| 2009 | PFunc: modern task parallelism for modern high performance computing. Prabhanjan Kambadur, Anshul Gupta, Amol Ghoting, Haim Avron, Andrew Lumsdaine |
| 2009 | PLFS: a checkpoint filesystem for parallel applications. John Bent, Garth A. Gibson, Gary Grider, Ben McClelland, Paul Nowoczynski, James Nunez, Milo Polte, Meghan Wingate |
| 2009 | Performance evaluation of NEC SX-9 using real science and engineering applications. Takashi Soga, Akihiro Musa, Youichi Shimomura, Ryusuke Egawa, Ken'ichi Itakura, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi |
| 2009 | Predicting the execution time of grid workflow applications through local learning. Farrukh Nadeem, Thomas Fahringer |
| 2009 | Proceedings of the ACM/IEEE Conference on High Performance Computing, SC 2009, November 14-20, 2009, Portland, Oregon, USA |
| 2009 | Router designs for elastic buffer on-chip networks. George Michelogiannakis, William J. Dally |
| 2009 | SCAMPI: a scalable CAM-based algorithm for multiple pattern inspection. Fabrizio Petrini, Virat Agarwal, Davide Pasetto |
| 2009 | Scalable computation of streamlines on very large datasets. David Pugmire, Hank Childs, Christoph Garth, Sean Ahern, Gunther H. Weber |
| 2009 | Scalable implicit finite element solver for massively parallel processing with demonstration to 160K cores. Onkar Sahni, Min Zhou, Mark S. Shephard, Kenneth E. Jansen |
| 2009 | Scalable massively parallel I/O to task-local files. Wolfgang Frings, Felix Wolf, Ventsislav Petkov |
| 2009 | Scalable temporal order analysis for large scale debugging. Dong H. Ahn, Bronis R. de Supinski, Ignacio Laguna, Gregory L. Lee, Ben Liblit, Barton P. Miller, Martin Schulz |
| 2009 | Scalable work stealing. James Dinan, D. Brian Larkins, P. Sadayappan, Sriram Krishnamoorthy, Jarek Nieplocha |
| 2009 | SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems. Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, Lei Tian |
| 2009 | Space-efficient time-series call-path profiling of parallel applications. Zoltán Szebenyi, Felix Wolf, Brian J. N. Wylie |
| 2009 | Sparse matrix factorization on massively parallel computers. Anshul Gupta, Seid Koric, Thomas George |
| 2009 | Supporting fault-tolerance for time-critical events in distributed environments. Qian Zhu, Gagan Agrawal |
| 2009 | Systems medicine, transformational technologies and the emergence of predictive, personalized, preventive and participatory (P4) medicine. Lee Hood |
| 2009 | Terascale data organization for discovering multivariate climatic trends. Wesley Kendall, Markus Glatter, Jian Huang, Tom Peterka, Robert Latham, Robert B. Ross |
| 2009 | The cat is out of the bag: cortical simulations with 10^9 neurons, 10^13 synapses. Rajagopal Ananthanarayanan, Steven K. Esser, Horst D. Simon, Dharmendra S. Modha |
| 2009 | Towards a framework for abstracting accelerators in parallel applications: experience with cell. David M. Kunzman, Laxmikant V. Kalé |
| 2009 | Triangular matrix inversion on Graphics Processing Unit. Florian Ries, Tommaso DeMarco, Matteo Zivieri, Roberto Guerrieri |
| 2009 | VGrADS: enabling e-Science workflows on grids and clouds with fault tolerance. Lavanya Ramakrishnan, Charles Koelbel, Yang-Suk Kee, Richard Wolski, Daniel Nurmi, Dennis Gannon, Graziano Obertelli, Asim YarKhan, Anirban Mandal, T. Mark Huang, Kiran Thyagaraja, Dmitrii Zagorodnov |