| 2009 | /scratch as a cache: rethinking HPC center scratch storage. Henry M. Monti, Ali Raza Butt, Sudharshan S. Vazhkudai |
| 2009 | A comprehensive power-performance model for NoCs with multi-flit channel buffers. Mohammad Arjomand, Hamid Sarbazi-Azad |
| 2009 | A European perspective on supercomputing. Mateo Valero |
| 2009 | A graph based approach for MPI deadlock detection. Tobias Hilbrich, Bronis R. de Supinski, Martin Schulz, Matthias S. Müller |
| 2009 | A parallel Levenberg-Marquardt algorithm. Jun Cao, Krista A. Novstrup, Ayush Goyal, Samuel P. Midkiff, James M. Caruthers |
| 2009 | A translation system for enabling data mining applications on GPUs. Wenjing Ma, Gagan Agrawal |
| 2009 | Access map pattern matching for data cache prefetch. Yasuo Ishii, Mary Inaba, Kei Hiraki |
| 2009 | Adagio: making DVS practical for complex HPC applications. Barry Rountree, David K. Lowenthal, Bronis R. de Supinski, Martin Schulz, Vincent W. Freeh, Tyler K. Bletsch |
| 2009 | An infrastructure for scalable and portable parallel programs for computational chemistry. Victor Lotrich, Norbert Flocke, Mark Ponton, Beverly A. Sanders, Erik Deumens, Rodney J. Bartlett, Ajith Perera |
| 2009 | Approximate kernel matrix computation on GPUs for large scale learning applications. Mohamed E. Hussein, Wael Abd-Almageed |
| 2009 | Auto-vectorization through code generation for stream processing applications. Huayong Wang, Henrique Andrade, Bugra Gedik, Kun-Lung Wu |
| 2009 | Cancellation of loads that return zero using zero-value caches. Md. Mafijul Islam, Sally A. McKee, Per Stenström |
| 2009 | Chunking parallel loops in the presence of synchronization. Jun Shirako, Jisheng M. Zhao, V. Krishna Nandivada, Vivek Sarkar |
| 2009 | Combining thread level speculation helper threads and runahead execution. Polychronis Xekalakis, Nikolas Ioannou, Marcelo Cintra |
| 2009 | Computer generation of fast Fourier transforms for the Cell Broadband Engine. Srinivas Chellappa, Franz Franchetti, Markus Püschel |
| 2009 | Computing outside the box. Ian T. Foster |
| 2009 | Creating artificial global history to improve branch prediction accuracy. Leo Porter, Dean M. Tullsen |
| 2009 | DBDB: optimizing DMATransfer for the Cell BE architecture. Tao Liu, Haibo Lin, Tong Chen, Kevin O'Brien, Ling Shao |
| 2009 | Design of a novel SIMD architecture by fusing operations and registers. Jih-Ching Chiu, Kai-Ming Yang, Yu-Liang Chou |
| 2009 | Designing multi-socket systems using silicon photonics. Scott Beamer, Krste Asanovic, Christopher Batten, Ajay Joshi, Vladimir Stojanovic |
| 2009 | Divide-and-conquer: a bubble replacement for low level caches. Chuanjun Zhang, Bing Xue |
| 2009 | Dynamic cache clustering for chip multiprocessors. Mohammad Hammoud, Sangyeun Cho, Rami G. Melhem |
| 2009 | Dynamic parallelization of single-threaded binary programs using speculative slicing. Cheng Wang, Youfeng Wu, Edson Borin, Shiliang Hu, Wei Liu, Dave Sager, Tin-Fook Ngai, Jesse Fang |
| 2009 | Dynamic task set partitioning based on balancing memory requirements to reduce power consumption. Diana Bautista, Julio Sahuquillo, Houcine Hassan, Salvador Petit, José Duato |
| 2009 | Dynamic topology aware load balancing algorithms for molecular dynamics applications. Abhinav Bhatele, Laxmikant V. Kalé, Sameer Kumar |
| 2009 | Efficient high performance collective communication for the cell blade. Qasim Ali, Samuel P. Midkiff, Vijay S. Pai |
| 2009 | EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems. Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, V. S. Anil Kumar, Madhav V. Marathe |
| 2009 | Evaluating high performance communication: a power perspective. Jiuxing Liu, Dan E. Poff, Bülent Abali |
| 2009 | Exploring pattern-aware routing in generalized fat tree networks. Germán Rodríguez, Ramón Beivide, Cyriel Minkenberg, Jesús Labarta, Mateo Valero |
| 2009 | FTL design exploration in reconfigurable high-performance SSD for server applications. Ji-Yong Shin, Zenglin Xia, Ning-Yi Xu, Rui Gao, Xiongfei Cai, Seungryoul Maeng, Feng-Hsiung Hsu |
| 2009 | Fast and scalable list ranking on the GPU. M. Suhail Rehman, Kishore Kothapalli, P. J. Narayanan |
| 2009 | Fast memory snapshot for concurrent programming without synchronization. JaeWoong Chung, Woongki Baek, Christos Kozyrakis |
| 2009 | High-performance CUDA kernel execution on FPGAs. Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu |
| 2009 | High-performance regular expression scanning on the Cell/B.E. processor. Daniele Paolo Scarpazza, Gregory F. Russell |
| 2009 | How GPUs can outperform ASICs for fast LDPC decoding. Gabriel Falcão Paiva Fernandes, Vítor Manuel Mendes da Silva, Leonel Sousa |
| 2009 | Implementation of a wide-angle lens distortion correction algorithm on the Cell Broadband Engine. Konstantis Daloukas, Christos D. Antonopoulos, Nikolaos Bellas |
| 2009 | Less reused filter: improving L2 cache performance via filtering less reused lines. Lingxiang Xiang, Tianzhou Chen, Qingsong Shi, Wei Hu |
| 2009 | Limited early value communication to improve performance of transactional memory. Salil Mohan Pant, Gregory T. Byrd |
| 2009 | Load balancing using work-stealing for pipeline parallelism in emerging applications. Angeles G. Navarro, Rafael Asenjo, Siham Tabik, Calin Cascaval |
| 2009 | MPI collective communications on the Blue Gene/P supercomputer: algorithms and optimizations. Ahmad Faraj, Sameer Kumar, Brian E. Smith, Amith R. Mamidala, John A. Gunnels, Philip Heidelberger |
| 2009 | MPI-aware compiler optimizations for improving communication-computation overlap. Anthony Danalis, Lori L. Pollock, D. Martin Swany, John Cavazos |
| 2009 | Maximizing MPI point-to-point communication performance on RDMA-enabled clusters with customized protocols. Matthew Small, Xin Yuan |
| 2009 | OhHelp: a scalable domain-decomposing dynamic load balancing for particle-in-cell simulations. Hiroshi Nakashima, Yohei Miyake, Hideyuki Usui, Yoshiharu Omura |
| 2009 | P-Code: a new RAID-6 code with optimal properties. Chao Jin, Hong Jiang, Dan Feng, Lei Tian |
| 2009 | PARSEC: hardware profiling of emerging workloads for CMP design. Major Bhadauria, Vincent M. Weaver, Sally A. McKee |
| 2009 | Parametric multi-level tiling of imperfectly nested loops. Albert Hartono, Muthu Manikandan Baskaran, Cédric Bastoul, Albert Cohen, Sriram Krishnamoorthy, Boyana Norris, J. Ramanujam, P. Sadayappan |
| 2009 | Pattern-based sparse matrix representation for memory-efficient SMVM kernels. Mehmet Belgin, Godmar Back, Calvin J. Ribbens |
| 2009 | Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. Jiayuan Meng, Kevin Skadron |
| 2009 | Performance modeling for DFT algorithms in FFTW. Liang Gu, Xiaoming Li |
| 2009 | Practice of parallelizing network applications on multi-core architectures. Junchang Wang, Haipeng Cheng, Bei Hua, Xinan Tang |
| 2009 | Prediction-based power estimation and scheduling for CMPs. Karan Singh, Major Bhadauria, Sally A. McKee |
| 2009 | Prefetch optimizations on large-scale applications via parameter value prediction. Shih-Wei Liao, Tzu-Han Hung, Donald Nguyen, Hucheng Zhou, Chinyen Chou, Chia-Heng Tu |
| 2009 | Proceedings of the 23rd international conference on Supercomputing, 2009, Yorktown Heights, NY, USA, June 8-12, 2009. Michael Gschwind, Alexandru Nicolau, Valentina Salapura, José E. Moreira |
| 2009 | QuakeTM: parallelizing a complex sequential application using transactional memory. Vladimir Gajinov, Ferad Zyulkyarov, Osman S. Unsal, Adrián Cristal, Eduard Ayguadé, Tim Harris, Mateo Valero |
| 2009 | R-ADMAD: high reliability provision for large-scale de-duplication archival storage systems. Chuanyi Liu, Yu Gu, Linchun Sun, Bin Yan, Dongsheng Wang |
| 2009 | Rate-based QoS techniques for cache/memory in CMP platforms. Andrew Herdrich, Ramesh Illikkal, Ravi R. Iyer, Donald Newell, Vineet Chadha, Jaideep Moses |
| 2009 | Refereeing conflicts in hardware transactional memory. Arrvindh Shriraman, Sandhya Dwarkadas |
| 2009 | Single-particle 3D reconstruction from cryo-electron microscopy images on GPU. Guangming Tan, Ziyu Guo, Mingyu Chen, Dan Meng |
| 2009 | Subdomain communication to increase scalability in large-scale scientific applications. Aleksandr Ovcharenko, Onkar Sahni, Christopher D. Carothers, Kenneth E. Jansen, Mark S. Shephard |
| 2009 | Synchronization optimizations for efficient execution on multi-cores. Alexandru Nicolau, Guangqiang Li, Alexander V. Veidenbaum, Arun Kejariwal |
| 2009 | The Roadrunner project and the importance of energy efficiency on the road to exascale computing. Don G. Grice |
| 2009 | Thrifty interconnection network for HPC systems. Jian Li, Lixin Zhang, Charles Lefurgy, Richard R. Treumann, Wolfgang E. Denzel |
| 2009 | Towards 100 Gbit/s Ethernet: multicore-based parallel communication protocol design. Stavros Passas, Kostas Magoutis, Angelos Bilas |
| 2009 | TransMetric: architecture independent workload characterization for transactional memory benchmarks. James Poe, Clay Hughes, Tao Li |
| 2009 | Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. Sundaresan Venkatasubramanian, Richard W. Vuduc |
| 2009 | Understanding the interconnection network of SpiNNaker. Javier Navaridas, Mikel Luján, José Miguel-Alonso, Luis A. Plana, Steve B. Furber |
| 2009 | Using many-core hardware to correlate radio astronomy signals. Rob van Nieuwpoort, John W. Romein |
| 2009 | Virtualization polling engine (VPE): using dedicated CPU cores to accelerate I/O virtualization. Jiuxing Liu, Bülent Abali |
| 2009 | Zero-content augmented caches. Julien Dusser, Thomas Piquet, André Seznec |