| 2010 | A compiler-automated array compression scheme for optimizing memory intensive programs. Lixia Liu, Zhiyuan Li |
| 2010 | A query language for understanding component interactions in production systems. Adam J. Oliner, Alex Aiken |
| 2010 | Adaptive multi-level cache allocation in distributed storage architectures. Ramya Prabhakar, Shekhar Srikantaiah, Mahmut T. Kandemir, Christina M. Patrick |
| 2010 | An approach to resource-aware co-scheduling for CMPs. Major Bhadauria, Sally A. McKee |
| 2010 | An empirically tuned 2D and 3D FFT library on CUDA GPU. Liang Gu, Xiaoming Li, Jakob Siegel |
| 2010 | An experimental approach to performance measurement of heterogeneous parallel applications using CUDA. Allen D. Malony, Scott Biersdorff, Wyatt Spear, Shangkar Mayanglambam |
| 2010 | Cache oblivious parallelograms in iterative stencil computations. Robert Strzodka, Mohammed Shaheen, Dawid Pajak, Hans-Peter Seidel |
| 2010 | Clustering performance data efficiently at massive scales. Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler, Daniel A. Reed |
| 2010 | Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. Vignesh T. Ravi, Wenjing Ma, David Chiu, Gagan Agrawal |
| 2010 | Decomposable and responsive power models for multicore processors using performance counters. Ramon Bertran, Marc González, Xavier Martorell, Nacho Navarro, Eduard Ayguadé |
| 2010 | Enigma: architectural and operating system support for reducing the impact of address translation. Lixin Zhang, Evan Speight, Ramakrishnan Rajamony, Jiang Lin |
| 2010 | Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine. Chi Ching Chi, Ben H. H. Juurlink, Cor Meenderinck |
| 2010 | Exascale science: the next frontier in high performance computing. Stephen S. Pawlowski |
| 2010 | FPGA accelerating double/quad-double high precision floating-point applications for ExaScale computing. Yong Dou, Yuanwu Lei, Guiming Wu, Song Guo, Jie Zhou, Li Shen |
| 2010 | Fast and accurate NCBI BLASTP: acceleration with multiphase FPGA-based prefiltering. Atabak Mahram, Martin C. Herbordt |
| 2010 | Handling task dependencies under strided and aliased references. Josep M. Pérez, Rosa M. Badia, Jesús Labarta |
| 2010 | High-throughput Bayesian network learning using heterogeneous multicore computers. Michael D. Linderman, Robert V. Bruggner, Vivek Athalye, Teresa H. Meng, Narges Bani Asadi, Garry P. Nolan |
| 2010 | How to unleash array optimizations on code using recursive data structures. Harmen L. A. van der Spek, C. W. Mattias Holm, Harry A. G. Wijshoff |
| 2010 | Indemics: an interactive data intensive framework for high performance epidemic simulation. Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, Yifei Ma, Madhav V. Marathe |
| 2010 | Large-scale FFT on GPU clusters. Yifeng Chen, Xiang Cui, Hong Mei |
| 2010 | Making nested parallel transactions practical using lightweight hardware support. Woongki Baek, Nathan Grasso Bronson, Christos Kozyrakis, Kunle Olukotun |
| 2010 | Optimal bucket algorithms for large MPI collectives on torus interconnects. Nikhil Jain, Yogish Sabharwal |
| 2010 | Overlapping communication and computation by using a hybrid MPI/SMPSs approach. Vladimir Marjanovic, Jesús Labarta, Eduard Ayguadé, Mateo Valero |
| 2010 | Proceedings of the 24th International Conference on Supercomputing, 2010, Tsukuba, Ibaraki, Japan, June 2-4, 2010. Taisuke Boku, Hiroshi Nakashima, Avi Mendelson |
| 2010 | Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application. Sreeram Potluri, Ping Lai, Karen A. Tomko, Sayantan Sur, Yifeng Cui, Mahidhar Tatineni, Karl W. Schulz, William L. Barth, Amitava Majumdar, Dhabaleswar K. Panda |
| 2010 | SAMS multi-layout memory: providing multiple views of data to boost SIMD performance. Chunyang Gou, Georgi Kuzmanov, Georgi Gaydadjiev |
| 2010 | Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization. Jamin Naghmouchi, Daniele Paolo Scarpazza, Mladen Berekovic |
| 2010 | Speeding up Nek5000 with autotuning and specialization. Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul F. Fischer, Paul D. Hovland |
| 2010 | Static reuse distances for locality-based optimizations in MATLAB. Arun Chauhan, Chun-Yu Shei |
| 2010 | Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping. Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen |
| 2010 | The auction: optimizing banks usage in Non-Uniform Cache Architectures. Javier Lira, Carlos Molina, Antonio González |
| 2010 | The next-generation supercomputer project and a plan for the advanced institute for computational science. Kimihiko Hirao |
| 2010 | Throughput computing. William J. Dally |
| 2010 | Timing local streams: improving timeliness in data prefetching. Huaiyu Zhu, Yong Chen, Xian-He Sun |