| 2013 | A decomposition method with minimal communication volume for parallelization of multi-dimensional FFTs. Truong Vinh Truong Duy, Taisuke Ozaki |
| 2013 | A massively parallel domain decomposition method for large-scale DFT electronic structure calculations. Truong Vinh Truong Duy, Taisuke Ozaki |
| 2013 | A new approach for performance analysis of OpenMP programs. Xu Liu, John M. Mellor-Crummey, Michael W. Fagan |
| 2013 | A stencil compiler for short-vector SIMD architectures. Thomas Henretty, Richard Veras, Franz Franchetti, Louis-Noël Pouchet, J. Ramanujam, P. Sadayappan |
| 2013 | Abstractions to separate concerns in semi-regular grids. Andrew Stone, Michelle Mills Strout |
| 2013 | Active disk meets flash: a case for intelligent SSDs. Sangyeun Cho, Chanik Park, Hyunok Oh, Sungchan Kim, Youngmin Yi, Gregory R. Ganger |
| 2013 | Address-aware fences. Changhui Lin, Vijay Nagarajan, Rajiv Gupta |
| 2013 | An automatic input-sensitive approach for heterogeneous task partitioning. Klaus Kofler, Ivan Grasso, Biagio Cosenza, Thomas Fahringer |
| 2013 | Automatically adapting programs for mixed-precision floating-point computation. Michael O. Lam, Jeffrey K. Hollingsworth, Bronis R. de Supinski, Matthew P. LeGendre |
| 2013 | Bandwidth-optimal all-to-all exchanges in fat tree networks. Bogdan Prisacari, Germán Rodríguez, Cyriel Minkenberg, Torsten Hoefler |
| 2013 | Bubble coloring: avoiding routing- and protocol-induced deadlocks with minimal virtual channel requirement. Ruisheng Wang, Lizhong Chen, Timothy Mark Pinkston |
| 2013 | Business meets supercomputing: keynote talk. Bob Blainey |
| 2013 | CMP off-chip bandwidth scheduling guided by instruction criticality. Pablo Prieto, Valentin Puente, José-Ángel Gregorio |
| 2013 | CUPL: a compile-time uncoalesced memory access pattern locator for CUDA. Madhur Amilkanthwar, Shankar Balachandran |
| 2013 | Conservative row activation to improve memory power efficiency. Kun Fang, Zhichun Zhu |
| 2013 | Design of a large-scale storage-class RRAM system. Myoungsoo Jung, John Shalf, Mahmut T. Kandemir |
| 2013 | Diagnosis and optimization of application prefetching performance. Gabriel Marin, Collin McCurdy, Jeffrey S. Vetter |
| 2013 | Efficient scheduling of recursive control flow on GPUs. Xin Huo, Sriram Krishnamoorthy, Gagan Agrawal |
| 2013 | Efficient sparse matrix-vector multiplication on x86-based many-core processors. Xing Liu, Mikhail Smelyanskiy, Edmond Chow, Pradeep Dubey |
| 2013 | Elastic and scalable tracing and accurate replay of non-deterministic events. Xing Wu, Frank Mueller |
| 2013 | Evaluating on-die interconnects for a 4 TB/s router. Keith D. Underwood, Eric Borch, John Sizer, Timothy Stremcha, Michael Strom |
| 2013 | Exploiting data parallelism in the yConvex hypergraph algorithm for image representation using GPGPUs. Saurabh Jha, Tejaswi Agarwal, B. Rajesh Kanna |
| 2013 | Exploiting domain knowledge to optimize parallel computational mechanics codes. Chenyang Liu, Muhammad Hasan Jamal, Milind Kulkarni, Arun Prakash, Vijay S. Pai |
| 2013 | Exploiting reuse information to reduce refresh energy in on-chip eDRAM caches. Alejandro Valero, Julio Sahuquillo, Salvador Petit, José Duato |
| 2013 | Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement. Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, Lisa R. Hsu, Huiyang Zhou |
| 2013 | Exploring hardware overprovisioning in power-constrained, high performance computing. Tapasya Patki, David K. Lowenthal, Barry Rountree, Martin Schulz, Bronis R. de Supinski |
| 2013 | Expressing graph algorithms using generalized active messages. Nicholas Gerard Edmonds, Jeremiah Willcock, Andrew Lumsdaine |
| 2013 | FASTER run-time reconfiguration management. Catalin Bogdan Ciobanu, Dionisios N. Pnevmatikatos, Kyprianos D. Papadimitriou, Georgi Nedeltchev Gaydadjiev |
| 2013 | Function, latency, bandwidth, power: towards a better computer. Steven L. Teig |
| 2013 | G-Charm: an adaptive runtime system for message-driven parallel applications on hybrid systems. R. Vasudevan, Sathish S. Vadhiyar, Laxmikant V. Kalé |
| 2013 | High quality real-time image-to-mesh conversion for finite element simulations. Panagiotis A. Foteinos, Nikos Chrisochoides |
| 2013 | Holistic run-time parallelism management for time and energy efficiency. Srinath Sridharan, Gagan Gupta, Gurindar S. Sohi |
| 2013 | Hybrid approach for data-flow analysis of MPI programs. Sriram Aananthakrishnan, Greg Bronevetsky, Ganesh Gopalakrishnan |
| 2013 | HykSort: a new variant of hypercube quicksort on distributed memory architectures. Hari Sundar, Dhairya Malhotra, George Biros |
| 2013 | Imbalance optimization in scientific workflows. Weiwei Chen, Ewa Deelman, Rizos Sakellariou |
| 2013 | Imogen: a parallel 3D fluid and MHD code for GPUs. Erik Keever, James N. Imamura |
| 2013 | Implementing OmpSs support for regions of data in architectures with multiple address spaces. Javier Bueno, Xavier Martorell, Rosa M. Badia, Eduard Ayguadé, Jesús Labarta |
| 2013 | Improving communication in PGAS environments: static and dynamic coalescing in UPC. Michail Alvanos, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell |
| 2013 | Improving numerical accuracy for non-negative matrix multiplication on GPUs using recursive algorithms. Matthew Badin, Paolo D'Alberto, Lubomir Bic, Michael B. Dillencourt, Alexandru Nicolau |
| 2013 | Improving performance of all-to-all communication through loop scheduling in PGAS environments. Michail Alvanos, Gabriel Tanase, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell |
| 2013 | Improving performance of OpenSHMEM reference library by portable PE mapping technique. Swaroop Pophale, Tony Curtis, Barbara M. Chapman |
| 2013 | Inspector/executor load balancing algorithms for block-sparse tensor contractions. David Ozog, Sameer Shende, Allen D. Malony, Jeff R. Hammond, James Dinan, Pavan Balaji |
| 2013 | International Conference on Supercomputing, ICS'13, Eugene, OR, USA - June 10 - 14, 2013. Allen D. Malony, Mario Nemirovsky, Samuel P. Midkiff |
| 2013 | LibWater: heterogeneous distributed computing made easy. Ivan Grasso, Simone Pellegrini, Biagio Cosenza, Thomas Fahringer |
| 2013 | MAD7: a memory architecture simulator targeted at design space exploration. Hadrien A. Clarke, Antoine Trouvé, Kazuaki J. Murakami |
| 2013 | MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand. Khaled Hamidouche, Sreeram Potluri, Hari Subramoni, Krishna Chaitanya Kandalla, Dhabaleswar K. Panda |
| 2013 | Massively parallel loading. Wolfgang Frings, Dong H. Ahn, Matthew P. LeGendre, Todd Gamblin, Bronis R. de Supinski, Felix Wolf |
| 2013 | Memorage: emerging persistent RAM based malleable main memory and storage architecture. Ju-Young Jung, Sangyeun Cho |
| 2013 | Multi-layered unstructured mesh generation. Panagiotis A. Foteinos, Daming Feng, Andrey N. Chernikov, Nikos Chrisochoides |
| 2013 | Network-on-chip for a partially reconfigurable FPGA system. Justin A. Hogan, Raymond J. Weber, Brock J. LaMeres, Todd Kaiser |
| 2013 | Power efficiency in a partially reconfigurable multiprocessor system. Raymond J. Weber, Justin A. Hogan, Brock J. LaMeres, Todd Kaiser |
| 2013 | Prefetching and cache management using task lifetimes. Vassilis Papaefstathiou, Manolis Katevenis, Dimitrios S. Nikolopoulos, Dionisios N. Pnevmatikatos |
| 2013 | Quantifying performance bottleneck cost through differential analysis. Souad Koliai, Zakaria Bendifallah, Mathieu Tribalat, Cédric Valensi, Jean-Thomas Acquaviva, William Jalby |
| 2013 | SMIO: I/O similarity aware virtual machine management in virtual desktop environments. Min Li, Sushil Mantri, Pin Zhou, Ali Raza Butt |
| 2013 | Scaling data race detection for partitioned global address space programs. Chang-Seo Park, Koushik Sen, Costin Iancu |
| 2013 | Scaling large-data computations on multi-GPU accelerators. Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann |
| 2013 | SemCache: semantics-aware caching for efficient GPU offloading. Nabeel AlSaber, Milind Kulkarni |
| 2013 | TEAPOT: a toolset for evaluating performance, power and image quality on mobile graphics systems. José-María Arnau, Joan-Manuel Parcerisa, Polychronis Xekalakis |
| 2013 | The ARMv8 simulator. Tao Jiang, Lele Zhang, Rui Hou, Yi Zhang, Qianlong Zhang, Lin Chai, Jing Han, Wuxiong Zhang, Cong Wang, Lixin Zhang |
| 2013 | The Power 775 architecture at scale. Ramakrishnan Rajamony, Mark W. Stephenson, William Evan Speight |
| 2013 | The role of computer designers in reverse-engineering the brain. James E. Smith |
| 2013 | Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication. Azzam Haidar, Mark Gates, Stanimire Tomov, Jack J. Dongarra |
| 2013 | Towards more efficient execution: a decoupled access-execute approach. Konstantinos Koukos, David Black-Schaffer, Vasileios Spiliopoulos, Stefanos Kaxiras |
| 2013 | Towards shared memory consistency models for GPUs. Tyler Sorensen, Ganesh Gopalakrishnan, Vinod Grover |
| 2013 | Tuning the continual flow pipeline architecture. Komal Jothi, Haitham Akkary |
| 2013 | Using platform-independent data locality analysis to predict cache performance on abstract hardware platforms. Sonish Shrestha |
| 2013 | V-OpenCL: a method to use remote GPGPU. Cong Wang, Tao Jiang, Rui Hou |