ICS A

67 papers

Year Title / Authors
2013 A decomposition method with minimal communication volume for parallelization of multi-dimensional FFTs.
Truong Vinh Truong Duy, Taisuke Ozaki
2013 A massively parallel domain decomposition method for large-scale DFT electronic structure calculations.
Truong Vinh Truong Duy, Taisuke Ozaki
2013 A new approach for performance analysis of OpenMP programs.
Xu Liu, John M. Mellor-Crummey, Michael W. Fagan
2013 A stencil compiler for short-vector SIMD architectures.
Thomas Henretty, Richard Veras, Franz Franchetti, Louis-Noël Pouchet, J. Ramanujam, P. Sadayappan
2013 Abstractions to separate concerns in semi-regular grids.
Andrew Stone, Michelle Mills Strout
2013 Active disk meets flash: a case for intelligent SSDs.
Sangyeun Cho, Chanik Park, Hyunok Oh, Sungchan Kim, Youngmin Yi, Gregory R. Ganger
2013 Address-aware fences.
Changhui Lin, Vijay Nagarajan, Rajiv Gupta
2013 An automatic input-sensitive approach for heterogeneous task partitioning.
Klaus Kofler, Ivan Grasso, Biagio Cosenza, Thomas Fahringer
2013 Automatically adapting programs for mixed-precision floating-point computation.
Michael O. Lam, Jeffrey K. Hollingsworth, Bronis R. de Supinski, Matthew P. LeGendre
2013 Bandwidth-optimal all-to-all exchanges in fat tree networks.
Bogdan Prisacari, Germán Rodríguez, Cyriel Minkenberg, Torsten Hoefler
2013 Bubble coloring: avoiding routing- and protocol-induced deadlocks with minimal virtual channel requirement.
Ruisheng Wang, Lizhong Chen, Timothy Mark Pinkston
2013 Business meets supercomputing: keynote talk.
Bob Blainey
2013 CMP off-chip bandwidth scheduling guided by instruction criticality.
Pablo Prieto, Valentin Puente, José-Ángel Gregorio
2013 CUPL: a compile-time uncoalesced memory access pattern locator for CUDA.
Madhur Amilkanthwar, Shankar Balachandran
2013 Conservative row activation to improve memory power efficiency.
Kun Fang, Zhichun Zhu
2013 Design of a large-scale storage-class RRAM system.
Myoungsoo Jung, John Shalf, Mahmut T. Kandemir
2013 Diagnosis and optimization of application prefetching performance.
Gabriel Marin, Collin McCurdy, Jeffrey S. Vetter
2013 Efficient scheduling of recursive control flow on GPUs.
Xin Huo, Sriram Krishnamoorthy, Gagan Agrawal
2013 Efficient sparse matrix-vector multiplication on x86-based many-core processors.
Xing Liu, Mikhail Smelyanskiy, Edmond Chow, Pradeep Dubey
2013 Elastic and scalable tracing and accurate replay of non-deterministic events.
Xing Wu, Frank Mueller
2013 Evaluating on-die interconnects for a 4 TB/s router.
Keith D. Underwood, Eric Borch, John Sizer, Timothy Stremcha, Michael Strom
2013 Exploiting data parallelism in the yConvex hypergraph algorithm for image representation using GPGPUs.
Saurabh Jha, Tejaswi Agarwal, B. Rajesh Kanna
2013 Exploiting domain knowledge to optimize parallel computational mechanics codes.
Chenyang Liu, Muhammad Hasan Jamal, Milind Kulkarni, Arun Prakash, Vijay S. Pai
2013 Exploiting reuse information to reduce refresh energy in on-chip eDRAM caches.
Alejandro Valero, Julio Sahuquillo, Salvador Petit, José Duato
2013 Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement.
Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, Lisa R. Hsu, Huiyang Zhou
2013 Exploring hardware overprovisioning in power-constrained, high performance computing.
Tapasya Patki, David K. Lowenthal, Barry Rountree, Martin Schulz, Bronis R. de Supinski
2013 Expressing graph algorithms using generalized active messages.
Nicholas Gerard Edmonds, Jeremiah Willcock, Andrew Lumsdaine
2013 FASTER run-time reconfiguration management.
Catalin Bogdan Ciobanu, Dionisios N. Pnevmatikatos, Kyprianos D. Papadimitriou, Georgi Nedeltchev Gaydadjiev
2013 Function, latency, bandwidth, power: towards a better computer.
Steven L. Teig
2013 G-Charm: an adaptive runtime system for message-driven parallel applications on hybrid systems.
R. Vasudevan, Sathish S. Vadhiyar, Laxmikant V. Kalé
2013 High quality real-time image-to-mesh conversion for finite element simulations.
Panagiotis A. Foteinos, Nikos Chrisochoides
2013 Holistic run-time parallelism management for time and energy efficiency.
Srinath Sridharan, Gagan Gupta, Gurindar S. Sohi
2013 Hybrid approach for data-flow analysis of MPI programs.
Sriram Aananthakrishnan, Greg Bronevetsky, Ganesh Gopalakrishnan
2013 HykSort: a new variant of hypercube quicksort on distributed memory architectures.
Hari Sundar, Dhairya Malhotra, George Biros
2013 Imbalance optimization in scientific workflows.
Weiwei Chen, Ewa Deelman, Rizos Sakellariou
2013 Imogen: a parallel 3D fluid and MHD code for GPUs.
Erik Keever, James N. Imamura
2013 Implementing OmpSs support for regions of data in architectures with multiple address spaces.
Javier Bueno, Xavier Martorell, Rosa M. Badia, Eduard Ayguadé, Jesús Labarta
2013 Improving communication in PGAS environments: static and dynamic coalescing in UPC.
Michail Alvanos, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell
2013 Improving numerical accuracy for non-negative matrix multiplication on GPUs using recursive algorithms.
Matthew Badin, Paolo D'Alberto, Lubomir Bic, Michael B. Dillencourt, Alexandru Nicolau
2013 Improving performance of all-to-all communication through loop scheduling in PGAS environments.
Michail Alvanos, Gabriel Tanase, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell
2013 Improving performance of OpenSHMEM reference library by portable PE mapping technique.
Swaroop Pophale, Tony Curtis, Barbara M. Chapman
2013 Inspector/executor load balancing algorithms for block-sparse tensor contractions.
David Ozog, Sameer Shende, Allen D. Malony, Jeff R. Hammond, James Dinan, Pavan Balaji
2013 International Conference on Supercomputing, ICS'13, Eugene, OR, USA - June 10 - 14, 2013
Allen D. Malony, Mario Nemirovsky, Samuel P. Midkiff
2013 LibWater: heterogeneous distributed computing made easy.
Ivan Grasso, Simone Pellegrini, Biagio Cosenza, Thomas Fahringer
2013 MAD7: a memory architecture simulator targeted at design space exploration.
Hadrien A. Clarke, Antoine Trouvé, Kazuaki J. Murakami
2013 MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand.
Khaled Hamidouche, Sreeram Potluri, Hari Subramoni, Krishna Chaitanya Kandalla, Dhabaleswar K. Panda
2013 Massively parallel loading.
Wolfgang Frings, Dong H. Ahn, Matthew P. LeGendre, Todd Gamblin, Bronis R. de Supinski, Felix Wolf
2013 Memorage: emerging persistent RAM based malleable main memory and storage architecture.
Ju-Young Jung, Sangyeun Cho
2013 Multi-layered unstructured mesh generation.
Panagiotis A. Foteinos, Daming Feng, Andrey N. Chernikov, Nikos Chrisochoides
2013 Network-on-chip for a partially reconfigurable FPGA system.
Justin A. Hogan, Raymond J. Weber, Brock J. LaMeres, Todd Kaiser
2013 Power efficiency in a partially reconfigurable multiprocessor system.
Raymond J. Weber, Justin A. Hogan, Brock J. LaMeres, Todd Kaiser
2013 Prefetching and cache management using task lifetimes.
Vassilis Papaefstathiou, Manolis Katevenis, Dimitrios S. Nikolopoulos, Dionisios N. Pnevmatikatos
2013 Quantifying performance bottleneck cost through differential analysis.
Souad Koliai, Zakaria Bendifallah, Mathieu Tribalat, Cédric Valensi, Jean-Thomas Acquaviva, William Jalby
2013 SMIO: I/O similarity aware virtual machine management in virtual desktop environments.
Min Li, Sushil Mantri, Pin Zhou, Ali Raza Butt
2013 Scaling data race detection for partitioned global address space programs.
Chang-Seo Park, Koushik Sen, Costin Iancu
2013 Scaling large-data computations on multi-GPU accelerators.
Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann
2013 SemCache: semantics-aware caching for efficient GPU offloading.
Nabeel AlSaber, Milind Kulkarni
2013 TEAPOT: a toolset for evaluating performance, power and image quality on mobile graphics systems.
José-María Arnau, Joan-Manuel Parcerisa, Polychronis Xekalakis
2013 The ARMv8 simulator.
Tao Jiang, Lele Zhang, Rui Hou, Yi Zhang, Qianlong Zhang, Lin Chai, Jing Han, Wuxiong Zhang, Cong Wang, Lixin Zhang
2013 The power 775 architecture at scale.
Ramakrishnan Rajamony, Mark W. Stephenson, William Evan Speight
2013 The role of computer designers in reverse-engineering the brain.
James E. Smith
2013 Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication.
Azzam Haidar, Mark Gates, Stanimire Tomov, Jack J. Dongarra
2013 Towards more efficient execution: a decoupled access-execute approach.
Konstantinos Koukos, David Black-Schaffer, Vasileios Spiliopoulos, Stefanos Kaxiras
2013 Towards shared memory consistency models for GPUs.
Tyler Sorensen, Ganesh Gopalakrishnan, Vinod Grover
2013 Tuning the continual flow pipeline architecture.
Komal Jothi, Haitham Akkary
2013 Using platform-independent data locality analysis to predict cache performance on abstract hardware platforms.
Sonish Shrestha
2013 V-OpenCL: a method to use remote GPGPU.
Cong Wang, Tao Jiang, Rui Hou