ICS - RankMe

67 papers

Year	Title / Authors
2013	A decomposition method with minimal communication volume for parallelization of multi-dimensional FFTs. Truong Vinh Truong Duy, Taisuke Ozaki
2013	A massively parallel domain decomposition method for large-scale DFT electronic structure calculations. Truong Vinh Truong Duy, Taisuke Ozaki
2013	A new approach for performance analysis of OpenMP programs. Xu Liu, John M. Mellor-Crummey, Michael W. Fagan
2013	A stencil compiler for short-vector SIMD architectures. Thomas Henretty, Richard Veras, Franz Franchetti, Louis-Noël Pouchet, J. Ramanujam, P. Sadayappan
2013	Abstractions to separate concerns in semi-regular grids. Andrew Stone, Michelle Mills Strout
2013	Active disk meets flash: a case for intelligent SSDs. Sangyeun Cho, Chanik Park, Hyunok Oh, Sungchan Kim, Youngmin Yi, Gregory R. Ganger
2013	Address-aware fences. Changhui Lin, Vijay Nagarajan, Rajiv Gupta
2013	An automatic input-sensitive approach for heterogeneous task partitioning. Klaus Kofler, Ivan Grasso, Biagio Cosenza, Thomas Fahringer
2013	Automatically adapting programs for mixed-precision floating-point computation. Michael O. Lam, Jeffrey K. Hollingsworth, Bronis R. de Supinski, Matthew P. LeGendre
2013	Bandwidth-optimal all-to-all exchanges in fat tree networks. Bogdan Prisacari, Germán Rodríguez, Cyriel Minkenberg, Torsten Hoefler
2013	Bubble coloring: avoiding routing- and protocol-induced deadlocks with minimal virtual channel requirement. Ruisheng Wang, Lizhong Chen, Timothy Mark Pinkston
2013	Business meets supercomputing: keynote talk. Bob Blainey
2013	CMP off-chip bandwidth scheduling guided by instruction criticality. Pablo Prieto, Valentin Puente, José-Ángel Gregorio
2013	CUPL: a compile-time uncoalesced memory access pattern locator for CUDA. Madhur Amilkanthwar, Shankar Balachandran
2013	Conservative row activation to improve memory power efficiency. Kun Fang, Zhichun Zhu
2013	Design of a large-scale storage-class RRAM system. Myoungsoo Jung, John Shalf, Mahmut T. Kandemir
2013	Diagnosis and optimization of application prefetching performance. Gabriel Marin, Collin McCurdy, Jeffrey S. Vetter
2013	Efficient scheduling of recursive control flow on GPUs. Xin Huo, Sriram Krishnamoorthy, Gagan Agrawal
2013	Efficient sparse matrix-vector multiplication on x86-based many-core processors. Xing Liu, Mikhail Smelyanskiy, Edmond Chow, Pradeep Dubey
2013	Elastic and scalable tracing and accurate replay of non-deterministic events. Xing Wu, Frank Mueller
2013	Evaluating on-die interconnects for a 4 TB/s router. Keith D. Underwood, Eric Borch, John Sizer, Timothy Stremcha, Michael Strom
2013	Exploiting data parallelism in the yConvex hypergraph algorithm for image representation using GPGPUs. Saurabh Jha, Tejaswi Agarwal, B. Rajesh Kanna
2013	Exploiting domain knowledge to optimize parallel computational mechanics codes. Chenyang Liu, Muhammad Hasan Jamal, Milind Kulkarni, Arun Prakash, Vijay S. Pai
2013	Exploiting reuse information to reduce refresh energy in on-chip eDRAM caches. Alejandro Valero, Julio Sahuquillo, Salvador Petit, José Duato
2013	Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement. Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, Lisa R. Hsu, Huiyang Zhou
2013	Exploring hardware overprovisioning in power-constrained, high performance computing. Tapasya Patki, David K. Lowenthal, Barry Rountree, Martin Schulz, Bronis R. de Supinski
2013	Expressing graph algorithms using generalized active messages. Nicholas Gerard Edmonds, Jeremiah Willcock, Andrew Lumsdaine
2013	FASTER run-time reconfiguration management. Catalin Bogdan Ciobanu, Dionisios N. Pnevmatikatos, Kyprianos D. Papadimitriou, Georgi Nedeltchev Gaydadjiev
2013	Function, latency, bandwidth, power: towards a better computer. Steven L. Teig
2013	G-Charm: an adaptive runtime system for message-driven parallel applications on hybrid systems. R. Vasudevan, Sathish S. Vadhiyar, Laxmikant V. Kalé
2013	High quality real-time image-to-mesh conversion for finite element simulations. Panagiotis A. Foteinos, Nikos Chrisochoides
2013	Holistic run-time parallelism management for time and energy efficiency. Srinath Sridharan, Gagan Gupta, Gurindar S. Sohi
2013	Hybrid approach for data-flow analysis of MPI programs. Sriram Aananthakrishnan, Greg Bronevetsky, Ganesh Gopalakrishnan
2013	HykSort: a new variant of hypercube quicksort on distributed memory architectures. Hari Sundar, Dhairya Malhotra, George Biros
2013	Imbalance optimization in scientific workflows. Weiwei Chen, Ewa Deelman, Rizos Sakellariou
2013	Imogen: a parallel 3D fluid and MHD code for GPUs. Erik Keever, James N. Imamura
2013	Implementing OmpSs support for regions of data in architectures with multiple address spaces. Javier Bueno, Xavier Martorell, Rosa M. Badia, Eduard Ayguadé, Jesús Labarta
2013	Improving communication in PGAS environments: static and dynamic coalescing in UPC. Michail Alvanos, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell
2013	Improving numerical accuracy for non-negative matrix multiplication on GPUs using recursive algorithms. Matthew Badin, Paolo D'Alberto, Lubomir Bic, Michael B. Dillencourt, Alexandru Nicolau
2013	Improving performance of all-to-all communication through loop scheduling in PGAS environments. Michail Alvanos, Gabriel Tanase, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell
2013	Improving performance of OpenSHMEM reference library by portable PE mapping technique. Swaroop Pophale, Tony Curtis, Barbara M. Chapman
2013	Inspector/executor load balancing algorithms for block-sparse tensor contractions. David Ozog, Sameer Shende, Allen D. Malony, Jeff R. Hammond, James Dinan, Pavan Balaji
2013	International Conference on Supercomputing, ICS'13, Eugene, OR, USA, June 10-14, 2013. Allen D. Malony, Mario Nemirovsky, Samuel P. Midkiff
2013	LibWater: heterogeneous distributed computing made easy. Ivan Grasso, Simone Pellegrini, Biagio Cosenza, Thomas Fahringer
2013	MAD7: a memory architecture simulator targeted at design space exploration. Hadrien A. Clarke, Antoine Trouvé, Kazuaki J. Murakami
2013	MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand. Khaled Hamidouche, Sreeram Potluri, Hari Subramoni, Krishna Chaitanya Kandalla, Dhabaleswar K. Panda
2013	Massively parallel loading. Wolfgang Frings, Dong H. Ahn, Matthew P. LeGendre, Todd Gamblin, Bronis R. de Supinski, Felix Wolf
2013	Memorage: emerging persistent RAM based malleable main memory and storage architecture. Ju-Young Jung, Sangyeun Cho
2013	Multi-layered unstructured mesh generation. Panagiotis A. Foteinos, Daming Feng, Andrey N. Chernikov, Nikos Chrisochoides
2013	Network-on-chip for a partially reconfigurable FPGA system. Justin A. Hogan, Raymond J. Weber, Brock J. LaMeres, Todd Kaiser
2013	Power efficiency in a partially reconfigurable multiprocessor system. Raymond J. Weber, Justin A. Hogan, Brock J. LaMeres, Todd Kaiser
2013	Prefetching and cache management using task lifetimes. Vassilis Papaefstathiou, Manolis Katevenis, Dimitrios S. Nikolopoulos, Dionisios N. Pnevmatikatos
2013	Quantifying performance bottleneck cost through differential analysis. Souad Koliai, Zakaria Bendifallah, Mathieu Tribalat, Cédric Valensi, Jean-Thomas Acquaviva, William Jalby
2013	SMIO: I/O similarity aware virtual machine management in virtual desktop environments. Min Li, Sushil Mantri, Pin Zhou, Ali Raza Butt
2013	Scaling data race detection for partitioned global address space programs. Chang-Seo Park, Koushik Sen, Costin Iancu
2013	Scaling large-data computations on multi-GPU accelerators. Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann
2013	SemCache: semantics-aware caching for efficient GPU offloading. Nabeel AlSaber, Milind Kulkarni
2013	TEAPOT: a toolset for evaluating performance, power and image quality on mobile graphics systems. José-María Arnau, Joan-Manuel Parcerisa, Polychronis Xekalakis
2013	The ARMv8 simulator. Tao Jiang, Lele Zhang, Rui Hou, Yi Zhang, Qianlong Zhang, Lin Chai, Jing Han, Wuxiong Zhang, Cong Wang, Lixin Zhang
2013	The Power 775 architecture at scale. Ramakrishnan Rajamony, Mark W. Stephenson, William Evan Speight
2013	The role of computer designers in reverse-engineering the brain. James E. Smith
2013	Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication. Azzam Haidar, Mark Gates, Stanimire Tomov, Jack J. Dongarra
2013	Towards more efficient execution: a decoupled access-execute approach. Konstantinos Koukos, David Black-Schaffer, Vasileios Spiliopoulos, Stefanos Kaxiras
2013	Towards shared memory consistency models for GPUs. Tyler Sorensen, Ganesh Gopalakrishnan, Vinod Grover
2013	Tuning the continual flow pipeline architecture. Komal Jothi, Haitham Akkary
2013	Using platform-independent data locality analysis to predict cache performance on abstract hardware platforms. Sonish Shrestha
2013	V-OpenCL: a method to use remote GPGPU. Cong Wang, Tao Jiang, Rui Hou