ICS A

39 papers

Year  Title / Authors
2012  An analysis of computational workloads for the ORNL Jaguar system.
Wayne Joubert, Shi-Quan Su
2012  An efficient work-distribution strategy for gridding radio-telescope data on GPUs.
John W. Romein
2012  An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs.
Jiajia Li, Xingjian Li, Guangming Tan, Mingyu Chen, Ninghui Sun
2012  Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors.
Nishkam Ravi, Yi Yang, Tao Bao, Srimat T. Chakradhar
2012  Blue Gene/Q: design for sustained multi-petaflop computing.
Michael Gschwind
2012  CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures.
Quan Chen, Minyi Guo, Zhiyi Huang
2012  CRQ-based fair scheduling on composable multicore architectures.
Tao Sun, Hong An, Tao Wang, Haibo Zhang, Xiufeng Sui
2012  CVP: an energy-efficient indirect branch prediction with compiler-guided value pattern.
Mingxing Tan, Xianhua Liu, Tong Tong, Xu Cheng
2012  Channel borrowing: an energy-efficient nanophotonic crossbar architecture with light-weight arbitration.
Yi Xu, Jun Yang, Rami G. Melhem
2012  Characterizing and improving the use of demand-fetched caches in GPUs.
Wenhao Jia, Kelly A. Shaw, Margaret Martonosi
2012  Collective algorithms for sub-communicators.
Anshul Mittal, Nikhil Jain, Thomas George, Yogish Sabharwal, Sameer Kumar
2012  Composable, non-blocking collective operations on power7 IH.
Gabriel Ilie Tanase, Gheorghe Almási, Hanhong Xue, Charles Archer
2012  Congestion avoidance on manycore high performance computing systems.
Miao Luo, Dhabaleswar K. Panda, Khaled Z. Ibrahim, Costin Iancu
2012  Data-driven fault tolerance for work stealing computations.
Wenjing Ma, Sriram Krishnamoorthy
2012  Distributed replay protocol for distributed uniprocessors.
Mengjie Mao, Hong An, Bobin Deng, Tao Sun, Xuechao Wei, Wei Zhou, Wenting Han
2012  Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems.
Fengguang Song, Stanimire Tomov, Jack J. Dongarra
2012  Enhancing the performance of assisted execution runtime systems through hardware/software techniques.
Gokcen Kestor, Roberto Gioiosa, Osman S. Unsal, Adrián Cristal, Mateo Valero
2012  Exploiting communication and packaging locality for cost-effective large scale networks.
Keith D. Underwood, Eric Borch
2012  Fast loop-level data dependence profiling.
Hongtao Yu, Zhiyuan Li
2012  Fault resilience of the algebraic multi-grid solver.
Marc Casas-Guix, Bronis R. de Supinski, Greg Bronevetsky, Martin Schulz
2012  Fault tolerant preconditioned conjugate gradient for sparse linear system solution.
Manu Shantharam, Sowmyalatha Srinivasmurthy, Padma Raghavan
2012  GPU merge path: a GPU merging algorithm.
Oded Green, Robert McColl, David A. Bader
2012  Hardware support for enforcing isolation in lock-based parallel programs.
Paruj Ratanaworabhan, Martin Burtscher, Darko Kirovski, Benjamin G. Zorn
2012  HiRe: using hint & release to improve synchronization of speculative threads.
Liang Han, Xiaowei Jiang, Wei Liu, Youfeng Wu, James Tuck
2012  High performance supercomputers: should the individual processor be more than a brick?
Yale N. Patt
2012  High-performance code generation for stencil computations on GPU architectures.
Justin Holewinski, Louis-Noël Pouchet, P. Sadayappan
2012  International Conference on Supercomputing, ICS'12, Venice, Italy, June 25-29, 2012
Utpal Banerjee, Kyle A. Gallivan, Gianfranco Bilardi, Manolis Katevenis
2012  Locality & utility co-optimization for practical capacity management of shared last level caches.
Dongyuan Zhan, Hong Jiang, Sharad C. Seth
2012  Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities.
Nagendra Dwarakanath Gulur, R. Manikantan, Mahesh Mehendale, R. Govindarajan
2012  On the communication complexity of 3D FFTs and its implications for Exascale.
Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Kartik Iyer, P.-K. Yeung, Richard W. Vuduc
2012  One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation.
Ziyu Guo, Bo Wu, Xipeng Shen
2012  Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture.
Janani Mukundan, Saugata Ghose, Robert Karmazin, Engin Ipek, José F. Martínez
2012  Quantifying the effectiveness of load balance algorithms.
Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Nancy M. Amato
2012  SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters.
Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee
2012  Space-round tradeoffs for MapReduce computations.
Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal
2012  Sparse matrix-vector multiply on the HICAMP architecture.
John P. Stevenson, Amin Firoozshahian, Alex Solomatnikov, Mark Horowitz, David R. Cheriton
2012  UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique.
Somayeh Sardashti, David A. Wood
2012  Unified memory optimizing architecture: memory subsystem control with a unified predictor.
Yasuo Ishii, Mary Inaba, Kei Hiraki
2012  clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs.
Bor-Yiing Su, Kurt Keutzer