ICS A

39 papers

Year	Title / Authors
2012	An analysis of computational workloads for the ORNL Jaguar system. Wayne Joubert, Shi-Quan Su
2012	An efficient work-distribution strategy for gridding radio-telescope data on GPUs. John W. Romein
2012	An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs. Jiajia Li, Xingjian Li, Guangming Tan, Mingyu Chen, Ninghui Sun
2012	Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors. Nishkam Ravi, Yi Yang, Tao Bao, Srimat T. Chakradhar
2012	Blue Gene/Q: design for sustained multi-petaflop computing. Michael Gschwind
2012	CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures. Quan Chen, Minyi Guo, Zhiyi Huang
2012	CRQ-based fair scheduling on composable multicore architectures. Tao Sun, Hong An, Tao Wang, Haibo Zhang, Xiufeng Sui
2012	CVP: an energy-efficient indirect branch prediction with compiler-guided value pattern. Mingxing Tan, Xianhua Liu, Tong Tong, Xu Cheng
2012	Channel borrowing: an energy-efficient nanophotonic crossbar architecture with light-weight arbitration. Yi Xu, Jun Yang, Rami G. Melhem
2012	Characterizing and improving the use of demand-fetched caches in GPUs. Wenhao Jia, Kelly A. Shaw, Margaret Martonosi
2012	Collective algorithms for sub-communicators. Anshul Mittal, Nikhil Jain, Thomas George, Yogish Sabharwal, Sameer Kumar
2012	Composable, non-blocking collective operations on POWER7 IH. Gabriel Ilie Tanase, Gheorghe Almási, Hanhong Xue, Charles Archer
2012	Congestion avoidance on manycore high performance computing systems. Miao Luo, Dhabaleswar K. Panda, Khaled Z. Ibrahim, Costin Iancu
2012	Data-driven fault tolerance for work stealing computations. Wenjing Ma, Sriram Krishnamoorthy
2012	Distributed replay protocol for distributed uniprocessors. Mengjie Mao, Hong An, Bobin Deng, Tao Sun, Xuechao Wei, Wei Zhou, Wenting Han
2012	Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems. Fengguang Song, Stanimire Tomov, Jack J. Dongarra
2012	Enhancing the performance of assisted execution runtime systems through hardware/software techniques. Gokcen Kestor, Roberto Gioiosa, Osman S. Unsal, Adrián Cristal, Mateo Valero
2012	Exploiting communication and packaging locality for cost-effective large scale networks. Keith D. Underwood, Eric Borch
2012	Fast loop-level data dependence profiling. Hongtao Yu, Zhiyuan Li
2012	Fault resilience of the algebraic multi-grid solver. Marc Casas-Guix, Bronis R. de Supinski, Greg Bronevetsky, Martin Schulz
2012	Fault tolerant preconditioned conjugate gradient for sparse linear system solution. Manu Shantharam, Sowmyalatha Srinivasmurthy, Padma Raghavan
2012	GPU merge path: a GPU merging algorithm. Oded Green, Robert McColl, David A. Bader
2012	Hardware support for enforcing isolation in lock-based parallel programs. Paruj Ratanaworabhan, Martin Burtscher, Darko Kirovski, Benjamin G. Zorn
2012	HiRe: using hint & release to improve synchronization of speculative threads. Liang Han, Xiaowei Jiang, Wei Liu, Youfeng Wu, James Tuck
2012	High performance supercomputers: should the individual processor be more than a brick? Yale N. Patt
2012	High-performance code generation for stencil computations on GPU architectures. Justin Holewinski, Louis-Noël Pouchet, P. Sadayappan
2012	International Conference on Supercomputing, ICS'12, Venice, Italy, June 25-29, 2012. Utpal Banerjee, Kyle A. Gallivan, Gianfranco Bilardi, Manolis Katevenis
2012	Locality & utility co-optimization for practical capacity management of shared last level caches. Dongyuan Zhan, Hong Jiang, Sharad C. Seth
2012	Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities. Nagendra Dwarakanath Gulur, R. Manikantan, Mahesh Mehendale, R. Govindarajan
2012	On the communication complexity of 3D FFTs and its implications for Exascale. Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Kartik Iyer, P.-K. Yeung, Richard W. Vuduc
2012	One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation. Ziyu Guo, Bo Wu, Xipeng Shen
2012	Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture. Janani Mukundan, Saugata Ghose, Robert Karmazin, Engin Ipek, José F. Martínez
2012	Quantifying the effectiveness of load balance algorithms. Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Nancy M. Amato
2012	SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters. Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee
2012	Space-round tradeoffs for MapReduce computations. Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal
2012	Sparse matrix-vector multiply on the HICAMP architecture. John P. Stevenson, Amin Firoozshahian, Alex Solomatnikov, Mark Horowitz, David R. Cheriton
2012	UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique. Somayeh Sardashti, David A. Wood
2012	Unified memory optimizing architecture: memory subsystem control with a unified predictor. Yasuo Ishii, Mary Inaba, Kei Hiraki
2012	clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs. Bor-Yiing Su, Kurt Keutzer