ICS - RankMe

69 papers

Year	Title / Authors
2009	/scratch as a cache: rethinking HPC center scratch storage. Henry M. Monti, Ali Raza Butt, Sudharshan S. Vazhkudai
2009	A comprehensive power-performance model for NoCs with multi-flit channel buffers. Mohammad Arjomand, Hamid Sarbazi-Azad
2009	A european perspective on supercomputing. Mateo Valero
2009	A graph based approach for MPI deadlock detection. Tobias Hilbrich, Bronis R. de Supinski, Martin Schulz, Matthias S. Müller
2009	A parallel levenberg-marquardt algorithm. Jun Cao, Krista A. Novstrup, Ayush Goyal, Samuel P. Midkiff, James M. Caruthers
2009	A translation system for enabling data mining applications on GPUs. Wenjing Ma, Gagan Agrawal
2009	Access map pattern matching for data cache prefetch. Yasuo Ishii, Mary Inaba, Kei Hiraki
2009	Adagio: making DVS practical for complex HPC applications. Barry Rountree, David K. Lowenthal, Bronis R. de Supinski, Martin Schulz, Vincent W. Freeh, Tyler K. Bletsch
2009	An infrastructure for scalable and portable parallel programs for computational chemistry. Victor Lotrich, Norbert Flocke, Mark Ponton, Beverly A. Sanders, Erik Deumens, Rodney J. Bartlett, Ajith Perera
2009	Approximate kernel matrix computation on GPUs for large scale learning applications. Mohamed E. Hussein, Wael Abd-Almageed
2009	Auto-vectorization through code generation for stream processing applications. Huayong Wang, Henrique Andrade, Bugra Gedik, Kun-Lung Wu
2009	Cancellation of loads that return zero using zero-value caches. Md. Mafijul Islam, Sally A. McKee, Per Stenström
2009	Chunking parallel loops in the presence of synchronization. Jun Shirako, Jisheng M. Zhao, V. Krishna Nandivada, Vivek Sarkar
2009	Combining thread level speculation helper threads and runahead execution. Polychronis Xekalakis, Nikolas Ioannou, Marcelo Cintra
2009	Computer generation of fast fourier transforms for the cell broadband engine. Srinivas Chellappa, Franz Franchetti, Markus Püschel
2009	Computing outside the box. Ian T. Foster
2009	Creating artificial global history to improve branch prediction accuracy. Leo Porter, Dean M. Tullsen
2009	DBDB: optimizing DMA transfer for the cell be architecture. Tao Liu, Haibo Lin, Tong Chen, Kevin O'Brien, Ling Shao
2009	Design of a novel SIMD architecture by fusing operations and registers. Jih-Ching Chiu, Kai-Ming Yang, Yu-Liang Chou
2009	Designing multi-socket systems using silicon photonics. Scott Beamer, Krste Asanovic, Christopher Batten, Ajay Joshi, Vladimir Stojanovic
2009	Divide-and-conquer: a bubble replacement for low level caches. Chuanjun Zhang, Bing Xue
2009	Dynamic cache clustering for chip multiprocessors. Mohammad Hammoud, Sangyeun Cho, Rami G. Melhem
2009	Dynamic parallelization of single-threaded binary programs using speculative slicing. Cheng Wang, Youfeng Wu, Edson Borin, Shiliang Hu, Wei Liu, Dave Sager, Tin-Fook Ngai, Jesse Fang
2009	Dynamic task set partitioning based on balancing memory requirements to reduce power consumption. Diana Bautista, Julio Sahuquillo, Houcine Hassan, Salvador Petit, José Duato
2009	Dynamic topology aware load balancing algorithms for molecular dynamics applications. Abhinav Bhatele, Laxmikant V. Kalé, Sameer Kumar
2009	Efficient high performance collective communication for the cell blade. Qasim Ali, Samuel P. Midkiff, Vijay S. Pai
2009	EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems. Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, V. S. Anil Kumar, Madhav V. Marathe
2009	Evaluating high performance communication: a power perspective. Jiuxing Liu, Dan E. Poff, Bülent Abali
2009	Exploring pattern-aware routing in generalized fat tree networks. Germán Rodríguez, Ramón Beivide, Cyriel Minkenberg, Jesús Labarta, Mateo Valero
2009	FTL design exploration in reconfigurable high-performance SSD for server applications. Ji-Yong Shin, Zenglin Xia, Ning-Yi Xu, Rui Gao, Xiongfei Cai, Seungryoul Maeng, Feng-Hsiung Hsu
2009	Fast and scalable list ranking on the GPU. M. Suhail Rehman, Kishore Kothapalli, P. J. Narayanan
2009	Fast memory snapshot for concurrent programming without synchronization. JaeWoong Chung, Woongki Baek, Christos Kozyrakis
2009	High-performance CUDA kernel execution on FPGAs. Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu
2009	High-performance regular expression scanning on the Cell/B.E. processor. Daniele Paolo Scarpazza, Gregory F. Russell
2009	How GPUs can outperform ASICs for fast LDPC decoding. Gabriel Falcão Paiva Fernandes, Vítor Manuel Mendes da Silva, Leonel Sousa
2009	Implementation of a wide-angle lens distortion correction algorithm on the cell broadband engine. Konstantis Daloukas, Christos D. Antonopoulos, Nikolaos Bellas
2009	Less reused filter: improving l2 cache performance via filtering less reused lines. Lingxiang Xiang, Tianzhou Chen, Qingsong Shi, Wei Hu
2009	Limited early value communication to improve performance of transactional memory. Salil Mohan Pant, Gregory T. Byrd
2009	Load balancing using work-stealing for pipeline parallelism in emerging applications. Angeles G. Navarro, Rafael Asenjo, Siham Tabik, Calin Cascaval
2009	MPI collective communications on the blue gene/p supercomputer: algorithms and optimizations. Ahmad Faraj, Sameer Kumar, Brian E. Smith, Amith R. Mamidala, John A. Gunnels, Philip Heidelberger
2009	MPI-aware compiler optimizations for improving communication-computation overlap. Anthony Danalis, Lori L. Pollock, D. Martin Swany, John Cavazos
2009	Maximizing MPI point-to-point communication performance on RDMA-enabled clusters with customized protocols. Matthew Small, Xin Yuan
2009	OhHelp: a scalable domain-decomposing dynamic load balancing for particle-in-cell simulations. Hiroshi Nakashima, Yohei Miyake, Hideyuki Usui, Yoshiharu Omura
2009	P-Code: a new RAID-6 code with optimal properties. Chao Jin, Hong Jiang, Dan Feng, Lei Tian
2009	PARSEC: hardware profiling of emerging workloads for CMP design. Major Bhadauria, Vincent M. Weaver, Sally A. McKee
2009	Parametric multi-level tiling of imperfectly nested loops. Albert Hartono, Muthu Manikandan Baskaran, Cédric Bastoul, Albert Cohen, Sriram Krishnamoorthy, Boyana Norris, J. Ramanujam, P. Sadayappan
2009	Pattern-based sparse matrix representation for memory-efficient SMVM kernels. Mehmet Belgin, Godmar Back, Calvin J. Ribbens
2009	Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. Jiayuan Meng, Kevin Skadron
2009	Performance modeling for DFT algorithms in FFTW. Liang Gu, Xiaoming Li
2009	Practice of parallelizing network applications on multi-core architectures. Junchang Wang, Haipeng Cheng, Bei Hua, Xinan Tang
2009	Prediction-based power estimation and scheduling for CMPs. Karan Singh, Major Bhadauria, Sally A. McKee
2009	Prefetch optimizations on large-scale applications via parameter value prediction. Shih-Wei Liao, Tzu-Han Hung, Donald Nguyen, Hucheng Zhou, Chinyen Chou, Chia-Heng Tu
2009	Proceedings of the 23rd international conference on Supercomputing, 2009, Yorktown Heights, NY, USA, June 8-12, 2009. Michael Gschwind, Alexandru Nicolau, Valentina Salapura, José E. Moreira
2009	QuakeTM: parallelizing a complex sequential application using transactional memory. Vladimir Gajinov, Ferad Zyulkyarov, Osman S. Unsal, Adrián Cristal, Eduard Ayguadé, Tim Harris, Mateo Valero
2009	R-ADMAD: high reliability provision for large-scale de-duplication archival storage systems. Chuanyi Liu, Yu Gu, Linchun Sun, Bin Yan, Dongsheng Wang
2009	Rate-based QoS techniques for cache/memory in CMP platforms. Andrew Herdrich, Ramesh Illikkal, Ravi R. Iyer, Donald Newell, Vineet Chadha, Jaideep Moses
2009	Refereeing conflicts in hardware transactional memory. Arrvindh Shriraman, Sandhya Dwarkadas
2009	Single-particle 3d reconstruction from cryo-electron microscopy images on GPU. Guangming Tan, Ziyu Guo, Mingyu Chen, Dan Meng
2009	Subdomain communication to increase scalability in large-scale scientific applications. Aleksandr Ovcharenko, Onkar Sahni, Christopher D. Carothers, Kenneth E. Jansen, Mark S. Shephard
2009	Synchronization optimizations for efficient execution on multi-cores. Alexandru Nicolau, Guangqiang Li, Alexander V. Veidenbaum, Arun Kejariwal
2009	The roadrunner project and the importance of energy efficiency on the road to exascale computing. Don G. Grice
2009	Thrifty interconnection network for HPC systems. Jian Li, Lixin Zhang, Charles Lefurgy, Richard R. Treumann, Wolfgang E. Denzel
2009	Towards 100 gbit/s ethernet: multicore-based parallel communication protocol design. Stavros Passas, Kostas Magoutis, Angelos Bilas
2009	TransMetric: architecture independent workload characterization for transactional memory benchmarks. James Poe, Clay Hughes, Tao Li
2009	Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. Sundaresan Venkatasubramanian, Richard W. Vuduc
2009	Understanding the interconnection network of SpiNNaker. Javier Navaridas, Mikel Luján, José Miguel-Alonso, Luis A. Plana, Steve B. Furber
2009	Using many-core hardware to correlate radio astronomy signals. Rob van Nieuwpoort, John W. Romein
2009	Virtualization polling engine (VPE): using dedicated CPU cores to accelerate I/O virtualization. Jiuxing Liu, Bülent Abali
2009	Zero-content augmented caches. Julien Dusser, Thomas Piquet, André Seznec