ICS A

69 papers

Year Title / Authors
2009 /scratch as a cache: rethinking HPC center scratch storage.
Henry M. Monti, Ali Raza Butt, Sudharshan S. Vazhkudai
2009 A comprehensive power-performance model for NoCs with multi-flit channel buffers.
Mohammad Arjomand, Hamid Sarbazi-Azad
2009 A European perspective on supercomputing.
Mateo Valero
2009 A graph based approach for MPI deadlock detection.
Tobias Hilbrich, Bronis R. de Supinski, Martin Schulz, Matthias S. Müller
2009 A parallel Levenberg-Marquardt algorithm.
Jun Cao, Krista A. Novstrup, Ayush Goyal, Samuel P. Midkiff, James M. Caruthers
2009 A translation system for enabling data mining applications on GPUs.
Wenjing Ma, Gagan Agrawal
2009 Access map pattern matching for data cache prefetch.
Yasuo Ishii, Mary Inaba, Kei Hiraki
2009 Adagio: making DVS practical for complex HPC applications.
Barry Rountree, David K. Lowenthal, Bronis R. de Supinski, Martin Schulz, Vincent W. Freeh, Tyler K. Bletsch
2009 An infrastructure for scalable and portable parallel programs for computational chemistry.
Victor Lotrich, Norbert Flocke, Mark Ponton, Beverly A. Sanders, Erik Deumens, Rodney J. Bartlett, Ajith Perera
2009 Approximate kernel matrix computation on GPUs for large scale learning applications.
Mohamed E. Hussein, Wael Abd-Almageed
2009 Auto-vectorization through code generation for stream processing applications.
Huayong Wang, Henrique Andrade, Bugra Gedik, Kun-Lung Wu
2009 Cancellation of loads that return zero using zero-value caches.
Md. Mafijul Islam, Sally A. McKee, Per Stenström
2009 Chunking parallel loops in the presence of synchronization.
Jun Shirako, Jisheng M. Zhao, V. Krishna Nandivada, Vivek Sarkar
2009 Combining thread level speculation helper threads and runahead execution.
Polychronis Xekalakis, Nikolas Ioannou, Marcelo Cintra
2009 Computer generation of fast Fourier transforms for the Cell Broadband Engine.
Srinivas Chellappa, Franz Franchetti, Markus Püschel
2009 Computing outside the box.
Ian T. Foster
2009 Creating artificial global history to improve branch prediction accuracy.
Leo Porter, Dean M. Tullsen
2009 DBDB: optimizing DMA transfer for the Cell BE architecture.
Tao Liu, Haibo Lin, Tong Chen, Kevin O'Brien, Ling Shao
2009 Design of a novel SIMD architecture by fusing operations and registers.
Jih-Ching Chiu, Kai-Ming Yang, Yu-Liang Chou
2009 Designing multi-socket systems using silicon photonics.
Scott Beamer, Krste Asanovic, Christopher Batten, Ajay Joshi, Vladimir Stojanovic
2009 Divide-and-conquer: a bubble replacement for low level caches.
Chuanjun Zhang, Bing Xue
2009 Dynamic cache clustering for chip multiprocessors.
Mohammad Hammoud, Sangyeun Cho, Rami G. Melhem
2009 Dynamic parallelization of single-threaded binary programs using speculative slicing.
Cheng Wang, Youfeng Wu, Edson Borin, Shiliang Hu, Wei Liu, Dave Sager, Tin-Fook Ngai, Jesse Fang
2009 Dynamic task set partitioning based on balancing memory requirements to reduce power consumption.
Diana Bautista, Julio Sahuquillo, Houcine Hassan, Salvador Petit, José Duato
2009 Dynamic topology aware load balancing algorithms for molecular dynamics applications.
Abhinav Bhatele, Laxmikant V. Kalé, Sameer Kumar
2009 Efficient high performance collective communication for the Cell blade.
Qasim Ali, Samuel P. Midkiff, Vijay S. Pai
2009 EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems.
Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, V. S. Anil Kumar, Madhav V. Marathe
2009 Evaluating high performance communication: a power perspective.
Jiuxing Liu, Dan E. Poff, Bülent Abali
2009 Exploring pattern-aware routing in generalized fat tree networks.
Germán Rodríguez, Ramón Beivide, Cyriel Minkenberg, Jesús Labarta, Mateo Valero
2009 FTL design exploration in reconfigurable high-performance SSD for server applications.
Ji-Yong Shin, Zenglin Xia, Ning-Yi Xu, Rui Gao, Xiongfei Cai, Seungryoul Maeng, Feng-Hsiung Hsu
2009 Fast and scalable list ranking on the GPU.
M. Suhail Rehman, Kishore Kothapalli, P. J. Narayanan
2009 Fast memory snapshot for concurrent programming without synchronization.
JaeWoong Chung, Woongki Baek, Christos Kozyrakis
2009 High-performance CUDA kernel execution on FPGAs.
Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu
2009 High-performance regular expression scanning on the Cell/B.E. processor.
Daniele Paolo Scarpazza, Gregory F. Russell
2009 How GPUs can outperform ASICs for fast LDPC decoding.
Gabriel Falcão Paiva Fernandes, Vítor Manuel Mendes da Silva, Leonel Sousa
2009 Implementation of a wide-angle lens distortion correction algorithm on the Cell Broadband Engine.
Konstantis Daloukas, Christos D. Antonopoulos, Nikolaos Bellas
2009 Less reused filter: improving L2 cache performance via filtering less reused lines.
Lingxiang Xiang, Tianzhou Chen, Qingsong Shi, Wei Hu
2009 Limited early value communication to improve performance of transactional memory.
Salil Mohan Pant, Gregory T. Byrd
2009 Load balancing using work-stealing for pipeline parallelism in emerging applications.
Angeles G. Navarro, Rafael Asenjo, Siham Tabik, Calin Cascaval
2009 MPI collective communications on the Blue Gene/P supercomputer: algorithms and optimizations.
Ahmad Faraj, Sameer Kumar, Brian E. Smith, Amith R. Mamidala, John A. Gunnels, Philip Heidelberger
2009 MPI-aware compiler optimizations for improving communication-computation overlap.
Anthony Danalis, Lori L. Pollock, D. Martin Swany, John Cavazos
2009 Maximizing MPI point-to-point communication performance on RDMA-enabled clusters with customized protocols.
Matthew Small, Xin Yuan
2009 OhHelp: a scalable domain-decomposing dynamic load balancing for particle-in-cell simulations.
Hiroshi Nakashima, Yohei Miyake, Hideyuki Usui, Yoshiharu Omura
2009 P-Code: a new RAID-6 code with optimal properties.
Chao Jin, Hong Jiang, Dan Feng, Lei Tian
2009 PARSEC: hardware profiling of emerging workloads for CMP design.
Major Bhadauria, Vincent M. Weaver, Sally A. McKee
2009 Parametric multi-level tiling of imperfectly nested loops.
Albert Hartono, Muthu Manikandan Baskaran, Cédric Bastoul, Albert Cohen, Sriram Krishnamoorthy, Boyana Norris, J. Ramanujam, P. Sadayappan
2009 Pattern-based sparse matrix representation for memory-efficient SMVM kernels.
Mehmet Belgin, Godmar Back, Calvin J. Ribbens
2009 Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs.
Jiayuan Meng, Kevin Skadron
2009 Performance modeling for DFT algorithms in FFTW.
Liang Gu, Xiaoming Li
2009 Practice of parallelizing network applications on multi-core architectures.
Junchang Wang, Haipeng Cheng, Bei Hua, Xinan Tang
2009 Prediction-based power estimation and scheduling for CMPs.
Karan Singh, Major Bhadauria, Sally A. McKee
2009 Prefetch optimizations on large-scale applications via parameter value prediction.
Shih-Wei Liao, Tzu-Han Hung, Donald Nguyen, Hucheng Zhou, Chinyen Chou, Chia-Heng Tu
2009 Proceedings of the 23rd international conference on Supercomputing, 2009, Yorktown Heights, NY, USA, June 8-12, 2009
Michael Gschwind, Alexandru Nicolau, Valentina Salapura, José E. Moreira
2009 QuakeTM: parallelizing a complex sequential application using transactional memory.
Vladimir Gajinov, Ferad Zyulkyarov, Osman S. Unsal, Adrián Cristal, Eduard Ayguadé, Tim Harris, Mateo Valero
2009 R-ADMAD: high reliability provision for large-scale de-duplication archival storage systems.
Chuanyi Liu, Yu Gu, Linchun Sun, Bin Yan, Dongsheng Wang
2009 Rate-based QoS techniques for cache/memory in CMP platforms.
Andrew Herdrich, Ramesh Illikkal, Ravi R. Iyer, Donald Newell, Vineet Chadha, Jaideep Moses
2009 Refereeing conflicts in hardware transactional memory.
Arrvindh Shriraman, Sandhya Dwarkadas
2009 Single-particle 3D reconstruction from cryo-electron microscopy images on GPU.
Guangming Tan, Ziyu Guo, Mingyu Chen, Dan Meng
2009 Subdomain communication to increase scalability in large-scale scientific applications.
Aleksandr Ovcharenko, Onkar Sahni, Christopher D. Carothers, Kenneth E. Jansen, Mark S. Shephard
2009 Synchronization optimizations for efficient execution on multi-cores.
Alexandru Nicolau, Guangqiang Li, Alexander V. Veidenbaum, Arun Kejariwal
2009 The Roadrunner project and the importance of energy efficiency on the road to exascale computing.
Don G. Grice
2009 Thrifty interconnection network for HPC systems.
Jian Li, Lixin Zhang, Charles Lefurgy, Richard R. Treumann, Wolfgang E. Denzel
2009 Towards 100 Gbit/s Ethernet: multicore-based parallel communication protocol design.
Stavros Passas, Kostas Magoutis, Angelos Bilas
2009 TransMetric: architecture independent workload characterization for transactional memory benchmarks.
James Poe, Clay Hughes, Tao Li
2009 Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems.
Sundaresan Venkatasubramanian, Richard W. Vuduc
2009 Understanding the interconnection network of SpiNNaker.
Javier Navaridas, Mikel Luján, José Miguel-Alonso, Luis A. Plana, Steve B. Furber
2009 Using many-core hardware to correlate radio astronomy signals.
Rob van Nieuwpoort, John W. Romein
2009 Virtualization polling engine (VPE): using dedicated CPU cores to accelerate I/O virtualization.
Jiuxing Liu, Bülent Abali
2009 Zero-content augmented caches.
Julien Dusser, Thomas Piquet, André Seznec