ICS A

34 papers

Year  Title / Authors
2010  A compiler-automated array compression scheme for optimizing memory intensive programs.
Lixia Liu, Zhiyuan Li
2010  A query language for understanding component interactions in production systems.
Adam J. Oliner, Alex Aiken
2010  Adaptive multi-level cache allocation in distributed storage architectures.
Ramya Prabhakar, Shekhar Srikantaiah, Mahmut T. Kandemir, Christina M. Patrick
2010  An approach to resource-aware co-scheduling for CMPs.
Major Bhadauria, Sally A. McKee
2010  An empirically tuned 2D and 3D FFT library on CUDA GPU.
Liang Gu, Xiaoming Li, Jakob Siegel
2010  An experimental approach to performance measurement of heterogeneous parallel applications using CUDA.
Allen D. Malony, Scott Biersdorff, Wyatt Spear, Shangkar Mayanglambam
2010  Cache oblivious parallelograms in iterative stencil computations.
Robert Strzodka, Mohammed Shaheen, Dawid Pajak, Hans-Peter Seidel
2010  Clustering performance data efficiently at massive scales.
Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler, Daniel A. Reed
2010  Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations.
Vignesh T. Ravi, Wenjing Ma, David Chiu, Gagan Agrawal
2010  Decomposable and responsive power models for multicore processors using performance counters.
Ramon Bertran, Marc González, Xavier Martorell, Nacho Navarro, Eduard Ayguadé
2010  Enigma: architectural and operating system support for reducing the impact of address translation.
Lixin Zhang, Evan Speight, Ramakrishnan Rajamony, Jiang Lin
2010  Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine.
Chi Ching Chi, Ben H. H. Juurlink, Cor Meenderinck
2010  Exascale science: the next frontier in high performance computing.
Stephen S. Pawlowski
2010  FPGA accelerating double/quad-double high precision floating-point applications for ExaScale computing.
Yong Dou, Yuanwu Lei, Guiming Wu, Song Guo, Jie Zhou, Li Shen
2010  Fast and accurate NCBI BLASTP: acceleration with multiphase FPGA-based prefiltering.
Atabak Mahram, Martin C. Herbordt
2010  Handling task dependencies under strided and aliased references.
Josep M. Pérez, Rosa M. Badia, Jesús Labarta
2010  High-throughput Bayesian network learning using heterogeneous multicore computers.
Michael D. Linderman, Robert V. Bruggner, Vivek Athalye, Teresa H. Meng, Narges Bani Asadi, Garry P. Nolan
2010  How to unleash array optimizations on code using recursive data structures.
Harmen L. A. van der Spek, C. W. Mattias Holm, Harry A. G. Wijshoff
2010  Indemics: an interactive data intensive framework for high performance epidemic simulation.
Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, Yifei Ma, Madhav V. Marathe
2010  Large-scale FFT on GPU clusters.
Yifeng Chen, Xiang Cui, Hong Mei
2010  Making nested parallel transactions practical using lightweight hardware support.
Woongki Baek, Nathan Grasso Bronson, Christos Kozyrakis, Kunle Olukotun
2010  Optimal bucket algorithms for large MPI collectives on torus interconnects.
Nikhil Jain, Yogish Sabharwal
2010  Overlapping communication and computation by using a hybrid MPI/SMPSs approach.
Vladimir Marjanovic, Jesús Labarta, Eduard Ayguadé, Mateo Valero
2010  Proceedings of the 24th International Conference on Supercomputing, 2010, Tsukuba, Ibaraki, Japan, June 2-4, 2010
Taisuke Boku, Hiroshi Nakashima, Avi Mendelson
2010  Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application.
Sreeram Potluri, Ping Lai, Karen A. Tomko, Sayantan Sur, Yifeng Cui, Mahidhar Tatineni, Karl W. Schulz, William L. Barth, Amitava Majumdar, Dhabaleswar K. Panda
2010  SAMS multi-layout memory: providing multiple views of data to boost SIMD performance.
Chunyang Gou, Georgi Kuzmanov, Georgi Gaydadjiev
2010  Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization.
Jamin Naghmouchi, Daniele Paolo Scarpazza, Mladen Berekovic
2010  Speeding up Nek5000 with autotuning and specialization.
Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul F. Fischer, Paul D. Hovland
2010  Static reuse distances for locality-based optimizations in MATLAB.
Arun Chauhan, Chun-Yu Shei
2010  Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping.
Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen
2010  The auction: optimizing banks usage in Non-Uniform Cache Architectures.
Javier Lira, Carlos Molina, Antonio González
2010  The next-generation supercomputer project and a plan for the advanced institute for computational science.
Kimihiko Hirao
2010  Throughput computing.
William J. Dally
2010  Timing local streams: improving timeliness in data prefetching.
Huaiyu Zhu, Yong Chen, Xian-He Sun