ICS A

34 papers

Year	Title / Authors
2010	A compiler-automated array compression scheme for optimizing memory intensive programs. Lixia Liu, Zhiyuan Li
2010	A query language for understanding component interactions in production systems. Adam J. Oliner, Alex Aiken
2010	Adaptive multi-level cache allocation in distributed storage architectures. Ramya Prabhakar, Shekhar Srikantaiah, Mahmut T. Kandemir, Christina M. Patrick
2010	An approach to resource-aware co-scheduling for CMPs. Major Bhadauria, Sally A. McKee
2010	An empirically tuned 2D and 3D FFT library on CUDA GPU. Liang Gu, Xiaoming Li, Jakob Siegel
2010	An experimental approach to performance measurement of heterogeneous parallel applications using CUDA. Allen D. Malony, Scott Biersdorff, Wyatt Spear, Shangkar Mayanglambam
2010	Cache oblivious parallelograms in iterative stencil computations. Robert Strzodka, Mohammed Shaheen, Dawid Pajak, Hans-Peter Seidel
2010	Clustering performance data efficiently at massive scales. Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler, Daniel A. Reed
2010	Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. Vignesh T. Ravi, Wenjing Ma, David Chiu, Gagan Agrawal
2010	Decomposable and responsive power models for multicore processors using performance counters. Ramon Bertran, Marc González, Xavier Martorell, Nacho Navarro, Eduard Ayguadé
2010	Enigma: architectural and operating system support for reducing the impact of address translation. Lixin Zhang, Evan Speight, Ramakrishnan Rajamony, Jiang Lin
2010	Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine. Chi Ching Chi, Ben H. H. Juurlink, Cor Meenderinck
2010	Exascale science: the next frontier in high performance computing. Stephen S. Pawlowski
2010	FPGA accelerating double/quad-double high precision floating-point applications for ExaScale computing. Yong Dou, Yuanwu Lei, Guiming Wu, Song Guo, Jie Zhou, Li Shen
2010	Fast and accurate NCBI BLASTP: acceleration with multiphase FPGA-based prefiltering. Atabak Mahram, Martin C. Herbordt
2010	Handling task dependencies under strided and aliased references. Josep M. Pérez, Rosa M. Badia, Jesús Labarta
2010	High-throughput Bayesian network learning using heterogeneous multicore computers. Michael D. Linderman, Robert V. Bruggner, Vivek Athalye, Teresa H. Meng, Narges Bani Asadi, Garry P. Nolan
2010	How to unleash array optimizations on code using recursive data structures. Harmen L. A. van der Spek, C. W. Mattias Holm, Harry A. G. Wijshoff
2010	Indemics: an interactive data intensive framework for high performance epidemic simulation. Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, Yifei Ma, Madhav V. Marathe
2010	Large-scale FFT on GPU clusters. Yifeng Chen, Xiang Cui, Hong Mei
2010	Making nested parallel transactions practical using lightweight hardware support. Woongki Baek, Nathan Grasso Bronson, Christos Kozyrakis, Kunle Olukotun
2010	Optimal bucket algorithms for large MPI collectives on torus interconnects. Nikhil Jain, Yogish Sabharwal
2010	Overlapping communication and computation by using a hybrid MPI/SMPSs approach. Vladimir Marjanovic, Jesús Labarta, Eduard Ayguadé, Mateo Valero
2010	Proceedings of the 24th International Conference on Supercomputing, 2010, Tsukuba, Ibaraki, Japan, June 2-4, 2010. Taisuke Boku, Hiroshi Nakashima, Avi Mendelson
2010	Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application. Sreeram Potluri, Ping Lai, Karen A. Tomko, Sayantan Sur, Yifeng Cui, Mahidhar Tatineni, Karl W. Schulz, William L. Barth, Amitava Majumdar, Dhabaleswar K. Panda
2010	SAMS multi-layout memory: providing multiple views of data to boost SIMD performance. Chunyang Gou, Georgi Kuzmanov, Georgi Gaydadjiev
2010	Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization. Jamin Naghmouchi, Daniele Paolo Scarpazza, Mladen Berekovic
2010	Speeding up Nek5000 with autotuning and specialization. Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul F. Fischer, Paul D. Hovland
2010	Static reuse distances for locality-based optimizations in MATLAB. Arun Chauhan, Chun-Yu Shei
2010	Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping. Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen
2010	The auction: optimizing banks usage in Non-Uniform Cache Architectures. Javier Lira, Carlos Molina, Antonio González
2010	The next-generation supercomputer project and a plan for the advanced institute for computational science. Kimihiko Hirao
2010	Throughput computing. William J. Dally
2010	Timing local streams: improving timeliness in data prefetching. Huaiyu Zhu, Yong Chen, Xian-He Sun