ICS A

46 papers

Year  Title / Authors
2019  A communication-avoiding 3D sparse triangular solver.
      Piyush Sao, Ramakrishnan Kannan, Xiaoye Sherry Li, Richard W. Vuduc
2019  A scalable framework for adaptive computational general relativity on heterogeneous clusters.
      Milinda Fernando, David Neilsen, Eric W. Hirschmann, Hari Sundar
2019  AMPT-GA: automatic mixed precision floating point tuning for GPU applications.
      Pradeep V. Kotipalli, Ranvijay Singh, Paul Wood, Ignacio Laguna, Saurabh Bagchi
2019  Accelerating reduction and scan using tensor core units.
      Abdul Dakkak, Cheng Li, Jinjun Xiong, Isaac Gelado, Wen-mei W. Hwu
2019  Address-stride assisted approximate load value prediction in GPUs.
      Haonan Wang, Mohamed Assem Ibrahim, Sparsh Mittal, Adwait Jog
2019  An online quality management framework for approximate communication in network-on-chips.
      Yuechen Chen, Ahmed Louri
2019  Automatic construct selection and variable classification in OpenMP.
      Mohammad Norouzi Arab, Felix Wolf, Ali Jannesari
2019  Avalon: towards QoS awareness and improved utilization through multi-resource management in datacenters.
      Quan Chen, Zhenning Wang, Jingwen Leng, Chao Li, Wenli Zheng, Minyi Guo
2019  BonVoision: leveraging spatial data smoothness for recovery from memory soft errors.
      Bo Fang, Hassan Halawa, Karthik Pattabiraman, Matei Ripeanu, Sriram Krishnamoorthy
2019  Can we trust profiling results?: understanding and fixing the inaccuracy in modern profilers.
      Hao Xu, Qingsen Wang, Shuang Song, Lizy Kurian John, Xu Liu
2019  Deep reuse: streamline CNN inference on the fly via coarse-grained computation reuse.
      Lin Ning, Xipeng Shen
2019  DeepHiR: improving high-radix router throughput with deep hybrid memory buffer microarchitecture.
      Cunlu Li, Dezun Dong, Xiangke Liao, John Kim, Changhyun Kim
2019  Diligent TLBs: a mechanism for exploiting heterogeneity in TLB miss behavior.
      Hussein Elnawawy, Rangeen Basu Roy Chowdhury, Amro Awad, Gregory T. Byrd
2019  Dynamically linked MSHRs for adaptive miss handling in GPUs.
      Yongbin Gu, Lizhong Chen
2019  Efficient GPU tree walks for effective distributed n-body simulations.
      Jianqiao Liu, Michael P. Robson, Thomas Quinn, Milind Kulkarni
2019  Efficient and effective sparse tensor reordering.
      Jiajia Li, Bora Uçar, Ümit V. Çatalyürek, Jimeng Sun, Kevin J. Barker, Richard W. Vuduc
2019  Efficient hierarchical online-autotuning: a case study on polyhedral accelerator mapping.
      Philip Pfaffe, Tobias Grosser, Martin Peter Tillmann
2019  Efficient thread/page/parallelism autotuning for NUMA systems.
      Mihail Popov, Alexandra Jimborean, David Black-Schaffer
2019  Full-stack optimization for accelerating CNNs using powers-of-two weights with FPGA validation.
      Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong
2019  GPU road network graph contraction and SSSP query.
      Roozbeh Karimi, David M. Koppelman, Chris J. Michael
2019  GPU snapshot: checkpoint offloading for GPU-dense systems.
      Kyushick Lee, Michael B. Sullivan, Siva Kumar Sastry Hari, Timothy Tsai, Stephen W. Keckler, Mattan Erez
2019  GPUGuard: mitigating contention based side and covert channel attacks on GPUs.
      Qiumin Xu, Hoda Naghibijouybari, Shibo Wang, Nael B. Abu-Ghazaleh, Murali Annavaram
2019  GreenMM: energy efficient GPU matrix multiplication through undervolting.
      Hadi Zamani, Yuanlai Liu, Devashree Tripathy, Laxmi N. Bhuyan, Zizhong Chen
2019  HYPHA: a framework based on separation of parallelisms to accelerate persistent homology matrix reduction.
      Simon Zhang, Mengbai Xiao, Chengxin Guo, Liang Geng, Hao Wang, Xiaodong Zhang
2019  Henosis: workload-driven small array consolidation and placement for HDF5 applications on heterogeneous data stores.
      Donghe Kang, Vedang Patel, Ashwati Nair, Spyros Blanas, Yang Wang, Srinivasan Parthasarathy
2019  Hybrid CPU/GPU clustering in shared memory on the billion point scale.
      Michael Gowanlock
2019  IA-SpGEMM: an input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication.
      Zhen Xie, Guangming Tan, Weifeng Liu, Ninghui Sun
2019  Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters.
      Wei Zhang, Weihao Cui, Kaihua Fu, Quan Chen, Daniel Edward Mawhirter, Bo Wu, Chao Li, Minyi Guo
2019  Least squares solvers for distributed-memory machines with GPU accelerators.
      Jakub Kurzak, Mark Gates, Ali Charara, Asim YarKhan, Jack J. Dongarra
2019  Multi-criteria partitioning of multi-block structured grids.
      Hengjie Wang, Aparna Chandramowlishwaran
2019  O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning.
      Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Wei Wu, Ang Li, Martin C. Herbordt
2019  On optimizing distributed non-negative Tucker decomposition.
      Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal
2019  Optimizing computation-communication overlap in asynchronous task-based programs.
      Emilio Castillo, Nikhil Jain, Marc Casas, Miquel Moretó, Martin Schulz, Ramón Beivide, Mateo Valero, Abhinav Bhatele
2019  Optimizing the linear fascicle evaluation algorithm for many-core systems.
      Karan Aggarwal, Uday Bondhugula
2019  Parallelizing cryo-EM 3D reconstruction on GPU cluster with a partitioned and streamed model.
      Kunpeng Wang, Shizhen Xu, Haohuan Fu, Hongkun Yu, Wenlai Zhao, Guangwen Yang
2019  Performance optimization of reactive molecular dynamics simulations with dynamic charge distribution models on distributed memory platforms.
      Kurt A. O'Hearn, Abdullah Alperen, Hasan Metin Aktulga
2019  Power efficient job scheduling by predicting the impact of processor manufacturing variability.
      Dimitrios Chasapis, Miquel Moretó, Martin Schulz, Barry Rountree, Mateo Valero, Marc Casas
2019  Proceedings of the ACM International Conference on Supercomputing, ICS 2019, Phoenix, AZ, USA, June 26-28, 2019
      Rudolf Eigenmann, Chen Ding, Sally A. McKee
2019  QoSMT: supporting precise performance control for simultaneous multithreading architecture.
      Xin Jin, Yaoyang Zhou, Bowen Huang, Zihao Yu, Xusheng Zhan, Huizhe Wang, Sa Wang, Ningmei Yu, Ninghui Sun, Yungang Bao
2019  RFAcc: a 3D ReRAM associative array based random forest accelerator.
      Lei Zhao, Quan Deng, Youtao Zhang, Jun Yang
2019  SDC: a software defined cache for efficient data indexing.
      Fan Ni, Song Jiang, Hong Jiang, Jian Huang, Xingbo Wu
2019  Software combining to mitigate multithreaded MPI contention.
      Abdelhalim Amer, Charles Archer, Michael Blocksome, Chongxiao Cao, Michael Chuvelev, Hajime Fujita, Maria Garzaran, Yanfei Guo, Jeff R. Hammond, Shintaro Iwasaki, Kenneth J. Raffenetti, Mikhail Shiryaev, Min Si, Kenjiro Taura, Sagar Thapaliya, Pavan Balaji
2019  TSM2: optimizing tall-and-skinny matrix-matrix multiplication on GPUs.
      Jieyang Chen, Nan Xiong, Xin Liang, Dingwen Tao, Sihuan Li, Kaiming Ouyang, Kai Zhao, Nathan DeBardeleben, Qiang Guan, Zizhong Chen
2019  The anatomy of efficient FFT and winograd convolutions on modern CPUs.
      Aleksandar Zlateski, Zhen Jia, Kai Li, Frédo Durand
2019  Using performance models to understand scalable Krylov solver performance at scale for structured grid problems.
      Paul R. Eller, Torsten Hoefler, William Gropp
2019  WCCV: improving the vectorization of IF-statements with warp-coherent conditions.
      Huihui Sun, Florian Fey, Jie Zhao, Sergei Gorlatch