ICS A

40 papers

Year Title / Authors
2022 A data-centric optimization framework for machine learning.
Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler
2022 ASAP: automatic synthesis of area-efficient and precision-aware CGRAs.
Cheng Tan, Thierry Tambe, Jeff Jun Zhang, Bo Fang, Tong Geng, Gu-Yeon Wei, David Brooks, Antonino Tumeo, Ganesh Gopalakrishnan, Ang Li
2022 AnySeq/GPU: a novel approach for faster sequence alignment on GPUs.
André Müller, Bertil Schmidt, Richard Membarth, Roland Leißa, Sebastian Hack
2022 Beyond time complexity: data movement complexity analysis for matrix multiplication.
Wesley Smith, Aidan Goldfarb, Chen Ding
2022 Bring orders into uncertainty: enabling efficient uncertain graph processing via novel path sampling on multi-accelerator systems.
Heng Zhang, Lingda Li, Hang Liu, Donglin Zhuang, Rui Liu, Chengying Huan, Shuang Song, Dingwen Tao, Yongchao Liu, Charles He, Yanjun Wu, Shuaiwen Leon Song
2022 CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression.
Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li, Dingwen Tao
2022 Calipers: a criticality-aware framework for modeling processor performance.
Hossein Golestani, Rathijit Sen, Vinson Young, Gagan Gupta
2022 Clairvoyant: a log-based transformer-decoder for failure prediction in large-scale systems.
Khalid Ayedh Alharthi, Arshad Jhumka, Sheng Di, Franck Cappello
2022 Cloak: tolerating non-volatile cache read latency.
Apostolos Kokolis, Namrata Mantri, Shrikanth Ganapathy, Josep Torrellas, John Kalamatianos
2022 Dense dynamic blocks: optimizing SpMM for processors with vector and matrix units using machine learning techniques.
Serif Yesil, José E. Moreira, Josep Torrellas
2022 Dynamic memory management in massively parallel systems: a case on GPUs.
Minh Pham, Hao Li, Yongke Yuan, Chengcheng Mou, Kandethody Ramachandran, Zichen Xu, Yicheng Tu
2022 Efficient exact K-nearest neighbor graph construction for billion-scale datasets using GPUs with tensor cores.
Zhuoran Ji, Cho-Li Wang
2022 Efficient, out-of-memory sparse MTTKRP on massively parallel architectures.
Andy Nguyen, Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Jesmin Jahan Tithi, Yongseok Soh, Teresa M. Ranadive, Fabrizio Petrini, Jee W. Choi
2022 Efficiently emulating high-bitwidth computation with low-bitwidth hardware.
Zixuan Ma, Haojie Wang, Guanyu Feng, Chen Zhang, Lei Xie, Jiaao He, Shengqi Chen, Jidong Zhai
2022 Fast-track cache: a huge racetrack memory L1 data cache.
Hugo Tárrega, Alejandro Valero, Vicente Lorente, Salvador Petit, Julio Sahuquillo
2022 GAPS: GPU-acceleration of PDE solvers for wave simulation.
Bagus Hanindhito, Dimitrios Gourounas, Arash Fathi, Dimitar Trenev, Andreas Gerstlauer, Lizy K. John
2022 Handling heavy-tailed input of transformer inference on GPUs.
Jiangsu Du, Jiazhi Jiang, Yang You, Dan Huang, Yutong Lu
2022 High throughput multidimensional tridiagonal system solvers on FPGAs.
Kamalakkannan Kamalavasan, Gihan R. Mudalige, István Z. Reguly, Suhaib A. Fahmy
2022 ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28 - 30, 2022
Lawrence Rauchwerger, Kirk W. Cameron, Dimitrios S. Nikolopoulos, Dionisios N. Pnevmatikatos
2022 KrakenOnMem: a memristor-augmented HW/SW framework for taxonomic profiling.
Taha Shahroodi, Mahdi Zahedi, Abhairaj Singh, Stephan Wong, Said Hamdioui
2022 LITE: a low-cost practical inter-operable GPU TEE.
Ardhi Wiratama Baskara Yudha, Jake Meyer, Shougang Yuan, Huiyang Zhou, Yan Solihin
2022 Lifting C semantics for dataflow optimization.
Alexandru Calotoiu, Tal Ben-Nun, Grzegorz Kwasniewski, Johannes de Fine Licht, Timo Schneider, Philipp Schaad, Torsten Hoefler
2022 Low overhead and context sensitive profiling of GPU-accelerated applications.
Keren Zhou, Jonathon M. Anderson, Xiaozhu Meng, John M. Mellor-Crummey
2022 MASTIFF: structure-aware minimum spanning tree/forest.
Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck
2022 MegTaiChi: dynamic tensor-based memory management optimization for DNN training.
Zhongzhe Hu, Junmin Xiao, Zheye Deng, Mingyi Li, Kewei Zhang, Xiaoyang Zhang, Ke Meng, Ninghui Sun, Guangming Tan
2022 Optimized MPI collective algorithms for dragonfly topology.
Guangnan Feng, Dezun Dong, Yutong Lu
2022 PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences.
Shulai Zhang, Weihao Cui, Quan Chen, Zhengnian Zhang, Yue Guan, Jingwen Leng, Chao Li, Minyi Guo
2022 Parallel K-clique counting on GPUs.
Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu
2022 Performance-detective: automatic deduction of cheap and accurate performance models.
Larissa Schmid, Marcin Copik, Alexandru Calotoiu, Dominik Werle, Andreas Reiter, Michael Selzer, Anne Koziolek, Torsten Hoefler
2022 Preparing for performance analysis at exascale.
Jonathon M. Anderson, Yumeng Liu, John M. Mellor-Crummey
2022 Rethinking graph data placement for graph neural network training on multiple GPUs.
Shihui Song, Peng Jiang
2022 Seamless optimization of the GEMM kernel for task-based programming models.
Arthur Francisco Lorenzon, Sandro Matheus V. N. Marques, Antoni C. Navarro, Vicenç Beltran
2022 SnuHPL: high performance LINPACK for heterogeneous GPUs.
Jinpyo Kim, Hyungdal Kwon, Jintaek Kang, Jihwan Park, Seungwook Lee, Jaejin Lee
2022 SnuQS: scaling quantum circuit simulation using storage devices.
Daeyoung Park, Heehoon Kim, Jinpyo Kim, Taehyun Kim, Jaejin Lee
2022 Software-defined floating-point number formats and their application to graph processing.
Hans Vandierendonck
2022 SparseLNR: accelerating sparse tensor computations using loop nest restructuring.
Adhitha Dias, Kirshanthan Sundararajah, Charitha Saumya, Milind Kulkarni
2022 Toward accelerated stencil computation by adapting tensor core unit on GPU.
Xiaoyan Liu, Yi Liu, Hailong Yang, Jianjin Liao, Mingzhen Li, Zhongzhi Luan, Depei Qian
2022 Towards low-latency I/O services for mixed workloads using ultra-low latency SSDs.
Mingzhe Liu, Haikun Liu, Chencheng Ye, Xiaofei Liao, Hai Jin, Yu Zhang, Ran Zheng, Liting Hu
2022 VICO: demand-driven verification for improving compiler optimizations.
Sharjeel Khan, Bodhisatwa Chatterjee, Santosh Pande
2022 uiCA: accurate throughput prediction of basic blocks on recent Intel microarchitectures.
Andreas Abel, Jan Reineke