ICS A

41 papers

Year | Title / Authors
2023 | A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training.
Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele
2023 | Accelerating BWA-MEM Read Mapping on GPUs.
Minh Pham, Yicheng Tu, Xiaoyi Lv
2023 | Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs.
Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen
2023 | BiRFIA: Selective Binary Rewriting for Function Interception on ARM.
Kelun Lei, Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian
2023 | BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs.
Jou-An Chen, Hsin-Hsuan Sung, Xipeng Shen, Sutanay Choudhury, Ang Li
2023 | CMLCompiler: A Unified Compiler for Classical Machine Learning.
Xu Wen, Wanling Gao, Anzheng Li, Lei Wang, Zihan Jiang, Jianfeng Zhan
2023 | DStore: A Lightweight Scalable Learning Model Repository with Fine-Grain Tensor-Level Access.
Meghana Madhyastha, Robert Underwood, Randal C. Burns, Bogdan Nicolae
2023 | Distributed-Memory Parallel JointNMF.
Srinivas Eswar, Benjamin Cobb, Koby Hayashi, Ramakrishnan Kannan, Grey Ballard, Richard W. Vuduc, Haesun Park
2023 | DyVer: Dynamic Version Handling for Array Databases.
Amelie Chi Zhou, Zhoubin Ke, Jianming Lao
2023 | Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication.
Nicholas Contini, Bharath Ramesh, Kaushik Kandadi Suresh, Tu Tran, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda
2023 | FAZ: A flexible auto-tuned modular error-bounded compression framework for scientific data.
Jinyang Liu, Sheng Di, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello
2023 | FLASH: FPGA-Accelerated Smart Switches with GCN Case Study.
Pouya Haghi, William Krska, Cheng Tan, Tong Geng, Po Hao Chen, Connor Greenwood, Anqi Guo, Thomas M. Hines, Chunshu Wu, Ang Li, Anthony Skjellum, Martin C. Herbordt
2023 | FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance.
Jun Xiao, Yaocheng Xiang, Xiaolin Wang, Yingwei Luo, Andy D. Pimentel, Zhenlin Wang
2023 | FMI: Fast and Cheap Message Passing for Serverless Functions.
Marcin Copik, Roman Böhringer, Alexandru Calotoiu, Torsten Hoefler
2023 | FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing.
Xinbiao Gan, Guang Wu, Ruigeng Zeng, Jiaqi Si, Ji Liu, Daxiang Dong, Chunye Gong, Cong Liu, Tiejun Li
2023 | Fast All-Pairs Shortest Paths Algorithm in Large Sparse Graph.
Shaofeng Yang, Xiandong Liu, Yunting Wang, Xin He, Guangming Tan
2023 | GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs.
Boyuan Zhang, Jiannan Tian, Sheng Di, Xiaodong Yu, Martin Swany, Dingwen Tao, Franck Cappello
2023 | GRAP: Group-level Resource Allocation Policy for Reconfigurable Dragonfly Network in HPC.
Guangnan Feng, Dezun Dong, Shizhen Zhao, Yutong Lu
2023 | HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs.
Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao
2023 | Lightweight Huffman Coding for Efficient GPU Compression.
Milan Shah, Xiaodong Yu, Sheng Di, Michela Becchi, Franck Cappello
2023 | Multi-GPU Communication Schemes for Iterative Solvers: When CPUs are Not in Charge.
Ismayil Ismayilov, Javid Baydamirli, Dogan Sagbili, Mohamed Wahib, Didem Unat
2023 | OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs.
Tun Chen, Haipeng Jia, Yunquan Zhang, Kun Li, Zhihao Li, Xiang Zhao, Jianyu Yao, Chendi Li
2023 | Optimizing Multi-grid Computation and Parallelization on Multi-cores.
Xiaojian Yang, Shengguo Li, Fan Yuan, Dezun Dong, Chun Huang, Zheng Wang
2023 | PAC: Preference-Aware Co-location Scheduling on Heterogeneous NUMA Architectures To Improve Resource Utilization.
Pu Pang, Yaoxuan Li, Bo Liu, Quan Chen, Zhou Yu, Zhibin Yu, Deze Zeng, Jingwen Leng, Jieru Zhao, Minyi Guo
2023 | PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications.
Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka
2023 | Parallel Software for Million-scale Exact Kernel Regression.
Yu Chen, Lucca Skon, James R. McCombs, Zhenming Liu, Andreas Stathopoulos
2023 | Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization.
Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler
2023 | Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023
Kyle A. Gallivan, Efstratios Gallopoulos, Dimitrios S. Nikolopoulos, Ramón Beivide
2023 | RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search.
Vani Nagarajan, Durga Mandarapu, Milind Kulkarni
2023 | Revisiting Temporal Blocking Stencil Optimizations.
Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka
2023 | Roar: A Router Microarchitecture for In-network Allreduce.
Ruiqi Wang, Dezun Dong, Fei Lei, Junchao Ma, Ke Wu, Kai Lu
2023 | SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation.
Gagandeep Singh, Alireza Khodamoradi, Kristof Denolf, Jack Lo, Juan Gómez-Luna, Joseph Melber, Andra Bisca, Henk Corporaal, Onur Mutlu
2023 | Scalable algorithms for compact spanners on real world graphs.
Maulein Pathak, Yogish Sabharwal, Neelima Gupta
2023 | Scalable parallelization for the solution of phonon Boltzmann Transport Equation.
Han D. Tran, Siddharth Saurav, P. Sadayappan, Sandip Mazumder, Hari Sundar
2023 | Seizing the Bandwidth Scaling of On-Package Interconnect in a Post-Moore's Law World.
Grigory Chirkov, David Wentzlaff
2023 | Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training.
Anqi Guo, Yuchen Hao, Chunshu Wu, Pouya Haghi, Zhenyu Pan, Min Si, Dingwen Tao, Ang Li, Martin C. Herbordt, Tong Geng
2023 | Towards a Unified Implementation of GEMM in BLIS.
RuQing G. Xu, Field G. Van Zee, Robert A. van de Geijn
2023 | Transfer-learning-based Autotuning using Gaussian Copula.
Thomas Randall, Jaehoon Koo, Brice Videau, Michael Kruse, Xingfu Wu, Paul D. Hovland, Mary W. Hall, Rong Ge, Prasanna Balaprakash
2023 | Use Only What You Need: Judicious Parallelism For File Transfers in High Performance Networks.
Md. Arifuzzaman, Engin Arslan
2023 | Using Additive Modifications in LU Factorization Instead of Pivoting.
Neil Lindquist, Piotr Luszczek, Jack J. Dongarra
2023 | Wafer-Scale Fast Fourier Transforms.
Marcelo Orenes-Vera, Ilya Sharapov, Robert S. Schreiber, Mathias Jacquelin, Philippe Vandermersch, Sharan Chetlur