| 2023 | A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training. Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele |
| 2023 | Accelerating BWA-MEM Read Mapping on GPUs. Minh Pham, Yicheng Tu, Xiaoyi Lv |
| 2023 | Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs. Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen |
| 2023 | BiRFIA: Selective Binary Rewriting for Function Interception on ARM. Kelun Lei, Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian |
| 2023 | BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs. Jou-An Chen, Hsin-Hsuan Sung, Xipeng Shen, Sutanay Choudhury, Ang Li |
| 2023 | CMLCompiler: A Unified Compiler for Classical Machine Learning. Xu Wen, Wanling Gao, Anzheng Li, Lei Wang, Zihan Jiang, Jianfeng Zhan |
| 2023 | DStore: A Lightweight Scalable Learning Model Repository with Fine-Grain Tensor-Level Access. Meghana Madhyastha, Robert Underwood, Randal C. Burns, Bogdan Nicolae |
| 2023 | Distributed-Memory Parallel JointNMF. Srinivas Eswar, Benjamin Cobb, Koby Hayashi, Ramakrishnan Kannan, Grey Ballard, Richard W. Vuduc, Haesun Park |
| 2023 | DyVer: Dynamic Version Handling for Array Databases. Amelie Chi Zhou, Zhoubin Ke, Jianming Lao |
| 2023 | Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication. Nicholas Contini, Bharath Ramesh, Kaushik Kandadi Suresh, Tu Tran, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda |
| 2023 | FAZ: A flexible auto-tuned modular error-bounded compression framework for scientific data. Jinyang Liu, Sheng Di, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello |
| 2023 | FLASH: FPGA-Accelerated Smart Switches with GCN Case Study. Pouya Haghi, William Krska, Cheng Tan, Tong Geng, Po Hao Chen, Connor Greenwood, Anqi Guo, Thomas M. Hines, Chunshu Wu, Ang Li, Anthony Skjellum, Martin C. Herbordt |
| 2023 | FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance. Jun Xiao, Yaocheng Xiang, Xiaolin Wang, Yingwei Luo, Andy D. Pimentel, Zhenlin Wang |
| 2023 | FMI: Fast and Cheap Message Passing for Serverless Functions. Marcin Copik, Roman Böhringer, Alexandru Calotoiu, Torsten Hoefler |
| 2023 | FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing. Xinbiao Gan, Guang Wu, Ruigeng Zeng, Jiaqi Si, Ji Liu, Daxiang Dong, Chunye Gong, Cong Liu, Tiejun Li |
| 2023 | Fast All-Pairs Shortest Paths Algorithm in Large Sparse Graph. Shaofeng Yang, Xiandong Liu, Yunting Wang, Xin He, Guangming Tan |
| 2023 | GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs. Boyuan Zhang, Jiannan Tian, Sheng Di, Xiaodong Yu, Martin Swany, Dingwen Tao, Franck Cappello |
| 2023 | GRAP: Group-level Resource Allocation Policy for Reconfigurable Dragonfly Network in HPC. Guangnan Feng, Dezun Dong, Shizhen Zhao, Yutong Lu |
| 2023 | HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao |
| 2023 | Lightweight Huffman Coding for Efficient GPU Compression. Milan Shah, Xiaodong Yu, Sheng Di, Michela Becchi, Franck Cappello |
| 2023 | Multi-GPU Communication Schemes for Iterative Solvers: When CPUs are Not in Charge. Ismayil Ismayilov, Javid Baydamirli, Dogan Sagbili, Mohamed Wahib, Didem Unat |
| 2023 | OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs. Tun Chen, Haipeng Jia, Yunquan Zhang, Kun Li, Zhihao Li, Xiang Zhao, Jianyu Yao, Chendi Li |
| 2023 | Optimizing Multi-grid Computation and Parallelization on Multi-cores. Xiaojian Yang, Shengguo Li, Fan Yuan, Dezun Dong, Chun Huang, Zheng Wang |
| 2023 | PAC: Preference-Aware Co-location Scheduling on Heterogeneous NUMA Architectures To Improve Resource Utilization. Pu Pang, Yaoxuan Li, Bo Liu, Quan Chen, Zhou Yu, Zhibin Yu, Deze Zeng, Jingwen Leng, Jieru Zhao, Minyi Guo |
| 2023 | PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka |
| 2023 | Parallel Software for Million-scale Exact Kernel Regression. Yu Chen, Lucca Skon, James R. McCombs, Zhenming Liu, Andreas Stathopoulos |
| 2023 | Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization. Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler |
| 2023 | Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023. Kyle A. Gallivan, Efstratios Gallopoulos, Dimitrios S. Nikolopoulos, Ramón Beivide |
| 2023 | RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search. Vani Nagarajan, Durga Mandarapu, Milind Kulkarni |
| 2023 | Revisiting Temporal Blocking Stencil Optimizations. Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka |
| 2023 | Roar: A Router Microarchitecture for In-network Allreduce. Ruiqi Wang, Dezun Dong, Fei Lei, Junchao Ma, Ke Wu, Kai Lu |
| 2023 | SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation. Gagandeep Singh, Alireza Khodamoradi, Kristof Denolf, Jack Lo, Juan Gómez-Luna, Joseph Melber, Andra Bisca, Henk Corporaal, Onur Mutlu |
| 2023 | Scalable algorithms for compact spanners on real world graphs. Maulein Pathak, Yogish Sabharwal, Neelima Gupta |
| 2023 | Scalable parallelization for the solution of phonon Boltzmann Transport Equation. Han D. Tran, Siddharth Saurav, P. Sadayappan, Sandip Mazumder, Hari Sundar |
| 2023 | Seizing the Bandwidth Scaling of On-Package Interconnect in a Post-Moore's Law World. Grigory Chirkov, David Wentzlaff |
| 2023 | Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training. Anqi Guo, Yuchen Hao, Chunshu Wu, Pouya Haghi, Zhenyu Pan, Min Si, Dingwen Tao, Ang Li, Martin C. Herbordt, Tong Geng |
| 2023 | Towards a Unified Implementation of GEMM in BLIS. RuQing G. Xu, Field G. Van Zee, Robert A. van de Geijn |
| 2023 | Transfer-learning-based Autotuning using Gaussian Copula. Thomas Randall, Jaehoon Koo, Brice Videau, Michael Kruse, Xingfu Wu, Paul D. Hovland, Mary W. Hall, Rong Ge, Prasanna Balaprakash |
| 2023 | Use Only What You Need: Judicious Parallelism For File Transfers in High Performance Networks. Md. Arifuzzaman, Engin Arslan |
| 2023 | Using Additive Modifications in LU Factorization Instead of Pivoting. Neil Lindquist, Piotr Luszczek, Jack J. Dongarra |
| 2023 | Wafer-Scale Fast Fourier Transforms. Marcelo Orenes-Vera, Ilya Sharapov, Robert S. Schreiber, Mathias Jacquelin, Philippe Vandermersch, Sharan Chetlur |