| 2024 | A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations. Mingyi Li, Junmin Xiao, Kewei Zhang, Zhiheng Lin, Chaoyang Shui, Ke Meng, Zehua Wang, Yunfei Pang, Guangming Tan |
| 2024 | Accelerated Auto-Tuning of GPU Kernels for Tensor Computations. Chendi Li, Yufan Xu, Sina Mahdipour Saravani, Ponnuswamy Sadayappan |
| 2024 | Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs. Andreas Plesner, Hans Henrik Brandenborg Sørensen, Søren Hauberg |
| 2024 | An Autonomous Parallelization of Transformer Model Inference on Heterogeneous Edge Devices. Juhyeon Lee, Insung Bahk, Hoseung Kim, Sinjin Jeong, Suyeon Lee, Donghyun Min |
| 2024 | An Efficient and Scalable Approach to Build Co-occurrence Matrix for DNN's Embedding Layer. Quentin R. Petit, Chong Li, Nahid Emad |
| 2024 | Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing. Durga Keerthi Mandarapu, Vani Nagarajan, Artem Pelenitsyn, Milind Kulkarni |
| 2024 | AutoSched: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads. Wei Gao, Xu Zhang, Shan Huang, Shangwei Guo, Peng Sun, Yonggang Wen, Tianwei Zhang |
| 2024 | CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers. Sungmin Yun, Hwayong Nam, Kwanhee Kyung, Jaehyun Park, Byeongho Kim, Yongsuk Kwon, Eojin Lee, Jung Ho Ahn |
| 2024 | CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes. Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Yu Li, Christopher Zimmer, Tekin Bicer, Bin Ren, William Gropp, Wen-Mei Hwu, Alex Aiken |
| 2024 | DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs. Yelai Feng, Huaixi Wang, Yining Zhu, Xiandong Liu, Hongyi Lu, Qing Liu |
| 2024 | DeepHYDRA: A Hybrid Deep Learning and DBSCAN-Based Approach to Time-Series Anomaly Detection in Dynamically-Configured Systems. Franz Kevin Stehle, Wainer Vandelli, Felix Zahn, Giuseppe Avolio, Holger Fröning |
| 2024 | Differentiating Set Intersections in Maximal Clique Enumeration by Function and Subproblem Size. Hans Vandierendonck |
| 2024 | Distributed Ranges: A Model for Distributed Data Structures, Algorithms, and Views. Benjamin Brock, Robert Cohn, Suyash Bakshi, Tuomas Karna, Jeongnim Kim, Mateusz Nowak, Lukasz Slusarczyk, Kacper Stefanski, Timothy G. Mattson |
| 2024 | Enhanced UGAL Routing Schemes for Dragonfly Networks. Ram Sharan Chaulagain, Xin Yuan |
| 2024 | Exploiting Vector Code Semantics for Efficient Data Cache Prefetching. Francesc Martínez Palau, Martí Torrents, Adrià Armejach, Marc Casas |
| 2024 | FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks. Keren Zhou, Karthik Ganapathi Subramanian, Po-Hsun Lin, Matthias Fey, Binqian Yin, Jiajia Li |
| 2024 | Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment. Hanxian Huang, Xin Chen, Jishen Zhao |
| 2024 | FuseIM: Fusing Probabilistic Traversals for Influence Maximization on Exascale Systems. Reece Neff, Mostafa Eghbali Zarch, Marco Minutoli, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman, Michela Becchi |
| 2024 | HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory. Qi Shao, Angelos Arelakis, Per Stenström |
| 2024 | Input Range Generation for Compiler-Induced Numerical Inconsistencies. Dolores Miao, Ignacio Laguna, Cindy Rubio-González |
| 2024 | LCM: LLM-focused Hybrid SPM-cache Architecture with Cache Management for Multi-Core AI Accelerators. Chengtao Lai, Zhongchun Zhou, Akash Poptani, Wei Zhang |
| 2024 | Matrix-free SBP-SAT finite difference methods and the multigrid preconditioner on GPUs. Alexandre Chen, Brittany A. Erickson, Jeremy E. Kozdon, Jee Choi |
| 2024 | Minimizing Coherence Errors via Dynamic Decoupling. Soheil Khadirsharbiyani, Movahhed Sadeghi, Mostafa Eghbali Zarch, Mahmut Taylan Kandemir |
| 2024 | NEOCNN: NTT-Enabled Optical Convolution Neural Network Accelerator. Xianbin Li, Yinyi Liu, Fan Jiang, Chengeng Li, Yuxiang Fu, Wei Zhang, Jiang Xu |
| 2024 | NUCAlloc: Fine-Grained Block Placement in Hashed Last-Level NUCA Caches. Raveendra Soori, Shreyas Prabhu, Harpreet Singh Chawla, Michael Ferdman |
| 2024 | Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs. Xiao Fu, Weiling Yang, Dezun Dong, Xing Su |
| 2024 | Proceedings of the 38th ACM International Conference on Supercomputing, ICS 2024, Kyoto, Japan, June 4-7, 2024. Kenji Kise, Valentina Salapura, Murali Annavaram, Ana Lucia Varbanescu |
| 2024 | Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers. Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang |
| 2024 | RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs. Benjamin Brock, Aydin Buluç, Katherine A. Yelick |
| 2024 | RTT-UAF: Reuse Time Tracking for Use-After-Free Detection. Yubo Du, Yanan Guo, Youtao Zhang, Jun Yang |
| 2024 | RadiK: Scalable and Optimized GPU-Parallel Radix Top-K Selection. Yifei Li, Bole Zhou, Jiejing Zhang, Xuechao Wei, Yinghan Li, Yingda Chen |
| 2024 | RayJoin: Fast and Precise Spatial Join. Liang Geng, Rubao Lee, Xiaodong Zhang |
| 2024 | Real-time High-resolution X-Ray Computed Tomography. Du Wu, Peng Chen, Xiao Wang, Isaac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib |
| 2024 | SLIDEX: A Novel Architecture for Sliding Window Processing. Raúl Taranco, José-María Arnau, Antonio González |
| 2024 | Scheduling for Cyber-Physical Systems with Heterogeneous Processing Units under Real-World Constraints. Justin McGowen, Ismet Dagli, Neil T. Dantam, Mehmet E. Belviranli |
| 2024 | Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications. Bennett Cooper, Thomas R. W. Scogland, Rong Ge |
| 2024 | SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications. Pouya Haghi, Cheng Tan, Anqi Guo, Chunshu Wu, Dongfang Liu, Ang Li, Anthony Skjellum, Tong Geng, Martin C. Herbordt |
| 2024 | Snoopie: A Multi-GPU Communication Profiler and Visualizer. Mohammad Kefah Taha Issa, Muhammad Aditya Sasongko, Ilyas Turimbetov, Javid Baydamirli, Dogan Sagbili, Didem Unat |
| 2024 | Soft Error Resilience at Near-Zero Cost. Jianping Zeng, Shao-Yu Huang, Jiuyang Liu, Changhee Jung |
| 2024 | Stencil Computation with Vector Outer Product. Wenxuan Zhao, Liang Yuan, Baicheng Yan, Penghao Ma, Yunquan Zhang, Long Wang, Zhe Wang |
| 2024 | Sylva: Sparse Embedded Adapters via Hierarchical Approximate Second-Order Information. Baorun Mu, Christina Giannoula, Shang Wang, Gennady Pekhimenko |
| 2024 | Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures. Shilpa Babalad, Shirish K. Shevade, Matthew Jacob Thazhuthaveetil, R. Govindarajan |
| 2024 | Understanding GPU Memory Corruption at Extreme Scale: The Summit Case Study. Vladyslav Oles, Anna Schmedding, George Ostrouchov, Woong Shin, Evgenia Smirni, Christian Engelmann |
| 2024 | Ymir: A Scheduler for Foundation Model Fine-tuning Workloads in Datacenters. Wei Gao, Weiming Zhuang, Minghao Li, Peng Sun, Yonggang Wen, Tianwei Zhang |
| 2024 | gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters. Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur |
| 2024 | sys-sage: A Unified Representation of Dynamic Topologies & Attributes on HPC Systems. Stepan Vanecek, Martin Schulz |