ICS A

46 papers

Year Title / Authors
2024 A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations.
Mingyi Li, Junmin Xiao, Kewei Zhang, Zhiheng Lin, Chaoyang Shui, Ke Meng, Zehua Wang, Yunfei Pang, Guangming Tan
2024 Accelerated Auto-Tuning of GPU Kernels for Tensor Computations.
Chendi Li, Yufan Xu, Sina Mahdipour Saravani, Ponnuswamy Sadayappan
2024 Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs.
Andreas Plesner, Hans Henrik Brandenborg Sørensen, Søren Hauberg
2024 An Autonomous Parallelization of Transformer Model Inference on Heterogeneous Edge Devices.
Juhyeon Lee, Insung Bahk, Hoseung Kim, Sinjin Jeong, Suyeon Lee, Donghyun Min
2024 An Efficient and Scalable Approach to Build Co-occurrence Matrix for DNN's Embedding Layer.
Quentin R. Petit, Chong Li, Nahid Emad
2024 Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing.
Durga Keerthi Mandarapu, Vani Nagarajan, Artem Pelenitsyn, Milind Kulkarni
2024 AutoSched: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads.
Wei Gao, Xu Zhang, Shan Huang, Shangwei Guo, Peng Sun, Yonggang Wen, Tianwei Zhang
2024 CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers.
Sungmin Yun, Hwayong Nam, Kwanhee Kyung, Jaehyun Park, Byeongho Kim, Yongsuk Kwon, Eojin Lee, Jung Ho Ahn
2024 CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes.
Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Yu Li, Christopher Zimmer, Tekin Bicer, Bin Ren, William Gropp, Wen-Mei Hwu, Alex Aiken
2024 DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs.
Yelai Feng, Huaixi Wang, Yining Zhu, Xiandong Liu, Hongyi Lu, Qing Liu
2024 DeepHYDRA: A Hybrid Deep Learning and DBSCAN-Based Approach to Time-Series Anomaly Detection in Dynamically-Configured Systems.
Franz Kevin Stehle, Wainer Vandelli, Felix Zahn, Giuseppe Avolio, Holger Fröning
2024 Differentiating Set Intersections in Maximal Clique Enumeration by Function and Subproblem Size.
Hans Vandierendonck
2024 Distributed Ranges: A Model for Distributed Data Structures, Algorithms, and Views.
Benjamin Brock, Robert Cohn, Suyash Bakshi, Tuomas Karna, Jeongnim Kim, Mateusz Nowak, Lukasz Slusarczyk, Kacper Stefanski, Timothy G. Mattson
2024 Enhanced UGAL Routing Schemes for Dragonfly Networks.
Ram Sharan Chaulagain, Xin Yuan
2024 Exploiting Vector Code Semantics for Efficient Data Cache Prefetching.
Francesc Martínez Palau, Martí Torrents, Adrià Armejach, Marc Casas
2024 FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks.
Keren Zhou, Karthik Ganapathi Subramanian, Po-Hsun Lin, Matthias Fey, Binqian Yin, Jiajia Li
2024 Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment.
Hanxian Huang, Xin Chen, Jishen Zhao
2024 FuseIM: Fusing Probabilistic Traversals for Influence Maximization on Exascale Systems.
Reece Neff, Mostafa Eghbali Zarch, Marco Minutoli, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman, Michela Becchi
2024 HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory.
Qi Shao, Angelos Arelakis, Per Stenström
2024 Input Range Generation for Compiler-Induced Numerical Inconsistencies.
Dolores Miao, Ignacio Laguna, Cindy Rubio-González
2024 LCM: LLM-focused Hybrid SPM-cache Architecture with Cache Management for Multi-Core AI Accelerators.
Chengtao Lai, Zhongchun Zhou, Akash Poptani, Wei Zhang
2024 Matrix-free SBP-SAT finite difference methods and the multigrid preconditioner on GPUs.
Alexandre Chen, Brittany A. Erickson, Jeremy E. Kozdon, Jee Choi
2024 Minimizing Coherence Errors via Dynamic Decoupling.
Soheil Khadirsharbiyani, Movahhed Sadeghi, Mostafa Eghbali Zarch, Mahmut Taylan Kandemir
2024 NEOCNN: NTT-Enabled Optical Convolution Neural Network Accelerator.
Xianbin Li, Yinyi Liu, Fan Jiang, Chengeng Li, Yuxiang Fu, Wei Zhang, Jiang Xu
2024 NUCAlloc: Fine-Grained Block Placement in Hashed Last-Level NUCA Caches.
Raveendra Soori, Shreyas Prabhu, Harpreet Singh Chawla, Michael Ferdman
2024 Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs.
Xiao Fu, Weiling Yang, Dezun Dong, Xing Su
2024 Proceedings of the 38th ACM International Conference on Supercomputing, ICS 2024, Kyoto, Japan, June 4-7, 2024
Kenji Kise, Valentina Salapura, Murali Annavaram, Ana Lucia Varbanescu
2024 Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers.
Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang
2024 RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs.
Benjamin Brock, Aydin Buluç, Katherine A. Yelick
2024 RTT-UAF: Reuse Time Tracking for Use-After-Free Detection.
Yubo Du, Yanan Guo, Youtao Zhang, Jun Yang
2024 RadiK: Scalable and Optimized GPU-Parallel Radix Top-K Selection.
Yifei Li, Bole Zhou, Jiejing Zhang, Xuechao Wei, Yinghan Li, Yingda Chen
2024 RayJoin: Fast and Precise Spatial Join.
Liang Geng, Rubao Lee, Xiaodong Zhang
2024 Real-time High-resolution X-Ray Computed Tomography.
Du Wu, Peng Chen, Xiao Wang, Isaac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib
2024 SLIDEX: A Novel Architecture for Sliding Window Processing.
Raúl Taranco, José-María Arnau, Antonio González
2024 Scheduling for Cyber-Physical Systems with Heterogeneous Processing Units under Real-World Constraints.
Justin McGowen, Ismet Dagli, Neil T. Dantam, Mehmet E. Belviranli
2024 Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications.
Bennett Cooper, Thomas R. W. Scogland, Rong Ge
2024 SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications.
Pouya Haghi, Cheng Tan, Anqi Guo, Chunshu Wu, Dongfang Liu, Ang Li, Anthony Skjellum, Tong Geng, Martin C. Herbordt
2024 Snoopie: A Multi-GPU Communication Profiler and Visualizer.
Mohammad Kefah Taha Issa, Muhammad Aditya Sasongko, Ilyas Turimbetov, Javid Baydamirli, Dogan Sagbili, Didem Unat
2024 Soft Error Resilience at Near-Zero Cost.
Jianping Zeng, Shao-Yu Huang, Jiuyang Liu, Changhee Jung
2024 Stencil Computation with Vector Outer Product.
Wenxuan Zhao, Liang Yuan, Baicheng Yan, Penghao Ma, Yunquan Zhang, Long Wang, Zhe Wang
2024 Sylva: Sparse Embedded Adapters via Hierarchical Approximate Second-Order Information.
Baorun Mu, Christina Giannoula, Shang Wang, Gennady Pekhimenko
2024 Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures.
Shilpa Babalad, Shirish K. Shevade, Matthew Jacob Thazhuthaveetil, R. Govindarajan
2024 Understanding GPU Memory Corruption at Extreme Scale: The Summit Case Study.
Vladyslav Oles, Anna Schmedding, George Ostrouchov, Woong Shin, Evgenia Smirni, Christian Engelmann
2024 Ymir: A Scheduler for Foundation Model Fine-tuning Workloads in Datacenters.
Wei Gao, Weiming Zhuang, Minghao Li, Peng Sun, Yonggang Wen, Tianwei Zhang
2024 gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters.
Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur
2024 sys-sage: A Unified Representation of Dynamic Topologies & Attributes on HPC Systems.
Stepan Vanecek, Martin Schulz