| 2025 | A Cost-Effective Dueling Framework for Set-Associative Cache Indexing. Kevin Weston, Vahid Janfaza, Avery Johnson, Abdullah Muzahid |
| 2025 | A Device-Side Execution Model for Multi-GPU Task Graphs. Ilyas Turimbetov, Mohamed Wahib, Didem Unat |
| 2025 | A Global Perspective on Supercomputer Power Provisioning: Case Studies from United States and Europe. Tapasya Patki, Barry Rountree, Torsten Wilde, Andrea Bartolini, Stephanie Brink, Esa Heiskanen, Sachin Idgunji, Matthias Maiterth, James H. Rogers, Ermal Rrapaj, Ralf Schneider, Woong Shin, Kathleen Shoga, Christian Simmendinger, Nicholas J. Wright, Zhengji Zhao |
| 2025 | A Multi-GPU Algorithm for Computing Maximal Independent Sets in Large Graphs. Anju Mongandampulath Akathoott, Benila Virgin Jerald Xavier, Martin Burtscher |
| 2025 | A3FR: Agile 3D Gaussian Splatting with Incremental Gaze Tracked Foveated Rendering in Virtual Reality. Shuo Xin, Haiyu Wang, Sai Qian Zhang |
| 2025 | Accelerating Complex Stencil Computations with Adaptive Fusion Strategy. Siqi Wang, Hailong Yang, Pengbo Wang, Shaokang Du, Yufan Xu, Qingxiao Sun, Xiaoyan Liu, Xuezhu Wang, Xuning Liang, Zhongzhi Luan, Yi Liu, Depei Qian |
| 2025 | An Efficient 2D Fusion Method for High-Performance Two-Stage Eigensolvers on Modern Heterogeneous Architectures. Yongxiao Zhou, Yi Zong, Yuyang Jin, Heng Li, Wei Xue |
| 2025 | Analyzing the Performance of Applications at Exascale. Dragana Grbic, John M. Mellor-Crummey |
| 2025 | Auto-Healer: Self-Healing Hardware for Perception Stage Faults in Autonomous Driving Systems. Ali Suvizi, Guru Venkataramani |
| 2025 | BMQSim: Overcoming Memory Constraints in Quantum Circuit Simulation with a High-Fidelity Compression Framework. Boyuan Zhang, Bo Fang, Fanjiang Ye, Luanzheng Guo, Fengguang Song, Nathan R. Tallent, Dingwen Tao |
| 2025 | BitWeaver: Read-Time Truncation in Memory. Garrett Gagnon, Srikanth Malla, Yangwook Kang, Liu Liu |
| 2025 | CB-SpMV: A Data Aggregating and Balance Algorithm for Cache-Friendly Block-Based SpMV on GPUs. Xing Cong, FuKai Sun, Yifan Chen, Chenhao Xie, Yi Liu, Depei Qian |
| 2025 | CIExplorer: Microarchitecture-Aware Exploration for Tightly Integrated Custom Instruction. Xiaoyu Hao, Sen Zhang, Liang Qiao, Qingcai Jiang, Jun Shi, Junshi Chen, Hong An, Xulong Tang, Hao Shu, Honghui Yuan |
| 2025 | CLOVER: A GPU-native, Spatio-graph-based Approach to Exact kNN. Victor Kamel, Hanxueyu Yan, Sean Chester |
| 2025 | CRAMG: A Communication-Reduced Algebraic Multigrid Method. Fan Yuan, Xiaojian Yang, Yunqing Huang, Dezun Dong, Chuanfu Xu, Jie Liu, Xiaoqiang Yue, Shengguo Li, Hongxia Wang |
| 2025 | CTCCL: Cost-Efficient Joint Device-Network Load Balancing for LLM Training in RoCE-based Intelligent Computing Network. Zhuotong Li, Liang Xu, Ziqi Huang, Shuyun Qian, Hongwei Bu, Ming Yang, Mengyun Luan, Weiguo Chen, Xu Wen |
| 2025 | Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models. Runsheng Benson Guo, Utkarsh Anand, Arthur Chen, Khuzaima Daudjee |
| 2025 | Cherry: Breaking the GPU Memory Wall for Large-Scale GNN Training via Micro-Batching. Yan Wang, Qinghua Guo, Haoran Kong, Kai Sheng, Zhen Xie, Hao Chen, Weile Jia, Dingwen Tao, Xin He |
| 2025 | CoLa: Towards Communication-efficient Distributed Sparse Matrix-Matrix Multiplication on GPUs. Lixing Zhang, Yingxia Shao, Shigang Li |
| 2025 | ConCo: Optimizing Compilation of Concurrent Tensor Programs on Shared GPU. Jiamin Lu, Jingwei Sun, Yunlong Xu, Peng Sun, Guangzhong Sun |
| 2025 | ConTraPh: Contrastive Learning for Parallelization and Performance Optimization. Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Nesreen K. Ahmed, Theodore L. Willke, Ali Jannesari |
| 2025 | D-Rex: Heterogeneity-Aware Reliability Framework and Adaptive Algorithms for Distributed Storage. Maxime Gonthier, Dante D. Sánchez-Gallegos, Haochen Pan, Bogdan Nicolae, Sicheng Zhou, Hai Duc Nguyen, Valérie Hayot-Sasson, J. Gregory Pauloski, Jesús Carretero, Kyle Chard, Ian T. Foster |
| 2025 | DALdex: A DPU-Accelerated Persistent Learned Index via Incremental Learning. Aoyang Tong, Yu Hua, Menglei Chen |
| 2025 | DEDUPKV: A Space-Efficient and High-Performance Key-Value Store via Fine-Grained Deduplication. Safdar Jamil, Awais Khan, Xubin He, Youngjae Kim |
| 2025 | DIMPLES: Distributed Influence Maximization for Pandemic pLanning on Exascale Systems. Marco Minutoli, Reece Neff, Naw Safrin Sattar, Hao Lu, John Feo, Henning S. Mortveit, Anil Vullikanti, Dawen Xie, Mandy L. Wilson, Gregor von Laszewski, Parantapa Bhattacharya, S. M. Ferdous, Ananth Kalyanaraman, Michela Becchi, Madhav V. Marathe, Mahantesh Halappanavar |
| 2025 | DIV: An Index & Value compression method for SpMV on large matrices. Dimitrios Galanopoulos, Panagiotis Mpakos, Petros Anastasiadis, Nectarios Koziris, Georgios I. Goumas |
| 2025 | DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs. Yuebo Luo, Shiyang Li, Junran Tao, Kiran Gautam Thorat, Xi Xie, Hongwu Peng, Nuo Xu, Caiwen Ding, Shaoyi Huang |
| 2025 | DREAM: Device-Driven Efficient Access to Virtual Memory. Nurlan Nazaraliyev, Elaheh Sadredini, Nael B. Abu-Ghazaleh |
| 2025 | DeCOS: Data-Efficient Reinforcement Learning for Compiler Optimization Selection Ignited by LLM. Tianming Cui, Pen-Chung Yew, Stephen McCamant, Antonia Zhai |
| 2025 | EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC. Siyuan Shen, Mikhail Khalilov, Lukas Gianinazzi, Timo Schneider, Marcin Chrapek, Jai Dayal, Manisha Gajbe, Robert W. Wisniewski, Torsten Hoefler |
| 2025 | EPIClear: Exploiting Domain-Specific Features for Epistasis Detection Acceleration on Tensor Cores. Ricardo Nobre, Miguel Graça, Leonel Sousa, Aleksandar Ilic |
| 2025 | EVeREST-C: An Effective and Versatile Runtime Energy Saving Tool for CPUs. Anna Yue, Pen-Chung Yew, Sanyam Mehta |
| 2025 | Efficient Locality-aware Instruction Stream Scheduling for Stencil Computation on ARM Processors. Shanghao Liu, Hailong Yang, Xin You, Zhongzhi Luan, Yi Liu, Depei Qian |
| 2025 | Efficient Server Consolidation through a balanced mix of Transformer-based and Conventional Applications. Pablo Abad, Pablo Prieto, Valentin Puente, José-Ángel Gregorio |
| 2025 | Fast and Fair Training for Deep Learning in Heterogeneous GPU Clusters. Zizhao Mo, Huanle Xu, Wing Cheong Lau |
| 2025 | From Islands to Archipelago: Towards Collaborative and Adaptive Burst Buffer for HPC Systems. Mingtian Shao, Ruibo Wang, Wenzhe Zhang, Kai Lu, Yiqin Dai, Huijun Wu |
| 2025 | Fused3S: Fast Sparse Attention on Tensor Cores. Zitong Li, Aparna Chandramowlishwaran |
| 2025 | G^3SA: A GPU-Accelerated Gold Standard Genomics Library for End-to-End Sequence Alignment. Yeejoo Han, Sunwoo Kim, Seongyeon Park, Jinho Lee |
| 2025 | Generating Microservice Graphs with Production Characteristics for Efficient Resource Scaling. Fanrong Du, Jiuchen Shi, Quan Chen, Pu Pang, Li Li, Minyi Guo |
| 2025 | GraCFL: A Holistically Designed Vertex-Centric Graph System for CFL Reachability. Sakib Fuad, Amir Hossein Nodehi Sabet, Umar Farooq, Zhijia Zhao |
| 2025 | Graph Convolutional Network Acceleration Using Adiabatic Superconductor Josephson Devices. Zhengang Li, Hongwu Peng, Xuan Shen, Masoud Zabihi, Xi Xie, Geng Yuan, Yanzhi Wang, Olivia Chen, Caiwen Ding |
| 2025 | HARNESS: Holistic Resource Management for Diversely Scaled Edge Cloud Systems. Ismet Dagli, Justin Davis, Mehmet Esat Belviranli |
| 2025 | HR-SpMM: Adaptive Row Partitioning and Hybrid Kernel Design for Sparse Matrix Multiplication. Qi Wang, Yaobin Wang, Yi Luo, Rong Luo, Pingping Tang |
| 2025 | IA-Chol: Input-Aware Cholesky Decomposition on CPU and GPU. Jixiao Deng, Qinglin Wang, Lin Chen, Tun Li, Bo Yang, Xinhai Chen, Jie Liu |
| 2025 | JBSA: A Bit-Serial Accelerator for Deep Neural Networks Using Superconducting SFQ Logic. Yang Su, Sheng Li, Huilong Jiang, Haofei Yin, Rongliang Fu, JunYing Huang, Xiaochun Ye, Zhimin Zhang, Jie Ren, Xiaoping Gao, Tsung-Yi Ho, Dongrui Fan |
| 2025 | Leonid: Exploring Automated Kernel Fusion in Performance-Portable Programming Models for Scientific Computation. Chenchen Zhang, Hao Luo, Chao Yang |
| 2025 | Light-FP: Analyze Floating-Point Error in a Highly Condensed Approach. Jiazhi Mi, Li Chen, Haoyu Wang, Ruixiang Gao, Hongze Zhang, Ronghong Shen, Kai Lin, You Fu, Huimin Cui |
| 2025 | Loop Fusion in Matrix Multiplications with Sparse Dependence. Mohammad Mahdi Salehi Dezfuli, Kazem Cheshmi |
| 2025 | MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs. Jordi Wolfson-Pou, Jan Laukemann, Fabrizio Petrini |
| 2025 | MARS: Processing-In-Memory Acceleration of Raw Signal Genome Analysis Inside the Storage Subsystem. Melina Soysal, Konstantina Koliogeorgi, Can Firtina, Nika Mansouri-Ghiasi, Rakesh Nadig, Haiyu Mao, Geraldo Francisco de Oliveira Junior, Yu Liang, Klea Zambaku, Mohammad Sadrosadati, Onur Mutlu |
| 2025 | MEMPLEX: A Memory System with Replication and Migration of Data for Multi-Chiplet NUMA Architectures. Neethu Bal Mallya, Bhavishya Goel, Ioannis Sourdis |
| 2025 | MG-αGCD: Accelerating Graph Community Detection on Multi-GPU Platforms. Shuai Yang, Changyou Zhang |
| 2025 | Multi-Node Multi-GPU Datalog. Ahmedur Rahman Shovon, Yihao Sun, Kristopher K. Micinski, Thomas Gilray, Sidharth Kumar |
| 2025 | NeurLZ: An Online Neural Learning-based Method to Enhance Scientific Lossy Compression. Wenqi Jia, Zhewen Hu, Youyuan Liu, Boyuan Zhang, Jinzhen Wang, Jinyang Liu, Wei Niu, Stavros Kalafatis, Junzhou Huang, Sian Jin, Daoce Wang, Jiannan Tian, Miao Yin |
| 2025 | OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths. Leo Gold, Adam Bienkowski, David Sidoti, Krishna R. Pattipati, Omer Khan |
| 2025 | ORA: Job Runtime Prediction for High-Performance Computing Platforms Using the Online Retrieval-Augmented Language Model. Hongyi Liu, Yinping Ma, Xiaosong Huang, Lingzhe Zhang, Tong Jia, Ying Li |
| 2025 | ORION: Optimizing OLAP Query Execution with Proactive Caching and Separate Operators. Zhixin Tong, Jiuchen Shi, Quan Chen, Pu Pang, Shixuan Sun, Jie Meng, Jiang Liu, En Shao, Minyi Guo |
| 2025 | OpaQue: Program Output Obfuscation for Quantum Software Circuits in Quantum Clouds. Tirthak Patel, Aditya Ranjan, Daniel Silver, Harshitta Gandhi, William Cutler, Devesh Tiwari |
| 2025 | PIE: Enabling Fast and Scalable Incremental Evolving Graph Analytics on Persistent Memory. Yunmo Zhang, Jiacheng Huang, Xizhe Yin, Junqiao Qiu, Hong Xu, Chun Jason Xue |
| 2025 | PIM-CARE: A Compiler-Assisted Dynamic Resource Allocation Framework for Real-world DRAM PIM. Inyong Hwang, Donghyeon Kim, Seokwon Kang, Taehyeong Park, Taehoon Kim, Jiwon Seo, Hanjun Kim, Youngsok Kim, Yongjun Park |
| 2025 | Page Migration for Hardware Memory Disaggregation Across a Network. Archit Patke, Christian Pinto, Saurabh Jha, Haoran Qiu, Zbigniew Kalbarczyk, Ravishankar K. Iyer |
| 2025 | Parallel Contraction Hierarchies Can Be Efficient and Scalable. Zijin Wan, Xiaojun Dong, Letong Wang, Enzuo Zhu, Yan Gu, Yihan Sun |
| 2025 | Pearl: Automatic Code Optimization Using Deep Reinforcement Learning. Djamel Rassem Lamouri, Iheb Nassim Aouadj, Smail Kourta, Riyadh Baghdadi |
| 2025 | Persistent Memory Objects on the Cheap. Derrick Greenspan, Naveed Ul Mustafa, Jongouk Choi, Mark Heinrich, Yan Solihin |
| 2025 | PortFC: Designing High-performance Deadlock-free BCube Networks. Peirui Cao, Rui Ning, Hongwei Yang, Zhaochen Zhang, Chang Liu, Rui Li, Yongqi Yang, Yunzhuo Liu, Chengyuan Huang, Tao Sun, Xiaodong Duan, Guihai Chen, Chen Tian |
| 2025 | Proceedings of the 39th ACM International Conference on Supercomputing, ICS 2025, Salt Lake City, UT, USA, June 8-11, 2025 |
| 2025 | Proteus: Achieving High-Performance Processing-Using-DRAM with Dynamic Bit-Precision, Adaptive Data Representation, and Flexible Arithmetic. Geraldo Francisco de Oliveira Junior, Mayank Kabra, Yuxin Guo, Kangqi Chen, Abdullah Giray Yaglikçi, Melina Soysal, Mohammad Sadrosadati, Joaquín Olivares Bueno, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu |
| 2025 | Pushing the Limits of GPU Lossy Compression: A Hierarchical Delta Approach. Boyuan Zhang, Yafan Huang, Sheng Di, Fengguang Song, Guanpeng Li, Franck Cappello |
| 2025 | ROCKET: An RNS-based Photonic Accelerator for High-Precision and Energy-Efficient DNN Training. Hao Zhang, Haibo Zhang, Chengpeng Xia, Zhiyi Huang, Yawen Chen, Amanda S. Barnard |
| 2025 | SYprox: Combining Host and Device Perforation with Mixed Precision Approximation on Heterogeneous Architectures. Lorenzo Carpentieri, Biagio Cosenza |
| 2025 | Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers. Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib |
| 2025 | SmartNIC-GPU-CPU Heterogeneous System for Large Machine Learning Model with Software-Hardware Codesign. Anqi Guo, Yuchen Hao, Xiteng Yao, Shining Yang, Jianyu Huang, Tony Tong Geng, Martin C. Herbordt |
| 2025 | SnuSOLVER: Optimizing Sparse Direct Solvers for Heterogeneous Systems. Chaewon Kim, Jaehwan Lee, Jinpyo Kim, Dohyun Kim, Kyusu Ahn, Hyung Uk Cho, Seungin Baek, Jaejin Lee |
| 2025 | SortingHat: System Topology-aware Scheduling of Deep Neural Network Models on Multi-GPU Systems. Seok Namkoong, Taehyeong Park, Kiung Jung, Jinyoung Kim, Yongjun Park |
| 2025 | SparsePIM: An Efficient HBM-Based PIM Architecture for Sparse Matrix-Vector Multiplications. Taewoon Kang, Geonwoo Choi, Taeweon Suh, Gunjae Koo |
| 2025 | Statistical Treatment of Variable MPI Latencies and MPI-Communication Hiding for Matrix-Free Finite Element Operators. Max Heldman, Johann Rudi, Julie Bessac |
| 2025 | StructILU: Dependency-Preserving Incomplete LU with Hierarchical Parallelism for Structured Grid PDEs on GPUs. Hao Luo, Qianchao Zhu, Xiaochen Hao, Chunxi Lei, Chengdi Ma, Chenchen Zhang, Yun Liang, Chao Yang |
| 2025 | TMModel: Modeling Texture Memory and Mobile GPU Performance to Accelerate DNN Computations. Jiexiong Guan, Zhenqing Hu, Christos D. Antonopoulos, Nikolaos Bellas, Spyros Lalis, Evgenia Smirni, Gang Zhou, Gagan Agrawal, Bin Ren |
| 2025 | Taking GPU Programming Models to Task for Performance Portability. Joshua Hoke Davis, Pranav Sivaraman, Joy Kitson, Konstantinos Parasyris, Harshitha Menon, Isaac Minn, Giorgis Georgakoudis, Abhinav Bhatele |
| 2025 | UJOpt: Heuristic Approach for Applying Unroll-and-Jam Optimization and Loop Order Selection. Shilpa Babalad, Shirish K. Shevade, Matthew Jacob Thazhuthaveetil, R. Govindarajan |
| 2025 | Understanding the Idiosyncrasies of Emerging BlueField DPUs. Arjun Kashyap, Yuke Li, Darren Ng, Xiaoyi Lu |
| 2025 | WisIO: Automated I/O Bottleneck Detection with Multi-Perspective Views for HPC Workflows. Izzet Yildirim, Hariharan Devarajan, Anthony Kougkas, Xian-He Sun, Kathryn M. Mohror |
| 2025 | YH-Light: Yielding Hierarchy-aware Partitioner for Large-scale Graph Processing. Xinbiao Gan, Tiejun Li, Chunye Gong, Jie Liu, Kai Lu |
| 2025 | ghZCCL: Advancing GPU-aware Collective Communications with Homomorphic Compression. Jiajun Huang, Sheng Di, Yafan Huang, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur |