| 2025 | A Nested Krylov Method Using Half-Precision Arithmetic. Kengo Suzuki, Takeshi Iwashita |
| 2025 | A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation. Yangjie Zhou, Honglin Zhu, Qian Qiu, Weihao Cui, Zihan Liu, Peng Chen, Mohamed Wahib, Cong Guo, Siyuan Feng, Jintao Meng, Haidong Lan, Jingwen Leng, Yun Lin, Jin Song Dong, Wenxi Zhu, Minwen Deng |
| 2025 | A Streaming Collectives Interface Targeting Dataflow Acceleration and HPC Workloads. Nicholas Contini, Jake Queiser, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda |
| 2025 | ACTINA: Adapting Circuit-Switching Techniques for AI Networking Architectures. Zhenguo Wu, Benjamin Klenk, Larry Dennison, Keren Bergman |
| 2025 | AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions. Väinö Hatanpää, Eugene Ku, Jason Stock, Murali Emani, Sam Foreman, Chunyong Jung, Sandeep Madireddy, Tung Nguyen, Varuni Sastry, Ray A. O. Sinurat, Huihuo Zheng, Sam Wheeler, Troy Arcomano, Venkatram Vishwanath, Rao Kotamarthi |
| 2025 | AGILE: Lightweight and Efficient Asynchronous GPU-SSD Integration. Zhuoping Yang, Jinming Zhuang, Xingzhen Chen, Alex K. Jones, Peipei Zhou |
| 2025 | AMRaCut: Scalable Partitioning for Adaptive Mesh Refinement. Budvin Edippuliarachchi, David Van Komen, Hari Sundar |
| 2025 | ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage. Siyuan Shen, Tommaso Bonato, Zhiyi Hu, Pasquale Jordan, Tiancheng Chen, Torsten Hoefler |
| 2025 | Ab-initio Quantum Transport with the GW Approximation, 42,240 Atoms, and Sustained Exascale Performance. Nicolas Vetsch, Alexander Maeder, Vincent Maillou, Anders Winka, Jiang Cao, Grzegorz Kwasniewski, Leonard Deuschle, Torsten Hoefler, Alexandros Nikolaos Ziogas, Mathieu Luisier |
| 2025 | Accelerated Spatio-Temporal Bayesian Modeling for Multivariate Gaussian Processes. Lisa Gaedke-Merzhäuser, Vincent Maillou, Fernando Rodriguez Avellaneda, Olaf Schenk, Paula Moraga, Mathieu Luisier, Alexandros Nikolaos Ziogas, Håvard Rue |
| 2025 | Addressing Reproducibility Challenges in HPC with Continuous Integration. Valérie Hayot-Sasson, Nathaniel Hudson, André Bauer, Maxime Gonthier, Ian T. Foster, Kyle Chard |
| 2025 | Advancing Quantum Many-Body GW Calculations on Exascale Supercomputing Platforms. Benran Zhang, Daniel Weinberg, Chih-En Hsu, Aaron R. Altman, Yuming Shi, James B. White, Derek Vigil-Fowler, Steven G. Louie, Jack R. Deslippe, Felipe H. da Jornada, Zhenglu Li, Mauro Del Ben |
| 2025 | Augmenting Simulated Noisy Quantum Data Collection by Orders of Magnitude Using Pre-Trajectory Sampling with Batched Execution. Taylor Lee Patti, Thien Nguyen, Justin Gage Lietz, Alex McCaskey, Brucek Khailany |
| 2025 | Automatic Generation of Mappings for Distributed Fourier Operations. Doru-Thom Popovici, Botao Wu, John Shalf, Martin Kong |
| 2025 | BLAZE: Exploiting Hybrid Parallelism and Size-customized Kernels to Accelerate BLASTP on GPUs. Sree Charan Gundabolu, Mithuna Thottethodi, T. N. Vijaykumar |
| 2025 | BOER: Enhancing Resource Utilization for Deep Learning Inference with Hybrid Spatial GPU Sharing. Bowen Zhang, Yuhang Wang, Zhuozhao Li |
| 2025 | Balanced and Elastic End-to-end Training of Dynamic LLMs. Mohamed Wahib, Muhammed Abdullah Soyturk, Didem Unat |
| 2025 | Benchmark-driven Models for Energy Analysis and Attribution of GPU-Accelerated Supercomputing. Oscar Antepara, Zhengji Zhao, Brian Austin, Nan Ding, Leonid Oliker, Nicholas J. Wright, Samuel Williams |
| 2025 | Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality. Daniele De Sensi, Saverio Pasqualoni, Lorenzo Piarulli, Tommaso Bonato, Seydou Ba, Matteo Turisini, Jens Domke, Torsten Hoefler |
| 2025 | Boosting Scientific Error-Bounded Lossy Compression through Optimized Synergistic Lossy-Lossless Orchestration. Shixun Wu, Jinwen Pan, Jinyang Liu, Jiannan Tian, Ziwei Qiu, Jiajun Huang, Kai Zhao, Xin Liang, Sheng Di, Zizhong Chen, Franck Cappello |
| 2025 | Breaking the System Noise Barrier at Exascale. Edgar A. León, Joseph Glenski, Mark J. Stock, Kim H. McMahon, William Loewe, Clark Snyder, Larry Kaplan, Srinath Vadlamani, Timothy I. Mattox, Trent D'Hooge, Brian Behlendorf, Nathan Hanford, Ramesh Pankajakshan, Matthew L. Leininger |
| 2025 | Bridging the Gap Between Binary and Source Based Package Management in Spack. John Gouwar, Gregory Becker, Tamara Dahlgren, Nathan Hanford, Arjun Guha, Todd Gamblin |
| 2025 | Bridging the Gap between Unstructured SpMM and Structured Sparse Tensor Cores. Yukang Dong, Ziyuan Shen, Wenbin Jiang, Zhenghang Liu, Ye Xu, Bingyi He, Ran Zheng, Hai Jin |
| 2025 | Bubble: Towards Scalable Evolving Graph Processing via Mini-Batch Sorting. Long Deng, Yongkun Li, Zaigui Zhang, Yinlong Xu, John C. S. Lui |
| 2025 | BurstEngine: An Efficient Distributed Framework for Training Transformers on Extremely Long Sequences of over 1M Tokens. Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun |
| 2025 | C.A.T.S.: Memory and Control Flow Tracing for Whole-Program Performance Analysis. Philipp Schaad, Tal Ben-Nun, Torsten Hoefler |
| 2025 | COSMOS: Performance Portable Graph Pattern Matching with Domain-Specific Software Distributed Shared Memory. Zhiheng Lin, Ke Meng, Changjie Xu, Weichen Cao, Guangming Tan |
| 2025 | CPU- and GPU-initiated Communication Strategies for Conjugate Gradient Methods on Large GPU Clusters. James D. Trotter, Sinan Ekmekçibasi, Dogan Sagbili, Johannes Langguth, Xing Cai, Didem Unat |
| 2025 | Caracal: A GPU-Resident Sparse LU Solver with Lightweight Fine-Grained Scheduling. Jie Ren, Tingxuan Zhong, Yuxi Hong, Guofeng Feng, Xincheng Wang, Weile Jia, Hatem Ltaief, David Elliot Keyes |
| 2025 | Characterizing Performance, Power, and Energy of AMD CDNA3 GPU Family. Bagus Hanindhito, Bhavesh Patel |
| 2025 | ChatHPC: Building the Foundations for a Productive and Trustworthy AI-Assisted HPC Ecosystem. Pedro Valero-Lara, Aaron R. Young, Jeffrey S. Vetter, Zheming Jin, Swaroop Pophale, Mohammad Alaul Haque Monil, Keita Teranishi, William F. Godoy |
| 2025 | Compile-Time QoS Scheme for Deep Learning Inferences. Sungin Hong, Hyunjun Kim, Hwansoo Han |
| 2025 | Computing the Full Earth System at 1km Resolution. Daniel Klocke, Claudia Frauen, Jan Frederik Engels, Dmitry Alexeev, René Redler, Reiner Schnur, Helmuth Haak, Luis Kornblueh, Nils Brüggemann, Fatemeh Chegini, Manoel Römmer, Lars Hoffmann, Sabine Griessbach, Mathis Bode, Jonathan Coles, Miguel Gila, William Sawyer, Alexandru Calotoiu, Yakup Budanaz, Pratyai Mazumder, Marcin Copik, Benjamin Weber, Andreas Herten, Hendryk Bockelmann, Torsten Hoefler, Cathy Hohenegger, Bjorn Stevens |
| 2025 | Constraint-Driven Auto-Tuning of GEMM-like Operators for MT-3000 Many-core Processor. Xinxin Qi, Jianbin Fang, Peng Zhang, Yonggang Che, Jie Ren |
| 2025 | Core Hours and Carbon Credits: Incentivizing Sustainability in HPC. Alok Kamatar, Maxime Gonthier, Valérie Hayot-Sasson, André Bauer, Marcin Copik, Raul Castro Fernandez, Torsten Hoefler, Kyle Chard, Ian T. Foster |
| 2025 | Cosmological Hydrodynamics at Exascale: A Trillion-Particle Leap in Capability. Nicholas Frontiere, J. D. Emberson, Michael Buehlmann, Esteban M. Rangel, Salman Habib, Katrin Heitmann, Patricia Larsen, Vitali A. Morozov, Adrian Pope, Claude-André Faucher-Giguère, Antigoni Georgiadou, Damien Lebrun-Grandié, Andrey Prokopenko |
| 2025 | DAS-ILU: A Distributed Asynchronous Parallel ILU Factorization Based on Domain Decomposition. Fan Yuan, Shengguo Li, Xiaojian Yang, Yunqing Huang, Hongxia Wang, Chuanfu Xu, Dezun Dong, Tiejun Li, Jianchun Wang, Jie Liu |
| 2025 | DHAP: Towards Efficient OLAP in a Disaggregated and Heterogeneous Environment. Guangda Liu, Chenqi Zhang, Yizhou Shan, Hao Feng, Zeke Wang, Shixuan Sun, Minyi Guo, Jieru Zhao |
| 2025 | DPAR: High-Performance, Secure, and Scalable Differential Privacy-based AllReduce. Hao Qi, Weicong Chen, Chenghong Wang, Xiaoyi Lu |
| 2025 | DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs. Mingkai Chen, Tianhua Han, Cheng Liu, Shengwen Liang, Kuai Yu, Lei Dai, Ziming Yuan, Ying Wang, Lei Zhang, Huawei Li, Xiaowei Li |
| 2025 | Deep Learning-Enabled Supercritical Flame Simulation at Detailed Chemistry and Real-Fluid Accuracy Towards Trillion-Cell Scale. Zhuoqiang Guo, Runze Mao, Lijun Liu, Guangming Tan, Weile Jia, Zhi X. Chen |
| 2025 | Demystifying the Resilience of Large Language Model Inference: An End-to-End Perspective. Yu Sun, Zachary Coalson, Shiyang Chen, Hang Liu, Zhao Zhang, Sanghyun Hong, Bo Fang, Lishan Yang |
| 2025 | Deploying Lightweight Input-Aware Selective Instruction Duplication in HPC Applications. Md Hasanur Rahman, Guanpeng Li |
| 2025 | Destination Earth: The Climate Change Adaptation Digital Twin. Ioan Hadade, Daniel Klocke, Jussi Enkovaara, Tuomas Lunttila, Thomas Rackow, Jan Frederik Engels, Claudia Frauen, René Redler, Jenni Kontkanen, Thomas Jung, Dmitry V. Sein, Irina Sandu, Balthasar Reuter, Nils Wedi, Sebastian Milinski, Francisco Doblas-Reyes, Miguel Castrillo, Mario C. Acosta, Sergi Girona, Pekka Manninen |
| 2025 | Diff-MoE: Efficient Batched MoE Inference with Priority-Driven Differential Expert Caching. Kexin Li, Wenkan Huang, Qinggang Wang, Long Zheng, Xiaofei Liao, Hai Jin, Jingling Xue |
| 2025 | Distributed Cross-Channel Hierarchical Aggregation for Foundation Models. Aristeidis Tsaris, Isaac Lyngaas, John H. Lagergren, Mohamed Wahib, Larry M. York, Prasanna Balaprakash, Dan Lu, Feiyi Wang, Xiao Wang |
| 2025 | EDDE: Container Deployment Framework Beyond the Cloud. Hao Fan, Zhuo Huang, Shadi Ibrahim, Lin Gu, Song Wu |
| 2025 | Effective Node-Level Anomaly Detection in HPC Systems via Coarse-Grained Clustering and Fine-Grained Model Sharing. Sibo Xia, Yongqian Sun, Xijie Pan, Yuan Yuan, Shenglin Zhang, Shaoyu Hu, Lei Tao, Yuqi Li, Jinghua Feng |
| 2025 | Exploring and Mitigating Failure Behavior of Large Language Model Training Workloads in HPC Systems. Pengfei Yu, Jingjing Gu, Hao Han, Dazhong Shen, Bao Wen, Yang Liu |
| 2025 | FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention. Huangliang Dai, Shixun Wu, Jiajun Huang, Zizhe Jian, Yue Zhu, Haiyang Hu, Zizhong Chen |
| 2025 | FaSTCC: Fast Sparse Tensor Contractions on CPUs. Saurabh Raje, Hunter McCoy, Atanas Rountev, Prashant Pandey, P. Sadayappan |
| 2025 | Fine-grained Automated Failure Management for Extreme-Scale GPU Accelerated Systems. Yonatan Levitt, Richard Barella, Sam Zeltner, Thomas Musta, Lance Cheney, Gustavo Espinosa, Olivier Franza, Balazs Gerofi |
| 2025 | Fringe-SGC: Counting Subgraphs with Fringe Vertices. Cameron Bradley, Ghadeer Ahmed H. Alabandi, Martin Burtscher |
| 2025 | GPU Lossy Compression for HPC Can Be Versatile and Ultra-Fast. Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello |
| 2025 | Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction. Xiao Li, Liangji Zhu, Anand Rangarajan, Sanjay Ranka |
| 2025 | Graphago: Accelerating SSD-based Graph Processing via Activity-Aware Graph Preprocessing. Xianghao Xu, Yucheng Zhang, Gongxuan Zhang, Yongli Cheng, Fang Wang |
| 2025 | GreenMix: Energy-Efficient Serverless Computing via Randomized Sketching on Asymmetric Multi-Cores. Rohan Basu Roy, Tirthak Patel, Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari |
| 2025 | HELM: Characterizing Unified Memory Accesses to Improve GPU Performance under Memory Oversubscription. Nathan Jones, Tyler N. Allen, Rong Ge |
| 2025 | HP-MDR: High-performance and Portable Data Refactoring and Progressive Retrieval with Advanced GPUs. Yanliang Li, Wenbo Li, Qian Gong, Qing Liu, Norbert Podhorszki, Scott Klasky, Xin Liang, Jieyang Chen |
| 2025 | HPC-R1: Characterizing R1-like Large Reasoning Models on HPC. Adam Weingram, Duo Zhang, Zhonghao Chen, Hao Qi, Xiaoyi Lu |
| 2025 | HStencil: Matrix-Vector Stencil Computation with Interleaved Outer Product and MLA. Han Huang, Jiabin Xie, Guangnan Feng, Xianwei Zhang, Dan Huang, Zhiguang Chen, Yutong Lu |
| 2025 | Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism. Zizhao Mo, Jianxiong Liao, Huanle Xu, Zhi Zhou, Chengzhong Xu |
| 2025 | High-Performance Branch-Free Algorithms for Extended-Precision Floating-Point Arithmetic. David Kai Zhang, Alex Aiken |
| 2025 | HyTiS: Hybrid Tile Scheduling for GPU GEMM with Enhanced Wave Utilization and Cache Locality. Zheng Zhang, Hulin Wang, Hongming Xu, Donglin Yang, Xiaobo Zhou, Dazhao Cheng |
| 2025 | Hypertron: Efficiently Scaling Large Models by Exploring High-Dimensional Parallelization Space. Shigang Li, Jingkun Dong, Jihao Chen, Zhi Ma, Zhongzhe Hu |
| 2025 | Improving SpGEMM Performance Through Matrix-Reordering and Cluster-wise Computation. Abdullah Al Raqibul Islam, Helen Xu, Dong Dai, Aydin Buluç |
| 2025 | Insights from Optimizing HPL Performance on Exascale Systems: A Comparative Analysis of Panel Factorization. Hao Lu, Michael A. Matheson, Noel Chalmers, Aditya Kashi, Nicholas Malaya, Feiyi Wang |
| 2025 | KAMI: Communication-Avoiding General Matrix Multiplication within a Single GPU. Hemeng Wang, Yang Du, Sidu Li, Xiaowen Tian, Qingxiao Sun, Weifeng Liu |
| 2025 | Kilometer-Scale AI-Powered and Performance-Portable Earth System Model (AP3ESM) to Achieve Year-Scale Simulation Speed on Heterogeneous Supercomputers. Kai Xu, Maoxue Yu, Yuhu Chen, Jie Gao, Shuang Wang, Jiaying Song, Xiaohui Duan, Junwei Wei, Jiangfeng Yu, Hailong Liu, Jinrong Jiang, Yi Zhang, Pengfei Lin, Tianyi Wang, Pengfei Wang, Weipeng Zheng, Jingwei Xie, Jiakang Zhang, Zilu Liu, Xiaoyu Jin, Jilin Wei, Qixin Chang, Qingxia Lin, Yanzhi Zhou, Weiguo Liu, Wei Xue, Yiwen Li, Haohuan Fu, Yue Yu, Xuebin Chi, Lixin Wu |
| 2025 | LCI: a Lightweight Communication Interface for Efficient Asynchronous Multithreaded Communication. Jiakun Yan, Marc Snir |
| 2025 | LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving. Huanqi Hu, Bowen Xiao, Shixuan Sun, Jianian Yin, Zhexi Zhang, Xiang Luo, Chengquan Jiang, Weiqi Xu, Xiaoying Jia, Xin Liu, Minyi Guo |
| 2025 | LowDiff: Efficient Frequent Checkpointing via Low-Cost Differential for High-Performance Distributed Training Systems. Chenxuan Yao, Feifan Liu, Yuchong Hu, Zhengyu Liu, Xinjue Zheng, Wenxiang Zhou |
| 2025 | MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs. Wenjing Huang, Jinwu Yang, Shengquan Yin, Haoxu Li, Yida Gu, Zedong Liu, Xing Jing, Zheng Wei, Shiyuan Fu, Hao Hu, Guangming Tan, Dingwen Tao |
| 2025 | MISA-AKMC: Achieve Kinetic Monte Carlo Simulation of 20 Quadrillion Atoms on GPU Clusters. Shunde Li, Zhijie Pan, Ningming Nie, Jue Wang, He Bai, Genshen Chu, Yan Zeng, Xinfu He, Yangang Wang, Changjun Hu, Xuebin Chi |
| 2025 | MLP-Offload: Multi-Level, Multi-Path Offloading for LLM Pre-training to Break the GPU Memory Wall. Avinash Maurya, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae |
| 2025 | MXBLAS: Accelerating 8-bit Deep Learning with a Unified Micro-Scaled GEMM Library. Weihu Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng |
| 2025 | Make Updates Faster: A Fast Multi-Stripe Updates Framework in Erasure-Coded Storage Clusters. Hai Zhou, Dan Feng |
| 2025 | Matrix Is All You Need: Rearchitecting Quantum Chemistry to Scale on AI Accelerators. Haozhi Han, Kun Li, Fusong Ju, Qi Li, Hong An, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang |
| 2025 | MaverIQ: Fingerprint-Guided Extrapolation and Fragmentation-Aware Layering for Intent-Based LLM Serving. Dimitrios Liakopoulos, Prasoon Sinha, Tianrui Hu, Myungjin Lee, Neeraja J. Yadwadkar |
| 2025 | MetoHash: A Memory-Efficient and Traffic-Optimized Hashing Index on Hybrid PMem-DRAM Memories. Zixiang Yu, Guangyang Deng, Zhirong Shen, Qiangsheng Su, Ronglong Wu, Xiaoli Wang, Quanqing Xu, Chuanhui Yang, Zhifeng Bao |
| 2025 | Microscopic-Level Mouse Whole Cortex Simulation Composed of 9 Million Biophysical Neurons and 26 Billion Synapses on the Supercomputer Fugaku. Rin Kuriyama, Kaaya Akira, Laura Green, Beatriz Herrera, Kael Dai, Mari Iura, Gilles Gouaillardet, Asako Terasawa, Taira Kobayashi, Jun Igarashi, Anton Arkhipov, Tadashi Yamazaki |
| 2025 | Million-Atom Ab Initio Electron Dynamics: Discontinuous Galerkin Real-Time Time-Dependent Density Functional Theory. Junwei Feng, Junshi Chen, Xiangyu Zhang, Junhui Liu, Xinming Qin, Lingyun Wan, Sheng Chen, Wentiao Wu, Bingkun Hou, Yexuan Lin, Yihong Zhang, Zechuan Zhang, Yijun Hu, Weile Jia, Hong An, Jinlong Yang, Wei Hu |
| 2025 | Minimizing Power Waste in Heterogenous Computing via Adaptive Uncore Scaling. Zhong Zheng, Seyfal Sultanov, Michael E. Papka, Zhiling Lan |
| 2025 | Moment: Co-optimizing Physical Communication Topology and Data Placement for Multi-GPU Out-of-core GNN Training. Zuocheng Shi, Jie Sun, Ziyu Song, Mo Sun, Yang Xiao, Fei Wu, Zeke Wang |
| 2025 | Multiscale Light-Matter Dynamics in Quantum Materials: From Electrons to Topological Superlattices. Taufeq Mohammed Razakh, Thomas Linker, Ye Luo, Nariman Piroozan, Simon John Pennycook, Nalini Kumar, Albert Musaelian, Anders Johansson, Boris Kozinsky, Rajiv K. Kalia, Priya Vashishta, Fuyuki Shimojo, Shinnosuke Hattori, Ken-ichi Nomura, Aiichiro Nakano |
| 2025 | NNQS-SCI: Tackling Trillion-Dimensional Hilbert Space with Adaptive Neural Network Quantum States. Bowen Kan, Yumeng Zhou, Daiyou Xie, Pengyu Zhou, Yunquan Zhang, Honghui Shang |
| 2025 | Numerical Performance of the Implicitly Restarted Arnoldi Method in OFP8, Bfloat16, Posit, and Takum Arithmetics. Laslo Hunhold, James Quinlan, Stefan Wesner |
| 2025 | ODOS-MPI: HPC-Friendly SmartNIC Offloading of Computation/Communication Kernels. Muhammad Usman, Mariano Benito, Sergio Iserte, Antonio J. Peña |
| 2025 | ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling. Xiao Wang, Jong-Youl Choi, Takuya Kurihana, Isaac Lyngaas, Hong-Jun Yoon, Xi Xiao, David Pugmire, Ming Fan, Nasik Muhammad Nafi, Aristeidis Tsaris, Ashwin M. Aji, Maliha Hossain, Mohamed Wahib, Dali Wang, Peter E. Thornton, Prasanna Balaprakash, Moetasim Ashfaq, Dan Lu |
| 2025 | Optimizing Data Acquisitions in Multi-Robot Systems. Yanhao Li, Zijun Xu, Xuanjun Wen, Yanjie Song, Guancheng Li, Shu Yin |
| 2025 | Optimizing Quantum Circuit Mapping to Reduce Inter-Module Communications in Distributed Architectures. Longshan Xu, Edwin Hsing-Mean Sha, Xiulin Cui, Qingfeng Zhuge |
| 2025 | PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training. Seth Ockerman, Amal Gueroudji, Tanwi Mallick, Yixuan He, Line Pouchard, Robert B. Ross, Shivaram Venkataraman |
| 2025 | Parallel Rank-Adaptive Higher Order Orthogonal Iteration. João Pinheiro, Aditya Devarakonda, Grey Ballard |
| 2025 | PerfDojo: Automated ML Library Generation for Heterogeneous Architectures. Andrei Ivanov, Siyuan Shen, Gioele Gottardo, Marcin Chrapek, Afif Boudaoud, Timo Schneider, Luca Benini, Torsten Hoefler |
| 2025 | Phoenix: A Refactored I/O Stack for GPU Direct Storage without Phony Buffers. Jianqin Yan, Shi Qiu, Yina Lv, Yifan Hu, Hao Chen, Zhirong Shen, Xin Yao, Renhai Chen, Jiwu Shu, Gong Zhang, Yiming Zhang |
| 2025 | Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training. Aditya K. Ranjan, Siddharth Singh, Cunyang Wei, Abhinav Bhatele |
| 2025 | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2025, St. Louis, MO, USA, November 16-21, 2025 |
| 2025 | QDockBank: A dataset for Ligand Docking on Protein Fragments Predicted on Utility-Level Quantum Computers. Yuqi Zhang, Yuxin Yang, Cheng-Chang Lu, Weiwen Jiang, Feixiong Cheng, Bo Fang, Qiang Guan |
| 2025 | Qonductor: A Cloud Orchestrator for Quantum Computing. Emmanouil Giortamis, Francisco Romão, Nathaniel Tornow, Dmitry Lugovoy, Pramod Bhatotia |
| 2025 | RAPTOR: Practical Numerical Profiling of Scientific Applications. Faveo Hoerold, Ivan R. Ivanov, Akash Dhruv, William S. Moses, Anshu Dubey, Mohamed Wahib, Jens Domke |
| 2025 | Real-Time Bayesian Inference at Extreme Scale: A Digital Twin for Tsunami Early Warning Applied to the Cascadia Subduction Zone. Stefan Henneking, Sreeram Venkat, Veselin Dobrev, John Camier, Tzanio V. Kolev, Milinda Fernando, Alice-Agnes Gabriel, Omar Ghattas |
| 2025 | RedSan: A Redundant Memory Instruction Sanitizer for GPU Programs. Yanbo Zhao, Yueming Hao, Zecheng Li, Shuyin Jiao, Xu Liu, Jiajia Li |
| 2025 | Reproducibility Report for SC25 Paper ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage. Iacopo Colonnelli |
| 2025 | Reproducibility Report for SC25 Paper Ab-initio Quantum Transport with the GW Approximation, 42,240 Atoms, and Sustained Exascale Performance. Sayef Azad Sakin |
| 2025 | Reproducibility Report for SC25 Paper Addressing Reproducibility Challenges in HPC with Continuous Integration. Ruben Laso |
| 2025 | Reproducibility Report for SC25 Paper Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality. Jan Laukemann |
| 2025 | Reproducibility Report for SC25 Paper Bridging the Gap Between Binary and Source Based Package Management in Spack. Iacopo Colonnelli |
| 2025 | Reproducibility Report for SC25 Paper C.A.T.S.: Memory and Control Flow Tracing for Whole-Program Performance Analysis. Kurt H. Maier |
| 2025 | Reproducibility Report for SC25 Paper CPU- and GPU-initiated Communication Strategies for Conjugate Gradient Methods on Large GPU Clusters. Brian J. N. Wylie |
| 2025 | Reproducibility Report for SC25 Paper Caracal: A GPU-Resident Sparse LU Solver with Lightweight Fine-Grained Scheduling. Dogan Sagbili |
| 2025 | Reproducibility Report for SC25 Paper DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs. Sergej Breiter |
| 2025 | Reproducibility Report for SC25 Paper Demystifying the Resilience of Large Language Model Inference: An End-to-End Perspective. Sandra Wienke |
| 2025 | Reproducibility Report for SC25 Paper FaSTCC: Fast Sparse Tensor Contractions on CPUs. Marcel Koch |
| 2025 | Reproducibility Report for SC25 Paper GPU Lossy Compression for HPC Can Be Versatile and Ultra-Fast. Thomas Randall |
| 2025 | Reproducibility Report for SC25 Paper HP-MDR: High-performance and Portable Data Refactoring and Progressive Retrieval with Advanced GPUs. Philippe Swartvagher |
| 2025 | Reproducibility Report for SC25 Paper High-Performance Branch-Free Algorithms for Extended-Precision Floating-Point Arithmetic. Minh Chung |
| 2025 | Reproducibility Report for SC25 Paper KAMI: Communication-Avoiding General Matrix Multiplication within a Single GPU. Amir Raoofy |
| 2025 | Reproducibility Report for SC25 Paper MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs. Joshua Hoke Davis |
| 2025 | Reproducibility Report for SC25 Paper MLP-Offload: Multi-Level, Multi-Path Offloading for LLM Pre-training to Break the GPU Memory Wall. Benjamin Brock |
| 2025 | Reproducibility Report for SC25 Paper MXBLAS: Accelerating 8-bit Deep Learning with a Unified Micro-Scaled GEMM Library. Roberto R. Expósito |
| 2025 | Reproducibility Report for SC25 Paper MetoHash: A Memory-Efficient and Traffic-Optimized Hashing Index on Hybrid PMem-DRAM Memories. Alessio Orsino |
| 2025 | Reproducibility Report for SC25 Paper Moment: Co-optimizing Physical Communication Topology and Data Placement for Multi-GPU Out-of-core GNN Training. Minh Chung |
| 2025 | Reproducibility Report for SC25 Paper Numerical Performance of the Implicitly Restarted Arnoldi Method in OFP8, Bfloat16, Posit, and Takum Arithmetics. Pedro Bruel |
| 2025 | Reproducibility Report for SC25 Paper Optimizing Quantum Circuit Mapping to Reduce Inter-Module Communications in Distributed Architectures. Kurt H. Maier |
| 2025 | Reproducibility Report for SC25 Paper RAPTOR: Practical Numerical Profiling of Scientific Applications. Ruben Laso |
| 2025 | Reproducibility Report for SC25 Paper RedSan: A Redundant Memory Instruction Sanitizer for GPU Programs. Volker Weinberg |
| 2025 | Reproducibility Report for SC25 Paper SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching. Gianluca Mittone |
| 2025 | Reproducibility Report for SC25 Paper STZ: A High Quality and High Speed Streaming Lossy Compression Framework for Scientific Data. Marc-André Vef |
| 2025 | Reproducibility Report for SC25 Paper Sparsified Preconditioned Conjugate Gradient Solver on GPUs. Sixu Li |
| 2025 | Reproducibility Report for SC25 Paper Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs. Philippe Swartvagher |
| 2025 | Reproducibility Report for SC25 Paper TensorMD: Accelerating Molecular Dynamics with a High-Performance Machine Learning Interatomic Potential. Minh Chung |
| 2025 | Reproducibility Report for SC25 Paper ThirstyFLOPS: Water Footprint Modeling and Analysis Toward Sustainable HPC Systems. Shaina Smith |
| 2025 | Reproducibility Report for SC25 Paper TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU. Arjun Parab |
| 2025 | Reproducibility Report for SC25 Paper Uno: A One-Stop Solution for Inter- and Intra-Data Center Congestion Control and Reliable Connectivity. Strahinja Trecakov |
| 2025 | Reproducibility Report for SC25 Paper X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms. Joseph Schuchart |
| 2025 | Reproducibility Report for SC25 Paper XaaS Containers: Performance-Portable Representation With Source and IR Containers. Joao Vicente Ferreira Lima |
| 2025 | Reproducibility Report for SC25 Paper Zero-Value Code Specialization via Profile-Guided Control Data Flow Analysis. Quentin Guilloteau |
| 2025 | Reproducibility Report for SC25 Paper lsCOMP: Efficient Light Source Compression. Arjun Parab |
| 2025 | Reproducibility Report for SC25 Paper lsCOMP: Efficient Light Source Compression. Vinícius Garcia Pinto |
| 2025 | Rethinking Back Transformation in 2-stage Eigenvalue Decomposition on Heterogeneous Architectures. Hansheng Wang, Dajun Huang, Gaoyuan Zou, Lu Shi, Xu Jiang, Xi Wu, Hancong Duan, Shaoshuai Zhang |
| 2025 | RingX: Scalable Parallel Attention for Long-Context Learning on HPC. Junqi Yin, Mijanur Palash, Mallikarjun Shankar, Feiyi Wang |
| 2025 | SDR-RDMA: Software-Defined Reliability Architecture for Planetary Scale RDMA Communication. Mikhail Khalilov, Siyuan Shen, Marcin Chrapek, Tiancheng Chen, Kenji Nakano, Nicola Mazzoletti, Peter-Jan Gootzen, Salvatore Di Girolamo, Rami Nudelman, Gil Bloch, Jithin Jose, Abdul Kabbani, Sreevatsa Anantharamu, Jie Zhang, Konstantin Taranov, Zhuolong Yu, Scott Moe, Mahmoud Elhaddad, Torsten Hoefler |
| 2025 | SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching. Antonio De Caro, Gennaro Cordasco, Federico Ficarelli, Biagio Cosenza |
| 2025 | SIREN: Software Identification and Recognition in HPC Systems. Thomas Jakobsche, Fredrik Robertsén, Jessica R. Jones, Utz-Uwe Haus, Florina M. Ciorba |
| 2025 | STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for High Performance Parallel File Systems. Chris Egersdoerfer, Philip H. Carns, Shane Snyder, Robert Ross, Dong Dai |
| 2025 | STZ: A High Quality and High Speed Streaming Lossy Compression Framework for Scientific Data. Daoce Wang, Pascal Grosset, Jesus Pulido, Jiannan Tian, Tushar M. Athawale, Jinda Jia, Baixi Sun, Boyuan Zhang, Sian Jin, Kai Zhao, James P. Ahrens, Fengguang Song |
| 2025 | Scaling Out Chip Interconnect Networks with Implicit Sequence Numbers. Giyong Jung, Saeid Gorgin, John Kim, Jungrae Kim |
| 2025 | Scaling the memory wall using mixed-precision - HPG-MxP on an exascale machine. Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael A. Matheson, Sarp Oral, Feiyi Wang |
| 2025 | Simulating many-engine spacecraft: Exceeding 1 quadrillion degrees of freedom via information geometric regularization. Benjamin Wilfong, Anand Radhakrishnan, Henry Le Berre, Daniel Vickers, Tanush Prathi, Nikolaos Tselepidis, Benedikt Dorschner, Reuben D. Budiardja, Brian Cornille, Stephen Abbott, Florian Schäfer, Spencer H. Bryngelson |
| 2025 | SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training. Zhouyang Li, Yuliang Liu, Wei Zhang, Tailing Yuan, Bin Chen, Chengru Song |
| 2025 | SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation. Qi Li, Kun Li, Haozhi Han, Liang Yuan, Yunquan Zhang, Yifeng Chen, Junshi Chen, Hong An, Ting Cao, Mao Yang |
| 2025 | Sparsified Preconditioned Conjugate Gradient Solver on GPUs. Da Ma, Khalid Ahmad, Kazem Cheshmi, Hari Sundar, Mary W. Hall |
| 2025 | Stability-preserving Lossy Compression for Large-scale Partial Differential Equations. Qian Gong, Mark Ainsworth, Jieyang Chen, Xin Liang, Liangji Zhu, Ethan Klasky, Tushar M. Athawale, Qing Liu, Anand Rangarajan, Sanjay Ranka, Scott Klasky |
| 2025 | Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs. Shengkun Cui, Archit Patke, Hung Nguyen, Aditya Ranjan, Ziheng Chen, Phuong Cao, Gregory H. Bauer, Brett M. Bode, Catello Di Martino, Saurabh Jha, Chandra Narayanaswami, Daby Sow, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer |
| 2025 | StraGCN: GPU-Accelerated Strassen's Sparse-Dense Matrix Multiplication for Graph Convolutional Network Training. Weidong He, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Shuhao Zhang, Fubing Mao, Hai Jin |
| 2025 | T2-RELION: Task Parallelism, Tensor Core Accelerated RELION for Cryo-EM 3D Reconstruction. Jiayu Fu, Jingle Xu, Lin Gan, Tianqi Mao, Zirong Shen, Yinuo Wang, Zeyu Song, Xiaohui Duan, Wei Xue, Guangwen Yang |
| 2025 | TensorMD: Accelerating Molecular Dynamics with a High-Performance Machine Learning Interatomic Potential. Yucheng Ouyang, Xin Chen, Ying Liu, Honghui Shang, Zhenchuan Chen, Rongfen Lin, Xingyu Gao, Lifang Wang, Fang Li, Jiahao Shan, Haifeng Song, Huimin Cui, Xiaobing Feng, Jingling Xue |
| 2025 | TT-LoRA MoE: Using Parameter-Efficient Fine-Tuning and Sparse Mixture-Of-Experts. Pradip Kunwar, Minh N. Vu, Maanak Gupta, Mahmoud Abdelsalam, Manish Bhattarai |
| 2025 | TaGNN: An Efficient Topology-aware Accelerator for High-performance Dynamic Graph Neural Network. Hui Yu, Yu Zhang, Ligang He, Bing Peng, Jin Zhao, Zixiao Wang, Hao Qi, Hai Jin |
| 2025 | The First Star-by-star $N$-body/Hydrodynamics Simulation of Our Galaxy Coupling with a Surrogate Model. Keiya Hirashima, Michiko S. Fujii, Takayuki R. Saitoh, Naoto Harada, Kentaro Nomura, Kohji Yoshikawa, Yutaka Hirai, Tetsuro Asano, Kana Moriwaki, Masaki Iwasawa, Takashi Okamoto, Junichiro Makino |
| 2025 | ThirstyFLOPS: Water Footprint Modeling and Analysis Toward Sustainable HPC Systems. Yankai Jiang, Raghavendra Kanakagiri, Rohan Basu Roy, Devesh Tiwari |
| 2025 | TianheEngine: Hierarchy-aware Adaptive Partitioning System for Trillion-scale Graph Processing. Xinbiao Gan, Tiejun Li, Yiqi Wang, Qiang Zhang, Yongming Yi, Chunye Gong, Jie Liu, Kai Lu |
| 2025 | Towards Efficient LLM Inference via Collective and Adaptive Speculative Decoding. Siqi Wang, Hailong Yang, Xuezhu Wang, Tongxuan Liu, Pengbo Wang, Yufan Xu, Xuning Liang, Kejie Ma, Tianyu Feng, Xin You, Ruihao Gong, Rui Wang, Zhongzhi Luan, Yi Liu, Depei Qian |
| 2025 | TraceFlow: Efficient Trace Analysis for Large-Scale Parallel Applications via Interaction Pattern-Aware Trace Distribution. Yuyang Jin, Xirui Shui, Mingshu Zhai, Zan Zong, Feng Zhang, Felix Wolf, Jidong Zhai |
| 2025 | Trillion Ligands per Day: Performance-Portable Virtual Screening via Compound Database Optimization and Multi-Target Docking. Xiaohui Duan, Cheng Shen, Gaowei Chen, Shanshan Wu, Yizhen Wang, Yizhen Chen, Qixin Chang, Qiancheng Xia, Zekun Yin, Lin Gan, Yibing Shan, Guangwen Yang, Weiguo Liu, Niu Huang |
| 2025 | TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU. Shixun Wu, Yujia Zhai, Huangliang Dai, Yue Zhu, Haiyang Hu, Zizhong Chen |
| 2025 | UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling. Haoyu Yang, Zan Zong, Yuyang Jin, Kinman Lei, Jiaao He, Qigang Yang, Jidong Zhai |
| 2025 | Uno: A One-Stop Solution for Inter- and Intra-Data Center Congestion Control and Reliable Connectivity. Tommaso Bonato, Sepehr Abdous, Abdul Kabbani, Ahmad Ghalayini, Nadeen Gebara, Terry Lam, Anup Agarwal, Tiancheng Chen, Zhuolong Yu, Konstantin Taranov, Mahmoud Elhaddad, Daniele De Sensi, Soudeh Ghorbani, Torsten Hoefler |
| 2025 | UpANNS: Enhancing Billion-Scale ANNS Efficiency with Real-World PIM Architecture. Sitian Chen, Amelie Chi Zhou, Yucheng Shi, Yusen Li, Xin Yao |
| 2025 | Utilizing Sparsity in the GPU-accelerated Assembly of Schur Complement Matrices in Domain Decomposition Methods. Jakub Homola, Ondrej Meca, Lubomír Ríha, Tomás Brzobohatý |
| 2025 | Wasp: Efficient Asynchronous Single-Source Shortest Path on Multicore Systems via Work Stealing. Marco D'Antonio, Son Thai Mai, Philippas Tsigas, Hans Vandierendonck |
| 2025 | What to Support When You're Compressing: The State of Practice Gaps and Opportunities for Scientific Data Compression. Franck Cappello, Robert Underwood, Yuri Alexeev, Allison H. Baker, Ebru Bozdag, Martin Burtscher, Kyle Chard, Sheng Di, Kyle Gerard Felker, Paul Christopher O'Grady, Hanqi Guo, Yafan Huang, Peng Jiang, Sian Jin, Petter Johansson, Shaomeng Li, Xin Liang, Erik Lindahl, Peter Lindstrom, Zarija Lukic, Magnus Lundborg, Danylo Lykov, Masaru Nagaso, Kento Sato, Amarjit Singh, Seung Woo Son, Shihui Song, William Tang, Dingwen Tao, Jiannan Tian, Kazutomo Yoshii, Kai Zhao |
| 2025 | Workload Intelligence: Workload-Aware IaaS abstraction for Cloud Efficiency. Lexiang Huang, Anjaly Parayil, Jue Zhang, Xiaoting Qin, Chetan Bansal, Jovan Stojkovic, Pantea Zardoshti, Pulkit A. Misra, Eli Cortez, Raphael Ghelman, Íñigo Goiri, Saravan Rajmohan, Jim Kleewein, Rodrigo Fonseca, Timothy Zhu, Ricardo Bianchini |
| 2025 | X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms. Yueming Yuan, Ahan Gupta, Jianping Li, Sajal Dash, Feiyi Wang, Minjia Zhang |
| 2025 | XaaS Containers: Performance-Portable Representation With Source and IR Containers. Marcin Copik, Eiman Alnuaimi, Alok Kamatar, Valérie Hayot-Sasson, Alberto Madonna, Todd Gamblin, Kyle Chard, Ian T. Foster, Torsten Hoefler |
| 2025 | Zero-Value Code Specialization via Profile-Guided Control Data Flow Analysis. Shaokang Du, Kelun Lei, Xin You, Hailong Yang, Yufan Xu, Zhongzhi Luan, Yi Liu, Depei Qian |
| 2025 | cMPI: Using CXL Memory Sharing for MPI One-Sided and Two-Sided Inter-Node Communications. Xi Wang, Bin Ma, Jongryool Kim, Byungil Koh, Hoshik Kim, Dong Li |
| 2025 | coMtainer: Compilation-assisted HPC Container Images with Enhanced Adaptability. Yuhao Gu, Haoquan Chen, Xianjie Chen, Jiangsu Du, Zhiguang Chen, Nong Xiao, Xianwei Zhang, Yutong Lu |
| 2025 | gLLM: Global Balanced Pipeline Parallelism Systems for Distributed LLMs Serving with Token Throttling. Tianyu Guo, Xianwei Zhang, Jiangsu Du, Zhiguang Chen, Nong Xiao, Yutong Lu |
| 2025 | gParaKV: A GPGPU-accelerated Key-Value Separation-based KV Store with Optimized Compaction and Garbage Collection. Hui Sun, Xiangxiang Jiang, Xiao Qin, Song Jiang, Enhui Wang |
| 2025 | lsCOMP: Efficient Light Source Compression. Yafan Huang, Sheng Di, Robert Underwood, Peco Myint, Miaoqi Chu, Guanpeng Li, Nicholas Schwarz, Franck Cappello |
| 2025 | mLR: Scalable Laminography Reconstruction based on Memoization. Bin Ma, Viktor Nikitin, Xi Wang, Tekin Bicer, Dong Li |