SC A

182 papers

Year Title / Authors
2025A Nested Krylov Method Using Half-Precision Arithmetic.
Kengo Suzuki, Takeshi Iwashita
2025A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation.
Yangjie Zhou, Honglin Zhu, Qian Qiu, Weihao Cui, Zihan Liu, Peng Chen, Mohamed Wahib, Cong Guo, Siyuan Feng, Jintao Meng, Haidong Lan, Jingwen Leng, Yun Lin, Jin Song Dong, Wenxi Zhu, Minwen Deng
2025A Streaming Collectives Interface Targeting Dataflow Acceleration and HPC Workloads.
Nicholas Contini, Jake Queiser, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda
2025ACTINA: Adapting Circuit-Switching Techniques for AI Networking Architectures.
Zhenguo Wu, Benjamin Klenk, Larry Dennison, Keren Bergman
2025AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions.
Väinö Hatanpää, Eugene Ku, Jason Stock, Murali Emani, Sam Foreman, Chunyong Jung, Sandeep Madireddy, Tung Nguyen, Varuni Sastry, Ray A. O. Sinurat, Huihuo Zheng, Sam Wheeler, Troy Arcomano, Venkatram Vishwanath, Rao Kotamarthi
2025AGILE: Lightweight and Efficient Asynchronous GPU-SSD Integration.
Zhuoping Yang, Jinming Zhuang, Xingzhen Chen, Alex K. Jones, Peipei Zhou
2025AMRaCut: Scalable Partitioning for Adaptive Mesh Refinement.
Budvin Edippuliarachchi, David Van Komen, Hari Sundar
2025ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage.
Siyuan Shen, Tommaso Bonato, Zhiyi Hu, Pasquale Jordan, Tiancheng Chen, Torsten Hoefler
2025Ab-initio Quantum Transport with the GW Approximation, 42,240 Atoms, and Sustained Exascale Performance.
Nicolas Vetsch, Alexander Maeder, Vincent Maillou, Anders Winka, Jiang Cao, Grzegorz Kwasniewski, Leonard Deuschle, Torsten Hoefler, Alexandros Nikolaos Ziogas, Mathieu Luisier
2025Accelerated Spatio-Temporal Bayesian Modeling for Multivariate Gaussian Processes.
Lisa Gaedke-Merzhäuser, Vincent Maillou, Fernando Rodriguez Avellaneda, Olaf Schenk, Paula Moraga, Mathieu Luisier, Alexandros Nikolaos Ziogas, Håvard Rue
2025Addressing Reproducibility Challenges in HPC with Continuous Integration.
Valérie Hayot-Sasson, Nathaniel Hudson, André Bauer, Maxime Gonthier, Ian T. Foster, Kyle Chard
2025Advancing Quantum Many-Body GW Calculations on Exascale Supercomputing Platforms.
Benran Zhang, Daniel Weinberg, Chih-En Hsu, Aaron R. Altman, Yuming Shi, James B. White, Derek Vigil-Fowler, Steven G. Louie, Jack R. Deslippe, Felipe H. da Jornada, Zhenglu Li, Mauro Del Ben
2025Augmenting Simulated Noisy Quantum Data Collection by Orders of Magnitude Using Pre-Trajectory Sampling with Batched Execution.
Taylor Lee Patti, Thien Nguyen, Justin Gage Lietz, Alex McCaskey, Brucek Khailany
2025Automatic Generation of Mappings for Distributed Fourier Operations.
Doru-Thom Popovici, Botao Wu, John Shalf, Martin Kong
2025BLAZE: Exploiting Hybrid Parallelism and Size-customized Kernels to Accelerate BLASTP on GPUs.
Sree Charan Gundabolu, Mithuna Thottethodi, T. N. Vijaykumar
2025BOER: Enhancing Resource Utilization for Deep Learning Inference with Hybrid Spatial GPU Sharing.
Bowen Zhang, Yuhang Wang, Zhuozhao Li
2025Balanced and Elastic End-to-end Training of Dynamic LLMs.
Mohamed Wahib, Muhammed Abdullah Soyturk, Didem Unat
2025Benchmark-driven Models for Energy Analysis and Attribution of GPU-Accelerated Supercomputing.
Oscar Antepara, Zhengji Zhao, Brian Austin, Nan Ding, Leonid Oliker, Nicholas J. Wright, Samuel Williams
2025Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality.
Daniele De Sensi, Saverio Pasqualoni, Lorenzo Piarulli, Tommaso Bonato, Seydou Ba, Matteo Turisini, Jens Domke, Torsten Hoefler
2025Boosting Scientific Error-Bounded Lossy Compression through Optimized Synergistic Lossy-Lossless Orchestration.
Shixun Wu, Jinwen Pan, Jinyang Liu, Jiannan Tian, Ziwei Qiu, Jiajun Huang, Kai Zhao, Xin Liang, Sheng Di, Zizhong Chen, Franck Cappello
2025Breaking the System Noise Barrier at Exascale.
Edgar A. León, Joseph Glenski, Mark J. Stock, Kim H. McMahon, William Loewe, Clark Snyder, Larry Kaplan, Srinath Vadlamani, Timothy I. Mattox, Trent D'Hooge, Brian Behlendorf, Nathan Hanford, Ramesh Pankajakshan, Matthew L. Leininger
2025Bridging the Gap Between Binary and Source Based Package Management in Spack.
John Gouwar, Gregory Becker, Tamara Dahlgren, Nathan Hanford, Arjun Guha, Todd Gamblin
2025Bridging the Gap between Unstructured SpMM and Structured Sparse Tensor Cores.
Yukang Dong, Ziyuan Shen, Wenbin Jiang, Zhenghang Liu, Ye Xu, Bingyi He, Ran Zheng, Hai Jin
2025Bubble: Towards Scalable Evolving Graph Processing via Mini-Batch Sorting.
Long Deng, Yongkun Li, Zaigui Zhang, Yinlong Xu, John C. S. Lui
2025BurstEngine: An efficient distributed framework for training transformers On extremely Long sequences of over 1M tokens.
Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun
2025C.A.T.S.: Memory and Control Flow Tracing for Whole-Program Performance Analysis.
Philipp Schaad, Tal Ben-Nun, Torsten Hoefler
2025COSMOS: Performance Portable Graph Pattern Matching with Domain-Specific Software Distributed Shared Memory.
Zhiheng Lin, Ke Meng, Changjie Xu, Weichen Cao, Guangming Tan
2025CPU- and GPU-initiated Communication Strategies for Conjugate Gradient Methods on Large GPU Clusters.
James D. Trotter, Sinan Ekmekçibasi, Dogan Sagbili, Johannes Langguth, Xing Cai, Didem Unat
2025Caracal: A GPU-Resident Sparse LU Solver with Lightweight Fine-Grained Scheduling.
Jie Ren, Tingxuan Zhong, Yuxi Hong, Guofeng Feng, Xincheng Wang, Weile Jia, Hatem Ltaief, David Elliot Keyes
2025Characterizing Performance, Power, and Energy of AMD CDNA3 GPU Family.
Bagus Hanindhito, Bhavesh Patel
2025ChatHPC: Building the Foundations for a Productive and Trustworthy AI-Assisted HPC Ecosystem.
Pedro Valero-Lara, Aaron R. Young, Jeffrey S. Vetter, Zheming Jin, Swaroop Pophale, Mohammad Alaul Haque Monil, Keita Teranishi, William F. Godoy
2025Compile-Time QoS Scheme for Deep Learning Inferences.
Sungin Hong, Hyunjun Kim, Hwansoo Han
2025Computing the Full Earth System at 1km Resolution.
Daniel Klocke, Claudia Frauen, Jan Frederik Engels, Dmitry Alexeev, René Redler, Reiner Schnur, Helmuth Haak, Luis Kornblueh, Nils Brüggemann, Fatemeh Chegini, Manoel Römmer, Lars Hoffmann, Sabine Griessbach, Mathis Bode, Jonathan Coles, Miguel Gila, William Sawyer, Alexandru Calotoiu, Yakup Budanaz, Pratyai Mazumder, Marcin Copik, Benjamin Weber, Andreas Herten, Hendryk Bockelmann, Torsten Hoefler, Cathy Hohenegger, Bjorn Stevens
2025Constraint-Driven Auto-Tuning of GEMM-like Operators for MT-3000 Many-core Processor.
Xinxin Qi, Jianbin Fang, Peng Zhang, Yonggang Che, Jie Ren
2025Core Hours and Carbon Credits: Incentivizing Sustainability in HPC.
Alok Kamatar, Maxime Gonthier, Valérie Hayot-Sasson, André Bauer, Marcin Copik, Raul Castro Fernandez, Torsten Hoefler, Kyle Chard, Ian T. Foster
2025Cosmological Hydrodynamics at Exascale: A Trillion-Particle Leap in Capability.
Nicholas Frontiere, J. D. Emberson, Michael Buehlmann, Esteban M. Rangel, Salman Habib, Katrin Heitmann, Patricia Larsen, Vitali A. Morozov, Adrian Pope, Claude-André Faucher-Giguère, Antigoni Georgiadou, Damien Lebrun-Grandié, Andrey Prokopenko
2025DAS-ILU: A Distributed Asynchronous Parallel ILU Factorization Based on Domain Decomposition.
Fan Yuan, Shengguo Li, Xiaojian Yang, Yunqing Huang, Hongxia Wang, Chuanfu Xu, Dezun Dong, Tiejun Li, Jianchun Wang, Jie Liu
2025DHAP: Towards Efficient OLAP in a Disaggregated and Heterogeneous Environment.
Guangda Liu, Chenqi Zhang, Yizhou Shan, Hao Feng, Zeke Wang, Shixuan Sun, Minyi Guo, Jieru Zhao
2025DPAR: High-Performance, Secure, and Scalable Differential Privacy-based AllReduce.
Hao Qi, Weicong Chen, Chenghong Wang, Xiaoyi Lu
2025DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs.
Mingkai Chen, Tianhua Han, Cheng Liu, Shengwen Liang, Kuai Yu, Lei Dai, Ziming Yuan, Ying Wang, Lei Zhang, Huawei Li, Xiaowei Li
2025Deep Learning-Enabled Supercritical Flame Simulation at Detailed Chemistry and Real-Fluid Accuracy Towards Trillion-Cell Scale.
Zhuoqiang Guo, Runze Mao, Lijun Liu, Guangming Tan, Weile Jia, Zhi X. Chen
2025Demystifying the Resilience of Large Language Model Inference: An End-to-End Perspective.
Yu Sun, Zachary Coalson, Shiyang Chen, Hang Liu, Zhao Zhang, Sanghyun Hong, Bo Fang, Lishan Yang
2025Deploying Lightweight Input-Aware Selective Instruction Duplication in HPC Applications.
Md Hasanur Rahman, Guanpeng Li
2025Destination Earth: The Climate Change Adaptation Digital Twin.
Ioan Hadade, Daniel Klocke, Jussi Enkovaara, Tuomas Lunttila, Thomas Rackow, Jan Frederik Engels, Claudia Frauen, René Redler, Jenni Kontkanen, Thomas Jung, Dmitry V. Sein, Irina Sandu, Balthasar Reuter, Nils Wedi, Sebastian Milinski, Francisco Doblas-Reyes, Miguel Castrillo, Mario C. Acosta, Sergi Girona, Pekka Manninen
2025Diff-MoE: Efficient Batched MoE Inference with Priority-Driven Differential Expert Caching.
Kexin Li, Wenkan Huang, Qinggang Wang, Long Zheng, Xiaofei Liao, Hai Jin, Jingling Xue
2025Distributed Cross-Channel Hierarchical Aggregation for Foundation Models.
Aristeidis Tsaris, Isaac Lyngaas, John H. Lagergren, Mohamed Wahib, Larry M. York, Prasanna Balaprakash, Dan Lu, Feiyi Wang, Xiao Wang
2025EDDE: Container Deployment Framework Beyond the Cloud.
Hao Fan, Zhuo Huang, Shadi Ibrahim, Lin Gu, Song Wu
2025Effective Node-Level Anomaly Detection in HPC Systems via Coarse-Grained Clustering and Fine-Grained Model Sharing.
Sibo Xia, Yongqian Sun, Xijie Pan, Yuan Yuan, Shenglin Zhang, Shaoyu Hu, Lei Tao, Yuqi Li, Jinghua Feng
2025Exploring and Mitigating Failure Behavior of Large Language Model Training Workloads in HPC Systems.
Pengfei Yu, Jingjing Gu, Hao Han, Dazhong Shen, Bao Wen, Yang Liu
2025FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention.
Huangliang Dai, Shixun Wu, Jiajun Huang, Zizhe Jian, Yue Zhu, Haiyang Hu, Zizhong Chen
2025FaSTCC: Fast Sparse Tensor Contractions on CPUs.
Saurabh Raje, Hunter McCoy, Atanas Rountev, Prashant Pandey, P. Sadayappan
2025Fine-grained Automated Failure Management for Extreme-Scale GPU Accelerated Systems.
Yonatan Levitt, Richard Barella, Sam Zeltner, Thomas Musta, Lance Cheney, Gustavo Espinosa, Olivier Franza, Balazs Gerofi
2025Fringe-SGC: Counting Subgraphs with Fringe Vertices.
Cameron Bradley, Ghadeer Ahmed H. Alabandi, Martin Burtscher
2025GPU Lossy Compression for HPC Can Be Versatile and Ultra-Fast.
Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello
2025Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction.
Xiao Li, Liangji Zhu, Anand Rangarajan, Sanjay Ranka
2025Graphago: Accelerating SSD-based Graph Processing via Activity-Aware Graph Preprocessing.
Xianghao Xu, Yucheng Zhang, Gongxuan Zhang, Yongli Cheng, Fang Wang
2025GreenMix: Energy-Efficient Serverless Computing via Randomized Sketching on Asymmetric Multi-Cores.
Rohan Basu Roy, Tirthak Patel, Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari
2025HELM: Characterizing Unified Memory Accesses to Improve GPU Performance under Memory Oversubscription.
Nathan Jones, Tyler N. Allen, Rong Ge
2025HP-MDR: High-performance and Portable Data Refactoring and Progressive Retrieval with Advanced GPUs.
Yanliang Li, Wenbo Li, Qian Gong, Qing Liu, Norbert Podhorszki, Scott Klasky, Xin Liang, Jieyang Chen
2025HPC-R1: Characterizing R1-like Large Reasoning Models on HPC.
Adam Weingram, Duo Zhang, Zhonghao Chen, Hao Qi, Xiaoyi Lu
2025HStencil: Matrix-Vector Stencil Computation with Interleaved Outer Product and MLA.
Han Huang, Jiabin Xie, Guangnan Feng, Xianwei Zhang, Dan Huang, Zhiguang Chen, Yutong Lu
2025Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism.
Zizhao Mo, Jianxiong Liao, Huanle Xu, Zhi Zhou, Chengzhong Xu
2025High-Performance Branch-Free Algorithms for Extended-Precision Floating-Point Arithmetic.
David Kai Zhang, Alex Aiken
2025HyTiS: Hybrid Tile Scheduling for GPU GEMM with Enhanced Wave Utilization and Cache Locality.
Zheng Zhang, Hulin Wang, Hongming Xu, Donglin Yang, Xiaobo Zhou, Dazhao Cheng
2025Hypertron: Efficiently Scaling Large Models by Exploring High-Dimensional Parallelization Space.
Shigang Li, Jingkun Dong, Jihao Chen, Zhi Ma, Zhongzhe Hu
2025Improving SpGEMM Performance Through Matrix-Reordering and Cluster-wise Computation.
Abdullah Al Raqibul Islam, Helen Xu, Dong Dai, Aydin Buluç
2025Insights from Optimizing HPL Performance on Exascale Systems: A Comparative Analysis of Panel Factorization.
Hao Lu, Michael A. Matheson, Noel Chalmers, Aditya Kashi, Nicholas Malaya, Feiyi Wang
2025KAMI: Communication-Avoiding General Matrix Multiplication within a Single GPU.
Hemeng Wang, Yang Du, Sidu Li, Xiaowen Tian, Qingxiao Sun, Weifeng Liu
2025Kilometer-Scale AI-Powered and Performance-Portable Earth System Model (AP3ESM) to Achieve Year-Scale Simulation Speed on Heterogeneous Supercomputers.
Kai Xu, Maoxue Yu, Yuhu Chen, Jie Gao, Shuang Wang, Jiaying Song, Xiaohui Duan, Junwei Wei, Jiangfeng Yu, Hailong Liu, Jinrong Jiang, Yi Zhang, Pengfei Lin, Tianyi Wang, Pengfei Wang, Weipeng Zheng, Jingwei Xie, Jiakang Zhang, Zilu Liu, Xiaoyu Jin, Jilin Wei, Qixin Chang, Qingxia Lin, Yanzhi Zhou, Weiguo Liu, Wei Xue, Yiwen Li, Haohuan Fu, Yue Yu, Xuebin Chi, Lixin Wu
2025LCI: a Lightweight Communication Interface for Efficient Asynchronous Multithreaded Communication.
Jiakun Yan, Marc Snir
2025LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving.
Huanqi Hu, Bowen Xiao, Shixuan Sun, Jianian Yin, Zhexi Zhang, Xiang Luo, Chengquan Jiang, Weiqi Xu, Xiaoying Jia, Xin Liu, Minyi Guo
2025LowDiff: Efficient Frequent Checkpointing via Low-Cost Differential for High-Performance Distributed Training Systems.
Chenxuan Yao, Feifan Liu, Yuchong Hu, Zhengyu Liu, Xinjue Zheng, Wenxiang Zhou
2025MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs.
Wenjing Huang, Jinwu Yang, Shengquan Yin, Haoxu Li, Yida Gu, Zedong Liu, Xing Jing, Zheng Wei, Shiyuan Fu, Hao Hu, Guangming Tan, Dingwen Tao
2025MISA-AKMC: Achieve Kinetic Monte Carlo Simulation of 20 Quadrillion Atoms on GPU Clusters.
Shunde Li, Zhijie Pan, Ningming Nie, Jue Wang, He Bai, Genshen Chu, Yan Zeng, Xinfu He, Yangang Wang, Changjun Hu, Xuebin Chi
2025MLP-Offload: Multi-Level, Multi-Path Offloading for LLM Pre-training to Break the GPU Memory Wall.
Avinash Maurya, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae
2025MXBLAS: Accelerating 8-bit Deep Learning with a Unified Micro-Scaled GEMM Library.
Weihu Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng
2025Make Updates Faster: A Fast Multi-Stripe Updates Framework in Erasure-Coded Storage Clusters.
Hai Zhou, Dan Feng
2025Matrix Is All You Need: Rearchitecting Quantum Chemistry to Scale on AI Accelerators.
Haozhi Han, Kun Li, Fusong Ju, Qi Li, Hong An, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang
2025MaverIQ: Fingerprint-Guided Extrapolation and Fragmentation-Aware Layering for Intent-Based LLM Serving.
Dimitrios Liakopoulos, Prasoon Sinha, Tianrui Hu, Myungjin Lee, Neeraja J. Yadwadkar
2025MetoHash: A Memory-Efficient and Traffic-Optimized Hashing Index on Hybrid PMem-DRAM Memories.
Zixiang Yu, Guangyang Deng, Zhirong Shen, Qiangsheng Su, Ronglong Wu, Xiaoli Wang, Quanqing Xu, Chuanhui Yang, Zhifeng Bao
2025Microscopic-Level Mouse Whole Cortex Simulation Composed of 9 Million Biophysical Neurons and 26 Billion Synapses on the Supercomputer Fugaku.
Rin Kuriyama, Kaaya Akira, Laura Green, Beatriz Herrera, Kael Dai, Mari Iura, Gilles Gouaillardet, Asako Terasawa, Taira Kobayashi, Jun Igarashi, Anton Arkhipov, Tadashi Yamazaki
2025Million-Atom Ab Initio Electron Dynamics: Discontinuous Galerkin Real-Time Time-Dependent Density Functional Theory.
Junwei Feng, Junshi Chen, Xiangyu Zhang, Junhui Liu, Xinming Qin, Lingyun Wan, Sheng Chen, Wentiao Wu, Bingkun Hou, Yexuan Lin, Yihong Zhang, Zechuan Zhang, Yijun Hu, Weile Jia, Hong An, Jinlong Yang, Wei Hu
2025Minimizing Power Waste in Heterogenous Computing via Adaptive Uncore Scaling.
Zhong Zheng, Seyfal Sultanov, Michael E. Papka, Zhiling Lan
2025Moment: Co-optimizing Physical Communication Topology and Data Placement for Multi-GPU Out-of-core GNN Training.
Zuocheng Shi, Jie Sun, Ziyu Song, Mo Sun, Yang Xiao, Fei Wu, Zeke Wang
2025Multiscale Light-Matter Dynamics in Quantum Materials: From Electrons to Topological Superlattices.
Taufeq Mohammed Razakh, Thomas Linker, Ye Luo, Nariman Piroozan, Simon John Pennycook, Nalini Kumar, Albert Musaelian, Anders Johansson, Boris Kozinsky, Rajiv K. Kalia, Priya Vashishta, Fuyuki Shimojo, Shinnosuke Hattori, Ken-ichi Nomura, Aiichiro Nakano
2025NNQS-SCI: Tackling Trillion-Dimensional Hilbert Space with Adaptive Neural Network Quantum States.
Bowen Kan, Yumeng Zhou, Daiyou Xie, Pengyu Zhou, Yunquan Zhang, Honghui Shang
2025Numerical Performance of the Implicitly Restarted Arnoldi Method in OFP8, Bfloat16, Posit, and Takum Arithmetics.
Laslo Hunhold, James Quinlan, Stefan Wesner
2025ODOS-MPI: HPC-Friendly SmartNIC Offloading of Computation/Communication Kernels.
Muhammad Usman, Mariano Benito, Sergio Iserte, Antonio J. Peña
2025ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling.
Xiao Wang, Jong-Youl Choi, Takuya Kurihana, Isaac Lyngaas, Hong-Jun Yoon, Xi Xiao, David Pugmire, Ming Fan, Nasik Muhammad Nafi, Aristeidis Tsaris, Ashwin M. Aji, Maliha Hossain, Mohamed Wahib, Dali Wang, Peter E. Thornton, Prasanna Balaprakash, Moetasim Ashfaq, Dan Lu
2025Optimizing Data Acquisitions in Multi-Robot Systems.
Yanhao Li, Zijun Xu, Xuanjun Wen, Yanjie Song, Guancheng Li, Shu Yin
2025Optimizing Quantum Circuit Mapping to Reduce Inter-Module Communications in Distributed Architectures.
Longshan Xu, Edwin Hsing-Mean Sha, Xiulin Cui, Qingfeng Zhuge
2025PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training.
Seth Ockerman, Amal Gueroudji, Tanwi Mallick, Yixuan He, Line Pouchard, Robert B. Ross, Shivaram Venkataraman
2025Parallel Rank-Adaptive Higher Order Orthogonal Iteration.
João Pinheiro, Aditya Devarakonda, Grey Ballard
2025PerfDojo: Automated ML Library Generation for Heterogeneous Architectures.
Andrei Ivanov, Siyuan Shen, Gioele Gottardo, Marcin Chrapek, Afif Boudaoud, Timo Schneider, Luca Benini, Torsten Hoefler
2025Phoenix: A Refactored I/O Stack for GPU Direct Storage without Phony Buffers.
Jianqin Yan, Shi Qiu, Yina Lv, Yifan Hu, Hao Chen, Zhirong Shen, Xin Yao, Renhai Chen, Jiwu Shu, Gong Zhang, Yiming Zhang
2025Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training.
Aditya K. Ranjan, Siddharth Singh, Cunyang Wei, Abhinav Bhatele
2025Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2025, St. Louis, MO, USA, November 16-21, 2025
2025QDockBank: A dataset for Ligand Docking on Protein Fragments Predicted on Utility-Level Quantum Computers.
Yuqi Zhang, Yuxin Yang, Cheng-Chang Lu, Weiwen Jiang, Feixiong Cheng, Bo Fang, Qiang Guan
2025Qonductor: A Cloud Orchestrator for Quantum Computing.
Emmanouil Giortamis, Francisco Romão, Nathaniel Tornow, Dmitry Lugovoy, Pramod Bhatotia
2025RAPTOR: Practical Numerical Profiling of Scientific Applications.
Faveo Hoerold, Ivan R. Ivanov, Akash Dhruv, William S. Moses, Anshu Dubey, Mohamed Wahib, Jens Domke
2025Real-Time Bayesian Inference at Extreme Scale: A Digital Twin for Tsunami Early Warning Applied to the Cascadia Subduction Zone.
Stefan Henneking, Sreeram Venkat, Veselin Dobrev, John Camier, Tzanio V. Kolev, Milinda Fernando, Alice-Agnes Gabriel, Omar Ghattas
2025RedSan: A Redundant Memory Instruction Sanitizer for GPU Programs.
Yanbo Zhao, Yueming Hao, Zecheng Li, Shuyin Jiao, Xu Liu, Jiajia Li
2025Reproducibility Report for SC25 Paper ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage.
Iacopo Colonnelli
2025Reproducibility Report for SC25 Paper Ab-initio Quantum Transport with the GW Approximation, 42,240 Atoms, and Sustained Exascale Performance.
Sayef Azad Sakin
2025Reproducibility Report for SC25 Paper Addressing Reproducibility Challenges in HPC with Continuous Integration.
Ruben Laso
2025Reproducibility Report for SC25 Paper Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality.
Jan Laukemann
2025Reproducibility Report for SC25 Paper Bridging the Gap Between Binary and Source Based Package Management in Spack.
Iacopo Colonnelli
2025Reproducibility Report for SC25 Paper C.A.T.S.: Memory and Control Flow Tracing for Whole-Program Performance Analysis.
Kurt H. Maier
2025Reproducibility Report for SC25 Paper CPU- and GPU-initiated Communication Strategies for Conjugate Gradient Methods on Large GPU Clusters.
Brian J. N. Wylie
2025Reproducibility Report for SC25 Paper Caracal: A GPU-Resident Sparse LU Solver with Lightweight Fine-Grained Scheduling.
Dogan Sagbili
2025Reproducibility Report for SC25 Paper DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs.
Sergej Breiter
2025Reproducibility Report for SC25 Paper Demystifying the Resilience of Large Language Model Inference: An End-to-End Perspective.
Sandra Wienke
2025Reproducibility Report for SC25 Paper FaSTCC: Fast Sparse Tensor Contractions on CPUs.
Marcel Koch
2025Reproducibility Report for SC25 Paper GPU Lossy Compression for HPC Can Be Versatile and Ultra-Fast.
Thomas Randall
2025Reproducibility Report for SC25 Paper HP-MDR: High-performance and Portable Data Refactoring and Progressive Retrieval with Advanced GPUs.
Philippe Swartvagher
2025Reproducibility Report for SC25 Paper High-Performance Branch-Free Algorithms for Extended-Precision Floating-Point Arithmetic.
Minh Chung
2025Reproducibility Report for SC25 Paper KAMI: Communication-Avoiding General Matrix Multiplication within a Single GPU.
Amir Raoofy
2025Reproducibility Report for SC25 Paper MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs.
Joshua Hoke Davis
2025Reproducibility Report for SC25 Paper MLP-Offload: Multi-Level, Multi-Path Offloading for LLM Pre-training to Break the GPU Memory Wall.
Benjamin Brock
2025Reproducibility Report for SC25 Paper MXBLAS: Accelerating 8-bit Deep Learning with a Unified Micro-Scaled GEMM Library.
Roberto R. Expósito
2025Reproducibility Report for SC25 Paper MetoHash: A Memory-Efficient and Traffic-Optimized Hashing Index on Hybrid PMem-DRAM Memories.
Alessio Orsino
2025Reproducibility Report for SC25 Paper Moment: Co-optimizing Physical Communication Topology and Data Placement for Multi-GPU Out-of-core GNN Training.
Minh Chung
2025Reproducibility Report for SC25 Paper Numerical Performance of the Implicitly Restarted Arnoldi Method in OFP8, Bfloat16, Posit, and Takum Arithmetics.
Pedro Bruel
2025Reproducibility Report for SC25 Paper Optimizing Quantum Circuit Mapping to Reduce Inter-Module Communications in Distributed Architectures.
Kurt H. Maier
2025Reproducibility Report for SC25 Paper RAPTOR: Practical Numerical Profiling of Scientific Applications.
Ruben Laso
2025Reproducibility Report for SC25 Paper RedSan: A Redundant Memory Instruction Sanitizer for GPU Programs.
Volker Weinberg
2025Reproducibility Report for SC25 Paper SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching.
Gianluca Mittone
2025Reproducibility Report for SC25 Paper STZ: A High Quality and High Speed Streaming Lossy Compression Framework for Scientific Data.
Marc-André Vef
2025Reproducibility Report for SC25 Paper Sparsified Preconditioned Conjugate Gradient Solver on GPUs.
Sixu Li
2025Reproducibility Report for SC25 Paper Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs.
Philippe Swartvagher
2025Reproducibility Report for SC25 Paper TensorMD: Accelerating Molecular Dynamics with a High-Performance Machine Learning Interatomic Potential.
Minh Chung
2025Reproducibility Report for SC25 Paper ThirstyFLOPS: Water Footprint Modeling and Analysis Toward Sustainable HPC Systems.
Shaina Smith
2025Reproducibility Report for SC25 Paper TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU.
Arjun Parab
2025Reproducibility Report for SC25 Paper Uno: A One-Stop Solution for Inter- and Intra-Data Center Congestion Control and Reliable Connectivity.
Strahinja Trecakov
2025Reproducibility Report for SC25 Paper X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms.
Joseph Schuchart
2025Reproducibility Report for SC25 Paper XaaS Containers: Performance-Portable Representation With Source and IR Containers.
Joao Vicente Ferreira Lima
2025Reproducibility Report for SC25 Paper Zero-Value Code Specialization via Profile-Guided Control Data Flow Analysis.
Quentin Guilloteau
2025Reproducibility Report for SC25 Paper lsCOMP: Efficient Light Source Compression.
Arjun Parab
2025Reproducibility Report for SC25 Paper lsCOMP: Efficient Light Source Compression.
Vinícius Garcia Pinto
2025Rethinking Back Transformation in 2-stage Eigenvalue Decomposition on Heterogeneous Architectures.
Hansheng Wang, Dajun Huang, Gaoyuan Zou, Lu Shi, Xu Jiang, Xi Wu, Hancong Duan, Shaoshuai Zhang
2025RingX: Scalable Parallel Attention for Long-Context Learning on HPC.
Junqi Yin, Mijanur Palash, Mallikarjun Shankar, Feiyi Wang
2025SDR-RDMA: Software-Defined Reliability Architecture for Planetary Scale RDMA Communication.
Mikhail Khalilov, Siyuan Shen, Marcin Chrapek, Tiancheng Chen, Kenji Nakano, Nicola Mazzoletti, Peter-Jan Gootzen, Salvatore Di Girolamo, Rami Nudelman, Gil Bloch, Jithin Jose, Abdul Kabbani, Sreevatsa Anantharamu, Jie Zhang, Konstantin Taranov, Zhuolong Yu, Scott Moe, Mahmoud Elhaddad, Torsten Hoefler
2025SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching.
Antonio De Caro, Gennaro Cordasco, Federico Ficarelli, Biagio Cosenza
2025SIREN: Software Identification and Recognition in HPC Systems.
Thomas Jakobsche, Fredrik Robertsén, Jessica R. Jones, Utz-Uwe Haus, Florina M. Ciorba
2025STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for High Performance Parallel File Systems.
Chris Egersdoerfer, Philip H. Carns, Shane Snyder, Robert Ross, Dong Dai
2025STZ: A High Quality and High Speed Streaming Lossy Compression Framework for Scientific Data.
Daoce Wang, Pascal Grosset, Jesus Pulido, Jiannan Tian, Tushar M. Athawale, Jinda Jia, Baixi Sun, Boyuan Zhang, Sian Jin, Kai Zhao, James P. Ahrens, Fengguang Song
2025Scaling Out Chip Interconnect Networks with Implicit Sequence Numbers.
Giyong Jung, Saeid Gorgin, John Kim, Jungrae Kim
2025Scaling the memory wall using mixed-precision - HPG-MxP on an exascale machine.
Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael A. Matheson, Sarp Oral, Feiyi Wang
2025Simulating many-engine spacecraft: Exceeding 1 quadrillion degrees of freedom via information geometric regularization.
Benjamin Wilfong, Anand Radhakrishnan, Henry Le Berre, Daniel Vickers, Tanush Prathi, Nikolaos Tselepidis, Benedikt Dorschner, Reuben D. Budiardja, Brian Cornille, Stephen Abbott, Florian Schäfer, Spencer H. Bryngelson
2025SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training.
Zhouyang Li, Yuliang Liu, Wei Zhang, Tailing Yuan, Bin Chen, Chengru Song
2025SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation.
Qi Li, Kun Li, Haozhi Han, Liang Yuan, Yunquan Zhang, Yifeng Chen, Junshi Chen, Hong An, Ting Cao, Mao Yang
2025Sparsified Preconditioned Conjugate Gradient Solver on GPUs.
Da Ma, Khalid Ahmad, Kazem Cheshmi, Hari Sundar, Mary W. Hall
2025Stability-preserving Lossy Compression for Large-scale Partial Differential Equations.
Qian Gong, Mark Ainsworth, Jieyang Chen, Xin Liang, Liangji Zhu, Ethan Klasky, Tushar M. Athawale, Qing Liu, Anand Rangarajan, Sanjay Ranka, Scott Klasky
2025Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs.
Shengkun Cui, Archit Patke, Hung Nguyen, Aditya Ranjan, Ziheng Chen, Phuong Cao, Gregory H. Bauer, Brett M. Bode, Catello Di Martino, Saurabh Jha, Chandra Narayanaswami, Daby Sow, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
2025StraGCN: GPU-Accelerated Strassen's Sparse-Dense Matrix Multiplication for Graph Convolutional Network Training.
Weidong He, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Shuhao Zhang, Fubing Mao, Hai Jin
2025T2-RELION: Task Parallelism, Tensor Core Accelerated RELION for Cryo-EM 3D Reconstruction.
Jiayu Fu, Jingle Xu, Lin Gan, Tianqi Mao, Zirong Shen, Yinuo Wang, Zeyu Song, Xiaohui Duan, Wei Xue, Guangwen Yang
2025TensorMD: Accelerating Molecular Dynamics with a High-Performance Machine Learning Interatomic Potential.
Yucheng Ouyang, Xin Chen, Ying Liu, Honghui Shang, Zhenchuan Chen, Rongfen Lin, Xingyu Gao, Lifang Wang, Fang Li, Jiahao Shan, Haifeng Song, Huimin Cui, Xiaobing Feng, Jingling Xue
2025TT-LoRA MoE: Using Parameter-Efficient Fine-Tuning and Sparse Mixture-Of-Experts.
Pradip Kunwar, Minh N. Vu, Maanak Gupta, Mahmoud Abdelsalam, Manish Bhattarai
2025TaGNN: An Efficient Topology-aware Accelerator for High-performance Dynamic Graph Neural Network.
Hui Yu, Yu Zhang, Ligang He, Bing Peng, Jin Zhao, Zixiao Wang, Hao Qi, Hai Jin
2025The First Star-by-star N-body/Hydrodynamics Simulation of Our Galaxy Coupling with a Surrogate Model.
Keiya Hirashima, Michiko S. Fujii, Takayuki R. Saitoh, Naoto Harada, Kentaro Nomura, Kohji Yoshikawa, Yutaka Hirai, Tetsuro Asano, Kana Moriwaki, Masaki Iwasawa, Takashi Okamoto, Junichiro Makino
2025ThirstyFLOPS: Water Footprint Modeling and Analysis Toward Sustainable HPC Systems.
Yankai Jiang, Raghavendra Kanakagiri, Rohan Basu Roy, Devesh Tiwari
2025TianheEngine: Hierarchy-aware Adaptive Partitioning System for Trillion-scale Graph Processing.
Xinbiao Gan, Tiejun Li, Yiqi Wang, Qiang Zhang, Yongming Yi, Chunye Gong, Jie Liu, Kai Lu
2025Towards Efficient LLM Inference via Collective and Adaptive Speculative Decoding.
Siqi Wang, Hailong Yang, Xuezhu Wang, Tongxuan Liu, Pengbo Wang, Yufan Xu, Xuning Liang, Kejie Ma, Tianyu Feng, Xin You, Ruihao Gong, Rui Wang, Zhongzhi Luan, Yi Liu, Depei Qian
2025TraceFlow: Efficient Trace Analysis for Large-Scale Parallel Applications via Interaction Pattern-Aware Trace Distribution.
Yuyang Jin, Xirui Shui, Mingshu Zhai, Zan Zong, Feng Zhang, Felix Wolf, Jidong Zhai
2025Trillion Ligands per Day: Performance-Portable Virtual Screening via Compound Database Optimization and Multi-Target Docking.
Xiaohui Duan, Cheng Shen, Gaowei Chen, Shanshan Wu, Yizhen Wang, Yizhen Chen, Qixin Chang, Qiancheng Xia, Zekun Yin, Lin Gan, Yibing Shan, Guangwen Yang, Weiguo Liu, Niu Huang
2025TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU.
Shixun Wu, Yujia Zhai, Huangliang Dai, Yue Zhu, Haiyang Hu, Zizhong Chen
2025UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling.
Haoyu Yang, Zan Zong, Yuyang Jin, Kinman Lei, Jiaao He, Qigang Yang, Jidong Zhai
2025Uno: A One-Stop Solution for Inter- and Intra-Data Center Congestion Control and Reliable Connectivity.
Tommaso Bonato, Sepehr Abdous, Abdul Kabbani, Ahmad Ghalayini, Nadeen Gebara, Terry Lam, Anup Agarwal, Tiancheng Chen, Zhuolong Yu, Konstantin Taranov, Mahmoud Elhaddad, Daniele De Sensi, Soudeh Ghorbani, Torsten Hoefler
2025UpANNS: Enhancing Billion-Scale ANNS Efficiency with Real-World PIM Architecture.
Sitian Chen, Amelie Chi Zhou, Yucheng Shi, Yusen Li, Xin Yao
2025Utilizing Sparsity in the GPU-accelerated Assembly of Schur Complement Matrices in Domain Decomposition Methods.
Jakub Homola, Ondrej Meca, Lubomír Ríha, Tomás Brzobohatý
2025Wasp: Efficient Asynchronous Single-Source Shortest Path on Multicore Systems via Work Stealing.
Marco D'Antonio, Son Thai Mai, Philippas Tsigas, Hans Vandierendonck
2025What to Support When You're Compressing: The State of Practice Gaps and Opportunities for Scientific Data Compression.
Franck Cappello, Robert Underwood, Yuri Alexeev, Allison H. Baker, Ebru Bozdag, Martin Burtscher, Kyle Chard, Sheng Di, Kyle Gerard Felker, Paul Christopher O'Grady, Hanqi Guo, Yafan Huang, Peng Jiang, Sian Jin, Petter Johansson, Shaomeng Li, Xin Liang, Erik Lindahl, Peter Lindstrom, Zarija Lukic, Magnus Lundborg, Danylo Lykov, Masaru Nagaso, Kento Sato, Amarjit Singh, Seung Woo Son, Shihui Song, William Tang, Dingwen Tao, Jiannan Tian, Kazutomo Yoshii, Kai Zhao
2025Workload Intelligence: Workload-Aware IaaS abstraction for Cloud Efficiency.
Lexiang Huang, Anjaly Parayil, Jue Zhang, Xiaoting Qin, Chetan Bansal, Jovan Stojkovic, Pantea Zardoshti, Pulkit A. Misra, Eli Cortez, Raphael Ghelman, Íñigo Goiri, Saravan Rajmohan, Jim Kleewein, Rodrigo Fonseca, Timothy Zhu, Ricardo Bianchini
2025X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms.
Yueming Yuan, Ahan Gupta, Jianping Li, Sajal Dash, Feiyi Wang, Minjia Zhang
2025XaaS Containers: Performance-Portable Representation With Source and IR Containers.
Marcin Copik, Eiman Alnuaimi, Alok Kamatar, Valérie Hayot-Sasson, Alberto Madonna, Todd Gamblin, Kyle Chard, Ian T. Foster, Torsten Hoefler
2025Zero-Value Code Specialization via Profile-Guided Control Data Flow Analysis.
Shaokang Du, Kelun Lei, Xin You, Hailong Yang, Yufan Xu, Zhongzhi Luan, Yi Liu, Depei Qian
2025cMPI: Using CXL Memory Sharing for MPI One-Sided and Two-Sided Inter-Node Communications.
Xi Wang, Bin Ma, Jongryool Kim, Byungil Koh, Hoshik Kim, Dong Li
2025coMtainer: Compilation-assisted HPC Container Images with Enhanced Adaptability.
Yuhao Gu, Haoquan Chen, Xianjie Chen, Jiangsu Du, Zhiguang Chen, Nong Xiao, Xianwei Zhang, Yutong Lu
2025gLLM: Global Balanced Pipeline Parallelism Systems for Distributed LLMs Serving with Token Throttling.
Tianyu Guo, Xianwei Zhang, Jiangsu Du, Zhiguang Chen, Nong Xiao, Yutong Lu
2025gParaKV: A GPGPU-accelerated Key-Value Separation-based KV Store with Optimized Compaction and Garbage Collection.
Hui Sun, Xiangxiang Jiang, Xiao Qin, Song Jiang, Enhui Wang
2025lsCOMP: Efficient Light Source Compression.
Yafan Huang, Sheng Di, Robert Underwood, Peco Myint, Miaoqi Chu, Guanpeng Li, Nicholas Schwarz, Franck Cappello
2025mLR: Scalable Laminography Reconstruction based on Memoization.
Bin Ma, Viktor Nikitin, Xi Wang, Tekin Bicer, Dong Li