| 2025 | A Bidirectional GPU Algorithm for Computing Maximum Matchings in Bipartite Graphs. Anju Mongandampulath Akathoott, Martin Burtscher |
| 2025 | A Deep Look into the Temporal I/O Behavior of HPC Applications. Francieli Boito, Luan Teylo, Mihail Popov, Théo Jolivel, François Tessier, Jakob Lüttgau, Julien Monniot, Ahmad Tarraf, André Ramos Carneiro, Carla Osthoff |
| 2025 | A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems. Minseok Ryu, Geunyeong Byeon, Kibaek Kim |
| 2025 | A Memory-Efficient and Computation-Balanced Lossy Compressor on Wafer-Scale Engine. Shihui Song, Robert Underwood, Sheng Di, Yafan Huang, Peng Jiang, Franck Cappello |
| 2025 | A New Spin on the Fast Multipole Method for GPUs: Rethinking the Far-Field Operators. Arijus Lengvenis, Holger Dachsel, Laura Morgenstern, Ivo Kabadshow |
| 2025 | A Work-Optimal Parallel Algorithm for Aligning Sequences to Genome Graphs. Aranya Banerjee, Daniel Gibney, Helen Xu, Srinivas Aluru |
| 2025 | AI and HPC Applications on Leadership Computing Platforms: Performance and Scalability Studies. JaeHyuk Kwack, Colleen Bertoni, Umesh Unnikrishnan, Riccardo Balin, Khalid Hossain, Yasaman Ghadar, Timothy J. Williams, Abhishek Bagusetty, Mathialakan Thavappiragasam, Väinö Hatanpää, Archit Vasan, John R. Tramm, Scott Parker |
| 2025 | ALGAS: A Low-Latency GPU-Based Approximate Nearest Neighbor Search System. Yuanhui Chen, Lixiao Cui, Zebin Yao, Hao Zhou, Gang Wang, Xiaoguang Liu |
| 2025 | AQUA: Hardware-Agnostic Qubit Allocation for Quantum Multi-Programming. XinYu Piao, JooYong Shim, Joongheon Kim, Jong-Kook Kim |
| 2025 | Accelerate Coastal Ocean Circulation Model with AI Surrogate. Zelin Xu, Jie Ren, Yupu Zhang, Jose Maria Gonzalez Ondina, Maitane Olabarrieta, Tingsong Xiao, Wenchong He, Zibo Liu, Shigang Chen, Kaleb E. Smith, Zhe Jiang |
| 2025 | Accelerating Graph Neural Networks Using a Novel Computation-Friendly Matrix Compression Format. João Nuno Ferreira Alves, Samir Moustafa, Siegfried Benkner, Alexandre P. Francisco, Wilfried N. Gansterer, Luís M. S. Russo |
| 2025 | Accelerating Homotopy Continuation with GPUs: Application to Trifocal Pose Estimation. Chiang-Heng Chien, Ahmad Abdelfattah, Benjamin B. Kimia |
| 2025 | Accelerating Sparse Linear Solvers on Intelligence Processing Units. Tim Noack, Louis Krüger, Andreas Koch |
| 2025 | Accelerating Tensor-Train Decomposition on Graph Neural Networks. Shenghao Qiu, Chunwei Xia, Zheng Wang |
| 2025 | Accelerating the Dutch Atmospheric Large-Eddy Simulation (DALES) Model with OpenACC. Lucas Esclapez, Laurent Soucasse, Caspar Jungbacker, Fredrik Jansson, Stephan R. de Roode, Pedro Costa, Gijs van den Oord, Alessio Sclocco |
| 2025 | Achieving Better Benefits via Flexible Feature Matching in Post-Deduplication Delta Compression. Fengkui Yang, Bo Mao, Yuhan Liu, Liang Bao, Weipeng Jiang, Dongying Zhang, Chunhua Li, Ke Zhou |
| 2025 | AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage. Md. Hasanur Rashid, Dong Dai |
| 2025 | Adapt-S: Effective DNN Pruning via Unified Accuracy and Performance Tuning. Roberto L. Castro, Diego Andrade, Basilio B. Fraguela |
| 2025 | Adaptive s-Step GMRES with Randomized and Truncated Low-Synchronization Orthogonalization. Robert Ernstbrunner, Wilfried N. Gansterer |
| 2025 | Air-FedGA: A Grouping Asynchronous Federated Learning Mechanism Exploiting Over-The-Air Computation. Qianpiao Ma, Junlong Zhou, Xiangpeng Hou, Jianchun Liu, Hongli Xu, Jianeng Miao, Qingmin Jia |
| 2025 | An Adaptive Two-Stage Algorithm for Error-Bounded Scientific Data Compression. Roberto Nuca, Matteo Parsani, George Turkiyyah |
| 2025 | An Asynchronous Distributed-Memory Parallel Algorithm for $k$-Mer Counting. Souvadra Hati, Akihiro Hayashi, Richard W. Vuduc |
| 2025 | An Effective Uncorrectable Memory Error Prediction Framework by Exploiting UPH Indicators in Production Environments. Xiaobo Zheng, Lisha Qin, Shiyi Li, Wen Xia, Chentao Wu, Yunfei Gu, Qicong Lin, Jun Wan, Huifang Jiao, Rubing Huang |
| 2025 | An Efficient Adaptive Dual-Threshold SVM Based on Heterogeneous Collaboration. Xing Peng, Qinglin Wang, Chuhe Hong, Gencheng Liu, Rui Xia, Xinhai Chen, Zhigang Sun, Jie Liu |
| 2025 | Automated MPI-X Code Generation for Scalable Finite-Difference Solvers. George Bisbas, Rhodri Nelson, Mathias Louboutin, Fabio Luporini, Paul H. J. Kelly, Gerard Gorman |
| 2025 | BRP-SpMM: Block-Row Partition Based Sparse Matrix Multiplication with Tensor and CUDA Cores. Yukang Dong, Wenbin Jiang, Xinhai Shen, Haihong Guo, Zhiyuan Shao, Hai Jin |
| 2025 | Be Aware of Metadata Corruption in Parallel File System: It Can Be Silent and Catastrophic. Saisha Kamat, Mai Zheng, Bo Fang, Dong Dai |
| 2025 | CALock: Multi-Granularity Locking in Dynamic Hierarchies. Ayush Pandey, Julien Sopena, Marc Shapiro, Swan Dubois |
| 2025 | CORD: Parallelizing Query Processing Across Multiple Computational Storage Devices. Wahid Uz Zaman, Cyan Subhra Mishra, Saleh Alsaleh, Abutalib Aghayev, Mahmut Taylan Kandemir |
| 2025 | Cello: Co-Designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse. Raveesh Garg, Michael Pellauer, Sivasankaran Rajamanickam, Tushar Krishna |
| 2025 | Characterizing the Behavior and Impact of KV Caching on Transformer Inferences Under Concurrency. Jie Ye, Jaime Cernuda, Avinash Maurya, Xian-He Sun, Anthony Kougkas, Bogdan Nicolae |
| 2025 | CoRD: Converged RDMA Dataplane. Maksym Planeta, Jan Bierbaum, Michael Roitzsch, Hermann Härtig |
| 2025 | Compiler, Runtime, and Hardware Parameters Design Space Exploration. Lana Scravaglieri, Ani Anciaux-Sedrakian, Olivier Aumage, Thomas Guignon, Mihail Popov |
| 2025 | DeepBAT: Performance and Cost Optimization of Serverless Inference Using Transformers. Bowen Sun, Riccardo Pinciroli, Giuliano Casale, Evgenia Smirni |
| 2025 | Distributed Construction of Demand-Aware Datacenter Networks. Aleksander Figiel, Darya Melnyk, Tijana Milentijevic, Stefan Schmid |
| 2025 | Edge-Disjoint Spanning Trees on Star Products. Kelly Isham, Laura Monroe, Kartik Lakhotia, Aleyah Dawkins, Daniel Hwang, Ales Kubicek |
| 2025 | Ekko: Fully Decentralized Scheduling for Serverless Edge Computing. Xin Chen, Manoj Prabhakar Paidiparthy, Dilma Da Silva, Liting Hu |
| 2025 | Enabling Efficient Error-Controlled Lossy Compression for Unstructured Scientific Data. Xuan Wu, Sheng Di, Congrong Ren, Pu Jiao, Mingze Xia, Cheng Wang, Hanqi Guo, Xin Liang, Franck Cappello |
| 2025 | Energy-Optimal and Low-Depth Algorithmic Primitives for Spatial Dataflow Architectures. Lukas Gianinazzi, Tal Ben-Nun, Maciej Besta, Saleh Ashkboos, Yves Baumann, Piotr Luczynski, Torsten Hoefler |
| 2025 | Enhanced JPEG Decoding Using PIM Architectures with Parallel MCU Processing. Jieun Kim, Dukyun Nam |
| 2025 | Enhancing OmpSs-2 Suspendable Tasks by Combining Operating System and User-Level Threads with C++ Coroutines. Arnau Cinca, Aleix Roca, Kevin Sala, Raúl Peñacoba Veigas, David Álvarez, Vicenç Beltran |
| 2025 | FATHOM: Fast Attention Through Optimizing Memory. Elliott Binder, Arvind Sudarsanam, Ravi Sunkavalli, Tze Meng Low |
| 2025 | FLAME: Federated Learning for Attack Mitigation and Evasion. Diletta Chiaro, Pian Qi, Edoardo Prezioso, Antonella Guzzo, Francesco Piccialli |
| 2025 | Fast and Effective Lossy Compression on GPUs and CPUs with Guaranteed Error Bounds. Alex Fallin, Noushin Azami, Sheng Di, Franck Cappello, Martin Burtscher |
| 2025 | FastCHGNet: Training One Universal Interatomic Potential to 1.5 Hours with 32 GPUs. Yuanchang Zhou, Siyu Hu, Chen Wang, Lin-Wang Wang, Guangming Tan, Weile Jia |
| 2025 | Fine-Grained Global Search for Inputs Triggering Floating-Point Exceptions in GPU Programs. Xin Yi, Hengbiao Yu, Liqian Chen, Xiaoguang Mao, Ji Wang, Chun Huang, Deheng Yang |
| 2025 | FlexRLHF: A Flexible Placement and Parallelism Framework for Efficient RLHF Training. Youshao Xiao, Zhenglei Zhou, Fagui Mao, Weichang Wu, Shangchun Zhao, Lin Ju, Lei Liang, Xiaolu Zhang, Jun Zhou |
| 2025 | FlowForecaster: Automatically Inferring Detailed & Interpretable Workflow Scaling Models for Forecasts. Hyungro Lee, Jesun Firoz, Nathan R. Tallent, Luanzheng Guo, Mahantesh Halappanavar |
| 2025 | For What the Bell Tolls. David E. Keyes |
| 2025 | GIFTS: Efficient GCN Inference Framework on PyTorch-CPU via Exploring the Sparsity. Ruiyang Chen, Xing Li, Xiaoyao Liang, Zhuoran Song |
| 2025 | GNNPerf: Towards Effective Performance Profiling and Analysis Across GNN Frameworks. Kejie Ma, Hailong Yang, Zizheng Zhang, Xin You, Zhibo Xuan, Qingxiao Sun, Zhongzhi Luan, Yi Liu, Depei Qian |
| 2025 | Gensor: A Graph-Based Construction Tensor Compilation Method for Deep Learning. Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng, Yongjun Xu |
| 2025 | Graph Input-Aware Matrix Multiplication for Pruned Graph Neural Network Acceleration. Hanan Khan, Deniz Gurevin, Omer Khan |
| 2025 | Graph Neural Network-Based Latency Prediction for Stream Processing Task. Zheng Chu, Ren Hang Zhang, Baozhu Li, Changtian Ying, Weiyun Li |
| 2025 | GuardianOMP: A Framework for Highly Productive Fault Tolerance Via OpenMP Task-Level Replication. Adrian Munera, Eduardo Quiñones, Sara Royuela |
| 2025 | HPDR: High-Performance Portable Scientific Data Reduction Framework. Jieyang Chen, Qian Gong, Yanliang Li, Xin Liang, Lipeng Wan, Qing Liu, Norbert Podhorszki, Scott Klasky |
| 2025 | HiCCL: A Hierarchical Collective Communication Library. Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Pinku Surana, Wen-mei W. Hwu, William Gropp, Alex Aiken |
| 2025 | Hybrid-Granularity Parallelism Support for Fast Transaction Processing in Blockchain-Based Federated Learning. Mulin Li, Zhaolong Jian, Kaixuan Yang, Xueshuo Xie, Wajdy Othman, Tao Li |
| 2025 | IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025, Milano, Italy, June 3-7, 2025 |
| 2025 | IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs. Chris Egersdoerfer, Arnav Sareen, Jean Luca Bez, Suren Byna, Dongkuan Xu, Dong Dai |
| 2025 | IP-FL: Incentive-Driven Personalization in Federated Learning. Ahmad Faraz Khan, Xinran Wang, Qi Le, Zain ul Abdeen, Azal Ahmad Khan, Haider Ali, Ming Jin, Jie Ding, Ali Raza Butt, Ali Anwar |
| 2025 | Improving Accuracy and Efficiency of Graph Embedding Training with Fine-Grained Parameter Management. Lihan Hu, Peng Jiang |
| 2025 | Improving Parallel Scalability for Molecular Dynamics Simulations in the Exascale Era. Brian C. Dandurand, Hans Vandierendonck, Bronis R. de Supinski |
| 2025 | Improving the Efficiency of Interpolation-based Scientific Data Compressors with Adaptive Quantization Index Prediction. Pu Jiao, Sheng Di, Mingze Xia, Xuan Wu, Jinyang Liu, Xin Liang, Franck Cappello |
| 2025 | Inkstream: Instantaneous GNN Inference on Dynamic Graphs via Incremental Update. Dan Wu, Zhaoying Li, Tulika Mitra |
| 2025 | It Takes Two to Tango: Serverless Workflow Serving via Bilaterally Engaged Resource Adaptation. Jing Wu, Lin Wang, Quanfeng Deng, Chen Yu, Dong Zhang, Bingheng Yan, Fangming Liu |
| 2025 | KVACCEL: A Novel Write Accelerator for LSM-Tree-Based KV Stores with Host-SSD Collaboration. Kihwan Kim, Hyunsun Chung, Seonghoon Ahn, Junhyeok Park, Safdar Jamil, Hongsu Byun, Myungcheol Lee, Jinchun Choi, Youngjae Kim |
| 2025 | LaOvl: Lifecycle-Aware Overlay File System for Efficient Container I/O in Cloud Computing. Zhuo Yuan, Haopeng Chen, Yucheng Tao, Zihong Lin |
| 2025 | Large Scale Finite-Temperature Real-Time Time Dependent Density Functional Theory Calculation with Hybrid Functional on ARM and GPU Systems. Rongrong Liu, Zhuoqiang Guo, Qiuchen Sha, Tong Zhao, Haibo Li, Wei Hu, Lijun Liu, Guangming Tan, Weile Jia |
| 2025 | Less is More: Faster Maximum Clique Search by Work-Avoidance. Hans Vandierendonck |
| 2025 | Leveraging Compilation Statistics for Compiler Phase Ordering. Jiayu Zhao, Chunwei Xia, Zheng Wang |
| 2025 | Locality Aware Process Remapping for Distributed-Memory Graph Workloads. Md Nahid Newaz, Sayan Ghosh, Nathan R. Tallent, Guangzhi Qu |
| 2025 | Longer Attention Span: Increasing Transformer Context Length With Sparse Graph Processing Techniques. Nathaniel Tomczak, Sanmukh Kuppannagari |
| 2025 | Matcha: A Language and Compiler for Backtracking-Based Subgraph Matching. Yihua Wei, Lihan Hu, Peng Jiang |
| 2025 | MeanCache: User-Centric Semantic Caching for LLM Web Services. Waris Gill, Mohamed Elidrisi, Pallavi Kalapatapu, Ammar Ahmed, Ali Anwar, Muhammad Ali Gulzar |
| 2025 | Message from the 2025 General Co-chairs. Marco D. Santambrogio, Ananth Kalyanaraman |
| 2025 | NBLFQ: A Lock-Free MPMC Queue Optimized for Low Contention. Alexandre Denis, Charles Goedefroit |
| 2025 | NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU. Cong Ma, Du Wu, Zhelang Deng, Jiang Chen, Xiaowen Huang, Jintao Meng, Wenxi Zhu, Bingqiang Wang, Amelie Chi Zhou, Peng Chen, Minwen Deng, Yanjie Wei, Shengzhong Feng, Yi Pan |
| 2025 | Next-gen Infrastructure for Scalable Generative AI: Focus on Advances in Storage, Computing and Orchestration. Robert Haas |
| 2025 | Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems. Wenyi Wang, Maxime Gonthier, Poornima Nookala, Haochen Pan, Ian T. Foster, Ioan Raicu, Kyle Chard |
| 2025 | P Yu Kuang, Li Yan, Zhuozhao Li |
| 2025 | PALLAS: A Generic Trace Format for Large HPC Trace Analysis. Catherine Guelque, Valentin Honoré, Philippe Swartvagher, Gaël Thomas, François Trahay |
| 2025 | PCEBench: A Multi-Dimensional Benchmark for Evaluating Large Language Models in Parallel Code Generation. Le Chen, Nesreen K. Ahmed, Mihai Capota, Ted Willke, Niranjan Hasabnis, Ali Jannesari |
| 2025 | PISA: An Adversarial Approach to Comparing Task Graph Scheduling Algorithms. Jared Coleman, Bhaskar Krishnamachari |
| 2025 | Pair-Then-Aggregate: Simplified and Efficient Parallel Programming Paradigm for Secure Multi-Party Computation. Xiaoyu Fan, Kun Chen, Guosai Wang, Xiaowei Zhu, Haoqing He, Xie Yong, Xiaofeng Jia, Yidong Li, Wei Xu |
| 2025 | Pandemics in Silico: Scaling Agent-Based Simulations on Realistic Social Contact Networks. Joy Kitson, Ian J. Costello, Jiangzhuo Chen, Diego Jiménez, Stefan Hoops, Henning S. Mortveit, Esteban Meneses, Jae-Seung Yeom, Madhav V. Marathe, Abhinav Bhatele |
| 2025 | Parallel Scheduling of Task Graphs with Minimal Memory Requirements. Pascal Fradet, Alain Girault, Alexandre Honorat |
| 2025 | Parallel-in-Time Kalman Smoothing Using Orthogonal Transformations. Shahaf Gargir, Sivan Toledo |
| 2025 | Performance Characterization of CXL Memory and Its Use Cases. Xi Wang, Jie Liu, Jianbo Wu, Shuangyan Yang, Jie Ren, Bhanu Shankar, Dong Li |
| 2025 | Performance Projection for Design-Space Exploration on Future HPC Architectures. Clément Gavoille, Hugo Taboada, Jens Domke, Brice Goglin, Emmanuel Jeannot |
| 2025 | Phase-Based Frequency Scaling for Energy-Efficient Heterogeneous Computing. Lorenzo Carpentieri, Antonio De Caro, Majid Salimi Beni, Kaijie Fan, Biagio Cosenza |
| 2025 | PivotScale: A Holistic Approach for Scalable Clique Counting. Amogh Lonkar, Scott Beamer |
| 2025 | PolyMorphous: An MLIR-Based Polyhedral Compiler with Loop Transformation Primitives. Jinman Zhao, Seyed Aryan Vahabpour, Xingyu Yue, Kai-Ting Amy Wang, Tarek S. Abdelrahman |
| 2025 | PredTOP: Latency Predictor Utilizing DAG Transformers for Distributed Deep Learning Training with Operator Parallelism. Dipak Acharya, Tong Shu |
| 2025 | RXT: RefleXive Address Translation for Pointer-Chasing Workloads. Rashid Aligholipour, Pavlos Aimoniotis, Stefanos Kaxiras, Yuan Yao |
| 2025 | Reducing the End-to-End Latency of DNN-Based Recommendation Systems in GPU Pools. Guangqiang Luan, Pu Pang, Quan Chen, Chen Chen, Guoyao Xu, Chi Zhang, Yanyi Zi, Yinghao Yu, Guodong Yang, Liping Zhang, Minyi Guo |
| 2025 | SEAFL: Enhancing Efficiency in Semi-Asynchronous Federated Learning Through Adaptive Aggregation and Selective Training. Md Sirajul Islam, Sanjeev Panta, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng |
| 2025 | SPRT Weicong Chen, Sarah J. Carr, Jing Zhang, Curtis Tatsuoka, Xiaoyi Lu |
| 2025 | Scalable and Portable LU Factorization with Partial Pivoting on Top of Runtime Systems. Alycia Lisito, Mathieu Faverge, Matthieu Kuhn, Florent Pruvost, Pierre Ramet |
| 2025 | Sensitivity and Impacts on Parallel Compression of Prediction of Lossy Compression Ratios for Scientific Data. Alexandra Poulos, Robert Underwood, Jon C. Calhoun, Sheng Di, Franck Cappello |
| 2025 | SymProp: Scaling Sparse Symmetric Tucker Decomposition via Symmetry Propagation. Zecheng Li, Shruti Shivakumar, Jiajia Li, Ramakrishnan Kannan |
| 2025 | TOSS: Tiering of Serverless Snapshots for Memory-Efficient Serverless Computing. Theodore Michailidis, Juno Kim, Linsong Guo, Steven Swanson, Jishen Zhao |
| 2025 | Taijigraph: An Out-Of-Core Graph Processing System Enhanced with Computational Storage. Xinmiao Zhang, Cheng Liu, Shengwen Liang, Hayden Kwok-Hay So, Ying Wang, Lei Zhang, Huawei Li, Xiaowei Li |
| 2025 | Tera-Scale Multilevel Graph Partitioning. Daniel Salwasser, Daniel Seemaier, Lars Gottesbüren, Peter Sanders |
| 2025 | The Artificial Scientist: In-Transit Machine Learning of Plasma Simulations. Jeffrey Kelling, Vicente Bolea, Michael Bussmann, Ankush Checkervarty, Alexander Debus, Jan Ebert, Greg Eisenhauer, Vineeth Gutta, Stefan Kesselheim, Scott Klasky, Vedhas Pandit, Richard Pausch, Norbert Podhorszki, Franz Pöschel, David Rogers, Jeyhun Rustamov, Steve Schmerler, Ulrich Schramm, Klaus Steiniger, René Widera, Anna Willmann, Sunita Chandrasekaran |
| 2025 | The Power of Parallelism: Accelerating Discovery in the Biosciences. Srinivas Aluru |
| 2025 | The Tensor-Core Beamformer: A High-Speed Signal-Processing Library for Multidisciplinary Use. Leon C. Oostrum, Bram Veenboer, Ronald Rook, Michael Brown, Pieter Kruizinga, John W. Romein |
| 2025 | Tide: A Distributed Runtime Management Framework for Things-Edge-Cloud Computing Continuum. Xiaohui Peng, Wenkai Yan, Yifan Wang, Shoujian Zheng, Zhiwei Xu |
| 2025 | To Compress or Not to Compress: Energy Trade-Offs and Benefits of Lossy Compressed I/O. Grant Wilkins, Sheng Di, Jon C. Calhoun, Robert Underwood, Franck Cappello |
| 2025 | Unified Designs of Multi-Rail-Aware MPI Allreduce and Alltoall Operations Across Diverse GPU and Interconnect Systems. Chen-Chun Chen, Jinghan Yao, Lang Xu, Hari Subramoni, Dhabaleswar K. Panda |
| 2025 | VerifyIO: Verifying Adherence to Parallel I/O Consistency Semantics. Chen Wang, Zhaobin Zhu, Kathryn M. Mohror, Sarah Neuwirth, Marc Snir |