IPDPS A

111 papers

Year  Title / Authors
2025A Bidirectional GPU Algorithm for Computing Maximum Matchings in Bipartite Graphs.
Anju Mongandampulath Akathoott, Martin Burtscher
2025A Deep Look into the Temporal I/O Behavior of HPC Applications.
Francieli Boito, Luan Teylo, Mihail Popov, Théo Jolivel, François Tessier, Jakob Lüttgau, Julien Monniot, Ahmad Tarraf, André Ramos Carneiro, Carla Osthoff
2025A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems.
Minseok Ryu, Geunyeong Byeon, Kibaek Kim
2025A Memory-Efficient and Computation-Balanced Lossy Compressor on Wafer-Scale Engine.
Shihui Song, Robert Underwood, Sheng Di, Yafan Huang, Peng Jiang, Franck Cappello
2025A New Spin on the Fast Multipole Method for GPUs: Rethinking the Far-Field Operators.
Arijus Lengvenis, Holger Dachsel, Laura Morgenstern, Ivo Kabadshow
2025A Work-Optimal Parallel Algorithm for Aligning Sequences to Genome Graphs.
Aranya Banerjee, Daniel Gibney, Helen Xu, Srinivas Aluru
2025AI and HPC Applications on Leadership Computing Platforms: Performance and Scalability Studies.
JaeHyuk Kwack, Colleen Bertoni, Umesh Unnikrishnan, Riccardo Balin, Khalid Hossain, Yasaman Ghadar, Timothy J. Williams, Abhishek Bagusetty, Mathialakan Thavappiragasam, Väinö Hatanpää, Archit Vasan, John R. Tramm, Scott Parker
2025ALGAS: A Low-Latency GPU-Based Approximate Nearest Neighbor Search System.
Yuanhui Chen, Lixiao Cui, Zebin Yao, Hao Zhou, Gang Wang, Xiaoguang Liu
2025AQUA: Hardware-Agnostic Qubit Allocation for Quantum Multi-Programming.
XinYu Piao, JooYong Shim, Joongheon Kim, Jong-Kook Kim
2025Accelerate Coastal Ocean Circulation Model with AI Surrogate.
Zelin Xu, Jie Ren, Yupu Zhang, Jose Maria Gonzalez Ondina, Maitane Olabarrieta, Tingsong Xiao, Wenchong He, Zibo Liu, Shigang Chen, Kaleb E. Smith, Zhe Jiang
2025Accelerating Graph Neural Networks Using a Novel Computation-Friendly Matrix Compression Format.
João Nuno Ferreira Alves, Samir Moustafa, Siegfried Benkner, Alexandre P. Francisco, Wilfried N. Gansterer, Luís M. S. Russo
2025Accelerating Homotopy Continuation with GPUs: Application to Trifocal Pose Estimation.
Chiang-Heng Chien, Ahmad Abdelfattah, Benjamin B. Kimia
2025Accelerating Sparse Linear Solvers on Intelligence Processing Units.
Tim Noack, Louis Krüger, Andreas Koch
2025Accelerating Tensor-Train Decomposition on Graph Neural Networks.
Shenghao Qiu, Chunwei Xia, Zheng Wang
2025Accelerating the Dutch Atmospheric Large-Eddy Simulation (DALES) Model with OpenACC.
Lucas Esclapez, Laurent Soucasse, Caspar Jungbacker, Fredrik Jansson, Stephan R. de Roode, Pedro Costa, Gijs van den Oord, Alessio Sclocco
2025Achieving Better Benefits via Flexible Feature Matching in Post-Deduplication Delta Compression.
Fengkui Yang, Bo Mao, Yuhan Liu, Liang Bao, Weipeng Jiang, Dongying Zhang, Chunhua Li, Ke Zhou
2025AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage.
Md. Hasanur Rashid, Dong Dai
2025Adapt-S: Effective DNN Pruning via Unified Accuracy and Performance Tuning.
Roberto L. Castro, Diego Andrade, Basilio B. Fraguela
2025Adaptive s-Step GMRES with Randomized and Truncated Low-Synchronization Orthogonalization.
Robert Ernstbrunner, Wilfried N. Gansterer
2025Air-FedGA: A Grouping Asynchronous Federated Learning Mechanism Exploiting Over-The-Air Computation.
Qianpiao Ma, Junlong Zhou, Xiangpeng Hou, Jianchun Liu, Hongli Xu, Jianeng Miao, Qingmin Jia
2025An Adaptive Two-Stage Algorithm for Error-Bounded Scientific Data Compression.
Roberto Nuca, Matteo Parsani, George Turkiyyah
2025An Asynchronous Distributed-Memory Parallel Algorithm for k-Mer Counting.
Souvadra Hati, Akihiro Hayashi, Richard W. Vuduc
2025An Effective Uncorrectable Memory Error Prediction Framework by Exploiting UPH Indicators in Production Environments.
Xiaobo Zheng, Lisha Qin, Shiyi Li, Wen Xia, Chentao Wu, Yunfei Gu, Qicong Lin, Jun Wan, Huifang Jiao, Rubing Huang
2025An Efficient Adaptive Dual-Threshold SVM Based on Heterogeneous Collaboration.
Xing Peng, Qinglin Wang, Chuhe Hong, Gencheng Liu, Rui Xia, Xinhai Chen, Zhigang Sun, Jie Liu
2025Automated MPI-X Code Generation for Scalable Finite-Difference Solvers.
George Bisbas, Rhodri Nelson, Mathias Louboutin, Fabio Luporini, Paul H. J. Kelly, Gerard Gorman
2025BRP-SpMM: Block-Row Partition Based Sparse Matrix Multiplication with Tensor and CUDA Cores.
Yukang Dong, Wenbin Jiang, Xinhai Shen, Haihong Guo, Zhiyuan Shao, Hai Jin
2025Be Aware of Metadata Corruption in Parallel File System: It can be Silent and Catastrophic.
Saisha Kamat, Mai Zheng, Bo Fang, Dong Dai
2025CALock: Multi-Granularity Locking in Dynamic Hierarchies.
Ayush Pandey, Julien Sopena, Marc Shapiro, Swan Dubois
2025CORD: Parallelizing Query Processing Across Multiple Computational Storage Devices.
Wahid Uz Zaman, Cyan Subhra Mishra, Saleh Alsaleh, Abutalib Aghayev, Mahmut Taylan Kandemir
2025Cello: Co-Designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse.
Raveesh Garg, Michael Pellauer, Sivasankaran Rajamanickam, Tushar Krishna
2025Characterizing the Behavior and Impact of KV Caching on Transformer Inferences Under Concurrency.
Jie Ye, Jaime Cernuda, Avinash Maurya, Xian-He Sun, Anthony Kougkas, Bogdan Nicolae
2025CoRD: Converged RDMA Dataplane.
Maksym Planeta, Jan Bierbaum, Michael Roitzsch, Hermann Härtig
2025Compiler, Runtime, and Hardware Parameters Design Space Exploration.
Lana Scravaglieri, Ani Anciaux-Sedrakian, Olivier Aumage, Thomas Guignon, Mihail Popov
2025DeepBAT: Performance and Cost Optimization of Serverless Inference Using Transformers.
Bowen Sun, Riccardo Pinciroli, Giuliano Casale, Evgenia Smirni
2025Distributed Construction of Demand-Aware Datacenter Networks.
Aleksander Figiel, Darya Melnyk, Tijana Milentijevic, Stefan Schmid
2025Edge-Disjoint Spanning Trees on Star Products.
Kelly Isham, Laura Monroe, Kartik Lakhotia, Aleyah Dawkins, Daniel Hwang, Ales Kubicek
2025Ekko: Fully Decentralized Scheduling for Serverless Edge Computing.
Xin Chen, Manoj Prabhakar Paidiparthy, Dilma Da Silva, Liting Hu
2025Enabling Efficient Error-Controlled Lossy Compression for Unstructured Scientific Data.
Xuan Wu, Sheng Di, Congrong Ren, Pu Jiao, Mingze Xia, Cheng Wang, Hanqi Guo, Xin Liang, Franck Cappello
2025Energy-Optimal and Low-Depth Algorithmic Primitives for Spatial Dataflow Architectures.
Lukas Gianinazzi, Tal Ben-Nun, Maciej Besta, Saleh Ashkboos, Yves Baumann, Piotr Luczynski, Torsten Hoefler
2025Enhanced JPEG Decoding Using PIM Architectures with Parallel MCU Processing.
Jieun Kim, Dukyun Nam
2025Enhancing OmpSs-2 Suspendable Tasks by Combining Operating System and User-Level Threads with C++ Coroutines.
Arnau Cinca, Aleix Roca, Kevin Sala, Raúl Peñacoba Veigas, David Álvarez, Vicenç Beltran
2025FATHOM: Fast Attention Through Optimizing Memory.
Elliott Binder, Arvind Sudarsanam, Ravi Sunkavalli, Tze Meng Low
2025FLAME: Federated Learning for Attack Mitigation and Evasion.
Diletta Chiaro, Pian Qi, Edoardo Prezioso, Antonella Guzzo, Francesco Piccialli
2025Fast and Effective Lossy Compression on GPUs and CPUs with Guaranteed Error Bounds.
Alex Fallin, Noushin Azami, Sheng Di, Franck Cappello, Martin Burtscher
2025FastCHGNet: Training One Universal Interatomic Potential to 1.5 Hours with 32 GPUs.
Yuanchang Zhou, Siyu Hu, Chen Wang, Lin-Wang Wang, Guangming Tan, Weile Jia
2025Fine-Grained Global Search for Inputs Triggering Floating-Point Exceptions in GPU Programs.
Xin Yi, Hengbiao Yu, Liqian Chen, Xiaoguang Mao, Ji Wang, Chun Huang, Deheng Yang
2025FlexRLHF: A Flexible Placement and Parallelism Framework for Efficient RLHF Training.
Youshao Xiao, Zhenglei Zhou, Fagui Mao, Weichang Wu, Shangchun Zhao, Lin Ju, Lei Liang, Xiaolu Zhang, Jun Zhou
2025FlowForecaster: Automatically Inferring Detailed & Interpretable Workflow Scaling Models for Forecasts.
Hyungro Lee, Jesun Firoz, Nathan R. Tallent, Luanzheng Guo, Mahantesh Halappanavar
2025For What the Bell Tolls.
David E. Keyes
2025GIFTS: Efficient GCN Inference Framework on PyTorch-CPU via Exploring the Sparsity.
Ruiyang Chen, Xing Li, Xiaoyao Liang, Zhuoran Song
2025GNNPerf: Towards Effective Performance Profiling and Analysis Across GNN Frameworks.
Kejie Ma, Hailong Yang, Zizheng Zhang, Xin You, Zhibo Xuan, Qingxiao Sun, Zhongzhi Luan, Yi Liu, Depei Qian
2025Gensor: A Graph-Based Construction Tensor Compilation Method for Deep Learning.
Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng, Yongjun Xu
2025Graph Input-Aware Matrix Multiplication for Pruned Graph Neural Network Acceleration.
Hanan Khan, Deniz Gurevin, Omer Khan
2025Graph Neural Network-Based Latency Prediction for Stream Processing Task.
Zheng Chu, Ren Hang Zhang, Baozhu Li, Changtian Ying, Weiyun Li
2025GuardianOMP: A Framework for Highly Productive Fault Tolerance Via OpenMP Task-Level Replication.
Adrian Munera, Eduardo Quiñones, Sara Royuela
2025HPDR: High-Performance Portable Scientific Data Reduction Framework.
Jieyang Chen, Qian Gong, Yanliang Li, Xin Liang, Lipeng Wan, Qing Liu, Norbert Podhorszki, Scott Klasky
2025HiCCL: A Hierarchical Collective Communication Library.
Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Pinku Surana, Wen-mei W. Hwu, William Gropp, Alex Aiken
2025Hybrid-Granularity Parallelism Support for Fast Transaction Processing in Blockchain-Based Federated Learning.
Mulin Li, Zhaolong Jian, Kaixuan Yang, Xueshuo Xie, Wajdy Othman, Tao Li
2025IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025, Milano, Italy, June 3-7, 2025
2025IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs.
Chris Egersdoerfer, Arnav Sareen, Jean Luca Bez, Suren Byna, Dongkuan Xu, Dong Dai
2025IP-FL: Incentive-Driven Personalization in Federated Learning.
Ahmad Faraz Khan, Xinran Wang, Qi Le, Zain ul Abdeen, Azal Ahmad Khan, Haider Ali, Ming Jin, Jie Ding, Ali Raza Butt, Ali Anwar
2025Improving Accuracy and Efficiency of Graph Embedding Training with Fine-Grained Parameter Management.
Lihan Hu, Peng Jiang
2025Improving Parallel Scalability for Molecular Dynamics Simulations in the Exascale Era.
Brian C. Dandurand, Hans Vandierendonck, Bronis R. de Supinski
2025Improving the Efficiency of Interpolation-based Scientific Data Compressors with Adaptive Quantization Index Prediction.
Pu Jiao, Sheng Di, Mingze Xia, Xuan Wu, Jinyang Liu, Xin Liang, Franck Cappello
2025Inkstream: Instantaneous GNN Inference on Dynamic Graphs via Incremental Update.
Dan Wu, Zhaoying Li, Tulika Mitra
2025It Takes Two to Tango: Serverless Workflow Serving via Bilaterally Engaged Resource Adaptation.
Jing Wu, Lin Wang, Quanfeng Deng, Chen Yu, Dong Zhang, Bingheng Yan, Fangming Liu
2025KVACCEL: A Novel Write Accelerator for LSM-Tree-Based KV Stores with Host-SSD Collaboration.
Kihwan Kim, Hyunsun Chung, Seonghoon Ahn, Junhyeok Park, Safdar Jamil, Hongsu Byun, Myungcheol Lee, Jinchun Choi, Youngjae Kim
2025LaOvl: Lifecycle-Aware Overlay File System for Efficient Container I/O in Cloud Computing.
Zhuo Yuan, Haopeng Chen, Yucheng Tao, Zihong Lin
2025Large Scale Finite-Temperature Real-Time Time Dependent Density Functional Theory Calculation with Hybrid Functional on ARM and GPU Systems.
Rongrong Liu, Zhuoqiang Guo, Qiuchen Sha, Tong Zhao, Haibo Li, Wei Hu, Lijun Liu, Guangming Tan, Weile Jia
2025Less is More: Faster Maximum Clique Search by Work-Avoidance.
Hans Vandierendonck
2025Leveraging Compilation Statistics for Compiler Phase Ordering.
Jiayu Zhao, Chunwei Xia, Zheng Wang
2025Locality Aware Process Remapping for Distributed-Memory Graph Workloads.
Md Nahid Newaz, Sayan Ghosh, Nathan R. Tallent, Guangzhi Qu
2025Longer Attention Span: Increasing Transformer Context Length With Sparse Graph Processing Techniques.
Nathaniel Tomczak, Sanmukh Kuppannagari
2025Matcha: A Language and Compiler for Backtracking-Based Subgraph Matching.
Yihua Wei, Lihan Hu, Peng Jiang
2025MeanCache: User-Centric Semantic Caching for LLM Web Services.
Waris Gill, Mohamed Elidrisi, Pallavi Kalapatapu, Ammar Ahmed, Ali Anwar, Muhammad Ali Gulzar
2025Message from the 2025 General Co-chairs.
Marco D. Santambrogio, Ananth Kalyanaraman
2025NBLFQ: A Lock-Free MPMC Queue Optimized for Low Contention.
Alexandre Denis, Charles Goedefroit
2025NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU.
Cong Ma, Du Wu, Zhelang Deng, Jiang Chen, Xiaowen Huang, Jintao Meng, Wenxi Zhu, Bingqiang Wang, Amelie Chi Zhou, Peng Chen, Minwen Deng, Yanjie Wei, Shengzhong Feng, Yi Pan
2025Next-gen Infrastructure for Scalable Generative AI: Focus on Advances in Storage, Computing and Orchestration.
Robert Haas
2025Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems.
Wenyi Wang, Maxime Gonthier, Poornima Nookala, Haochen Pan, Ian T. Foster, Ioan Raicu, Kyle Chard
2025P
Yu Kuang, Li Yan, Zhuozhao Li
2025PALLAS: A Generic Trace Format for Large HPC Trace Analysis.
Catherine Guelque, Valentin Honoré, Philippe Swartvagher, Gaël Thomas, François Trahay
2025PCEBench: A Multi-Dimensional Benchmark for Evaluating Large Language Models in Parallel Code Generation.
Le Chen, Nesreen K. Ahmed, Mihai Capota, Ted Willke, Niranjan Hasabnis, Ali Jannesari
2025PISA: An Adversarial Approach to Comparing Task Graph Scheduling Algorithms.
Jared Coleman, Bhaskar Krishnamachari
2025Pair-Then-Aggregate: Simplified and Efficient Parallel Programming Paradigm for Secure Multi-Party Computation.
Xiaoyu Fan, Kun Chen, Guosai Wang, Xiaowei Zhu, Haoqing He, Xie Yong, Xiaofeng Jia, Yidong Li, Wei Xu
2025Pandemics in Silico: Scaling Agent-Based Simulations on Realistic Social Contact Networks.
Joy Kitson, Ian J. Costello, Jiangzhuo Chen, Diego Jiménez, Stefan Hoops, Henning S. Mortveit, Esteban Meneses, Jae-Seung Yeom, Madhav V. Marathe, Abhinav Bhatele
2025Parallel Scheduling of Task Graphs with Minimal Memory Requirements.
Pascal Fradet, Alain Girault, Alexandre Honorat
2025Parallel-in-Time Kalman Smoothing Using Orthogonal Transformations.
Shahaf Gargir, Sivan Toledo
2025Performance Characterization of CXL Memory and Its Use Cases.
Xi Wang, Jie Liu, Jianbo Wu, Shuangyan Yang, Jie Ren, Bhanu Shankar, Dong Li
2025Performance Projection for Design-Space Exploration on Future HPC Architectures.
Clément Gavoille, Hugo Taboada, Jens Domke, Brice Goglin, Emmanuel Jeannot
2025Phase-Based Frequency Scaling for Energy-Efficient Heterogeneous Computing.
Lorenzo Carpentieri, Antonio De Caro, Majid Salimi Beni, Kaijie Fan, Biagio Cosenza
2025PivotScale: A Holistic Approach for Scalable Clique Counting.
Amogh Lonkar, Scott Beamer
2025PolyMorphous: An MLIR-Based Polyhedral Compiler with Loop Transformation Primitives.
Jinman Zhao, Seyed Aryan Vahabpour, Xingyu Yue, Kai-Ting Amy Wang, Tarek S. Abdelrahman
2025PredTOP: Latency Predictor Utilizing DAG Transformers for Distributed Deep Learning Training with Operator Parallelism.
Dipak Acharya, Tong Shu
2025RXT: RefleXive Address Translation for Pointer-Chasing Workloads.
Rashid Aligholipour, Pavlos Aimoniotis, Stefanos Kaxiras, Yuan Yao
2025Reducing the End-to-End Latency of DNN-Based Recommendation Systems in GPU Pools.
Guangqiang Luan, Pu Pang, Quan Chen, Chen Chen, Guoyao Xu, Chi Zhang, Yanyi Zi, Yinghao Yu, Guodong Yang, Liping Zhang, Minyi Guo
2025SEAFL: Enhancing Efficiency in Semi-Asynchronous Federated Learning Through Adaptive Aggregation and Selective Training.
Md Sirajul Islam, Sanjeev Panta, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng
2025SPRT
Weicong Chen, Sarah J. Carr, Jing Zhang, Curtis Tatsuoka, Xiaoyi Lu
2025Scalable and Portable LU Factorization with Partial Pivoting on Top of Runtime Systems.
Alycia Lisito, Mathieu Faverge, Matthieu Kuhn, Florent Pruvost, Pierre Ramet
2025Sensitivity and Impacts on Parallel Compression of Prediction of Lossy Compression Ratios for Scientific Data.
Alexandra Poulos, Robert Underwood, Jon C. Calhoun, Sheng Di, Franck Cappello
2025SymProp: Scaling Sparse Symmetric Tucker Decomposition via Symmetry Propagation.
Zecheng Li, Shruti Shivakumar, Jiajia Li, Ramakrishnan Kannan
2025TOSS: Tiering of Serverless Snapshots for Memory-Efficient Serverless Computing.
Theodore Michailidis, Juno Kim, Linsong Guo, Steven Swanson, Jishen Zhao
2025Taijigraph: An Out-Of-Core Graph Processing System Enhanced with Computational Storage.
Xinmiao Zhang, Cheng Liu, Shengwen Liang, Hayden Kwok-Hay So, Ying Wang, Lei Zhang, Huawei Li, Xiaowei Li
2025Tera-Scale Multilevel Graph Partitioning.
Daniel Salwasser, Daniel Seemaier, Lars Gottesbüren, Peter Sanders
2025The Artificial Scientist: In-Transit Machine Learning of Plasma Simulations.
Jeffrey Kelling, Vicente Bolea, Michael Bussmann, Ankush Checkervarty, Alexander Debus, Jan Ebert, Greg Eisenhauer, Vineeth Gutta, Stefan Kesselheim, Scott Klasky, Vedhas Pandit, Richard Pausch, Norbert Podhorszki, Franz Pöschel, David Rogers, Jeyhun Rustamov, Steve Schmerler, Ulrich Schramm, Klaus Steiniger, René Widera, Anna Willmann, Sunita Chandrasekaran
2025The Power of Parallelism: Accelerating Discovery in the Biosciences.
Srinivas Aluru
2025The Tensor-Core Beamformer: A High-Speed Signal-Processing Library for Multidisciplinary Use.
Leon C. Oostrum, Bram Veenboer, Ronald Rook, Michael Brown, Pieter Kruizinga, John W. Romein
2025Tide: A Distributed Runtime Management Framework for Things-Edge-Cloud Computing Continuum.
Xiaohui Peng, Wenkai Yan, Yifan Wang, Shoujian Zheng, Zhiwei Xu
2025To Compress or Not to Compress: Energy Trade-Offs and Benefits of Lossy Compressed I/O.
Grant Wilkins, Sheng Di, Jon C. Calhoun, Robert Underwood, Franck Cappello
2025Unified Designs of Multi-Rail-Aware MPI Allreduce and Alltoall Operations Across Diverse GPU and Interconnect Systems.
Chen-Chun Chen, Jinghan Yao, Lang Xu, Hari Subramoni, Dhabaleswar K. Panda
2025VerifyIO: Verifying Adherence to Parallel I/O Consistency Semantics.
Chen Wang, Zhaobin Zhu, Kathryn M. Mohror, Sarah Neuwirth, Marc Snir