ICPP B

79 papers

Year Title / Authors
2025 A Fast Sparse Triangular Solve for Structured-grid Problems on Heterogeneous Processors.
Zhengding Hu, Yi Zong, Jingwei Sun, Wei Xue, Guangzhong Sun
2025 A High-Accuracy Sketch for Measuring Low-Entropy Flows in Distributed AI Training.
Jin Wang, Chenye Zhu, Jinbin Hu
2025 ADAPT: Dynamic Grouping and Cross-Group Aggregation for GC-Efficient Log-Structured Storage in SSD Arrays.
Ruisong Zhou, Peng Wang, Chunhua Li, Ke Zhou, Hui Li
2025 AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs.
Sasindu Wijeratne, Rajgopal Kannan, Viktor K. Prasanna
2025 Accelerating Erasure Coding on Persistent Memory via Adaptive Prefetcher Scheduling.
Guanglei Xu, Hai Zhou, Yuchong Hu, Dan Feng, Renzhi Xiao
2025 Accelerating Multi-Output GBDTs with GPUs.
Hanfeng Liu, Xuemei Peng, Zeyi Wen
2025 Accelerating an Electromagnetic Simulation via Memory-Constrained Task-Based Load Balancing.
Jonathan Lifflander, Nicole Slattengren, Philippe P. Pébay, Pierre L. Pebay, Caleb Schilly, Robert A. Pfeiffer, Joseph D. Kotulski
2025 Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning.
Waylon Luo, Jiapeng Zhao, Tong Zhan, Qiang Guan
2025 Amber: Towards Fast and Space-Efficient Incremental Checkpointing in Large Language Model Training.
Zhiqiang Wang, Wenzhe Zhu, Zaigui Zhang, Chaomei Yan, Fan Guo, Yongkun Li, Yinlong Xu
2025 Architecture-Aware Models of AI Engines for High-Performance Matrix Matrix Multiplication.
Elliott D. Binder, Jeffrey Low, Tze Meng Low
2025 Auto-Stencil: Performance-Driven Stencil Optimization with Hardware Feedback for LLMs.
Quan Deng, Lin Gan, Hongkun Yu, Wenlai Zhao, Guangwen Yang
2025 Automated FPGA Accelerator Generation Framework for Transformers with Dataflow Optimization.
Wenqi Lou, Yunji Qin, Zihao Wang, Chao Wang, Lei Gong, Xuehai Zhou
2025 BMapper: A Scalable and Efficient Framework for Brain Simulations Acceleration on Supercomputers.
Yubing Bao, Zhihui Lu, Qiang Duan, Xin Du, Zhongyu Chen, Yicong Zhao, Xiaoyi Li, Yandan Tan, Shuhan Yang, Ziyi Wang, Yang Chen, Yang Xu
2025 Bridging Cache-Friendliness and Concurrency: A Locality-Optimized In-Memory B-Skiplist.
Yicong Luo, Senhe Hao, Brian Wheatman, Prashant Pandey, Helen Xu
2025 COF: Cycle and transmission co-mapping framework for CNN mapping in PIM architecture.
Xianfa Zhou, Tun Li, Yuhuan Xia, Ruiyu Zhang
2025 Carbon-Aware Workflow Scheduling with Fixed Mapping and Deadline Constraint.
Dominik Schweisgut, Anne Benoit, Yves Robert, Henning Meyerhenke
2025 CompreGel: Efficient Distributed Graph Propagation via Error-Bounded Lossy Message Compression.
Tianhao Wu, Da Yan, Qihao Cheng, Lyuheng Yuan, Sheng Di, Jiao Han, Zhongyi Huang, Ji Cheng
2025 CoreTuner: Predicting and Scheduling Framework for Optimizing the Joint Allocation of CPU and GPU in Training Cluster.
Hao Dong, Yuehao Xu, Xiaohui Wang, Xinhua Ji, Zhijun Ding
2025 Cross-Architecture Performance Analysis Using the RAJA Performance Suite.
Dewi Yokelson, Stephanie Brink, Jason Burmark, Michael McKinsey, Befikir Bogale, Ian Lumsden, Michela Taufer, Tom Scogland, Olga Pearce
2025 Cycle-Aware Parallel Optimization for Mitigating ZZ Crosstalk on Quantum Hardware.
Jiayi Zhong, Yuxin Deng
2025 Deadline-Aware Scheduling of Mixed-Criticality Tasks.
Maxime Gonthier, Kyle Chard, Ian T. Foster, Loris Marchal, Frédéric Vivien
2025 Decision Shuffle: Efficient Pre-scheduling System for Push-based Shuffle in DAG Computing Frameworks.
Shihao Zhang, Chi Zhang, Chentao Wu, Jie Li, Minyi Guo, Hui Li, Liqiang Zhang
2025 Design and Optimization of GPU-Aware MPI Allreduce Using Direct Sendrecv Communication.
Chen-Chun Chen, Jinghan Yao, Hari Subramoni, Dhabaleswar K. Panda
2025 Design of Interposer Interconnection Network Based on High-Radix Interposer Routers.
Xue Xiao, Yi Dai, Yanqiang Sun, Jianmin Zhang, Tiejun Li
2025 ESC: Effective Submanifold Convolution using Tensor Cores.
Xuezhu Wang, Hailong Yang, Xin You, Yufan Xu, Xiaoyan Liu, Siqi Wang, Kaige Zhang, Mingzhen Li, Zhongzhi Luan, Yi Liu, Depei Qian
2025 Efficient Construction of Large Search Spaces for Auto-Tuning.
Floris-Jan Willemsen, Rob V. van Nieuwpoort, Ben van Werkhoven
2025 Efficient Cross-Datacenter Congestion Control with Fast Control Loops.
Baosen Zhao, Jianan Sun, Xu Zhou, Wanghong Yang, Wenji Du, Fukang Chen, Yongmao Ren, Stefan Schmid
2025 Efficient Parallel Algorithms for Dynamic Percolation Centrality.
Prajjwal Nijhara, Lokesh Venkatachalam, Agam Harpreet Singh, Athreya Chandramouli, Sayantan Jana, Kishore Kothapalli, Dip Sankar Banerjee
2025 FLEX: Leveraging FPGA-CPU Synergy for Mixed-Cell-Height Legalization Acceleration.
Xingyu Liu, Jiawei Liang, Linfeng Du, Yipu Zhang, Chaofang Ma, Hanwei Fan, Jiang Xu, Wei Zhang
2025 Fast Exact Diameter Computation of Sparse Graphs.
Cameron Bradley, Anju Mongandampulath Akathoott, Martin Burtscher
2025 Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores.
Brian Curless, Michael Gowanlock
2025 FedWCM: Unleashing the Potential of Momentum-based Federated Learning in Long-Tailed Scenarios.
Tianle Li, Yongzhi Huang, Linshan Jiang, Qipeng Xie, Chang Liu, Wenfeng Du, Lu Wang, Kaishun Wu
2025 HHOTuner: Efficient Performance Tuning with Harris Hawks Optimization.
Akash Dutta, Ali Jannesari
2025 HMGraph: Boosting GNN Training on Hierarchical Memory via Coordinated Cache.
Lizhi Zhang, Menghan Jia, Zhiquan Lai, Qiao Li, Yiming Zhang, Dongsheng Li
2025 HeatList: The Case for Retrofitting In-memory Range Index with Hotspot Awareness.
Junru Shen, Miao Cai, Kangyue Gao, Baoliu Ye, Guo Cheng
2025 Heterogeneity-aware Federated Edge Learning via UAV Sampling and D2D Communications.
Yanfeng Lu, Tao Wu, Chao Chang, Hongjun Wang, Mingxing Ke, Jian Wang
2025 Heterogeneity-aware Task Scheduling based on Personalized Federated Reinforcement Learning.
Xin Yong, Li Yan, Zhuozhao Li
2025 IRIS-MASH: Efficient Multi-device Asynchronous Multi-Stream Heterogeneous Computing.
Narasinga Rao Miniskar, Aaron R. Young, Mohammad Alaul Haque Monil, Kazi Asifuzzaman, Beau Johnston, Keita Teranishi, Jeffrey S. Vetter
2025 It Takes Two: Accelerating Accurate Federated Learning through Pipelined Intra-Batch Data Sampling and Training.
Chenghao Nu, Zhe Zhang, Ye Li, Yanchao Zhao
2025 Joint Prediction and Matching for Computing Resource Exchange Platforms.
Da Huo, Zhenzhe Zheng, Xiaoyao Huang, Hao Chen, Jianfeng Hu, Zhiyong Yan, Fan Wu, Jie Wu
2025 Joint Task Scheduling and Resource Allocation in Cloud-Edge Collaborative Computing Systems.
Boyu Du, Jingya Zhou, Jin Wang, Jiangwei Wang, Zhijun Li
2025 LLaMCAT: Optimizing Large Language Model Inference with Cache Arbitration and Throttling.
Zhongchun Zhou, Chengtao Lai, Wei Zhang
2025 Leave No One Behind: Fair and Efficient Tiered Memory Management for Multi-Applications.
Wenda Tang, Yiduo Wang, Yanwen Wang, Jie Wu
2025 Lias: Leveraging Performance Counters for Interference Quantification and Mitigation in Multi-processor Systems.
Yangfan Qiao, Zhuozhao Li
2025 MixLoRA: An Efficient Multi-Tenant Framework for Concurrently Serving Diverse LoRA Models in Large Language Models.
Ronghuai Chen, Ce Yu, Hao Fu, Xiaoteng Hu, Bin Yang
2025 Multiprocessor Scheduling with Memory Constraints: Fundamental Properties and Finding Optimal Solutions.
Pál András Papp, Toni Böhnlein, Albert-Jan Nicholas Yzelman
2025 OVERT: Orchestrating Vector-Scalar Execution for Efficient SpMV on Modern CPUs.
Kelun Lei, Hailong Yang, Kaige Zhang, Shaokang Du, Marc Casas, Yufan Xu, Zhongzhi Luan, Yi Liu, Depei Qian
2025 One GPU, Many Ranks: Enabling Performance and Energy-Efficient In-Transit Visualization via Resource Sharing.
Matheus Costa, Philippe O. A. Navaux, Silvio Rizzi, Arthur Francisco Lorenzon
2025 Optimizing Direct Convolutions on High-Performance Multi-Core DSPs.
Pengyu Wang, Xiaotian Chen, Jianbin Fang, Peng Zhang, Yonggang Che, Chun Huang, Jie Ren
2025 Optimizing Incomplete Cholesky Factorization on MIMD Many-core Architecture.
Yongzhen Shi, Qinglin Wang, Jie Liu, Lian Wang, Zhiyan Liu, Bingwei Wang, Feiming Liu, Xiangdong Pei
2025 Optimizing NumPy with SVE Acceleration on ARM Architectures.
Kuldeep Pal, Aniket P. Garade, Deepika H. V, Haribabu P, S. A. Kumar, S. D. Sudarsan
2025 Origami: Efficient ML-Driven Metadata Load Balancing for Distributed File Systems.
Yiduo Wang, Wenda Tang, Linghang Meng, Liang Li, Jie Wu
2025 P3P-Fed: Peer-to-Peer Personalized Federated Learning with DHT-based Local Clustering.
Sooho Jang, Ahyeon Lim, Yuchan Lee, Sookwang Lee, Jaehwan Lee
2025 PISCES: Push-Pull Hybrid Optimization for Graph Pattern Matching.
Changjie Xu, Ke Meng, Zhiheng Lin, Guangming Tan
2025 PTWalker: Cache-Efficient Random Walks via Alternating Dual-Subgraph Walker Updating.
Shuai Lin, Rui Wang, Zaigui Zhang, Long Deng, Wenzhe Zhu, Yongkun Li, Yinlong Xu
2025 ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks.
Joshua Hoke Davis, Daniel Nichols, Ishan Khillan, Abhinav Bhatele
2025 ParaCOSM: A Parallel Framework for Continuous Subgraph Matching.
Haibin Lai, Sicheng Zhou, Site Fan, Zhuozhao Li
2025 Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision.
Evelyne Ringoot, Rabab Alomairy, Valentin Churavy, Alan Edelman
2025 Pisces: Towards Adaptive and Fair Congestion Control via Multi-Agent Meta-Reinforcement Learning.
He Bai, Hui Li, Jianming Que, Minglong Zhang, Zhiqiang Hu, Ximing Xu, Bing Lin, Runhuai Huang, Junyang Qiu, Shaowen Deng
2025 Power Capping of GPU Servers for Machine Learning Inference Optimization.
Yuan Ma, Srinivasan Subramaniyan, Xiaorui Wang
2025 Proceedings of the 54th International Conference on Parallel Processing, ICPP 2025, San Diego, CA, USA, September 8-11, 2025
2025 Q-GEAR: Improving quantum simulation framework.
Ziqing Guo, Jan Balewski, Ziwen Pan
2025 Revisiting Multi-threaded Compaction in LSM-trees: Enabling Compaction Pipelining.
Hongsu Byun, Honghyeon Yoo, Sungyong Park
2025 SINA: Accelerating Time Synchronization in Large-Scale Network Simulation Using In-Network Allreduce.
Dinghuang Hu, Dezun Dong, Xiangke Liao
2025 SYgraph: A Portable Heterogeneous Graph Analytics Framework for GPUs.
Antonio De Caro, Gennaro Cordasco, Biagio Cosenza
2025 Scaling Distributed Graph Processing to Hundreds of GPUs.
George M. Slota, Michael Mandulak
2025 Scheduling based on Block Features for Concurrent Inference with Unseen DNN Models on GPU.
Diaohan Luo, Zhen Tang, Heran Gao, Yuewen Wu, Heng Wu, Xi Han, Wenbo Zhang
2025 SmartBlock: Adaptive Block Floating Point Quantization for Efficient DNN Acceleration.
Xin Ju, Jingkui Yang, Mei Wen, Jun He, Jing Feng, Minjin Tang, Zhaoyun Chen, Yang Shi
2025 Solving Extended Flexible Job Shop Scheduling Problems with Deep Reinforcement Learning.
Haonan Jiang, Yusen Li, Xiaoguang Liu, Gang Wang, Xuebo Zhang
2025 SpeedSketch: An Ultra-Fast Sketch Generation and Delta Encoding Framework for Delta Compression.
Fengkui Yang, Yuanzhang Wang, Chunhua Li, Ke Zhou, Hui Li
2025 SpiderCache: Semantic-Aware Caching Strategy for DNN Training.
Zesong Wang, Peng Fang, Fang Wang, Hong Jiang, Yimin Lu, Zhan Shi, Dan Feng
2025 TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks.
Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Chencan Wu, Yong Li, Xiaokui Xiao, Wei Lin, Jialin Li
2025 TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference.
Hongbin Zhang, Taosheng Wei, Zhenyi Zheng, Jiangsu Du, Zhiguang Chen, Yutong Lu
2025 Thievory: Graph Processing with Multi-GPU Memory Stealing.
João Brotas, Ricardo Nobre, Aleksandar Ilic
2025 VES: Vectorized Sparse General Matrix-Matrix Multiplication on Multi-Core DSPs.
Chuhe Hong, Qinglin Wang, Xing Peng, Gencheng Liu, Qingyang Zhang, Xinhai Chen, Jie Liu
2025 ViReC: The Virtual Register Context Architecture for Efficient Near-Memory Multithreading.
Matthew Barondeau, Sophia Jiang, Jonathan Beard, Andreas Gerstlauer
2025 WinRS: Accelerate Winograd Backward-Filter Convolution with Tiny Workspace.
Zhiyi Zhang, Junshi Chen, Jingwei Sun, Pengfei Zhang, Zhuopin Xu, Jun Shi, Qi Wang
2025 ZTP: A Scalable and Lightweight Privacy-Preserving Blockchain via Scale-Free Quorums and Geometric Fragmentation.
Abdullah Al-Mamun, Dongfang Zhao, Gagan Agrawal, Ahmed Aleroud, Mohamed I. Ibrahem
2025 pyGinkgo: A Sparse Linear Algebra Operator Framework for Python.
Keshvi Tuteja, Gregor Olenik, Roman Mishchuk, Yu-Hsiang Tsai, Markus Götz, Achim Streit, Hartwig Anzt, Charlotte Debus