IPDPS A

89 entries (88 papers plus the proceedings volume)

Year Title / Authors
2024 A Cholesky QR type algorithm for computing tall-skinny QR factorization with column pivoting.
Takeshi Fukaya, Yuji Nakatsukasa, Yusaku Yamamoto
2024 A Comparative Study of Intersection-Based Triangle Counting Algorithms on GPUs.
Jiangbo Li, Zichen Xu, Minh Pham, Yicheng Tu, Qihe Zhou
2024 A Parallel Partial Merge Repair Algorithm for Multi-block Failures for Erasure Storage Systems.
Shuaipeng Zhang, Shiyi Li, Chentao Wu, Ruobin Wu, Saiqin Long, Wen Xia
2024 A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis.
Dong Kyu Sung, Yongseok Son, Alex Sim, Kesheng Wu, Suren Byna, Houjun Tang, Hyeonsang Eom, Changjong Kim, Sunggon Kim
2024 AMST: Accelerating Large-Scale Graph Minimum Spanning Tree Computation on FPGA.
Haishuang Fan, Rui Meng, Qichu Sun, Jingya Wu, Wenyan Lu, Xiaowei Li, Guihai Yan
2024 ARGO: An Auto-Tuning Runtime System for Scalable GNN Training on Multi-Core Processor.
Yi-Chien Lin, Yuyang Chen, Sameh Gobriel, Nilesh Jain, Gopi Krishna Jha, Viktor K. Prasanna
2024 Accelerating Lossy and Lossless Compression on Emerging BlueField DPU Architectures.
Yuke Li, Arjun Kashyap, Weicong Chen, Yanfei Guo, Xiaoyi Lu
2024 Adaptive Prefetching for Fine-grain Communication in PGAS Programs.
Thomas B. Rolinger, Alan Sussman
2024 Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources.
Thanh Son Phung, Douglas Thain
2024 Alternative Basis Matrix Multiplication is Fast and Stable.
Oded Schwartz, Sivan Toledo, Noa Vaknin, Gal Wiernik
2024 Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs.
Herbert Owen, Dominik Ernst, Thomas Gruber, Oriol Lehmkuhl, Guillaume Houzeaux, Lucas Gasparino, Gerhard Wellein
2024 An O(N) distributed-memory parallel direct solver for planar integral equations.
Tianyu Liang, Chao Chen, Per-Gunnar Martinsson, George Biros
2024 An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression.
Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur
2024 Application-Attuned Memory Management for Containerized HPC Workflows.
Moiz Arif, Avinash Maurya, M. Mustafa Rafique, Dimitrios S. Nikolopoulos, Ali Raza Butt
2024 Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching.
Pengmiao Zhang, Neelesh Gupta, Rajgopal Kannan, Viktor K. Prasanna
2024 Aurora: A Versatile and Flexible Accelerator for Graph Neural Networks.
Jiaqi Yang, Hao Zheng, Ahmed Louri
2024 Automatic Task Parallelization of Dataflow Graphs in ML/DL Models.
Srinjoy Das, Lawrence Rauchwerger
2024 Automating GPU Scalability for Complex Scientific Models: Phonon Boltzmann Transport Equation.
Eric Heisler, Siddharth Saurav, Aadesh Deshmukh, Sandip Mazumder, Hari Sundar
2024 Benchmarking and Dissecting the Nvidia Hopper GPU Architecture.
Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Qiang Wang, Xiaowen Chu
2024 CKSM: An Efficient Memory Deduplication Method for Container-based Cloud Computing Systems.
Yunfei Gu, Yihui Lu, Chentao Wu, Jie Li, Minyi Guo
2024 CachedArrays: Optimizing Data Movement for Heterogeneous Memory Systems.
Mark Hildebrand, Jason Lowe-Power, Venkatesh Akella
2024 Capturing Periodic I/O Using Frequency Techniques.
Ahmad Tarraf, Alexis Bandet, Francieli Boito, Guillaume Pallez, Felix Wolf
2024 CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction.
Zizhe Jian, Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Haiying Xu, Robert Underwood, Shixun Wu, Jiajun Huang, Zizhong Chen, Franck Cappello
2024 CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion.
Jan Laukemann, Thomas Gruber, Georg Hager, Dossay Oryspayev, Gerhard Wellein
2024 CoCG: Fine-grained Cloud Game Co-location on Heterogeneous Platform.
Taolei Wang, Chao Li, Jing Wang, Cheng Xu, Xiaofeng Hou, Minyi Guo
2024 Comparative Study of Large Language Model Architectures on Frontier.
Junqi Yin, Avishek Bose, Guojing Cong, Isaac Lyngaas, Quentin Anthony
2024 Cross-System Analysis of Job Characterization and Scheduling in Large-Scale Computing Clusters.
Di Zhang, Monish Soundar Raj, Bing Xie, Sheng Di, Dong Dai
2024 DEFCON: Deformable Convolutions Leveraging Interval Search and GPU Texture Hardware.
Malith Jayaweera, Yanyu Li, Yanzhi Wang, Bin Ren, David R. Kaeli
2024 Drilling Down I/O Bottlenecks with Cross-layer I/O Profile Exploration.
Hammad Ather, Jean Luca Bez, Yankun Xia, Suren Byna
2024 Druto: Upper-Bounding Silent Data Corruption Vulnerability in GPU Applications.
Md Hasanur Rahman, Sheng Di, Shengjian Guo, Xiaoyi Lu, Guanpeng Li, Franck Cappello
2024 Enabling High-Performance Physical Based Rendering on New Sunway Supercomputer.
Zeyu Song, Lin Gan, Shengye Xiang, Yinuo Wang, Xiaohui Duan, Guangwen Yang
2024 Enhancing the Generalization of Personalized Federated Learning with Multi-head Model and Ensemble Voting.
Van An Le, Nam Duong Tran, Phuong Nam Nguyen, Thanh Hung Nguyen, Phi Le Nguyen, Truong Thao Nguyen, Yusheng Ji
2024 Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference.
Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda
2024 Exploiting long vectors with a CFD code: a co-design show case.
Marc Blancafort, Roger Ferrer, Guillaume Houzeaux, Marta Garcia-Gasulla, Filippo Mantovani
2024 Exploration of Trade-offs Between General-Purpose and Specialized Processing Elements in HPC-Oriented CGRA.
Emanuele Del Sozzo, Xinyuan Wang, Boma A. Adhi, Carlos Cortes, Jason Anderson, Kentaro Sano
2024 FEDGE: An Interference-Aware QoS Prediction Framework for Black-Box Scenario in IaaS Clouds with Domain Generalization.
Yunlong Cheng, Xiuqi Huang, Zifeng Liu, Jiadong Chen, Xiaofeng Gao, Zhen Fang, Yongqiang Yang
2024 Fast Abort-Freedom for Deterministic Transactions.
Chen Chen, Xingbo Wu, Wenshao Zhong, Jakob Eriksson
2024 Fast Policy Convergence for Traffic Engineering with Proactive Distributed Message-Passing.
Zicheng Wang, Zirui Zhuang, Jingyu Wang, Qi Qi, Haifeng Sun, Jianxin Liao
2024 Fast multiplication of random dense matrices with sparse matrices.
Tianyu Liang, Riley Murray, Aydin Buluç, James Demmel
2024 Flexible NVMe Request Routing for Virtual Machines.
Tu Dinh Ngoc, Boris Teabe, Georges Da Costa, Daniel Hagimont
2024 GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs.
Yihua Wei, Peng Jiang
2024 Graph Analytics on Jellyfish topology.
Md Nahid Newaz, Sayan Ghosh, Joshua Suetterlein, Nathan R. Tallent, Md Atiqul Mollah, Ming Hua
2024 HA-CSD: Host and SSD Coordinated Compression for Capacity and Performance.
Xiang Chen, Tao Lu, Jiapin Wang, Yu Zhong, Guangchun Xie, Xueming Cao, Yuanpeng Ma, Bing Si, Feng Ding, Ying Yang, Yunxin Huang, Yafei Yang, You Zhou, Fei Wu
2024 HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions.
Bharath Ramesh, Nick Contini, Nawras Alnaasan, Kaushik Kandadi Suresh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda
2024 Hadar: Heterogeneity-Aware Optimization-Based Online Scheduling for Deep Learning Cluster.
Abeda Sultana, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng
2024 Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators.
Payman Behnam, Uday Kamal, Ali Shafiee, Alexey Tumanov, Saibal Mukhopadhyay
2024 Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures.
Evangelos Georganas, Dhiraj D. Kalamkar, Kirill Voronin, Abhisek Kundu, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke
2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024
2024 IPU-EpiDet: Identifying Gene Interactions on Massively Parallel Graph-Based AI Accelerators.
Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, Leonel Sousa
2024 Interpretable Analysis of Production GPU Clusters Monitoring Data via Association Rule Mining.
Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari
2024 LightDAG: A Low-latency DAG-based BFT Consensus through Lightweight Broadcast.
Xiaohai Dai, Guanxiong Wang, Jiang Xiao, Zhengxuan Guo, Rui Hao, Xia Xie, Hai Jin
2024 LockillerTM: Enhancing Performance Lower Bounds in Best-Effort Hardware Transactional Memory.
Li Wan, Fu Chao, Qiang Li, Jun Han
2024 Low-Depth Spatial Tree Algorithms.
Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski
2024 MAAD: A Distributed Anomaly Detection Architecture for Microservices Systems.
Rongyuan Tan, Zhuozhao Li
2024 MPI Errors Detection using GNN Embedding and Vector Embedding over LLVM IR.
Jad El Karchi, Hanze Chen, Ali TehraniJamsaz, Ali Jannesari, Mihail Popov, Emmanuelle Saillard
2024 MUSE: A Runtime Incrementally Reconfigurable Network Adapting to HPC Real-Time Traffic.
Zijian Li, Zixuan Chen, Yiying Tang, Xin Ai, Yuanyi Zhu, Zhigao Zhao, Jiang Shao, Guowei Liu, Sen Liu, Bin Liu, Yang Xu
2024 Machine-Learning-Driven Runtime Optimization of BLAS Level 3 on Modern Multi-Core Systems.
Yufan Xia, Giuseppe Maria Junior Barca
2024 NVMe-oPF: Designing Efficient Priority Schemes for NVMe-over-Fabrics with Multi-Tenancy Support.
Darren Ng, Andrew Lin, Arjun Kashyap, Guanpeng Li, Xiaoyi Lu
2024 OneShot: View-Adapting Streamlined BFT Protocols with Trusted Execution Environments.
Jérémie Decouchant, David Kozhaya, Vincent Rahli, Jiangshan Yu
2024 OpenFFT-SME: An Efficient Outer Product Pattern FFT Library on ARM SME CPUs.
Ruge Zhang, Haipeng Jia, Yunquan Zhang, Baicheng Yan, Penghao Ma, Long Wang, Wenxuan Zhao
2024 Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method.
Ahmed H. Mahmoud, Hesam Salehipour, Massimiliano Meneghin
2024 Optimizing General Matrix Multiplications on Modern Multi-core DSPs.
Kainan Yu, Xinxin Qi, Peng Zhang, Jianbin Fang, Dezun Dong, Ruibo Wang, Tao Tang, Chun Huang, Yonggang Che, Zheng Wang
2024 Optimizing and Scaling the 3D Reconstruction of Single-Particle Imaging.
Niteya Shah, Christine Sweeney, Vinay Ramakrishnaiah, Jeffrey Donatelli, Wu-chun Feng
2024 Paldia: Enabling SLO-Compliant and Cost-Effective Serverless Computing on Heterogeneous Hardware.
Vivek M. Bhasi, Aakash Sharma, Shruti Mohanty, Mahmut Taylan Kandemir, Chita R. Das
2024 Parallel Approximations for High-Dimensional Multivariate Normal Probability Computation in Confidence Region Detection Applications.
Xiran Zhang, Sameh Abdulah, Jian Cao, Hatem Ltaief, Ying Sun, Marc G. Genton, David E. Keyes
2024 Parallel Derandomization for Coloring.
Sam Coy, Artur Czumaj, Peter Davies-Peck, Gopinath Mishra
2024 PckGNN: Optimizing Aggregation Operators with Packing Strategies in Graph Neural Networks.
Zhengding Hu, Jingwei Sun, Zhongyang Li, Guangzhong Sun
2024 Performance-Portable Multiphase Flow Solutions with Discontinuous Galerkin Methods.
Tobias S. Flynn, Robert Manson-Sawko, Gihan R. Mudalige
2024 Picasso: Memory-Efficient Graph Coloring Using Palettes With Applications in Quantum Computing.
S. M. Ferdous, Reece Neff, Bo Peng, Salman Shuvo, Marco Minutoli, Sayak Mukherjee, Karol Kowalski, Michela Becchi, Mahantesh Halappanavar
2024 Practically Tackling Memory Bottlenecks of Graph-Processing Workloads.
Alexandre Valentin Jamet, Georgios Vavouliotis, Daniel A. Jiménez, Lluc Alvarez, Marc Casas
2024 Predicting Cross-Architecture Performance of Parallel Programs.
Daniel Nichols, Alexander Movsesyan, Jae-Seung Yeom, Abhik Sarkar, Daniel Milroy, Tapasya Patki, Abhinav Bhatele
2024 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices.
Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu
2024 SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs.
Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs
2024 SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors.
Marta Navarro, Josué Feliu, Salvador Petit, María Engracia Gómez, Julio Sahuquillo
2024 Scalable and Differentiable Simulator for Quantum Computational Chemistry.
Zhiqian Xu, Honghui Shang, Yi Fan, Xiongzhi Zeng, Yunquan Zhang, Chu Guo
2024 Software Resource Disaggregation for HPC with Serverless Computing.
Marcin Copik, Marcin Chrapek, Larissa Schmid, Alexandru Calotoiu, Torsten Hoefler
2024 TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning.
Gangda Deng, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li, Rajgopal Kannan, Viktor K. Prasanna
2024 TEEMO: Temperature Aware Energy Efficient Multi-Retention STT-RAM Cache Architecture.
Sukarn Agarwal, Shounak Chakraborty, Magnus Själander
2024 Tackling Cold Start in Serverless Computing with Multi-Level Container Reuse.
Amelie Chi Zhou, Rongzheng Huang, Zhoubin Ke, Yusen Li, Yi Wang, Rui Mao
2024 The Self-adaptive and Topology-aware MPI_Bcast leveraging Collective offload on Tianhe Express Interconnect.
Chongshan Liang, Yi Dai, Jun Xia, Jinbo Xu, Jintao Peng, Weixia Xu, Ming Xie, Jie Liu, Zhiquan Lai, Sheng Ma, Qi Zhu
2024 Time-Color Tradeoff on Uniform Circle Formation by Asynchronous Robots.
Debasish Pattanayak, Gokarna Sharma
2024 To Store or Not to Store: a graph theoretical approach for Dataset Versioning.
Anxin Guo, Jingwei Li, Pattara Sukprasert, Samir Khuller, Amol Deshpande, Koyel Mukherjee
2024 TunIO: An AI-powered Framework for Optimizing HPC I/O.
Neeraj Rajesh, Keith Bateman, Jean Luca Bez, Suren Byna, Anthony Kougkas, Xian-He Sun
2024 Two-Stage Block Orthogonalization to Improve Performance of s-step GMRES.
Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld
2024 UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving.
Yifei Li, Ryan Chard, Yadu N. Babuji, Kyle Chard, Ian T. Foster, Zhuozhao Li
2024 VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs.
Luhan Wang, Haipeng Jia, Lei Xu, Cunyang Wei, Kun Li, Xianmeng Jiang, Yunquan Zhang
2024 Wait-free Trees with Asymptotically-Efficient Range Queries.
Ilya Kokorin, Victor Yudov, Vitaly Aksenov, Dan Alistarh
2024 cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding.
Lihan Hu, Jing Li, Peng Jiang
2024 nOS-V: Co-Executing HPC Applications Using System-Wide Task Scheduling.
David Álvarez, Kevin Sala, Vicenç Beltran