| 2024 | A Cholesky QR type algorithm for computing tall-skinny QR factorization with column pivoting. Takeshi Fukaya, Yuji Nakatsukasa, Yusaku Yamamoto |
| 2024 | A Comparative Study of Intersection-Based Triangle Counting Algorithms on GPUs. Jiangbo Li, Zichen Xu, Minh Pham, Yicheng Tu, Qihe Zhou |
| 2024 | A Parallel Partial Merge Repair Algorithm for Multi-block Failures for Erasure Storage Systems. Shuaipeng Zhang, Shiyi Li, Chentao Wu, Ruobin Wu, Saiqin Long, Wen Xia |
| 2024 | A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis. Dong Kyu Sung, Yongseok Son, Alex Sim, Kesheng Wu, Suren Byna, Houjun Tang, Hyeonsang Eom, Changjong Kim, Sunggon Kim |
| 2024 | AMST: Accelerating Large-Scale Graph Minimum Spanning Tree Computation on FPGA. Haishuang Fan, Rui Meng, Qichu Sun, Jingya Wu, Wenyan Lu, Xiaowei Li, Guihai Yan |
| 2024 | ARGO: An Auto-Tuning Runtime System for Scalable GNN Training on Multi-Core Processor. Yi-Chien Lin, Yuyang Chen, Sameh Gobriel, Nilesh Jain, Gopi Krishna Jha, Viktor K. Prasanna |
| 2024 | Accelerating Lossy and Lossless Compression on Emerging BlueField DPU Architectures. Yuke Li, Arjun Kashyap, Weicong Chen, Yanfei Guo, Xiaoyi Lu |
| 2024 | Adaptive Prefetching for Fine-grain Communication in PGAS Programs. Thomas B. Rolinger, Alan Sussman |
| 2024 | Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources. Thanh Son Phung, Douglas Thain |
| 2024 | Alternative Basis Matrix Multiplication is Fast and Stable. Oded Schwartz, Sivan Toledo, Noa Vaknin, Gal Wiernik |
| 2024 | Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs. Herbert Owen, Dominik Ernst, Thomas Gruber, Oriol Lehmkuhl, Guillaume Houzeaux, Lucas Gasparino, Gerhard Wellein |
| 2024 | An O(N) distributed-memory parallel direct solver for planar integral equations. Tianyu Liang, Chao Chen, Per-Gunnar Martinsson, George Biros |
| 2024 | An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression. Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur |
| 2024 | Application-Attuned Memory Management for Containerized HPC Workflows. Moiz Arif, Avinash Maurya, M. Mustafa Rafique, Dimitrios S. Nikolopoulos, Ali Raza Butt |
| 2024 | Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching. Pengmiao Zhang, Neelesh Gupta, Rajgopal Kannan, Viktor K. Prasanna |
| 2024 | Aurora: A Versatile and Flexible Accelerator for Graph Neural Networks. Jiaqi Yang, Hao Zheng, Ahmed Louri |
| 2024 | Automatic Task Parallelization of Dataflow Graphs in ML/DL Models. Srinjoy Das, Lawrence Rauchwerger |
| 2024 | Automating GPU Scalability for Complex Scientific Models: Phonon Boltzmann Transport Equation. Eric Heisler, Siddharth Saurav, Aadesh Deshmukh, Sandip Mazumder, Hari Sundar |
| 2024 | Benchmarking and Dissecting the Nvidia Hopper GPU Architecture. Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Qiang Wang, Xiaowen Chu |
| 2024 | CKSM: An Efficient Memory Deduplication Method for Container-based Cloud Computing Systems. Yunfei Gu, Yihui Lu, Chentao Wu, Jie Li, Minyi Guo |
| 2024 | CachedArrays: Optimizing Data Movement for Heterogeneous Memory Systems. Mark Hildebrand, Jason Lowe-Power, Venkatesh Akella |
| 2024 | Capturing Periodic I/O Using Frequency Techniques. Ahmad Tarraf, Alexis Bandet, Francieli Boito, Guillaume Pallez, Felix Wolf |
| 2024 | CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction. Zizhe Jian, Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Haiying Xu, Robert Underwood, Shixun Wu, Jiajun Huang, Zizhong Chen, Franck Cappello |
| 2024 | CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion. Jan Laukemann, Thomas Gruber, Georg Hager, Dossay Oryspayev, Gerhard Wellein |
| 2024 | CoCG: Fine-grained Cloud Game Co-location on Heterogeneous Platform. Taolei Wang, Chao Li, Jing Wang, Cheng Xu, Xiaofeng Hou, Minyi Guo |
| 2024 | Comparative Study of Large Language Model Architectures on Frontier. Junqi Yin, Avishek Bose, Guojing Cong, Isaac Lyngaas, Quentin Anthony |
| 2024 | Cross-System Analysis of Job Characterization and Scheduling in Large-Scale Computing Clusters. Di Zhang, Monish Soundar Raj, Bing Xie, Sheng Di, Dong Dai |
| 2024 | DEFCON: Deformable Convolutions Leveraging Interval Search and GPU Texture Hardware. Malith Jayaweera, Yanyu Li, Yanzhi Wang, Bin Ren, David R. Kaeli |
| 2024 | Drilling Down I/O Bottlenecks with Cross-layer I/O Profile Exploration. Hammad Ather, Jean Luca Bez, Yankun Xia, Suren Byna |
| 2024 | Druto: Upper-Bounding Silent Data Corruption Vulnerability in GPU Applications. Md Hasanur Rahman, Sheng Di, Shengjian Guo, Xiaoyi Lu, Guanpeng Li, Franck Cappello |
| 2024 | Enabling High-Performance Physical Based Rendering on New Sunway Supercomputer. Zeyu Song, Lin Gan, Shengye Xiang, Yinuo Wang, Xiaohui Duan, Guangwen Yang |
| 2024 | Enhancing the Generalization of Personalized Federated Learning with Multi-head Model and Ensemble Voting. Van An Le, Nam Duong Tran, Phuong Nam Nguyen, Thanh Hung Nguyen, Phi Le Nguyen, Truong Thao Nguyen, Yusheng Ji |
| 2024 | Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda |
| 2024 | Exploiting long vectors with a CFD code: a co-design show case. Marc Blancafort, Roger Ferrer, Guillaume Houzeaux, Marta Garcia-Gasulla, Filippo Mantovani |
| 2024 | Exploration of Trade-offs Between General-Purpose and Specialized Processing Elements in HPC-Oriented CGRA. Emanuele Del Sozzo, Xinyuan Wang, Boma A. Adhi, Carlos Cortes, Jason Anderson, Kentaro Sano |
| 2024 | FEDGE: An Interference-Aware QoS Prediction Framework for Black-Box Scenario in IaaS Clouds with Domain Generalization. Yunlong Cheng, Xiuqi Huang, Zifeng Liu, Jiadong Chen, Xiaofeng Gao, Zhen Fang, Yongqiang Yang |
| 2024 | Fast Abort-Freedom for Deterministic Transactions. Chen Chen, Xingbo Wu, Wenshao Zhong, Jakob Eriksson |
| 2024 | Fast Policy Convergence for Traffic Engineering with Proactive Distributed Message-Passing. Zicheng Wang, Zirui Zhuang, Jingyu Wang, Qi Qi, Haifeng Sun, Jianxin Liao |
| 2024 | Fast multiplication of random dense matrices with sparse matrices. Tianyu Liang, Riley Murray, Aydin Buluç, James Demmel |
| 2024 | Flexible NVMe Request Routing for Virtual Machines. Tu Dinh Ngoc, Boris Teabe, Georges Da Costa, Daniel Hagimont |
| 2024 | GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs. Yihua Wei, Peng Jiang |
| 2024 | Graph Analytics on Jellyfish topology. Md Nahid Newaz, Sayan Ghosh, Joshua Suetterlein, Nathan R. Tallent, Md Atiqul Mollah, Ming Hua |
| 2024 | HA-CSD: Host and SSD Coordinated Compression for Capacity and Performance. Xiang Chen, Tao Lu, Jiapin Wang, Yu Zhong, Guangchun Xie, Xueming Cao, Yuanpeng Ma, Bing Si, Feng Ding, Ying Yang, Yunxin Huang, Yafei Yang, You Zhou, Fei Wu |
| 2024 | HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions. Bharath Ramesh, Nick Contini, Nawras Alnaasan, Kaushik Kandadi Suresh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda |
| 2024 | Hadar: Heterogeneity-Aware Optimization-Based Online Scheduling for Deep Learning Cluster. Abeda Sultana, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng |
| 2024 | Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators. Payman Behnam, Uday Kamal, Ali Shafiee, Alexey Tumanov, Saibal Mukhopadhyay |
| 2024 | Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures. Evangelos Georganas, Dhiraj D. Kalamkar, Kirill Voronin, Abhisek Kundu, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke |
| 2024 | IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024 |
| 2024 | IPU-EpiDet: Identifying Gene Interactions on Massively Parallel Graph-Based AI Accelerators. Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, Leonel Sousa |
| 2024 | Interpretable Analysis of Production GPU Clusters Monitoring Data via Association Rule Mining. Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari |
| 2024 | LightDAG: A Low-latency DAG-based BFT Consensus through Lightweight Broadcast. Xiaohai Dai, Guanxiong Wang, Jiang Xiao, Zhengxuan Guo, Rui Hao, Xia Xie, Hai Jin |
| 2024 | LockillerTM: Enhancing Performance Lower Bounds in Best-Effort Hardware Transactional Memory. Li Wan, Fu Chao, Qiang Li, Jun Han |
| 2024 | Low-Depth Spatial Tree Algorithms. Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski |
| 2024 | MAAD: A Distributed Anomaly Detection Architecture for Microservices Systems. Rongyuan Tan, Zhuozhao Li |
| 2024 | MPI Errors Detection using GNN Embedding and Vector Embedding over LLVM IR. Jad El Karchi, Hanze Chen, Ali TehraniJamsaz, Ali Jannesari, Mihail Popov, Emmanuelle Saillard |
| 2024 | MUSE: A Runtime Incrementally Reconfigurable Network Adapting to HPC Real-Time Traffic. Zijian Li, Zixuan Chen, Yiying Tang, Xin Ai, Yuanyi Zhu, Zhigao Zhao, Jiang Shao, Guowei Liu, Sen Liu, Bin Liu, Yang Xu |
| 2024 | Machine-Learning-Driven Runtime Optimization of BLAS Level 3 on Modern Multi-Core Systems. Yufan Xia, Giuseppe Maria Junior Barca |
| 2024 | NVMe-oPF: Designing Efficient Priority Schemes for NVMe-over-Fabrics with Multi-Tenancy Support. Darren Ng, Andrew Lin, Arjun Kashyap, Guanpeng Li, Xiaoyi Lu |
| 2024 | OneShot: View-Adapting Streamlined BFT Protocols with Trusted Execution Environments. Jérémie Decouchant, David Kozhaya, Vincent Rahli, Jiangshan Yu |
| 2024 | OpenFFT-SME: An Efficient Outer Product Pattern FFT Library on ARM SME CPUs. Ruge Zhang, Haipeng Jia, Yunquan Zhang, Baicheng Yan, Penghao Ma, Long Wang, Wenxuan Zhao |
| 2024 | Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method. Ahmed H. Mahmoud, Hesam Salehipour, Massimiliano Meneghin |
| 2024 | Optimizing General Matrix Multiplications on Modern Multi-core DSPs. Kainan Yu, Xinxin Qi, Peng Zhang, Jianbin Fang, Dezun Dong, Ruibo Wang, Tao Tang, Chun Huang, Yonggang Che, Zheng Wang |
| 2024 | Optimizing and Scaling the 3D Reconstruction of Single-Particle Imaging. Niteya Shah, Christine Sweeney, Vinay Ramakrishnaiah, Jeffrey Donatelli, Wu-chun Feng |
| 2024 | Paldia: Enabling SLO-Compliant and Cost-Effective Serverless Computing on Heterogeneous Hardware. Vivek M. Bhasi, Aakash Sharma, Shruti Mohanty, Mahmut Taylan Kandemir, Chita R. Das |
| 2024 | Parallel Approximations for High-Dimensional Multivariate Normal Probability Computation in Confidence Region Detection Applications. Xiran Zhang, Sameh Abdulah, Jian Cao, Hatem Ltaief, Ying Sun, Marc G. Genton, David E. Keyes |
| 2024 | Parallel Derandomization for Coloring. Sam Coy, Artur Czumaj, Peter Davies-Peck, Gopinath Mishra |
| 2024 | PckGNN: Optimizing Aggregation Operators with Packing Strategies in Graph Neural Networks. Zhengding Hu, Jingwei Sun, Zhongyang Li, Guangzhong Sun |
| 2024 | Performance-Portable Multiphase Flow Solutions with Discontinuous Galerkin Methods. Tobias S. Flynn, Robert Manson-Sawko, Gihan R. Mudalige |
| 2024 | Picasso: Memory-Efficient Graph Coloring Using Palettes With Applications in Quantum Computing. S. M. Ferdous, Reece Neff, Bo Peng, Salman Shuvo, Marco Minutoli, Sayak Mukherjee, Karol Kowalski, Michela Becchi, Mahantesh Halappanavar |
| 2024 | Practically Tackling Memory Bottlenecks of Graph-Processing Workloads. Alexandre Valentin Jamet, Georgios Vavouliotis, Daniel A. Jiménez, Lluc Alvarez, Marc Casas |
| 2024 | Predicting Cross-Architecture Performance of Parallel Programs. Daniel Nichols, Alexander Movsesyan, Jae-Seung Yeom, Abhik Sarkar, Daniel Milroy, Tapasya Patki, Abhinav Bhatele |
| 2024 | QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices. Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu |
| 2024 | SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs. Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs |
| 2024 | SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors. Marta Navarro, Josué Feliu, Salvador Petit, María Engracia Gómez, Julio Sahuquillo |
| 2024 | Scalable and Differentiable Simulator for Quantum Computational Chemistry. Zhiqian Xu, Honghui Shang, Yi Fan, Xiongzhi Zeng, Yunquan Zhang, Chu Guo |
| 2024 | Software Resource Disaggregation for HPC with Serverless Computing. Marcin Copik, Marcin Chrapek, Larissa Schmid, Alexandru Calotoiu, Torsten Hoefler |
| 2024 | TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning. Gangda Deng, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li, Rajgopal Kannan, Viktor K. Prasanna |
| 2024 | TEEMO: Temperature Aware Energy Efficient Multi-Retention STT-RAM Cache Architecture. Sukarn Agarwal, Shounak Chakraborty, Magnus Själander |
| 2024 | Tackling Cold Start in Serverless Computing with Multi-Level Container Reuse. Amelie Chi Zhou, Rongzheng Huang, Zhoubin Ke, Yusen Li, Yi Wang, Rui Mao |
| 2024 | The Self-adaptive and Topology-aware MPI_Bcast leveraging Collective offload on Tianhe Express Interconnect. Chongshan Liang, Yi Dai, Jun Xia, Jinbo Xu, Jintao Peng, Weixia Xu, Ming Xie, Jie Liu, Zhiquan Lai, Sheng Ma, Qi Zhu |
| 2024 | Time-Color Tradeoff on Uniform Circle Formation by Asynchronous Robots. Debasish Pattanayak, Gokarna Sharma |
| 2024 | To Store or Not to Store: a graph theoretical approach for Dataset Versioning. Anxin Guo, Jingwei Li, Pattara Sukprasert, Samir Khuller, Amol Deshpande, Koyel Mukherjee |
| 2024 | TunIO: An AI-powered Framework for Optimizing HPC I/O. Neeraj Rajesh, Keith Bateman, Jean Luca Bez, Suren Byna, Anthony Kougkas, Xian-He Sun |
| 2024 | Two-Stage Block Orthogonalization to Improve Performance of s-step GMRES. Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld |
| 2024 | UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving. Yifei Li, Ryan Chard, Yadu N. Babuji, Kyle Chard, Ian T. Foster, Zhuozhao Li |
| 2024 | VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs. Luhan Wang, Haipeng Jia, Lei Xu, Cunyang Wei, Kun Li, Xianmeng Jiang, Yunquan Zhang |
| 2024 | Wait-free Trees with Asymptotically-Efficient Range Queries. Ilya Kokorin, Victor Yudov, Vitaly Aksenov, Dan Alistarh |
| 2024 | cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding. Lihan Hu, Jing Li, Peng Jiang |
| 2024 | nOS-V: Co-Executing HPC Applications Using System-Wide Task Scheduling. David Álvarez, Kevin Sala, Vicenç Beltran |