| 2024 | A Conflict-aware Divide-and-Conquer Algorithm for Symmetric Sparse Matrix-Vector Multiplication. Haozhong Qiu, Chuanfu Xu, Jianbin Fang, Jian Zhang, Liang Deng, Yue Ding, Qingsong Wang, Shizhao Chen, Yonggang Che, Jie Liu |
| 2024 | A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale. Wesley Brewer, Matthias Maiterth, Vineet Kumar, Rafal P. Wojda, Sedrick Bouknight, Jesse Hines, Woong Shin, Scott Greenwood, David Grant, Wesley Williams, Feiyi Wang |
| 2024 | A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization. Daoce Wang, Pascal Grosset, Jesus Pulido, Tushar M. Athawale, Jiannan Tian, Kai Zhao, Zarija Lukic, Axel Huebl, Zhe Wang, James P. Ahrens, Dingwen Tao |
| 2024 | A Performance-Portable Kilometer-Scale Global Ocean Model on ORISE and New Sunway Heterogeneous Supercomputers. Junlin Wei, Xiang Han, Jiangfeng Yu, Jinrong Jiang, Hailong Liu, Pengfei Lin, Maoxue Yu, Kai Xu, Lian Zhao, Pengfei Wang, Weipeng Zheng, Jingwei Xie, Yanzhi Zhou, Tao Zhang, Feng Zhang, Yehong Zhang, Yue Yu, Yuzhu Wang, Yidi Bai, Chen Li, Zipeng Yu, Haoyu Deng, Yaxin Li, Xuebin Chi |
| 2024 | A Probabilistic Approach To Selecting Build Configurations in Package Managers. Daniel Nichols, Harshitha Menon, Todd Gamblin, Abhinav Bhatele |
| 2024 | A Scalable Algorithm for Active Learning. Youguang Chen, Zheyu Wen, George Biros |
| 2024 | A Sparsity-Aware Distributed-Memory Algorithm for Sparse-Sparse Matrix Multiplication. Yuxi Hong, Aydin Buluç |
| 2024 | A Workflow Roofline Model for End-to-End Workflow Performance Analysis. Nan Ding, Brian Austin, Yang Liu, Neil Mehta, Steven Farrell, Johannes P. Blaschke, Leonid Oliker, Hai Ah Nam, Nicholas J. Wright, Samuel Williams |
| 2024 | APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes. Yuanxin Wei, Jiangsu Du, Jiazhi Jiang, Xiao Shi, Xianwei Zhang, Dan Huang, Nong Xiao, Yutong Lu |
| 2024 | Accelerated Atomistic Kinetic Monte Carlo Simulations of Resistive Memory Arrays. Manasa Kaniselvan, Alexander Maeder, Marko Mladenovic, Mathieu Luisier, Alexandros Nikolaos Ziogas |
| 2024 | Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression. Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao |
| 2024 | Accelerating Distributed DLRM Training with Optimized TT Decomposition and Micro-Batching. Weihu Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng |
| 2024 | Accurate and Convenient Energy Measurements for GPUs: A Detailed Study of NVIDIA GPU's Built-In Power Sensor. Zeyu Yang, Karel Adámek, Wesley Armour |
| 2024 | Adaptive Patching for High-resolution Image Segmentation with Transformers. Enzhi Zhang, Isaac Lyngaas, Peng Chen, Xiao Wang, Jun Igarashi, Yuankai Huo, Masaharu Munetomo, Mohamed Wahib |
| 2024 | AmgT: Algebraic Multigrid Solver on Tensor Cores. Yuechen Lu, Lijie Zeng, Tengcheng Wang, Xu Fu, Wenxuan Li, Helin Cheng, Dechuang Yang, Zhou Jin, Marc Casas, Weifeng Liu |
| 2024 | An Evaluation of the Effect of Network Cost Optimization for Leadership Class Supercomputers. Awais Khan, John R. Lange, Nick Hagerty, Edwin F. Posada, John K. Holmen, James B. White, James Austin Harris, Verónica Melesse Vergara, Christopher Zimmer, Scott Atchley |
| 2024 | Application-Driven Exascale: The JUPITER Benchmark Suite. Andreas Herten, Sebastian Achilles, Damian Alvarez, Jayesh Badwaik, Eric Behle, Mathis Bode, Thomas Breuer, Daniel Caviedes-Voullième, Mehdi Cherti, Adel Dabah, Salem El Sayed, Wolfgang Frings, Ana Gonzalez-Nicolas, Eric B. Gregory, Kaveh Haghighi Mood, Thorsten Hater, Jenia Jitsev, Chelsea Maria John, Jan H. Meinke, Catrin I. Meyer, Pavel Mezentsev, Jan-Oliver Mirus, Stepan Nassyr, Carolin Penke, Manoel Römmer, Ujjwal Sinha, Benedikt von St. Vieth, Olaf Stein, Estela Suarez, Dennis Willsch, Ilya Zhukov |
| 2024 | Asynchronous Distributed-Memory Parallel Algorithms for Influence Maximization. Shubhendra Pal Singhal, Souvadra Hati, Jeffrey Young, Vivek Sarkar, Akihiro Hayashi, Richard W. Vuduc |
| 2024 | Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs. Mingkuan Xu, Shiyi Cao, Xupeng Miao, Umut A. Acar, Zhihao Jia |
| 2024 | AutoCheck: Automatically Identifying Variables for Checkpointing by Data Dependency Analysis. Xiang Fu, Weiping Zhang, Shiman Meng, Xin Huang, Wubiao Xu, Luanzheng Guo, Kento Sato |
| 2024 | Automated Code Generation of High-Order Stencils for a Dataflow Architecture. Ryuichi Sai, John M. Mellor-Crummey, Jinfan Xu, Mauricio Araya-Polo |
| 2024 | Boosting Data Center Performance via Intelligently Managed Multi-backend Disaggregated Memory. Jing Wang, Hanzhang Yang, Chao Li, Yiming Zhuansun, Wang Yuan, Cheng Xu, Xiaofeng Hou, Minyi Guo, Yang Hu, Yaqian Zhao |
| 2024 | Boosting Earth System Model Outputs And Saving PetaBytes in Their Storage Using Exascale Climate Emulators. Sameh Abdulah, Allison H. Baker, George Bosilca, Qinglei Cao, Stefano Castruccio, Marc G. Genton, David E. Keyes, Zubair Khalid, Hatem Ltaief, Yan Song, Georgiy L. Stenchikov, Ying Sun |
| 2024 | Breaking the Million-Electron and 1 EFLOP/s Barriers: Biomolecular-Scale Ab Initio Molecular Dynamics Using MP2 Potentials. Ryan Stocks, Jorge L. Galvez Vallejo, Fiona C. Y. Yu, Calum Snowdon, Elise Palethorpe, Jakub Kurzak, Dmytro Bykov, Giuseppe M. J. Barca |
| 2024 | Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System. Kylee Santos, Stan G. Moore, Tomas Oppelstrup, Amirali Sharifian, Ilya Sharapov, Aidan P. Thompson, Delyan Z. Kalchev, Danny Perez, Robert Schreiber, Scott Pakin, Edgar A. Leon, James H. Laros III, Michael James, Sivasankaran Rajamanickam |
| 2024 | CARP: Range Query-Optimized Indexing for Streaming Data. Ankush Jain, Charles D. Cranor, Qing Zheng, Bradley W. Settlemyer, George Amvrosiadis, Gary A. Grider |
| 2024 | COAXIAL: A CXL-Centric Memory System for Scalable Servers. Albert Cho, Anish Saxena, Moinuddin Qureshi, Alexandros Daglis |
| 2024 | CUDASTF: Bridging the Gap Between CUDA and Task Parallelism. Cédric Augonnet, Andrei Alexandrescu, Albert Sidelnik, Michael Garland |
| 2024 | CoRD: Combining Raid and Delta for Fast Partial Updates in Erasure-Coded Storage Clusters. Hai Zhou, Dan Feng, Yuchong Hu, Wei Wang, Huadong Huang |
| 2024 | DBSR: An Efficient Storage Format for Vectorizing Sparse Triangular Solvers on Structured Grids. Xiaojian Yang, Shengguo Li, Fan Yuan, Dezun Dong |
| 2024 | DFTracer: An Analysis-Friendly Data Flow Tracer for AI-Driven Workflows. Hariharan Devarajan, Loïc Pottier, Kaushik Velusamy, Huihuo Zheng, Izzet Yildirim, Olga Kogiou, Weikuan Yu, Anthony Kougkas, Xian-He Sun, Jae-Seung Yeom, Kathryn M. Mohror |
| 2024 | Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers. Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, John Kirchenbauer, Jonas Geiping, Yuxin Wen, Neel Jain, Abhimanyu Hans, Manli Shu, Aditya Tomar, Tom Goldstein, Abhinav Bhatele |
| 2024 | Designing a GPU-Accelerated Communication Layer for Efficient Fluid-Structure Interaction Computations on Heterogeneous Systems. Aristotle X. Martin, Geng Liu, Bálint Joó, Runxin Wu, Mohammed Shihab Kabir, Erik W. Draeger, Amanda Randles |
| 2024 | Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication. Isuru Ranawaka, Md Taufique Hussain, Charles Block, Gerasimos Gerogiannis, Josep Torrellas, Ariful Azad |
| 2024 | Doubling Graph Traversal Efficiency to 198 TeraTEPS on the Supercomputer Fugaku. Junya Arai, Masahiro Nakao, Yuto Inoue, Kanto Teranishi, Koji Ueno, Keiichiro Yamamura, Mitsuhisa Sato, Katsuki Fujisawa |
| 2024 | EXO: Accelerating Storage Paravirtualization with eBPF. Shi Qiu, Li Wang, Yiming Zhang |
| 2024 | EcoLife: Carbon-Aware Serverless Function Scheduling for Sustainable Computing. Yankai Jiang, Rohan Basu Roy, Baolin Li, Devesh Tiwari |
| 2024 | Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link. Dong Xu, Yuan Feng, Kwangsik Shin, Daewoo Kim, Hyeran Jeon, Dong Li |
| 2024 | Efficient Weighted Graph Matching on GPUs. Michael Mandulak, Sayan Ghosh, S. M. Ferdous, Mahantesh Halappanavar, George M. Slota |
| 2024 | Enabling 13K-Atom Excited-State GW Calculations via Low-Rank Approximations and HPC on the New Sunway Supercomputer. Wentiao Wu, Zhengbang Zhou, Qingcai Jiang, Junwei Feng, Xinming Qin, Huanhuan Ma, Zhenwei Cao, Junshi Chen, Sheng Chen, Xinyong Meng, Bingkun Hou, Yuanfan Xiong, Linhao Wang, Yixuan Sun, Hong An, Jinlong Yang, Wei Hu |
| 2024 | Enumeration of Billions of Maximal Bicliques in Bipartite Graphs without Using GPUs. Zhe Pan, Shuibing He, Xu Li, Xuechen Zhang, Yanlong Yin, Rui Wang, Lidan Shou, Mingli Song, Xian-He Sun, Gang Chen |
| 2024 | Error-controlled Progressive Retrieval of Scientific Data under Derivable Quantities of Interest. Xuan Wu, Qian Gong, Jieyang Chen, Qing Liu, Norbert Podhorszki, Xin Liang, Scott Klasky |
| 2024 | Exploring Efficient Partial Differential Equation Solution Using Speed Galerkin Transformer. Xun Wang, Zeyang Zhu, Xiangyu Meng, Tao Song |
| 2024 | Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects. Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler |
| 2024 | Fast and Efficient Scaling for Microservices with SurgeGuard. Anyesha Ghosh, Neeraja J. Yadwadkar, Mattan Erez |
| 2024 | Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning. Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, Junjie Qiu, Hui Qu, Zehui Ren, Zhangli Sha, Xuecheng Su, Xiaowen Sun, Yixuan Tan, Minghui Tang, Shiyu Wang, Yaohui Wang, Yongji Wang, Ziwei Xie, Yiliang Xiong, Yanhong Xu, Shengfeng Ye, Shuiping Yu, Yukun Zha, Liyue Zhang, Haowei Zhang, Mingchuan Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Yuheng Zou |
| 2024 | GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems. Xin You, Zhibo Xuan, Hailong Yang, Zhongzhi Luan, Yi Liu, Depei Qian |
| 2024 | HPAC-ML: A Programming Model for Embedding ML Surrogates in Scientific Applications. Zane Fink, Konstantinos Parasyris, Praneet Rathi, Giorgis Georgakoudis, Harshitha Menon, Peer-Timo Bremer |
| 2024 | HiRace: Accurate and Fast Data Race Checking for GPU Programs. John Jacobson, Martin Burtscher, Ganesh Gopalakrishnan |
| 2024 | High Performance Unstructured SpMM Computation Using Tensor Cores. Patrik Okanovic, Grzegorz Kwasniewski, Paolo Sylos Labini, Maciej Besta, Flavio Vella, Torsten Hoefler |
| 2024 | Hydrogen: Contention-Aware Hybrid Memory for Heterogeneous CPU-GPU Architectures. Yiwei Li, Mingyu Gao |
| 2024 | KaMPIng: Flexible and (Near) Zero-Overhead C++ Bindings for MPI. Tim Niklas Uhl, Matthias Schimek, Lukas Hübner, Demian Hespe, Florian Kurpicz, Daniel Seemaier, Christoph Stelz, Peter Sanders |
| 2024 | LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming. Siyuan Shen, Langwen Huang, Marcin Chrapek, Timo Schneider, Jai Dayal, Manisha Gajbe, Robert W. Wisniewski, Torsten Hoefler |
| 2024 | LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services. Malgorzata Lazuka, Andreea Anghel, Thomas P. Parnell |
| 2024 | Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning. Hongwei Jin, George Papadimitriou, Krishnan Raghavan, Pawel Zuk, Prasanna Balaprakash, Cong Wang, Anirban Mandal, Ewa Deelman |
| 2024 | Learning Generalizable Program and Architecture Representations for Performance Modeling. Lingda Li, Thomas Flynn, Adolfy Hoisie |
| 2024 | LexiQL: Quantum Natural Language Processing on NISQ-era Machines. Daniel Silver, Aditya Ranjan, Rakesh Achutha, Tirthak Patel, Devesh Tiwari |
| 2024 | LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores. Yiwei Zhang, Kun Li, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang |
| 2024 | Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity. Tuowei Wang, Kun Li, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang |
| 2024 | M3XU: Achieving High-Precision and Complex Matrix Multiplication with Low-Precision MXUs. Dongho Ha, Yunan Zhang, Chen-Chien Kao, Christopher J. Hughes, Won Woo Ro, Hung-Wei Tseng |
| 2024 | MCBound: An Online Framework to Characterize and Classify Memory/Compute-bound HPC Jobs. Francesco Antici, Andrea Bartolini, Zeynep Kiziltan, Özalp Babaoglu, Yuetsu Kodama |
| 2024 | MCFuser: High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators. Zheng Zhang, Donglin Yang, Xiaobo Zhou, Dazhao Cheng |
| 2024 | MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization. Gautham Dharuman, Kyle Hippe, Alexander Brace, Sam Foreman, Väinö Hatanpää, Varuni Katti Sastry, Huihuo Zheng, Logan T. Ward, Servesh Muralidharan, Archit Vasan, Bharat Kale, Carla M. Mann, Heng Ma, Yun-Hsuan Cheng, Yuliana Zamora, Shengchao Liu, Chaowei Xiao, Murali Emani, Tom Gibbs, Mahidhar Tatineni, Deepak Canchi, Jerome Mitchell, Koichi Yamada, Maria Garzaran, Michael E. Papka, Ian T. Foster, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan |
| 2024 | Many-Body Electronic Correlation Energy using Krylov Subspace Linear Solvers. Shikhar Shah, Boqin Zhang, Hua Huang, John E. Pask, Phanish Suryanarayana, Edmond Chow |
| 2024 | Matrix-Free Finite Volume Kernels on a Dataflow Architecture. Ryuichi Sai, François P. Hamon, John M. Mellor-Crummey, Mauricio Araya-Polo |
| 2024 | MegaMmap: Blurring the Boundary Between Memory and Storage for Data-Intensive Workloads. Luke Logan, Anthony Kougkas, Xian-He Sun |
| 2024 | Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUs. Dechuang Yang, Yuxuan Zhao, Yiduo Niu, Weile Jia, En Shao, Weifeng Liu, Guangming Tan, Zhou Jin |
| 2024 | MixQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction. Yidong Chen, Chen Zhang, Rongchao Dong, Haoyuan Zhang, Yonghua Zhang, Zhonghua Lu, Jidong Zhai |
| 2024 | Moirae: Generating High-Performance Composite Stencil Programs with Global Optimizations. Xiaoyan Liu, Xinyu Yang, Kejie Ma, Shanghao Liu, Kaige Zhang, Hailong Yang, Yi Liu, Zhongzhi Luan, Depei Qian |
| 2024 | NetCL: A Unified Programming Framework for In-Network Computing. George Karlos, Henri E. Bal, Lin Wang |
| 2024 | Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI. Mikhail Khalilov, Salvatore Di Girolamo, Marcin Chrapek, Rami Nudelman, Gil Bloch, Torsten Hoefler |
| 2024 | ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability. Xiao Wang, Siyan Liu, Aristeidis Tsaris, Jong-Youl Choi, Ashwin M. Aji, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash |
| 2024 | On the Efficacy of Surface Codes in Compensating for Radiation Events in Superconducting Devices. Marzio Vallero, Gioele Casagranda, Flavio Vella, Paolo Rech |
| 2024 | Optimizing Distributed ML Communication with Fused Computation-Collective Operations. Kishore Punniyamurthy, Khaled Hamidouche, Bradford M. Beckmann |
| 2024 | Optimizing Quantum Fourier Transformation (QFT) Kernels for Modern NISQ and FT Architectures. Yuwei Jin, Xiangyu Gao, Minghao Guo, Henry Chen, Fei Hua, Chi Zhang, Eddy Z. Zhang |
| 2024 | PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters. Rutwik Jain, Brandon Tran, Keting Chen, Matthew D. Sinclair, Shivaram Venkataraman |
| 2024 | Parallax: A Compiler for Neutral Atom Quantum Computers under Hardware Constraints. Jason Ludmir, Tirthak Patel |
| 2024 | ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments. Munkyu Lee, Sihoon Seong, Minki Kang, Jihyuk Lee, Gap-Joo Na, In-Geol Chun, Dimitrios S. Nikolopoulos, Cheol-Ho Hong |
| 2024 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation. Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari |
| 2024 | Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2024, Atlanta, GA, USA, November 17-22, 2024 |
| 2024 | Pushing the Limit of Quantum Mechanical Simulation to the Raman Spectra of a Biological System with 100 Million Atoms. Honghui Shang, Ying Liu, Zhikun Wu, Zhenchuan Chen, Jinfeng Liu, Meiyue Shao, Yingzhou Li, Bowen Kan, Huimin Cui, Xiaobing Feng, Yunquan Zhang, Donald G. Truhlar, Hong An, Xiao He, Jinlong Yang |
| 2024 | Rapid GPU-Based Pangenome Graph Layout. Jiajie Li, Jan-Niklas Schmelzle, Yixiao Du, Simon Heumos, Andrea Guarracino, Giulia Guidi, Pjotr Prins, Erik Garrison, Zhiru Zhang |
| 2024 | Realizing Joint Extreme-Scale Simulations on Multiple Supercomputers - Two Superfacility Case Studies. Theresa Pollinger, Alexander Van Craen, Philipp Offenhäuser, Dirk Pflüger |
| 2024 | Realizing Quantum Kernel Models at Scale with Matrix Product State Simulation. Mekena Metcalf, Pablo Andrés-Martínez, Nathan Fitzpatrick |
| 2024 | RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules. Zaifeng Pan, Zhen Zheng, Feng Zhang, Bing Xie, Ruofan Wu, Shaden Smith, Chuanjie Liu, Olatunji Ruwase, Xiaoyong Du, Yufei Ding |
| 2024 | Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine. Barry Sly-Delgado, Ben Tovar, Jin Zhou, Douglas Thain |
| 2024 | Revisiting Computation for Research: Practices and Trends. Jeremiah Giordani, Ziyang Xu, Ella Colby, August Ning, Bhargav Reddy Godala, Ishita Chaturvedi, Shaowei Zhu, Yebin Chon, Greg Chan, Zujun Tan, Galen Collier, Jonathan D. Halverson, Enrico Armenio Deiana, Jasper Liang, Federico Sossai, Yian Su, Atmn Patel, Bangyen Pham, Nathan Greiner, Simone Campanoni, David I. August |
| 2024 | SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing. Chengzhi Lu, Huanle Xu, Yudan Li, Wenyan Chen, Kejiang Ye, Chengzhong Xu |
| 2024 | Scaling Molecular Dynamics with ab initio Accuracy to 149 Nanoseconds per Day. Jianxiong Li, Boyang Li, Zhuoqiang Guo, Mingzhen Li, Enji Li, Lijun Liu, Guojun Yuan, Zhan Wang, Guangming Tan, Weile Jia |
| 2024 | Scaling New Heights: Transformative Cross-GPU Sampling for Training Billion-Edge Graphs. Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng |
| 2024 | Static Generation of Efficient OpenMP Offload Data Mappings. Luke Marzen, Akash Dutta, Ali Jannesari |
| 2024 | Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing. Hanfei Yu, Hao Wang, Devesh Tiwari, Jian Li, Seung-Jong Park |
| 2024 | Surpassing Sycamore: Achieving Energetic Superiority Through System-Level Circuit Simulation. Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-yang Lu, Jian-Wei Pan, Zhilin Pei, Xingcheng Zhang, Wanli Ouyang |
| 2024 | Switch-Less Dragonfly on Wafers: A Scalable Interconnection Architecture based on Wafer-Scale Integration. Yinxiao Feng, Kaisheng Ma |
| 2024 | Tango: A Cross-layer Approach to Managing I/O Interference over Local Ephemeral Storage. Zhenbo Qiao, Qirui Tian, Zhenlu Qin, Jinzhen Wang, Qing Liu, Norbert Podhorszki, Scott Klasky, Hongjian Zhu |
| 2024 | TorchGT: A Holistic System for Large-Scale Graph Transformer Training. Meng Zhang, Jie Sun, Qinghao Hu, Peng Sun, Zeke Wang, Yonggang Wen, Tianwei Zhang |
| 2024 | Toward Capturing Genetic Epistasis From Multivariate Genome-Wide Association Studies Using Mixed-Precision Kernel Ridge Regression. Hatem Ltaief, Rabab Alomairy, Qinglei Cao, Jie Ren, Lotfi Slim, Thorsten Kurth, Benedikt Dorschner, Salim Bougouffa, Rached Abdelkhalak, David E. Keyes |
| 2024 | Toward High-Performance Blockchain System by Blurring the Line between Ordering and Execution. Donghyeon Ryu, Chanik Park |
| 2024 | Toward Sustainable HPC: In-Production Deployment of Incentive-Based Power Efficiency Mechanism on the Fugaku Supercomputer. Ana Luisa Veroneze Solórzano, Kento Sato, Keiji Yamamoto, Fumiyoshi Shoji, Jim M. Brandt, Benjamin Schwaller, Sara Petra Walton, Jennifer Green, Devesh Tiwari |
| 2024 | Towards Exascale Simulations of Nanoelectronic Devices in the GW Approximation. Leonard Deuschle, Alexander Maeder, Vincent Maillou, Nicolas Vetsch, Anders Winka, Jiang Cao, Alexandros Nikolaos Ziogas, Mathieu Luisier |
| 2024 | Towards Highly Compatible I/O-Aware Workflow Scheduling on HPC Systems. Yiqin Dai, Ruibo Wang, Yong Dong, Kai Lu |
| 2024 | UNR: Unified Notifiable RMA Library for HPC. Guangnan Feng, Jiabin Xie, Dezun Dong, Yutong Lu |
| 2024 | Understanding Data Movement Patterns in HPC: A NERSC Case Study. Anna Giannakou, Damian Hazen, Bjoern Enders, Lavanya Ramakrishnan, Nicholas J. Wright |
| 2024 | Unlocking High Performance with Low-Bit NPUs and CPUs for Highly Optimized HPL-MxP on Cloud Brain II. Weicheng Xue, Kai Yang, Yongxiang Liu, Dengdong Fan, Pengxiang Xu, Yonghong Tian |
| 2024 | Versatile Datapath Soft Error Detection on the Cheap for HPC Applications. Yafan Huang, Sheng Di, Zhaorui Zhang, Xiaoyi Lu, Guanpeng Li |
| 2024 | autoGEMM: Pushing the Limits of Irregular Matrix Multiplication on Arm Architectures. Du Wu, Jintao Meng, Wenxi Zhu, Minwen Deng, Xiao Wang, Tao Luo, Mohamed Wahib, Yanjie Wei |
| 2024 | cuSZ-i: High-Ratio Scientific Lossy Compression on GPUs with Optimized Multi-Level Interpolation. Jinyang Liu, Jiannan Tian, Shixun Wu, Sheng Di, Boyuan Zhang, Robert Underwood, Yafan Huang, Jiajun Huang, Kai Zhao, Guanpeng Li, Dingwen Tao, Zizhong Chen, Franck Cappello |
| 2024 | cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio. Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello |
| 2024 | hZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression. Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Zizhe Jian, Xin Liang, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur |