SC A

109 papers

Year | Title / Authors
2024 | A Conflict-aware Divide-and-Conquer Algorithm for Symmetric Sparse Matrix-Vector Multiplication.
Haozhong Qiu, Chuanfu Xu, Jianbin Fang, Jian Zhang, Liang Deng, Yue Ding, Qingsong Wang, Shizhao Chen, Yonggang Che, Jie Liu
2024 | A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale.
Wesley Brewer, Matthias Maiterth, Vineet Kumar, Rafal P. Wojda, Sedrick Bouknight, Jesse Hines, Woong Shin, Scott Greenwood, David Grant, Wesley Williams, Feiyi Wang
2024 | A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization.
Daoce Wang, Pascal Grosset, Jesus Pulido, Tushar M. Athawale, Jiannan Tian, Kai Zhao, Zarija Lukic, Axel Huebl, Zhe Wang, James P. Ahrens, Dingwen Tao
2024 | A Performance-Portable Kilometer-Scale Global Ocean Model on ORISE and New Sunway Heterogeneous Supercomputers.
Junlin Wei, Xiang Han, Jiangfeng Yu, Jinrong Jiang, Hailong Liu, Pengfei Lin, Maoxue Yu, Kai Xu, Lian Zhao, Pengfei Wang, Weipeng Zheng, Jingwei Xie, Yanzhi Zhou, Tao Zhang, Feng Zhang, Yehong Zhang, Yue Yu, Yuzhu Wang, Yidi Bai, Chen Li, Zipeng Yu, Haoyu Deng, Yaxin Li, Xuebin Chi
2024 | A Probabilistic Approach To Selecting Build Configurations in Package Managers.
Daniel Nichols, Harshitha Menon, Todd Gamblin, Abhinav Bhatele
2024 | A Scalable Algorithm for Active Learning.
Youguang Chen, Zheyu Wen, George Biros
2024 | A Sparsity-Aware Distributed-Memory Algorithm for Sparse-Sparse Matrix Multiplication.
Yuxi Hong, Aydin Buluç
2024 | A Workflow Roofline Model for End-to-End Workflow Performance Analysis.
Nan Ding, Brian Austin, Yang Liu, Neil Mehta, Steven Farrell, Johannes P. Blaschke, Leonid Oliker, Hai Ah Nam, Nicholas J. Wright, Samuel Williams
2024 | APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes.
Yuanxin Wei, Jiangsu Du, Jiazhi Jiang, Xiao Shi, Xianwei Zhang, Dan Huang, Nong Xiao, Yutong Lu
2024 | Accelerated Atomistic Kinetic Monte Carlo Simulations of Resistive Memory Arrays.
Manasa Kaniselvan, Alexander Maeder, Marko Mladenovic, Mathieu Luisier, Alexandros Nikolaos Ziogas
2024 | Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression.
Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao
2024 | Accelerating Distributed DLRM Training with Optimized TT Decomposition and Micro-Batching.
Weihu Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng
2024 | Accurate and Convenient Energy Measurements for GPUs: A Detailed Study of NVIDIA GPU's Built-In Power Sensor.
Zeyu Yang, Karel Adámek, Wesley Armour
2024 | Adaptive Patching for High-resolution Image Segmentation with Transformers.
Enzhi Zhang, Isaac Lyngaas, Peng Chen, Xiao Wang, Jun Igarashi, Yuankai Huo, Masaharu Munetomo, Mohamed Wahib
2024 | AmgT: Algebraic Multigrid Solver on Tensor Cores.
Yuechen Lu, Lijie Zeng, Tengcheng Wang, Xu Fu, Wenxuan Li, Helin Cheng, Dechuang Yang, Zhou Jin, Marc Casas, Weifeng Liu
2024 | An Evaluation of the Effect of Network Cost Optimization for Leadership Class Supercomputers.
Awais Khan, John R. Lange, Nick Hagerty, Edwin F. Posada, John K. Holmen, James B. White, James Austin Harris, Verónica Melesse Vergara, Christopher Zimmer, Scott Atchley
2024 | Application-Driven Exascale: The JUPITER Benchmark Suite.
Andreas Herten, Sebastian Achilles, Damian Alvarez, Jayesh Badwaik, Eric Behle, Mathis Bode, Thomas Breuer, Daniel Caviedes-Voullième, Mehdi Cherti, Adel Dabah, Salem El Sayed, Wolfgang Frings, Ana Gonzalez-Nicolas, Eric B. Gregory, Kaveh Haghighi Mood, Thorsten Hater, Jenia Jitsev, Chelsea Maria John, Jan H. Meinke, Catrin I. Meyer, Pavel Mezentsev, Jan-Oliver Mirus, Stepan Nassyr, Carolin Penke, Manoel Römmer, Ujjwal Sinha, Benedikt von St. Vieth, Olaf Stein, Estela Suarez, Dennis Willsch, Ilya Zhukov
2024 | Asynchronous Distributed-Memory Parallel Algorithms for Influence Maximization.
Shubhendra Pal Singhal, Souvadra Hati, Jeffrey Young, Vivek Sarkar, Akihiro Hayashi, Richard W. Vuduc
2024 | Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs.
Mingkuan Xu, Shiyi Cao, Xupeng Miao, Umut A. Acar, Zhihao Jia
2024 | AutoCheck: Automatically Identifying Variables for Checkpointing by Data Dependency Analysis.
Xiang Fu, Weiping Zhang, Shiman Meng, Xin Huang, Wubiao Xu, Luanzheng Guo, Kento Sato
2024 | Automated Code Generation of High-Order Stencils for a Dataflow Architecture.
Ryuichi Sai, John M. Mellor-Crummey, Jinfan Xu, Mauricio Araya-Polo
2024 | Boosting Data Center Performance via Intelligently Managed Multi-backend Disaggregated Memory.
Jing Wang, Hanzhang Yang, Chao Li, Yiming Zhuansun, Wang Yuan, Cheng Xu, Xiaofeng Hou, Minyi Guo, Yang Hu, Yaqian Zhao
2024 | Boosting Earth System Model Outputs And Saving PetaBytes in Their Storage Using Exascale Climate Emulators.
Sameh Abdulah, Allison H. Baker, George Bosilca, Qinglei Cao, Stefano Castruccio, Marc G. Genton, David E. Keyes, Zubair Khalid, Hatem Ltaief, Yan Song, Georgiy L. Stenchikov, Ying Sun
2024 | Breaking the Million-Electron and 1 EFLOP/s Barriers: Biomolecular-Scale Ab Initio Molecular Dynamics Using MP2 Potentials.
Ryan Stocks, Jorge L. Galvez Vallejo, Fiona C. Y. Yu, Calum Snowdon, Elise Palethorpe, Jakub Kurzak, Dmytro Bykov, Giuseppe M. J. Barca
2024 | Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System.
Kylee Santos, Stan G. Moore, Tomas Oppelstrup, Amirali Sharifian, Ilya Sharapov, Aidan P. Thompson, Delyan Z. Kalchev, Danny Perez, Robert Schreiber, Scott Pakin, Edgar A. Leon, James H. Laros III, Michael James, Sivasankaran Rajamanickam
2024 | CARP: Range Query-Optimized Indexing for Streaming Data.
Ankush Jain, Charles D. Cranor, Qing Zheng, Bradley W. Settlemyer, George Amvrosiadis, Gary A. Grider
2024 | COAXIAL: A CXL-Centric Memory System for Scalable Servers.
Albert Cho, Anish Saxena, Moinuddin Qureshi, Alexandros Daglis
2024 | CUDASTF: Bridging the Gap Between CUDA and Task Parallelism.
Cédric Augonnet, Andrei Alexandrescu, Albert Sidelnik, Michael Garland
2024 | CoRD: Combining Raid and Delta for Fast Partial Updates in Erasure-Coded Storage Clusters.
Hai Zhou, Dan Feng, Yuchong Hu, Wei Wang, Huadong Huang
2024 | DBSR: An Efficient Storage Format for Vectorizing Sparse Triangular Solvers on Structured Grids.
Xiaojian Yang, Shengguo Li, Fan Yuan, Dezun Dong
2024 | DFTracer: An Analysis-Friendly Data Flow Tracer for AI-Driven Workflows.
Hariharan Devarajan, Loïc Pottier, Kaushik Velusamy, Huihuo Zheng, Izzet Yildirim, Olga Kogiou, Weikuan Yu, Anthony Kougkas, Xian-He Sun, Jae-Seung Yeom, Kathryn M. Mohror
2024 | Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers.
Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, John Kirchenbauer, Jonas Geiping, Yuxin Wen, Neel Jain, Abhimanyu Hans, Manli Shu, Aditya Tomar, Tom Goldstein, Abhinav Bhatele
2024 | Designing a GPU-Accelerated Communication Layer for Efficient Fluid-Structure Interaction Computations on Heterogeneous Systems.
Aristotle X. Martin, Geng Liu, Bálint Joó, Runxin Wu, Mohammed Shihab Kabir, Erik W. Draeger, Amanda Randles
2024 | Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication.
Isuru Ranawaka, Md Taufique Hussain, Charles Block, Gerasimos Gerogiannis, Josep Torrellas, Ariful Azad
2024 | Doubling Graph Traversal Efficiency to 198 TeraTEPS on the Supercomputer Fugaku.
Junya Arai, Masahiro Nakao, Yuto Inoue, Kanto Teranishi, Koji Ueno, Keiichiro Yamamura, Mitsuhisa Sato, Katsuki Fujisawa
2024 | EXO: Accelerating Storage Paravirtualization with eBPF.
Shi Qiu, Li Wang, Yiming Zhang
2024 | EcoLife: Carbon-Aware Serverless Function Scheduling for Sustainable Computing.
Yankai Jiang, Rohan Basu Roy, Baolin Li, Devesh Tiwari
2024 | Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link.
Dong Xu, Yuan Feng, Kwangsik Shin, Daewoo Kim, Hyeran Jeon, Dong Li
2024 | Efficient Weighted Graph Matching on GPUs.
Michael Mandulak, Sayan Ghosh, S. M. Ferdous, Mahantesh Halappanavar, George M. Slota
2024 | Enabling 13K-Atom Excited-State GW Calculations via Low-Rank Approximations and HPC on the New Sunway Supercomputer.
Wentiao Wu, Zhengbang Zhou, Qingcai Jiang, Junwei Feng, Xinming Qin, Huanhuan Ma, Zhenwei Cao, Junshi Chen, Sheng Chen, Xinyong Meng, Bingkun Hou, Yuanfan Xiong, Linhao Wang, Yixuan Sun, Hong An, Jinlong Yang, Wei Hu
2024 | Enumeration of Billions of Maximal Bicliques in Bipartite Graphs without Using GPUs.
Zhe Pan, Shuibing He, Xu Li, Xuechen Zhang, Yanlong Yin, Rui Wang, Lidan Shou, Mingli Song, Xian-He Sun, Gang Chen
2024 | Error-controlled Progressive Retrieval of Scientific Data under Derivable Quantities of Interest.
Xuan Wu, Qian Gong, Jieyang Chen, Qing Liu, Norbert Podhorszki, Xin Liang, Scott Klasky
2024 | Exploring Efficient Partial Differential Equation Solution Using Speed Galerkin Transformer.
Xun Wang, Zeyang Zhu, Xiangyu Meng, Tao Song
2024 | Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects.
Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler
2024 | Fast and Efficient Scaling for Microservices with SurgeGuard.
Anyesha Ghosh, Neeraja J. Yadwadkar, Mattan Erez
2024 | Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning.
Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, Junjie Qiu, Hui Qu, Zehui Ren, Zhangli Sha, Xuecheng Su, Xiaowen Sun, Yixuan Tan, Minghui Tang, Shiyu Wang, Yaohui Wang, Yongji Wang, Ziwei Xie, Yiliang Xiong, Yanhong Xu, Shengfeng Ye, Shuiping Yu, Yukun Zha, Liyue Zhang, Haowei Zhang, Mingchuan Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Yuheng Zou
2024 | GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems.
Xin You, Zhibo Xuan, Hailong Yang, Zhongzhi Luan, Yi Liu, Depei Qian
2024 | HPAC-ML: A Programming Model for Embedding ML Surrogates in Scientific Applications.
Zane Fink, Konstantinos Parasyris, Praneet Rathi, Giorgis Georgakoudis, Harshitha Menon, Peer-Timo Bremer
2024 | HiRace: Accurate and Fast Data Race Checking for GPU Programs.
John Jacobson, Martin Burtscher, Ganesh Gopalakrishnan
2024 | High Performance Unstructured SpMM Computation Using Tensor Cores.
Patrik Okanovic, Grzegorz Kwasniewski, Paolo Sylos Labini, Maciej Besta, Flavio Vella, Torsten Hoefler
2024 | Hydrogen: Contention-Aware Hybrid Memory for Heterogeneous CPU-GPU Architectures.
Yiwei Li, Mingyu Gao
2024 | KaMPIng: Flexible and (Near) Zero-Overhead C++ Bindings for MPI.
Tim Niklas Uhl, Matthias Schimek, Lukas Hübner, Demian Hespe, Florian Kurpicz, Daniel Seemaier, Christoph Stelz, Peter Sanders
2024 | LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming.
Siyuan Shen, Langwen Huang, Marcin Chrapek, Timo Schneider, Jai Dayal, Manisha Gajbe, Robert W. Wisniewski, Torsten Hoefler
2024 | LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services.
Malgorzata Lazuka, Andreea Anghel, Thomas P. Parnell
2024 | Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning.
Hongwei Jin, George Papadimitriou, Krishnan Raghavan, Pawel Zuk, Prasanna Balaprakash, Cong Wang, Anirban Mandal, Ewa Deelman
2024 | Learning Generalizable Program and Architecture Representations for Performance Modeling.
Lingda Li, Thomas Flynn, Adolfy Hoisie
2024 | LexiQL: Quantum Natural Language Processing on NISQ-era Machines.
Daniel Silver, Aditya Ranjan, Rakesh Achutha, Tirthak Patel, Devesh Tiwari
2024 | LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores.
Yiwei Zhang, Kun Li, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang
2024 | Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity.
Tuowei Wang, Kun Li, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang
2024 | M3XU: Achieving High-Precision and Complex Matrix Multiplication with Low-Precision MXUs.
Dongho Ha, Yunan Zhang, Chen-Chien Kao, Christopher J. Hughes, Won Woo Ro, Hung-Wei Tseng
2024 | MCBound: An Online Framework to Characterize and Classify Memory/Compute-bound HPC Jobs.
Francesco Antici, Andrea Bartolini, Zeynep Kiziltan, Özalp Babaoglu, Yuetsu Kodama
2024 | MCFuser: High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators.
Zheng Zhang, Donglin Yang, Xiaobo Zhou, Dazhao Cheng
2024 | MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization.
Gautham Dharuman, Kyle Hippe, Alexander Brace, Sam Foreman, Väinö Hatanpää, Varuni Katti Sastry, Huihuo Zheng, Logan T. Ward, Servesh Muralidharan, Archit Vasan, Bharat Kale, Carla M. Mann, Heng Ma, Yun-Hsuan Cheng, Yuliana Zamora, Shengchao Liu, Chaowei Xiao, Murali Emani, Tom Gibbs, Mahidhar Tatineni, Deepak Canchi, Jerome Mitchell, Koichi Yamada, Maria Garzaran, Michael E. Papka, Ian T. Foster, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan
2024 | Many-Body Electronic Correlation Energy using Krylov Subspace Linear Solvers.
Shikhar Shah, Boqin Zhang, Hua Huang, John E. Pask, Phanish Suryanarayana, Edmond Chow
2024 | Matrix-Free Finite Volume Kernels on a Dataflow Architecture.
Ryuichi Sai, François P. Hamon, John M. Mellor-Crummey, Mauricio Araya-Polo
2024 | MegaMmap: Blurring the Boundary Between Memory and Storage for Data-Intensive Workloads.
Luke Logan, Anthony Kougkas, Xian-He Sun
2024 | Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUs.
Dechuang Yang, Yuxuan Zhao, Yiduo Niu, Weile Jia, En Shao, Weifeng Liu, Guangming Tan, Zhou Jin
2024 | MixQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction.
Yidong Chen, Chen Zhang, Rongchao Dong, Haoyuan Zhang, Yonghua Zhang, Zhonghua Lu, Jidong Zhai
2024 | Moirae: Generating High-Performance Composite Stencil Programs with Global Optimizations.
Xiaoyan Liu, Xinyu Yang, Kejie Ma, Shanghao Liu, Kaige Zhang, Hailong Yang, Yi Liu, Zhongzhi Luan, Depei Qian
2024 | NetCL: A Unified Programming Framework for In-Network Computing.
George Karlos, Henri E. Bal, Lin Wang
2024 | Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI.
Mikhail Khalilov, Salvatore Di Girolamo, Marcin Chrapek, Rami Nudelman, Gil Bloch, Torsten Hoefler
2024 | ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability.
Xiao Wang, Siyan Liu, Aristeidis Tsaris, Jong-Youl Choi, Ashwin M. Aji, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash
2024 | On the Efficacy of Surface Codes in Compensating for Radiation Events in Superconducting Devices.
Marzio Vallero, Gioele Casagranda, Flavio Vella, Paolo Rech
2024 | Optimizing Distributed ML Communication with Fused Computation-Collective Operations.
Kishore Punniyamurthy, Khaled Hamidouche, Bradford M. Beckmann
2024 | Optimizing Quantum Fourier Transformation (QFT) Kernels for Modern NISQ and FT Architectures.
Yuwei Jin, Xiangyu Gao, Minghao Guo, Henry Chen, Fei Hua, Chi Zhang, Eddy Z. Zhang
2024 | PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters.
Rutwik Jain, Brandon Tran, Keting Chen, Matthew D. Sinclair, Shivaram Venkataraman
2024 | Parallax: A Compiler for Neutral Atom Quantum Computers under Hardware Constraints.
Jason Ludmir, Tirthak Patel
2024 | ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments.
Munkyu Lee, Sihoon Seong, Minki Kang, Jihyuk Lee, Gap-Joo Na, In-Geol Chun, Dimitrios S. Nikolopoulos, Cheol-Ho Hong
2024 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation.
Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari
2024 | Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2024, Atlanta, GA, USA, November 17-22, 2024
2024 | Pushing the Limit of Quantum Mechanical Simulation to the Raman Spectra of a Biological System with 100 Million Atoms.
Honghui Shang, Ying Liu, Zhikun Wu, Zhenchuan Chen, Jinfeng Liu, Meiyue Shao, Yingzhou Li, Bowen Kan, Huimin Cui, Xiaobing Feng, Yunquan Zhang, Donald G. Truhlar, Hong An, Xiao He, Jinlong Yang
2024 | Rapid GPU-Based Pangenome Graph Layout.
Jiajie Li, Jan-Niklas Schmelzle, Yixiao Du, Simon Heumos, Andrea Guarracino, Giulia Guidi, Pjotr Prins, Erik Garrison, Zhiru Zhang
2024 | Realizing Joint Extreme-Scale Simulations on Multiple Supercomputers - Two Superfacility Case Studies.
Theresa Pollinger, Alexander Van Craen, Philipp Offenhäuser, Dirk Pflüger
2024 | Realizing Quantum Kernel Models at Scale with Matrix Product State Simulation.
Mekena Metcalf, Pablo Andrés-Martínez, Nathan Fitzpatrick
2024 | RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules.
Zaifeng Pan, Zhen Zheng, Feng Zhang, Bing Xie, Ruofan Wu, Shaden Smith, Chuanjie Liu, Olatunji Ruwase, Xiaoyong Du, Yufei Ding
2024 | Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine.
Barry Sly-Delgado, Ben Tovar, Jin Zhou, Douglas Thain
2024 | Revisiting Computation for Research: Practices and Trends.
Jeremiah Giordani, Ziyang Xu, Ella Colby, August Ning, Bhargav Reddy Godala, Ishita Chaturvedi, Shaowei Zhu, Yebin Chon, Greg Chan, Zujun Tan, Galen Collier, Jonathan D. Halverson, Enrico Armenio Deiana, Jasper Liang, Federico Sossai, Yian Su, Atmn Patel, Bangyen Pham, Nathan Greiner, Simone Campanoni, David I. August
2024 | SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing.
Chengzhi Lu, Huanle Xu, Yudan Li, Wenyan Chen, Kejiang Ye, Chengzhong Xu
2024 | Scaling Molecular Dynamics with ab initio Accuracy to 149 Nanoseconds per Day.
Jianxiong Li, Boyang Li, Zhuoqiang Guo, Mingzhen Li, Enji Li, Lijun Liu, Guojun Yuan, Zhan Wang, Guangming Tan, Weile Jia
2024 | Scaling New Heights: Transformative Cross-GPU Sampling for Training Billion-Edge Graphs.
Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng
2024 | Static Generation of Efficient OpenMP Offload Data Mappings.
Luke Marzen, Akash Dutta, Ali Jannesari
2024 | Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing.
Hanfei Yu, Hao Wang, Devesh Tiwari, Jian Li, Seung-Jong Park
2024 | Surpassing Sycamore: Achieving Energetic Superiority Through System-Level Circuit Simulation.
Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-yang Lu, Jian-Wei Pan, Zhilin Pei, Xingcheng Zhang, Wanli Ouyang
2024 | Switch-Less Dragonfly on Wafers: A Scalable Interconnection Architecture based on Wafer-Scale Integration.
Yinxiao Feng, Kaisheng Ma
2024 | Tango: A Cross-layer Approach to Managing I/O Interference over Local Ephemeral Storage.
Zhenbo Qiao, Qirui Tian, Zhenlu Qin, Jinzhen Wang, Qing Liu, Norbert Podhorszki, Scott Klasky, Hongjian Zhu
2024 | TorchGT: A Holistic System for Large-Scale Graph Transformer Training.
Meng Zhang, Jie Sun, Qinghao Hu, Peng Sun, Zeke Wang, Yonggang Wen, Tianwei Zhang
2024 | Toward Capturing Genetic Epistasis From Multivariate Genome-Wide Association Studies Using Mixed-Precision Kernel Ridge Regression.
Hatem Ltaief, Rabab Alomairy, Qinglei Cao, Jie Ren, Lotfi Slim, Thorsten Kurth, Benedikt Dorschner, Salim Bougouffa, Rached Abdelkhalak, David E. Keyes
2024 | Toward High-Performance Blockchain System by Blurring the Line between Ordering and Execution.
Donghyeon Ryu, Chanik Park
2024 | Toward Sustainable HPC: In-Production Deployment of Incentive-Based Power Efficiency Mechanism on the Fugaku Supercomputer.
Ana Luisa Veroneze Solórzano, Kento Sato, Keiji Yamamoto, Fumiyoshi Shoji, Jim M. Brandt, Benjamin Schwaller, Sara Petra Walton, Jennifer Green, Devesh Tiwari
2024 | Towards Exascale Simulations of Nanoelectronic Devices in the GW Approximation.
Leonard Deuschle, Alexander Maeder, Vincent Maillou, Nicolas Vetsch, Anders Winka, Jiang Cao, Alexandros Nikolaos Ziogas, Mathieu Luisier
2024 | Towards Highly Compatible I/O-Aware Workflow Scheduling on HPC Systems.
Yiqin Dai, Ruibo Wang, Yong Dong, Kai Lu
2024 | UNR: Unified Notifiable RMA Library for HPC.
Guangnan Feng, Jiabin Xie, Dezun Dong, Yutong Lu
2024 | Understanding Data Movement Patterns in HPC: A NERSC Case Study.
Anna Giannakou, Damian Hazen, Bjoern Enders, Lavanya Ramakrishnan, Nicholas J. Wright
2024 | Unlocking High Performance with Low-Bit NPUs and CPUs for Highly Optimized HPL-MxP on Cloud Brain II.
Weicheng Xue, Kai Yang, Yongxiang Liu, Dengdong Fan, Pengxiang Xu, Yonghong Tian
2024 | Versatile Datapath Soft Error Detection on the Cheap for HPC Applications.
Yafan Huang, Sheng Di, Zhaorui Zhang, Xiaoyi Lu, Guanpeng Li
2024 | autoGEMM: Pushing the Limits of Irregular Matrix Multiplication on Arm Architectures.
Du Wu, Jintao Meng, Wenxi Zhu, Minwen Deng, Xiao Wang, Tao Luo, Mohamed Wahib, Yanjie Wei
2024 | cuSZ-i: High-Ratio Scientific Lossy Compression on GPUs with Optimized Multi-Level Interpolation.
Jinyang Liu, Jiannan Tian, Shixun Wu, Sheng Di, Boyuan Zhang, Robert Underwood, Yafan Huang, Jiajun Huang, Kai Zhao, Guanpeng Li, Dingwen Tao, Zizhong Chen, Franck Cappello
2024 | cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio.
Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello
2024 | hZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression.
Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Zizhe Jian, Xin Liang, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur