ICS A

84 papers

Year Title / Authors
2025 A Cost-Effective Dueling Framework for Set-Associative Cache Indexing.
Kevin Weston, Vahid Janfaza, Avery Johnson, Abdullah Muzahid
2025 A Device-Side Execution Model for Multi-GPU Task Graphs.
Ilyas Turimbetov, Mohamed Wahib, Didem Unat
2025 A Global Perspective on Supercomputer Power Provisioning: Case Studies from United States and Europe.
Tapasya Patki, Barry Rountree, Torsten Wilde, Andrea Bartolini, Stephanie Brink, Esa Heiskanen, Sachin Idgunji, Matthias Maiterth, James H. Rogers, Ermal Rrapaj, Ralf Schneider, Woong Shin, Kathleen Shoga, Christian Simmendinger, Nicholas J. Wright, Zhengji Zhao
2025 A Multi-GPU Algorithm for Computing Maximal Independent Sets in Large Graphs.
Anju Mongandampulath Akathoott, Benila Virgin Jerald Xavier, Martin Burtscher
2025 A3FR: Agile 3D Gaussian Splatting with Incremental Gaze Tracked Foveated Rendering in Virtual Reality.
Shuo Xin, Haiyu Wang, Sai Qian Zhang
2025 Accelerating Complex Stencil Computations with Adaptive Fusion Strategy.
Siqi Wang, Hailong Yang, Pengbo Wang, Shaokang Du, Yufan Xu, Qingxiao Sun, Xiaoyan Liu, Xuezhu Wang, Xuning Liang, Zhongzhi Luan, Yi Liu, Depei Qian
2025 An Efficient 2D Fusion Method for High-Performance Two-Stage Eigensolvers on Modern Heterogeneous Architectures.
Yongxiao Zhou, Yi Zong, Yuyang Jin, Heng Li, Wei Xue
2025 Analyzing the Performance of Applications at Exascale.
Dragana Grbic, John M. Mellor-Crummey
2025 Auto-Healer: Self-Healing Hardware for Perception Stage Faults in Autonomous Driving Systems.
Ali Suvizi, Guru Venkataramani
2025 BMQSim: Overcoming Memory Constraints in Quantum Circuit Simulation with a High-Fidelity Compression Framework.
Boyuan Zhang, Bo Fang, Fanjiang Ye, Luanzheng Guo, Fengguang Song, Nathan R. Tallent, Dingwen Tao
2025 BitWeaver: Read-Time Truncation in Memory.
Garrett Gagnon, Srikanth Malla, Yangwook Kang, Liu Liu
2025 CB-SpMV: A Data Aggregating and Balance Algorithm for Cache-Friendly Block-Based SpMV on GPUs.
Xing Cong, FuKai Sun, Yifan Chen, Chenhao Xie, Yi Liu, Depei Qian
2025 CIExplorer: Microarchitecture-Aware Exploration for Tightly Integrated Custom Instruction.
Xiaoyu Hao, Sen Zhang, Liang Qiao, Qingcai Jiang, Jun Shi, Junshi Chen, Hong An, Xulong Tang, Hao Shu, Honghui Yuan
2025 CLOVER: A GPU-native, Spatio-graph-based Approach to Exact kNN.
Victor Kamel, Hanxueyu Yan, Sean Chester
2025 CRAMG: A Communication-Reduced Algebraic Multigrid Method.
Fan Yuan, Xiaojian Yang, Yunqing Huang, Dezun Dong, Chuanfu Xu, Jie Liu, Xiaoqiang Yue, Shengguo Li, Hongxia Wang
2025 CTCCL: Cost-Efficient Joint Device-Network Load Balancing for LLM Training in RoCE-based Intelligent Computing Network.
Zhuotong Li, Liang Xu, Ziqi Huang, Shuyun Qian, Hongwei Bu, Ming Yang, Mengyun Luan, Weiguo Chen, Xu Wen
2025 Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models.
Runsheng Benson Guo, Utkarsh Anand, Arthur Chen, Khuzaima Daudjee
2025 Cherry: Breaking the GPU Memory Wall for Large-Scale GNN Training via Micro-Batching.
Yan Wang, Qinghua Guo, Haoran Kong, Kai Sheng, Zhen Xie, Hao Chen, Weile Jia, Dingwen Tao, Xin He
2025 CoLa: Towards Communication-efficient Distributed Sparse Matrix-Matrix Multiplication on GPUs.
Lixing Zhang, Yingxia Shao, Shigang Li
2025 ConCo: Optimizing Compilation of Concurrent Tensor Programs on Shared GPU.
Jiamin Lu, Jingwei Sun, Yunlong Xu, Peng Sun, Guangzhong Sun
2025 ConTraPh: Contrastive Learning for Parallelization and Performance Optimization.
Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Nesreen K. Ahmed, Theodore L. Willke, Ali Jannesari
2025 D-Rex: Heterogeneity-Aware Reliability Framework and Adaptive Algorithms for Distributed Storage.
Maxime Gonthier, Dante D. Sánchez-Gallegos, Haochen Pan, Bogdan Nicolae, Sicheng Zhou, Hai Duc Nguyen, Valérie Hayot-Sasson, J. Gregory Pauloski, Jesús Carretero, Kyle Chard, Ian T. Foster
2025 DALdex: A DPU-Accelerated Persistent Learned Index via Incremental Learning.
Aoyang Tong, Yu Hua, Menglei Chen
2025 DEDUPKV: A Space-Efficient and High-Performance Key-Value Store via Fine-Grained Deduplication.
Safdar Jamil, Awais Khan, Xubin He, Youngjae Kim
2025 DIMPLES: Distributed Influence Maximization for Pandemic pLanning on Exascale Systems.
Marco Minutoli, Reece Neff, Naw Safrin Sattar, Hao Lu, John Feo, Henning S. Mortveit, Anil Vullikanti, Dawen Xie, Mandy L. Wilson, Gregor von Laszewski, Parantapa Bhattacharya, S. M. Ferdous, Ananth Kalyanaraman, Michela Becchi, Madhav V. Marathe, Mahantesh Halappanavar
2025 DIV: An Index & Value compression method for SpMV on large matrices.
Dimitrios Galanopoulos, Panagiotis Mpakos, Petros Anastasiadis, Nectarios Koziris, Georgios I. Goumas
2025 DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs.
Yuebo Luo, Shiyang Li, Junran Tao, Kiran Gautam Thorat, Xi Xie, Hongwu Peng, Nuo Xu, Caiwen Ding, Shaoyi Huang
2025 DREAM: Device-Driven Efficient Access to Virtual Memory.
Nurlan Nazaraliyev, Elaheh Sadredini, Nael B. Abu-Ghazaleh
2025 DeCOS: Data-Efficient Reinforcement Learning for Compiler Optimization Selection Ignited by LLM.
Tianming Cui, Pen-Chung Yew, Stephen McCamant, Antonia Zhai
2025 EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC.
Siyuan Shen, Mikhail Khalilov, Lukas Gianinazzi, Timo Schneider, Marcin Chrapek, Jai Dayal, Manisha Gajbe, Robert W. Wisniewski, Torsten Hoefler
2025 EPIClear: Exploiting Domain-Specific Features for Epistasis Detection Acceleration on Tensor Cores.
Ricardo Nobre, Miguel Graça, Leonel Sousa, Aleksandar Ilic
2025 EVeREST-C: An Effective and Versatile Runtime Energy Saving Tool for CPUs.
Anna Yue, Pen-Chung Yew, Sanyam Mehta
2025 Efficient Locality-aware Instruction Stream Scheduling for Stencil Computation on ARM Processors.
Shanghao Liu, Hailong Yang, Xin You, Zhongzhi Luan, Yi Liu, Depei Qian
2025 Efficient Server Consolidation through a balanced mix of Transformer-based and Conventional Applications.
Pablo Abad, Pablo Prieto, Valentin Puente, José-Ángel Gregorio
2025 Fast and Fair Training for Deep Learning in Heterogeneous GPU Clusters.
Zizhao Mo, Huanle Xu, Wing Cheong Lau
2025 From Islands to Archipelago: Towards Collaborative and Adaptive Burst Buffer for HPC Systems.
Mingtian Shao, Ruibo Wang, Wenzhe Zhang, Kai Lu, Yiqin Dai, Huijun Wu
2025 Fused3S: Fast Sparse Attention on Tensor Cores.
Zitong Li, Aparna Chandramowlishwaran
2025 G^3SA: A GPU-Accelerated Gold Standard Genomics Library for End-to-End Sequence Alignment.
Yeejoo Han, Sunwoo Kim, Seongyeon Park, Jinho Lee
2025 Generating Microservice Graphs with Production Characteristics for Efficient Resource Scaling.
Fanrong Du, Jiuchen Shi, Quan Chen, Pu Pang, Li Li, Minyi Guo
2025 GraCFL: A Holistically Designed Vertex-Centric Graph System for CFL Reachability.
Sakib Fuad, Amir Hossein Nodehi Sabet, Umar Farooq, Zhijia Zhao
2025 Graph Convolutional Network Acceleration Using Adiabatic Superconductor Josephson Devices.
Zhengang Li, Hongwu Peng, Xuan Shen, Masoud Zabihi, Xi Xie, Geng Yuan, Yanzhi Wang, Olivia Chen, Caiwen Ding
2025 HARNESS: Holistic Resource Management for Diversely Scaled Edge Cloud Systems.
Ismet Dagli, Justin Davis, Mehmet Esat Belviranli
2025 HR-SpMM: Adaptive Row Partitioning and Hybrid Kernel Design for Sparse Matrix Multiplication.
Qi Wang, Yaobin Wang, Yi Luo, Rong Luo, Pingping Tang
2025 IA-Chol: Input-Aware Cholesky Decomposition on CPU and GPU.
Jixiao Deng, Qinglin Wang, Lin Chen, Tun Li, Bo Yang, Xinhai Chen, Jie Liu
2025 JBSA: A Bit-Serial Accelerator for Deep Neural Networks Using Superconducting SFQ Logic.
Yang Su, Sheng Li, Huilong Jiang, Haofei Yin, Rongliang Fu, JunYing Huang, Xiaochun Ye, Zhimin Zhang, Jie Ren, Xiaoping Gao, Tsung-Yi Ho, Dongrui Fan
2025 Leonid: Exploring Automated Kernel Fusion in Performance-Portable Programming Models for Scientific Computation.
Chenchen Zhang, Hao Luo, Chao Yang
2025 Light-FP: Analyze Floating-Point Error in a Highly Condensed Approach.
Jiazhi Mi, Li Chen, Haoyu Wang, Ruixiang Gao, Hongze Zhang, Ronghong Shen, Kai Lin, You Fu, Huimin Cui
2025 Loop Fusion in Matrix Multiplications with Sparse Dependence.
Mohammad Mahdi Salehi Dezfuli, Kazem Cheshmi
2025 MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs.
Jordi Wolfson-Pou, Jan Laukemann, Fabrizio Petrini
2025 MARS: Processing-In-Memory Acceleration of Raw Signal Genome Analysis Inside the Storage Subsystem.
Melina Soysal, Konstantina Koliogeorgi, Can Firtina, Nika Mansouri-Ghiasi, Rakesh Nadig, Haiyu Mao, Geraldo Francisco de Oliveira Junior, Yu Liang, Klea Zambaku, Mohammad Sadrosadati, Onur Mutlu
2025 MEMPLEX: A Memory System with Replication and Migration of Data for Multi-Chiplet NUMA Architectures.
Neethu Bal Mallya, Bhavishya Goel, Ioannis Sourdis
2025 MG-αGCD: Accelerating Graph Community Detection on Multi-GPU Platforms.
Shuai Yang, Changyou Zhang
2025 Multi-Node Multi-GPU Datalog.
Ahmedur Rahman Shovon, Yihao Sun, Kristopher K. Micinski, Thomas Gilray, Sidharth Kumar
2025 NeurLZ: An Online Neural Learning-based Method to Enhance Scientific Lossy Compression.
Wenqi Jia, Zhewen Hu, Youyuan Liu, Boyuan Zhang, Jinzhen Wang, Jinyang Liu, Wei Niu, Stavros Kalafatis, Junzhou Huang, Sian Jin, Daoce Wang, Jiannan Tian, Miao Yin
2025 OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths.
Leo Gold, Adam Bienkowski, David Sidoti, Krishna R. Pattipati, Omer Khan
2025 ORA: Job Runtime Prediction for High-Performance Computing Platforms Using the Online Retrieval-Augmented Language Model.
Hongyi Liu, Yinping Ma, Xiaosong Huang, Lingzhe Zhang, Tong Jia, Ying Li
2025 ORION: Optimizing OLAP Query Execution with Proactive Caching and Separate Operators.
Zhixin Tong, Jiuchen Shi, Quan Chen, Pu Pang, Shixuan Sun, Jie Meng, Jiang Liu, En Shao, Minyi Guo
2025 OpaQue: Program Output Obfuscation for Quantum Software Circuits in Quantum Clouds.
Tirthak Patel, Aditya Ranjan, Daniel Silver, Harshitta Gandhi, William Cutler, Devesh Tiwari
2025 PIE: Enabling Fast and Scalable Incremental Evolving Graph Analytics on Persistent Memory.
Yunmo Zhang, Jiacheng Huang, Xizhe Yin, Junqiao Qiu, Hong Xu, Chun Jason Xue
2025 PIM-CARE: A Compiler-Assisted Dynamic Resource Allocation Framework for Real-world DRAM PIM.
Inyong Hwang, Donghyeon Kim, Seokwon Kang, Taehyeong Park, Taehoon Kim, Jiwon Seo, Hanjun Kim, Youngsok Kim, Yongjun Park
2025 Page Migration for Hardware Memory Disaggregation Across a Network.
Archit Patke, Christian Pinto, Saurabh Jha, Haoran Qiu, Zbigniew Kalbarczyk, Ravishankar K. Iyer
2025 Parallel Contraction Hierarchies Can Be Efficient and Scalable.
Zijin Wan, Xiaojun Dong, Letong Wang, Enzuo Zhu, Yan Gu, Yihan Sun
2025 Pearl: Automatic Code Optimization Using Deep Reinforcement Learning.
Djamel Rassem Lamouri, Iheb Nassim Aouadj, Smail Kourta, Riyadh Baghdadi
2025 Persistent Memory Objects on the Cheap.
Derrick Greenspan, Naveed Ul Mustafa, Jongouk Choi, Mark Heinrich, Yan Solihin
2025 PortFC: Designing High-performance Deadlock-free BCube Networks.
Peirui Cao, Rui Ning, Hongwei Yang, Zhaochen Zhang, Chang Liu, Rui Li, Yongqi Yang, Yunzhuo Liu, Chengyuan Huang, Tao Sun, Xiaodong Duan, Guihai Chen, Chen Tian
2025 Proceedings of the 39th ACM International Conference on Supercomputing, ICS 2025, Salt Lake City, UT, USA, June 8-11, 2025
2025 Proteus: Achieving High-Performance Processing-Using-DRAM with Dynamic Bit-Precision, Adaptive Data Representation, and Flexible Arithmetic.
Geraldo Francisco de Oliveira Junior, Mayank Kabra, Yuxin Guo, Kangqi Chen, Abdullah Giray Yaglikçi, Melina Soysal, Mohammad Sadrosadati, Joaquín Olivares Bueno, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu
2025 Pushing the Limits of GPU Lossy Compression: A Hierarchical Delta Approach.
Boyuan Zhang, Yafan Huang, Sheng Di, Fengguang Song, Guanpeng Li, Franck Cappello
2025 ROCKET: An RNS-based Photonic Accelerator for High-Precision and Energy-Efficient DNN Training.
Hao Zhang, Haibo Zhang, Chengpeng Xia, Zhiyi Huang, Yawen Chen, Amanda S. Barnard
2025 SYprox: Combining Host and Device Perforation with Mixed Precision Approximation on Heterogeneous Architectures.
Lorenzo Carpentieri, Biagio Cosenza
2025 Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers.
Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib
2025 SmartNIC-GPU-CPU Heterogeneous System for Large Machine Learning Model with Software-Hardware Codesign.
Anqi Guo, Yuchen Hao, Xiteng Yao, Shining Yang, Jianyu Huang, Tony Tong Geng, Martin C. Herbordt
2025 SnuSOLVER: Optimizing Sparse Direct Solvers for Heterogeneous Systems.
Chaewon Kim, Jaehwan Lee, Jinpyo Kim, Dohyun Kim, Kyusu Ahn, Hyung Uk Cho, Seungin Baek, Jaejin Lee
2025 SortingHat: System Topology-aware Scheduling of Deep Neural Network Models on Multi-GPU Systems.
Seok Namkoong, Taehyeong Park, Kiung Jung, Jinyoung Kim, Yongjun Park
2025 SparsePIM: An Efficient HBM-Based PIM Architecture for Sparse Matrix-Vector Multiplications.
Taewoon Kang, Geonwoo Choi, Taeweon Suh, Gunjae Koo
2025 Statistical Treatment of Variable MPI Latencies and MPI-Communication Hiding for Matrix-Free Finite Element Operators.
Max Heldman, Johann Rudi, Julie Bessac
2025 StructILU: Dependency-Preserving Incomplete LU with Hierarchical Parallelism for Structured Grid PDEs on GPUs.
Hao Luo, Qianchao Zhu, Xiaochen Hao, Chunxi Lei, Chengdi Ma, Chenchen Zhang, Yun Liang, Chao Yang
2025 TMModel: Modeling Texture Memory and Mobile GPU Performance to Accelerate DNN Computations.
Jiexiong Guan, Zhenqing Hu, Christos D. Antonopoulos, Nikolaos Bellas, Spyros Lalis, Evgenia Smirni, Gang Zhou, Gagan Agrawal, Bin Ren
2025 Taking GPU Programming Models to Task for Performance Portability.
Joshua Hoke Davis, Pranav Sivaraman, Joy Kitson, Konstantinos Parasyris, Harshitha Menon, Isaac Minn, Giorgis Georgakoudis, Abhinav Bhatele
2025 UJOpt: Heuristic Approach for Applying Unroll-and-Jam Optimization and Loop Order Selection.
Shilpa Babalad, Shirish K. Shevade, Matthew Jacob Thazhuthaveetil, R. Govindarajan
2025 Understanding the Idiosyncrasies of Emerging BlueField DPUs.
Arjun Kashyap, Yuke Li, Darren Ng, Xiaoyi Lu
2025 WisIO: Automated I/O Bottleneck Detection with Multi-Perspective Views for HPC Workflows.
Izzet Yildirim, Hariharan Devarajan, Anthony Kougkas, Xian-He Sun, Kathryn M. Mohror
2025 YH-Light: Yielding Hierarchy-aware Partitioner for Large-scale Graph Processing.
Xinbiao Gan, Tiejun Li, Chunye Gong, Jie Liu, Kai Lu
2025 ghZCCL: Advancing GPU-aware Collective Communications with Homomorphic Compression.
Jiajun Huang, Sheng Di, Yafan Huang, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur