HPCA A*

122 papers

Year Title / Authors
2025A Hardware-Software Design Framework for SpMV Acceleration with Flexible Access Pattern Portfolio.
Zhenyu Wu, Maolin Wang, Hayden Kwok-Hay So
2025ARTEMIS: Agile Discovery of Efficient Real-Time Systems-on-Chips in the Heterogeneous Era.
Subhankar Pal, Aporva Amarnath, Behzad Boroujerdian, Augusto Vega, Alper Buyuktosunoglu, John-David Wellman, Vijay Janapa Reddi, Pradip Bose
2025AccelES: Accelerating Top-K SpMV for Embedding Similarity via Low-bit Pruning.
Jiaqi Zhai, Xuanhua Shi, Kaiyi Huang, Chencheng Ye, Weifang Hu, Bingsheng He, Hai Jin
2025Adyna: Accelerating Dynamic Neural Networks with Adaptive Scheduling.
Zhiyao Li, Bohan Yang, Jiaxiang Li, Taijie Chen, Xintong Li, Mingyu Gao
2025Anaheim: Architecture and Algorithms for Processing Fully Homomorphic Encryption in Memory.
Jongmin Kim, Sungmin Yun, Hyesung Ji, Wonseok Choi, Sangpyo Kim, Jung Ho Ahn
2025Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format.
Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst
2025Architecting Space Microdatacenters: A System-level Approach.
Nathan Bleier, Rick Eason, Michael Lembeck, Rakesh Kumar
2025Architecting Value Prediction around In-Order Execution.
Pierre Ravenel, Arthur Perais, Benoît Dupont de Dinechin, Frédéric Pétrot
2025Ariadne: A Hotness-Aware and Size-Adaptive Compressed Swap Technique for Fast Application Relaunch and Reduced CPU Usage on Mobile Devices.
Yu Liang, Aofeng Shen, Chun Jason Xue, Riwei Pan, Haiyu Mao, Nika Mansouri-Ghiasi, Qingcai Jiang, Rakesh Nadig, Lei Li, Rachata Ausavarungnirun, Mohammad Sadrosadati, Onur Mutlu
2025AsyncDIMM: Achieving Asynchronous Execution in DIMM-Based Near-Memory Processing.
Liyan Chen, Dongxu Lyu, Jianfei Jiang, Qin Wang, Zhigang Mao, Naifeng Jing
2025AutoRFM: Scaling Low-Cost in-DRAM Trackers to Ultra-Low Rowhammer Thresholds.
Moinuddin Qureshi
2025BOSS: Blocking algorithm for optimizing shuttling scheduling in Ion Trap.
Xian Wu, Chenghong Zhu, Jingbo Wang, Xin Wang
2025Bit-slice Architecture for DNN Acceleration with Slice-level Sparsity Enhancement and Exploitation.
Insu Choi, Young-Seo Yoon, Joon-Sung Yang
2025BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration.
Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed S. Abdelfattah
2025BrokenSleep: Remote Power Timing Attack Exploiting Processor Idle States.
Hyosang Kim, Ki-Dong Kang, Gyeongseo Park, Seungkyu Lee, Daehoon Kim
2025Buffalo: Enabling Large-Scale GNN Training via Memory-Efficient Bucketization.
Shuangyan Yang, Minjia Zhang, Dong Li
2025CORDOBA: Carbon-Efficient Optimization Framework for Computing Systems.
Mariam Elgamal, Doug Carmean, Elnaz Ansari, Okay Zed, Ramesh Peri, Srilatha Manne, Udit Gupta, Gu-Yeon Wei, David Brooks, Gage Hills, Carole-Jean Wu
2025CROSS: Compiler-Driven Optimization of Sparse DNNs Using Sparse/Dense Computation Kernels.
Fangxin Liu, Shiyuan Huang, Ning Yang, Zongwu Wang, Haomin Li, Li Jiang
2025Cambricon-DG: An Accelerator for Redundant-Free Dynamic Graph Neural Networks Based on Nonlinear Isolation.
Zhifei Yue, Xinkai Song, Tianbo Liu, Xing Hu, Rui Zhang, Zidong Du, Wei Li, Qi Guo, Tianshi Chen
2025ChameleonEC: Exploiting Tunability of Erasure Coding for Low-Interference Repair.
Yuhui Cai, Shiyao Lin, Zhirong Shen, Jiahui Yang, Jiwu Shu
2025Choco-Q: Commute Hamiltonian-based QAOA for Constrained Binary Optimization.
Debin Xiang, Qifan Jiang, Liqiang Lu, Siwei Tan, Jianwei Yin
2025Chronus: Understanding and Securing the Cutting-Edge Industry Solutions to DRAM Read Disturbance.
Oguzhan Canpolat, A. Giray Yaglikçi, Geraldo F. Oliveira, Ataberk Olgun, Nisa Bostanci, Ismail Emir Yuksel, Haocong Luo, Oguz Ergin, Onur Mutlu
2025CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design.
Zishen Wan, Hanchen Yang, Ritik Raj, Che-Kai Liu, Ananda Samajdar, Arijit Raychowdhury, Tushar Krishna
2025Concord: Rethinking Distributed Coherence for Software Caches in Serverless Environments.
Jovan Stojkovic, Chloe Alverti, Alan Andrade, Nikoleta Iliakopoulou, Hubertus Franke, Tianyin Xu, Josep Torrellas
2025Cooperative Warp Execution in Tensor Core for RISC-V GPGPU.
Abubakr Nada, Giuseppe Maria Sarda, Erwan Lenormand
2025Criticality-Aware Instruction-Centric Bandwidth Partitioning for Data Center Applications.
Liren Zhu, Liujia Li, Jianyu Wu, Yiming Yao, Zhan Shi, Jie Zhang, Zhenlin Wang, Xiaolin Wang, Yingwei Luo, Diyu Zhou
2025DAPPER: A Performance-Attack-Resilient Tracker for RowHammer Defense.
Jeonghyun Woo, Prashant J. Nair
2025DPUaudit: DPU-assisted Pull-based Architecture for Near-Zero Cost System Auditing.
Peng Jiang, Hanlin Jiang, Ruizhe Huang, Hanwen Lei, Zhineng Zhong, Shaokun Zhang, Yuxin Ren, Ning Jia, Xinwei Hu, Yao Guo, Xiangqun Chen, Ding Li
2025Delinquent Loop Pre-execution Using Predicated Helper Threads.
Anirudh Seshadri, Eric Rotenberg
2025Ditto: Accelerating Diffusion Model via Temporal Value Similarity.
Sungbin Kim, Hyunwuk Lee, Wonho Cho, Mincheol Park, Won Woo Ro
2025DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency.
Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, Esha Choukse
2025EDA: Energy-Efficient Inter-Layer Model Compilation for Edge DNN Inference Acceleration.
Bo Ren Pao, I-Chia Chen, En-Hao Chang, Tsung Tai Yeh
2025EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform.
Yi Huang, Xinsheng Gong, Xiangyu Kong, Dibei Chen, Jianfeng Zhu, Wenping Zhu, Liangwei Li, Mingyu Gao, Shaojun Wei, Aoyang Zhang, Leibo Liu
2025EIGEN: Enabling Efficient 3DIC Interconnect with Heterogeneous Dual-Layer Network-on-Active-Interposer.
Siyao Jia, Bo Jiao, Haozhe Zhu, Chixiao Chen, Qi Liu, Ming Liu
2025ER-DCIM: Error-Resilient Digital CIM Architecture with Run-Time MAC-Cell Error Correction.
Zhen He, Yiqi Wang, Zihan Wu, Shaojun Wei, Yang Hu, Fengbin Tu, Shouyi Yin
2025EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models.
Jaehoon Heo, Adiwena Putra, Jieon Yoon, Sungwoong Yune, Hangyeol Lee, Ji-Hoon Kim, Joo-Young Kim
2025Efficient Caching with A Tag-enhanced DRAM.
Maryam Babaie, Ayaz Akram, Wendy Elsasser, Brent Haukness, Michael R. Miller, Taeksang Song, Thomas Vogelsang, Steven C. Woo, Jason Lowe-Power
2025Efficient Memory Side-Channel Protection for Embedding Generation in Machine Learning.
Muhammad Umar, Akhilesh Parag Marathe, Monami Dutta Gupta, Shubham Jogprakash Ghosh, G. Edward Suh, Wenjie Xiong
2025Efficient Optimization with Encoded Ising Models.
Devrath Iyer, Sara Achour
2025Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization.
Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Chang Zhou, Dennis Cai, Yuan Xie, Binzhang Fu
2025Enterprise Class Modular Cache Hierarchy.
Craig R. Walters, Deanna Postles Dunn Berger, Robert J. Sonnelitter, Alper Buyuktosunoglu
2025Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs.
Qizhe Wu, Huawen Liang, Yuchen Gui, Zhichen Zeng, Zerong He, Linfeng Tao, Xiaotian Wang, Letian Zhao, Zhaoxi Zeng, Wei Yuan, Wei Wu, Xi Jin
2025FACIL: Flexible DRAM Address Mapping for SoC-PIM Cooperative On-device LLM Inference.
Seong Hoon Seo, Junghoon Kim, Donghyun Lee, Seonah Yoo, Seokwon Moon, Yeonhong Park, Jae W. Lee
2025FHENDI: A Near-DRAM Accelerator for Compiler-Generated Fully Homomorphic Encryption Applications.
Yongmo Park, Aporva Amarnath, Subhankar Pal, Karthik Swaminathan, Alper Buyuktosunoglu, Hayim Shaul, Ehud Aharoni, Nir Drucker, Wei D. Lu, Omri Soceanu, Pradip Bose
2025FIGLUT: An Energy-Efficient Accelerator Design for FP-INT GEMM Using Look-Up Tables.
Gunho Park, Hyeokjun Kwon, JiWoo Kim, Jeongin Bae, Baeseong Park, Dongsoo Lee, Youngjoo Lee
2025From Optimal to Practical: Efficient Micro-op Cache Replacement Policies for Data Center Applications.
Kan Zhu, Yilong Zhao, Yufei Gao, Peter Braun, Tanvir Ahmed Khan, Heiner Litz, Baris Kasikci, Shuwen Deng
2025GSArch: Breaking Memory Barriers in 3D Gaussian Splatting Training via Architectural Support.
Houshu He, Gang Li, Fangxin Liu, Li Jiang, Xiaoyao Liang, Zhuoran Song
2025Gaussian Blending Unit: An Edge GPU Plug-in for Real-Time Gaussian-Based Rendering in AR/VR.
Zhifan Ye, Yonggan Fu, Jingqun Zhang, Leshu Li, Yongan Zhang, Sixu Li, Cheng Wan, Chenxi Wan, Chaojian Li, Sreemanth Prathipati, Yingyan Celine Lin
2025Gaze into the Pattern: Characterizing Spatial Patterns with Internal Temporal Correlations for Hardware Prefetching.
Zixiao Chen, Chentao Wu, Yunfei Gu, Ranhao Jia, Jie Li, Minyi Guo
2025Gemina: A Coordinated and High-Performance Memory Deduplication Engine.
Zhehua Zhang, Suzhen Wu, Wenyan You, Chunfeng Du, Bo Mao
2025GoPIM: GCN-Oriented Pipeline Optimization for PIM Accelerators.
Siling Yang, Shuibing He, Wenjiong Wang, Yanlong Yin, Tong Wu, Weijian Chen, Xuechen Zhang, Xian-He Sun, Dan Feng
2025Grad: Intelligent Microservice Scaling by Harnessing Resource Fungibility.
Liao Chen, Chenyu Lin, Shutian Luo, Huanle Xu, Chengzhong Xu
2025HATT: Hamiltonian Adaptive Ternary Tree for Optimizing Fermion-to-Qubit Mapping.
Yuhao Liu, Kevin Yao, Jonathan Hong, Julien Froustey, Ermal Rrapaj, Costin Iancu, Gushu Li, Yunong Shi
2025HILP: Accounting for Workload-Level Parallelism in System-on-Chip Design Space Exploration.
Joseph Rogers, Lieven Eeckhout, Magnus Jahre
2025HSMU-SpGEMM: Achieving High Shared Memory Utilization for Parallel Sparse General Matrix-Matrix Multiplication on Modern GPUs.
Min Wu, Huizhang Luo, Fenfang Li, Yiran Zhang, Zhuo Tang, Kenli Li, Jeff Zhang, Chubo Liu
2025Hydra: Scale-out FHE Accelerator Architecture for Secure Deep Learning on FPGA.
Yinghao Yang, Xicheng Xu, Haibin Zhang, Jie Song, Xin Tang, Hang Lu, Xiaowei Li
2025I-DGNN: A Graph Dissimilarity-based Framework for Designing Scalable and Efficient DGNN Accelerators.
Jiaqi Yang, Hao Zheng, Ahmed Louri
2025IEEE International Symposium on High Performance Computer Architecture, HPCA 2025, Las Vegas, NV, USA, March 1-5, 2025
2025IRIS: Unleashing ISP-Software Cooperation to Optimize the Machine Vision Pipeline.
Raúl Taranco, José-María Arnau, Antonio González
2025InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference.
Xiurui Pan, Endian Li, Qiao Li, Shengwen Liang, Yizhou Shan, Ke Zhou, Yingwei Luo, Xiaolin Wang, Jie Zhang
2025Integrating Prefetcher Selection with Dynamic Request Allocation Improves Prefetching Efficiency.
Mengming Li, Qijun Zhang, Yongqing Ren, Zhiyao Xie
2025Interleaved Logical Qubits in Atom Arrays.
Joshua Viszlai, Sophia Fuhui Lin, Siddharth Dangwal, Conor Bradley, Vikram Ramesh, Jonathan M. Baker, Hannes Bernien, Frederic T. Chong
2025LAD: Efficient Accelerator for Generative Inference of LLM with Locality Aware Decoding.
Haoran Wang, Yuming Li, Haobo Xu, Ying Wang, Liqi Liu, Jun Yang, Yinhe Han
2025LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications.
Yujun Lin, Zhekai Zhang, Song Han
2025LSQCA: Resource-Efficient Load/Store Architecture for Limited-Scale Fault-Tolerant Quantum Computing.
Takumi Kobori, Yasunari Suzuki, Yosuke Ueno, Teruo Tanimoto, Synge Todo, Yuuki Tokunaga
2025LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator.
Guoyu Li, Shengyu Ye, Chunyun Chen, Yang Wang, Fan Yang, Ting Cao, Cheng Liu, Mohamed M. Sabry Aly, Mao Yang
2025LegoZK: A Dynamically Reconfigurable Accelerator for Zero-Knowledge Proof.
Zhengbang Yang, Lutan Zhao, Peinan Li, Han Liu, Kai Li, Boyan Zhao, Dan Meng, Rui Hou
2025Let-Me-In: (Still) Employing In-pointer Bounds Metadata for Fine-grained GPU Memory Safety.
Jaewon Lee, Euijun Chung, Saurabh Singh, Seonjin Na, Yonghae Kim, Jaekyu Lee, Hyesoon Kim
2025Lincoln: Real-Time 50~100B LLM Inference on Consumer Devices with LPDDR-Interfaced, Compute-Enabled Flash Memory.
Weiyi Sun, Mingyu Gao, Zhaoshi Li, Aoyang Zhang, Iris Ying Chou, Jianfeng Zhu, Shaojun Wei, Leibo Liu
2025M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type.
Weiming Hu, Haoyan Zhang, Cong Guo, Yu Feng, Renyang Guan, Zhendong Hua, Zihan Liu, Yue Guan, Minyi Guo, Jingwen Leng
2025MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from μWatts to MWatts for Sustainable AI.
Arya Tschand, Arun Tejusve Raghunath Rajan, Sachin Idgunji, Anirban Ghosh, Jeremy Holleman, Csaba Király, Pawan Ambalkar, Ritika Borkar, Ramesh Chukka, Trevor Cockrell, Oliver Curtis, Grigori Fursin, Miro Hodak, Hiwot Kassa, Anton Lokhmotov, Dejan Miskovic, Yuechao Pan, Manu Prasad Manmathan, Liz Raymond, Tom St. John, Arjun Suresh, Rowan Taubitz, Sean Zhan, Scott Wasson, David Kanter, Vijay Janapa Reddi
2025Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory.
Jie Ren, Bin Ma, Shuangyan Yang, Benjamin Francis, Ehsan K. Ardestani, Min Si, Dong Li
2025Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM.
Lian Liu, Shixin Zhao, Bing Li, Haimeng Ren, Zhaohui Xu, Mengdi Wang, Xiaowei Li, Yinhe Han, Ying Wang
2025Marching Page Walks: Batching and Concurrent Page Table Walks for Enhancing GPU Throughput.
Jiwon Lee, Gun Ko, Myung Kuk Yoon, Ipoom Jeong, Yunho Oh, Won Woo Ro
2025Mascot: Predicting Memory Dependencies and Opportunities for Speculative Memory Bypassing.
Karl H. Mose, Sebastian S. Kim, Alberto Ros, Timothy M. Jones, Robert D. Mullins
2025MeHyper: Accelerating Hypergraph Neural Networks by Exploring Implicit Dataflows.
Wenju Zhao, Pengcheng Yao, Dan Chen, Long Zheng, Xiaofei Liao, Qinggang Wang, Shaobo Ma, Yu Li, Haifeng Liu, Wenjing Xiao, Yufei Sun, Bing Zhu, Hai Jin, Jingling Xue
2025Mithril: A Scalable System for Deep GNN Training.
Jingji Chen, Zhuoming Chen, Xuehai Qian
2025Multi-Dimensional Vector ISA Extension for Mobile In-Cache Computing.
Alireza Khadem, Daichi Fujiki, Hilbert Chen, Yufeng Gu, Nishil Talati, Scott A. Mahlke, Reetuparna Das
2025NOVA: A Novel Vertex Management Architecture for Scalable Graph Processing.
Marjan Fariborz, Mahyar Samani, Austin York, S. J. Ben Yoo, Jason Lowe-Power, Venkatesh Akella
2025NVMePass: A Lightweight, High-performance and Scalable NVMe Virtualization Architecture with I/O Queues Passthrough.
Yiquan Chen, Zhen Jin, Yijing Wang, Yi Chen, Jiexiong Xu, Hao Yu, Jinlong Chen, Wenhai Lin, Kanghua Fang, Keyao Zhang, Chengkun Wei, Qiang Liu, Yuan Xie, Wenzhi Chen
2025NearFetch: Saving Inter-Module Bandwidth in Many-Chip-Module GPUs.
Xia Zhao, Guangda Zhang, Lu Wang, Shiqing Zhang, Huadong Dai
2025NeuVSA: A Unified and Efficient Accelerator for Neural Vector Search.
Ziming Yuan, Lei Dai, Wen Li, Jie Zhang, Shengwen Liang, Ying Wang, Cheng Liu, Huawei Li, Xiaowei Li, Jiafeng Guo, Peng Wang, Renhai Chen, Gong Zhang
2025No Rush in Executing Atomic Instructions.
Ashkan Asgharzadeh, Josué Feliu, Manuel E. Acacio, Stefanos Kaxiras, Alberto Ros
2025OASIS: Object-Aware Page Management for Multi-GPU Systems.
Yueqi Wang, Bingyao Li, Mohamed Tarek Ibn Ziad, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang
2025PAISE: PIM-Accelerated Inference Scheduling Engine for Transformer-based LLM.
Hyojung Lee, Daehyeon Baek, Jimyoung Son, Jieun Choi, Kihyo Moon, Minsung Jang
2025PIMnet: A Domain-Specific Network for Efficient Collective Communication in Scalable PIM.
Hyojun Son, Gilbert Jonatan, Xiangyu Wu, Haeyoon Cho, Kaustubh Shivdikar, José L. Abellán, Ajay Joshi, David R. Kaeli, John Kim
2025PROCA: Programmable Probabilistic Processing Unit Architecture with Accept/Reject Prediction & Multicore Pipelining for Causal Inference.
Yihan Fu, Anjunyi Fan, Wenshuo Yue, Hongxiao Zhao, Daijing Shi, Qiuping Wu, Jiayi Li, Xiangyu Zhang, Yaoyu Tao, Yuchao Yang, Bonan Yan
2025Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design.
Haojie Ye, Yuchen Xia, Yuhan Chen, Kuan-Yu Chen, Yichao Yuan, Shuwen Deng, Baris Kasikci, Trevor N. Mudge, Nishil Talati
2025Panacea: Novel DNN Accelerator using Accuracy-Preserving Asymmetric Quantization and Energy-Saving Bit-Slice Sparsity.
Dongyun Kam, Myeongji Yun, Sunwoo Yoo, Seungwoo Hong, Zhengya Zhang, Youngjoo Lee
2025Piccolo: Large-Scale Graph Processing with Fine-Grained in-Memory Scatter-Gather.
Changmin Shin, Jaeyong Song, Hongsun Jang, Dogeun Kim, Jun Sung, Taehee Kwon, Jae Hyung Ju, Frank Liu, YeonKyu Choi, Jinho Lee
2025Predicting DRAM-Caused Risky VMs in Large-Scale Clouds.
Yaoguang Yong, Xiaoming Du, Xuhua Ma, Yuxiang Wang, Bin Yao, Xudong Zheng, Huite Yi
2025Prosperity: Accelerating Spiking Neural Networks via Product Sparsity.
Chiyue Wei, Cong Guo, Feng Cheng, Shiyu Li, Hao (Frank) Yang, Hai Helen Li, Yiran Chen
2025Push Multicast: A Speculative and Coherent Interconnect for Mitigating Manycore CPU Communication Bottleneck.
Jiayi Huang, Yanhua Chen, Zhe Wang, Christopher J. Hughes, Yufei Ding, Yuan Xie
2025QPRAC: Towards Secure and Practical PRAC-based Rowhammer Mitigation using Priority Queues.
Jeonghyun Woo, Shaopeng Chris Lin, Prashant J. Nair, Aamer Jaleel, Gururaj Saileshwar
2025QuCLEAR: Clifford Extraction and Absorption for Quantum Circuit Optimization.
Ji Liu, Alvin Gonzales, Benchen Huang, Zain Hamid Saleem, Paul D. Hovland
2025R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead.
Lieven Eeckhout
2025Rethinking Dead Block Prediction for Intermittent Computing.
Gan Fang, Changhee Jung
2025Reuse-Aware Compilation for Zoned Quantum Architectures Based on Neutral Atoms.
Wan-Hsuan Lin, Daniel Bochen Tan, Jason Cong
2025Revisiting Reliability in Large-Scale Machine Learning Research Clusters.
Apostolos Kokolis, Michael Kuchnik, John Hoffman, Adithya Kumar, Parth Malani, Faye Ma, Zachary DeVito, Shubho Sengupta, Kalyan Saladi, Carole-Jean Wu
2025Reviving In-Storage Hardware Compression on ZNS SSDs through Host-SSD Collaboration.
Yingjia Wang, Tao Lu, Yuhong Liang, Xiang Chen, Ming-Chang Yang
2025RpcNIC: Enabling Efficient Datacenter RPC Offloading on PCIe-attached SmartNICs.
Jie Zhang, Hongjing Huang, Xuzheng Chen, Xiang Li, Jieru Zhao, Ming Liu, Zeke Wang
2025SPARK: Sparsity Aware, Low Area, Energy-Efficient, Near-memory Architecture for Accelerating Linear Programming Problems.
Siddhartha Raman Sundara Raman, Lizy Kurian John, Jaydeep P. Kulkarni
2025SkyByte: Architecting an Efficient Memory-Semantic CXL-based SSD with OS and Hardware Co-design.
Haoyang Zhang, Yuqi Xue, Yirui Eric Zhou, Shaobo Li, Jian Huang
2025SoMa: Identifying, Exploring, and Understanding the DRAM Communication Scheduling Space for DNN Accelerators.
Jingwei Cai, Xuan Wang, Mingyu Gao, Sen Peng, Zijian Zhu, Yuchen Wei, Zuotong Wu, Kaisheng Ma
2025SparseWeaver: Converting Sparse Operations as Dense Operations on GPUs for Graph Workloads.
Shinnung Jeong, Liam Paul Coopert, Ju Min Lee, Heelim Choi, Nicholas Parnenzini, Chihyo Ahn, Yongwoo Lee, Hanjun Kim, Hyesoon Kim
2025SpecMPK: Efficient In-Process Isolation with Speculative and Secure Permission Update Instruction.
Debpratim Adak, Huiyang Zhou, Eric Rotenberg, Amro Awad
2025TB-STC: Transposable Block-wise N:M Structured Sparse Tensor Core.
Jun Liu, Shulin Zeng, Junbo Zhao, Li Ding, Zeyu Wang, Jinhao Li, Zhenhua Zhu, Xuefei Ning, Chen Zhang, Yu Wang, Guohao Dai
2025The Importance of Generalizability in Machine Learning for Systems.
Varun Gohil, Sundar Dev, Gaurang Upasani, David Lo, Parthasarathy Ranganathan, Christina Delimitrou
2025TidalMesh: Topology-Driven AllReduce Collective Communication for Mesh Topology.
Dongkyun Lim, John Kim
2025To Cross, or Not to Cross Pages for Prefetching?
Georgios Vavouliotis, Martí Torrents, Boris Grot, Kleovoulos Kalaitzidis, Leeor Peled, Marc Casas
2025Understanding RowHammer Under Reduced Refresh Latency: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions.
Yahya Can Tugrul, A. Giray Yaglikçi, Ismail Emir Yüksel, Ataberk Olgun, Oguzhan Canpolat, Nisa Bostanci, Mohammad Sadrosadati, Oguz Ergin, Onur Mutlu
2025Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers.
Chaojian Li, Sixu Li, Linrui Jiang, Jingqun Zhang, Yingyan Celine Lin
2025UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures.
Tongxin Xie, Zhenhua Zhu, Bing Li, Yukai He, Cong Li, Guangyu Sun, Huazhong Yang, Yuan Xie, Yu Wang
2025VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference.
Zihan Liu, Xinhao Luo, Junxian Guo, Wentao Ni, Yangjie Zhou, Yue Guan, Cong Guo, Weihao Cui, Yu Feng, Minyi Guo, Yuhao Zhu, Minjia Zhang, Chen Jin, Jingwen Leng
2025VR-Pipe: Streamlining Hardware Graphics Pipeline for Volume Rendering.
Junseo Lee, Jaisung Kim, Junyong Park, Jaewoong Sim
2025Variable Read Disturbance: An Experimental Analysis of Temporal Variation in DRAM Read Disturbance.
Ataberk Olgun, F. Nisa Bostanci, Ismail Emir Yüksel, Oguzhan Canpolat, Haocong Luo, Geraldo F. Oliveira, A. Giray Yaglikçi, Minesh Patel, Onur Mutlu
2025Veritas - Demystifying Silent Data Corruptions: μArch-Level Modeling and Fleet Data of Modern x86 CPUs.
Odysseas Chatzopoulos, Nikos Karystinos, George Papadimitriou, Dimitris Gizopoulos, Harish Dattatraya Dixit, Sriram Sankar
2025WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores.
Guang Fan, Mingzhe Zhang, Fangyu Zheng, Shengyu Fan, Tian Zhou, Xianglong Deng, Wenxu Tang, Liang Kong, Yixuan Song, Shoumeng Yan
2025Warped-Compaction: Maximizing GPU Register File Bandwidth Utilization via Operand Compaction.
Eunbi Jeong, Ipoom Jeong, Myung Kuk Yoon, Nam Sung Kim
2025Zebra: Efficient Redundant Array of Zoned Namespace SSDs Enabled by Zone Random Write Area (ZRWA).
Tianyang Jiang, Guangyan Zhang, Xiaojian Liao, Yuqi Zhou
2025eDKM: An Efficient and Accurate Train-Time Weight Clustering for Large Language Models.
Minsik Cho, Keivan Alizadeh-Vahid, Qichen Fu, Saurabh Adya, Carlo C. del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal
2025throttLL'eM: Predictive GPU Throttling for Energy Efficient LLM Inference Serving.
Andreas Kosmas Kakolyris, Dimosthenis Masouros, Petros Vavaroutsos, Sotirios Xydis, Dimitrios Soudris