| 2025 | A Hardware-Software Design Framework for SpMV Acceleration with Flexible Access Pattern Portfolio. Zhenyu Wu, Maolin Wang, Hayden Kwok-Hay So |
| 2025 | ARTEMIS: Agile Discovery of Efficient Real-Time Systems-on-Chips in the Heterogeneous Era. Subhankar Pal, Aporva Amarnath, Behzad Boroujerdian, Augusto Vega, Alper Buyuktosunoglu, John-David Wellman, Vijay Janapa Reddi, Pradip Bose |
| 2025 | AccelES: Accelerating Top-K SpMV for Embedding Similarity via Low-bit Pruning. Jiaqi Zhai, Xuanhua Shi, Kaiyi Huang, Chencheng Ye, Weifang Hu, Bingsheng He, Hai Jin |
| 2025 | Adyna: Accelerating Dynamic Neural Networks with Adaptive Scheduling. Zhiyao Li, Bohan Yang, Jiaxiang Li, Taijie Chen, Xintong Li, Mingyu Gao |
| 2025 | Anaheim: Architecture and Algorithms for Processing Fully Homomorphic Encryption in Memory. Jongmin Kim, Sungmin Yun, Hyesung Ji, Wonseok Choi, Sangpyo Kim, Jung Ho Ahn |
| 2025 | Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format. Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst |
| 2025 | Architecting Space Microdatacenters: A System-level Approach. Nathan Bleier, Rick Eason, Michael Lembeck, Rakesh Kumar |
| 2025 | Architecting Value Prediction around In-Order Execution. Pierre Ravenel, Arthur Perais, Benoît Dupont de Dinechin, Frédéric Pétrot |
| 2025 | Ariadne: A Hotness-Aware and Size-Adaptive Compressed Swap Technique for Fast Application Relaunch and Reduced CPU Usage on Mobile Devices. Yu Liang, Aofeng Shen, Chun Jason Xue, Riwei Pan, Haiyu Mao, Nika Mansouri-Ghiasi, Qingcai Jiang, Rakesh Nadig, Lei Li, Rachata Ausavarungnirun, Mohammad Sadrosadati, Onur Mutlu |
| 2025 | AsyncDIMM: Achieving Asynchronous Execution in DIMM-Based Near-Memory Processing. Liyan Chen, Dongxu Lyu, Jianfei Jiang, Qin Wang, Zhigang Mao, Naifeng Jing |
| 2025 | AutoRFM: Scaling Low-Cost in-DRAM Trackers to Ultra-Low Rowhammer Thresholds. Moinuddin Qureshi |
| 2025 | BOSS: Blocking algorithm for optimizing shuttling scheduling in Ion Trap. Xian Wu, Chenghong Zhu, Jingbo Wang, Xin Wang |
| 2025 | Bit-slice Architecture for DNN Acceleration with Slice-level Sparsity Enhancement and Exploitation. Insu Choi, Young-Seo Yoon, Joon-Sung Yang |
| 2025 | BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration. Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed S. Abdelfattah |
| 2025 | BrokenSleep: Remote Power Timing Attack Exploiting Processor Idle States. Hyosang Kim, Ki-Dong Kang, Gyeongseo Park, Seungkyu Lee, Daehoon Kim |
| 2025 | Buffalo: Enabling Large-Scale GNN Training via Memory-Efficient Bucketization. Shuangyan Yang, Minjia Zhang, Dong Li |
| 2025 | CORDOBA: Carbon-Efficient Optimization Framework for Computing Systems. Mariam Elgamal, Doug Carmean, Elnaz Ansari, Okay Zed, Ramesh Peri, Srilatha Manne, Udit Gupta, Gu-Yeon Wei, David Brooks, Gage Hills, Carole-Jean Wu |
| 2025 | CROSS: Compiler-Driven Optimization of Sparse DNNs Using Sparse/Dense Computation Kernels. Fangxin Liu, Shiyuan Huang, Ning Yang, Zongwu Wang, Haomin Li, Li Jiang |
| 2025 | Cambricon-DG: An Accelerator for Redundant-Free Dynamic Graph Neural Networks Based on Nonlinear Isolation. Zhifei Yue, Xinkai Song, Tianbo Liu, Xing Hu, Rui Zhang, Zidong Du, Wei Li, Qi Guo, Tianshi Chen |
| 2025 | ChameleonEC: Exploiting Tunability of Erasure Coding for Low-Interference Repair. Yuhui Cai, Shiyao Lin, Zhirong Shen, Jiahui Yang, Jiwu Shu |
| 2025 | Choco-Q: Commute Hamiltonian-based QAOA for Constrained Binary Optimization. Debin Xiang, Qifan Jiang, Liqiang Lu, Siwei Tan, Jianwei Yin |
| 2025 | Chronus: Understanding and Securing the Cutting-Edge Industry Solutions to DRAM Read Disturbance. Oguzhan Canpolat, A. Giray Yaglikçi, Geraldo F. Oliveira, Ataberk Olgun, Nisa Bostanci, Ismail Emir Yuksel, Haocong Luo, Oguz Ergin, Onur Mutlu |
| 2025 | CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design. Zishen Wan, Hanchen Yang, Ritik Raj, Che-Kai Liu, Ananda Samajdar, Arijit Raychowdhury, Tushar Krishna |
| 2025 | Concord: Rethinking Distributed Coherence for Software Caches in Serverless Environments. Jovan Stojkovic, Chloe Alverti, Alan Andrade, Nikoleta Iliakopoulou, Hubertus Franke, Tianyin Xu, Josep Torrellas |
| 2025 | Cooperative Warp Execution in Tensor Core for RISC-V GPGPU. Abubakr Nada, Giuseppe Maria Sarda, Erwan Lenormand |
| 2025 | Criticality-Aware Instruction-Centric Bandwidth Partitioning for Data Center Applications. Liren Zhu, Liujia Li, Jianyu Wu, Yiming Yao, Zhan Shi, Jie Zhang, Zhenlin Wang, Xiaolin Wang, Yingwei Luo, Diyu Zhou |
| 2025 | DAPPER: A Performance-Attack-Resilient Tracker for RowHammer Defense. Jeonghyun Woo, Prashant J. Nair |
| 2025 | DPUaudit: DPU-assisted Pull-based Architecture for Near-Zero Cost System Auditing. Peng Jiang, Hanlin Jiang, Ruizhe Huang, Hanwen Lei, Zhineng Zhong, Shaokun Zhang, Yuxin Ren, Ning Jia, Xinwei Hu, Yao Guo, Xiangqun Chen, Ding Li |
| 2025 | Delinquent Loop Pre-execution Using Predicated Helper Threads. Anirudh Seshadri, Eric Rotenberg |
| 2025 | Ditto: Accelerating Diffusion Model via Temporal Value Similarity. Sungbin Kim, Hyunwuk Lee, Wonho Cho, Mincheol Park, Won Woo Ro |
| 2025 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency. Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, Esha Choukse |
| 2025 | EDA: Energy-Efficient Inter-Layer Model Compilation for Edge DNN Inference Acceleration. Bo Ren Pao, I-Chia Chen, En-Hao Chang, Tsung Tai Yeh |
| 2025 | EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform. Yi Huang, Xinsheng Gong, Xiangyu Kong, Dibei Chen, Jianfeng Zhu, Wenping Zhu, Liangwei Li, Mingyu Gao, Shaojun Wei, Aoyang Zhang, Leibo Liu |
| 2025 | EIGEN: Enabling Efficient 3DIC Interconnect with Heterogeneous Dual-Layer Network-on-Active-Interposer. Siyao Jia, Bo Jiao, Haozhe Zhu, Chixiao Chen, Qi Liu, Ming Liu |
| 2025 | ER-DCIM: Error-Resilient Digital CIM Architecture with Run-Time MAC-Cell Error Correction. Zhen He, Yiqi Wang, Zihan Wu, Shaojun Wei, Yang Hu, Fengbin Tu, Shouyi Yin |
| 2025 | EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models. Jaehoon Heo, Adiwena Putra, Jieon Yoon, Sungwoong Yune, Hangyeol Lee, Ji-Hoon Kim, Joo-Young Kim |
| 2025 | Efficient Caching with A Tag-enhanced DRAM. Maryam Babaie, Ayaz Akram, Wendy Elsasser, Brent Haukness, Michael R. Miller, Taeksang Song, Thomas Vogelsang, Steven C. Woo, Jason Lowe-Power |
| 2025 | Efficient Memory Side-Channel Protection for Embedding Generation in Machine Learning. Muhammad Umar, Akhilesh Parag Marathe, Monami Dutta Gupta, Shubham Jogprakash Ghosh, G. Edward Suh, Wenjie Xiong |
| 2025 | Efficient Optimization with Encoded Ising Models. Devrath Iyer, Sara Achour |
| 2025 | Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization. Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Chang Zhou, Dennis Cai, Yuan Xie, Binzhang Fu |
| 2025 | Enterprise Class Modular Cache Hierarchy. Craig R. Walters, Deanna Postles Dunn Berger, Robert J. Sonnelitter, Alper Buyuktosunoglu |
| 2025 | Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs. Qizhe Wu, Huawen Liang, Yuchen Gui, Zhichen Zeng, Zerong He, Linfeng Tao, Xiaotian Wang, Letian Zhao, Zhaoxi Zeng, Wei Yuan, Wei Wu, Xi Jin |
| 2025 | FACIL: Flexible DRAM Address Mapping for SoC-PIM Cooperative On-device LLM Inference. Seong Hoon Seo, Junghoon Kim, Donghyun Lee, Seonah Yoo, Seokwon Moon, Yeonhong Park, Jae W. Lee |
| 2025 | FHENDI: A Near-DRAM Accelerator for Compiler-Generated Fully Homomorphic Encryption Applications. Yongmo Park, Aporva Amarnath, Subhankar Pal, Karthik Swaminathan, Alper Buyuktosunoglu, Hayim Shaul, Ehud Aharoni, Nir Drucker, Wei D. Lu, Omri Soceanu, Pradip Bose |
| 2025 | FIGLUT: An Energy-Efficient Accelerator Design for FP-INT GEMM Using Look-Up Tables. Gunho Park, Hyeokjun Kwon, JiWoo Kim, Jeongin Bae, Baeseong Park, Dongsoo Lee, Youngjoo Lee |
| 2025 | From Optimal to Practical: Efficient Micro-op Cache Replacement Policies for Data Center Applications. Kan Zhu, Yilong Zhao, Yufei Gao, Peter Braun, Tanvir Ahmed Khan, Heiner Litz, Baris Kasikci, Shuwen Deng |
| 2025 | GSArch: Breaking Memory Barriers in 3D Gaussian Splatting Training via Architectural Support. Houshu He, Gang Li, Fangxin Liu, Li Jiang, Xiaoyao Liang, Zhuoran Song |
| 2025 | Gaussian Blending Unit: An Edge GPU Plug-in for Real-Time Gaussian-Based Rendering in AR/VR. Zhifan Ye, Yonggan Fu, Jingqun Zhang, Leshu Li, Yongan Zhang, Sixu Li, Cheng Wan, Chenxi Wan, Chaojian Li, Sreemanth Prathipati, Yingyan Celine Lin |
| 2025 | Gaze into the Pattern: Characterizing Spatial Patterns with Internal Temporal Correlations for Hardware Prefetching. Zixiao Chen, Chentao Wu, Yunfei Gu, Ranhao Jia, Jie Li, Minyi Guo |
| 2025 | Gemina: A Coordinated and High-Performance Memory Deduplication Engine. Zhehua Zhang, Suzhen Wu, Wenyan You, Chunfeng Du, Bo Mao |
| 2025 | GoPIM: GCN-Oriented Pipeline Optimization for PIM Accelerators. Siling Yang, Shuibing He, Wenjiong Wang, Yanlong Yin, Tong Wu, Weijian Chen, Xuechen Zhang, Xian-He Sun, Dan Feng |
| 2025 | Grad: Intelligent Microservice Scaling by Harnessing Resource Fungibility. Liao Chen, Chenyu Lin, Shutian Luo, Huanle Xu, Chengzhong Xu |
| 2025 | HATT: Hamiltonian Adaptive Ternary Tree for Optimizing Fermion-to-Qubit Mapping. Yuhao Liu, Kevin Yao, Jonathan Hong, Julien Froustey, Ermal Rrapaj, Costin Iancu, Gushu Li, Yunong Shi |
| 2025 | HILP: Accounting for Workload-Level Parallelism in System-on-Chip Design Space Exploration. Joseph Rogers, Lieven Eeckhout, Magnus Jahre |
| 2025 | HSMU-SpGEMM: Achieving High Shared Memory Utilization for Parallel Sparse General Matrix-Matrix Multiplication on Modern GPUs. Min Wu, Huizhang Luo, Fenfang Li, Yiran Zhang, Zhuo Tang, Kenli Li, Jeff Zhang, Chubo Liu |
| 2025 | Hydra: Scale-out FHE Accelerator Architecture for Secure Deep Learning on FPGA. Yinghao Yang, Xicheng Xu, Haibin Zhang, Jie Song, Xin Tang, Hang Lu, Xiaowei Li |
| 2025 | I-DGNN: A Graph Dissimilarity-based Framework for Designing Scalable and Efficient DGNN Accelerators. Jiaqi Yang, Hao Zheng, Ahmed Louri |
| 2025 | IEEE International Symposium on High Performance Computer Architecture, HPCA 2025, Las Vegas, NV, USA, March 1-5, 2025 |
| 2025 | IRIS: Unleashing ISP-Software Cooperation to Optimize the Machine Vision Pipeline. Raúl Taranco, José-María Arnau, Antonio González |
| 2025 | InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference. Xiurui Pan, Endian Li, Qiao Li, Shengwen Liang, Yizhou Shan, Ke Zhou, Yingwei Luo, Xiaolin Wang, Jie Zhang |
| 2025 | Integrating Prefetcher Selection with Dynamic Request Allocation Improves Prefetching Efficiency. Mengming Li, Qijun Zhang, Yongqing Ren, Zhiyao Xie |
| 2025 | Interleaved Logical Qubits in Atom Arrays. Joshua Viszlai, Sophia Fuhui Lin, Siddharth Dangwal, Conor Bradley, Vikram Ramesh, Jonathan M. Baker, Hannes Bernien, Frederic T. Chong |
| 2025 | LAD: Efficient Accelerator for Generative Inference of LLM with Locality Aware Decoding. Haoran Wang, Yuming Li, Haobo Xu, Ying Wang, Liqi Liu, Jun Yang, Yinhe Han |
| 2025 | LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications. Yujun Lin, Zhekai Zhang, Song Han |
| 2025 | LSQCA: Resource-Efficient Load/Store Architecture for Limited-Scale Fault-Tolerant Quantum Computing. Takumi Kobori, Yasunari Suzuki, Yosuke Ueno, Teruo Tanimoto, Synge Todo, Yuuki Tokunaga |
| 2025 | LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator. Guoyu Li, Shengyu Ye, Chunyun Chen, Yang Wang, Fan Yang, Ting Cao, Cheng Liu, Mohamed M. Sabry Aly, Mao Yang |
| 2025 | LegoZK: A Dynamically Reconfigurable Accelerator for Zero-Knowledge Proof. Zhengbang Yang, Lutan Zhao, Peinan Li, Han Liu, Kai Li, Boyan Zhao, Dan Meng, Rui Hou |
| 2025 | Let-Me-In: (Still) Employing In-pointer Bounds Metadata for Fine-grained GPU Memory Safety. Jaewon Lee, Euijun Chung, Saurabh Singh, Seonjin Na, Yonghae Kim, Jaekyu Lee, Hyesoon Kim |
| 2025 | Lincoln: Real-Time 50~100B LLM Inference on Consumer Devices with LPDDR-Interfaced, Compute-Enabled Flash Memory. Weiyi Sun, Mingyu Gao, Zhaoshi Li, Aoyang Zhang, Iris Ying Chou, Jianfeng Zhu, Shaojun Wei, Leibo Liu |
| 2025 | M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type. Weiming Hu, Haoyan Zhang, Cong Guo, Yu Feng, Renyang Guan, Zhendong Hua, Zihan Liu, Yue Guan, Minyi Guo, Jingwen Leng |
| 2025 | MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from μWatts to MWatts for Sustainable AI. Arya Tschand, Arun Tejusve Raghunath Rajan, Sachin Idgunji, Anirban Ghosh, Jeremy Holleman, Csaba Király, Pawan Ambalkar, Ritika Borkar, Ramesh Chukka, Trevor Cockrell, Oliver Curtis, Grigori Fursin, Miro Hodak, Hiwot Kassa, Anton Lokhmotov, Dejan Miskovic, Yuechao Pan, Manu Prasad Manmathan, Liz Raymond, Tom St. John, Arjun Suresh, Rowan Taubitz, Sean Zhan, Scott Wasson, David Kanter, Vijay Janapa Reddi |
| 2025 | Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory. Jie Ren, Bin Ma, Shuangyan Yang, Benjamin Francis, Ehsan K. Ardestani, Min Si, Dong Li |
| 2025 | Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM. Lian Liu, Shixin Zhao, Bing Li, Haimeng Ren, Zhaohui Xu, Mengdi Wang, Xiaowei Li, Yinhe Han, Ying Wang |
| 2025 | Marching Page Walks: Batching and Concurrent Page Table Walks for Enhancing GPU Throughput. Jiwon Lee, Gun Ko, Myung Kuk Yoon, Ipoom Jeong, Yunho Oh, Won Woo Ro |
| 2025 | Mascot: Predicting Memory Dependencies and Opportunities for Speculative Memory Bypassing. Karl H. Mose, Sebastian S. Kim, Alberto Ros, Timothy M. Jones, Robert D. Mullins |
| 2025 | MeHyper: Accelerating Hypergraph Neural Networks by Exploring Implicit Dataflows. Wenju Zhao, Pengcheng Yao, Dan Chen, Long Zheng, Xiaofei Liao, Qinggang Wang, Shaobo Ma, Yu Li, Haifeng Liu, Wenjing Xiao, Yufei Sun, Bing Zhu, Hai Jin, Jingling Xue |
| 2025 | Mithril: A Scalable System for Deep GNN Training. Jingji Chen, Zhuoming Chen, Xuehai Qian |
| 2025 | Multi-Dimensional Vector ISA Extension for Mobile In-Cache Computing. Alireza Khadem, Daichi Fujiki, Hilbert Chen, Yufeng Gu, Nishil Talati, Scott A. Mahlke, Reetuparna Das |
| 2025 | NOVA: A Novel Vertex Management Architecture for Scalable Graph Processing. Marjan Fariborz, Mahyar Samani, Austin York, S. J. Ben Yoo, Jason Lowe-Power, Venkatesh Akella |
| 2025 | NVMePass: A Lightweight, High-performance and Scalable NVMe Virtualization Architecture with I/O Queues Passthrough. Yiquan Chen, Zhen Jin, Yijing Wang, Yi Chen, Jiexiong Xu, Hao Yu, Jinlong Chen, Wenhai Lin, Kanghua Fang, Keyao Zhang, Chengkun Wei, Qiang Liu, Yuan Xie, Wenzhi Chen |
| 2025 | NearFetch: Saving Inter-Module Bandwidth in Many-Chip-Module GPUs. Xia Zhao, Guangda Zhang, Lu Wang, Shiqing Zhang, Huadong Dai |
| 2025 | NeuVSA: A Unified and Efficient Accelerator for Neural Vector Search. Ziming Yuan, Lei Dai, Wen Li, Jie Zhang, Shengwen Liang, Ying Wang, Cheng Liu, Huawei Li, Xiaowei Li, Jiafeng Guo, Peng Wang, Renhai Chen, Gong Zhang |
| 2025 | No Rush in Executing Atomic Instructions. Ashkan Asgharzadeh, Josué Feliu, Manuel E. Acacio, Stefanos Kaxiras, Alberto Ros |
| 2025 | OASIS: Object-Aware Page Management for Multi-GPU Systems. Yueqi Wang, Bingyao Li, Mohamed Tarek Ibn Ziad, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang |
| 2025 | PAISE: PIM-Accelerated Inference Scheduling Engine for Transformer-based LLM. Hyojung Lee, Daehyeon Baek, Jimyoung Son, Jieun Choi, Kihyo Moon, Minsung Jang |
| 2025 | PIMnet: A Domain-Specific Network for Efficient Collective Communication in Scalable PIM. Hyojun Son, Gilbert Jonatan, Xiangyu Wu, Haeyoon Cho, Kaustubh Shivdikar, José L. Abellán, Ajay Joshi, David R. Kaeli, John Kim |
| 2025 | PROCA: Programmable Probabilistic Processing Unit Architecture with Accept/Reject Prediction & Multicore Pipelining for Causal Inference. Yihan Fu, Anjunyi Fan, Wenshuo Yue, Hongxiao Zhao, Daijing Shi, Qiuping Wu, Jiayi Li, Xiangyu Zhang, Yaoyu Tao, Yuchao Yang, Bonan Yan |
| 2025 | Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design. Haojie Ye, Yuchen Xia, Yuhan Chen, Kuan-Yu Chen, Yichao Yuan, Shuwen Deng, Baris Kasikci, Trevor N. Mudge, Nishil Talati |
| 2025 | Panacea: Novel DNN Accelerator using Accuracy-Preserving Asymmetric Quantization and Energy-Saving Bit-Slice Sparsity. Dongyun Kam, Myeongji Yun, Sunwoo Yoo, Seungwoo Hong, Zhengya Zhang, Youngjoo Lee |
| 2025 | Piccolo: Large-Scale Graph Processing with Fine-Grained in-Memory Scatter-Gather. Changmin Shin, Jaeyong Song, Hongsun Jang, Dogeun Kim, Jun Sung, Taehee Kwon, Jae Hyung Ju, Frank Liu, YeonKyu Choi, Jinho Lee |
| 2025 | Predicting DRAM-Caused Risky VMs in Large-Scale Clouds. Yaoguang Yong, Xiaoming Du, Xuhua Ma, Yuxiang Wang, Bin Yao, Xudong Zheng, Huite Yi |
| 2025 | Prosperity: Accelerating Spiking Neural Networks via Product Sparsity. Chiyue Wei, Cong Guo, Feng Cheng, Shiyu Li, Hao (Frank) Yang, Hai Helen Li, Yiran Chen |
| 2025 | Push Multicast: A Speculative and Coherent Interconnect for Mitigating Manycore CPU Communication Bottleneck. Jiayi Huang, Yanhua Chen, Zhe Wang, Christopher J. Hughes, Yufei Ding, Yuan Xie |
| 2025 | QPRAC: Towards Secure and Practical PRAC-based Rowhammer Mitigation using Priority Queues. Jeonghyun Woo, Shaopeng Chris Lin, Prashant J. Nair, Aamer Jaleel, Gururaj Saileshwar |
| 2025 | QuCLEAR: Clifford Extraction and Absorption for Quantum Circuit Optimization. Ji Liu, Alvin Gonzales, Benchen Huang, Zain Hamid Saleem, Paul D. Hovland |
| 2025 | R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead. Lieven Eeckhout |
| 2025 | Rethinking Dead Block Prediction for Intermittent Computing. Gan Fang, Changhee Jung |
| 2025 | Reuse-Aware Compilation for Zoned Quantum Architectures Based on Neutral Atoms. Wan-Hsuan Lin, Daniel Bochen Tan, Jason Cong |
| 2025 | Revisiting Reliability in Large-Scale Machine Learning Research Clusters. Apostolos Kokolis, Michael Kuchnik, John Hoffman, Adithya Kumar, Parth Malani, Faye Ma, Zachary DeVito, Shubho Sengupta, Kalyan Saladi, Carole-Jean Wu |
| 2025 | Reviving In-Storage Hardware Compression on ZNS SSDs through Host-SSD Collaboration. Yingjia Wang, Tao Lu, Yuhong Liang, Xiang Chen, Ming-Chang Yang |
| 2025 | RpcNIC: Enabling Efficient Datacenter RPC Offloading on PCIe-attached SmartNICs. Jie Zhang, Hongjing Huang, Xuzheng Chen, Xiang Li, Jieru Zhao, Ming Liu, Zeke Wang |
| 2025 | SPARK: Sparsity Aware, Low Area, Energy-Efficient, Near-memory Architecture for Accelerating Linear Programming Problems. Siddhartha Raman Sundara Raman, Lizy Kurian John, Jaydeep P. Kulkarni |
| 2025 | SkyByte: Architecting an Efficient Memory-Semantic CXL-based SSD with OS and Hardware Co-design. Haoyang Zhang, Yuqi Xue, Yirui Eric Zhou, Shaobo Li, Jian Huang |
| 2025 | SoMa: Identifying, Exploring, and Understanding the DRAM Communication Scheduling Space for DNN Accelerators. Jingwei Cai, Xuan Wang, Mingyu Gao, Sen Peng, Zijian Zhu, Yuchen Wei, Zuotong Wu, Kaisheng Ma |
| 2025 | SparseWeaver: Converting Sparse Operations as Dense Operations on GPUs for Graph Workloads. Shinnung Jeong, Liam Paul Coopert, Ju Min Lee, Heelim Choi, Nicholas Parnenzini, Chihyo Ahn, Yongwoo Lee, Hanjun Kim, Hyesoon Kim |
| 2025 | SpecMPK: Efficient In-Process Isolation with Speculative and Secure Permission Update Instruction. Debpratim Adak, Huiyang Zhou, Eric Rotenberg, Amro Awad |
| 2025 | TB-STC: Transposable Block-wise N:M Structured Sparse Tensor Core. Jun Liu, Shulin Zeng, Junbo Zhao, Li Ding, Zeyu Wang, Jinhao Li, Zhenhua Zhu, Xuefei Ning, Chen Zhang, Yu Wang, Guohao Dai |
| 2025 | The Importance of Generalizability in Machine Learning for Systems. Varun Gohil, Sundar Dev, Gaurang Upasani, David Lo, Parthasarathy Ranganathan, Christina Delimitrou |
| 2025 | TidalMesh: Topology-Driven AllReduce Collective Communication for Mesh Topology. Dongkyun Lim, John Kim |
| 2025 | To Cross, or Not to Cross Pages for Prefetching? Georgios Vavouliotis, Martí Torrents, Boris Grot, Kleovoulos Kalaitzidis, Leeor Peled, Marc Casas |
| 2025 | Understanding RowHammer Under Reduced Refresh Latency: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions. Yahya Can Tugrul, A. Giray Yaglikçi, Ismail Emir Yüksel, Ataberk Olgun, Oguzhan Canpolat, Nisa Bostanci, Mohammad Sadrosadati, Oguz Ergin, Onur Mutlu |
| 2025 | Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers. Chaojian Li, Sixu Li, Linrui Jiang, Jingqun Zhang, Yingyan Celine Lin |
| 2025 | UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures. Tongxin Xie, Zhenhua Zhu, Bing Li, Yukai He, Cong Li, Guangyu Sun, Huazhong Yang, Yuan Xie, Yu Wang |
| 2025 | VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference. Zihan Liu, Xinhao Luo, Junxian Guo, Wentao Ni, Yangjie Zhou, Yue Guan, Cong Guo, Weihao Cui, Yu Feng, Minyi Guo, Yuhao Zhu, Minjia Zhang, Chen Jin, Jingwen Leng |
| 2025 | VR-Pipe: Streamlining Hardware Graphics Pipeline for Volume Rendering. Junseo Lee, Jaisung Kim, Junyong Park, Jaewoong Sim |
| 2025 | Variable Read Disturbance: An Experimental Analysis of Temporal Variation in DRAM Read Disturbance. Ataberk Olgun, F. Nisa Bostanci, Ismail Emir Yüksel, Oguzhan Canpolat, Haocong Luo, Geraldo F. Oliveira, A. Giray Yaglikçi, Minesh Patel, Onur Mutlu |
| 2025 | Veritas - Demystifying Silent Data Corruptions: μArch-Level Modeling and Fleet Data of Modern x86 CPUs. Odysseas Chatzopoulos, Nikos Karystinos, George Papadimitriou, Dimitris Gizopoulos, Harish Dattatraya Dixit, Sriram Sankar |
| 2025 | WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores. Guang Fan, Mingzhe Zhang, Fangyu Zheng, Shengyu Fan, Tian Zhou, Xianglong Deng, Wenxu Tang, Liang Kong, Yixuan Song, Shoumeng Yan |
| 2025 | Warped-Compaction: Maximizing GPU Register File Bandwidth Utilization via Operand Compaction. Eunbi Jeong, Ipoom Jeong, Myung Kuk Yoon, Nam Sung Kim |
| 2025 | Zebra: Efficient Redundant Array of Zoned Namespace SSDs Enabled by Zone Random Write Area (ZRWA). Tianyang Jiang, Guangyan Zhang, Xiaojian Liao, Yuqi Zhou |
| 2025 | eDKM: An Efficient and Accurate Train-Time Weight Clustering for Large Language Models. Minsik Cho, Keivan Alizadeh-Vahid, Qichen Fu, Saurabh Adya, Carlo C. del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal |
| 2025 | throttLL'eM: Predictive GPU Throttling for Energy Efficient LLM Inference Serving. Andreas Kosmas Kakolyris, Dimosthenis Masouros, Petros Vavaroutsos, Sotirios Xydis, Dimitrios Soudris |