| 2026 | A Deadlock-Free Bridge Module for Inter-Chiplet Cache-Coherent Communication in an Open Chiplet Ecosystem. Zhiqiang Chen, Wenwen Fu, Yongwen Wang, Hongwei Zhou |
| 2026 | A PN-Free Digital 3-SAT Accelerator Using Crossbar Architecture and Frequency-Controlled Counters. Zhezheng Ren, Chenao Yuan, Yuke Zhang, Shiyu Su |
| 2026 | AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization. Kosuke Matsushima, Yasuyuki Okoshi, Masato Motomura, Daichi Fujiki |
| 2026 | ARIADNE: Adaptive UVM Management for Efficient GPU Memory Oversubscription. Hyunkyun Shin, Seongtae Bang, Hyungwon Park, Daehoon Kim |
| 2026 | ASPA: Reassigning DDR5 Parity Bandwidth. Fan Li, Qiufeng Li, Yanan Guo, Weidong Cao, Xin Xin |
| 2026 | AUM: Unleashing the Efficiency Potential of Shared Processors with Accelerator Units for LLM Serving. Xinkai Wang, Chao Li, Yiming Zhuansun, Jinyang Guo, Xiaofeng Hou, Jing Wang, Luping Wang, Weigao Chen, Cheng Huang, Guodong Yang, Liping Zhang, Minyi Guo |
| 2026 | AccelFlow: Orchestrating an On-Package Ensemble of Fine-Grained Accelerators for Microservices. Jovan Stojkovic, Abraham Farrell, Zhangxiaowen Gong, Christopher J. Hughes, Josep Torrellas |
| 2026 | Adaptive Draft Sequence Length: Enhancing Speculative Decoding Throughput on PIM-Enabled Systems. Runze Wang, Qinggang Wang, Haifeng Liu, Long Zheng, Xiaofei Liao, Hai Jin, Jingling Xue |
| 2026 | Advancing Full-Stack Acceleration for Schrödinger-Style Quantum Simulation. Shuang Liang, Yuncheng Lu, Ce Guo, Paul H. J. Kelly, Wayne Luk, Hongxiang Fan |
| 2026 | An Efficient and Scalable Hardware Architecture for Number Theoretic Transform on FPGA with Design Automation. Yilan Zhu, Geng Yang, Xingyu Tian, Dilshan Kumarathunga, Liang Kong, Xianglong Deng, Shengyu Fan, Guang Fan, Guiming Shi, Lei Chen, Bo Zhang, Yisong Chang, Shoumeng Yan, Zhenman Fang, Mingzhe Zhang |
| 2026 | Area Bloating and the Future of Specialization. Qixuan Yu, David Wentzlaff |
| 2026 | Athena: Synergizing Data Prefetching and Off-Chip Prediction via Online Reinforcement Learning. Rahul Bera, Zhenrong Lang, Caroline Hengartner, Konstantinos Kanellopoulos, Rakesh Kumar, Mohammad Sadrosadati, Onur Mutlu |
| 2026 | AutoGNN: End-to-End Hardware-Driven Graph Preprocessing for Enhanced GNN Performance. Seungkwan Kang, Seungjun Lee, Donghyun Gouk, Miryeong Kwon, Hyunkyu Choi, Junhyeok Jang, Sangwon Lee, Huiwon Choi, Jie Zhang, Wonil Choi, Mahmut Taylan Kandemir, Myoungsoo Jung |
| 2026 | AutoHAAP: Automated Heterogeneity-Aware Asymmetric Partitioning for LLM Training. Yuanyuan Wang, Nana Tang, Yuyang Wang, Shu Pan, Dingding Yu, Zeyue Wang, Mou Sun, Kejie Fu, Fangyu Wang, Yunchuan Chen, Ning Sun, Fei Yang |
| 2026 | BARD: Reducing Write Latency of DDR5 Memory by Exploiting Bank-Parallelism. Suhas K. Vittal, Moinuddin Qureshi |
| 2026 | BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache. Dayou Du, Shijie Cao, Jianyi Cheng, Luo Mai, Ting Cao, Mao Yang |
| 2026 | CLINE: Improving Control Flow Compilation of Quantum Programs with Control Line Encoding. Anbang Wu, Liqiang Lu, Jianwei Yin, Jingwen Leng, Minyi Guo |
| 2026 | COMET: Communication and Memory Co-Design for Fine-Grained AI Inference in MCM Accelerators. Taishu Sheng, Guangyu Sun, Dezun Dong |
| 2026 | CROPHE: Cross-Operator Dataflow Optimization for Fully Homomorphic Encryption Accelerators. Xinhua Chen, Jiangbin Dong, Hongren Zheng, Tian Tang, Mingyu Gao |
| 2026 | Cambricon-CIM: Enabling Energy-Efficient and Error-Resilient Analog CIM Acceleration via Reformation of Coding Bases. Hongrui Guo, Tianrui Ma, Zidong Du, Mo Zou, Yifan Hao, Yongwei Zhao, Rui Zhang, Wei Li, Xing Hu, Zhiwei Xu, Qi Guo, Tianshi Chen |
| 2026 | Cambricon-GS: An Accelerator for 3D Gaussian Splatting Training With Gaussian-Pixel Hybrid Parallelism. Rui Wen, Zhifei Yue, Tianbo Liu, Xinkai Song, Jin Li, Di Huang, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, Tianshi Chen |
| 2026 | Characterizing Cloud-Native LLM Inference at Bytedance and Exposing Optimization Challenges and Opportunities for Future AI Accelerators. Jingwei Cai, Dehao Kong, Hantao Huang, Zishan Jiang, Zixuan Ma, Qingyu Guo, Zhenxing Zhang, Guiming Shi, Mingyu Gao, Kaisheng Ma, Minghui Yu |
| 2026 | CoCoTree: A Computation-Capable Architecture for Collective Communication in Scalable PIM. Shunchen Shi, Qijia Yang, Fan Yang, Yu Huang, Youwei Zhuo, Zhichun Li, Ninghui Sun, Xueqi Li |
| 2026 | Cohet: A CXL-Driven Coherent Heterogeneous Computing Framework with Hardware-Calibrated Full-System Simulation. Yanjing Wang, Lizhou Wu, Sunfeng Gao, Yibo Tang, Junhui Luo, Zicong Wang, Yang Ou, Dezun Dong, Nong Xiao, Mingche Lai |
| 2026 | Compression-Aware Gradient Splitting for Collective Communications in Distributed Training. Pranati Majhi, Sabuj Laskar, Abdullah Muzahid, Eun Jung Kim |
| 2026 | Conduit: Programmer-Transparent Near-Data Processing Using Multiple Compute-Capable Resources in Solid State Drives. Rakesh Nadig, Vamanan Arulchelvan, Mayank Kabra, Harshita Gupta, Rahul Bera, Nika Mansouri-Ghiasi, Nanditha Rao, Qingcai Jiang, Andreas Kosmas Kakolyris, Yu Liang, Mohammad Sadrosadati, Onur Mutlu |
| 2026 | Conflux: A High-Performance Keyword Private Retrieval System for Dynamic Datasets. Zehao Chen, Zhaoyan Shen, Qian Wei, Hang Lu, Lei Ju |
| 2026 | Count2Multiply: Reliable In-Memory High-Radix Counting. João Paulo C. de Lima, Benjamin F. Morris III, Asif Ali Khan, Jerónimo Castrillón, Alex K. Jones |
| 2026 | Cyclone: Designing Efficient and Highly Parallel QCCD Architectural Codesigns for Fault Tolerant Quantum Memory. Sahil Khan, Abhinav Anand, Kenneth R. Brown, Jonathan M. Baker |
| 2026 | C³: CXL Coherence Controllers for Heterogeneous Architectures. Anatole Lefort, David Schall, Nicolò Carpentieri, Julian Pritzi, Soham Chakraborty, Nicolai Oswald, Pramod Bhatotia |
| 2026 | D'ArQ: A QOC Framework with Causality-Aware Grouping and Basis Selection. Changheon Lee, Hyungseok Kim, Seungwoo Choi, Youngmin Kim, Won Woo Ro |
| 2026 | DC-MBQC: A Distributed Compilation Framework for Measurement-Based Quantum Computing. Yecheng Xue, Rui Yang, Zhiding Liang, Tongyang Li |
| 2026 | DP-HLS: A High-Level Synthesis Framework for Accelerating Dynamic Programming Algorithms in Bioinformatics. Anshu Gupta, Yingqi Cao, Jason Liang, Yatish Turakhia |
| 2026 | DRACO: A Hardware-Efficient Robot Rigid Body Dynamics Accelerator with Precision-Aware Quantization Framework. Xingyu Liu, Jiawei Liang, Yipu Zhang, Linfeng Du, Chaofang Ma, Hui Yu, Jiang Xu, Wei Zhang |
| 2026 | DSAssassin: Cross-VM Side-Channel Attacks by Exploiting Intel Data Streaming Accelerator. Ben Chen, Kunlin Li, Shuwen Deng, Dongsheng Wang, Yun Chen |
| 2026 | ELORA: Efficient LoRA and KV Cache Management for Multi-LoRA LLM Serving. Jiuchen Shi, Hang Zhang, Yixiao Wang, Quan Chen, Yizhou Shan, Kaihua Fu, Wei Wang, Minyi Guo |
| 2026 | ESTroM: Element-Flow Architecture for Processing Sparse Tractable Probabilistic Models. Anjunyi Fan, Xuejie Liu, Anji Liu, Qiuping Wu, Jiaqi Yang, Yuchao Qin, Guy Van den Broeck, Yitao Liang, Bonan Yan |
| 2026 | Enterprise Class On-Chip Accelerator Integration. Deanna Postles Dunn Berger, Alper Buyuktosunoglu, Craig R. Walters, Robert J. Sonnelitter, Hailey Nicholson, Ashraf ElSharif, Yamil Rivera, Avery Francois, Cédric Lichtenau, Jason Kohl |
| 2026 | Exploration of LLM Workload Reliability Based on di/dt Effects and Voltage Droops. Zhixing Jiang, Justin Garrigus, Allison Seigler, Ethan Syed, Yan-Lun Huang, Mehdi Sadi, Tawfik Rahal-Arabi, Lizy Kurian John |
| 2026 | FACE: Fully Overlapped PD Scheduling and Multi-Level Architecture Co-Exploration on Wafer. Zheng Xu, Dehao Kong, Jiaxin Liu, Dingcheng Jiang, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin |
| 2026 | FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection. Ziyu Huang, Yangjie Zhou, Zihan Liu, Xinhao Luo, Yijia Diao, Minyi Guo, Jidong Zhai, Yu Feng, Chen Zhang, Anbang Wu, Jingwen Leng |
| 2026 | Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models. Chiyue Wei, Cong Guo, Junyao Zhang, Haoxuan Shan, Yifan Xu, Ziyue Zhang, Yudong Liu, Qinsi Wang, Changchun Zhou, Hai Helen Li, Yiran Chen |
| 2026 | FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing. Yuzhe Fu, Changchun Zhou, Hancheng Ye, Bowen Duan, Qiyu Huang, Chiyue Wei, Cong Guo, Hai Helen Li, Yiran Chen |
| 2026 | Fully Parallelized BP Decoding for Quantum LDPC Codes Can Outperform BP-OSD. Ming Wang, Ang Li, Frank Mueller |
| 2026 | GRTX: Efficient Ray Tracing for 3D Gaussian-Based Rendering. Junseo Lee, Sangyun Jeon, Jungi Lee, Junyong Park, Jaewoong Sim |
| 2026 | GenPairX: A Hardware-Algorithm Co-Designed Accelerator for Paired-End Read Mapping. Julien Eudine, Chu Li, Zhuo Cheng, Renzo Andri, Can Firtina, Mohammad Sadrosadati, Nika Mansouri-Ghiasi, Konstantina Koliogeorgi, Anirban Nag, Arash Tavakkol, Haiyu Mao, Onur Mutlu, Shai Bergman, Ji Zhang |
| 2026 | GustavSNN: Unleashing the Power of Gustavson's Algorithm on SNN Acceleration with Column-Parallel Tick-Batch Dataflow. Sangwoo Hwang, Donghun Lee, Jahyun Koo, Jaeha Kung |
| 2026 | GyRot: Leveraging Hidden Synergy Between Rotation and Fine-Grained Group Quantization for Low-Bit LLM Inference. Sangjin Kim, Yuseon Chou, Byeongcheol Kim, Jungjun Oh, Hoi-Jun Yoo |
| 2026 | HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs. Daoxuan Xu, Ying Li, Yuwei Sun, Jie Ren, Yifan Sun |
| 2026 | HERO-Sign: Hierarchical Tuning and Efficient Compiler-Time GPU Optimizations for SPHINCS+ Signature Generation. Yaoyun Zhou, Qian Wang |
| 2026 | HR-DCIM: High-Reliability Floating-Point Digital CIM Architecture With Unified Low-Cost Iterative Error Correction. Zhen He, Yiqi Wang, Zhiheng Yue, Zihan Wu, Huiming Han, Shaojun Wei, Yang Hu, Fengbin Tu, Shouyi Yin |
| 2026 | I-POP: Ignite Positive Prefetchers. Yiquan Lin, Wenhai Lin, Yiquan Chen, Jiexiong Xu, Shishun Cai, Jiarong Ye, Zonghui Wang, Wenzhi Chen |
| 2026 | IEEE International Symposium on High Performance Computer Architecture, HPCA 2026, Sydney, Australia, January 31 - February 4, 2026 |
| 2026 | IVE: An Accelerator for Single-Server Private Information Retrieval Using Versatile Processing Elements. Sangpyo Kim, Hyesung Ji, Jongmin Kim, Wonseok Choi, Jaiyoung Park, Jung Ho Ahn |
| 2026 | Intermittence-Aware Cache Compression. Gan Fang, Jianping Zeng, Yuchen Zhou, Changhee Jung |
| 2026 | LEGO: Supporting LLM-Enhanced Games with One Gaming GPU. Han Zhao, Weihao Cui, Zeshen Zhang, Wenhao Zhang, Jiangtong Li, Quan Chen, Pu Pang, Zijun Li, Zhenhua Han, Yuqing Yang, Minyi Guo |
| 2026 | LRM-GPU: Alleviating Synchronization Overhead for Multi-Chiplet GPU Architecture. Baiqing Zhong, Zhirong Ye, Xiaojie Li, Peilin Wang, Haiqiu Huang, Zhaolin Li, Zhiyi Yu, Mingyu Wang |
| 2026 | Leveraging ASIC AI Chips for Homomorphic Encryption. Jianming Tong, Tianhao Huang, Jingtian Dang, Leo de Castro, Anirudh Itagi, Anupam Golder, Asra Ali, Jeremy Kun, Jevin Jiang, Arvind, G. Edward Suh, Tushar Krishna |
| 2026 | LiLo: Harnessing the On-Chip Accelerators in Intel CPUs for Compressed LLM Inference Acceleration. Hyungyo Kim, Qirong Xia, Jinghan Huang, Nachuan Wang, Younjoo Lee, Jung Ho Ahn, Wajdi K. Feghali, Ren Wang, Nam Sung Kim |
| 2026 | LoCaLUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM. Junguk Hong, Changmin Shin, Sukjin Kim, Si Ung Noh, Taehee Kwon, Seongyeon Park, Hanjun Kim, Youngsok Kim, Jinho Lee |
| 2026 | LowCarb: Carbon-Aware Scheduling of Serverless Functions. Rohan Basu Roy, Devesh Tiwari |
| 2026 | MIRZA: Efficiently Mitigating Rowhammer with Randomization and ALERT. Hritvik Taneja, Ali Hajiabadi, Michele Marazzi, Kaveh Razavi, Moinuddin Qureshi |
| 2026 | MemSOS: OS-Guided Selective Memory Mirroring. Junghoon Kim, Jongheon Jeong, Seokwon Moon, Seong Hoon Seo, Yeonhong Park, Jinkyu Jeong, Nam Sung Kim, Jae W. Lee |
| 2026 | MoEntwine: Unleashing the Potential of Wafer-Scale Chips for Large-Scale Expert Parallel Inference. Xinru Tang, Jingxiang Hou, Dingcheng Jiang, Taiquan Wei, Jiaxin Liu, Jinyi Deng, Huizheng Wang, Qize Yang, Haoran Shang, Chao Li, Yang Hu, Shouyi Yin |
| 2026 | N-DIPPER: A Distributed Inter-Die Peak Power Management Network for NAND Systems. Jinwoo Park, John Kim |
| 2026 | NP-CAM: Efficient and Scalable DNA Classification using a NoC-Partitioned CAM Architecture. Benjamin F. Morris III, Tergel Molom-Ochir, Changchun Zhou, Yiran Chen, Alex K. Jones, Hai Li |
| 2026 | NPUWattch: ML-Based Power, Area, and Timing Modeling for Neural Accelerators. Sehyeon Kim, Minkwan Kim, Chanho Park, Hanmok Park, Seonghoon Kim, Taigon Song, William J. Song |
| 2026 | Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates. Wenjun Yu, Sitian Chen, Cheng Chen, Amelie Chi Zhou |
| 2026 | Nugget: Portable Program Snippets. Zhantong Qiu, Mahyar Samani, Jason Lowe-Power |
| 2026 | ORANGE: Exploring Ockham's Razor for Neural Rendering by Accelerating 3DGS on NPUs with GEMM-Friendly Blending and Balanced Workloads. Haomin Li, Yun Liang, Fangxin Liu, Bowen Zhu, Zongwu Wang, Yu Feng, Liqiang Lu, Li Jiang, Haibing Guan |
| 2026 | PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion. Huizheng Wang, Hongbin Wang, Zichuan Wang, Zhiheng Yue, Yang Wang, Chao Li, Yang Hu, Shouyi Yin |
| 2026 | PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models. Eunyeong Cho, Jehyeon Bang, Ranggi Hwang, Minsoo Rhu |
| 2026 | PIM-Malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures. Dongjae Lee, Bongjoon Hyun, Youngjin Kwon, Minsoo Rhu |
| 2026 | PIMphony: Overcoming Bandwidth and Capacity Inefficiency in PIM-Based Long-Context LLM Inference System. Hyucksung Kwon, Kyungmo Koo, Janghyeon Kim, Woongkyu Lee, Minjae Lee, Gyeonggeun Jung, Hyungdeok Lee, Yousub Jung, Jaehan Park, Yosub Song, Byeongsu Yang, Haerang Choi, Guhyun Kim, Jongsoon Won, Woojae Shin, Changhyun Kim, Gyeongcheol Shin, Yongkee Kwon, Ilkon Kim, Euicheol Lim, John Kim, Jungwook Choi |
| 2026 | Peregrine: Accelerating TFHE Bootstrapping on GPUs via Multi-Level External Product Co-Design. Haoqi He, Zhiwei Wang, Lutan Zhao, Dian Jiao, Dan Meng, Rui Hou |
| 2026 | PhasedStore: Supporting High-Performance Write-Through Cache-Coherence Protocols Under TSO. Burak Ocalan, Chloe Alverti, Shashwat Jaiswal, Antonis Psistakis, David A. Koufaty, Suyash Mahar, Steven Swanson, Josep Torrellas |
| 2026 | PinDrop: Breaking the Silence on SDCs in a Large-Scale Fleet. Peter W. Deutsch, Harish Dattatraya Dixit, Gautham Vunnam, Carl Moran, Eleanor Ozer, Sriram Sankar |
| 2026 | Pinball: A Cryogenic Predecoder for Quantum Error Correction Decoding Under Circuit-Level Noise. Alexander Knapen, Guanchen Tao, Jacob Mack, Tomas Bruno, Mehdi Saligane, Dennis Sylvester, Qirui Zhang, Gokul Subramanian Ravi |
| 2026 | Predicting DRAM Failures at Scale: A Two-Stage Approach for Heterogeneous Systems. Chenglin Wang, Shouxin Wang, Zhirong Shen, Lu Tang, Shuyue Zhou, Ronglong Wu, Min Zhou, Jialiang Yu, Yiming Zhang |
| 2026 | Protean: A Programmable Spectre Defense. Nicholas Mosier, Hamed Nemati, John C. Mitchell, Caroline Trippel |
| 2026 | Pulse: Fine-Grained Hierarchical Hashing Index for Disaggregated Memory. Guangyang Deng, Zixiang Yu, Zhirong Shen, Qiangsheng Su, Zhinan Cheng, Jiwu Shu |
| 2026 | QuCo: Efficient and Flexible Hardware-Driven Automatic Configuration of Tile Transfers in GPUs. Nicolás Meseguer, Daoxuan Xu, Yifan Sun, Michael Pellauer, José L. Abellán, Manuel E. Acacio |
| 2026 | REASON: Accelerating Probabilistic Logical Reasoning for Scalable Neuro-Symbolic Intelligence. Zishen Wan, Che-Kai Liu, Jiayi Qian, Hanchen Yang, Arijit Raychowdhury, Tushar Krishna |
| 2026 | RPU - A Reasoning Processing Unit. Matthew Joseph Adiletta, Gu-Yeon Wei, David Brooks |
| 2026 | ReScue: Reliable and Secure CXL Memory. Chihun Song, Austin Antony Cruz, Michael Jaemin Kim, Minbok Wi, Gaohan Ye, Kyungsan Kim, Sangyeol Lee, Jung Ho Ahn, Nam Sung Kim |
| 2026 | ReThermal: Co-Design of Thermal-Aware Static and Dynamic Scheduling for LLM Training on Liquid-Cooled Wafer-Scale Chips. Chengran Li, Huizheng Wang, Jiaxin Liu, Jingyao Liu, Zhiheng Yue, Xia Li, Shenfei Jiang, Jinyi Deng, Yang Hu, Shouyi Yin |
| 2026 | RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs. Hongshi Tan, Yao Chen, Xinyu Chen, Qizhen Zhang, Cheng Chen, Weng-Fai Wong, Bingsheng He |
| 2026 | RoMe: Row Granularity Access Memory System for Large Language Models. Hwayong Nam, Seungmin Baek, Jumin Kim, Michael Jaemin Kim, Jung Ho Ahn |
| 2026 | SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis. Nika Mansouri-Ghiasi, Talu Güloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu |
| 2026 | SALT: Track-and-Mitigate Subarrays, Not Rows, for Blast-Radius-Free Rowhammer Defense. Moinuddin K. Qureshi |
| 2026 | SCALE: Tackling Communication Bottlenecks in Confidential Distributed Machine Learning. Joongun Park, Yongqin Wang, Huan Xu, Hanjiang Wu, Mengyuan Li, Tushar Krishna |
| 2026 | SFD: Towards Segment Fusion Dataflow for Spatial Accelerators. Fuyu Wang, Minghua Shen, Yufei Ding, Nong Xiao, Yutong Lu |
| 2026 | SMTcheck: Accurate SMT Interference Prediction to Improve Scheduling Efficiency in Datacenters. Sanghyun Kim, Jinhyeok Oh, Taehun Kim, Gyutae Kim, Youngsok Kim, Jaehyun Hwang, Joonsung Kim |
| 2026 | SPLATONIC: Architectural Support for 3D Gaussian Splatting SLAM via Sparse Processing. Xiaotong Huang, He Zhu, Tianrui Ma, Yuxiang Xiong, Fangxin Liu, Zhezhi He, Yiming Gan, Zihan Liu, Jingwen Leng, Yu Feng, Minyi Guo |
| 2026 | SSBleed: Non-Speculative Side-Channel Attacks via Speculative Store Bypass on Armv9 CPUs. Chang Liu, Hongpei Zheng, Xin Zhang, Dapeng Ju, Dongsheng Wang, Yinqian Zhang, Trevor E. Carlson |
| 2026 | Sassy: SmartNIC-Assisted Notification Delivery for μs-Scale RDMA Workloads. Hamed Seyedroudbari, Alexandros Daglis |
| 2026 | Scaling Graph Neural Network Training via Geometric Optimization. Fangzhou Ye, Lingxiang Yin, Hao Zheng |
| 2026 | Secret Caching Sauce for High-Performance Secure Memory. Xu Jiang, Xueliang Wei, Yifei Qu, Dan Feng, Yulai Xie, Wei Tong |
| 2026 | SpotCC: Facilitating Coded Computation for Prediction Serving Systems on Spot Instances. Lin Wang, Yuchong Hu, Ziling Duan, Mingqi Li, Chenxuan Yao, Feifan Liu, Xiaolu Li, Leihua Qin, Dan Feng |
| 2026 | Streamlined On-Chip Temporal Prefetching. Quang Duong, Calvin Lin |
| 2026 | Swift: High-Performance Sparse-Dense Matrix Multiplication on GPUs. Jinyu Hu, Huizhang Luo, Hong Jiang, Marc Casas, Kenli Li, Chubo Liu |
| 2026 | TEMP: A Memory Efficient Physical-Aware Tensor Partition-Mapping Framework on Wafer-Scale Chips. Huizheng Wang, Taiquan Wei, Zichuan Wang, Dingcheng Jiang, Qize Yang, Jiaxin Liu, Jingxiang Hou, Chao Li, Jinyi Deng, Yang Hu, Shouyi Yin |
| 2026 | TENET-v2: Applying Relation-Centric Notation to Model and Optimize Data Swizzle in the Cache of Modern NPU. Hanyu Zhang, Fangxu Guo, Liqiang Lu, Long Wang, Yunfei Du, Zhe Wang, Jinghan Zhang, Jie Zhang, Chenli Xue, Chengpeng Wu, Ziyi Zhang, Yun Liang, Size Zheng, Jianwei Yin |
| 2026 | Tempranillo: Non-Speculative Early Register Release. Carlos Escuin, Paolo Salvatore Galfano, Davide Basilio Bartolini, Leeor Peled, Mehdi Alipour |
| 2026 | The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective. Jiin Kim, Byeongjun Shin, Jinha Chung, Minsoo Rhu |
| 2026 | The Last-Level Branch Predictor Revisited. David Schall, Mária Duracková, Boris Grot |
| 2026 | The Memory Processing Unit: A Generalized Interface for End-to-End In-Memory Execution. Minh S. Q. Truong, Yiqiu Sun, Dawei Xiong, Amol Shah, Alexander Glass, Abraham Farrell, James A. Bain, L. Richard Carley, Saugata Ghose |
| 2026 | Toward Scalable Gate-Level Parallelism on Trapped-Ion Processors with Racetrack Electrodes. Enhyeok Jang, Hyungseok Kim, Yongju Lee, Jaewon Kwon, Yipeng Huang, Won Woo Ro |
| 2026 | Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems. Chen Zhang, Qijun Zhang, Zhuoshan Zhou, Yijia Diao, Haibo Wang, Zhe Zhou, Zhipeng Tu, Zhiyao Li, Guangyu Sun, Zhuoran Song, Zhigang Ji, Jingwen Leng, Minyi Guo |
| 2026 | Towards Resource-Efficient Serverless LLM Inference with SLINFER. Chuhao Xu, Zijun Li, Quan Chen, Han Zhao, Xueyan Tang, Minyi Guo |
| 2026 | TraceQ: Trace-Based Reconstruction of Quantum Circuit Dataflow in Surface-Code Fault-Tolerant Quantum Computing. Theodoros Trochatos, Christopher Kang, Andrew Wang, Frederic T. Chong, Jakub Szefer |
| 2026 | TraceRTL: Agile Performance Evaluation for Microarchitecture Exploration. Zifei Zhang, Yinan Xu, Sa Wang, Dan Tang, Yungang Bao |
| 2026 | TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification. Yang Zhong, Haoran Wu, Xueqi Li, Sa Wang, David Boland, Yungang Bao, Kan Shi |
| 2026 | Uni-STC: Unified Sparse Tensor Core. Haocheng Lian, Qiyue Zhang, Xinran Zhao, Meichen Dong, Yijie Nie, Zhengyi Zhao, Junzhong Shen, Wei Guo, Chun Huang, Bingcai Sui, Weifeng Liu |
| 2026 | UniFHE: Faster Accelerator for FHE with Diverse Algebraic Structure and Balanced Memory System. Qingyun Niu, Lutan Zhao, Ming Cai, Kai Li, Dan Meng, Rui Hou |
| 2026 | V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval. Donghyuk Kim, Sejeong Yang, Wonjin Shin, Joo-Young Kim |
| 2026 | VAR-Turbo: Unlocking the Potential of Visual Autoregressive Models Through Dual Redundancy. Xujiang Xiang, Fengbin Tu |
| 2026 | VectorLiteRAG: Latency-Aware and Fine-Grained Resource Partitioning for Efficient RAG. Junkyum Kim, Divya Mahajan |
| 2026 | VeloxGNN: Efficient Out-of-Core GNN Training with Delayed Gradient Propagation. Yi Li, Tsun-Yu Yang, Zhaoyan Shen, Ming-Chang Yang, Bingzhe Li |
| 2026 | WATOS: Efficient LLM Training Strategies and Architecture Co-Exploration for Wafer-Scale Chip. Huizheng Wang, Zichuan Wang, Hongbin Wang, Jingxiang Hou, Taiquan Wei, Chao Li, Yang Hu, Shouyi Yin |
| 2026 | eGPU: Production-Scale Elastic Sharing Over 10,000 GPUs. Xiaochuan Tang, Hao Qi, Jianbo Dong, Yinghao Yu, Zhennan Xue, Zhengyu Zhang, Daocheng Ying, Zheng Cao, Xiaoyi Lu |
| 2026 | zkPHIRE: A Programmable Accelerator for ZKPs over HIgh-degRee, Expressive Gates. Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Siddharth Garg, Brandon Reagen |
| 2026 | µShare: Non-Intrusive Kernel Co-Locating on NVIDIA GPUs. Wenhao Huang, Zhaolin Duan, Laiping Zhao, Yuhao Zhang, Yanjie Wang, Yiming Li, Yihan Wang, Yichi Chen, Zhihang Tang, Kang Chen, Deze Zeng, Wenxin Li, Keqiu Li |