HPCA A*

123 papers

Year Title / Authors
2026 A Deadlock-Free Bridge Module for Inter-Chiplet Cache-Coherent Communication in an Open Chiplet Ecosystem.
Zhiqiang Chen, Wenwen Fu, Yongwen Wang, Hongwei Zhou
2026 A PN-Free Digital 3-SAT Accelerator Using Crossbar Architecture and Frequency-Controlled Counters.
Zhezheng Ren, Chenao Yuan, Yuke Zhang, Shiyu Su
2026 AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization.
Kosuke Matsushima, Yasuyuki Okoshi, Masato Motomura, Daichi Fujiki
2026 ARIADNE: Adaptive UVM Management for Efficient GPU Memory Oversubscription.
Hyunkyun Shin, Seongtae Bang, Hyungwon Park, Daehoon Kim
2026 ASPA: Reassigning DDR5 Parity Bandwidth.
Fan Li, Qiufeng Li, Yanan Guo, Weidong Cao, Xin Xin
2026 AUM: Unleashing the Efficiency Potential of Shared Processors with Accelerator Units for LLM Serving.
Xinkai Wang, Chao Li, Yiming Zhuansun, Jinyang Guo, Xiaofeng Hou, Jing Wang, Luping Wang, Weigao Chen, Cheng Huang, Guodong Yang, Liping Zhang, Minyi Guo
2026 AccelFlow: Orchestrating an On-Package Ensemble of Fine-Grained Accelerators for Microservices.
Jovan Stojkovic, Abraham Farrell, Zhangxiaowen Gong, Christopher J. Hughes, Josep Torrellas
2026 Adaptive Draft Sequence Length: Enhancing Speculative Decoding Throughput on PIM-Enabled Systems.
Runze Wang, Qinggang Wang, Haifeng Liu, Long Zheng, Xiaofei Liao, Hai Jin, Jingling Xue
2026 Advancing Full-Stack Acceleration for Schrödinger-Style Quantum Simulation.
Shuang Liang, Yuncheng Lu, Ce Guo, Paul H. J. Kelly, Wayne Luk, Hongxiang Fan
2026 An Efficient and Scalable Hardware Architecture for Number Theoretic Transform on FPGA with Design Automation.
Yilan Zhu, Geng Yang, Xingyu Tian, Dilshan Kumarathunga, Liang Kong, Xianglong Deng, Shengyu Fan, Guang Fan, Guiming Shi, Lei Chen, Bo Zhang, Yisong Chang, Shoumeng Yan, Zhenman Fang, Mingzhe Zhang
2026 Area Bloating and the Future of Specialization.
Qixuan Yu, David Wentzlaff
2026 Athena: Synergizing Data Prefetching and Off-Chip Prediction via Online Reinforcement Learning.
Rahul Bera, Zhenrong Lang, Caroline Hengartner, Konstantinos Kanellopoulos, Rakesh Kumar, Mohammad Sadrosadati, Onur Mutlu
2026 AutoGNN: End-to-End Hardware-Driven Graph Preprocessing for Enhanced GNN Performance.
Seungkwan Kang, Seungjun Lee, Donghyun Gouk, Miryeong Kwon, Hyunkyu Choi, Junhyeok Jang, Sangwon Lee, Huiwon Choi, Jie Zhang, Wonil Choi, Mahmut Taylan Kandemir, Myoungsoo Jung
2026 AutoHAAP: Automated Heterogeneity-Aware Asymmetric Partitioning for LLM Training.
Yuanyuan Wang, Nana Tang, Yuyang Wang, Shu Pan, Dingding Yu, Zeyue Wang, Mou Sun, Kejie Fu, Fangyu Wang, Yunchuan Chen, Ning Sun, Fei Yang
2026 BARD: Reducing Write Latency of DDR5 Memory by Exploiting Bank-Parallelism.
Suhas K. Vittal, Moinuddin Qureshi
2026 BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache.
Dayou Du, Shijie Cao, Jianyi Cheng, Luo Mai, Ting Cao, Mao Yang
2026 CLINE: Improving Control Flow Compilation of Quantum Programs with Control Line Encoding.
Anbang Wu, Liqiang Lu, Jianwei Yin, Jingwen Leng, Minyi Guo
2026 COMET: Communication and Memory Co-Design for Fine-Grained AI Inference in MCM Accelerators.
Taishu Sheng, Guangyu Sun, Dezun Dong
2026 CROPHE: Cross-Operator Dataflow Optimization for Fully Homomorphic Encryption Accelerators.
Xinhua Chen, Jiangbin Dong, Hongren Zheng, Tian Tang, Mingyu Gao
2026 Cambricon-CIM: Enabling Energy-Efficient and Error-Resilient Analog CIM Acceleration via Reformation of Coding Bases.
Hongrui Guo, Tianrui Ma, Zidong Du, Mo Zou, Yifan Hao, Yongwei Zhao, Rui Zhang, Wei Li, Xing Hu, Zhiwei Xu, Qi Guo, Tianshi Chen
2026 Cambricon-GS: An Accelerator for 3D Gaussian Splatting Training With Gaussian-Pixel Hybrid Parallelism.
Rui Wen, Zhifei Yue, Tianbo Liu, Xinkai Song, Jin Li, Di Huang, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, Tianshi Chen
2026 Characterizing Cloud-Native LLM Inference at ByteDance and Exposing Optimization Challenges and Opportunities for Future AI Accelerators.
Jingwei Cai, Dehao Kong, Hantao Huang, Zishan Jiang, Zixuan Ma, Qingyu Guo, Zhenxing Zhang, Guiming Shi, Mingyu Gao, Kaisheng Ma, Minghui Yu
2026 CoCoTree: A Computation-Capable Architecture for Collective Communication in Scalable PIM.
Shunchen Shi, Qijia Yang, Fan Yang, Yu Huang, Youwei Zhuo, Zhichun Li, Ninghui Sun, Xueqi Li
2026 Cohet: A CXL-Driven Coherent Heterogeneous Computing Framework with Hardware-Calibrated Full-System Simulation.
Yanjing Wang, Lizhou Wu, Sunfeng Gao, Yibo Tang, Junhui Luo, Zicong Wang, Yang Ou, Dezun Dong, Nong Xiao, Mingche Lai
2026 Compression-Aware Gradient Splitting for Collective Communications in Distributed Training.
Pranati Majhi, Sabuj Laskar, Abdullah Muzahid, Eun Jung Kim
2026 Conduit: Programmer-Transparent Near-Data Processing Using Multiple Compute-Capable Resources in Solid State Drives.
Rakesh Nadig, Vamanan Arulchelvan, Mayank Kabra, Harshita Gupta, Rahul Bera, Nika Mansouri-Ghiasi, Nanditha Rao, Qingcai Jiang, Andreas Kosmas Kakolyris, Yu Liang, Mohammad Sadrosadati, Onur Mutlu
2026 Conflux: A High-Performance Keyword Private Retrieval System for Dynamic Datasets.
Zehao Chen, Zhaoyan Shen, Qian Wei, Hang Lu, Lei Ju
2026 Count2Multiply: Reliable In-Memory High-Radix Counting.
João Paulo C. de Lima, Benjamin F. Morris III, Asif Ali Khan, Jerónimo Castrillón, Alex K. Jones
2026 Cyclone: Designing Efficient and Highly Parallel QCCD Architectural Codesigns for Fault Tolerant Quantum Memory.
Sahil Khan, Abhinav Anand, Kenneth R. Brown, Jonathan M. Baker
2026 C³: CXL Coherence Controllers for Heterogeneous Architectures.
Anatole Lefort, David Schall, Nicolò Carpentieri, Julian Pritzi, Soham Chakraborty, Nicolai Oswald, Pramod Bhatotia
2026 D'ArQ: A QOC Framework with Causality-Aware Grouping and Basis Selection.
Changheon Lee, Hyungseok Kim, Seungwoo Choi, Youngmin Kim, Won Woo Ro
2026 DC-MBQC: A Distributed Compilation Framework for Measurement-Based Quantum Computing.
Yecheng Xue, Rui Yang, Zhiding Liang, Tongyang Li
2026 DP-HLS: A High-Level Synthesis Framework for Accelerating Dynamic Programming Algorithms in Bioinformatics.
Anshu Gupta, Yingqi Cao, Jason Liang, Yatish Turakhia
2026 DRACO: A Hardware-Efficient Robot Rigid Body Dynamics Accelerator with Precision-Aware Quantization Framework.
Xingyu Liu, Jiawei Liang, Yipu Zhang, Linfeng Du, Chaofang Ma, Hui Yu, Jiang Xu, Wei Zhang
2026 DSAssassin: Cross-VM Side-Channel Attacks by Exploiting Intel Data Streaming Accelerator.
Ben Chen, Kunlin Li, Shuwen Deng, Dongsheng Wang, Yun Chen
2026 ELORA: Efficient LoRA and KV Cache Management for Multi-LoRA LLM Serving.
Jiuchen Shi, Hang Zhang, Yixiao Wang, Quan Chen, Yizhou Shan, Kaihua Fu, Wei Wang, Minyi Guo
2026 ESTroM: Element-Flow Architecture for Processing Sparse Tractable Probabilistic Models.
Anjunyi Fan, Xuejie Liu, Anji Liu, Qiuping Wu, Jiaqi Yang, Yuchao Qin, Guy Van den Broeck, Yitao Liang, Bonan Yan
2026 Enterprise Class On-Chip Accelerator Integration.
Deanna Postles Dunn Berger, Alper Buyuktosunoglu, Craig R. Walters, Robert J. Sonnelitter, Hailey Nicholson, Ashraf ElSharif, Yamil Rivera, Avery Francois, Cédric Lichtenau, Jason Kohl
2026 Exploration of LLM Workload Reliability Based on di/dt Effects and Voltage Droops.
Zhixing Jiang, Justin Garrigus, Allison Seigler, Ethan Syed, Yan-Lun Huang, Mehdi Sadi, Tawfik Rahal-Arabi, Lizy Kurian John
2026 FACE: Fully Overlapped PD Scheduling and Multi-Level Architecture Co-Exploration on Wafer.
Zheng Xu, Dehao Kong, Jiaxin Liu, Dingcheng Jiang, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin
2026 FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection.
Ziyu Huang, Yangjie Zhou, Zihan Liu, Xinhao Luo, Yijia Diao, Minyi Guo, Jidong Zhai, Yu Feng, Chen Zhang, Anbang Wu, Jingwen Leng
2026 Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models.
Chiyue Wei, Cong Guo, Junyao Zhang, Haoxuan Shan, Yifan Xu, Ziyue Zhang, Yudong Liu, Qinsi Wang, Changchun Zhou, Hai Helen Li, Yiran Chen
2026 FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing.
Yuzhe Fu, Changchun Zhou, Hancheng Ye, Bowen Duan, Qiyu Huang, Chiyue Wei, Cong Guo, Hai Helen Li, Yiran Chen
2026 Fully Parallelized BP Decoding for Quantum LDPC Codes Can Outperform BP-OSD.
Ming Wang, Ang Li, Frank Mueller
2026 GRTX: Efficient Ray Tracing for 3D Gaussian-Based Rendering.
Junseo Lee, Sangyun Jeon, Jungi Lee, Junyong Park, Jaewoong Sim
2026 GenPairX: A Hardware-Algorithm Co-Designed Accelerator for Paired-End Read Mapping.
Julien Eudine, Chu Li, Zhuo Cheng, Renzo Andri, Can Firtina, Mohammad Sadrosadati, Nika Mansouri-Ghiasi, Konstantina Koliogeorgi, Anirban Nag, Arash Tavakkol, Haiyu Mao, Onur Mutlu, Shai Bergman, Ji Zhang
2026 GustavSNN: Unleashing the Power of Gustavson's Algorithm on SNN Acceleration with Column-Parallel Tick-Batch Dataflow.
Sangwoo Hwang, Donghun Lee, Jahyun Koo, Jaeha Kung
2026 GyRot: Leveraging Hidden Synergy Between Rotation and Fine-Grained Group Quantization for Low-Bit LLM Inference.
Sangjin Kim, Yuseon Chou, Byeongcheol Kim, Jungjun Oh, Hoi-Jun Yoo
2026 HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs.
Daoxuan Xu, Ying Li, Yuwei Sun, Jie Ren, Yifan Sun
2026 HERO-Sign: Hierarchical Tuning and Efficient Compiler-Time GPU Optimizations for SPHINCS+ Signature Generation.
Yaoyun Zhou, Qian Wang
2026 HR-DCIM: High-Reliability Floating-Point Digital CIM Architecture With Unified Low-Cost Iterative Error Correction.
Zhen He, Yiqi Wang, Zhiheng Yue, Zihan Wu, Huiming Han, Shaojun Wei, Yang Hu, Fengbin Tu, Shouyi Yin
2026 I-POP: Ignite Positive Prefetchers.
Yiquan Lin, Wenhai Lin, Yiquan Chen, Jiexiong Xu, Shishun Cai, Jiarong Ye, Zonghui Wang, Wenzhi Chen
2026 IEEE International Symposium on High Performance Computer Architecture, HPCA 2026, Sydney, Australia, January 31 - Feb. 4, 2026
2026 IVE: An Accelerator for Single-Server Private Information Retrieval Using Versatile Processing Elements.
Sangpyo Kim, Hyesung Ji, Jongmin Kim, Wonseok Choi, Jaiyoung Park, Jung Ho Ahn
2026 Intermittence-Aware Cache Compression.
Gan Fang, Jianping Zeng, Yuchen Zhou, Changhee Jung
2026 LEGO: Supporting LLM-Enhanced Games with One Gaming GPU.
Han Zhao, Weihao Cui, Zeshen Zhang, Wenhao Zhang, Jiangtong Li, Quan Chen, Pu Pang, Zijun Li, Zhenhua Han, Yuqing Yang, Minyi Guo
2026 LRM-GPU: Alleviating Synchronization Overhead for Multi-Chiplet GPU Architecture.
Baiqing Zhong, Zhirong Ye, Xiaojie Li, Peilin Wang, Haiqiu Huang, Zhaolin Li, Zhiyi Yu, Mingyu Wang
2026 Leveraging ASIC AI Chips for Homomorphic Encryption.
Jianming Tong, Tianhao Huang, Jingtian Dang, Leo de Castro, Anirudh Itagi, Anupam Golder, Asra Ali, Jeremy Kun, Jevin Jiang, Arvind, G. Edward Suh, Tushar Krishna
2026 LiLo: Harnessing the On-Chip Accelerators in Intel CPUs for Compressed LLM Inference Acceleration.
Hyungyo Kim, Qirong Xia, Jinghan Huang, Nachuan Wang, Younjoo Lee, Jung Ho Ahn, Wajdi K. Feghali, Ren Wang, Nam Sung Kim
2026 LoCaLUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM.
Junguk Hong, Changmin Shin, Sukjin Kim, Si Ung Noh, Taehee Kwon, Seongyeon Park, Hanjun Kim, Youngsok Kim, Jinho Lee
2026 LowCarb: Carbon-Aware Scheduling of Serverless Functions.
Rohan Basu Roy, Devesh Tiwari
2026 MIRZA: Efficiently Mitigating Rowhammer with Randomization and ALERT.
Hritvik Taneja, Ali Hajiabadi, Michele Marazzi, Kaveh Razavi, Moinuddin Qureshi
2026 MemSOS: OS-Guided Selective Memory Mirroring.
Junghoon Kim, Jongheon Jeong, Seokwon Moon, Seong Hoon Seo, Yeonhong Park, Jinkyu Jeong, Nam Sung Kim, Jae W. Lee
2026 MoEntwine: Unleashing the Potential of Wafer-Scale Chips for Large-Scale Expert Parallel Inference.
Xinru Tang, Jingxiang Hou, Dingcheng Jiang, Taiquan Wei, Jiaxin Liu, Jinyi Deng, Huizheng Wang, Qize Yang, Haoran Shang, Chao Li, Yang Hu, Shouyi Yin
2026 N-DIPPER: A Distributed Inter-Die Peak Power Management Network for NAND Systems.
Jinwoo Park, John Kim
2026 NP-CAM: Efficient and Scalable DNA Classification using a NoC-Partitioned CAM Architecture.
Benjamin F. Morris III, Tergel Molom-Ochir, Changchun Zhou, Yiran Chen, Alex K. Jones, Hai Li
2026 NPUWattch: ML-Based Power, Area, and Timing Modeling for Neural Accelerators.
Sehyeon Kim, Minkwan Kim, Chanho Park, Hanmok Park, Seonghoon Kim, Taigon Song, William J. Song
2026 Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates.
Wenjun Yu, Sitian Chen, Cheng Chen, Amelie Chi Zhou
2026 Nugget: Portable Program Snippets.
Zhantong Qiu, Mahyar Samani, Jason Lowe-Power
2026 ORANGE: Exploring Ockham's Razor for Neural Rendering by Accelerating 3DGS on NPUs with GEMM-Friendly Blending and Balanced Workloads.
Haomin Li, Yun Liang, Fangxin Liu, Bowen Zhu, Zongwu Wang, Yu Feng, Liqiang Lu, Li Jiang, Haibing Guan
2026 PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion.
Huizheng Wang, Hongbin Wang, Zichuan Wang, Zhiheng Yue, Yang Wang, Chao Li, Yang Hu, Shouyi Yin
2026 PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models.
Eunyeong Cho, Jehyeon Bang, Ranggi Hwang, Minsoo Rhu
2026 PIM-Malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures.
Dongjae Lee, Bongjoon Hyun, Youngjin Kwon, Minsoo Rhu
2026 PIMphony: Overcoming Bandwidth and Capacity Inefficiency in PIM-Based Long-Context LLM Inference System.
Hyucksung Kwon, Kyungmo Koo, Janghyeon Kim, Woongkyu Lee, Minjae Lee, Gyeonggeun Jung, Hyungdeok Lee, Yousub Jung, Jaehan Park, Yosub Song, Byeongsu Yang, Haerang Choi, Guhyun Kim, Jongsoon Won, Woojae Shin, Changhyun Kim, Gyeongcheol Shin, Yongkee Kwon, Ilkon Kim, Euicheol Lim, John Kim, Jungwook Choi
2026 Peregrine: Accelerating TFHE Bootstrapping on GPUs via Multi-Level External Product Co-Design.
Haoqi He, Zhiwei Wang, Lutan Zhao, Dian Jiao, Dan Meng, Rui Hou
2026 PhasedStore: Supporting High-Performance Write-Through Cache-Coherence Protocols Under TSO.
Burak Ocalan, Chloe Alverti, Shashwat Jaiswal, Antonis Psistakis, David A. Koufaty, Suyash Mahar, Steven Swanson, Josep Torrellas
2026 PinDrop: Breaking the Silence on SDCs in a Large-Scale Fleet.
Peter W. Deutsch, Harish Dattatraya Dixit, Gautham Vunnam, Carl Moran, Eleanor Ozer, Sriram Sankar
2026 Pinball: A Cryogenic Predecoder for Quantum Error Correction Decoding Under Circuit-Level Noise.
Alexander Knapen, Guanchen Tao, Jacob Mack, Tomas Bruno, Mehdi Saligane, Dennis Sylvester, Qirui Zhang, Gokul Subramanian Ravi
2026 Predicting DRAM Failures at Scale: A Two-Stage Approach for Heterogeneous Systems.
Chenglin Wang, Shouxin Wang, Zhirong Shen, Lu Tang, Shuyue Zhou, Ronglong Wu, Min Zhou, Jialiang Yu, Yiming Zhang
2026 Protean: A Programmable Spectre Defense.
Nicholas Mosier, Hamed Nemati, John C. Mitchell, Caroline Trippel
2026 Pulse: Fine-Grained Hierarchical Hashing Index for Disaggregated Memory.
Guangyang Deng, Zixiang Yu, Zhirong Shen, Qiangsheng Su, Zhinan Cheng, Jiwu Shu
2026 QuCo: Efficient and Flexible Hardware-Driven Automatic Configuration of Tile Transfers in GPUs.
Nicolás Meseguer, Daoxuan Xu, Yifan Sun, Michael Pellauer, José L. Abellán, Manuel E. Acacio
2026 REASON: Accelerating Probabilistic Logical Reasoning for Scalable Neuro-Symbolic Intelligence.
Zishen Wan, Che-Kai Liu, Jiayi Qian, Hanchen Yang, Arijit Raychowdhury, Tushar Krishna
2026 RPU - A Reasoning Processing Unit.
Matthew Joseph Adiletta, Gu-Yeon Wei, David Brooks
2026 ReScue: Reliable and Secure CXL Memory.
Chihun Song, Austin Antony Cruz, Michael Jaemin Kim, Minbok Wi, Gaohan Ye, Kyungsan Kim, Sangyeol Lee, Jung Ho Ahn, Nam Sung Kim
2026 ReThermal: Co-Design of Thermal-Aware Static and Dynamic Scheduling for LLM Training on Liquid-Cooled Wafer-Scale Chips.
Chengran Li, Huizheng Wang, Jiaxin Liu, Jingyao Liu, Zhiheng Yue, Xia Li, Shenfei Jiang, Jinyi Deng, Yang Hu, Shouyi Yin
2026 RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs.
Hongshi Tan, Yao Chen, Xinyu Chen, Qizhen Zhang, Cheng Chen, Weng-Fai Wong, Bingsheng He
2026 RoMe: Row Granularity Access Memory System for Large Language Models.
Hwayong Nam, Seungmin Baek, Jumin Kim, Michael Jaemin Kim, Jung Ho Ahn
2026 SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis.
Nika Mansouri-Ghiasi, Talu Güloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu
2026 SALT: Track-and-Mitigate Subarrays, Not Rows, for Blast-Radius-Free Rowhammer Defense.
Moinuddin K. Qureshi
2026 SCALE: Tackling Communication Bottlenecks in Confidential Distributed Machine Learning.
Joongun Park, Yongqin Wang, Huan Xu, Hanjiang Wu, Mengyuan Li, Tushar Krishna
2026 SFD: Towards Segment Fusion Dataflow for Spatial Accelerators.
Fuyu Wang, Minghua Shen, Yufei Ding, Nong Xiao, Yutong Lu
2026 SMTcheck: Accurate SMT Interference Prediction to Improve Scheduling Efficiency in Datacenters.
Sanghyun Kim, Jinhyeok Oh, Taehun Kim, Gyutae Kim, Youngsok Kim, Jaehyun Hwang, Joonsung Kim
2026 SPLATONIC: Architectural Support for 3D Gaussian Splatting SLAM via Sparse Processing.
Xiaotong Huang, He Zhu, Tianrui Ma, Yuxiang Xiong, Fangxin Liu, Zhezhi He, Yiming Gan, Zihan Liu, Jingwen Leng, Yu Feng, Minyi Guo
2026 SSBleed: Non-Speculative Side-Channel Attacks via Speculative Store Bypass on Armv9 CPUs.
Chang Liu, Hongpei Zheng, Xin Zhang, Dapeng Ju, Dongsheng Wang, Yinqian Zhang, Trevor E. Carlson
2026 Sassy: SmartNIC-Assisted Notification Delivery for μs-Scale RDMA Workloads.
Hamed Seyedroudbari, Alexandros Daglis
2026 Scaling Graph Neural Network Training via Geometric Optimization.
Fangzhou Ye, Lingxiang Yin, Hao Zheng
2026 Secret Caching Sauce for High-Performance Secure Memory.
Xu Jiang, Xueliang Wei, Yifei Qu, Dan Feng, Yulai Xie, Wei Tong
2026 SpotCC: Facilitating Coded Computation for Prediction Serving Systems on Spot Instances.
Lin Wang, Yuchong Hu, Ziling Duan, Mingqi Li, Chenxuan Yao, Feifan Liu, Xiaolu Li, Leihua Qin, Dan Feng
2026 Streamlined On-Chip Temporal Prefetching.
Quang Duong, Calvin Lin
2026 Swift: High-Performance Sparse-Dense Matrix Multiplication on GPUs.
Jinyu Hu, Huizhang Luo, Hong Jiang, Marc Casas, Kenli Li, Chubo Liu
2026 TEMP: A Memory Efficient Physical-Aware Tensor Partition-Mapping Framework on Wafer-Scale Chips.
Huizheng Wang, Taiquan Wei, Zichuan Wang, Dingcheng Jiang, Qize Yang, Jiaxin Liu, Jingxiang Hou, Chao Li, Jinyi Deng, Yang Hu, Shouyi Yin
2026 TENET-v2: Applying Relation-Centric Notation to Model and Optimize Data Swizzle in the Cache of Modern NPU.
Hanyu Zhang, Fangxu Guo, Liqiang Lu, Long Wang, Yunfei Du, Zhe Wang, Jinghan Zhang, Jie Zhang, Chenli Xue, Chengpeng Wu, Ziyi Zhang, Yun Liang, Size Zheng, Jianwei Yin
2026 Tempranillo: Non-Speculative Early Register Release.
Carlos Escuin, Paolo Salvatore Galfano, Davide Basilio Bartolini, Leeor Peled, Mehdi Alipour
2026 The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective.
Jiin Kim, Byeongjun Shin, Jinha Chung, Minsoo Rhu
2026 The Last-Level Branch Predictor Revisited.
David Schall, Mária Duracková, Boris Grot
2026 The Memory Processing Unit: A Generalized Interface for End-to-End In-Memory Execution.
Minh S. Q. Truong, Yiqiu Sun, Dawei Xiong, Amol Shah, Alexander Glass, Abraham Farrell, James A. Bain, L. Richard Carley, Saugata Ghose
2026 Toward Scalable Gate-Level Parallelism on Trapped-Ion Processors with Racetrack Electrodes.
Enhyeok Jang, Hyungseok Kim, Yongju Lee, Jaewon Kwon, Yipeng Huang, Won Woo Ro
2026 Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems.
Chen Zhang, Qijun Zhang, Zhuoshan Zhou, Yijia Diao, Haibo Wang, Zhe Zhou, Zhipeng Tu, Zhiyao Li, Guangyu Sun, Zhuoran Song, Zhigang Ji, Jingwen Leng, Minyi Guo
2026 Towards Resource-Efficient Serverless LLM Inference with SLINFER.
Chuhao Xu, Zijun Li, Quan Chen, Han Zhao, Xueyan Tang, Minyi Guo
2026 TraceQ: Trace-Based Reconstruction of Quantum Circuit Dataflow in Surface-Code Fault-Tolerant Quantum Computing.
Theodoros Trochatos, Christopher Kang, Andrew Wang, Frederic T. Chong, Jakub Szefer
2026 TraceRTL: Agile Performance Evaluation for Microarchitecture Exploration.
Zifei Zhang, Yinan Xu, Sa Wang, Dan Tang, Yungang Bao
2026 TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification.
Yang Zhong, Haoran Wu, Xueqi Li, Sa Wang, David Boland, Yungang Bao, Kan Shi
2026 Uni-STC: Unified Sparse Tensor Core.
Haocheng Lian, Qiyue Zhang, Xinran Zhao, Meichen Dong, Yijie Nie, Zhengyi Zhao, Junzhong Shen, Wei Guo, Chun Huang, Bingcai Sui, Weifeng Liu
2026 UniFHE: Faster Accelerator for FHE with Diverse Algebraic Structure and Balanced Memory System.
Qingyun Niu, Lutan Zhao, Ming Cai, Kai Li, Dan Meng, Rui Hou
2026 V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval.
Donghyuk Kim, Sejeong Yang, Wonjin Shin, Joo-Young Kim
2026 VAR-Turbo: Unlocking the Potential of Visual Autoregressive Models Through Dual Redundancy.
Xujiang Xiang, Fengbin Tu
2026 VectorLiteRAG: Latency-Aware and Fine-Grained Resource Partitioning for Efficient RAG.
Junkyum Kim, Divya Mahajan
2026 VeloxGNN: Efficient Out-of-Core GNN Training with Delayed Gradient Propagation.
Yi Li, Tsun-Yu Yang, Zhaoyan Shen, Ming-Chang Yang, Bingzhe Li
2026 WATOS: Efficient LLM Training Strategies and Architecture Co-Exploration for Wafer-Scale Chip.
Huizheng Wang, Zichuan Wang, Hongbin Wang, Jingxiang Hou, Taiquan Wei, Chao Li, Yang Hu, Shouyi Yin
2026 eGPU: Production-Scale Elastic Sharing Over 10,000 GPUs.
Xiaochuan Tang, Hao Qi, Jianbo Dong, Yinghao Yu, Zhennan Xue, Zhengyu Zhang, Daocheng Ying, Zheng Cao, Xiaoyi Lu
2026 zkPHIRE: A Programmable Accelerator for ZKPs over HIgh-degRee, Expressive Gates.
Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Siddharth Garg, Brandon Reagen
2026 µShare: Non-Intrusive Kernel Co-Locating on NVIDIA GPUs.
Wenhao Huang, Zhaolin Duan, Laiping Zhao, Yuhao Zhang, Yanjie Wang, Yiming Li, Yihan Wang, Yichi Chen, Zhihang Tang, Kang Chen, Deze Zeng, Wenxin Li, Keqiu Li