MICRO A*

116 papers

Year Title / Authors
2024 57th IEEE/ACM International Symposium on Microarchitecture, MICRO 2024, Austin, TX, USA, November 2-6, 2024
2024 A Case for Speculative Address Translation with Rapid Validation for GPUs.
Junhyeok Park, Osang Kwon, Yongho Lee, Seongwook Kim, Gwangeun Byeon, Jihun Yoon, Prashant J. Nair, Seokin Hong
2024 A Compiler-Like Framework for Optimizing Cryptographic Big Integer Multiplication on GPUs.
Zhuoran Ji, Jianyu Zhao, Zhaorui Zhang, Jiming Xu, Shoumeng Yan, Lei Ju
2024 A Framework for Fine-Grained Program Versioning.
Yishen Chen, Saman P. Amarasinghe
2024 A Mess of Memory System Benchmarking, Simulation and Application Profiling.
Pouya Esmaili-Dokht, Francesco Sgherzi, Valéria Soldera Girelli, Isaac Boixaderas, Mariana Carmin, Alireza Monemi, Adrià Armejach, Estanislao Mercadal, Germán Llort, Petar Radojkovic, Miquel Moretó, Judit Giménez, Xavier Martorell, Eduard Ayguadé, Jesús Labarta, Emanuele Confalonieri, Rishabh Dubey, Jason Adlard
2024 A Scalable, Efficient, and Robust Dynamic Memory Management Library for HLS-based FPGAs.
Qinggang Wang, Long Zheng, Zhaozeng An, Shuyi Xiong, Runze Wang, Yu Huang, Pengcheng Yao, Xiaofei Liao, Hai Jin, Jingling Xue
2024 Acamar: A Dynamically Reconfigurable Scientific Computing Accelerator for Robust Convergence and Minimal Resource Underutilization.
Ubaid Bakhtiar, Helya Hosseini, Bahar Asgari
2024 Accelerating Zero-Knowledge Proofs Through Hardware-Algorithm Co-Design.
Nikola Samardzic, Simon Langowski, Srinivas Devadas, Daniel Sánchez
2024 ActiveN: A Scalable and Flexibly-Programmable Event-Driven Neuromorphic Processor.
Xiaoyi Liu, Zhongzhu Pu, Peng Qu, Weimin Zheng, Youhui Zhang
2024 AdapTiV: Sign-Similarity Based Image-Adaptive Token Merging for Vision Transformer Acceleration.
Seungjae Yoo, Hangyeol Kim, Joo-Young Kim
2024 Ares-Flash: Efficient Parallel Integer Arithmetic Operations Using NAND Flash Memory.
Jian Chen, Congming Gao, Youyou Lu, Yuhao Zhang, Jiwu Shu
2024 Atomic Cache: Enabling Efficient Fine-Grained Synchronization with Relaxed Memory Consistency on GPGPUs Through In-Cache Atomic Operations.
Yicong Zhang, Mingyu Wang, Wangguang Wang, Yangzhan Mai, Haiqiu Huang, Zhiyi Yu
2024 Azul: An Accelerator for Sparse Iterative Solvers Leveraging Distributed On-Chip Memory.
Axel Feldmann, Courtney Golden, Yifan Yang, Joel S. Emer, Daniel Sánchez
2024 BABOL: A Software-Defined NAND Flash Controller.
Kibin Park, Alberto Lerner, Sangjin Lee, Philippe Bonnet, Yong Ho Song, Philippe Cudré-Mauroux, Jungwook Choi
2024 BBS: Bi-Directional Bit-Level Sparsity for Deep Learning Acceleration.
Yuzong Chen, Jian Meng, Jae-sun Seo, Mohamed S. Abdelfattah
2024 Beehive: A Flexible Network Stack for Direct-Attached Accelerators.
Katie Lim, Matthew Giordano, Theano Stavrinos, Irene Zhang, Jacob Nelson, Baris Kasikci, Thomas E. Anderson
2024 Blenda: Dynamically-Reconfigurable Stacked DRAM.
Mohammad Bakhshalipour, HamidReza Zare, Farid Samandi, Fatemeh Golshan, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad
2024 BreakHammer: Enhancing RowHammer Mitigations by Carefully Throttling Suspect Threads.
Oguzhan Canpolat, A. Giray Yaglikçi, Ataberk Olgun, Ismail Emir Yuksel, Yahya Can Tugrul, Konstantinos Kanellopoulos, Oguz Ergin, Onur Mutlu
2024 Bridging the Gap Between LLMs and LNS with Dynamic Data Format and Architecture Codesign.
Pouya Haghi, Chunshu Wu, Zahra Azad, Yanfei Li, Andrew Gui, Yuchen Hao, Ang Li, Tony Tong Geng
2024 COMPASS: SRAM-Based Computing-in-Memory SNN Accelerator with Adaptive Spike Speculation.
Zongwu Wang, Fangxin Liu, Ning Yang, Shiyuan Huang, Haomin Li, Li Jiang
2024 CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization.
Preyesh Dalmia, Rajesh Shashi Kumar, Matthew D. Sinclair
2024 CacheCraft: Enhancing GPU Performance under Memory Protection through Reconstructed Caching.
Soyoung Park, Hojung Namkoong, Boyeol Choi, Michael B. Sullivan, Jungrae Kim
2024 CamPU: A Multi-Camera Processing Unit for Deep Learning-based 3D Spatial Computing Systems.
Dongseok Im, Hoi-Jun Yoo
2024 Cambricon-C: Efficient 4-Bit Matrix Unit via Primitivization.
Yi Chen, Yongwei Zhao, Yifan Hao, Yuanbo Wen, Yuntao Dai, Xiaqing Li, Yang Liu, Rui Zhang, Mo Zou, Xinkai Song, Xing Hu, Zidong Du, Huaping Chen, Qi Guo, Tianshi Chen
2024 Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM.
Zhongkai Yu, Shengwen Liang, TianYun Ma, Yunke Cai, Ziyuan Nan, Di Huang, Xinkai Song, Yifan Hao, Jie Zhang, Tian Zhi, Yongwei Zhao, Zidong Du, Xing Hu, Qi Guo, Tianshi Chen
2024 Cambricon-M: A Fibonacci-Coded Charge-Domain SRAM-Based CIM Accelerator for DNN Inference.
Hongrui Guo, Mo Zou, Yifan Hao, Zidong Du, Erxiang Ren, Yang Liu, Yongwei Zhao, Tianrui Ma, Rui Zhang, Xing Hu, Fei Qiao, Zhiwei Xu, Qi Guo, Tianshi Chen
2024 Chaining Transactions for Effective Concurrency Management in Hardware Transactional Memory.
Víctor Nicolás-Conesa, J. Rubén Titos Gil, Ricardo Fernández-Pascual, Manuel E. Acacio, Alberto Ros
2024 Concurrency-Aware Register Stacks for Efficient GPU Function Calls.
Ni Kang, Ahmad Alawneh, Mengchi Zhang, Timothy G. Rogers
2024 Customizing Cache Indexing Through Entropy Estimation.
Kevin Weston, Avery Johnson, Vahid Janfaza, Farabi Mahmud, Abdullah Muzahid
2024 DRCTL: A Disorder-Resistant Computation Translation Layer Enhancing the Lifetime and Performance of Memristive CIM Architecture.
Heng Zhou, Bing Wu, Huan Cheng, Jinpeng Liu, Taoming Lei, Dan Feng, Wei Tong
2024 Defending Against EMI Attacks on Just-In-Time Checkpoint for Resilient Intermittent Systems.
Jaeseok Choi, Hyunwoo Joe, Changhee Jung, Jongouk Choi
2024 DelayAVF: Calculating Architectural Vulnerability Factors for Delay Faults.
Peter W. Deutsch, Vincent Quentin Ulitzsch, Sudhanva Gurumurthi, Vilas Sridharan, Joel S. Emer, Mengjia Yan
2024 Demystifying a CXL Type-2 Device: A Heterogeneous Cooperative Computing Perspective.
Houxiang Ji, Srikar Vanavasam, Yang Zhou, Qirong Xia, Jinghan Huang, Yifan Yuan, Ren Wang, Pekon Gupta, Bhushan Chitlur, Ipoom Jeong, Nam Sung Kim
2024 Distributed Page Table: Harnessing Physical Memory as an Unbounded Hashed Page Table.
Osang Kwon, Yongho Lee, Junhyeok Park, Sungbin Jang, Byungchul Tak, Seokin Hong
2024 Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching.
Sungmin Yun, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim, Byeongho Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn
2024 Elastic Translations: Fast Virtual Memory with Multiple Translation Sizes.
Stratos Psomadakis, Chloe Alverti, Vasileios Karakostas, Christos Katsakioris, Dimitrios Siakavaras, Konstantinos Nikas, Georgios I. Goumas, Nectarios Koziris
2024 Extending GPU Ray-Tracing Units for Hierarchical Search Acceleration.
Aaron Barnes, Fangjia Shen, Timothy G. Rogers
2024 Flag-Proxy Networks: Overcoming the Architectural, Scheduling and Decoding Obstacles of Quantum LDPC Codes.
Suhas Vittal, Ali Javadi-Abhari, Andrew W. Cross, Lev S. Bishop, Moinuddin Qureshi
2024 FloatAP: Supporting High-Performance Floating-Point Arithmetic in Associative Processors.
Kailin Yang, José F. Martínez
2024 FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design.
Nandeeka Nayak, Xinrui Wu, Toluwanimi O. Odemuyiwa, Michael Pellauer, Joel S. Emer, Christopher W. Fletcher
2024 Fusion-3D: Integrated Acceleration for Instant 3D Reconstruction and Real-Time Rendering.
Sixu Li, Yang Zhao, Chaojian Li, Bowei Guo, Jingqun Zhang, Wenbo Zhu, Zhifan Ye, Cheng Wan, Yingyan Celine Lin
2024 GauSPU: 3D Gaussian Splatting Processor for Real-Time SLAM Systems.
Lizhou Wu, Haozhe Zhu, Siqi He, Jiapei Zheng, Chixiao Chen, Xiaoyang Zeng
2024 Generalizing Ray Tracing Accelerators for Tree Traversals on GPUs.
Dongho Ha, Lufei Liu, Yuan-Hsi Chou, Seokjin Go, Won Woo Ro, Hung-Wei Tseng, Tor M. Aamodt
2024 Genie Cache: Non-Blocking Miss Handling and Replacement in Page-Table-Based DRAM Cache.
Youngin Kim, William J. Song
2024 Ghost Arbitration: Mitigating Interconnect Side-Channel Timing Attacks in GPU.
Zhixian Jin, Jaeguk Ahn, Jiho Kim, Hans Kasan, Jina Song, WonJun Song, John Kim
2024 Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms.
Yuqi Xue, Yiqi Liu, Lifeng Nai, Jian Huang
2024 Hestia: An Efficient Cross-Level Debugger for High-Level Synthesis.
Ruifan Xu, Jin Luo, Yawen Zhang, Yibo Lin, Runsheng Wang, Ru Huang, Yun Liang
2024 HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference.
Yiming Gao, Chao Jiang, Wesley Piard, Xiangru Chen, Bhavesh Patel, Herman Lam
2024 HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs.
Jianchao Yang, Mei Wen, Dong Chen, Zhaoyun Chen, Zeyu Xue, Yuhang Li, Junzhong Shen, Yang Shi
2024 HyperTEE: A Decoupled TEE Architecture with Secure Enclave Management.
Yunkai Bai, Peinan Li, Yubiao Huang, Michael C. Huang, Shijun Zhao, Lutan Zhao, Fengwei Zhang, Dan Meng, Rui Hou
2024 ICED: An Integrated CGRA Framework Enabling DVFS-Aware Acceleration.
Cheng Tan, Miaomiao Jiang, Deepak Patil, Yanghui Ou, Zhaoying Li, Lei Ju, Tulika Mitra, Hyunchul Park, Antonino Tumeo, Jeff Zhang
2024 ImPress: Securing DRAM Against Data-Disturbance Errors via Implicit Row-Press Mitigation.
Anish Saxena, Aamer Jaleel, Moinuddin Qureshi
2024 IvLeague: Side Channel-Resistant Secure Architectures Using Isolated Domains of Dynamic Integrity Trees.
Md Hafizul Islam Chowdhuryy, Fan Yao
2024 LIBRA: Memory Bandwidth- and Locality-Aware Parallel Tile Rendering.
Aurora Tomás, Juan L. Aragón, Joan-Manuel Parcerisa, Antonio González
2024 LUCIE: A Universal Chiplet-Interposer Design Framework for Plug-and-Play Integration.
Zixi Li, David Wentzlaff
2024 Leveraging Cache Coherence to Detect and Repair False Sharing On-the-fly.
Vipin Patel, Swarnendu Biswas, Mainak Chaudhuri
2024 Leviathan: A Unified System for General-Purpose Near-Data Computing.
Brian C. Schwedock, Nathan Beckmann
2024 LightWSP: Whole-System Persistence on the Cheap.
Yuchen Zhou, Jianping Zeng, Changhee Jung
2024 LoAS: Fully Temporal-Parallel Dataflow for Dual-Sparse Spiking Neural Networks.
Ruokai Yin, Youngeun Kim, Di Wu, Priyadarshini Panda
2024 Localizing the Tag Comparisons in the Wakeup Logic to Reduce Energy Consumption of the Issue Queue.
Kenichiro Mori, Sota Kosugi, Hiroto Yoshida, Hajime Shimada, Hideki Ando
2024 Looking into the Black Box: Monitoring Computer Architecture Simulations in Real-Time with AkitaRTM.
Ali Mosallaei, Katherine E. Isaacs, Yifan Sun
2024 Low-Overhead General-Purpose Near-Data Processing in CXL Memory Expanders.
Hyungkyu Ham, Jeongmin Hong, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim
2024 MINT: Securely Mitigating Rowhammer with a Minimalist in-DRAM Tracker.
Moinuddin Qureshi, Salman Qazi, Aamer Jaleel
2024 MeMCISA: Memristor-Enabled Memory-Centric Instruction-Set Architecture for Database Workloads.
Yihang Zhu, Lei Cai, Lianfeng Yu, Anjunyi Fan, Longhao Yan, Zhaokun Jing, Bonan Yan, Pek Jun Tiw, Yuqi Li, Yaoyu Tao, Yuchao Yang
2024 Memory Allocation Under Hardware Compression.
Muhammad Laghari, Yuqing Liu, Gagandeep Panwar, David Bears, Chandler Jearls, Raghavendra Srinivas, Esha Choukse, Kirk W. Cameron, Ali Raza Butt, Xun Jian
2024 Message from the MICRO 2024 General Chairs: "Hi, How Are you?" - "Jeremiah The Innocent" Mural.
Daniel Johnson
2024 Message from the MICRO 2024 Program Chairs.
Daniel A. Jiménez, Alaa R. Alameldeen
2024 Mosaic: Harnessing the Micro-Architectural Resources of Servers in Serverless Environments.
Jovan Stojkovic, Esha Choukse, Enrique Saurez, Íñigo Goiri, Josep Torrellas
2024 Multi-Issue Butterfly Architecture for Sparse Convex Quadratic Programming.
Maolin Wang, Ian McInerney, Bartolomeo Stellato, Fengbin Tu, Stephen P. Boyd, Hayden Kwok-Hay So, Kwang-Ting Cheng
2024 NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering.
Zhe Zhou, Yiqi Chen, Tao Zhang, Yang Wang, Ran Shu, Shuotao Xu, Peng Cheng, Lei Qu, Yongqiang Xiong, Jie Zhang, Guangyu Sun
2024 Over-Synchronization in GPU Programs.
Ajay Nayak, Arkaprava Basu
2024 PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences.
Pingyi Huo, Anusha Devulapally, Hasan Al Maruf, Minseo Park, Krishnakumar Nair, Meena Arunachalam, Gulsum Gudukbay Akbulut, Mahmut Taylan Kandemir, Vijaykrishnan Narayanan
2024 PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems.
Dongjae Lee, Bongjoon Hyun, Taehun Kim, Minsoo Rhu
2024 PointCIM: A Computing-in-Memory Architecture for Accelerating Deep Point Cloud Analytics.
Xuan-Jun Chen, Han-Ping Chen, Chia-Lin Yang
2024 Polymorphic Error Correction.
Evgeny Manzhosov, Simha Sethumadhavan
2024 Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs.
Rishabh Jain, Vivek M. Bhasi, Adwait Jog, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das
2024 PyPIM: Integrating Digital Processing-in-Memory from Microarchitectural Design to Python Tensors.
Orian Leitersdorf, Ronny Ronen, Shahar Kvatinsky
2024 Qoncord: A Multi-Device Job Scheduling Framework for Variational Quantum Algorithms.
Meng Wang, Poulami Das, Prashant J. Nair
2024 RAHP: A Redundancy-aware Accelerator for High-performance Hypergraph Neural Network.
Hui Yu, Yu Zhang, Ligang He, Yingqi Zhao, Xintao Li, Ruida Xin, Jin Zhao, Xiaofei Liao, Haikun Liu, Bingsheng He, Hai Jin
2024 RTL2MμPATH: Multi-μPATH Synthesis with Applications to Hardware Security Verification.
Yao Hsiao, Nikos Nikoleris, Artem Khyzha, Dominic P. Mulligan, Gustavo Petri, Christopher W. Fletcher, Caroline Trippel
2024 Rearchitecting a Neuromorphic Processor for Spike-Driven Brain-Computer Interfacing.
Hunjun Lee, Yeongwoo Jang, Daye Jung, Seunghyun Song, Jangwoo Kim
2024 Ring Road: A Scalable Polar-Coordinate-based 2D Network-on-Chip Architecture.
Yinxiao Feng, Wei Li, Kaisheng Ma
2024 SCALE: A Structure-Centric Accelerator for Message Passing Graph Neural Networks.
Lingxiang Yin, Sanjay Gandham, Mingjie Lin, Hao Zheng
2024 SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators.
Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque
2024 SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling.
Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qinze Yang, Yang Wang, Chao Li, Yang Hu, Shouyi Yin
2024 SOPHGO BM1684X: A Commercial High Performance Terminal AI Processor with Large Model Support.
Peng Gao, Yang Liu, Jun Wang, Wanlin Cai, Guangchong Shen, Zonghui Hong, Jiali Qu, Ning Wang
2024 SOPHIE: A Scalable Recurrent Ising Machine Using Optically Addressed Phase Change Memory.
Guowei Yang, Sina Karimi, Carlos A. Ríos Ocampo, Ayse K. Coskun, Ajay Joshi
2024 SRender: Boosting Neural Radiance Field Efficiency via Sensitivity-Aware Dynamic Precision Rendering.
Zhuoran Song, Houshu He, Fangxin Liu, Yifan Hao, Xinkai Song, Li Jiang, Xiaoyao Liang
2024 STAR: Sub-Entry Sharing-Aware TLB for Multi-Instance GPU.
Bingyao Li, Yueqi Wang, Tianyu Wang, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang
2024 SUV: Static Analysis Guided Unified Virtual Memory.
Pratheek B, Guilherme Cox, Ján Veselý, Arkaprava Basu
2024 SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts.
Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Xiaoyan Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Zhengyu Chen, Kaizhao Liang, Swayambhoo Jain, Urmish Thakker, Dawei Huang, Sumti Jairath, Kevin J. Brown, Kunle Olukotun
2024 Scalar Vector Runahead.
Jaime Roelandts, Ajeya Naithani, Sam Ainsworth, Timothy M. Jones, Lieven Eeckhout
2024 Secure Prefetching for Secure Cache Systems.
Sumon Nath, Agustín Navarro-Torres, Alberto Ros, Biswabandan Panda
2024 Self-Managing DRAM: A Low-Cost Framework for Enabling Autonomous and Efficient DRAM Maintenance Operations.
Hasan Hassan, Ataberk Olgun, A. Giray Yaglikçi, Haocong Luo, Onur Mutlu
2024 Sparsepipe: Sparse Inter-operator Dataflow Architecture with Cross-Iteration Reuse.
Yunan Zhang, Po-An Tsai, Hung-Wei Tseng
2024 StarNUMA: Mitigating NUMA Challenges with Memory Pooling.
Albert Cho, Alexandros Daglis
2024 Stellar: An Automated Design Framework for Dense and Sparse Spatial Accelerators.
Hasan Nazim Genc, Hansung Kim, Prashanth Ganesh, Yakun Sophia Shao
2024 Stream-Based Data Placement for Near-Data Processing with Extended Memory.
Yiwei Li, Boyu Tian, Yi Ren, Mingyu Gao
2024 SuperCore: An Ultra-Fast Superconducting Processor for Cryogenic Applications.
Junhyuk Choi, Ilkwon Byun, Juwon Hong, Dongmoon Min, Junpyo Kim, Jungmin Cho, Hyeonseong Jeong, Masamitsu Tanaka, Koji Inoue, Jangwoo Kim
2024 Surf-Deformer: Mitigating Dynamic Defects on Surface Code via Adaptive Deformation.
Keyi Yin, Xiang Fang, Travis S. Humble, Ang Li, Yunong Shi, Yufei Ding
2024 TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning.
William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Swati Gupta, Tushar Krishna
2024 TMiner: A Vertex-Based Task Scheduling Architecture for Graph Pattern Mining.
Zerun Li, Xiaoming Chen, Yinhe Han
2024 Temporarily Unauthorized Stores: Write First, Ask for Permission Later.
Juan M. Cebrian, Magnus Jahre, Alberto Ros
2024 Terminus: A Programmable Accelerator for Read and Update Operations on Sparse Data Structures.
Hyun Ryong Lee, Daniel Sánchez
2024 The Last-Level Branch Predictor.
David Schall, Andreas Sandberg, Boris Grot
2024 The TYR Dataflow Architecture: Improving Locality by Taming Parallelism.
Nikhil Agarwal, Mitchell Fream, Souradip Ghosh, Brian C. Schwedock, Nathan Beckmann
2024 ThreadFuser: A SIMT Analysis Framework for MIMD Programs.
Ahmad Alawneh, Ni Kang, Mahmoud Khairy, Timothy G. Rogers
2024 Timely, Efficient, and Accurate Branch Precomputation.
Aniket Deshmukh, Lingzhe Chester Cai, Yale N. Patt
2024 Trinity: A General Purpose FHE Accelerator.
Xianglong Deng, Shengyu Fan, Zhicheng Hu, Zhuoyu Tian, Zihao Yang, Jiangrui Yu, Dingyuan Cao, Dan Meng, Rui Hou, Meng Li, Qian Lou, Mingzhe Zhang
2024 UFC: A Unified Accelerator for Fully Homomorphic Encryption.
Minxuan Zhou, Yujin Nam, Xuan Wang, Youhak Lee, Chris Wilkerson, Raghavan Kumar, Sachin Taneja, Sanu Mathew, Rosario Cammarota, Tajana Rosing
2024 Uncovering Real GPU NoC Characteristics: Implications on Interconnect Architecture.
Zhixian Jin, Christopher Rocca, Jiho Kim, Hans Kasan, Minsoo Rhu, Ali Bakhoda, Tor M. Aamodt, John Kim
2024 Unleashing CPU Potential for Executing GPU Programs Through Compiler/Runtime Optimizations.
Ruobing Han, Jisheng Zhao, Hyesoon Kim
2024 VGA: Hardware Accelerator for Scalable Long Sequence Model Inference.
Seung Yul Lee, Hyunseung Lee, Jihoon Hong, SangLyul Cho, Jae W. Lee
2024 Veiled Pathways: Investigating Covert and Side Channels Within GPU Uncore.
Yuanqing Miao, Yingtian Zhang, Dinghao Wu, Danfeng Zhang, Gang Tan, Rui Zhang, Mahmut Taylan Kandemir
2024 Weeding out Front-End Stalls with Uneven Block Size Instruction Cache.
Roman Brunner, Rakesh Kumar
2024 vTrain: A Simulation Framework for Evaluating Cost-Effective and Compute-Optimal Large Language Model Training.
Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, Yongdeok Kim, Minsoo Rhu