| 2025 | A Flexible and Accurate Circuit-Level Substrate for Future DRAM Design and Analysis. S. M. Mojahidul Ahsan, Mohammad Nouri, Ramesh Reddy Ganapam, Mohammad Alian, Tamzidul Hoque |
| 2025 | A Real-Time, Auto-Regression Method for In-Situ Feature Extraction in Hydrodynamics Simulations. Kewei Yan, Yonghong Yan |
| 2025 | ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput. Junsoo Kim, Hunjong Lee, Geonwoo Ko, Gyubin Choi, Seri Ham, Seongmin Hong, Joo-Young Kim |
| 2025 | ASLink: Modeling Multi-GPU Execution in Accel-Sim. Christin Bose, Cesar Avalos, Junrui Pan, Yechen Liu, Mahmoud Khairy, Clay Hughes, Timothy G. Rogers |
| 2025 | An Analytical Cost Model for Fast Evaluation of Multiple Compute-Engine CNN Accelerators. Fareed Qararyah, Mohammad Ali Maleki, Pedro Trancoso |
| 2025 | Analysis of the RISC-V Vector Extension for Vulkan Graphics Kernels. Martin Troiber, Martin Schulz, Blaise Tine, Hyesoon Kim |
| 2025 | Beethoven: A Heterogeneous Multi-Core Accelerator System Composer. Chris Kjellqvist, Brendan Peercy, Alvin R. Lebeck, Lisa Wu Wills |
| 2025 | Benchmarking 3D Gaussian Splatting Rendering. Saichand Samudrala, Sushant Kondguli, Paul Gratz |
| 2025 | Beyond the Numbers: Measuring Android Performance Through User Perception. Jaeheon Lee, Juhyung Park, Seonggyun Oh, Jinhyung Koo, Sungjin Lee |
| 2025 | COCOSSim: A Cycle-Accurate Simulator for Heterogeneous Systolic Array Architectures. Mansi Choudhary, Chris Kjellqvist, Jiaao Ma, Lisa Wu Wills |
| 2025 | COSMOS: An LLC Contention Slowdown Model for Heterogeneous Multi-Core Systems. Yongju Lee, Jaewon Kwon, Cheolhwan Kim, Enhyeok Jang, Jiwon Lee, Hyunwuk Lee, Won Woo Ro |
| 2025 | Carbon-Aware Server Replacement. Iris Uwizeyimana, Natalie Enright Jerger |
| 2025 | Characterizing Compute-Communication Overlap in GPU-Accelerated Distributed Deep Learning: Performance and Power Implications. Seonho Lee, Jihwan Oh, Seokjin Go, Divya Mahajan |
| 2025 | Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures. Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen |
| 2025 | ConCCL: Optimizing ML Concurrent Computation and Communication with GPU DMA Engines. Anirudha Agrawal, Shaizeen Aga, Suchita Pati, Mahzabeen Islam |
| 2025 | Concurrent PIM and Load/Store Servicing in PIM-Enabled Memory. Sudhanshu Gupta, Niti Madan, Sooraj Puthoor, Nuwan Jayasena, Sandhya Dwarkadas |
| 2025 | Dissecting Performance Overheads of Confidential Computing on GPU-based Systems. Yang Yang, Mohammad Sonji, Adwait Jog |
| 2025 | Energon: A Sustainability-Driven Modeling Framework for AI Data Centers. Wenzhe Guo, Joyjit Kundu, Uras Tos, Giuliano Sisto, Cedric Rolin, Lars-Åke Ragnarsson, Timon Evenblij |
| 2025 | Evaluating Compute in Memory Architectures for Matrix Multiplication: A Dataflow-Centric Perspective. Tanvi Sharma, Indranil Chakraborty, Mustafa Fayez Ali, Kaushik Roy |
| 2025 | Evaluation and Comparison of the Energy Efficiency of Several Intel Multicore Processors. Thomas Rauber, Gudula Rünger |
| 2025 | Evaluation of MindPalace for Chip Design Tradeoffs on Function-as-a-Service. Kaifeng Xu, Georgios Tziantzioulis, David Wentzlaff |
| 2025 | Exploring Constrained Dataflow Accelerators for Real-Time Multi-Task Multi-Model ML Workloads. Jamin Seo, Jianming Tong, Tushar Krishna, Hyoukjun Kwon |
| 2025 | FIDESlib: A Fully-Fledged Open-Source FHE Library for Efficient CKKS on GPUs. Carlos Agulló-Domingo, Óscar Vera-López, Seyda Nur Güzelhan, Lohit Daksha, Aymane El Jerari, Kaustubh Shivdikar, Rashmi S. Agrawal, David R. Kaeli, Ajay Joshi, José L. Abellán |
| 2025 | FinGraV: Methodology for Fine-Grain GPU Power Visibility and Insights. Varsha Singhania, Shaizeen Aga, Mohamed Assem Ibrahim |
| 2025 | GPU Simulation Acceleration via Parallelization. Rodrigo Huerta, Antonio González |
| 2025 | Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability. Zishen Wan, Jiayi Qian, Yuhang Du, Jason Jabbour, Yilun Du, Yang Zhao, Arijit Raychowdhury, Tushar Krishna, Vijay Janapa Reddi |
| 2025 | Hierarchical Traversal Stack Design Using Shared Memory for GPU Ray Tracing. Eunsoo Jung, Eunbi Jeong, Gunjae Koo, Yunho Oh, Myung Kuk Yoon |
| 2025 | IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2025, Ghent, Belgium, May 11-13, 2025 |
| 2025 | Identifying Important Data Transformations for Synthesizing Effective Lossless Compressors. Noushin Azami, Martin Burtscher |
| 2025 | Intel® In-Memory Analytics Accelerator: Performance Characterization and Guidelines. Jaeyoung Kang, Qirong Xia, Ipoom Jeong, Yongjoo Park, Nam Sung Kim |
| 2025 | Interconnect Performance Estimation for ML Accelerators via Lightweight Analytical Model. Rahul Tripathy, Sumit K. Mandal |
| 2025 | La Superba: Leveraging a Self-Comparison Method to Understand the Performance Benefits of Sparse Acceleration Optimizations. Nebil Ozer, Gregory Kollmer, Ramyad Hadidi, Bahar Asgari |
| 2025 | Library of Networks: An Online Tool for Design and Analysis of Network Topologies. Aniket Chatterjee, Conor James Green, Mithuna Thottethodi |
| 2025 | Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs. Matin Raayai Ardakani, Andrew Nguyen, Ivan Rosales, Daoxuan Xu, Yuwei Sun, Yifan Sun, David Kaeli, Norman Rubin |
| 2025 | MeMo: Enhancing Representative Sampling via Mechanistic Micro-Model Signatures. Chenji Han, Huai Xu, Guangyao Guo, Yuxuan Wu, Fuxin Zhang |
| 2025 | Measuring Performance Overheads of Software Memory Management Using Functional-First Simulators. Yves Vandriessche, Wim Heirman, Ed Nutting, Jeremy Birch, Judah Daniels, Mae Hood, Pascal Costanza |
| 2025 | Multi-Core Aware Evaluation of Prefetchers. Martí Torrents, Paul Caheny, Stijn Eyerman, Wim Heirman |
| 2025 | PIM-BEACON: A Benchmarking and Emulation Framework Supporting Adaptive CONfigurations in DRAM-Based Processing-in-Memory Systems. Inseong Hwang, Jihoon Jang, Chaewon Park, Hyun Kim |
| 2025 | Performance Analysis of GEMM Workloads on the AMD Versal Platform. Kaustubh Manohar Mhatre, Venkata Guru Prashanth Mulleti, Curt John Bansil, Endri Taka, Aman Arora |
| 2025 | PowerSensor3: A Fast and Accurate Open Source Power Measurement Tool. Steven van der Vlugt, Leon C. Oostrum, Gijs Schoonderbeek, Ben van Werkhoven, Bram Veenboer, Krijn Doekemeijer, John W. Romein |
| 2025 | Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson. Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle |
| 2025 | RayFlex: An Open-Source RTL Implementation of the Hardware Ray Tracer Datapath. Fangjia Shen, Aaron Barnes, Anusuya Nallathambi, Timothy G. Rogers |
| 2025 | SAGA: A Surrogate Assisted Genetic Algorithm for Fast CPU Power Virus Generation. Panteleimonas Chatzimiltis, Georgia Antoniou, Haris Volos, Yiannakis Sazeides |
| 2025 | SCALE-Sim V3: A Modular Cycle-Accurate Systolic Accelerator Simulator for End-to-End System Analysis. Ritik Raj, Sarbartha Banerjee, Nikhil Chandra, Zishen Wan, Jianming Tong, Ananda Samajdar, Tushar Krishna |
| 2025 | TPNM: A CXL Based General Purpose Tiered Process Near Memory Framework. Pingyi Huo, Anusha Devulapally, Hasan Al Maruf, Meena Arunachalam, Mahmut Taylan Kandemir, Vijaykrishnan Narayanan |
| 2025 | The Fake-Busy and True-Idle Problems of Running Graph Applications on Chiplet-Based Multi-Cores. Rashid Aligholipour, Yuan Yao |
| 2025 | The Future of Instruction-Level Parallelism (ILP). Alexandra W. Chadwick, Márton Erdos, Utpal Bora, Akshay Bhosale, Bob Lytton, Yuxin Guo, Richard Cooper, Giacomo Gabrielli, Timothy M. Jones |
| 2025 | Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads. Rachid Karami, Sheng-Chun Kao, Hyoukjun Kwon |
| 2025 | Use Equal-Work or Equal-Time Speedup, Not Geomean Speedup. Lieven Eeckhout |