ISPASS B

49 papers

Year Title / Authors
2025 A Flexible and Accurate Circuit-Level Substrate for Future DRAM Design and Analysis.
S. M. Mojahidul Ahsan, Mohammad Nouri, Ramesh Reddy Ganapam, Mohammad Alian, Tamzidul Hoque
2025 A Real-Time, Auto-Regression Method for In-Situ Feature Extraction in Hydrodynamics Simulations.
Kewei Yan, Yonghong Yan
2025 ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput.
Junsoo Kim, Hunjong Lee, Geonwoo Ko, Gyubin Choi, Seri Ham, Seongmin Hong, Joo-Young Kim
2025 ASLink: Modeling Multi-GPU Execution in Accel-Sim.
Christin Bose, Cesar Avalos, Junrui Pan, Yechen Liu, Mahmoud Khairy, Clay Hughes, Timothy G. Rogers
2025 An Analytical Cost Model for Fast Evaluation of Multiple Compute-Engine CNN Accelerators.
Fareed Qararyah, Mohammad Ali Maleki, Pedro Trancoso
2025 Analysis of the RISC-V Vector Extension for Vulkan Graphics Kernels.
Martin Troiber, Martin Schulz, Blaise Tine, Hyesoon Kim
2025 Beethoven: A Heterogeneous Multi-Core Accelerator System Composer.
Chris Kjellqvist, Brendan Peercy, Alvin R. Lebeck, Lisa Wu Wills
2025 Benchmarking 3D Gaussian Splatting Rendering.
Saichand Samudrala, Sushant Kondguli, Paul Gratz
2025 Beyond the Numbers: Measuring Android Performance Through User Perception.
Jaeheon Lee, Juhyung Park, Seonggyun Oh, Jinhyung Koo, Sungjin Lee
2025 COCOSSim: A Cycle-Accurate Simulator for Heterogeneous Systolic Array Architectures.
Mansi Choudhary, Chris Kjellqvist, Jiaao Ma, Lisa Wu Wills
2025 COSMOS: An LLC Contention Slowdown Model for Heterogeneous Multi-Core Systems.
Yongju Lee, Jaewon Kwon, Cheolhwan Kim, Enhyeok Jang, Jiwon Lee, Hyunwuk Lee, Won Woo Ro
2025 Carbon-Aware Server Replacement.
Iris Uwizeyimana, Natalie Enright Jerger
2025 Characterizing Compute-Communication Overlap in GPU-Accelerated Distributed Deep Learning: Performance and Power Implications.
Seonho Lee, Jihwan Oh, Seokjin Go, Divya Mahajan
2025 Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures.
Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen
2025 ConCCL: Optimizing ML Concurrent Computation and Communication with GPU DMA Engines.
Anirudha Agrawal, Shaizeen Aga, Suchita Pati, Mahzabeen Islam
2025 Concurrent PIM and Load/Store Servicing in PIM-Enabled Memory.
Sudhanshu Gupta, Niti Madan, Sooraj Puthoor, Nuwan Jayasena, Sandhya Dwarkadas
2025 Dissecting Performance Overheads of Confidential Computing on GPU-based Systems.
Yang Yang, Mohammad Sonji, Adwait Jog
2025 Energon: A Sustainability-Driven Modeling Framework for AI Data Centers.
Wenzhe Guo, Joyjit Kundu, Uras Tos, Giuliano Sisto, Cedric Rolin, Lars-Åke Ragnarsson, Timon Evenblij
2025 Evaluating Compute in Memory Architectures for Matrix Multiplication: A Dataflow-Centric Perspective.
Tanvi Sharma, Indranil Chakraborty, Mustafa Fayez Ali, Kaushik Roy
2025 Evaluation and Comparison of the Energy Efficiency of Several Intel Multicore Processors.
Thomas Rauber, Gudula Rünger
2025 Evaluation of MindPalace for Chip Design Tradeoffs on Function-as-a-Service.
Kaifeng Xu, Georgios Tziantzioulis, David Wentzlaff
2025 Exploring Constrained Dataflow Accelerators for Real-Time Multi-Task Multi-Model ML Workloads.
Jamin Seo, Jianming Tong, Tushar Krishna, Hyoukjun Kwon
2025 FIDESlib: A Fully-Fledged Open-Source FHE Library for Efficient CKKS on GPUs.
Carlos Agulló-Domingo, Óscar Vera-López, Seyda Nur Güzelhan, Lohit Daksha, Aymane El Jerari, Kaustubh Shivdikar, Rashmi S. Agrawal, David R. Kaeli, Ajay Joshi, José L. Abellán
2025 FinGraV: Methodology for Fine-Grain GPU Power Visibility and Insights.
Varsha Singhania, Shaizeen Aga, Mohamed Assem Ibrahim
2025 GPU Simulation Acceleration via Parallelization.
Rodrigo Huerta, Antonio González
2025 Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability.
Zishen Wan, Jiayi Qian, Yuhang Du, Jason Jabbour, Yilun Du, Yang Zhao, Arijit Raychowdhury, Tushar Krishna, Vijay Janapa Reddi
2025 Hierarchical Traversal Stack Design Using Shared Memory for GPU Ray Tracing.
Eunsoo Jung, Eunbi Jeong, Gunjae Koo, Yunho Oh, Myung Kuk Yoon
2025 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2025, Ghent, Belgium, May 11-13, 2025
2025 Identifying Important Data Transformations for Synthesizing Effective Lossless Compressors.
Noushin Azami, Martin Burtscher
2025 Intel® In-Memory Analytics Accelerator: Performance Characterization and Guidelines.
Jaeyoung Kang, Qirong Xia, Ipoom Jeong, Yongjoo Park, Nam Sung Kim
2025 Interconnect Performance Estimation for ML Accelerators via Lightweight Analytical Model.
Rahul Tripathy, Sumit K. Mandal
2025 La Superba: Leveraging a Self-Comparison Method to Understand the Performance Benefits of Sparse Acceleration Optimizations.
Nebil Ozer, Gregory Kollmer, Ramyad Hadidi, Bahar Asgari
2025 Library of Networks: An Online Tool for Design and Analysis of Network Topologies.
Aniket Chatterjee, Conor James Green, Mithuna Thottethodi
2025 Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs.
Matin Raayai Ardakani, Andrew Nguyen, Ivan Rosales, Daoxuan Xu, Yuwei Sun, Yifan Sun, David Kaeli, Norman Rubin
2025 MeMo: Enhancing Representative Sampling via Mechanistic Micro-Model Signatures.
Chenji Han, Huai Xu, Guangyao Guo, Yuxuan Wu, Fuxin Zhang
2025 Measuring Performance Overheads of Software Memory Management Using Functional-First Simulators.
Yves Vandriessche, Wim Heirman, Ed Nutting, Jeremy Birch, Judah Daniels, Mae Hood, Pascal Costanza
2025 Multi-Core Aware Evaluation of Prefetchers.
Martí Torrents, Paul Caheny, Stijn Eyerman, Wim Heirman
2025 PIM-BEACON: A Benchmarking and Emulation Framework Supporting Adaptive CONfigurations in DRAM-Based Processing-in-Memory Systems.
Inseong Hwang, Jihoon Jang, Chaewon Park, Hyun Kim
2025 Performance Analysis of GEMM Workloads on the AMD Versal Platform.
Kaustubh Manohar Mhatre, Venkata Guru Prashanth Mulleti, Curt John Bansil, Endri Taka, Aman Arora
2025 PowerSensor3: A Fast and Accurate Open Source Power Measurement Tool.
Steven van der Vlugt, Leon C. Oostrum, Gijs Schoonderbeek, Ben van Werkhoven, Bram Veenboer, Krijn Doekemeijer, John W. Romein
2025 Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson.
Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle
2025 RayFlex: An Open-Source RTL Implementation of the Hardware Ray Tracer Datapath.
Fangjia Shen, Aaron Barnes, Anusuya Nallathambi, Timothy G. Rogers
2025 SAGA: A Surrogate Assisted Genetic Algorithm for Fast CPU Power Virus Generation.
Panteleimonas Chatzimiltis, Georgia Antoniou, Haris Volos, Yiannakis Sazeides
2025 SCALE-Sim V3: A Modular Cycle-Accurate Systolic Accelerator Simulator for End-To-End System Analysis.
Ritik Raj, Sarbartha Banerjee, Nikhil Chandra, Zishen Wan, Jianming Tong, Ananda Samajdar, Tushar Krishna
2025 TPNM: A CXL Based General Purpose Tiered Process Near Memory Framework.
Pingyi Huo, Anusha Devulapally, Hasan Al Maruf, Meena Arunachalam, Mahmut Taylan Kandemir, Vijaykrishnan Narayanan
2025 The Fake-Busy and True-Idle Problems of Running Graph Applications on Chiplet-Based Multi-Cores.
Rashid Aligholipour, Yuan Yao
2025 The Future of Instruction-Level Parallelism (ILP).
Alexandra W. Chadwick, Márton Erdos, Utpal Bora, Akshay Bhosale, Bob Lytton, Yuxin Guo, Richard Cooper, Giacomo Gabrielli, Timothy M. Jones
2025 Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads.
Rachid Karami, Sheng-Chun Kao, Hyoukjun Kwon
2025 Use Equal-Work or Equal-Time Speedup, Not Geomean Speedup.
Lieven Eeckhout