| 2025 | 34th International Conference on Parallel Architectures and Compilation Techniques, PACT 2025, Irvine, CA, USA, November 3-6, 2025 |
| 2025 | A Stable Marriage Requires a Shared Residence with Low Contention and Mutual Complementarity. Jiaxin Liu, Rubao Lee, Cathy H. Xia, Xiaodong Zhang |
| 2025 | ANG: Accelerating NFA processing on GPUs via Exploring Multi-Level Fine-Grained Parallelism. Yuguang Wang, Yunmo Zhang, Zeyu Liu, Junqiao Qiu, Zhenlin Wang |
| 2025 | Accelerating DFS-based Subgraph Matching on GPU via Reusing Intersection. Chen Chen, Shanzhi Gu, Junsheng Chang, Li Shen |
| 2025 | Agentic Auto-Scheduling: An Experimental Study of LLM-Guided Loop Optimization. Massinissa Merouani, Islem Kara Bernou, Riyadh Baghdadi |
| 2025 | Automatic Code-Generation for Accelerating Structured-Mesh-Based Explicit Numerical Solvers on FPGAs. Beniel Thileepan, Suhaib A. Fahmy, Gihan R. Mudalige |
| 2025 | Automatic Generation of Actor-based Parallelism from Shared-Memory Parallel Programs. Jun Shirako, Vivek Sarkar |
| 2025 | Bancroft: Genomics Acceleration Beyond On-Device Memory. Se-Min Lim, Seongyoung Kang, Sang-Woo Jun |
| 2025 | Bit-Level Semantics: Scalable RAG Retrieval with Neurosymbolic Hyperdimensional Computing. Hyunsei Lee, Shinhyoung Jang, Jaewoo Gwak, Jongho Park, Yeseong Kim |
| 2025 | CPC: Coordinated Page Cache for Serverless Computing. Keun Soo Lim, Yunjay Hong, Jongheon Jeong, Sam Son, Donguk Kim, Yeonhong Park, Jae W. Lee, Jinkyu Jeong |
| 2025 | Cache Miss Curve Analysis via Cardinality Domain. Eishi Arima, Martin Schulz |
| 2025 | CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations. Zhuolun Jiang, Songyue Wang, Xiaokun Pei, Tianyue Lu, Mingyu Chen |
| 2025 | DMO-DB: Mitigating the Data Movement Bottlenecks of GPU-Accelerated Relational OLAP. Chaemin Lim, Suhyun Lee, Jinwoo Choi, Joonsung Kim, Jinho Lee, Youngsok Kim |
| 2025 | Doppeladler: Adaptive Tensor Parallelism for Latency-Critical LLM Deployment on CPU-GPU Integrated End-User Device. Jiazhi Jiang, Xiao Liu, Jiangsu Du, Dan Huang, Yutong Lu |
| 2025 | EARTH: Efficient Architecture for RISC-V Vector Memory Access. Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang |
| 2025 | Energy-Efficient Acceleration of Hash-Based Post-Quantum Cryptographic Schemes on Embedded Spatial Architectures. Yanze Wu, Md Tanvir Arafin |
| 2025 | Exploring Memory Tiering Systems in the CXL Era via FPGA-based Emulation and Device-Side Management. Yiqi Chen, Xiping Dong, Zhe Zhou, Zhao Wang, Jie Zhang, Guangyu Sun |
| 2025 | FLASH: An Abstract Machine for Modelling Fully Homomorphic Encryption Accelerators. Alireza Tabatabaeian, Arrvindh Shriraman |
| 2025 | Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration. Robin Geens, Arne Symons, Marian Verhelst |
| 2025 | GPU Stream-Aware Communication for Effective Pipelining. Naveen Namashivayam, Krishna Kandalla, Pen-Chung Yew, Trey White, Larry Kaplan, Mark Pagel |
| 2025 | Generating Two-Level, GPU-Aware Mappings for Distributed Tensor Computations. Botao Wu, Martin Kong |
| 2025 | Guess, Measure & Edit: Using Lowering to Lift Tensor Code. José Wesley de Souza Magalhães, Jackson Woodruff, Jordi Armengol-Estapé, Alexander Brauckmann, Luc Jaulmes, Elizabeth Polgreen, Michael F. P. O'Boyle |
| 2025 | Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations. Yujeong Choi, John Kim, Minsoo Rhu |
| 2025 | LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers. Massinissa Merouani, Afif Boudaoud, Iheb Nassim Aouadj, Nassim Tchoulak, Islem Kara Bernou, Hamza Benyamina, Fatima Benbouzid-Si Tayeb, Karima Benatchba, Hugh Leather, Riyadh Baghdadi |
| 2025 | LibraPIM: Dynamic Load Rebalancing to Maximize Utilization in PIM-Assisted LLM Inference Systems. Hyeongjun Cho, Yoonho Jang, HyunGi Kim, Seongwook Kim, Keewon Kwon, Gwangsun Kim, Seokin Hong |
| 2025 | Multiway Merge Partitioning for Sparse-Sparse Matrix Multiplication on GPUs. Eric Lorimer, Ruobing Han, Sung Ha Kang, Hyesoon Kim |
| 2025 | Optimize Winograd Convolution for a Novel MIMD Many-core Architecture PEZY-SC3s. Yi Zhou, Qinglin Wang, Lian Wang, Zhiyan Liu, Bingwei Wang, Feiming Liu, Xiangdong Pei, Jie Liu |
| 2025 | Optimizing 3D Gaussian Splatting for Mobile GPUs. Md. Musfiqur Rahman Sanim, Zhihao Shu, Bahram Afsharmanesh, AmirAli Mirian, Jiexiong Guan, Wei Niu, Bin Ren, Gagan Agrawal |
| 2025 | POSTER: DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures. Geraldo F. Oliveira, Alain Kohli, David Novo, Ataberk Olgun, A. Giray Yaglikçi, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu |
| 2025 | POSTER: IRISX: A Dynamic Trade-off System for Performance Portability on Multi-Accelerator Platforms. Sanil Rao, Mohammad Alaul Haque Monil, Het Mankad, Narasinga Rao Miniskar, Keita Teranishi, Jeffrey S. Vetter, Franz Franchetti |
| 2025 | POSTER: PIMAP: Characterizing a Real Processing-in-Memory System for Analytical Data Processing. Manos Frouzakis, Juan Gómez-Luna, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu |
| 2025 | POSTER: HeteroSched: Co-Optimizing Scheduling and Parallelization for Deep Learning Workloads. Bahram Afsharmanesh, Md. Musfiqur Rahman Sanim, AmirAli Mirian, Gagan Agrawal |
| 2025 | POSTER: Value-Aware Scheduler for Energy Reduction. Haiyue Ma, Kaifeng Xu, David Wentzlaff |
| 2025 | SCREME: A Scalable Framework for Resilient Memory Design. Fan Li, Mimi Xie, Yanan Guo, Huize Li, Xin Xin |
| 2025 | SPipe: Hybrid GPU and CPU Pipeline for Training LLMs under Memory Pressure. Junyeol Ryu, Yujin Jeong, Daeyoung Park, Jinpyo Kim, Heehoon Kim, Jaejin Lee |
| 2025 | Salient Store: Enabling Smart Storage for Continuous Learning Edge Servers. Cyan Subhra Mishra, Deeksha Chaudhary, Mahmut Taylan Kandemir, Chita R. Das |
| 2025 | Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits. Dowon Kim, Minjae Lee, Janghyeon Kim, Hyucksung Kwon, Hyeonggyu Jeong, Sang-Soo Park, Minyong Yoon, Si-Dong Roh, Yongsuk Kwon, Jinin So, Jungwook Choi |
| 2025 | ScaleMoE: A Fast and Scalable Distributed Training Framework for Large-Scale Mixture-of-Experts Models. Seohong Choi, Huize Hong, Tae Hee Han, Joonsung Kim |
| 2025 | Squire: A General-Purpose Accelerator to Exploit Fine-Grain Parallelism on Dependency-Bound Kernels. Rubén Langarita, Jesús Alastruey-Benedé, Pablo Ibáñez-Marín, Santiago Marco-Sola, Miquel Moretó, Adrià Armejach |
| 2025 | TPE: XPU-Point: Simulator-Agnostic Sample Selection Methodology for Heterogeneous CPU-GPU Applications. Alen Sabu, Harish Patil, Wim Heirman, Changxi Liu, Trevor E. Carlson |