PACT B

40 papers

Year Title / Authors
2025 34th International Conference on Parallel Architectures and Compilation Techniques, PACT 2025, Irvine, CA, USA, November 3-6, 2025
2025 A Stable Marriage Requires a Shared Residence with Low Contention and Mutual Complementarity.
Jiaxin Liu, Rubao Lee, Cathy H. Xia, Xiaodong Zhang
2025 ANG: Accelerating NFA processing on GPUs via Exploring Multi-Level Fine-Grained Parallelism.
Yuguang Wang, Yunmo Zhang, Zeyu Liu, Junqiao Qiu, Zhenlin Wang
2025 Accelerating DFS-based Subgraph Matching on GPU via Reusing Intersection.
Chen Chen, Shanzhi Gu, Junsheng Chang, Li Shen
2025 Agentic Auto-Scheduling: An Experimental Study of LLM-Guided Loop Optimization.
Massinissa Merouani, Islem Kara Bernou, Riyadh Baghdadi
2025 Automatic Code-Generation for Accelerating Structured-Mesh-Based Explicit Numerical Solvers on FPGAs.
Beniel Thileepan, Suhaib A. Fahmy, Gihan R. Mudalige
2025 Automatic Generation of Actor-based Parallelism from Shared-Memory Parallel Programs.
Jun Shirako, Vivek Sarkar
2025 Bancroft: Genomics Acceleration Beyond On-Device Memory.
Se-Min Lim, Seongyoung Kang, Sang-Woo Jun
2025 Bit-Level Semantics: Scalable RAG Retrieval with Neurosymbolic Hyperdimensional Computing.
Hyunsei Lee, Shinhyoung Jang, Jaewoo Gwak, Jongho Park, Yeseong Kim
2025 CPC: Coordinated Page Cache for Serverless Computing.
Keun Soo Lim, Yunjay Hong, Jongheon Jeong, Sam Son, Donguk Kim, Yeonhong Park, Jae W. Lee, Jinkyu Jeong
2025 Cache Miss Curve Analysis via Cardinality Domain.
Eishi Arima, Martin Schulz
2025 CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations.
Zhuolun Jiang, Songyue Wang, Xiaokun Pei, Tianyue, Mingyu Chen
2025 DMO-DB: Mitigating the Data Movement Bottlenecks of GPU-Accelerated Relational OLAP.
Chaemin Lim, Suhyun Lee, Jinwoo Choi, Joonsung Kim, Jinho Lee, Youngsok Kim
2025 Doppeladler: Adaptive Tensor Parallelism for Latency-Critical LLM Deployment on CPU-GPU Integrated End-User Device.
Jiazhi Jiang, Xiao Liu, Jiangsu Du, Dan Huang, Yutong Lu
2025 EARTH: Efficient Architecture for RISC-V Vector Memory Access.
Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang
2025 Energy-Efficient Acceleration of Hash-Based Post-Quantum Cryptographic Schemes on Embedded Spatial Architectures.
Yanze Wu, Md Tanvir Arafin
2025 Exploring Memory Tiering Systems in the CXL Era via FPGA-based Emulation and Device-Side Management.
Yiqi Chen, Xiping Dong, Zhe Zhou, Zhao Wang, Jie Zhang, Guangyu Sun
2025 FLASH: An Abstract Machine for Modelling Fully Homomorphic Encryption Accelerators.
Alireza Tabatabaeian, Arrvindh Shriraman
2025 Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration.
Robin Geens, Arne Symons, Marian Verhelst
2025 GPU Stream-Aware Communication for Effective Pipelining.
Naveen Namashivayam, Krishna Kandalla, Pen-Chung Yew, Trey White, Larry Kaplan, Mark Pagel
2025 Generating Two-Level, GPU-Aware Mappings for Distributed Tensor Computations.
Botao Wu, Martin Kong
2025 Guess, Measure & Edit: Using Lowering to Lift Tensor Code.
José Wesley de Souza Magalhães, Jackson Woodruff, Jordi Armengol-Estapé, Alexander Brauckmann, Luc Jaulmes, Elizabeth Polgreen, Michael F. P. O'Boyle
2025 Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations.
Yujeong Choi, John Kim, Minsoo Rhu
2025 LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers.
Massinissa Merouani, Afif Boudaoud, Iheb Nassim Aouadj, Nassim Tchoulak, Islem Kara Bernou, Hamza Benyamina, Fatima Benbouzid-Si Tayeb, Karima Benatchba, Hugh Leather, Riyadh Baghdadi
2025 LibraPIM: Dynamic Load Rebalancing to Maximize Utilization in PIM-Assisted LLM Inference Systems.
Hyeongjun Cho, Yoonho Jang, HyunGi Kim, Seongwook Kim, Keewon Kwon, Gwangsun Kim, Seokin Hong
2025 Multiway Merge Partitioning for Sparse-Sparse Matrix Multiplication on GPUs.
Eric Lorimer, Ruobing Han, Sung Ha Kang, Hyesoon Kim
2025 Optimize Winograd Convolution for a Novel MIMD Many-core Architecture PEZY-SC3s.
Yi Zhou, Qinglin Wang, Lian Wang, Zhiyan Liu, Bingwei Wang, Feiming Liu, Xiangdong Pei, Jie Liu
2025 Optimizing 3D Gaussian Splatting for Mobile GPUs.
Md. Musfiqur Rahman Sanim, Zhihao Shu, Bahram Afsharmanesh, AmirAli Mirian, Jiexiong Guan, Wei Niu, Bin Ren, Gagan Agrawal
2025 POSTER: DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures.
Geraldo F. Oliveira, Alain Kohli, David Novo, Ataberk Olgun, A. Giray Yaglikçi, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu
2025 POSTER: IRISX: A Dynamic Trade-off System for Performance Portability on Multi-Accelerator Platforms.
Sanil Rao, Mohammad Alaul Haque Monil, Het Mankad, Narasinga Rao Miniskar, Keita Teranishi, Jeffrey S. Vetter, Franz Franchetti
2025 POSTER: PIMAP: Characterizing a Real Processing-in-Memory System for Analytical Data Processing.
Manos Frouzakis, Juan Gómez-Luna, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu
2025 Poster: HeteroSched: Co-Optimizing Scheduling and Parallelization for Deep Learning Workloads.
Bahram Afsharmanesh, Md. Musfiqur Rahman Sanim, AmirAli Mirian, Gagan Agrawal
2025 Poster: Value-Aware Scheduler for Energy Reduction.
Haiyue Ma, Kaifeng Xu, David Wentzlaff
2025 SCREME: A Scalable Framework for Resilient Memory Design.
Fan Li, Mimi Xie, Yanan Guo, Huize Li, Xin Xin
2025 SPipe: Hybrid GPU and CPU Pipeline for Training LLMs under Memory Pressure.
Junyeol Ryu, Yujin Jeong, Daeyoung Park, Jinpyo Kim, Heehoon Kim, Jaejin Lee
2025 Salient Store: Enabling Smart Storage for Continuous Learning Edge Servers.
Cyan Subhra Mishra, Deeksha Chaudhary, Mahmut Taylan Kandemir, Chita R. Das
2025 Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits.
Dowon Kim, Minjae Lee, Janghyeon Kim, Hyucksung Kwon, Hyeonggyu Jeong, Sang-Soo Park, Minyong Yoon, Si-Dong Roh, Yongsuk Kwon, Jinin So, Jungwook Choi
2025 ScaleMoE: A Fast and Scalable Distributed Training Framework for Large-Scale Mixture-of-Experts Models.
Seohong Choi, Huize Hong, Tae Hee Han, Joonsung Kim
2025 Squire: A General-Purpose Accelerator to Exploit Fine-Grain Parallelism on Dependency-Bound Kernels.
Rubén Langarita, Jesús Alastruey-Benedé, Pablo Ibáñez-Marín, Santiago Marco-Sola, Miquel Moretó, Adrià Armejach
2025 TPE: XPU-Point: Simulator-Agnostic Sample Selection Methodology for Heterogeneous CPU-GPU Applications.
Alen Sabu, Harish Patil, Wim Heirman, Changxi Liu, Trevor E. Carlson