| 2021 | 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021, Atlanta, GA, USA, September 26-29, 2021. Jaejin Lee, Albert Cohen |
| 2021 | A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers. Phitchaya Mangpo Phothilimthana, Amit Sabne, Nikhil Sarda, Karthik Srinivasa Murthy, Yanqi Zhou, Christof Angermueller, Mike Burrows, Sudip Roy, Ketan Mandke, Rezsa Farahani, Yu Emma Wang, Berkin Ilbeyi, Blake A. Hechtman, Bjarke Roune, Shen Wang, Yuanzhong Xu, Samuel J. Kaufman |
| 2021 | AIBench Scenario: Scenario-Distilling AI Benchmarking. Wanling Gao, Fei Tang, Jianfeng Zhan, Xu Wen, Lei Wang, Zheng Cao, Chuanxin Lan, Chunjie Luo, Xiaoli Liu, Zihan Jiang |
| 2021 | Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles. Sultan Durrani, Muhammad Saad Chughtai, Mert Hidayetoglu, Rashid Tahir, Abdul Dakkak, Lawrence Rauchwerger, Fareed Zaffar, Wen-mei W. Hwu |
| 2021 | CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling. Nadja Ramhöj Holtryd, Madhavan Manivannan, Per Stenström, Miquel Pericàs |
| 2021 | CoPlace: Effectively Mitigating Cache Conflicts in Modern Clouds. Xiaowei Shang, Weiwei Jia, Jianchen Shan, Xiaoning Ding |
| 2021 | Dryadic: Flexible and Fast Graph Pattern Matching at Scale. Daniel Mawhirter, Sam Reinehr, Wei Han, Noah Fields, Miles Claver, Connor Holmes, Jedidiah McClurg, Tongping Liu, Bo Wu |
| 2021 | Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks. Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu |
| 2021 | HERTI: A Reinforcement Learning-Augmented System for Efficient Real-Time Inference on Heterogeneous Embedded Systems. Myeonggyun Han, Woongki Baek |
| 2021 | InnerSP: A Memory Efficient Sparse Matrix Multiplication Accelerator with Locality-Aware Inner Product Processing. Daehyeon Baek, Soojin Hwang, Taekyung Heo, Daehoon Kim, Jaehyuk Huh |
| 2021 | Invalidate or Update? Revisiting Coherence for Tomorrow's Cache Hierarchies. Mingcan Zhu, Amna Shahab, Antonios Katsarakis, Boris Grot |
| 2021 | NLP-Fast: A Fast, Scalable, and Flexible System to Accelerate Large-Scale Heterogeneous NLP Models. Joonsung Kim, Suyeon Hur, Eunbok Lee, Seungho Lee, Jangwoo Kim |
| 2021 | PIM-DL: Boosting DNN Inference on Digital Processing In-Memory Architectures via Data Layout Optimizations. Minxuan Zhou, Guoyang Chen, Mohsen Imani, Saransh Gupta, Weifeng Zhang, Tajana Rosing |
| 2021 | PolyGym: Polyhedral Optimizations as an Environment for Reinforcement Learning. Alexander Brauckmann, Andrés Goens, Jerónimo Castrillón |
| 2021 | Polygeist: Raising C to Polyhedral MLIR. William S. Moses, Lorenzo Chelini, Ruizhe Zhao, Oleksandr Zinenko |
| 2021 | Precision Batching: Bitserial Decomposition for Efficient Neural Network Inference on GPUs. Maximilian Lam, Zachary Yedidia, Colby R. Banbury, Vijay Janapa Reddi |
| 2021 | Program Lifting using Gray-Box Behavior. Bruce Collie, Michael F. P. O'Boyle |
| 2021 | SEER: A Time Prediction Model for CNNs from GPU Kernel's View. Guodong Liu, Sa Wang, Yungang Bao |
| 2021 | SURFNet: Super-Resolution of Turbulent Flows with Transfer Learning using Small Datasets. Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran |
| 2021 | Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs. Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, Minyi Guo |
| 2021 | SumPA: Efficient Pattern-Centric Graph Mining with Pattern Abstraction. Chuangyi Gui, Xiaofei Liao, Long Zheng, Pengcheng Yao, Qinggang Wang, Hai Jin |
| 2021 | Ultra Efficient Acceleration for De Novo Genome Assembly via Near-Memory Computing. Minxuan Zhou, Lingxi Wu, Muzhou Li, Niema Moshiri, Kevin Skadron, Tajana Rosing |
| 2021 | Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators. Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna |
| 2021 | Write Prediction for Persistent Memory Systems. Suyash Mahar, Sihang Liu, Korakit Seemakhupt, Vinson Young, Samira Manabi Khan |
| 2021 | X-Layer: Building Composable Pipelined Dataflows for Low-Rank Convolutions. Naveen Vedula, Reza Hojabr, Ahmad Khonsari, Arrvindh Shriraman |
| 2021 | nuKSM: NUMA-aware Memory De-duplication on Multi-socket Servers. Akash Panda, Ashish Panwar, Arkaprava Basu |