| 2021 | A performance portability framework for Python. Nader Al Awar, Steven Zhu, George Biros, Milos Gligoric |
| 2021 | A practical tile size selection model for affine loop nests. Kumudha Narasimhan, Aravind Acharya, Abhinav Baid, Uday Bondhugula |
| 2021 | A systematic approach to improving data locality across Fourier transforms and linear algebra operations. Doru-Thom Popovici, Andrew Canning, Zhengji Zhao, Lin-Wang Wang, John Shalf |
| 2021 | ALTO: adaptive linearized storage of sparse tensors. Ahmed E. Helal, Jan Laukemann, Fabio Checconi, Jesmin Jahan Tithi, Teresa M. Ranadive, Fabrizio Petrini, Jeewhan Choi |
| 2021 | AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator. Siling Yang, Weijian Chen, Xuechen Zhang, Shuibing He, Yanlong Yin, Xian-He Sun |
| 2021 | Accelerating DNNs inference with predictive layer fusion. MohammadHossein Olyaiy, Christopher Ng, Mieszko Lis |
| 2021 | An optimized tensor completion library for multiple GPUs. Ming Dun, Yunchun Li, Hailong Yang, Qingxiao Sun, Zhongzhi Luan, Depei Qian |
| 2021 | Athena: high-performance sparse tensor contraction sequence on heterogeneous memory. Jiawen Liu, Dong Li, Roberto Gioiosa, Jiajia Li |
| 2021 | ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning. Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao |
| 2021 | DSGEN: concolic testing GPU implementations of concurrent dynamic data structures. Xiaofan Sun, Rajiv Gupta |
| 2021 | Delay sensitivity-driven congestion mitigation for HPC systems. Archit Patke, Saurabh Jha, Haoran Qiu, Jim M. Brandt, Ann C. Gentile, Joe Greenseid, Zbigniew Kalbarczyk, Ravishankar K. Iyer |
| 2021 | Distributed merge forest: a new fast and scalable approach for topological analysis at scale. Xuan Huang, Pavol Klacansky, Steve Petruzza, Attila Gyulassy, Peer-Timo Bremer, Valerio Pascucci |
| 2021 | Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication. Oguz Selvitopi, Benjamin Brock, Israt Nisa, Alok Tripathy, Katherine A. Yelick, Aydin Buluç |
| 2021 | Does it matter?: OMPSanitizer: an impact analyzer of reported data races in OpenMP programs. Wenwen Wang, Pei-Hung Lin |
| 2021 | Enabling energy-efficient DNN training on hybrid GPU-FPGA accelerators. Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, Dong Li |
| 2021 | FT-BLAS: a high performance BLAS implementation with online fault tolerance. Yujia Zhai, Elisabeth Giem, Quan Fan, Kai Zhao, Jinyang Liu, Zizhong Chen |
| 2021 | FULL-W2V: fully exploiting data reuse for W2V on GPU-accelerated systems. Thomas Randall, Tyler N. Allen, Rong Ge |
| 2021 | HyQuas: hybrid partitioner based quantum circuit simulation system on GPU. Chen Zhang, Zeyu Song, Haojie Wang, Kaiyuan Rong, Jidong Zhai |
| 2021 | ICS '21: 2021 International Conference on Supercomputing, Virtual Event, USA, June 14-17, 2021. Huiyang Zhou, Jose Moreira, Frank Mueller, Yoav Etsion |
| 2021 | Inter-loop optimization in RAJA using loop chains. Brandon Neth, Thomas R. W. Scogland, Bronis R. de Supinski, Michelle Mills Strout |
| 2021 | MD-HM: memoization-based molecular dynamics simulations on big memory system. Zhen Xie, Wenqian Dong, Jie Liu, Ivy Bo Peng, Yanbao Ma, Dong Li |
| 2021 | NPBench: a benchmarking suite for high-performance NumPy. Alexandros Nikolaos Ziogas, Tal Ben-Nun, Timo Schneider, Torsten Hoefler |
| 2021 | NumaPerf: predictive NUMA profiling. Xin Zhao, Jin Zhou, Hui Guan, Wei Wang, Xu Liu, Tongping Liu |
| 2021 | Omegaflow: a high-performance dependency-based architecture. Yaoyang Zhou, Zihao Yu, Chuanqi Zhang, Yinan Xu, Huizhe Wang, Sa Wang, Ninghui Sun, Yungang Bao |
| 2021 | On the automatic parallelization of subscripted subscript patterns using array property analysis. Akshay Bhosale, Rudolf Eigenmann |
| 2021 | Optimizing large-scale plasma simulations on persistent memory-based heterogeneous memory with effective data placement across memory hierarchy. Jie Ren, Jiaolin Luo, Ivy Bo Peng, Kai Wu, Dong Li |
| 2021 | PLANAR: a programmable accelerator for near-memory data rearrangement. Adrián Barredo, Adrià Armejach, Jonathan C. Beard, Miquel Moretó |
| 2021 | PSSM: achieving secure memory for GPUs with partitioned and sectored security metadata. Shougang Yuan, Yan Solihin, Huiyang Zhou |
| 2021 | Partitioning sparse deep neural networks for scalable training and inference. Gunduz Vehbi Demirci, Hakan Ferhatosmanoglu |
| 2021 | Performance portable back-projection algorithms on CPUs: agnostic data locality and vectorization optimizations. Peng Chen, Mohamed Wahib, Xiao Wang, Shin'ichiro Takizawa, Takahiro Hirofuchi, Hirotaka Ogawa, Satoshi Matsuoka |
| 2021 | Power and energy efficient routing for Mach-Zehnder interferometer based photonic switches. Markos Kynigos, Jose Antonio Pascual, Javier Navaridas, John Goodacre, Mikel Luján |
| 2021 | ProMT: optimizing integrity tree updates for write-intensive pages in secure NVMs. Mazen Al-Wadi, Aziz Mohaisen, Amro Awad |
| 2021 | Proxima: accelerating the integration of machine learning in atomistic simulations. Yuliana Zamora, Logan T. Ward, Ganesh Sivaraman, Ian T. Foster, Henry Hoffmann |
| 2021 | Sandslash: a two-level framework for efficient graph pattern mining. Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Loc Hoang, Keshav Pingali |
| 2021 | SumMerge: an efficient algorithm and implementation for weight repetition-aware DNN inference. Rohan Baskar Prabhakar, Sachit Kuhar, Rohit Agrawal, Christopher J. Hughes, Christopher W. Fletcher |
| 2021 | Task-graph scheduling extensions for efficient synchronization and communication. Seonmyeong Bak, Oscar R. Hernandez, Mark Gates, Piotr Luszczek, Vivek Sarkar |
| 2021 | ThundeRiNG: generating multiple independent random number sequences on FPGAs. Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong |
| 2021 | Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation. Khaled Abdelaal, Martin Kong |
| 2021 | Topology-aware optimizations for multi-GPU ptychographic image reconstruction. Xiaodong Yu, Tekin Bicer, Rajkumar Kettimuthu, Ian T. Foster |
| 2021 | μSteal: a theory-backed framework for preemptive work and resource stealing in mixed-criticality microservices. Amirhossein Mirhosseini, Thomas F. Wenisch |