| 2022 | A data-centric optimization framework for machine learning. Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler |
| 2022 | ASAP: automatic synthesis of area-efficient and precision-aware CGRAs. Cheng Tan, Thierry Tambe, Jeff Jun Zhang, Bo Fang, Tong Geng, Gu-Yeon Wei, David Brooks, Antonino Tumeo, Ganesh Gopalakrishnan, Ang Li |
| 2022 | AnySeq/GPU: a novel approach for faster sequence alignment on GPUs. André Müller, Bertil Schmidt, Richard Membarth, Roland Leißa, Sebastian Hack |
| 2022 | Beyond time complexity: data movement complexity analysis for matrix multiplication. Wesley Smith, Aidan Goldfarb, Chen Ding |
| 2022 | Bring orders into uncertainty: enabling efficient uncertain graph processing via novel path sampling on multi-accelerator systems. Heng Zhang, Lingda Li, Hang Liu, Donglin Zhuang, Rui Liu, Chengying Huan, Shuang Song, Dingwen Tao, Yongchao Liu, Charles He, Yanjun Wu, Shuaiwen Leon Song |
| 2022 | CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression. Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li, Dingwen Tao |
| 2022 | Calipers: a criticality-aware framework for modeling processor performance. Hossein Golestani, Rathijit Sen, Vinson Young, Gagan Gupta |
| 2022 | Clairvoyant: a log-based transformer-decoder for failure prediction in large-scale systems. Khalid Ayedh Alharthi, Arshad Jhumka, Sheng Di, Franck Cappello |
| 2022 | Cloak: tolerating non-volatile cache read latency. Apostolos Kokolis, Namrata Mantri, Shrikanth Ganapathy, Josep Torrellas, John Kalamatianos |
| 2022 | Dense dynamic blocks: optimizing SpMM for processors with vector and matrix units using machine learning techniques. Serif Yesil, José E. Moreira, Josep Torrellas |
| 2022 | Dynamic memory management in massively parallel systems: a case on GPUs. Minh Pham, Hao Li, Yongke Yuan, Chengcheng Mou, Kandethody Ramachandran, Zichen Xu, Yicheng Tu |
| 2022 | Efficient exact K-nearest neighbor graph construction for billion-scale datasets using GPUs with tensor cores. Zhuoran Ji, Cho-Li Wang |
| 2022 | Efficient, out-of-memory sparse MTTKRP on massively parallel architectures. Andy Nguyen, Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Jesmin Jahan Tithi, Yongseok Soh, Teresa M. Ranadive, Fabrizio Petrini, Jee W. Choi |
| 2022 | Efficiently emulating high-bitwidth computation with low-bitwidth hardware. Zixuan Ma, Haojie Wang, Guanyu Feng, Chen Zhang, Lei Xie, Jiaao He, Shengqi Chen, Jidong Zhai |
| 2022 | Fast-track cache: a huge racetrack memory L1 data cache. Hugo Tárrega, Alejandro Valero, Vicente Lorente, Salvador Petit, Julio Sahuquillo |
| 2022 | GAPS: GPU-acceleration of PDE solvers for wave simulation. Bagus Hanindhito, Dimitrios Gourounas, Arash Fathi, Dimitar Trenev, Andreas Gerstlauer, Lizy K. John |
| 2022 | Handling heavy-tailed input of transformer inference on GPUs. Jiangsu Du, Jiazhi Jiang, Yang You, Dan Huang, Yutong Lu |
| 2022 | High throughput multidimensional tridiagonal system solvers on FPGAs. Kamalakkannan Kamalavasan, Gihan R. Mudalige, István Z. Reguly, Suhaib A. Fahmy |
| 2022 | ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28-30, 2022. Lawrence Rauchwerger, Kirk W. Cameron, Dimitrios S. Nikolopoulos, Dionisios N. Pnevmatikatos |
| 2022 | KrakenOnMem: a memristor-augmented HW/SW framework for taxonomic profiling. Taha Shahroodi, Mahdi Zahedi, Abhairaj Singh, Stephan Wong, Said Hamdioui |
| 2022 | LITE: a low-cost practical inter-operable GPU TEE. Ardhi Wiratama Baskara Yudha, Jake Meyer, Shougang Yuan, Huiyang Zhou, Yan Solihin |
| 2022 | Lifting C semantics for dataflow optimization. Alexandru Calotoiu, Tal Ben-Nun, Grzegorz Kwasniewski, Johannes de Fine Licht, Timo Schneider, Philipp Schaad, Torsten Hoefler |
| 2022 | Low overhead and context sensitive profiling of CPU-accelerated applications. Keren Zhou, Jonathon M. Anderson, Xiaozhu Meng, John M. Mellor-Crummey |
| 2022 | MASTIFF: structure-aware minimum spanning tree/forest. Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck |
| 2022 | MegTaiChi: dynamic tensor-based memory management optimization for DNN training. Zhongzhe Hu, Junmin Xiao, Zheye Deng, Mingyi Li, Kewei Zhang, Xiaoyang Zhang, Ke Meng, Ninghui Sun, Guangming Tan |
| 2022 | Optimized MPI collective algorithms for dragonfly topology. Guangnan Feng, Dezun Dong, Yutong Lu |
| 2022 | PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences. Shulai Zhang, Weihao Cui, Quan Chen, Zhengnian Zhang, Yue Guan, Jingwen Leng, Chao Li, Minyi Guo |
| 2022 | Parallel K-clique counting on GPUs. Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu |
| 2022 | Performance-detective: automatic deduction of cheap and accurate performance models. Larissa Schmid, Marcin Copik, Alexandru Calotoiu, Dominik Werle, Andreas Reiter, Michael Selzer, Anne Koziolek, Torsten Hoefler |
| 2022 | Preparing for performance analysis at exascale. Jonathon M. Anderson, Yumeng Liu, John M. Mellor-Crummey |
| 2022 | Rethinking graph data placement for graph neural network training on multiple GPUs. Shihui Song, Peng Jiang |
| 2022 | Seamless optimization of the GEMM kernel for task-based programming models. Arthur Francisco Lorenzon, Sandro Matheus V. N. Marques, Antoni C. Navarro, Vicenç Beltran |
| 2022 | SnuHPL: high performance LINPACK for heterogeneous GPUs. Jinpyo Kim, Hyungdal Kwon, Jintaek Kang, Jihwan Park, Seungwook Lee, Jaejin Lee |
| 2022 | SnuQS: scaling quantum circuit simulation using storage devices. Daeyoung Park, Heehoon Kim, Jinpyo Kim, Taehyun Kim, Jaejin Lee |
| 2022 | Software-defined floating-point number formats and their application to graph processing. Hans Vandierendonck |
| 2022 | SparseLNR: accelerating sparse tensor computations using loop nest restructuring. Adhitha Dias, Kirshanthan Sundararajah, Charitha Saumya, Milind Kulkarni |
| 2022 | Toward accelerated stencil computation by adapting tensor core unit on GPU. Xiaoyan Liu, Yi Liu, Hailong Yang, Jianjin Liao, Mingzhen Li, Zhongzhi Luan, Depei Qian |
| 2022 | Towards low-latency I/O services for mixed workloads using ultra-low latency SSDs. Mingzhe Liu, Haikun Liu, Chencheng Ye, Xiaofei Liao, Hai Jin, Yu Zhang, Ran Zheng, Liting Hu |
| 2022 | VICO: demand-driven verification for improving compiler optimizations. Sharjeel Khan, Bodhisatwa Chatterjee, Santosh Pande |
| 2022 | uiCA: accurate throughput prediction of basic blocks on recent Intel microarchitectures. Andreas Abel, Jan Reineke |