ICPP B

122 papers

Year Title / Authors
2024 A Distributed Framework for Subgraph Isomorphism Leveraging CPU and GPU Heterogeneous Computing.
Chen Chen, Li Shen, Yingwen Chen
2024 A Hybrid Machine Learning Method for Cross-Platform Performance Prediction of Parallel Applications.
Kaveh Mahdavi
2024 A Motion Trace Decomposition-based overset grid method for parallel CFD simulations with moving boundaries.
Ran Zhao, Chao Li, Xiaowei Guo, Sen Zhang, Xi Yang, Tao Tang, Canqun Yang
2024 AUTOHET: An Automated Heterogeneous ReRAM-Based Accelerator for DNN Inference.
Tong Wu, Shuibing He, Jianxin Zhu, Weijian Chen, Siling Yang, Ping Chen, Yanlong Yin, Xuechen Zhang, Xian-He Sun, Gang Chen
2024 Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures.
Yongseok Soh, Ramakrishnan Kannan, Piyush Sao, Jee W. Choi
2024 Achieving Efficient Scheduling based on Accurate Measurement of Small Flows in Data Center.
Jiawei Huang, Qile Wang, Zhaoyi Li, Yijun Li, Zihao Chen, Sitan Li, Jing Shao, Jingling Liu, Min Zhan, Jianxin Wang
2024 Achieving High Efficiency for Datacenter Multicast using Skewed Bloom Filter.
Jiawei Huang, Zihao Chen, Yiting Wang, Hui Li, Zhaoyi Li, Qile Wang, Sitan Li, Zhidong He, Wanchun Jiang
2024 AdCoalescer: An Adaptive Coalescer to Reduce the Inter-Module Traffic in MCM-GPUs.
Xu Zhang, Guangda Zhang, Lu Wang, Shiqing Zhang, Xia Zhao
2024 Arlo: Serving Transformer-based Language Models with Dynamic Input Lengths.
Xin Tan, Jiamin Li, Yitao Yang, Jingzong Li, Hong Xu
2024 AutoPipe: Automatic Configuration of Pipeline Parallelism in Shared GPU Cluster.
Jinbin Hu, Ying Liu, Hao Wang, Jin Wang
2024 BandSlim: A Novel Bandwidth and Space-Efficient KV-SSD with an Escape-from-Block Approach.
Junhyeok Park, Chang-Gyu Lee, Soon Hwang, Soonyeal Yang, Jungki Noh, Woosuk Chung, Junghee Lee, Youngjae Kim
2024 Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning.
Zichen Tang, Junlin Huang, Rudan Yan, Yuxin Wang, Zhenheng Tang, Shaohuai Shi, Amelie Chi Zhou, Xiaowen Chu
2024 Bitmap-Based Sparse Matrix-Vector Multiplication with Tensor Cores.
Yuang Chen, Jeffrey Xu Yu
2024 BoostN: Optimizing Imbalanced Neighborhood Communication on Homogeneous Many-Core System.
Haopeng Huang, Yuyang Jin, Wei Xue
2024 BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs.
Mahesh Lakshminarasimhan, Mary W. Hall, Samuel Williams, Oscar Antepara
2024 CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU.
Jihu Guo, Rui Xia, Jie Liu, Xiaoxiong Zhu, Xiang Zhang
2024 CIM-KF: Efficient Computing-in-memory Circuits for Full-Process Execution of Kalman Filter Algorithm.
Pingdan Xiao, Qinghui Hong, Sichun Du, Jiliang Zhang
2024 CR2: Community-aware Compressed Regular Representation for Graph Processing on a GPU.
Shinnung Jeong, Sungjun Cho, Yongwoo Lee, Hyunjun Park, Seonyeong Heo, Gwangsun Kim, Youngsok Kim, Hanjun Kim
2024 Cache Line Pinning for Mitigating Row Hammer Attack.
Praseetha M, Madhu Mutyam, Venkata Kalyan Tavva
2024 ChronusFed: Reinforcement-Based Adaptive Partial Training for Heterogeneous Federated Learning.
Fuyuan Xia, Chenhao Ying, David S. L. Wei, Wei Chen, Weiting Zhang, Haiming Jin, Yuan Luo
2024 Co-Design of Convolutional Algorithms and Long Vector RISC-V Processors for Efficient CNN Model Serving.
Sonia Rani Gupta, Nikela Papadopoulou, Jing Chen, Miquel Pericàs
2024 Coupling Congestion Control and Flow Pausing in Data Center Network.
Jiawei Huang, Shengwen Zhou, Zhaoyi Li, Yijun Li, Zihao Chen, Xiaojun Zhu, Jing Shao, Sitan Li, Wanchun Jiang, Jianxin Wang, Ping Zhong
2024 DB-SpGEMM: A Massively Distributed Block-Sparse Matrix-Matrix Multiplication for Linear-Scaling DFT Calculations.
Zhong Zheng, Junshi Chen, Yang Zhao, Longsheng Song, Xinming Qin, Hong An
2024 DPC: DPU-accelerated High-Performance File System Client.
Kan Zhong, Zhiwang Yu, Qiao Li, Xianqiang Luo, Linbo Long, Yujuan Tan, Ao Ren, Duo Liu
2024 DeInfer: A GPU resource allocation algorithm with spatial sharing for near-deterministic inferring tasks.
Yingwen Chen, Wenxin Li, Huan Zhou, Xiangrui Yang, Yanfei Yin
2024 Designing Non-uniform Locally Repairable Codes for Wide Stripes under Skewed File Accesses.
Guantian Lin, Si Wu, Cheng Li, Yinlong Xu
2024 Detailed Analysis and Optimization of Irregular-Shaped Matrix Multiplication on Multi-Core DSPs.
Haotian Mo, Qinglin Wang, Linyu Liao, Biao Li, Lihua Chi, Jie Liu
2024 DiStore: A Fully Memory Disaggregation Friendly Key-Value Store with Improved Tail Latency and Space Efficiency.
Ziwei Xiong, Dejun Jiang, Jin Xiong
2024 Diminishing cold starts in serverless computing with approximation algorithms.
Tomasz Kanas, Krzysztof Rzadca
2024 Dissecting Convolutional Neural Networks for Runtime and Scalability Prediction.
Tim Beringer, Jakob Stock, Arya Mazaheri, Felix Wolf
2024 Distributed Minimax Fair Optimization over Hierarchical Networks.
Wen Xu, Juncheng Wang, Ben Liang, Gary Boudreau, Hamza Umit Sokun
2024 Enabling Performance Observability for Heterogeneous HPC Workflows with SOMA.
Dewi Yokelson, Mikhail Titov, Srinivasan Ramesh, Ozgur O. Kilic, Matteo Turilli, Shantenu Jha, Allen D. Malony
2024 Enhancing Heterogeneous Computing Through OpenMP and GPU Graph.
Chenle Yu, Sara Royuela, Eduardo Quiñones
2024 Evaluating and optimising compiler code generation for NVIDIA Grace.
Ricardo Jesus, Michèle Weiland
2024 Exploring Scalability in C++ Parallel STL Implementations.
Ruben Laso, Diego Krupitza, Sascha Hunold
2024 Extending Segment Tree for Polygon Clipping and Parallelizing using OpenMP and OpenACC Directives.
Buddhi Ashan Mallika Kankanamalage, Satish Puri, Sushil K. Prasad
2024 FNCC: Fast Notification Congestion Control in Data Center Networks.
Jing Xu, Zhan Wang, Fan Yang, Ning Kang, Zhenlong Ma, Guojun Yuan, Guangming Tan, Ninghui Sun
2024 FP16 Acceleration in Structured Multigrid Preconditioner for Real-World Applications.
Yi Zong, Peinan Yu, Haopeng Huang, Wei Xue
2024 Fast Leiden Algorithm for Community Detection in Shared Memory Setting.
Subhajit Sahu, Kishore Kothapalli, Dip Sankar Banerjee
2024 FedCA: Efficient Federated Learning with Client Autonomy.
Na Lv, Zhi Shen, Chen Chen, Zhifeng Jiang, Jiayi Zhang, Quan Chen, Minyi Guo
2024 FedClust: Tackling Data Heterogeneity in Federated Learning through Weight-Driven Client Clustering.
Md Sirajul Islam, Simin Javaherian, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng
2024 Federated Edge Learning with Blurred or Pseudo Data Sharing.
Yinlong Li, Hao Zhang, Siyao Cheng, Jie Liu
2024 FlatDD: A High-Performance Quantum Circuit Simulator using Decision Diagram and Flat Array.
Shui Jiang, Rongliang Fu, Lukas Burgholzer, Robert Wille, Tsung-Yi Ho, Tsung-Wei Huang
2024 FlexSP: (1 + β)-Choice based Flexible Stream Partitioning for Stateful Operators.
Siyuan Chen, Decheng Zuo, Zhan Zhang
2024 FreeStencil: A Fine-Grained Solver Compiler with Graph and Kernel Optimizations on Structured Meshes for Modern GPUs.
Qianchao Zhu
2024 GMM: An Efficient GPU Memory Management-based Model Serving System for Multiple DNN Inference Models.
XinYu Piao, Jong-Kook Kim
2024 GNNDrive: Reducing Memory Contention and I/O Congestion for Disk-based GNN Training.
Qisheng Jiang, Lei Jia, Chundong Wang
2024 GPU Algorithms for Fastest Path Problem in Temporal Graphs.
Mithinti Srikanth, Prashant Singh, G. Ramakrishna
2024 GSAP: A GPU-Accelerated Stochastic Graph Partitioner.
Chih-Chun Chang, Boyang Zhang, Tsung-Wei Huang
2024 Gradient Free Personalized Federated Learning.
Haoyu Chen, Yuxin Zhang, Jin Zhao, Xin Wang, Yuedong Xu
2024 HASFL: Harnessing Heterogeneous Models Across Diverse Devices for Enhanced Federated Learning.
Jiangshan Hao, Fang Dong, Bingheng Cen, Shucun Fu, Ruiting Zhou, Ding Ding
2024 HMT: A Hybrid Mitigating and Transferring Approach on I/O Throughput Degradation for Erasure Coded Storage Systems.
Piao Hu, Huangzhen Xue, Chentao Wu, Jie Li, Minyi Guo
2024 HStream: A hierarchical data streaming engine for high-throughput scientific applications.
Jaime Cernuda, Jie Ye, Anthony Kougkas, Xian-He Sun
2024 Hardware Acceleration of Minimap2 Genomic Sequence Alignment Algorithm.
Jie Cheng, Lifu Hu, Wei Xu, Hanhua Chen, Tian Xia
2024 Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper.
Gabin Schieffer, Jacob Wahlgren, Jie Ren, Jennifer Faj, Ivy Peng
2024 Hi-ZNS: High Space Efficiency and Zero-Copy LSM-Tree Based Stores on ZNS SSDs.
Renping Liu, Junhua Chen, Peng Chen, Linbo Long, Anping Xiong, Duo Liu
2024 High-Performance 3D convolution on the Latest Generation Sunway Processor.
Jialin Li, Zhichen Feng, Yaqian Gao, Shaobo Tian, Haoyuan Zhang, Huang Ye, Jian Zhang
2024 High-Performance Sorting-Based K-mer Counting in Distributed Memory with Flexible Hybrid Parallelism.
Yifan Li, Giulia Guidi
2024 High-Performance, Accurate Large-Scale Quantum Chemistry Calculations on GPU Supercomputers using Coulomb-Perturbed Fragmentation.
Fazeleh S. Kazemian, Jorge L. Galvez Vallejo, Giuseppe M. J. Barca
2024 Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment.
Fei Yang, Shuang Peng, Ning Sun, Fangyu Wang, Yuanyuan Wang, Fu Wu, Jiezhong Qiu, Aimin Pan
2024 HyperDB: a Novel Key Value Store for Reducing Background Traffic in Heterogeneous SSD Storage.
Ruisong Zhou, Yuzhan Zhang, Chunhua Li, Ke Zhou, Peng Wang, Gong Zhang, Ji Zhang, Guangyu Zhang
2024 IMI: In-memory Multi-job Inference Acceleration for Large Language Models.
Bin Gao, Zhehui Wang, Zhuomin He, Tao Luo, Weng-Fai Wong, Zhi Zhou
2024 Im2col-Winograd: An Efficient and Flexible Fused-Winograd Convolution for NHWC Format on GPUs.
Zhiyi Zhang, Pengfei Zhang, Zhuopin Xu, Bingjie Yan, Qi Wang
2024 Improving Performance on Replica-Exchange Molecular Dynamics Simulations by Optimizing GPU Core Utilization.
Taisuke Boku, Masatake Sugita, Ryohei Kobayashi, Shinnosuke Furuya, Takuya Fujie, Masahito Ohue, Yutaka Akiyama
2024 Improving efficiency of Monte Carlo method via code intrinsic framework.
Qifeng Pan, Ralf Schneider
2024 In-Situ Binary Segmentation of 3D time-dependent Flows into Laminar and Turbulent Regions.
Jiahui Liu, Tobias Edwards, Kristina Durovic, Philipp Schlatter, Tino Weinkauf
2024 Jigsaw: Accelerating SpMM with Vector Sparsity on Sparse Tensor Core.
Kaige Zhang, Xiaoyan Liu, Hailong Yang, Tianyu Feng, Xinyu Yang, Yi Liu, Zhongzhi Luan, Depei Qian
2024 Kanva: A Lock-free Learned Search Data Structure.
Gaurav Bhardwaj, Bapi Chatterjee, Abhinav Sharma, Sathya Peri, Siddharth Nayak
2024 Large-scale Phase-Field Simulations for Solid-Solid Phase Transformations involving Elastic Energy.
Yaqian Gao, Jian Zhang, Huang Ye, Xuebin Chi
2024 LpaqHP: A High-Performance FPGA Accelerator for LPAQ Compression.
Weilin Zhu, Wei Tong, Hujun Ge, Zuoxian Zhang, Mengran Zhang, Wen Zhou
2024 MIGER: Integrating Multi-Instance GPU and Multi-Process Service for Deep Learning Clusters.
Bowen Zhang, Shuxin Li, Zhuozhao Li
2024 Mapping Large Memory-constrained Workflows onto Heterogeneous Platforms.
Svetlana Kulagina, Henning Meyerhenke, Anne Benoit
2024 Massively Parallel Inverse Block-sorting Transforms for bzip2 Decompression on GPUs.
André Weißenberger, Bertil Schmidt
2024 Multi-level Load Balancing Strategies for Massively Parallel Smoothed Particle Hydrodynamics Simulation.
Yi Zhang, Ziyu Zhang, Yang Zhao, Junshi Chen, Hong An, Zhanming Wang, Longkui Chen
2024 Murmuration: On-the-fly DNN Adaptation for SLO-Aware Distributed Inference in Dynamic Edge Environments.
Jieyu Lin, Minghao Li, Sai Qian Zhang, Alberto Leon-Garcia
2024 Nebula: An Edge-Cloud Collaborative Learning Framework for Dynamic Edge Environments.
Yan Zhuang, Zhenzhe Zheng, Yunfeng Shao, Bingshuai Li, Fan Wu, Guihai Chen
2024 NetSmith: An Optimization Framework for Machine-Discovered Network Topologies.
Conor James Green, Mithuna Thottethodi
2024 OP-PIC - an Unstructured-Mesh Particle-in-Cell DSL for Developing Nuclear Fusion Simulations.
Zaman Lantra, Steven A. Wright, Gihan R. Mudalige
2024 Online Non-preemptive Multi-Resource Scheduling for Weighted Completion Time on Multiple Machines.
Donney Fan, Ben Liang
2024 Online Scheduling and Pricing for Multi-LoRA Fine-Tuning Tasks.
Ying Zheng, Lei Jiao, Han Yang, Lulu Chen, Ying Liu, Yuxiao Wang, Yuedong Xu, Xin Wang, Zongpeng Li
2024 Optimizing SpMV on Heterogeneous Multi-Core DSPs through Improved Locality and Vectorization.
Deshun Bi, Shengguo Li, Dezun Dong, Peng Zhang, Jianbin Fang
2024 Optimizing Stencil Computation on Multi-core DSPs.
Fugeng Zhu, Xinxin Qi, Peng Zhang, Jianbin Fang, Tao Tang, Yonggang Che, Kainan Yu, Jing Xie, Chun Huang, Jie Ren
2024 Optimizing a Super-Fast Eigensolver for Hierarchically Semiseparable Matrices.
Abhishek V. N. Taraka Josyula, Pritesh Verma, Amar Gaonkar, Amlan Barua, Nikhil Hegde
2024 PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU.
Piyush Sao, Andrey Prokopenko, Damien Lebrun-Grandié
2024 PASCI: A Scalable Framework for Heterogeneous Parallel Calculation of Dynamical Electron Correlation.
Runfeng Jin, Wenhao Liang, Haoyuan Zhang, Yinxuan Song, Zhen Luo, Haibo Ma, Yingjin Ma, Zhong Jin
2024 PREACT: Predictive Resource Allocation for Bursty Workloads in a Co-located Data Center.
Dingyu Yang, Ziyang Xiao, Dongxiang Zhang, Shuhao Zhang, Jian Cao, Gang Chen
2024 PRoof: A Comprehensive Hierarchical Profiling Framework for Deep Neural Networks with Roofline Analysis.
Siyu Wu, Hailong Yang, Xin You, Ruihao Gong, Yi Liu, Zhongzhi Luan, Depei Qian
2024 Parallel Iterative Mistake Minimization (IMM) clustering algorithm for shared-memory systems.
Wojciech Kwedlo
2024 Parallel Optimization for Accelerating the Generation of Correctly Rounded Elementary Functions.
Xianglin Wang, Xin Yi, Hengbiao Yu, Chun Huang, Lin Peng
2024 Parallel Task Scheduling in Autonomous Robotic Systems: An Event-Driven Multimodal Prediction Approach.
Wen Gao, Zhiwen Yu, Hui Xiong, Bin Guo, Liang Wang, Yuan Yao
2024 Parallelization of the Banded Needleman & Wunsch Algorithm on UPMEM PiM Architecture for Long DNA Sequence Alignment.
Meven Mognol, Dominique Lavenier, Julien Legriel
2024 PheCon: Fine-Grained VM Consolidation with Nimble Resource Defragmentation in Public Cloud Platforms.
Jiazhen Zhu, Wenda Tang, Xianglong Meng, Nan Gong, Tianxiang Ai, Guanghui Li, Bin Yu, Xin Yang
2024 Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-tuning.
Bei Ouyang, Shengyuan Ye, Liekang Zeng, Tianyi Qian, Jingyi Li, Xu Chen
2024 Proceedings of the 53rd International Conference on Parallel Processing, ICPP 2024, Gotland, Sweden, August 12-15, 2024
2024 RIA: Return on Investment Auto-scaler for Serverless Edge Functions.
Huadong Li, Hui Liu, Aoqi Chen, Xirui Ma, Qiaoqiao Liu, Junzhao Du
2024 RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access Applications.
Simon Schwitanski, Yussur Mustafa Oraji, Cornelius Pätzold, Joachim Jenke, Felix Tomski, Matthias S. Müller
2024 ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNNs.
Mohammad Sabri Abrebekoh, Marc Riera Villanueva, Antonio González
2024 Rethinking Low-Carbon Edge Computing System Design with Renewable Energy Sharing.
Hanlong Liao, Guoming Tang, Deke Guo, Yi Wang, Ruide Cao
2024 Rethinking Personalized Federated Learning from Knowledge Perspective.
Dezhong Yao, Ziquan Zhu, Tongtong Liu, Zhiqiang Xu, Hai Jin
2024 Revisiting Learned Index with Byte-addressable Persistent Storage.
Rui Zhang, Yukai Huang, Sicheng Liang, Shangyi Sun, Shaonan Ma, Chengying Huan, Lulu Chen, Zhihui Lu, Yang Xu, Ming Yan, Jie Wu
2024 RoDMap: A Reserve-on-Demand Mapper for Spatially-Configured Coarse-Grained Reconfigurable Arrays.
Kyle Zhao Bin Chen, Tarek S. Abdelrahman, Reza Azimi, Tomasz S. Czajkowski, Maziar Goudarzi
2024 SIndex: An SSD-based Large-scale Indexing with Deterministic Latency for Cloud Block Storage.
Shucheng Wang, Kaiye Zhou, Zhandong Guo, Qiang Cao, Jun Xu, Jie Yao
2024 SPHINX: Search Space-Pruning Heterogeneous Task Scheduling for Deep Neural Networks.
Bowen Yuchi, Heng Shi, Guoqing Bao
2024 SaSpGEMM: Sorting-Avoiding Sparse General Matrix-Matrix Multiplication on Multi-Core Processors.
Chuhe Hong, Qinglin Wang, Runzhang Mao, Yuechao Liang, Rui Xia, Jie Liu
2024 Scheduling Machine Learning Compressible Inference Tasks with Limited Energy Budget.
Tiago Da Silva Barros, Davide Ferré, Frédéric Giroire, Ramon Aparicio-Pardo, Stephane Perennes
2024 Scratchpad Memory Management for Deep Learning Accelerators.
Stavroula Zouzoula, Mohammad Ali Maleki, Muhammad Waqar Azhar, Pedro Trancoso
2024 Selective Memory Compression for GPU Memory Oversubscription Management.
Abdun Nihaal, Madhu Mutyam
2024 Significantly Improving Fixed-Ratio Compression Framework for Resource-limited Applications.
Tri Nguyen, Md Hasanur Rahman, Sheng Di, Michela Becchi
2024 Sparse Gradient Communication with AlltoAll for Accelerating Distributed Deep Learning.
Jing Peng, Zihan Li, Shaohuai Shi, Bo Li
2024 Sparsity-Aware Communication for Distributed Graph Neural Network Training.
Ujjaini Mukhopadhyay, Alok Tripathy, Oguz Selvitopi, Katherine A. Yelick, Aydin Buluç
2024 SpeedCore: Space-efficient and Dependency-aware GPU Parallel Framework for Core Decomposition.
Chen Zhao, Ting Yu, Zhigao Zheng, Yuanyuan Zhu, Song Jin, Bo Du, Dacheng Tao
2024 SuperCSR: A Space-Time-Efficient CSR Representation for Large-scale Graph Applications on Supercomputers.
Xinbiao Gan, Tiejun Li, Qiang Zhang, Bo Yang, Xinhai Chen, Jie Liu
2024 SyncMalloc: A Synchronized Host-Device Co-Management System for GPU Dynamic Memory Allocation across All Scales.
Jiajian Zhang, Fangyu Wu, Hai Jiang, Guangliang Cheng, Genlang Chen, Qiufeng Wang
2024 TESLA: Thermally Safe, Load-Aware, and Energy-Efficient Cooling Control System for Data Centers.
Hanfei Geng, Yi Sun, Yuanzhe Li, Jichao Leng, Xiangyu Zhu, Xianyuan Zhan, Yuanchun Li, Feng Zhao, Yunxin Liu
2024 TeMCO: Tensor Memory Compiler Optimization across Tensor Decompositions in Deep Learning Inference.
Seungbin Song, Ju Min Lee, Haeeun Jeong, Hyunho Kwon, Shinnung Jeong, Jaeho Lee, Hanjun Kim
2024 Thawbringer: An Orchestrator to Mitigate Cascading Cold Starts of Serverless Function Chains.
Huadong Li, Hui Liu, Aoqi Chen, Xirui Ma, Junzhao Du
2024 The Blind and the Elephant: A Preference-aware Edge Video Analytics Scheduler for Maximizing System Benefit.
Liang Zhang, Hongzi Zhu, Yunzhe Li, Jiangang Shen, Minyi Guo
2024 The Case for Co-Designing Model Architectures with Hardware.
Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda
2024 Viper: A High-Performance I/O Framework for Transparently Updating, Storing, and Transferring Deep Neural Network Models.
Jie Ye, Jaime Cernuda, Neeraj Rajesh, Keith Bateman, Orcun Yildiz, Tom Peterka, Arnur Nigmetov, Dmitriy Morozov, Xian-He Sun, Anthony Kougkas, Bogdan Nicolae
2024 VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing.
Jaebeom Jeon, Minseong Gil, Junsu Kim, Jaeyong Park, Gunjae Koo, Myung Kuk Yoon, Yunho Oh
2024 Yggdrasil: Reducing Network I/O Tax with (CXL-Based) Distributed Shared Memory.
Wenda Tang, Ying Han, Tianxiang Ai, Guanghui Li, Bin Yu, Xin Yang
2024 zQoS: Unleashing full performance capabilities of NVMe SSDs while enforcing SLOs in distributed storage systems.
Liuying Ma, Zhenqing Liu, Jin Xiong, Yue Wu, Renhai Chen, Xi Peng, Ying Zhang, Gong Zhang, Dejun Jiang