| 2025 | 3D Scene Graph Generation with Cross-Modal Alignment and Adversarial Learning. Yujun Hu, Xiaoyu Zhou, Changbo Wang, Weiliang Meng, Gaoqi He |
| 2025 | A Coarse-to-Fine Matching Method for Reference-based Image Deraining. Fei Yuan, Xin Wen, Ang Zhao, Wenbo Ning, Chenchen Zhang, Rui Cao |
| 2025 | A Cooperative Safety-Enhanced Control Framework for Driving Assistance in the Internet of Vehicles. Chen Huang, Yan Zhang, Chao Yang, Zhifei Li, Kui Xiao, Miao Zhang, Wenxin Huang, Cheng Zeng, Hao Chen, Jianhua Song, Shihui Wang, Xian Zhong, Haobo Ma |
| 2025 | A Dual Coupled Feature Pyramid for Traditional Paintings Inpainting with Multi-level Semantic Filtering. Biao Yang, Zihan Chen, Yi Zhang |
| 2025 | A Frequency-Based Approach for Federated Domain Generalization in Heterogeneous Medical Imaging. Donghao Wang, Yingchun Cui, Zhengda Wu, Heran Xi, Jinghua Zhu |
| 2025 | A Generic Framework for Evaluating Gaze Representations for Gaze Estimation. Xinyu Lin, Buyu Liu, Suguo Zhu, Jun Bao |
| 2025 | A Multi-Stream Visual-Spectral-Spatial Adaptive Hyperspectral Object Tracking. Pengfei Wei, Qiao Liu, Zhenyu He, Di Yuan |
| 2025 | A Prior Representation-Guided Method for Low-Resolution Human Pose Estimation. Mengting Jiang, Xiaoqi An, Yang Gao, Yalong Xu, Di Wang, Lin Zhao |
| 2025 | A RAG Approach for Multi-Modal Open-ended Lifelog Question-Answering. Quang-Linh Tran, Ngo Ngoc Diep Pham, Quoc Trung Truong, Minh Hung Nguyen, Hong Cat Le, Dang Khoi Vu, Van Minh Thien Nguyen, Van Kinh Nguyen, Luu Phuong Ngoc Lam Nguyen, Tan Le, Minh Phuc Dang, Binh T. Nguyen, Gareth J. F. Jones, Cathal Gurrin |
| 2025 | A Transformer-Based Multimodal Framework for Hidden Emotion Recognition through Micro-Expression and EEG Fusion. Chuang Ma, Shaokai Zhao, Yu Pei, Liang Xie, Erwei Yin |
| 2025 | A Video Frame Interpolation Framework Based on Channel and Token Mixing. Jianchao Wang, Yongqiang Gao |
| 2025 | AGGA-MVFLN: Multivariate Time Series Forecasting via Adaptive Generalized Graph Accompanied with Multi-View Learning in Frequency Domain. Jierui Lei, Fangzheng Chen, Haina Tang |
| 2025 | AI Got Your Tongue? Analysing the Sounds of Audio Deepfake Generation Methods. Karla Schäfer |
| 2025 | ALVG: Training High-Quality Multi-modal Fusion Modules for Visual Grounding with Attention Loss. Sicheng Yang, Rongwei Yu |
| 2025 | Adaptive Agent Semantic Aggregation Network for Multimodal Sentiment Analysis. Yue Su, Xuying Zhao |
| 2025 | Adaptive Asymmetric Online Hashing for Cross-Modal Retrieval. Yuhao Liu, Yanbo Zhang, Hao Fu, Guanghua Gu |
| 2025 | Adaptive Hypergraph-Based 3D Multi-Person Pose Estimation Method for Intangible Cultural Heritage Dance Videos. Xingquan Cai, Xiaoyu Wang, Kaijie Qu, Mengrui Dai, Ying Li |
| 2025 | Adaptive Social Bot Detection through Bridging the Feature Bias Between Source and Target Users. Hao Sun, Huailiang Peng, Yanan Cao, Qiong Dai, Xu Bai |
| 2025 | Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion. Huiyan Qi, Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Ee-Peng Lim |
| 2025 | Adversarial Masked Graph Autoencoders for Improved Graph Representation Learning. Yulan Hu, Zhirui Yang, Sheng Ouyang, Yong Liu |
| 2025 | AgentStory: A Multi-Agent System for Story Visualization with Multi-Subject Consistent Text-to-Image Generation. Tianchen Zhou, Zhongjie Duan, Cen Chen, Wenmeng Zhou, Yanhao Wang, Yaliang Li |
| 2025 | Aligning Large Multimodal Model with Sequential Recommendation via Content-Behavior Guidance. Zihao Wu, Xin Wang, Heng Chang, Hong Chen, Lifeng Sun, Wenwu Zhu |
| 2025 | Alternating Guided Training for Robust Adversarial Defense. Xinlei Liu, Chunlai Ma, Bo Chen, Tao Hu, Hailong Ma, Peng Yi, Yiming Jiang, Yuxiang Hu |
| 2025 | An Explainable Machine Learning Approach for Cognitive Load Detection in Virtual Reality Using Eye Tracking Data. Hong Gao, Yapeng Gao, Enkelejda Kasneci |
| 2025 | AnchorTalk: High-Fidelity Upper-Body Talking Human Generation From Speech. Yali Cai, Peng Qiao, Dongsheng Li |
| 2025 | ArtNVG: Content-Style Separated Artistic Neighboring-View Gaussian Stylization. Zixiao Gu, Zhenye Zhang, Mengtian Li, Zhongxia Ji, Ruhua Chen, Zuo Hu, Guangnan Ye |
| 2025 | Assisted Refinement Network Based on Channel Information Interaction for Camouflaged Object Detection. Kuan Wang, Xiuhong Li, Yulong Bai, Songlin Li, Mengge Lu, Zhenhong Jia |
| 2025 | Attentive Multi-Kernel Feature Aggregation Network for Cross-View Geo-Localization. Shuheng Huang, Deyong Wu, Jinliang Lin, Lei Peng, Zhiming Luo |
| 2025 | Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning. Yifan Xie, Fei Ma, Yi Bin, Ying He, Fei Yu |
| 2025 | Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos. Riku Takahashi, Ryugo Morita, Jinjia Zhou |
| 2025 | BMUNet: When Pixel-Wise Precision Meets Global Context Dependency. Sizhe Yang, Yutao Qin, Wei Ren |
| 2025 | BRepFormer: Transformer-Based B-rep Geometric Feature Recognition. Yongkang Dai, Xiaoshui Huang, Yunpeng Bai, Hao Guo, Hongping Gan, Ling Yang, Yilei Shi |
| 2025 | BlockIQA: Local Sensitivity-Enhanced Blind Image Quality Assessment through Deep Block Analysis. Yuqi Pang, Yican Liu, Zhiqi Lin, Delu Zeng |
| 2025 | Bottom-Up and Top-Down Thoughts for Visual Intention Grounding. Kangcheng Liu, Junbin Xiao, Rui Zhang, Hanqi Lv, Zidong Du |
| 2025 | Bridging the Gap Between Semantic and User Preference Spaces for Multi-modal Music Representation Learning. Xiaofeng Pan, Jing Chen, Haitong Zhang, Menglin Xing, Jiayi Wei, Xuefeng Mu, Zhongqian Xie |
| 2025 | CABRec: A Category-Aware Bundle Recommendation Model. Mengmeng Li, Jinlong Tian, Hongmei Li, Qiyuan Zhang, Xianglong Li, Xinhai Xu |
| 2025 | CCDCNet: Cross-Modal Change Detection CNN for Flood Mapping. Yacheng Li, Juan Luo, Kexuan Feng, Shuyang Teng, Ying Qiao |
| 2025 | CEFSW'25: The 2nd Collaboration and Evolution of Foundation and Specialized Models Workshop. Shengyu Zhang, Fan Yao, Yujie Lu, Chaoyue Niu, Hongxia Yang, Fan Wu, Fei Wu |
| 2025 | CFSynthesis: Controllable and Free-view 3D Human Video Synthesis. Liyuan Cui, Xiaogang Xu, Wenqi Dong, Zesong Yang, Hujun Bao, Zhaopeng Cui |
| 2025 | CMAD-UNet: UNet-Driven RGB-D Salient Object Detection with Cross-Modal Consistency and Aggregative Decoding. Qi Xu, Zhaozhao Su, Zhaoru Guo, Yongming Li, Liejun Wang, Panpan Zheng |
| 2025 | CMS-YOLO for Small-Scale UAV Detection. Jun Yan Zhu, Bo Wen Yang, Yizhe Luo, Haoran Zhang, Shuo Feng, Zhao Jin, Yucheng Shi, Ming Liang Xu |
| 2025 | COMAE: COMprehensive Attribute Exploration for Zero-shot Hashing. Yuqi Li, Qingqing Long, Yihang Zhou, Ran Zhang, Zhiyuan Ning, Zhihong Zhu, Yuanchun Zhou, Xuezhi Wang, Meng Xiao |
| 2025 | ClearView: A Quality-aware Cross-modal Alignment Framework for CT Report Generation. Qingyong Su, Chong Feng, Bo Wang, Ge Shi, Yan Zhuang |
| 2025 | ClothHMR: 3D Mesh Recovery of Humans in Diverse Clothing from Single Image. Yunqi Gao, Leyuan Liu, Yuhan Li, Changxin Gao, Yuanyuan Liu, Jingying Chen |
| 2025 | CoATA: Effective Co-Augmentation of Topology and Attribute for Graph Neural Networks. Tao Liu, Longlong Lin, Yunfeng Yu, Xi Ou, Youan Zhang, Zhiqiu Ye, Tao Jia |
| 2025 | Collaborative Cross-Complementary Unfolding Network for Pan-sharpening Remote Sensing Image. Honghui Xu, Yan Li, Yutao Jia, Chuangjie Fang, Wanjun Chen, Jianwei Zheng |
| 2025 | Composed Query-Based Event Retrieval in Video Corpus with Multimodal Episodic Perceptron. Fan Ni, Xun Jiang, Hao Yang, Chong Peng, Peng Yan, Zheng Wang, Fumin Shen, Xing Xu |
| 2025 | Con2Diff: Controllable Condition Diffusion Model for Unsupervised Anomaly Detection. Zhipeng Wang, Yonghong Song |
| 2025 | Concept Drift Guided LayerNorm Tuning for Efficient Multimodal Metaphor Identification. Wenhao Qian, Zhenzhen Hu, Zijie Song, Jia Li |
| 2025 | Consistent Human Animation with Pseudo Multi-View Anchoring and Cross-Granularity Integration. Jintai Wang, Yinglin Zheng, Pengfei Liu, Qifeng Dai, Ming Zeng |
| 2025 | Contextual Reasoning for Robust Composed Image Retrieval with Vision-Language Models. Peng Gao, Yujian Lee, Xubo Liu, Hui Zhang, Zailong Chen, Yiyang Hu, Guquan Jing, Yunting Lai |
| 2025 | Contrastive Single-Stream Spatio-Temporal Joint Modeling for Few-Shot Action Recognition. Xingyang Xu, Jixiang Du, Jing Wang, Hongbo Zhang, Qing Lei, Lijing Ye, Jiayu Xiong |
| 2025 | Core Inter-Category Contrastive Learning for Enhancing Robustness of Caries Classification. Peiliang Zhang, Yaru Chen, Yunjiong Liu, Chao Che, Yongjun Zhu |
| 2025 | CrackMamba with Normalized Soft-Frangi-Filter Enhancement towards Accurate Crack Segmentation. Wanqiang Cai, Xudong Wang, Yifan Xue, Yingyao Ma, Jiasong Wu, Zongyuan Ge, Bin Wang |
| 2025 | CrossHand: Multimodal 3D Hand Reconstruction via Vision and Wearable Sensor Data Fusion. Lin Song, Guanya Zhou, Changyunkun Xiao, Qiyu Jiang, Daquan Yang, He Yu |
| 2025 | DARTer: Dynamic Adaptive Representation Tracker for Nighttime UAV Tracking. Xuzhao Li, Xuchen Li, Shiyu Hu |
| 2025 | DASPL: Enhancing Few-Shot Learning with Dual Adapters and a Single-Step Pseudo-Label Cycle. Yanbo Zhang, Yuhao Liu, Zhaoyang Liu, Huiying Li, Ruilin Chai, Guanghua Gu |
| 2025 | DC-SNet: Efficient Spatio-Temporal Prediction Network Based On Dual-Domain Collaborative Spatiotemporal Network. Tiantian Liu, Kai Li, Ming Ma |
| 2025 | DFFNet: A Super-Resolution Algorithm based on Dynamic Feature Fusion Network. Jiaqi Yang, Jin Yang, Huiying Jia, Wenguang Zheng |
| 2025 | DGFNet: End-to-End Audio-Visual Source Separation Based on Dynamic Gating Fusion. Yinfeng Yu, Shiyu Sun |
| 2025 | DML-FitAR: A Deep Metric Learning Approach for IMU-Based Fitness Activity Recognition. Timin Li, Dongmei Li, Yuepeng Chen, Zhuangzhuang Li, Ye Ma, Dongwei Liu, Xuefeng Feng, Ji Wu, Chenyi Guo |
| 2025 | DMR-XNet: Dynamic Multi-Relation Cross-Fusion Network for Aspect-Based Multimodal Sentiment Analysis. Fengling Zhou, Zhixin Li |
| 2025 | DNVC-FC: A Low-Latency Distributed Neural Video Codec for Resource-Constrained Multimedia Applications. Yiming Ding, Jianguo Wei |
| 2025 | DOPE: Dual Object Perception-Enhancement Network for Vision-and-Language Navigation. Yinfeng Yu, Dongsheng Yang |
| 2025 | DRoLaS: Diffusion-Based Coarse-to-Fine Conditional Synthesis of Hierarchical Road Layouts. Shenao Dong, Weitao Li, Bo Li, Long Li, Junao Shen, Tian Feng |
| 2025 | DSSM-KG: Dual-Stream State-Space Modeling with Adaptive Knowledge Injection for Video Captioning. Haoying Sun, Shuyi Li, Zeyu Xi, Bowen Zhang, Lifang Wu |
| 2025 | DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning. Jialang Lu, Huayu Zhao, Huiyu Zhai, Xingxing Yang, Shini Han |
| 2025 | Demonstration Meets Typed Events: Type Specific Video Semantic Role Labeling via Multimodal Prompting and Retrieval. Hanxiao Wei, Bin Wu, Chunjia Wang, Guangyao Su, Tao Zhou |
| 2025 | Diffusion Alignment for Cross Domain Recommendation. Fengxin Li, Hongyan Liu, Jun He |
| 2025 | Diffusion-Based Adversarial Generation with SAM-Guided Spatial Semantics for Text-to-Image Models. Zhanghao Qin |
| 2025 | Direction-aware Attention and Semantic Guidance Network for Salient Object Detection in Optical Remote Sensing Images. Yifei Teng, Zhaoru Guo, Yaqian Wang, Liejun Wang, Panpan Zheng |
| 2025 | Divide and Conquer: Static-Dynamic Collaboration for Few-Shot Class-Incremental Learning. Kexin Bao, Daichi Zhang, Yong Li, Dan Zeng, Shiming Ge |
| 2025 | DomainDiff: Unified Two-Stage Optimization for Text-Video Retrieval. Chenxu Wang, Dong Zhou, Jianghao Lin, Yongmei Zhou, Aimin Yang |
| 2025 | Dual-Branch Sentiment Enhancement Modeling For Joint Multimodal Aspect-Based Sentiment Analysis. Xiangbo Ji, Haoyu Shi, Wei Wu, Na Li, Jinyang Wang |
| 2025 | Dynamic Motion Modeling for Enhanced Visual-Inertial Odometry. Xinchen Ye, Haobo Wang, Rui Xu, Haojie Li |
| 2025 | EPNet: Efficient Part Segmentation for Dense Point Clouds. Cheng Wang, Wulong Hu, Minqian Wang, Zhenbo Cheng, Yuanming Zhang, Fei Gao |
| 2025 | Edge-Aware Network with Confidence Feature Fusion for Infrared Small Target Detection. Boyuan Li, Zitong Ren, Xiuhong Li, Kurban Ubul |
| 2025 | Efficient Camouflaged Object Detection Network Based on Channel Reconstruction and Hybrid Attention. Kuan Wang, Xiuhong Li, Songlin Li, Yulong Bai, Boyuan Li, Mengge Lu, Zhenhong Jia |
| 2025 | Efficient Monocular Depth Estimation Via Single-Step Latent Diffusion Models. Zhiyong Huo, Zhendong Wang |
| 2025 | Efficient Prompt-based Multimodal Interaction for Audio-Visual Event Localization. Longzhuo Huang, Liang Li, Xueyang Fu, Zhengjun Zha |
| 2025 | EilMoB: Emotion-aware Incongruity Learning and Modality Bridging Network for Multi-modal Sarcasm Detection. Haochen Zhao, Yongxiu Xu, Xinkui Lin, Jiarui Lu, Hongbo Xu, Yubin Wang |
| 2025 | EmoHuman: Fine-Grained Emotion-Controlled Talking Head Generation via Audio-Text Multimodal Detangling. Qifeng Dai, Huidong Feng, Wendi Cui, Xinqi Cai, Yinglin Zheng, Ming Zeng |
| 2025 | Enhanced Multi-View Clustering with Multiple Linear Graph Filtering. Henghui Jiang, Liang Du |
| 2025 | Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation. Yuheng Ji, Yue Liu, Zhicheng Zhang, Zhao Zhang, Yuting Zhao, Xiaoshuai Hao, Gang Zhou, Xingwei Zhang, Xiaolong Zheng |
| 2025 | Enhancing Adversarial Transferability via Self-Ensemble Feature Alignment. Zhiming Zhao, Jiahao Chen, Qingming Li, Chunyi Zhou, Shouling Ji |
| 2025 | Enhancing OOD Detection Using Latent Diffusion. Heng Gao, Jun Li |
| 2025 | Enhancing Self-Supervised Fine-Grained Video Object Tracking with Dynamic Memory Prediction. Zihan Zhou, Changrui Dai, Aibo Song, Xiaolin Fang |
| 2025 | Ensemble CLIPs: Effective Zero-shot Classification with Hundreds of Multi-modal CLIPs. Bowen Han, Shizhuo Deng, Zehua Gan, Da Teng, Dongyue Chen, Tong Jia |
| 2025 | Evaluate the Generative Capability of Diffusion Models from a Discriminative Perspective. Yixuan Jiang, Hsiao-Dong Chiang, Yiqing Shen |
| 2025 | Event-Driven Hybrid and Cross-Stage Guide for Video Corpus Moment Retrieval. Zheng Wang, Kun Huang, Zengrong Lin, Cong Bai |
| 2025 | Exploiting Event Temporal Dynamics and Sparsity Characteristics for RGB-Event Fusion Semantic Segmentation. Yitong Zhang, Yingmei Wei, Yanming Guo, Jiangming Chen, Yi Zhong |
| 2025 | Exploiting Multimodal Prompt Learning and Distillation for RGB-T Tracking. Qingkuo Hu, Yichen Li, Wenbin Yu |
| 2025 | Exploring Objectness Information via Progressively Decoupled Adaptation for Cross-Domain Detection. Yiming Ge, Hui Liu, Ertong Shang, Junzhao Du, Jie Zhao, Zhaocheng Niu |
| 2025 | FC-MonoDETR: A Monocular 3D Object Detection Network Based on Foreground Constraint. Daifeng Xiao, Dongbo Yu, Yunbiao Wang, Jun Xiao, Ying Wang, Lupeng Liu |
| 2025 | FLAIN: Mitigating Backdoor Attacks in Federated Learning via Flipping Weight Updates of Low-Activation Input Neurons. Binbin Ding, Penghui Yang, Sheng-Jun Huang |
| 2025 | FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis. Ziqi Ni, Ao Fu, Yi Zhou |
| 2025 | Face Anti-spoofing based on Contour-constrained Anomaly Detection. Jiahui Wang, Yuan Liu, Chunlei Peng, Yu Zheng |
| 2025 | Face-DM: An Efficient Framework for Makeup Transfer and Face Swapping. Depei Liu, Ruochong Xiong, Rongxing Wang, Junfei Liu |
| 2025 | FedRE: Robust and Effective Federated Learning with Privacy Preference. Tianzhe Xiao, Yichen Li, Yu Zhou, Yining Qi, Yi Liu, Wei Wang, Haozhao Wang, Yi Wang, Ruixuan Li |
| 2025 | Few-Shot Adaptive Diffusion with Semantic Injection and Parameter Smoothing. Yunjie Cai, Ting Xiao, Yanbing Zhang, Zhe Wang |
| 2025 | Few-Shot Generalized Category Discovery With Retrieval-Guided Decision Boundary Enhancement. Yunhan Ren, Feng Luo, Siyu Huang |
| 2025 | Few-Shot Learning with Class-Number Non-Aligned Training and Cross-Scale Feature Differential Network for Hyperspectral Image Classification. Pan He, Bodong Li, Han Xiang, Bowen Xu, Chunhong Cao |
| 2025 | FewMEA: Few-shot Model Extraction Attack against Sequential Recommenders. Fu Liu, Hui Zhang, Yuqin Lan, Min Li |
| 2025 | Fine-grained Block Pruning with Tiny Sets for Vision Transformers. Yilin Wang, Qiang Dong, Dongyang Zhang, Xin Hu, Tao He, Aiguo Chen |
| 2025 | Floorplan-Diffusion: Automatic Floor Plan Generation via Pre-trained Large Latent Diffusion Model. Minyang Xu, Yunzhong Lou, Xiang Gao, Xiangdong Zhou |
| 2025 | FreqINR: Frequency Consistency for Implicit Neural Representation with Adaptive DCT Frequency Loss. Meiyi Wei, Liu Xie, Ying Sun, Gang Chen |
| 2025 | Frequency-Semantic-enhanced Channel Attention Network for Human Parsing. Yitao Yan, Faliang Huang, Demin Wu, Lin Luo |
| 2025 | From Skeleton to Flesh: Aggregated Relational Transformer Towards Controllable Video Captioning with Two-Step Decoding. Qianwen Cao, Heyan Huang, Boran Wang |
| 2025 | GRE-SLAM: 6-DoF Pure Event-Based SLAM with Semi-Dense Depth Recovery Assisted Bundle Adjustment. Yang Chen, Lin Zhang |
| 2025 | GarmentGS: Point-Cloud Guided Gaussian Splatting for High-Fidelity Non-Watertight 3D Garment Reconstruction. Zhihao Tang, Shenghao Yang, Hongtao Zhang, Mingbo Zhao |
| 2025 | Generative Emotion Cause Explanation in Multimodal Conversations. Lin Wang, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang, Zhitao Zhang |
| 2025 | Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining. Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Zhiqiang Wang |
| 2025 | Graph Alignment Using Seed-Oriented Subgraph Matching. Wei Tang, Xinglin Lv, Yuang Li, Min Zhang, Hao Yang |
| 2025 | GraphDC: Detecting and Confusing in Node Injection Attack. Jialong Wang, Shilong Zhang, Zhiguo Gong |
| 2025 | GroupAC: Inter-Group Context Modeling for Point Cloud Attribute Compression with RAHT. Guangjie Zhang, Chunyang Fu, Qiang Xu, Shan Liu, Ge Li |
| 2025 | Guided Infrared Image Super-Resolution via Cross-modal Progressive Guidance. Xinchen Ye, Siqi Wang, Rui Xu, Haojie Li |
| 2025 | HGAtt-ARN: A Novel Adversarial Reconstruction Network Based on Higher-order Gate Attention for Incomplete Multimodal Sentiment Analysis. Qingpeng Wen, Pengfei Wei, Fan Li, Qintai Hu, Bi Zeng, Guang Feng |
| 2025 | HM3: Hierarchical Modeling of Multimedia Metaverses on 10000 Thematic Museums via Theme-aware Contrastive Loss Function. Gianluca Macrì, Lorenzo Bazzana, Alex Falcon, Giuseppe Serra |
| 2025 | HM3D: A Lightweight Hierarchical Mamba Model for Efficient 3D Point Cloud Analysis. Tianyi Chen, Xian-Feng Han |
| 2025 | HOOI Detection: Cascade-Clue Integrated Modeling over Multiple Temporal Segments. Mingxuan Zhang, Qi He, Zhaoquan Yuan, Tingquan He, Rong Li |
| 2025 | Heterogeneous Graph Embedding for Multimodal Multi-Label Emotion Recognition. Disen Hu, Xun Jiang, Zhe Sun, Fumin Shen, Xing Xu |
| 2025 | Heterogeneous Model Knowledge Distillation via Dual Alignment for Semantic Segmentation. Mingzhu Xu, Jing Wang, Mingcai Wang, Yiping Li, Yupeng Hu, Xuemeng Song, Weili Guan |
| 2025 | Hierarchical Matrix-Contrastive Bilateral Fusion for Multimodal Sentiment Analysis. Chaoxing Tang, Anyang Tong, Fei Wang, Zhangling Duan |
| 2025 | Hierarchical Neural Architecture Search for Fast and Accurate Depth Completion. Xiaogang Jia, Songlei Jian, Yusong Tan, Yonggang Che, Wei Chen, Zhengfa Liang, Yulin He |
| 2025 | HyHE: Enhancing Image-Text Retrieval through Hyperbolic Hierarchical Embeddings. Aohui Miao, Wei Wei |
| 2025 | ICDAR 25: Intelligent Cross-Data Analysis and Retrieval. Takahiro Komamizu, Marc A. Kastner, Minh-Son Dao, Michael Alexander Riegler, Duc-Tien Dang-Nguyen, Son N. Tran |
| 2025 | Identity-domain Removal for Robust EEG-based Emotion Recognition. Wenchang Deng, Shenghua Zhong, Rongrong Lu, Yi Wang |
| 2025 | Image Description and Aspect-Aware Denoising for Aspect-Based Multimodal Sentiment Analysis. Jiachang Sun, Xiuhong Li |
| 2025 | Incremental Information-Aware: Mine Abundant and Accurate Information for Video Captioning. Ningkai Zhong, Bin Fang, Mengdi Li, Langping Wang |
| 2025 | Intent-Augmented Multimodal Graph Embedding for Multimedia Recommendation. Ruoxi Li, Meng Jian, Lifang Wu, Xinying Wu |
| 2025 | Inter-Diffusion Generation Model of Speakers and Listeners for Effective Communication. Jinhe Huang, Yongkang Cheng, Minghang Yu, Gaoge Han, Jinwei Li, Jing Zhang, Shilei Wang, Xingjian Gu |
| 2025 | Introduction to the 8th Annual Lifelog Search Challenge, LSC'25. Cathal Gurrin, Liting Zhou, Graham Healy, Allie Tran, Luca Rossetto, Werner Bailer, Duc-Tien Dang-Nguyen, Steve Hodges, Björn Þór Jónsson, Minh-Triet Tran, Klaus Schöffmann |
| 2025 | Inverse Farthest Point Sampling (IFPS): A Universal and Hierarchical Shell Representation for Discrete Data. Nayu Ding, Yujie Lu, Yao Huang, Long Wan, Yan Zhao, Zhijun Fang, Shen Cai, Lin Gao |
| 2025 | Joint Adversarial Purification: Mitigating the Threat of Multimodal Adversarial Examples. Qin Li, Youze Wang, Wenbo Hu, Richang Hong |
| 2025 | KEGNN: Knowledge-Enhanced Graph Neural Networks for User Engagement Prediction. Ching-Hao Fan, Hao Zhou, Yao Sun, Geovanny Palomino Roldan, Olga Kokshagina, Marc Santolini, Lijing Wang |
| 2025 | Knowledge Discovery in Fuzzy Linguistic Triadic Context: Mining Data Hidden under Conditions. Yujie Cao, Hongping Liu, Benshuai Wang, Li Zou |
| 2025 | LDNet: Dynamic Feature Extraction and Attention Fusion for Building Change Detection. Xue Li, Dong Li, Xueying Feng |
| 2025 | LLAUS: A High-Quality Instruction-Tuned Large Vision Language Assistant for UltraSound. Junhao Guo, XueFeng Shan, Guoming Wang, Dong Chen, Rongxing Lu, Siliang Tang |
| 2025 | Label Ranker: Self-aware Preference for Classification Label Position in Visual Masked Self-supervised Pre-trained Model. Peihao Xiang, Ou Bai |
| 2025 | Latent Sensor Fusion: Multimedia Learning of Physiological Signals for Resource-Constrained Devices. Abdullah Ahmed, Jeremy Gummeson |
| 2025 | Learning 3D Volume Cloud from Single Image. Yuhang Cheng, Yu Zhang, Xiaogang Wang |
| 2025 | Learning to Predict Advertisement Expansion Moments in Short-Form Video Platforms. Wenxuan Hou, Kaibing Yang, Di Hu |
| 2025 | Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training. Zijian Zhao |
| 2025 | Lifelong Visible-Infrared Person Re-Identification with Prompt Pool and Instance-level Prompt Generator. Zhenxi Luo, Guoqiang Xiao, Michael S. Lew, Song Wu |
| 2025 | Local and Global Aware Document Image Enhancement with Residual Denoising Diffusion Model. Hongrui Tie, Heng Li, Xiangping Wu, Qingcai Chen |
| 2025 | Low-Rank Adaptation for Parameter-Efficient Fine-Tuning in Composed Image Retrieval. Jiaxin Luo, Mingbo Zhao, Hongtao Zhang |
| 2025 | MAD'25: 4th ACM International Workshop on Multimedia AI against Disinformation. Dan-Cristian Stanciu, Bogdan Ionescu, Symeon Papadopoulos, Giorgos Kordopatis-Zilos, Adrian Popescu, Roberto Caldelli, Milica Gerhardt, Vera Schmitt |
| 2025 | MAD-paint: Mask-Aware Diffusion Sampling for Image Inpainting. Shipeng Jiang, Jingwei Qu, Bingyao Huang |
| 2025 | MAGIC: Noise Mitigation and Knowledge Alignment for Knowledge Graph-Based Multi-modal Recommendation. Shijie Zhu, Yan Zhang, Li Zhang, Xi Chen, Lei Zhao |
| 2025 | MDN: Modality Decomposition Network for Multimodal Recommendation. Zhuoyang Liu, Weihai Lu |
| 2025 | MFLCP: Personalized Multimodal Federated Learning via Collaborative Prompting with Missing Modalities. Wenli Li, MeiYu Liang, Ruoyu Fan, Yuxuan Li |
| 2025 | MFSVFND: Multimodal Fusion Network for Detecting Fake News on Short Video Platforms. Liyuan Zhang, Yang Yajing, Yan Yang, Yong Liu, Zhongyan Gui, Ruofan Li, Hao Fei |
| 2025 | MGSGM: Multi-Granularity Selective Graph Mamba for Image-Text Retrieval. Yongle Huang, Yongfeng Bu, Keyu Guo, Zedong Liu, Xiangyu Song, ShiJie Sun |
| 2025 | MIE-GAT: Multi-perspective Information Enhancement for Slice-based Image Retrieval in Multi-modal Medical Diagnosis. Yuan Li, Yinjian Zhao, Minghao Wang, Xia Feng, Qian Peng, Airu Yin, Hua Ji |
| 2025 | MIMCL: Multilayer Interaction Module with Contrastive Learning for Speech Emotion Recognition. Feng Li, Rongsheng Liu, Bing Wang |
| 2025 | MMCNav: MLLM-empowered Multi-agent Collaboration for Outdoor Visual Language Navigation. Ziheng Zhang, Minghao Chen, Suguo Zhu, Tingting Han, Zhou Yu |
| 2025 | MR4SseC: A Multimodal Representation Learning Framework for Space Science Experiment of China's Space Station. Yunfei Liu, Anqi Liu, Yanan Liu, Yunziwei Deng, Yizhao Wang, Shengyang Li |
| 2025 | MSSA-Net: A Multi-Scale Structure-Aware Network for Edge Detection in Point Clouds. Yunzhou Xia, Weiqi Yan, Yu Zang, Weiquan Liu, Cheng Wang |
| 2025 | MTDIR: A Malicious Traffic Detection Model Based on the Image Retrieval Perspective. Ziang Li, Haonan He, Zhou Zhou, Chengxiang Si |
| 2025 | MaGo-I2P: Image-to-Point Cloud Registration with Mamba and Geometry Recovery. Yunda Sun, Lin Zhang |
| 2025 | MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval. Chao He, Hongxi Wei |
| 2025 | Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval. Li-Cheng Shen, Jih-Kang Hsieh, Wei-Hua Li, Chu-Song Chen |
| 2025 | MedQuery: A Graph-Driven Medical Literature-Enhanced Query Answering System. Chenhan Fu, Yu Xia, Guoming Wang, Rongxing Lu, Siliang Tang |
| 2025 | MedVSA: Medical Visual Spoken-Question Answering. Lei Liu, Xiangdong Su, Guanglai Gao |
| 2025 | MeloDance: Dance Generation Guided by Music Structure and Emotion. Yixuan Li, Qiang Jin, Huaping Liu, Jinhai Chen, Xiangyu Zhao, Peng Li |
| 2025 | Metal Surface Defect Detection based on Variable Mask Ratio Multi-scale Reconstruction. Yu Zheng, Meng Du, Lin Zhao, Chunlei Peng |
| 2025 | MirrorDiff: Learning Mirror Diffusion for Image Captioning via Regeneration. Junbo Wang, Liangyu Fu, Yining Zhu, Qiangguo Jin, Hongsong Wang, Yuke Li, Xuecheng Wu, Kun Hu |
| 2025 | Mitigating Expression Class Bias with Class-Incremental Learning in Facial Expression Recognition. Yiqin Luo, Yinghui Li, Tianlong Gu, Liang Chang |
| 2025 | MixSENet: A Lightweight Model for Speech Enhancement with Multi-Scale Features and Contextual Modeling. Chuike Kong, Guangcun Wei, Shuo Li, Penghao Ma, Changhao Li |
| 2025 | Mixture of Experts for Node Classification. Yu Shi, Yiqi Wang, Weixuan Liang, Jiaxin Zhang, Pan Dong, Aiping Li |
| 2025 | MoAFCL: Feature-Aware Mixture-of-Adapter for Federated Continual Learning. Dian Zhang, Bingyan Liu |
| 2025 | MoRLACS: A Monocular RGBD-based Locomotion Approach for CAVE Systems. Haopeng Lu, Ruiqi Li, Qian Yin, Li Song, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao |
| 2025 | MuSeLLM: SDF Generation and Understanding via Multi-Scale Tokenization with Position-Aware Guidance. Tianwei Ding, Lanshan He, Weijian Ma, Xiangdong Zhou |
| 2025 | Multi-modal Similarity Guided Adaptive Fusion Network for Short Video Fake News Detection. Jing Shen, Yanjia Wang, Shengze Wang, Yuping Zhang, Haibo Liu |
| 2025 | Multi-scale Feature Field with Anti-brightness-sensitivity Postprocessing for Few-shot Neural Panoptic Segmentation. Bin Dou, Yongjia Ma, Tianyu Zhang, Zejian Yuan |
| 2025 | Multilayer Graph Clustering with Lightweight Contrastive Learning. Wentao Li, Xingwang Zhao, Zhiqiang Wang |
| 2025 | Multimodal Contrastive Learning for Music with Incomplete Modalities. Koto Nakata, Koji Eguchi |
| 2025 | Multimodal and Multilingual Fact-Checked Article Retrieval. Stefanos-Iordanis Papadopoulos, Ivana Benová, Sebastian Kula, Michal Gregor, George Karantaidis, Tomas Javurek, Marián Simko, Symeon Papadopoulos |
| 2025 | Multiscale Adaptive Conflict-Balancing Model For Multimedia Deepfake Detection. Zihan Xiong, Xiaohua Wu, Lei Chen, Fangqi Lou |
| 2025 | Multiscale Feature Enhancement and Adaptive Receptive Field for Tiny Object Detection in Remote Sensing Images. Yunpeng Zeng, An Luo, Kefan Zhan, Jiaxin Li, Yuan Zhang, Kai Hu |
| 2025 | OT-Talk: Animating 3D Talking Head with Optimal Transportation. Xinmu Wang, Xiang Gao, Xiyun Song, Heather Yu, Zongfang Lin, Liang Peng, Xianfeng Gu |
| 2025 | OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering. Jingrui Ye, Zhongkai Zhang, Qingmin Liao |
| 2025 | Octree-STCM: Octree-Based Spatio-Temporal Context Model for Lossless Geometry Compression of Dynamic Point Cloud. Zhecheng Wang, Shuai Wan, Jianqiang Huang |
| 2025 | On the Adversarial Robustness of Visual-Language Chat Models. Tianrui Qin, Xuan Wang, Juanjuan Zhao, Kejiang Ye, Cheng-Zhong Xu, Xitong Gao |
| 2025 | Open-World 3D Scene Understanding with Cross-Modal Dual Consistency Learning. Xian-Feng Han, Chuyu Wang, Yuhang Wang, Mingjie Wang |
| 2025 | OpenSGen: Fine-Grained Relation-Aware Prompt for Open-Vocabulary Scene Graph Generation. Zihan Kong, Haiwei Zhang |
| 2025 | Optimal Transport-Driven Federated Out-of-Distribution Detection in Heterogeneous Data. Yuan He, Yingchun Cui, Zhengda Wu, Heran Xi, Jinghua Zhu |
| 2025 | Optimization of CLIP Models for Domain-Specific Video Search. Kazuya Ueki, Haruki Sato, Yuma Suzuki, Takayuki Hori, Hiroki Takushima, Takumi Takada, Hayato Tanoue, Aiswariya Manoj Kumar, Hiroki Nishihara, Yuki Shibata |
| 2025 | Out-of-Distribution Detection for Open-Set Semi-Supervised Medical Image Classification. Jiawei Zhang, Yingchun Cui, Zhengda Wu, Heran Xi, Jinghua Zhu |
| 2025 | PAP-SAM: Global-Local Prior Adaptive Perception SAM for Co-Salient Object Detection. Jizhe Yu, Xiya Bu, Yu Liu, Kaiping Xu |
| 2025 | PIG: Physically-based Multi-Material Interaction with 3D Gaussians. Zeyu Xiao, Zhenyi Wu, Mingyang Sun, Qipeng Yan, Yufan Guo, Zhuoer Liang, Lihua Zhang |
| 2025 | PRNet: Parallel Refinement Network with Selective Feature Enhancement for Infrared Small Target Detection. Boyuan Li, Xiuhong Li, Kurban Ubul |
| 2025 | PTSR: A Unified Patch Tokenization, Selection and Representation Framework for Efficient Micro-expression Recognition. Liangyu Fu, Junbo Wang, Qiangguo Jin, Yining Zhu, Hongsong Wang, Yuke Li, Xuecheng Wu, Kun Hu |
| 2025 | Proceedings of the 2025 International Conference on Multimedia Retrieval, ICMR 2025, Chicago, IL, USA, 30 June 2025 - 3 July 2025. Zhongfei (Mark) Zhang, Elisa Ricci, Yan Yan, Liqiang Nie, Vincent Oria, Lamberto Ballan |
| 2025 | Q-Chain: A Causal-Aware Framework for Structural and Educational Question Generation. Junqi Xu, Lvcheng Wang, Zeyd Boukhers, Bipin Indurkhya, Cong Yang |
| 2025 | QUEST: QUasi-clique Enhanced Structure-aware Transformation for Low-overlap Point Cloud Registration. Yance Fang, Hualong Cao, Yongcai Wang, Haoyu Liu, Deying Li |
| 2025 | RATE: Robust Adversarial Training and Temperature-scaled Ensemble Framework for Trustworthy Misinformation Detection. Rui Sun, Wenbo Hu, Qiang Liu, Richang Hong |
| 2025 | ROAD-6: A Diverse Dataset for Unexpected Hazard Recognition in Autonomous Vehicles. Shehzad Ali, Md Tanvir Islam, Minh-Son Dao, Ik Hyun Lee, Shuai Liu, Khan Muhammad |
| 2025 | RPUDet: Learning Relational Prior and Uncertainty for Robust Aerial Object Detection. Kun Qian, Wei Liu, Minshi Chen, Xiao Wang, Xin Yuan |
| 2025 | RagMe: Retrieval Augmented Video Generation for Enhanced Motion Realism. Elia Peruzzo, Dejia Xu, Xingqian Xu, Humphrey Shi, Nicu Sebe |
| 2025 | Real-Time Dynamic Light Pixels Video Frame Interpolation with Zero-Overhead Masks. Zhaohong Xiang, Yigui Luo, Hejing Cai, Yuqi Kuang, Yonghong Guo, Minchi Luo, Yanfang Wang |
| 2025 | Reproducibility Companion Paper: AdOCTeRA - Adaptive Optimization Constraints for Improved text-guided Retrieval of Apartments. Ali Abdari, Alex Falcon, Giuseppe Serra, Qiushi Huang |
| 2025 | Reproducibility Companion Paper: Learning Differentiable Particle Filter on the Fly. Jiaxi Li, Xilu Wang, Yunfan Hu |
| 2025 | Reproducibility Companion Paper: u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model. Jinjin Xu, Xilu Wang, Liwu Xu, Yuzhe Yang, Xiang Li, Fanyi Wang, Yanchun Xie, Yi-Jie Huang, Yaqian Li, Yunfan Hu |
| 2025 | Resolution-Aware Criss-Cross Attention Detector for Small Object Detection in Aerial Images. Heyu Sun, Taoying Liu, Xingzhou Zhang, Qiang Guo |
| 2025 | RetrievFace: Retrieval-Enhanced Diffusion for Controllable Text-Guided Face Editing. Lulu Tian, Hongxun Yao |
| 2025 | Robust Relevance Feedback for Interactive Known-Item Video Search. Zhixin Ma, Chong-Wah Ngo |
| 2025 | RobustPT: Dynamic Disentanglement Prompt Tuning in Vision-Language Models with Missing Modalities. Ruiting Dai, Yuqiao Tan, Lisi Mo, Tao He, Ke Qin, Shuang Liang |
| 2025 | SAP-DIFF: Semantic Adversarial Patch Generation for Black-Box Face Recognition Models via Diffusion Models. Mingsi Wang, Shuaiyin Yao, Chang Yue, Lijie Zhang, Guozhu Meng |
| 2025 | SCN-Pillar: Construct a Pillar-based Fully Sparse Lightweight 3D Detector via Sparse ConvNeXt. Xusheng Li, Chengliang Wang, Tian Jiang, Yonggang Luo, Bo Zheng |
| 2025 | SCNet: Spatio-temporal Feature Aggregation and Cross-modal Interactive Encoding Network for DAVIS Object Detection. Yunhua Chen, Jinyu Zhong, Pinghua Chen, Wei Wu, Jinsheng Xiao |
| 2025 | SEPA: An Semantic Projection Alignment Framework for Multimodal Named Entity Recognition. Guohui Ding, Yushuo Kong, Xinlei Li |
| 2025 | SFi-Former: Sparse Flow Induced Attention for Graph Transformer. Zhonghao Li, Ji Shi, Xinming Zhang, Miao Zhang, Bo Li |
| 2025 | SSCD: Self-Supervised Coherence Discrimination Representation Learning for Scene Text Recognition. Zhi-Yuan Xue, Li-Jun Zhao, Jia-Ying Zhang, Xin Luo, Xin-Shun Xu |
| 2025 | SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations. Shuting Zhao, Linxin Bai, Liangjing Shao, Ye Zhang, Xinrong Chen |
| 2025 | SSTAP: Generating Sample-Specific Transferable Adversarial Patch in Multimodal Contrastive Learning. Changchun Yin, Liming Fang |
| 2025 | STGFuse: Semantic Text-Guided Medical Image Fusion with Interactive Degradation Handling. Aimei Dong, Zhen Chen, Long Wang, Yongxing Cai |
| 2025 | STrack: Confidence-Level-Based Separate Tracking for Robust Multi-Object Tracking. Zheng Liang, Shuo Yang |
| 2025 | Scene-guided Attention Network for Spatial Understanding in 3D Scenes. Yunqi Jiang, Jianwei Zhang, Chaoyang Lin, Yi Yu, Zhenguo Yang |
| 2025 | Self-supervised Bidirectional Synchronization Estimation for Multimodal Deepfake Detection with Short-term Dependency. Man Xiao, Jianbin Ye, Bo Liu, Zijian Gao, Kele Xu, Xiaodong Wang |
| 2025 | Separable and Flexible Classification on Pseudo-Features for Few-Shot Class-Incremental Learning. Yuancheng Yang, Luyang Jin, Shuai Zhang, Chao Tong |
| 2025 | Single-Source Dual-Stream Representation Learning for DNA Sequence Classification. Jiarui Zhou, Zongmeng Zhang, Min Wang, Wengang Zhou, Houqiang Li |
| 2025 | Spatially-Aware Entity Relation Exploration for Remote Sensing Image-Text Retrieval. Jianan Shui, Shuaipeng Ding, Mingyuan Ge, Mingyong Li |
| 2025 | SpectraSpan: Zero Fine-Tuning Long Video Generation Framework and Its Frequency Domain Optimization. Wentao Zhang, Fen Wang, Zheng Cao |
| 2025 | Step-wise Soft Alignment Enhanced Procedural Text Generation from Long Instructional Videos. Zhihao Wang, Lin Li, Xian Zhong, Xiaohui Tao, Jianquan Liu |
| 2025 | TF-IECN: Tuning-free Image Efficient Customization via Refined Collaborative Denoising Strategies. Wei Xia, Jun Qin, Zheng Ye, Jing Liu, Zhou Liu |
| 2025 | TF-MERC: Integrating Time-Frequency Information for Multimodal Emotion Recognition in Conversation. Jiawei Cheng, Xiaofei Zhu, Zhou Yang |
| 2025 | TLENet: Two-stage Low-light Enhancement Network Based on Illuminance Adaptation. Haixin Jia, Yu Zhang, Guoying Zhang, Xing Yang, Han Wang, Hengchen Xu |
| 2025 | Taming Vision-Language Models for Federated Foundation Models on Heterogeneous Medical Imaging Modalities. Lulu Feng, Shengchao Chen |
| 2025 | TeDA: Boosting Vision-Language Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment. Zhichuan Wang, Yang Zhou, Jinhai Xiang, Yulong Wang, Xinwei He |
| 2025 | TeMTG: Text-Enhanced Multi-Hop Temporal Graph Modeling for Audio-Visual Video Parsing. Yaru Chen, Peiliang Zhang, Fei Li, Faegheh Sardari, Ruohao Guo, Zhenbo Li, Wenwu Wang |
| 2025 | TexDreamer: Text-driven Photorealistic and Robust Texture Synthesis via Multi-View Diffusion. Zhenqiang Li, Jie Li, Yangjie Cao |
| 2025 | Text-Guided Attribute Enhancement Framework for Composed Image Retrieval. Zhi Ma, Yizi Huang, Di Wang, Bo Wan, Lin Zhao, Quan Wang |
| 2025 | Text-Guided Realistic Single Image Relighting with Wavelet Mamba Diffusion Network. Yunting Lai, Hui Zhang, Xin Zhang, Yiyang Hu, Peng Gao, Guquan Jing |
| 2025 | The Multimedia Recommendation System Based on Multimodal Fine-Grained Classification Mining. Yifan Huo, Zheng Fan, Ming Liu, Junhong Zheng, Lili He |
| 2025 | The Power of "Why?" in Decision Making in Complex, Dynamic Systems. K. Selçuk Candan |
| 2025 | Three and a Half Generations of Video Generation Models. Sergey Tulyakov |
| 2025 | TourMLLM: A Retrieval-Augmented Multimodal Large Language Model for Multitask Learning in the Tourism Domain. Hiromasa Yamanishi, Ling Xiao, Toshihiko Yamasaki |
| 2025 | Towards Comprehensive Legal Document Analysis: A Multi-Round RAG Approach. Wutong Zhang, Hefeng Zhou, Qiang Zhou, Yunshen Li, Yuxin Liu, Jiong Lou, Chentao Wu, Jie Li |
| 2025 | Towards Effective and Consistent Information Extraction for Social Recommendation: A Minimum and Sufficiency Perspective. Wenze Ma, Yuexian Wang, Chenyu Sun, Yanmin Zhu, Zhaobo Wang, Xuhao Zhao, Jiadi Yu, Feilong Tang |
| 2025 | Towards Emotion Analysis in Short-form Videos: A Large-Scale Dataset and Baseline. Xuecheng Wu, Heli Sun, Junxiao Xue, Jiayu Nie, Xiangyan Kong, Ruofan Zhai, Danlei Huang, Liang He |
| 2025 | Towards Interpretable User Intent Analysis with Deficient Evidence Fusion for Pseudo-Modalities. Chaochen Wu, Guan Luo, Meiyun Zuo |
| 2025 | Towards Robust Polyp Segmentation: Multi-Focus Attention Network with Fine-grained Polyp Cues. Nan Mu, Xianchao Zhang, Yazhou Feng, Xiaoning Li, Jingfeng Jiang, Lei Liu |
| 2025 | Troublemaker Learning for Low-Light Image Enhancement. Yinghao Song, Hao Ma, Bo Yang, Yanchun Liang, Hongwei Ge, Heow Pueh Lee, Chunguo Wu |
| 2025 | Tutorial Proposal: Hallucinations in Large Language Models and Large Vision-Language Models. Liqiang Jing, Yue Zhang, Xinya Du |
| 2025 | Two Heads are Better than One: A Network Attack Detection Model Based on Multimodal and Multimedia Retrieval. Ziang Li, Zhou Zhou, Chengxiang Si |
| 2025 | UMLLA-AD: Mamba-Driven Adaptive Feature Selection for Industrial Anomaly Detection. Tingting Fang, Junjie Wang, Ming Ye, Yuefei Huang |
| 2025 | Unified Multi-modal Salient Object Detection via Frequency Prompt and Adapter Tuning. Chaojun Cen, Fei Li, Zhenbo Li |
| 2025 | Unveiling Bias and Safety Issues in Generative Models. Nicu Sebe |
| 2025 | VLMs bridging-enhanced Scene Semantic Reasoning Framework for Image-Text Matching. Yihua Gao, Junyu Chen, Mingyong Li |
| 2025 | ViFusion: In-Network Tensor Fusion for Scalable Video Feature Indexing. Yisu Wang, Yixiang Zhu, Xinjiao Li, Yulong Zhang, Ruilong Wu, Dirk Kutscher |
| 2025 | ViT-Enhanced Prompts: Integrating Pre-Trained Knowledge for Robust Continuous Learning. Xiaoyu Du, Guoqiang Xiao, Michael S. Lew, Song Wu |
| 2025 | Video Frame Enhancement based Text Semantic Fusion for Cross-modal Text-video Retrieval. Kang Du, Huaxiang Zhang, Li Liu, Dongmei Liu, Hao Du |
| 2025 | Visible-Infrared Person Re-Identification via Mutual Reinforcement of Prompts and Image Encoders. Hongde Zhang, Bingpeng Ma |
| 2025 | Visual Content Generation in the Era of Large Foundation Models. Leigang Qu, Fei Shen, Zhenglin Zhou, Jiayi Lyu, Wenjie Wang, Lu Jiang |
| 2025 | Visual Grounding with Feature Enhancement and Language-Aware Attribute Guidance. Xiya Bu, Jizhe Yu, Yu Liu, Kaiping Xu |
| 2025 | Vividportraits: Face Parsing Guided Portrait Animation. Xuze Tian, Jinshan Zhang, Tao Jiang, Boxi Wu, Meng Xi, Zejian Li, Jianwei Yin |
| 2025 | WE-LSTM: Multi-Wavelet Enhanced Seasonal-Trend Denoising for Long-Term Time Series Forecasting. Zhiwei Zhang, Jiwei Qin, Dezhi Sun, Xuefeng Feng, Huiguo Zhang |