| 2025 | 'Hi AirStar, Guide Me to the Badminton Court.' Ziqin Wang, Jinyu Chen, Xiangyi Zheng, Qinan Liao, Linjiang Huang, Si Liu |
| 2025 | 'What Can I Cook?' LetMeCook: An LLM-Based Interactive System for Personalized Recipe Generation. Shiqin Liu, Minjun Zhao, Jiajun Bu |
| 2025 | (DFF '25) 1st Deepfake Forensics Workshop: Detection, Attribution, Recognition, and Adversarial Challenges in the Era of AI-Generated Media. Sebastiano Battiato, Mirko Casu, Francesco Guarnera, Luca Guarnera, Giovanni Puglisi, Orazio Pontorno, Claudio Vittorio Ragaglia, Zahid Akhtar |
| 2025 | (RichMediaGAI'25) 3rd International Workshop on Rich Media with Generative AI. Wei Jiang, Zhenghao Chen, Dong Xu |
| 2025 | 3D Gaussian Splatting Data Compression with Mixture of Priors. Lei Liu, Zhenghao Chen, Dong Xu |
| 2025 | 3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians. Zeming Wei, Junyi Lin, Yang Liu, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin |
| 2025 | 3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting. Yuke Xing, Jiarui Wang, Peizhi Niu, Wenjie Huang, Guangtao Zhai, Yiling Xu |
| 2025 | 3DGabSplat: 3D Gabor Splatting for Frequency-adaptive Radiance Field Rendering. Junyu Zhou, Yuyang Huang, Wenrui Dai, Junni Zou, Ziyang Zheng, Nuowen Kan, Chenglin Li, Hongkai Xiong |
| 2025 | 3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models. Min Wei, Chaohui Yu, Jingkai Zhou, Fan Wang |
| 2025 | 8th ACM International Workshop on Multimedia Content Analysis in Sports (ACM MMSports'25). Rainer Lienhart, Thomas B. Moeslund, Hideo Saito |
| 2025 | A Jinfan Liu, Zhangli Hu, Hanqi Chen, Ye Chen, Bingbing Ni, Shuicheng Yan |
| 2025 | A Comprehensive Benchmark for Electrocardiogram Time-Series. Zhijiang Tang, Jiaxin Qi, Yuhua Zheng, Jianqiang Huang |
| 2025 | A Comprehensive Model for Visual Fatigue Assessment in 3D Light Field Displays Based on Eye Movement Data Analysis. Yu Chen, Binbin Yan, Shuo Chen, Xinzhu Sang |
| 2025 | A Data-driven Approach to the Longitudinal Study of Canine Vocal Pattern Development. Hridayesh Lekhak, Tuan M. Dang, Theron S. Wang, Kenny Q. Zhu |
| 2025 | A Dataset and Metric for Textual Video Content Description. Stefan J. Arzberger, Paul Raith, Werner Bailer, Marion Jaks |
| 2025 | A Dual-Branch 3D Spatial-Aware Latent Diffusion for Realistic Depth Image Synthesis. Shuang Hao, Pengfei Ren, Lei Zhang, Haifeng Sun, Pan Ting, Menghao Zhang, Cong Liu, Qi Qi, Jianxin Liao, Jingyu Wang |
| 2025 | A Filtering Framework for Semi-online Referring Video Object Segmentation. Xiao Hu, Heiko Neumann, Jochen Lang |
| 2025 | A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task. Mashiro Toyooka, Kiyoharu Aizawa, Yoko Yamakata |
| 2025 | A Language-Assisted Semantic-Aware Disentangled Method for Link Prediction on Heterogeneous Graphs. Rongqiang Fang, Yongqi Sun, Jidong Yuan, Hongbo Cao, Jinkun Dong |
| 2025 | A Large-Scale Dataset for Short-Video Topic Peak Prediction and a Large Heterogeneous Graph Model. Shangheng Chen, Shengsheng Qian, Quan Fang, Jun Hu, Changsheng Xu |
| 2025 | A Large-scale Universal Evaluation Benchmark For Face Forgery Detection. Hengrui Lou, Zunlei Feng, Jinsong Geng, Erteng Liu, Lechao Cheng, Jie Lei, Jie Song, Mingli Song, Yijun Bei |
| 2025 | A Matter of Time: Revealing the Structure of Time in Vision-Language Models. Nidham Tekaya, Manuela Waldner, Matthias Zeppelzauer |
| 2025 | A Motion is Worth a Hybrid Sentence: Taming Language Model for Unified Motion Generation by Fine-grained Planning. Ronghui Li, Lingxiao Han, Shi Shu, Yueyao Liu, Yukang Lin, Yue Ma, Jie Guo, Ziwei Liu, Xiu Li |
| 2025 | A Multi-illumination Dataset and an Illumination Domain Adaptation Network for Finger Vein Identification. Huabin Wang, Yingfan Cheng, Wu Zheng, Jiayuan Cheng, Xin Li, Min Li, Fei Liu |
| 2025 | A Multimodal Deviation Perceiving Framework for Weakly-Supervised Temporal Forgery Localization. Wenbo Xu, Junyan Wu, Wei Lu, Xiangyang Luo, Qian Wang |
| 2025 | A Multimodal Evaluation Framework for Spatial Audio Playback Systems: From Localization to Listener Preference. Changhao Pan, Wenxiang Guo, Yu Zhang, Zhiyuan Zhu, Zhetao Chen, Han Wang, Zhou Zhao |
| 2025 | A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding. Zhenyang Liu, Sixiao Zheng, Siyu Chen, Cairong Zhao, Longfei Liang, Xiangyang Xue, Yanwei Fu |
| 2025 | A New Dataset and Benchmark for Grounding Multimodal Misinformation. Bingjian Yang, Danni Xu, Kaipeng Niu, Wenxuan Liu, Zheng Wang, Mohan Kankanhalli |
| 2025 | A Novel Perspective on Low-Light Image Enhancement: Leveraging Artifact Regularization and Walsh-Hadamard Transform. Weilin Wu, Shifan Yang, Qizhao Lin, Xinghong Chen, Kunping Yang, Jing Wang, Guannan Chen |
| 2025 | A Satellite-Ground Synergistic Large Vision-Language Model System for Earth Observation. Yuxin Zhang, Jiahao Yang, Zhe Chen, Wenjun Zhu, Jin Zhao, Yue Gao |
| 2025 | A Spatial Relationship Aware Dataset for Robotics. Peng Wang, Minh Huy Pham, Zhihao Guo, Wei Zhou |
| 2025 | A Streamlined System for Multimodal Industrial Anomaly Detection via 2D and 3D Feature Fusion. Wenbing Zhu, Mingmin Chi, Bo Peng |
| 2025 | A Theoretical Proof of Dynamic Multimodal Fusion Exacerbates Modality Greedy. Xiaorui Ding, Huan Ma, Changqing Zhang |
| 2025 | A Two-Stage Full Fine-Tuning and LLM Post-processing Framework for MCABSA. Deyuan Chen, Xiaocui Yang, Shi Feng, Zihan Cheng, Daling Wang, Yifei Zhang |
| 2025 | A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement. Gaozheng Pei, Ke Ma, Dongpeng Zhang, Chengzhi Sun, Qianqian Xu, Qingming Huang |
| 2025 | AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse. Zichao Yu, Zhen Zou, Guojiang Shao, Chenwei Zhang, Shengze Xu, Jie Huang, Feng Zhao, Xiaodong Cun, Wenyi Zhang |
| 2025 | ACE: Concept Editing in Diffusion Models without Performance Degradation. Ruipeng Wang, Junfeng Fang, Jiaqi Li, Hao Chen, Jie Shi, Kun Wang, Xiang Wang |
| 2025 | ACM Multimedia 2025 Grand Challenge report for Image-to-Video Generation Model Acceleration. Jie Yang, Shien Song, Jin Chen, Haoyuan Xie, Han Qi, Yifei Xue, Yizhen Lao |
| 2025 | ACM Multimedia Grand Challenge on ENT Endoscopy Analysis. Trong-Thuan Nguyen, Viet-Tham Huynh, Thao Thi Phuong Dao, Mai-Khiem Tran, Ha Nguyen Thi, Tien To Vu Thuy, Uyen Hanh Tran, Tam V. Nguyen, Minh-Triet Tran, Thanh Dinh Le |
| 2025 | ACMamba: Fast Unsupervised Anomaly Detection via An Asymmetrical Consensus State Space Model. Guanchun Wang, Xiangrong Zhang, Yifei Zhang, Zelin Peng, Tianyang Zhang, Xu Tang, Licheng Jiao |
| 2025 | ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems. Chenxi Wang, Jizhan Fang, Xiang Chen, Bozhong Tian, Ziwen Xu, Huajun Chen, Ningyu Zhang |
| 2025 | AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences. Jieyu Li, Xin Zhang, Joey Tianyi Zhou |
| 2025 | AEMVC: Mitigate Imbalanced Embedding Space in Multi-view Clustering. Pengyuan Li, Man Liu, Dongxia Chang, Yiming Wang, Zisen Kong, Yao Zhao |
| 2025 | AF-CLIP: Zero-Shot Anomaly Detection via Anomaly-Focused CLIP Adaptation. Qingqing Fang, Wenxi Lv, Qinliang Su |
| 2025 | AFFIR: Dual-Modal Attention Feature Fusion for Scene Text Image Retargeting. Gang Pan, Liming Pan, Hongze Mi, Rongyu Xiong, Jiahao Wang, Di Sun |
| 2025 | AI-Mediated Human Interaction. Shalini De Mello |
| 2025 | AI-based Multimedia Data Compression: Perception Utility Optimization and Standardization. Wei Gao, Ge Li |
| 2025 | AICL: Action In-Context Learning for Text-to-Video Generation. Jianzhi Liu, Junchen Zhu, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Jingkuan Song |
| 2025 | AIGC-Enhanced UAV-Based 3D Mapping and Trajectory Planning for Rapid Disaster Response. Xiaohang Zhang, Hui Gao, Bo Zhang, Xiao Chen, Kun Niu, Tan Yang, Wufan Wang, Wendong Wang |
| 2025 | AIQAM'25: The 2nd ACM Workshop on AI-powered Question Answering Systems for Multimedia. Tai Tan Mai, Allie Tran, Quang-Linh Tran, An Nguyen, Hoang Nguyen, Tho Quan, Duc-Tien Dang-Nguyen, Cathal Gurrin |
| 2025 | ALDEN: Dual-Level Disentanglement with Meta-learning for Generalizable Audio Deepfake Detection. Yuxiong Xu, Bin Li, Weixiang Li, Sara Mandelli, Viola Negroni, Sheng Li |
| 2025 | ALLM4ADD: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection. Hao Gu, Jiangyan Yi, Chenglong Wang, Jianhua Tao, Zheng Lian, Jiayi He, Yong Ren, Yujie Chen, Zhengqi Wen |
| 2025 | ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model. Wenshuo Chen, Kuimou Yu, Haozhe Jia, Kaishen Yuan, Zexu Huang, Bowen Tian, Songning Lai, Hongru Xiao, Erhang Zhang, Lei Wang, Yutao Yue |
| 2025 | APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech. Zhicheng Lian, Lizhi Wang, Hua Huang |
| 2025 | APP3DV'25: ACM Multimedia - International Workshop on Application-driven Point Cloud Processing and 3D Vision. Wei Gao, Sam Kwong, Zhu Li, Shan Liu, Ge Li |
| 2025 | ASTER: Adaptive Dynamic Layer-Skipping for Efficient Transformer Inference via Markov Decision Process. Fangxin Liu, Junjie Wang, Ning Yang, Zongwu Wang, Junping Zhao, Li Jiang, Haibing Guan |
| 2025 | AStF: Motion Style Transfer via Adaptive Statistics Fusor. Hanmo Chen, Chenghao Xu, Jiexi Yan, Cheng Deng |
| 2025 | AU-IQA: A Benchmark Dataset for Perceptual Quality Assessment of AI-Enhanced User-Generated Content. Shushi Wang, Chunyi Li, Zicheng Zhang, Han Zhou, Wei Dong, Jun Chen, Guangtao Zhai, Xiaohong Liu |
| 2025 | AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations. Zhixi Cai, Kartik Kuckreja, Shreya Ghosh, Akanksha Chuchra, Muhammad Haris Khan, Usman Tariq, Tom Gedeon, Abhinav Dhall |
| 2025 | AV-DiT: Taming Image Diffusion Transformers for Efficient Joint Audio and Video Generation. Kai Wang, Shijian Deng, Jing Shi, Dimitrios Hatzinakos, Yapeng Tian |
| 2025 | AV-RISE: Hierarchical Cross-Modal Denoising for Learning Robust Audio-Visual Speech Representation. Zhishuo Zhao, Yi Lin, Dongyue Guo, Junyu Fan |
| 2025 | Abstractive Visual Understanding of Multi-modal Structured Knowledge: A New Perspective for MLLM Evaluation. Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Min Zhang, Wen Zhang, Huajun Chen |
| 2025 | Accelerating Diffusion Models via Parallel Denoising. Yanming Chen, Zixin Ma, Chuanguang Yang, Zhulin An, Yiwen Zhang |
| 2025 | Accelerating Diffusion Transformer via Error-Optimized Cache. Junxiang Qiu, Shuo Wang, Jinda Lu, Lin Liu, Houcheng Jiang, Xingyu Zhu, Yanbin Hao |
| 2025 | Accelerating Long Video Understanding via Compressed Scene Graph-Enabled Chain-of-Thought. Tao Ling, Siping Shi, Dan Wang |
| 2025 | Action Unit Enhance Dynamic Facial Expression Recognition. Feng Liu, Lingna Gu, Chen Shi, Xiaolan Fu |
| 2025 | Activation Shape Matters: OOD Detection with Norm-Entropy Fusion. Jiawei Gu, Ziyue Qiao, Zechao Li |
| 2025 | Activation and Weight Distribution Balancing for Optimal Post-Training Quantization in Learned Image Compression. Jie Yu, Songping Mai, Peng Zhang, Yucheng Jiang, Jian Cheng |
| 2025 | AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings. Haoxuan Li, Wei Song, Aofan Liu, Peiwu Qin |
| 2025 | Adaptive Graph Attention-Guided Parallel Sampling and Embedded Selection for Multi-Model Fitting. Wenyu Yin, Shuyuan Lin, David Suter, Hanzi Wang |
| 2025 | Adaptive Neighbors and Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation with Noisy Labels. Yanting Pei, Fan Yang |
| 2025 | Adaptive Prompt Learning for Blind Image Quality Assessment with Multi-modal Mixed-datasets Training. Yan Zhong, Xinping Zhao, Li Zhang, Xinyuan Song, Tingting Jiang |
| 2025 | Adaspeaker: Learning Discriminative Speaker Representations with Gradient-Aware Adaptive Scaling. Jinghan Liu, Xingmei Wang, Jiaxiang Meng |
| 2025 | Addressing Granularity-induced Semantic Drift in OvOD via Graph-guided semantically consistent representation. Hongyan Xu, Zhongze Wu, Ang He, Xi Lin, Yi Chen, Xiu Su |
| 2025 | AdvPainting: Clean-text Jailbreaking Against Inpainting Models. Bingqian Zhou, Zhihao Wu, Yushi Cheng, Wenyuan Xu |
| 2025 | Advanced SpikingYOLOX: Extending Spiking Neural Network on Object Detection with Spike-based Partial Self-Attention and 2D-Spiking Transformer. Wei Miao, Jiangrong Shen, Hongming Xu, Tommi Kärkkäinen, Qi Xu, Yi Xu, Fengyu Cong |
| 2025 | Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset. Wentao Mo, Qingchao Chen, Yuxin Peng, Siyuan Huang, Yang Liu |
| 2025 | Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection. Jingyao Wang, Yiming Chen, Lingyu Si, Changwen Zheng |
| 2025 | Advancing Fashion Design Through Intelligent Sketchpad Studio. Nhu-Binh Nguyen Truc, Nhu-Vinh Hoang, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le |
| 2025 | Advancing Lung Cancer Diagnosis with eyonis® LCS. Benoit Huet |
| 2025 | Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations. Yiwen Liang, Hui Chen, Yizhe Xiong, Zihan Zhou, Mengyao Lyu, Zijia Lin, Shuaicheng Niu, Sicheng Zhao, Jungong Han, Guiguang Ding |
| 2025 | Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media (AADD-2025). Sebastiano Battiato, Mirko Casu, Francesco Guarnera, Luca Guarnera, Giovanni Puglisi, Orazio Pontorno, Claudio Vittorio Ragaglia, Zahid Akhtar |
| 2025 | AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation. Ruipu Wu, Yige Zhang, Jinyu Chen, Linjiang Huang, Shifeng Zhang, Xu Zhou, Liang Wang, Si Liu |
| 2025 | Affective-CoT: Decomposing Multimodal Emotion Reasoning through a Hierarchical Cognitive Workflow. Yuesheng Huang, Jinming Liu, Jiajia Chen, Yihang Lin, Yanmei Chen, Jianwei Dong |
| 2025 | Agent-MER: A Cognitive Agent with Hierarchical Deliberation for Open-Vocabulary Multimodal Emotion Recognition. Zhengqin Lai, Zhilin Zhu, Xiaopeng Hong, Yaowei Wang |
| 2025 | Agent-to-Agent (A2A) Protocol Integrated Digital Twin System with AgentIQ for Multimodal AI Fitness Coaching and Personalized Well-Being. Kamran Gholizadeh HamlAbadi, Monica (Monireh) Vahdati, Fedwa Laamarti, Abdulmotaleb El Saddik |
| 2025 | AirScape: An Aerial Generative World Model with Motion Controllability. Baining Zhao, Rongze Tang, Mingyuan Jia, Ziyou Wang, Fanhang Man, Xin Zhang, Yu Shang, Weichen Zhang, Wei Wu, Chen Gao, Xinlei Chen, Yong Li |
| 2025 | Ali-UI: Enhancing Complex Vision-Language Navigation with Alignment of Unified Map and Instruction Parsing. Shanshan Li, Jiawei Hou, Da Huang, Yanwei Fu, Xiangyang Xue |
| 2025 | Align 3D Representation and Text Embedding for 3D Content Personalization. Qi Song, Ziyuan Luo, Ka Chun Cheung, Simon See, Renjie Wan |
| 2025 | AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding. Yidan Wang, Chenyi Zhuang, Wutao Liu, Pan Gao, Nicu Sebe |
| 2025 | AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation. Jeongsoo Choi, Ji-Hoon Kim, Sung-Bin Kim, Tae-Hyun Oh, Joon Son Chung |
| 2025 | Aligned or Apart? Multi-Agent Insights into Consumer and Brand Messaging Discrepancies. Haotian Gan, Yudong Li, Wanyue Li, Weidong Tang |
| 2025 | Amplitude-aware Domain Style Replay for Lifelong Person Re-identification. Long Chen, De Cheng, Shizhou Zhang, Yinghui Xing, Di Xu, Yanning Zhang |
| 2025 | An Xide Xu, Sandesh Kamath, Muhammad Atif Butt, Bogdan Raducanu |
| 2025 | An Aesthetic Cultural Relic Poster Generation Framework Based on Multi-target Learning and Multimodal Large Language Model. Mohan Zhang, Qianqian Hu, Chuhan Li, Yanxiu Dan, Shenglan Cui, Fang Liu |
| 2025 | An Event-tailored State-Space Based Model for Pedestrian Detection. Liuyi Li, Feng Shi, Jian Wang, Jinjing Zhu, Wenze Shao |
| 2025 | An Innovative Industry Program on Multimedia in A New AI Era. Jianquan Liu, Balu Adsumilli, Yukiko Yanagawa, Haiwei Dong |
| 2025 | AnaFig: A Human-Aligned Dataset for Scientific Figure Analysis. Tan Yue, Xuzhao Shi, Rui Mao, Zilong Song, Zonghai Hu, Dongyan Zhao |
| 2025 | Analytic Continual Test-Time Adaptation for Multi-Modality Corruption. Yufei Zhang, Yicheng Xu, Hongxin Wei, Zhiping Lin, Xiaofeng Zou, Cen Chen, Huiping Zhuang |
| 2025 | Analytic Synaptic Dynamic Scaling Balancer for Multimodal Deepfake Continual Detection. Man Xiao, Jianbin Ye, Bo Liu, Zijian Gao, Kele Xu, Xiaodong Wang |
| 2025 | Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language Reasoning. Angelos Vlachos, Giorgos Filandrianos, Maria Lymperaiou, Nikolaos Spanos, Ilias Mitsouras, Vasileios Karampinis, Athanasios Voulodimos |
| 2025 | Anatomical Region-Guided 3D PET/MR Tumor Segmentation via Medical Record. Tianming Xu, Tiantian Guo, Youdan Feng, Zihan Chen, Qiaoyi Xue, Lingzhi Hu, Yuhang Shi |
| 2025 | AnchorSync: Global Consistency Optimization for Long Video Editing. Zichi Liu, Yinggui Wang, Tao Wei, Chao Ma |
| 2025 | Anchoring Trends: Mitigating Social Media Popularity Prediction Drift via Feature Clustering and Expansion. Chia-Ming Lee, Bo-Cheng Qiu, Cheng-Jun Kang, Yi-Hsuan Wu, Jun-Lin Chen, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Chung Hsu |
| 2025 | Anchors Bring Stability and Efficiency: Fast Tensorial Multi-view Clustering on Shuffled Datasets. Jintian Ji, Songhe Feng |
| 2025 | AnimeColor: Reference-based Animation Colorization with Diffusion Transformers. Yuhong Zhang, Liyao Wang, Han Wang, Danni Wu, Zuzeng Lin, Feng Wang, Li Song |
| 2025 | AnomalyControl: Highly-Aligned Anomalous Image Generation with Controlled Diffusion Model. Yuanyi Duan, Wei Xu, Qinlong Wu, Guo-Sen Xie, Fang Zhao, Caifeng Shan |
| 2025 | AnyStyleDiffusion: Flexible Style Transfer with Consistent Content Adaptation Across Diffusion Models. Zhenyu Xu, Junjie Wu, Zhiyan Piao, Xiaoqi Sheng, Yu Xiao, Xinyu Zhang |
| 2025 | Anywhere Avatar: 3D Telepresence with Just a Phone and a Laptop. Ruifan Ji, Mingyuan Wu, Bo Chen, Michael Zink, Ramesh K. Sitaraman, Jacob Chakareski, Klara Nahrstedt |
| 2025 | Appearance Contrasts for Unconstrained Age Estimation. Jilong Wei, Yangyang Hu, Xiangjuan Wu, Yiqiang Wu, Hao Liu |
| 2025 | Arbitrary-scale Fusion Neural Operator. Junwei Zhu, Wei Li, Honghui Xu, Jiawei Jiang, Zhi Liu, Jianwei Zheng |
| 2025 | Are Synthetic Videos Useful? A Benchmark for Retrieval-Centric Evaluation of Synthetic Videos. Zecheng Zhao, Selena Song, Tong Chen, Zhi Chen, Shazia Sadiq, Yadan Luo |
| 2025 | Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes? Yang Yao, Lingyu Li, Jiaxin Song, Chiyu Chen, Zhenqi He, Yixu Wang, Xin Wang, Tianle Gu, Jie Li, Yan Teng, Yingchun Wang |
| 2025 | Art4Math: Handwritten Mathematical Expression Recognition via Multimodal Sketch Grounding. Yang Zhou, Jin Wang, Yuxiao Zhang, Kaixiang Huang, Guodong Lu, Jingru Yang, Shengfeng He |
| 2025 | ArtFRD: A Fisher-Rao Mixture Metric for Generative Model Aesthetic Evaluation. Chuanwei Huang, Zexi Jia, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Jinchao Zhang, Jie Zhou |
| 2025 | ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding. Shuai Wang, Ivona Najdenkoska, Hongyi Zhu, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring |
| 2025 | Assessing Personality Traits and Interview Performance from Asynchronous Video Interviews. Tianyi Zhang, Tianhua Qi, Antonis Koutsoumpis, Yuan Zong, Wenming Zheng, Janneke K. Oostrom, Djurre Holtrop, Zhaojie Luo, Reinout E. de Vries |
| 2025 | Asymmetric Pre-aligned Anchor Contrastive Enhanced Diffusion Hashing Model for Incomplete Multimodal Retrieval. Yang Yu, MeiYu Liang, Wei Huang, Juncheng Zheng, Kangkang Lu, Yawen Li, Junping Du, Zhe Xue, Wu Liu |
| 2025 | AtlantisGS: Underwater Sparse-View Scene Reconstruction via Gaussian Splatting. Jingjun Yi, Qi Bi, Hao Zheng, Huimin Huang, Haolan Zhan, Yixian Shen, Wei Ji, Yawen Huang, Yuexiang Li, Xian Wu, Yefeng Zheng |
| 2025 | AttriPrompt: Dynamic Prompt Composition Learning for CLIP. Qiqi Zhan, Shiwei Li, Qingjie Liu, Yunhong Wang |
| 2025 | Audio Does Matter: Importance-Aware Multi-Granularity Fusion for Video Moment Retrieval. Junan Lin, Daizong Liu, Xianke Chen, Xiaoye Qu, Xun Yang, Jixiang Zhu, Sanyuan Zhang, Jianfeng Dong |
| 2025 | Audio-Visual Asynchrony Mitigation: Cross-Modal Alignment and Feature Reconstruction for Deepfake Detection. Yan Wang, Qindong Sun, Dongzhu Rong |
| 2025 | AudioAtlas: A Comprehensive and Balanced Benchmark Towards Movie-Oriented Text-to-Audio Generation. Chenxi Wang, Yusheng Dai, Lei Sun, Jun Du, Jianqing Gao |
| 2025 | AudioFab: Building A General and Intelligent Audio Factory through Tool Learning. Cheng Zhu, Jing Han, Qianshuai Xue, Kehan Wang, Huan Zhao, Zixing Zhang |
| 2025 | AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation. Yan Rong, Jinting Wang, Guangzhi Lei, Shan Yang, Li Liu |
| 2025 | AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation. Yulin Sun, Qisheng Xu, Yi Su, Qian Zhu, Yong Dou, Xinwang Liu, Kele Xu |
| 2025 | AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion Prior. Guoqiang Liang, Qingnan Fan, Bingtao Fu, Jinwei Chen, Hong Gu, Lin Wang |
| 2025 | Automatic Accessible Multimodal Translation of Graphics Using A Refreshable Pin Array. Seung-gyeom Kim, Areum Kim, Eunchae Kim, Minho Chung, Yongjae Yoo |
| 2025 | B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal Understanding. Changho Choi, Youngwoo Shin, Gyojin Han, Dong-Jae Lee, Junmo Kim |
| 2025 | BAC-GCN: Background-Aware CLIP-GCN Framework for Unsupervised Multi-Label Classification. Yonghyeon Jo, Janghyun Kim, Jinsun Park |
| 2025 | BAPEN: Towards Versatile Audio Phase Retrieval. Lingling Dai, Andong Li, Zhe Han, Chengshi Zheng, Xiaodong Li |
| 2025 | BEAM: Bridging Physically-based Rendering and Gaussian Modeling for Relightable Volumetric Video. Yu Hong, Yize Wu, Zhehao Shen, Chengcheng Guo, Yuheng Jiang, Yingliang Zhang, Qiang Hu, Jingyi Yu, Lan Xu |
| 2025 | BIMCompNet: Multimodal Dataset for Geometric Deep Learning in Building Information Model. Mingsong Yang, Xinhong Hei, Kehai Chen, Haining Meng, Haoyang Dong, Qin Zhao |
| 2025 | BOLT: Fewer Tokens but More Performance Retention for Efficient Vision-Language Models Inference. Jiahua Bao, Siyao Cheng, Jiaxing Du, Changjiang He, Zeming Lang, Hao Zhang, Jie Liu |
| 2025 | BS3: Bézier Slicing Middleware for 3D Mesh LOD Optimization. Lehao Lin, Baohua Fang, Ziheng Sun, Ke Wang, Hong Kang, Wei Cai |
| 2025 | BSGS: Bi-Stage 3D Gaussian Splatting for Camera Motion Deblurring. An Zhao, Piaopiao Yu, Zhe Zhu, Mingqiang Wei |
| 2025 | BTUAP: Boosting the Transferability of Universal Adversarial Perturbations in the Black-box Setting under various data dependencies. Jie Wan, Jianhao Fu, Ziqi Yang, Kui Ren |
| 2025 | BadMDA: Towards Backdoor Injection during Domain Adaptation to Collapse Multi-Agent Perception. Tong Chen, Bowen Du, Jiejie Zhao, Hanyang Xia, Haiquan Wang, Jiakai Wang |
| 2025 | Balanced Multiple Kernel Clustering with Discrete Partition Entropy Auto Regularization. Yan Chen, Bingbing Jiang, Peng Zhou, Lei Duan, Yuhua Qian, Liang Du |
| 2025 | Balancing Cross-Modal Attention for Generalized Zero-Shot Learning. Zhijie Rao, Jingcai Guo |
| 2025 | Behave Your Motion: Habit-preserved Cross-category Animal Motion Transfer. Zhimin Zhang, Bi'an Du, Caoyuan Ma, Zheng Wang, Wei Hu |
| 2025 | Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts. Zhenghao Liu, Xingsheng Zhu, Tianshuo Zhou, Xinyi Zhang, Xiaoyuan Yi, Yukun Yan, Ge Yu, Maosong Sun |
| 2025 | Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning. Zhiyuan Han, Beier Zhu, Yanlong Xu, Peipei Song, Xun Yang |
| 2025 | Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations. Jinjie Shen, Yaxiong Wang, Lechao Cheng, Nan Pu, Zhun Zhong |
| 2025 | Beyond Emotion Recognition: A Multi-Turn Multimodal Emotion Understanding and Reasoning Benchmark. Jinpeng Hu, Hongchang Shi, Chongyuan Dai, Zhuo Li, Peipei Song, Meng Wang |
| 2025 | Beyond Equal Views: Strength-Adaptive Evidential Multi-View Learning. Cai Xu, Ziqi Wen, Jie Zhao, Wanqing Zhao, Jinlong Yu, Haishun Chen, Ziyu Guan, Wei Zhao |
| 2025 | Beyond Interpretability: Exploring the Comprehensibility of Adaptive Video Streaming through Large Language Models. Lianchen Jia, Chaoyang Li, Ziqi Yuan, Jiahui Chen, Tianchi Huang, Jiangchuan Liu, Lifeng Sun |
| 2025 | Beyond Snapshots: A Multimodal User-Level Dataset for Depression Detection in Dynamic Social Media Streams. Bichen Wang, Yixin Sun, Yanyan Zhao, Bing Qin |
| 2025 | Beyond Sparse Keypoints: Dense Pose Modeling for Robust Gait Recognition. Wenpeng Lang, Saihui Hou, Yongzhen Huang |
| 2025 | Beyond Technical Failures: Multimodal Time-Series Modelling for Detecting Social Breakdowns and User Repair Attempts in Human-Robot Interaction. Rutherford Agbeshi Patamia, Ha Pham Thien Dinh, Ming Liu, Akansel Cosgun |
| 2025 | Beyond Visual Quality: Fidelity-Oriented Diffusion Model for Real-world Image Super-Resolution. Zhenxuan Fang, Shuaibo Wang, Weisheng Dong, Junwei Xu, Fangfang Wu, Xin Li, Guangming Shi |
| 2025 | Beyond the Individual: Introducing Group Intention Forecasting with SHOT Dataset. Ruixu Zhang, Yuran Wang, Xinyi Hu, Chaoyu Mai, Wenxuan Liu, Danni Xu, Xian Zhong, Zheng Wang |
| 2025 | Bi-Orthogonal Non-negative Tensor tri-Factorization for Tensorized Label Learning. Rui Wang, Yuxuan Liu, Guangyu Yang, Quanxue Gao, Cheng Deng |
| 2025 | BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video Compression. Wei Jiang, Junru Li, Kai Zhang, Li Zhang |
| 2025 | BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance. Huy Le, Nhat Chung, Tung Kieu, Anh Nguyen, Ngan Le |
| 2025 | BiOMamba: Mamba-based Forward-Then-Backward Temporal Modeling for Online Action Detection and Anticipation. Sensen Wang, Yuehu Liu, Chi Zhang |
| 2025 | Bimodal Debiasing for Text-to-Image Diffusion: Adaptive Guidance in Textual and Visual Spaces. Liu Yu, Jiajun Sun, Ping Kuang, Rui Zhou, Fan Zhou, Zhikun Feng |
| 2025 | Boosting Chart-to-Code Generation in MLLM via Dual Preference-Guided Refinement. Zhihan Zhang, Yixin Cao, Lizi Liao |
| 2025 | Boosting Guided Diffusion with Large Language Models for Multimodal Sequential Recommendation. Te Song, Lianyong Qi, Weiming Liu, Fan Wang, Xiaolong Xu, Hongsheng Hu, Yang Cao, Xuyun Zhang, Amin Beheshti |
| 2025 | Boosting Micro-Expression Analysis via Prior-Guided Video-Level Regression. Zizheng Guo, Bochao Zou, Yinuo Jia, Xiangyu Li, Huimin Ma |
| 2025 | Boosting Multi-Modal Alignment: Geometric Feature Separation for Class Incremental Learning. Guoqiang Liang, Chuan Qin, De Cheng, Shizhou Zhang, Yanning Zhang |
| 2025 | Boosting Single-Domain Generalized Object Detection via Vision-Language Knowledge Interaction. Xiaoran Xu, Jiangang Yang, Wenyue Chong, Wenhui Shi, Shichu Sun, Jing Xing, Jian Liu |
| 2025 | Boosting Temporal Sentence Grounding via Causal Inference. Kefan Tang, Lihuo He, Jisheng Dang, Xinbo Gao |
| 2025 | BoxSeg: Quality-Aware and Peer-Assisted Learning for Box-supervised Instance Segmentation. Jinxiang Lai, Wenlong Wu, Jiawei Zhan, Jian Li, Bin-Bin Gao, Jun Liu, Jie Zhang, Song Guo |
| 2025 | BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural Embeddings. Dongyang Li, Haoyang Qin, Mingyang Wu, Chen Wei, Quanying Liu |
| 2025 | BrainSegDMIF: A Dynamic Fusion-enhanced SAM for Brain Lesion Segmentation. Hongming Wang, Yifeng Wu, Huimin Huang, Hongtao Wu, Jiaxuan Jiang, Xiaodong Zhang, Hao Zheng, Yawen Huang, Xian Wu, Yefeng Zheng, Jinping Xu, Jing Cheng |
| 2025 | Breaking Semantic Barriers: A Zero-Shot Generalized Framework for Graph Anomaly Detection. Xiangping Zheng, Xuan Feng, Bo Wu, Bin Ren, Wei Li, Xiuxin Hao, Xun Liang, Bin Tang, Zhiwen Yu |
| 2025 | Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs. Tiancheng Gu, Kaicheng Yang, Ziyong Feng, Xingjun Wang, Yanzhao Zhang, Dingkun Long, Yingda Chen, Weidong Cai, Jiankang Deng |
| 2025 | Breaking the Spatial-Temporal Consistency Constraint: Towards Reference-Based Hyperspectral Image Super-Resolution. Xuyao Liu, Jiahui Qu, Wenqian Dong |
| 2025 | Breaking the Synthetic Barrier: Towards Stable and Generalizable Real-World Image Dehazing. Zhuo Su, Jufeng Li, Yan Zhang, Xin Li, Fuwei Zhang, Yuxin Feng, Fan Zhou |
| 2025 | BridgeGLM: Bridging Graph and Language Spaces for Domain Generalization. Jiaxing Qi, Yifan Xu, Zhifei Yang, Ruifei Ma, Chao Zhang, Kuifei Yu |
| 2025 | BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection. An Xiang, Zixuan Huang, Xitong Gao, Kejiang Ye, Cheng-Zhong Xu |
| 2025 | Bridging Domains in Mental Stress Assessment via Retrieval-Augmented Reasoning. Yi Dai, Yang Ding, Kaisheng Zeng |
| 2025 | Bridging Inter-Class Ambiguity and Spatial Variability in Flexible Object Recognition via Graph Distillation. Lin Zuo, Kunshan Yang, Mengmeng Jing, Xiangxu Zhao, Jiaqiao Chen |
| 2025 | Bridging the Gap: Consistent Image Outpainting via Training-Free Noise Optimization. Na Li, Zihao Li, Zuoli Tang, Yuqing Yu, Lixin Zou, Chenliang Li |
| 2025 | Bridging the Lab and the Wild: Behavioral Experiments as a Pathway to QoE Research Closer to Realistic Environment. Dominika Wanat, Dawid Juszka, Mikolaj Leszczuk, Lucjan Janowski |
| 2025 | Bridging the Unseen Gap: Label-Enhanced Information Bottleneck Distillation for Multimodal Named Entity Recognition. Bo Xu, Jie Wei, Hongya Wang, Ming Du, Hui Song, Yanghua Xiao |
| 2025 | Bright to Dark: Stage-wise Bilevel Knowledge Transfer for Seeing Text in the Dark. Chengpei Xu, Wenhao Zhou, Long Ma, Weimin Wang, Feng Xia, Binghao Li, Wenjie Zhang |
| 2025 | Bring the VibeOn: Designing a Multimodal Interface for Shared Emotional Experiences in Live-streamed Concerts. Gyeongjin Kim, Sebin Lee, Daye Kim, Jungjin Lee, Minju Kim |
| 2025 | BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos. Jiahao Lin, Weixuan Peng, Bojia Zi, Yifeng Gao, Xianbiao Qi, Xingjun Ma, Yu-Gang Jiang |
| 2025 | Building Embodied EvoAgent: A Brain-inspired Paradigm for Bridging Multimodal Large Models and World Models. Junyu Gao, Xuan Yao, Yong Rui, Changsheng Xu |
| 2025 | Butter: Frequency Consistency and Hierarchical Fusion for Autonomous Driving Object Detection. Xiaojian Lin, Wenxin Zhang, Yuchu Jiang, Wangyu Wu, Yiran Guo, Kangxu Wang, Zongzheng Zhang, Guijin Wang, Lei Jin, Hao Zhao |
| 2025 | CADQ: Attribute-Consistent Face Cartoonization with Cross-modal Aligned and Deformable Quantization. Yongjie Hu, Yifan Jiang, Ziyun Li, Fei Gao, Henrik Boström, Nannan Wang |
| 2025 | CCDb+: Enhanced Annotations and Multi-Modal Benchmark for Natural Dyadic Conversations. Yang Deng, Yu-Kun Lai, Paul L. Rosin |
| 2025 | CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation. Yuanhong Chen, Kazuki Shimada, Christian Simon, Yukara Ikemiya, Takashi Shibuya, Yuki Mitsufuji |
| 2025 | CDIB: Consistency Discovery-guided Information Bottleneck for Multi-modal Knowledge Graph Reasoning. Haichuan Fang, Haoran Zhang, Yulin Du, Qiang Guo, Zhen Tian, Youwei Wang, Yangdong Ye |
| 2025 | CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors. Jiahuan Long, Wen Yao, Tingsong Jiang, Jiacheng Hou, Shuai Jia, Junqi Wu, Xiaoya Zhang, Xiaohu Zheng, Chao Ma |
| 2025 | CEARI: Co-Evolutionary Agents for Reassembling and Inpainting Puzzles with Gaps and Missing Pieces. Xingke Song, Jianxu Shangguan, Yiran Li, Jialu Zhang, Jianfeng Ren, Ruibin Bai, Xin Chen, Xudong Jiang |
| 2025 | CFSSeg: Closed-Form Solution for Class-Incremental Semantic Segmentation of 2D Images and 3D Point Clouds. Jiaxu Li, Rui Li, Jianyu Qi, Songning Lai, Linpu Lv, Kejia Fan, Jianheng Tang, Yutao Yue, Dongzhan Zhou, Yunhuai Liu, Huiping Zhuang |
| 2025 | CGCOD: Class-Guided Camouflaged Object Detection. Chenxi Zhang, Qing Zhang, Jiayun Wu, Youwei Pang |
| 2025 | CH-SV: A Benchmark for Multi-Type Chinese Harmful Short Video Detection. Linlin Zong, Shilin Sui, Wenjun Liang, Wanyu Song, Linlin Tian, Xinyue Liu, Xianchao Zhang, Bo Xu |
| 2025 | CHORD: Customizing Hybrid-precision On-device Model for Sequential Recommendation with Device-cloud Collaboration. Tianqi Liu, Kairui Fu, Shengyu Zhang, Wenyan Fan, Zhaocheng Du, Jieming Zhu, Fan Wu, Fei Wu |
| 2025 | CIA: Class- and Instance-aware Adaptation for Vision-Language Models. Lin Peng, Cong Wan, Shaokun Wang, Xiang Song, Yuhang He, Yihong Gong |
| 2025 | CITR: Efficient Long Video Understanding Needs Causal Importance. Ziqi Yuan, Jun Li, Yanghao Li, Yuxiang Huang, Chi Chen, Shuo Wang, Zhinan Gou |
| 2025 | CLIP-6D: Empowering CLIP as a Zero-Shot 6D Pose Estimator Through Generalizable Object-Specific Representations. Hua Wang, Hong Liu, JiaLe Ren, Mingxin Tan, Zhongzien Jiang |
| 2025 | CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation. Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang |
| 2025 | CLIP-HNet: Hybrid Network with Cross-Modal Guidance for Self-Supervised Remote Sensing Dehazing. Shan Wang, Weisi Lin, Yun Liu, Libao Zhang |
| 2025 | CLIP-MT: Multi-Modal Knowledge-Driven Adaptive Scale Feature Allocation for Multi-Task Dense Prediction. Shalayiding Sirejiding, Yue Ding, Yuxiang Lu, Xinyi Hou, Shaokai Wu, Qichen He, Chunlin Wang, Wenqiang Guo, Hongtao Lu |
| 2025 | CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework. Wentao Wu, Xiao Wang, Chenglong Li, Bo Jiang, Jin Tang, Bin Luo, Qi Liu |
| 2025 | CMA-VC: Large Vision-Language Model for Cross-Modal Alignment in Intention-Oriented Video Captioning. Jun Yu, Xilong Lu, Yunxiang Zhang, Qiang Ling |
| 2025 | CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models. Wentao Liu, Qianjun Pan, Yi Zhang, Zhuo Liu, Ji Wu, Jie Zhou, Aimin Zhou, Qin Chen, Bo Jiang, Liang He |
| 2025 | CODE: Towards Partial Label Graph Learning via Coupled Dual Separation. Yiyang Gu, Taian Guo, Hang Zhou, Zihao Chen, Zhiping Xiao, Yifang Qin, Xiao Luo, Wei Ju, Yifan Wang, Ming Zhang |
| 2025 | CP3: Customizable 3D Pop-Out Effect Creation for Immersive Content Using Multimodal Models. Zezhou Chen, Ping Chen, Huan Hu, Xiang Liu, Zipeng Wang, Zhaoxiang Liu, Kai Wang, Shiguo Lian |
| 2025 | CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation. Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge |
| 2025 | CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs. Jianting Tang, Yubo Wang, Haoyu Cao, Linli Xu |
| 2025 | CReLeRI: Explainable, Concept-centric, Representation, Learning, Reasoning, and Interaction Video Analysis System. Michael Francis Perez, Yichi Yang, Yuheng Zha, Enze Ma, Danish Nisar Ahmed Tamboli, Haodi Ma, Reza Shahriari, Vyom Pathak, Dzmitry Kasinets, Rohith Venkatakrishnan, Daisy Zhe Wang, Jaime Ruiz, Eric D. Ragan, Zhiting Hu, Eric P. Xing, Jun-Yan Zhu |
| 2025 | CSDN: CLIP-Driven Similarity-Aligned Distillation Network for Weakly-Supervised Object Localization. Sifan Zuo, Youfa Liu, Bo Du |
| 2025 | CWCP: Generalizing Virtual Reality to Real World with Contextual-Weather Correlation Pairing for Deraining and Desnowing. Yuwu Lu, Chunzhi Liu, Yihan Yang |
| 2025 | CaDGS: Modeling Inter-Gaussian Mutual Information for Dynamic Novel View Synthesis. Yunlong Zhao, Xiaoheng Deng, Zhuohua Qiu, Feng Yang, Chang Xu, Xiangjian He, Shan You, Xiu Su |
| 2025 | CalibCLIP: Contextual Calibration of Dominant Semantics for Text-Driven Image Retrieval. Bin Kang, Bin Chen, Junjie Wang, Yulin Li, Junzhi Zhao, Junle Wang, Zhuotao Tian |
| 2025 | CalibWorkflow: A General MLLM-Guided Workflow for Centimeter-Level Cross-Sensor Calibration. Xingchen Li, Wuyang Zhang, Guoliang You, Xiaomeng Chu, Wenhao Yu, Yifan Duan, Yuxuan Xiao, Yanyong Zhang |
| 2025 | Cam-Bench: A Benchmark for Image-based Camera Parameter Estimation. Quanhong Peng, Dan Zhang, Dong Zhao, Jianpeng Zhang, Meihua Song, Chenlei Lv |
| 2025 | Camera-Specific Imaging Simulation for Raw Domain Image Super Resolution. Xiaobo Liu, Henglu Wei, Chuxi Yang, Wei Yu, Xudong Zhao, Xiangyang Ji |
| 2025 | Camouflaged Object Tracking: A Benchmark. Xiaoyu Guo, Pengzhi Zhong, Hao Zhang, Defeng Huang, Huikai Shao, Qijun Zhao, Shuiwang Li |
| 2025 | Can Audio Language Models Listen Between the Lines? A Study on Metaphorical Reasoning via Unspoken. Hongru Xiao, Xiang Li, Duyi Pan, Longfei Zhang, Zhixue Song, Jiale Han, Songning Lai, Wenshuo Chen, Jing Tang, Benyou Wang |
| 2025 | Can I Trust You? Advancing GUI Task Automation with Action Trust Score. Haiyang Mei, Difei Gao, Xiaopeng Wei, Xin Yang, Mike Zheng Shou |
| 2025 | Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection. Tairan Huang, Yili Wang, Qiutong Li, Changlong He, Jianliang Gao |
| 2025 | Can Person-Level Attributes Improve Group Re-Identification? Kamakshya Prasad Nayak, Kamalakar Vijay Thakare, Ashesh Xalxo, Lalit Lohani, Debi Prosad Dogra |
| 2025 | CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models. Kedong Xiu, Sai Qian Zhang |
| 2025 | Capturing More: Learning Multi-Domain Representations for Robust Online Handwriting Verification. Peirong Zhang, Kai Ding, Lianwen Jin |
| 2025 | Casual3DHDR: High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos. Shucheng Gong, Lingzhe Zhao, Wenpu Li, Hong Xie, Yin Zhang, Shiyu Zhao, Peidong Liu |
| 2025 | CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation. Hyunwoo Oh, SeungJu Cha, Kwanyoung Lee, Si-Woo Kim, Dong-Jin Kim |
| 2025 | Category-Aware 3D Object Composition with Disentangled Texture and Shape Multi-view Diffusion. Zeren Xiong, Zikun Chen, Zedong Zhang, Xiang Li, Ying Tai, Jian Yang, Jun Li |
| 2025 | CauRDG: Enhancing Domain Generalization with Causal-Driven Semantic Consistency Reasoning. Zongxin Liu, Yishu Liu, Guangming Lu, Xiaoling Luo, Bingzhi Chen |
| 2025 | CausalCtrl: Causality-Aware Control Framework for Text-Guided Visual Editing. Haoxiang Cao, Chaoqun Wang, Yongwen Lai, Shaobo Min, Xuejin Chen |
| 2025 | CausalMVC: Causal Content-Style Representation Learning for Deep Multi-View Clustering. Shifeng Bao, Zhe Xue, Qi Chen, Shilong Ou, Amin Beheshti, Quan Z. Sheng, Anton van den Hengel, Yuankai Qi |
| 2025 | Causality-aligned Prompt Learning via Diffusion-based Counterfactual Generation. Xinshu Li, Ruoyu Wang, Erdun Gao, Mingming Gong, Lina Yao |
| 2025 | Cause and Effect: Video Social Relationship Recognition from Causal Perspective. Yuxuan Zhang, Bo Wang, Yu Du, Yangfu Zhu, Haorui Wang, Guangyao Su, Tao Zhou, Bin Wu |
| 2025 | Chain-of-Cooking: Cooking Process Visualization via Bidirectional Chain-of-Thought Guidance. Mengling Xu, Ming Tao, Bing-Kun Bao |
| 2025 | Chain-of-Thought Guided Semantic Debiasing for Low-Shot Vision-Language Tasks. Biao Chen, Kunbin He, Zhikun Zheng, Mengmeng Jing, Lin Zuo |
| 2025 | Challenging Cases of Neural Image Compression: A Dataset of Visually Compelling Yet Semantically Incorrect Reconstructions. Nora Hofer, Rainer Böhme |
| 2025 | Change-UP: Advancing Visualization and Inference Capability for Multi-level Remote Sensing Change Interpretation. Mo Yang, Luo Chen, Jiali Zhou |
| 2025 | Character-Centric Understanding of Animated Movies. Zhongrui Gui, Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman |
| 2025 | Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts. Xiangnan Chen, Yuancheng Fang, Juncheng Li, Qian Xiao, Jun Lin, Siliang Tang, Yueting Zhuang |
| 2025 | ChartM Donglu Yang, Liang Zhang, Zihao Yue, Liangyu Chen, Yichen Xu, Wenxuan Wang, Qin Jin |
| 2025 | CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual Rationale. Xiao Liang, Jiawei Hu, Di Wang, Zhi Ma, Lin Zhao, Ronghan Li, Bo Wan, Quan Wang |
| 2025 | Choose Your Expert: Uncertainty-Guided Expert Selection for Continual Deepfake Detection. Xueyi Zhang, Peiyin Zhu, Jinping Sui, Xiaoda Yang, Jiahe Tian, Mingrui Lao, Siqi Cai, Yanming Guo, Jun Tang |
| 2025 | ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion. Xuanchen Wang, Heng Wang, Weidong Cai |
| 2025 | City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning. Penglei Sun, Yaoxian Song, Xiangru Zhu, Xiang Liu, Qiang Wang, Yue Liu, Changqun Xia, Tiefeng Li, Yang Yang, Xiaowen Chu |
| 2025 | CitySculpt: 3D City Generation from Satellite Imagery with UV Diffusion. Xingbo Yao, Xuanmin Wang, Hui Xiong |
| 2025 | Client-Server Co-design with Multi-modal Codebooks Makes Better and Faster Federate Knowledge Sharing. Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Lei Liang, Wen Zhang, Huajun Chen |
| 2025 | Closing the Feedback Loop in Text2Vis: Refining Visualization with Vision-Language Models. Shengze Shi, Tao Ren, Guoliang Zhu, Guan Dong Feng, Jun Hu |
| 2025 | Cluster-Aware Contrastive Multi-View Clustering Based on Masked Views. Penglei Wang, Ziming Quan, Danyang Wu, Jin Xu |
| 2025 | Clustering-Based Tail-class Mitigation for New-class Discovery. Zelei Wu, Xulun Ye, Jieyu Zhao |
| 2025 | Clustering-Oriented Generative Attribute Graph Imputation. Mulin Chen, Bocheng Wang, Jiaxin Zhong, Zongcheng Miao, Xuelong Li |
| 2025 | CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image Synthesis. Aravindan Kamatchi Sundaram, Ujjayan Pal, Abhimanyu Chauhan, Aishwarya Agarwal, Srikrishna Karanam |
| 2025 | CoFi-Dec: Hallucination-Resistant Decoding via Coarse-to-Fine Generative Feedback in Large Vision-Language Models. Zongsheng Cao, Yangfan He, Anran Liu, Jun Xie, Zhepeng Wang, Feng Chen |
| 2025 | CoFiVLA: Synergistic Coarse-Fine Vision-Language Alignment for Image Aesthetic Assessment. Yuzhen Niu, Siling Chen, Yuzhong Chen, Fusheng Li, Rui Xu, Hui Da |
| 2025 | CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection. Guankun Wang, Han Xiao, Renrui Zhang, Huxin Gao, Long Bai, Xiaoxiao Yang, Zhen Li, Hongsheng Li, Hongliang Ren |
| 2025 | CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model. Ruohao Zhan, Yijin Li, Yisheng He, Shuo Chen, Yichen Shen, Xinyu Chen, Zilong Dong, Zhaoyang Huang, Guofeng Zhang |
| 2025 | Coding-Prior Guided Diffusion Network for Video Deblurring. Yike Liu, Jianhui Zhang, Haipeng Li, Shuaicheng Liu, Bing Zeng |
| 2025 | CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking. Yuehao Huang, Liang Liu, Shuangming Lei, Yukai Ma, Hao Su, Jianbiao Mei, Pengxiang Zhao, Yaqing Gu, Yong Liu, Jiajun Lv |
| 2025 | CogMAEC'25: The 1st Workshop on Cognition-oriented Multimodal Affective and Empathetic Computing. Hao Fei, Bobo Li, Meng Luo, Qian Liu, Lizi Liao, Fei Li, Min Zhang, Björn W. Schuller, Mong-Li Lee, Erik Cambria |
| 2025 | Cognitive Predictive Coding Network: Rethinking the Generalization in Raven's Progressive Matrices. Xinyu Zhang, Lingling Zhang, Yanrui Wu, Muye Huang, Jun Liu |
| 2025 | CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition. Kaixing Yang, Xulong Tang, Haoyu Wu, Biao Qin, Hongyan Liu, Jun He, Zhaoxin Fan |
| 2025 | Collaboration Wins More: Dual-Modal Collaborative Attention Reinforcement for Mitigating Large Vision Language Models Hallucination. Jiye Xie, Yifei Gao, Liangliang You, Xiang Xu, Haoran Xu, Zhiqiang Kou, Kexue Fu, Youyang Qu, Wenjie Yang, Jianwei Guo, Weiliang Meng, Longxiang Gao, Haoran Yang, Changwei Wang, Yu Zhang |
| 2025 | Collaborative Cloud-edge Generalized Category Discovery. Yingbing Liu, Fei Ma, Yanan Wu, Xinxin Zuo, Fan Zhang, Yang Wang |
| 2025 | Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation. Sung-Lin Tsai, Bo-Lun Huang, Yu-Ting Shen, Cheng-Yu Yeo, Chiang Tseng, Bo-Kai Ruan, Wen-Sheng Lien, Hong-Han Shuai |
| 2025 | ColorDiffuser: Video Colorization with Pretrained Text-to-Image Diffusion Models. Hanyuan Liu, Minshan Xie, Jinbo Xing, Chengze Li, Chi-Sing Leung, Tien-Tsin Wong |
| 2025 | Combating Online Misinformation Videos: Characterization, Detection, and Prevention. Qiang Sheng, Peng Qi, Tianyun Yang, Yuyan Bu, Wynne Hsu, Mong-Li Lee, Juan Cao |
| 2025 | Combatting Data Imbalance and Noise in Micro-Action Recognition. Chuang Wang, Weidong Chen, Xu Cui, Yiming Zhao, Zhaobo Qi, Pengqi Huang, Xinyan Liu, Weigang Zhang |
| 2025 | ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies. Chenglin Wang, Yucheng Zhou, Qianning Wang, Zhe Wang, Kai Zhang |
| 2025 | Compliance Rating Scheme: A Data Provenance Framework for Generative AI Datasets. Matyas Bohacek, Ignacio Vilanova Echavarri |
| 2025 | Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive Training. Yun Li, Lina Yao, Zhe Liu |
| 2025 | Compositional Zero-shot Learning via Progressive Language-based Observations. Lin Li, Guikun Chen, Zhen Wang, Jun Xiao, Long Chen |
| 2025 | Compressed Feature Quality Assessment: Dataset and Baselines. Changsheng Gao, Wei Zhou, Guosheng Lin, Weisi Lin |
| 2025 | Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching. Zhixin Zheng, Xinyu Wang, Chang Zou, Shaobo Wang, Linfeng Zhang |
| 2025 | Conducting Conditional Diffusion by Estimating the Mean Vector of von Mises-Fisher Distribution. Longquan Dai, He Wang, Xiaolu Wei, Shaomeng Wang, Jinhui Tang |
| 2025 | Configuring Dynamic Multi-Stage Serverless Pipelines for Video Processing with Minimal Profiling Overhead. Jiaye Zhang, Hongyi Wang, Peiru Yang, Zili Meng, Mingwei Xu |
| 2025 | Conflict-Buffering Optimization by Symmetry Teleportation for Deep Long-Tailed Recognition. Mianzimei Yang, Zhipeng Zhou, Jin Zhang, Yuanhao Pu, Hong Xie, Defu Lian |
| 2025 | Congestion Control for VR Cloud Gaming: Integration and Comparison in Real VR Gaming Environment. Ahmad Alhilal, Ze Wu, Teemu Kämäräinen, Tristan Braud, Matti Siekkinen |
| 2025 | Consistency of Local and Global Flatness for Federated Learning. Junkang Liu, Fanhua Shang, Yuxuan Tian, Hongying Liu, Yuanyuan Liu |
| 2025 | Consistent and Invariant Generalization Learning for Short-video Misinformation Detection. Hanghui Guo, Weijie Shi, Mengze Li, Juncheng Li, Hao Chen, Yue Cui, Jiajie Xu, Jia Zhu, Jiawei Shen, Zhangze Chen, Sirui Han |
| 2025 | Context-aware Image-to-Music Generation via Bridging Modalities through Musical Captions. Shilin Liu, Kyohei Kamikawa, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama |
| 2025 | Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation. Pinxin Liu, Pengfei Zhang, Hyeongwoo Kim, Pablo Garrido, Ari Shapiro, Kyle Olszewski |
| 2025 | Contextually-Guided State Space Fusion for Misaligned Multi-Spectral Object Detection. Guyue Jin, Tianming Zhao, Jiacan Yan, Tian Tian |
| 2025 | Contrastive Lie Algebra Learning for Ultra-Fine-Grained Visual Categorization. Xiaohan Yu, Zicheng Pan, Yang Zhao, Qin Zhang, Yongsheng Gao |
| 2025 | Contrastive Prototype Framework for Calibrating Video Recommendation. Fan Li, Jiazhen Huang, Shisong Tang, Bing Han, Huafeng Cao, Haochen Sui, Ting Xu, Xiaoyu Kang |
| 2025 | Contrastive Regularization over LoRA for Multimodal Biomedical Image Incremental Learning. Haojie Zhang, Yixiong Liang, Hulin Kuang, Lihui Cen, Zhe Qu, Yigang Cen, Min Zeng, Shichao Kan |
| 2025 | Controllable Video-to-Music Generation with Multiple Time-Varying Conditions. Junxian Wu, Weitao You, Heda Zuo, Dengming Zhang, Pei Chen, Lingyun Sun |
| 2025 | CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation. Ruoxuan Zhang, Bin Wen, Hongxia Xie, Yi Yao, Songhan Zuo, Jian-Yu Jiang-Lin, Hong-Han Shuai, Wen-Huang Cheng |
| 2025 | CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models. Shunchang Liu, Zhuan Shi, Lingjuan Lyu, Yaochu Jin, Boi Faltings |
| 2025 | CorrNeXt: Making the ConvNet-Style Correspondence Pruner Stronger for Two-View Geometry. Zizhuo Li, Chunbao Su, Fan Fan, Jun Huang, Jiayi Ma |
| 2025 | CounterHelp: Promoting Online Civil Courage Among Young People Through AI-Generated Counterspeech. Andreas Babic, Xihui Chen, Djordje Slijepcevic, Adrian Jaques Böck, Matthias Zeppelzauer |
| 2025 | Counting by Points: Density-Guided Weakly-Supervised Nuclei Segmentation in Histopathological Images. Lingbo Zhang, Bingqian Sun, Linghan Cai, Yifeng Wang, Ye Zhang, Songhan Jiang, Kai Zhang, Yongbing Zhang |
| 2025 | Court of LLMs: Evidence-Augmented Generation via Multi-LLM Collaboration for Text-Attributed Graph Anomaly Detection. Yiming Xu, Jiarun Chen, Zhen Peng, Zihan Chen, Qika Lin, Lan Ma, Bin Shi, Bo Dong |
| 2025 | CrosST: Cross Swin 4D Transformer for Multi-Modal Alzheimer's Detection. Hao Wang, Hanxiao Li, Li Xu |
| 2025 | Cross Paradigm Representation and Alignment Transformer for Image Deraining. Shun Zou, Yi Zou, Juncheng Li, Guangwei Gao, Guo-Jun Qi |
| 2025 | Cross Time Domain Intention Interaction for Conditional Trajectory Prediction. Yuxiang Zhao, Wei Huang, Haipeng Zeng, Huan Zhao, Yujie Song |
| 2025 | Cross-Counter-Repeat Attention for Enhanced Understanding of Visual Semantics in Radiology Report Generation. Xiaolei Bo, Feiyang Yang, Feilong Xu, Xiaoli Zhang |
| 2025 | Cross-Domain Attribute Alignment with CLIP: A Rehearsal-Free Approach for Class-Incremental Unsupervised Domain Adaptation. Kerun Mi, Guoliang Kang, Guangyu Li, Lin Zhao, Tao Zhou, Chen Gong |
| 2025 | Cross-Modal Dual-Causal Learning for Long-Term Action Recognition. Shaowu Xu, Xibin Jia, Junyu Gao, Qianmei Sun, Jing Chang, Chao Fan |
| 2025 | Cross-Modal Metrics for Capturing Correspondences Between Music Audio and Stage Lighting Signals. Michael Kohl, Tobias Wursthorn, Christof Weiß |
| 2025 | Cross-Modal Prototype Augmentation and Dual-Grained Prompt Learning for Social Media Popularity Prediction. Ao Zhou, Mingsheng Tu, Luping Wang, Tenghao Sun, Zifeng Cheng, Yafeng Yin, Zhiwei Jiang, Qing Gu |
| 2025 | Cross-Modal Retrieval with Cauchy-Schwarz Divergence. Jiahao Zhang, Wenzhe Yin, Shujian Yu |
| 2025 | Cross-Model Watermarking via Discriminative Samples for Secure Authentication. Juan Zhao, Yudao Sun, Zhihai Yang, Cai Xu, Hongji Chen, Fan Zhang, Jianxin Li |
| 2025 | Cross-View Geometric Collaboration for Generalizable Sparse View Neural Surface Reconstruction. Hang Yang, Le Hui, Jianjun Qian, Jian Yang, Yigong Zhang, Jin Xie |
| 2025 | CrossMind-VL: Multi-Subject Mind-to-Video Decoding with Multimodal LLM Semantic Grounding. Xuanliu Zhu, Yiqiao Chai, Runnan Li, Mingying Lan, Li Gao |
| 2025 | Crowd Dynamics Demand Adaptivity: Self-Adaptive Physics-Informed Neural Network for Crowd Simulation. Ziying Tan, Linbo Luo, Haiyan Yin, Yew-Soon Ong, Wentong Cai |
| 2025 | Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition. Guanjie Huang, Danny H. K. Tsang, Shan Yang, Guangzhi Lei, Li Liu |
| 2025 | CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model. Wei Zhang, Wong Kam-Kwai, Biying Xu, Yiwen Ren, Yuhuai Li, Yingchaojie Feng, Minfeng Zhu, Wei Chen |
| 2025 | Cycle-Consistent Mamba-Based Registration-Fusion Joint Network for Unregistered Hyperspectral Image Super-Resolution. Quangui He, Jiahui Qu, Wenqian Dong, Song Xiao, Qinghao Gao |
| 2025 | D Wenxiang Liu, Yongkang Liu, Weiliang Meng, Gaoqi He, Jianhua Li |
| 2025 | D Yefei Sheng, Jie Wang, Ming Tao, Bing-Kun Bao |
| 2025 | DA-Font: Few-Shot Font Generation via Dual-Attention Hybrid Integration. Weiran Chen, Guiqian Zhu, Ying Li, Yi Ji, Chunping Liu |
| 2025 | DA3D: Domain-Aware Dynamic Adaptation for All-Weather Multimodal 3D Detection. Haochen Yang, Lei Li, Jiacheng Guo, Baolu Li, Minghai Qin, Hongkai Yu, Tianyun Zhang |
| 2025 | DACA-Net: A Degradation-Aware Conditional Diffusion Network for Underwater Image Enhancement. Chang Huang, Jiahang Cao, Jun Ma, Kieren Yu, Cong Li, Huayong Yang, Kaishun Wu |
| 2025 | DAFU-CAD: Depth-assisted Feature Unraveling for Sketch-based Robust CAD Modeling. Yue Sun, Xinqi Liu, Zhiliang He, Jialu Zhang, Chenming Wu, Guodong Lu, Jituo Li |
| 2025 | DAPT: Domain-Aware Prompt-Tuning for Multimodal Fake News Detection. Yu Tong, Weihai Lu, Xiaoxi Cui, Yifan Mao, Zhejun Zhao |
| 2025 | DARL: Mitigating Gradient Conflicts in Long-Tailed Out-of-Distribution Learning. Xuan Zhang, Sin Chee Chin, Jing-Hao Xue, Xiaochen Yang, Wenming Yang |
| 2025 | DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition. Haijing Liu, Tao Pu, Hefeng Wu, Keze Wang, Liang Lin |
| 2025 | DATE: Dual Prompt Learning with Information Bottleneck for Graph Out-of-Distribution Generalization. Jiayi Zeng, Tao Ren, Changhu Wang, Yifan Wang, Wei Ju, Zhipeng Sun, Xiao Luo |
| 2025 | DC Chuan Zeng, Zhao Zhang, Wei Huang, Lei Zhang, Le Yi, Kefu Zhao |
| 2025 | DCNOT: Diffusion-Cascaded Neural Optimal Transport for Scalable Multi-Domain Image-to-Image Translation. Yingzhen Zhang, Jimin Dai, Qianliang Wu, Jian Yang, Lei Luo |
| 2025 | DCount: Decoupled Spatial Perception and Attribute Discrimination for Referring Expression Counting. Ming Li, Yupeng Hu, Yinwei Wei, Hao Liu, Haocong Wang, Weili Guan |
| 2025 | DDFD: Diffusion-Based Denoising Fusion for Object Detection in Infrared-Visible Images. Min Dang, Gang Liu, Jingqi Zhao, Adams Wai-Kin Kong, Nan Luo, Di Wang |
| 2025 | DDSE: A Decoupled Dual-Stream Enhanced Framework for Multimodal Sentiment Analysis with Text-Centric SSM. Shenjie Jiang, Zhuoyu Wang, Xuecheng Wu, Hongru Ji, Mingxin Li, Xianghua Li, Chao Gao |
| 2025 | DEEMO: De-identity Multimodal Emotion Recognition and Reasoning. Deng Li, Bohao Xing, Xin Liu, Baiqiang Xia, Bihan Wen, Heikki Kälviäinen |
| 2025 | DEPO: Enhancing E-commerce Image Background Generation with Short Trajectory Direct Expected Preference Optimization. Shikun Sun, Chengrui Wang, Min Zhou, Zixuan Wang, Xiaoyu Qin, Tiezheng Ge, Bo Zheng, Jia Jia |
| 2025 | DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal Models. Jiarui Wang, Huiyu Duan, Juntong Wang, Ziheng Jia, Woo Yi Yang, Xiaorong Zhu, Yu Zhao, Jiaying Qian, Yuke Xing, Guangtao Zhai, Xiongkuo Min |
| 2025 | DFCNet: Dual-Factor Compensatory Clustering Network for Modality-Imbalanced Generalized Zero-Shot Learning. Xiangyu Shan, Heng Song, Junwu Zhu |
| 2025 | DFGAP: Towards Depth-Free Cross-Category GAParts Perception via Uncertainty-Quantified Modeling. Xueyu Yuan, Jiarui Zhang, Jiangqi Song, Liu Liu, Li Zhang, Dan Guo, Richang Hong, Meng Wang |
| 2025 | DFPD: Dual-Forgery Proactive Defense against Both Deepfakes and Traditional Image Manipulations. Beijing Chen, Yuting Hong, Ziqiang Li, Zhangjie Fu |
| 2025 | DGFSD: Bridging the Gap between Dense and Sparse for Fully Sparse 3D Object Detection. Guoxin Zhang, Zhonghong Ou, Kaiwen Xue, Jiangfeng Sun, Yifan Zhu, Siyuan Yao, Yiran Shen, Meina Song |
| 2025 | DGNS: Deformable Gaussian Splatting and Dynamic Neural Surface for Monocular Dynamic 3D Reconstruction. Xuesong Li, Jinguang Tong, Jie Hong, Vivien Rolland, Lars Petersson |
| 2025 | DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models. Yudong Zhang, Ruobing Xie, Xingwu Sun, Yiqing Huang, Jiansheng Chen, Zhanhui Kang, Di Wang, Yu Wang |
| 2025 | DHGCN: Dual HyperGraph Convolutional Network for EEG-Based Auditory Attention Detection. Jian Zhou, Yingjie Xie, Cunhang Fan, Huabin Wang, Zhao Lv, Liang Tao |
| 2025 | DHOW '25: 2nd International Workshop on Diffusion of Harmful Content on Online Web. Amit Kumar Jaiswal, Thomas Mandl, Gautam Kishore Shahi, Durgesh Nandini, Haiming Liu |
| 2025 | DIME-Net: A Dual-Illumination Adaptive Enhancement Network Based on Retinex and Mixture-of-Experts. Ziang Wang, Xiaoqin Wang, Dingyi Wang, Qiang Li, Shushan Qiao |
| 2025 | DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification. Yujie Yang, Shuang Li, Jun Ye, Neng Dong, Fan Li, Huafeng Li |
| 2025 | DITL Songze Li, Yunfei Guo, Shen Chen, Bin Li, Kaiqing Lin, Changsheng Chen, Haodong Li, Taiping Yao, Shouhong Ding |
| 2025 | DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation. Zhihang Yuan, Siyuan Wang, Yuzhang Shang, Hanling Zhang, Tongcheng Fang, Rui Xie, Shengen Yan, Guohao Dai, Yu Wang |
| 2025 | DMC Jiayi Zou, Chaofan Chen, Bing-Kun Bao, Changsheng Xu |
| 2025 | DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction. Cunhang Fan, Sheng Zhang, Jingjing Zhang, Enrui Liu, Xinhui Li, Gangming Zhao, Zhao Lv |
| 2025 | DMMD4SR: Diffusion Model-based Multi-level Multimodal Denoising for Sequential Recommendation. Weihai Lu, Li Yin |
| 2025 | DOMR: Establishing Cross-View Segmentation via Dense Object Matching. Jitong Liao, Yulu Gao, Shaofei Huang, Jialin Gao, Jie Lei, Ronghua Liang, Si Liu |
| 2025 | DPCSet: A Large-scale Dynamic Point Cloud Dataset for Compression and Perception. Wenxu Gao, Liang Xie, Kangli Wang, Jingxuan Su, Changhao Peng, Wei Gao |
| 2025 | DPFMVC: Dynamic Progressive Fusion for Multi-view Clustering. Taichun Zhou, Zhibin Dong, Siwei Wang, Ke Liang, Miaomiao Li, Xinwang Liu, En Zhu, Xiangjun Dong |
| 2025 | DR-VQA: Decompose-then-Reconstruct for Visual Question Answering in BLV Assistance. Bocheng Pan, Hailong Shi, Xingyu Gao |
| 2025 | DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition. Yiyan Xu, Wuqiang Zheng, Wenjie Wang, Fengbin Zhu, Xinting Hu, Yang Zhang, Fuli Feng, Tat-Seng Chua |
| 2025 | DREAM: Document Reconstruction via End-to-end Autoregressive Model. Xin Li, Mingming Gong, Yunfei Wu, Jianxin Dai, Antai Guo, Xinghua Jiang, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun |
| 2025 | DREAM: Integrating Hierarchical Multimodal Retrieval with Multi-page Multimodal Language Model for Documents VQA. Jinxu Zhang, Qiyuan Fan, Yongqi Yu, Yu Zhang |
| 2025 | DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition. Peiyuan Jiang, Yao Liu, Qiao Liu, Zongshun Zhang, Jiaye Yang, Lu Liu, Daibing Yao |
| 2025 | DRMix: Decomposition-Recomposition Data Augmentation with Diffusion Model. Shuo Wang, Zhichuan Wang, Yanmin Chen, Mengyao Zhou, Jun Luo |
| 2025 | DS-Det: Single-Query Paradigm and Attention Disentangled Learning for Flexible Object Detection. Guiping Cao, Xiangyuan Lan, Wenjian Huang, Jianguo Zhang, Dongmei Jiang, Yaowei Wang |
| 2025 | DSACap: Enhancing Visual-Semantic Alignment with Diffusion-based Framework for Image Captioning. Liangyu Fu, Junbo Wang, Yuke Li, Qiangguo Jin, Hongsong Wang, Jing Ya, Linjiang Huang, Liang Yao, Jiangbin Zheng, Xuecheng Wu, Zhiyong Wang |
| 2025 | DSDGF-Nutri: A Decoupled Self-Distillation Network with Gating Fusion For Food Nutritional Assessment. Sujuan Hou, Zhihui Feng, Hao Xiong, Weiqing Min, Peng Li, Shuqiang Jiang |
| 2025 | DSDNet: Raw Domain Demoiréing via Dual Color-Space Synergy. Qirui Yang, Fangpu Zhang, Yeying Jin, Qihua Cheng, Peng-Tao Jiang, Huanjing Yue, Jingyu Yang |
| 2025 | DSF-Net: Dynamic Sparse Fusion of Event-RGB via Spike-Triggered Attention for High-Speed Detection. Dongyang Ma, Zhengyu Ma, Wei Zhang, Yonghong Tian |
| 2025 | DSP: Dense-Sparse Parallel Networks for Self-supervised 3D Multi-person Pose Estimation from Multiple Views. Yang Liu, Zhiyong Zhang |
| 2025 | DSPF: Dual-Stage Preservation and Fusion for Source-Free Domain Adaptive Point Cloud Completion. Zhiqian Xia, Haifeng Xia, Shichao Jin, Wei Wang, Zhengming Ding, Xiaochun Cao |
| 2025 | DSS-Prompt: Dynamic-Static Synergistic Prompting for Few-Shot Class-Incremental Learning. Linpu He, Yanan Li, Bingze Li, Elvis Han Cui, Donghui Wang |
| 2025 | DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution Transformation. Changsheng Gao, Zijie Liu, Li Li, Dong Liu, Xiaoyan Sun, Weisi Lin |
| 2025 | DUDA: A Two-stage Decoupling Unsupervised Domain Adaptation Framework for Semi-supervised Singing Melody Extraction from Polyphonic Music. Shuai Yu, Xiaoliang He, Kangjie Dong, Yi Yu |
| 2025 | DUIMC: Deep Unbalanced Incomplete Multi-View Clustering via Graph Constrained Imputation and Contrastive Learning. Wenhui Wu, Guanqi Wen, Le Ou-Yang, Ran Wang, Sam Kwong |
| 2025 | DVW: Diffusion Visible Watermark. Jiawei Zhang, Xiaoli Jiang, Hao Wang, Lin Yuan, Xiangyang Luo, Bin Ma, Jinwei Wang |
| 2025 | Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning. Yu Zhao, Ying Zhang, Xuhui Sui, Baohang Zhou, Haoze Zhu, Jeff Z. Pan, Xiaojie Yuan |
| 2025 | DeCoRec: Decoupled Collaborative Refinement for Multi-Modal Sequential Recommendations. Zhaoqi Chen, Wanni Xu, Yunfeng Zhang, Yawei Hou, Zhenyu Wen, Cong Wang |
| 2025 | DeHate: A Holistic Hateful Video Dataset for Explicit and Implicit Hate Detection. Yuchen Zhang, Tailin Chen, Jiangbei Yue, Yueming Sun, Rahul Singh, Jianbo Jiao, Zeyu Fu |
| 2025 | DeLoad: Demand-Driven Short-Video Preloading with Scalable Watch-Time Estimation. Tong Liu, Zhiwei Fan, Guanyan Peng, Haodan Zhang, Yucheng Zhang, Zhen Wang, Pengjin Xie, Liang Liu |
| 2025 | Debiasing Multimodal Large Language Models via Penalization of Language Priors. Yifan Zhang, Yang Shi, Weichen Yu, Qingsong Wen, Xue Wang, Wenjing Yang, Zhang Zhang, Liang Wang, Rong Jin |
| 2025 | Deciphering Functions of Neurons in Vision-Language Models. Jiaqi Xu, Cuiling Lan, Yan Lu |
| 2025 | Decode-What-Matters: Frame-Level Parallel Generative Decoding to Accelerate Large-Scale Video Analytics. Xiaokun Wang, Yuting Yan, Sheng Zhang, Andong Zhu, Ning Chen, Yu Chen, Zhuzhong Qian, Sanglu Lu, Yu Liang |
| 2025 | Decoupled Global-Local Alignment for Improving Compositional Understanding. Xiaoxing Hu, Kaicheng Yang, Jun Wang, Haoran Xu, Ziyong Feng, Yupei Wang |
| 2025 | Decoupled Identity and Attribute Tokenization for Person Re-Identification. Rui Shang, Min Liu, Xueping Wang, Yuan Bian, Yaonan Wang |
| 2025 | Decoupled Motion Prediction for Real-time G-buffer Free Frame Extrapolation. Jiawei Zhang, Haonan Zhang, Weitao Zhang, Liang Pu, Zesen Feng, Jie Guo |
| 2025 | Decoupling Dense Video Captioning via Task-specific Prompts. Wei Chen, Jianwei Niu, Xuefeng Liu, Xinghao Wu |
| 2025 | Deep Graph Clustering with Disentangled Representation Learning. Yifan Wang, Yuntai Ding, Yiyang Gu, Ziyue Qiao, Chong Chen, Xian-Sheng Hua, Ming Zhang, Wei Ju |
| 2025 | Deep Multi-Level Contrastive Clustering for Multi-Modal Remote Sensing Images. Weiqi Liu, Yongshan Zhang, Xinxin Wang, Lefei Zhang |
| 2025 | Deep Probabilistic Binary Embedding via Learning Reliable Uncertainty for Cross-Modal Retrieval. Kun Cheng, Qibing Qin, Wenfeng Zhang, Lei Huang, Jie Nie |
| 2025 | Deep Variational Incomplete Multi-View Clustering with Information-Theoretic Guidance. Wenlan Chen, Lu Gao, Cheng Liang, Fei Guo |
| 2025 | Deep-Plant-Disease Dataset Is All You Need for Plant Disease Identification. Abel Yu Hao Chai, Kelly Li Zhen Jee, Sue Han Lee, Fei Siang Tay, Jules Vandeputte, Hervé Goëau, Pierre Bonnet, Alexis Joly |
| 2025 | DeepMolTex: Deep Alignment of Molecular Graphs with Large Language Models via Mixture of Modality Experts. Mingliang Yan, Yanhua Yu, Ruochi Zhang, Zhiyuan Liu, Ruicheng Zhang, Yimeng Ren, Kangkang Lu, Zhiyong Huang, Feng Luo, Zhen Cai |
| 2025 | DeepSIX at ACM MM 2025 Grand Challenge: Enhancing Context Text Processing for Multimodal Hallucination Detection and Fact Verification. Hoang Chu, Huy Chu, Tan-Minh Nguyen, Son T. Luu, Cuong Hoang, Hiep Nguyen, Vu Tran, Le Minh Nguyen |
| 2025 | DeflareMamba: Hierarchical Vision Mamba for Contextually Consistent Lens Flare Removal. Yihang Huang, Yuanfei Huang, Junhui Lin, Hua Huang |
| 2025 | Degradation-Aware One-Step Diffusion Model for Content-Sensitive Super-Resolution in the Dark. Tengyu Ma, Jiafa Ruan, Yuetong Wang, Guangchao Han, Zhu Liu, Long Ma, Risheng Liu |
| 2025 | Degradation-Consistent Learning via Bidirectional Diffusion for Low-Light Image Enhancement. Jinhong He, Minglong Xue, Zhipu Liu, Mingliang Zhou, Aoxiang Ning, Palaiahnakote Shivakumara |
| 2025 | DenseSR: Image Shadow Removal as Dense Prediction. Yu-Fan Lin, Chia-Ming Lee, Chih-Chung Hsu |
| 2025 | DepFormer: A Unified Framework with Bimodal Collaborative Transformer for Depression Detection. Fangyuan Liu, Sirui Zhao, Kang Yin, Tong Xu, Enhong Chen |
| 2025 | Depth-Enabled Inspection of Medical Videos. Hadi Amirpour, Doris Putzgruber-Adamitsch, Klaus Schoeffmann |
| 2025 | DepthDark: Robust Monocular Depth Estimation for Low-Light Environments. Longjian Zeng, Zunjie Zhu, Rongfeng Lu, Ming Lu, Bolun Zheng, Chenggang Yan, Anke Xue |
| 2025 | DepthGait: Multi-Scale Cross-Level Feature Fusion of RGB-Derived Depth and Silhouette Sequences for Robust Gait Recognition. Xinzhu Li, Juepeng Zheng, Yikun Chen, Xudong Mao, Guanghui Yue, Wei Zhou, Chenlei Lv, Ruomei Wang, Fan Zhou, Baoquan Zhao |
| 2025 | Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries. Pengfei Cai, Yan Song, Qing Gu, Nan Jiang, Haoyu Song, Ian McLoughlin |
| 2025 | DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models. Jiachen Fu, Chun-Le Guo, Chongyi Li |
| 2025 | Detecting Forged HEVC Videos via Anomalous Bitrate-Compressed Traces: A Frame-Level Bitrate Analysis Framework. Lizhi Xiong, Linsen Ding, Ziqiang Li |
| 2025 | Detecting Synthetic Image by Cross-Modal Commonality Interaction. Kai Li, Wenqi Ren, Wei Wang, Linchao Zhang, Xiaochun Cao |
| 2025 | Detecting Violations of Physical Common Sense in Images: A Challenge Dataset and Effective Model. Weibin Wu, Zitong Wang, Zhengjie Luo, Wenqing Chen, Zibin Zheng |
| 2025 | Device-Cloud Collaborative Learning Framework for Efficient Unknown Object Detection. Kewei Zhao, Xiaowei Hu, Qinya Li |
| 2025 | DiSCo: Disentangled Attribute Manipulation Retrieval via Semantic Reconstruction and Consistency Regularization. Min Tan, Guanhao Liu, Huijing Zhan, Yuyu Yin, Zhou Yu, Jiajun Ding, Yinfu Feng |
| 2025 | Dialogue-Driven Interactive Dynamic Learning for Text-to-Image Person Retrieval. Hongyu Liu, Hongwei Ge, Yuxuan Liu, Yaqing Hou |
| 2025 | DichotomyIR: Universal Image Reconstruction via Dichotomy Classification and Uncertainty Elimination. Yan Zhang, Shiwen He, Lin Yuan, Jiaxu Leng, Xinbo Gao |
| 2025 | DiffArtist: Towards Structure and Appearance Controllable Image Stylization. Ruixiang Jiang, Chang Wen Chen |
| 2025 | DiffTMR: Diffusion-based Hierarchical Alignment for Text-Molecule Retrieval. Chenxu Wang, Dong Zhou, Ting Liu, Jianghao Lin, Yongmei Zhou, Aimin Yang |
| 2025 | Differential Contrastive Training for Gaze Estimation. Lin Zhang, Yi Tian, Xiyun Wang, Wanru Xu, Yi Jin, Yaping Huang |
| 2025 | Differentially Private Visual Learning with Public Subspace Augmented by Synthetic Data. Haichao Sha, Yuncheng Wu, Ruixuan Liu, Yang Cao, Hong Chen |
| 2025 | DiffuFuse: Diffusion-Driven Dual-Stream Fusion Framework for Multimodal Sentiment Analysis. Xiongjian Lv, Yimin Wen, Hang Yu |
| 2025 | DiffuQKT: A Diffusion-Based Approach for Improved Question Representation in Knowledge Tracing. Fenghua Yu, Jianwen Sun, Qian Wan, Meicheng Chen, Xiaoxuan Shen, Qing Li |
| 2025 | DiffuSeg: Diffusion-Enhanced Cross-Modal Semantic Segmentation for RGB-D. Jun Yang, Maoyu Mao |
| 2025 | Diffusion-Guided Knowledge Distillation for Weakly-Supervised Low-Light Semantic Segmentation. Chunyan Wang, Dong Zhang, Jinhui Tang |
| 2025 | Diffusion-based Adversarial Identity Manipulation for Facial Privacy Protection. Liqin Wang, Qianyue Hu, Wei Lu, Xiangyang Luo |
| 2025 | DiffusionMat: Alpha Matting as Deterministic Sequential Refinement Learning. Yangyang Xu, Shengfeng He, Wenqi Shao, Yong Du, Kwan-Yee K. Wong, Yu Qiao, Jun Yu, Ping Luo |
| 2025 | DilateQuant: Accurate and Efficient Quantization-Aware Training for Diffusion Models via Weight Dilation. Xuewen Liu, Zhikai Li, Minghao Jiang, Mengjuan Chen, Jianquan Li, Qingyi Gu |
| 2025 | Direction-Aware Room Impulse Response Estimation for Immersive Audio Rendering in Real Environments. Giovanni Zanin, Ritujoy Biswas, Pietro Morerio, Sylvio Barbon Junior, Alberto Carini, Alessio Del Bue, Vittorio Murino |
| 2025 | DisFaceRep: Representation Disentanglement for Co-occurring Facial Components in Weakly Supervised Face Parsing. Xiaoqin Wang, Xianxu Hou, Meidan Ding, Junliang Chen, Kaijun Deng, Jinheng Xie, Linlin Shen |
| 2025 | DisMS-TS: Eliminating Redundant Multi-scale Features for Time Series Classification. Zhipeng Liu, Peibo Duan, Binwu Wang, Xuan Tang, Qi Chu, Changsheng Zhang, Yongsheng Huang, Bin Zhang |
| 2025 | Discovering Maximum Frequency Consensus: Lightweight Federated Learning for Medical Image Segmentation. Lingren Wang, Wenxuan Tu, Jieren Cheng, Jianan Wang, Xiangyan Tang, Chenchen Wang |
| 2025 | Discrepancy-Aware Attention Network for Enhanced Audio-Visual Generalized Zero-Shot Learning. Runlin Yu, Yipu Gong, Wenrui Li, Aiwen Sun, Mengren Zheng |
| 2025 | Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation. Weipeng Tan, Chuming Lin, Chengming Xu, FeiFan Xu, Xiaobin Hu, Xiaozhong Ji, Junwei Zhu, Chengjie Wang, Yanwei Fu |
| 2025 | Disentangling Homophily and Heterophily in Multimodal Graph Clustering. Zhaochen Guo, Zhixiang Shen, Xuanting Xie, Liangjian Wen, Zhao Kang |
| 2025 | Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis. Tianqi Li, Ruobing Zheng, Minghui Yang, Jingdong Chen, Ming Yang |
| 2025 | Diverse and Public Features Cooperation via Gradient Rectification for Federated Prompt Learning. Qi Li, Yucan Zhou, Jiang Zhou, XingYou Yang, Xiaoyan Gu |
| 2025 | Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs. Yaowen Hu, Wenxuan Tu, Yue Liu, Miaomiao Li, Wenpeng Lu, Zhigang Luo, Xinwang Liu, Ping Chen |
| 2025 | Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models? Yunbo Lyu, Zhou Yang, Yuqing Niu, Jing Jiang, David Lo |
| 2025 | DogSpeak: A Canine Vocalization Classification Dataset. Hridayesh Lekhak, Theron S. Wang, Tuan M. Dang, Kenny Q. Zhu |
| 2025 | Domain Crossover Non-Rigid Registration for 3D Human Meshes. Kyungjune Lee, Seongjean Kim, Hoseok Tong, Hyucksang Lee, Seongmin Lee, Weisi Lin, Ping An, Sanghoon Lee |
| 2025 | Domain-Agnostic Neural Oil Painting via Normalization Affine Test-Time Adaptation. Qichao Dong, Lingyu Liu, Yaxiong Wang, Jason J. R. Liu, Zhedong Zheng |
| 2025 | Domain-Specific Interactive Prompting for Generalized Nuclei Classification. Binbin Zheng, Aiqiu Wu, Kai Fan, Ao Li, Minghui Wang |
| 2025 | Domain-aware Visual Context Prompt for Multi-Source Domain Adaptation. Yuwu Lu, Haoyu Huang, Xue Hu |
| 2025 | Dome-DETR: DETR with Density-Oriented Feature-Query Manipulation for Efficient Tiny Object Detection. Zhangchi Hu, Peixi Wu, Jie Chen, Huyue Zhu, Yijun Wang, Yansong Peng, Hebei Li, Xiaoyan Sun |
| 2025 | Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation. Zhiqing Cui, Jiahao Yuan, Hanqing Wang, Yanshu Li, Chenxu Du, Zhenglong Ding |
| 2025 | Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings. Feiwei Qin, Shichao Lu, Junhao Hou, Changmiao Wang, Meie Fang, Ligang Liu |
| 2025 | DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent Keyframes. Zhende Song, Chenchen Wang, Jiamu Sheng, Chi Zhang, Shengji Tang, Jiayuan Fan, Tao Chen |
| 2025 | DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment. Xiaofan Li, Chenming Wu, Zhao Yang, Zhihao Xu, Yumeng Zhang, Dingkang Liang, Ji Wan, Jun Wang |
| 2025 | Dual Enhancement on 3D Vision-Language Perception for Monocular 3D Visual Grounding. Yuzhen Li, Min Liu, Yuan Bian, Xueping Wang, Zhaoyang Li, Gen Li, Yaonan Wang |
| 2025 | Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval. Yifan Wang, Tao Wang, Chenwei Tang, Caiyang Yu, Zhengqing Zang, Mengmi Zhang, Shudong Huang, Jiancheng Lv |
| 2025 | Dual Teacher with Dempster-Shafer Guidance for Decision Making in Semi-Supervised Small Object Detection. Nan Gao, Junchao Zhu, Yilong Zhang, Ronghua Liang, Guodao Sun, Peng Chen |
| 2025 | Dual Uncertainty-Guided Feature Alignment Learning for Text-Based Person Retrieval. Yufei Zheng, Jiawei Liu, Bingyu Hu, Zikun Wei, Yong Wu, Zheng-Jun Zha |
| 2025 | Dual-Constraint Multi-view Fuzzy Clustering with Scalable Anchor Graph Learning. Luyan Cui, Huibing Wang, Yawei Chen, Mingze Yao, Xianping Fu, Jiqing Zhang |
| 2025 | Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image Matching. Yafei Zhang, Yongle Shang, Huafeng Li |
| 2025 | Dual-Learning based Penalized Multi-Align Clustering for Multi-View Incomplete and Disorderly Data. Liang Zhao, Shubin Ma, Bo Xu, Qingchen Zhang |
| 2025 | Dual-Level Distribution Alignment for Deep Incomplete Multi-View Clustering. Fujian Ren, Wenlan Chen, Lu Gao, Fei Guo, Cheng Liang |
| 2025 | Dual-Phase Playtime-guided Recommendation: Interest Intensity Exploration and Multimodal Random Walks. Jingmao Zhang, Zhiting Zhao, Yunqi Lin, Jianghong Ma, Tianjun Wei, Haijun Zhang, Xiaofeng Zhang |
| 2025 | Dual-Prototype Learning in Multiple Instance Learning for Histopathology Image Classification. Ting Xiao, Minqian Sun, Yiqing Xia, Zhe Wang |
| 2025 | DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio Synthesis. Wenjie Tian, Xinfa Zhu, Haohe Liu, Zhixian Zhao, Zihao Chen, Chaofan Ding, Xinhan Di, Junjie Zheng, Lei Xie |
| 2025 | DualEnhance: External Multimodal Foundation Models Guidance and Internal Fast-Slow Teacher Regulation. Qi He, Xiao Wu, Jun-Yan He, Wei Li, Zhaoquan Yuan |
| 2025 | DualFPT: Handling Data Heterogeneity in Federated Prompt Tuning from both Generalized and Personalized Perspective. Yuliang Chen, Xi Lin, Chao Sang, Xiu Su |
| 2025 | DualMat: PBR Material Estimation via Coherent Dual-Path Diffusion. Yifeng Huang, Zhang Chen, Yi Xu, Minh Hoai, Zhong Li |
| 2025 | DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework. Kuiye Ding, Fanda Fan, Yao Wang, Ruijie Jian, Xiaorui Wang, Luqi Gong, Yishan Jiang, Chunjie Luo, Jianfeng Zhan |
| 2025 | DyNAS-DDI: Dynamic Pairwise Architecture Search for Generalizable Drug-Drug Interaction LLM. Linxin Xiao, Xin Wang, Zeyang Zhang, Yang Yao, Wenwu Zhu |
| 2025 | DynFed: Adaptive Federated Learning via Quantization-Aware Knowledge Distillation. Nan He, Yiming Chen, Zheng Jiang, Song Yang, Lifeng Sun |
| 2025 | DynMark: A Robust Watermarking Solution for Dynamic Screen Content with Small-size Screenshot Support. Changyu Rao, Gaozhi Liu, Sheng Li, Xinpeng Zhang, Zhenxing Qian |
| 2025 | Dynamic 2D Gaussians: Geometrically Accurate Radiance Fields for Dynamic Objects. Shuai Zhang, Guanjun Wu, Zhoufeng Xie, Xinggang Wang, Bin Feng, Wenyu Liu |
| 2025 | Dynamic Analysis and Adaptive Discriminator for Fake News Detection. Xinqi Su, Zitong Yu, Yawen Cui, Ajian Liu, Xun Lin, Yuhao Wang, Haochen Liang, Wenhui Li, Li Shen, Xiaochun Cao |
| 2025 | Dynamic Beauty is Easy to Find: A Large-Scale Composition-Aware Dataset and an End-to-End Framework for Video Reframing. Sitian Gu, Zhiyu Pan, Chaoyi Hong, Chengxin Liu, Zhiguo Cao |
| 2025 | Dynamic Optimization Noisy Cross-Modal Hashing. Zebing Yao, Hao Fu, Yuanhang Yang, Guanghua Gu |
| 2025 | Dynamic Residual Encoding with Slide-Level Contrastive Learning for End-to-End Whole Slide Image Representation. Jing Jin, Xu Liu, Te Gao, Zhihong Shi, Yixiong Liang, Ruiqing Zheng, Hulin Kuang, Min Zeng, Shichao Kan |
| 2025 | Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection. Francesco Tonini, Lorenzo Vaquero, Alessandro Conti, Cigdem Beyan, Elisa Ricci |
| 2025 | Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Retrieval. Zhengyang Liang, MeiYu Liang, Wei Huang, Yawen Li, Wu Liu, Yingxia Shao, Kangkang Lu |
| 2025 | E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras. Chaoran Feng, Zhenyu Tang, Wangbo Yu, Yatian Pang, Yian Zhao, Jianbin Zhao, Li Yuan, Yonghong Tian |
| 2025 | E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language Model. Ronghao Lin, Shuai Shen, Weipeng Hu, Qiaolin He, Aolin Xiong, Li Huang, Haifeng Hu, Yap-Peng Tan |
| 2025 | EBaR: Efficient Buffer and Resetting for Single-Sample Continual Test-Time Adaptation. Tianyi Ma, Maoying Qiao |
| 2025 | EDMG: Towards Efficient Long Dance Motion Generation with Fundamental Movements from Dance Genres. Jinming Zhang, Yunlian Sun, Hongwen Zhang, Jinhui Tang |
| 2025 | EDPC: Accelerating Lossless Compression via Lightweight Probability Models and Decoupled Parallel Dataflow. Zeyi Lu, Xiaoxiao Ma, Yujun Huang, Minxiao Chen, Bin Chen, Baoyi An, Shu-Tao Xia |
| 2025 | EDeF-Net: Spatio-temporal Association Network for Flicker Removal in Event Streams. Jin Han, Yixin Yang, Zhan Zhan, Boxin Shi, Imari Sato |
| 2025 | EEG-Driven Image Reconstruction with Saliency-Guided Diffusion Models. Igor Abramov |
| 2025 | EEG-Face: A Facial-Image Stimulated EEG Data-Set for Analysis of Brain Perceived Multimedia. Wuxia Zhang, Yang Xin, Shibo Lv, Xin Zhang, Xiang Zhong, Jianmin Jiang |
| 2025 | EEG-SCMM: Soft Contrastive Masked Modeling for Cross-Corpus EEG-Based Emotion Recognition. Qile Liu, Weishan Ye, Lingli Zhang, Zhen Liang |
| 2025 | EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion Assessment. Lancheng Gao, Ziheng Jia, Yunhao Zeng, Wei Sun, Yiming Zhang, Wei Zhou, Guangtao Zhai, Xiongkuo Min |
| 2025 | EHPE: A Segmented Architecture for Enhanced Hand Pose Estimation. Bolun Zheng, Xinjie Liu, Qianyu Zhang, Canjin Wang, Fangni Chen, Mingen Xu |
| 2025 | EHVC: Efficient Hierarchical Reference and Quality Structure for Neural Video Coding. Junqi Liao, Yaojun Wu, Chaoyi Lin, Zhipin Deng, Li Li, Dong Liu, Xiaoyan Sun |
| 2025 | EIR-SDG: Explore Invariant Representation for Single-source Domain Generalization in Medical Image Segmentation. Ziwei Niu, Shiao Xie, Ziyue Wang, Yen-Wei Chen, Yueming Jin, Lanfen Lin |
| 2025 | ELFATT: Efficient Linear Fast Attention for Vision Transformers. Chong Wu, Maolin Che, Renjie Xu, Zhuoheng Ran, Hong Yan |
| 2025 | EMIFS: Efficient Multi-scale Information Fusion Self-supervision for Medical Image Segmentation. Luyao Ren, Wenxin Yu, Zhiqiang Zhang, Chang Liu |
| 2025 | EMO-Avatar: An LLM-Agent-Orchestrated Framework for Multimodal Emotional Support in Human Animation. Keqi Chen, Wenxin Fu, Qihang Lu, Zekai Sun, Yizhong Geng, Yi Liu, Puyuan Guo, Yingming Gao, Ya Li |
| 2025 | ENRIC: EveNt-AwaRe Captioning with Image Retrieval via UnCertainty-Guided Re-ranking and Semantic Ensemble Reasoning. Nam-Quan Nguyen, Minh-Hoang Le, Vinh-Toan Vong, Minh-Triet Tran |
| 2025 | ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations. Shiye Cao, Maia Stiber, Amama Mahmood, Maria Teresa Parreira, Wendy Ju, Micol Spitale, Hatice Gunes, Chien-Ming Huang |
| 2025 | ESOD: Event-Based Small Object Detection. Quanmin Liang, Jinyi Lu, Qiang Li, Shuai Liu, Zhihao Zhao, Yinzheng Zhao, Wei Zhang, Kai Huang, Yonghong Tian |
| 2025 | ESTJ: Enhancing Structured Tendency Judgment in Hybrid-Modal Table Understanding. Shu-Xun Yang, Xian-Ling Mao, Heyan Huang |
| 2025 | EVENT-Retriever: Event-Aware Multimodal Image Retrieval for Realistic Captions. Dinh-Khoi Vo, Van-Loc Nguyen, Minh-Triet Tran, Trung-Nghia Le |
| 2025 | Ear with Eye: Lightweight Multimodal Audio-Visual Network Inspired by Bionic Structures. Xuanming Jiang, Baoyi An, Zhengwei Zou, Dingyu Nie, Jialie Shen, Xueming Qian, Guoshuai Zhao |
| 2025 | EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward Backpropagation. Jiaqi Xu, Kunzhe Huang, Xinyi Zou, Yunkuo Chen, Bo Liu, Mengli Cheng, Jun Huang, Xing Shi |
| 2025 | EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation. Xiangyue Zhang, Jianfang Li, Jiaxu Zhang, Jianqiang Ren, Liefeng Bo, Zhigang Tu |
| 2025 | EchoVim: Making Vision Mamba Docile for Echocardiography Video Segmentation via Dynamic Interaction and Semantic Token-attentive Refinement. Jingxing Guo, Guilian Chen, Yimu Sun, Huisi Wu, Jing Qin |
| 2025 | Echoes of the Creator: An Immersive VR System for Spatial Storytelling and Empathy Towards Co-Creation. Yifan Chen |
| 2025 | Edge-aware Affinity Enhancement for Image Manipulation Localization. Tianyi Zhang, Qinglong Lin, Yang Hu, Pengming Feng, Rubo Zhang |
| 2025 | Edit-by-Example: Adaptive Exemplar-Based Image Editing. Yaojie Li, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Wu Liu, Ting Yao, Tao Mei |
| 2025 | EditEval: Towards Comprehensive and Automatic Evaluation for Text-guided Video Editing. Bingshuai Liu, Ante Wang, Zijun Min, Chenyang Lyu, Longyue Wang, Zhihao Wang, Xu Han, Peng Li, Jinsong Su |
| 2025 | EditGarment: An Instruction-Based Garment Editing Dataset Constructed with Automated MLLM Synthesis and Semantic-Aware Evaluation. Deqiang Yin, Junyi Guo, Huanda Lu, Fangyu Wu, Dongming Lu |
| 2025 | EditMaster: Bridging Text instruction and Visual Example for Multimodal guided Image Editing. Jiahui Zhang, Mengtian Li, Jiewei Tang, Junyu Deng, Siyu Tian, Xiang Liu, Meng Zhang, Guangnan Ye, Yu-Gang Jiang |
| 2025 | EditWorld: Simulating World Dynamics for Instruction-Following Image Editing. Bohan Zeng, Ling Yang, Jiaming Liu, Minghao Xu, Yuanxing Zhang, Pengfei Wan, Wentao Zhang, Shuicheng Yan |
| 2025 | Efficient Multi-Slide Visual-Language Feature Fusion for Placental Disease Classification. Hang Guo, Qing Zhang, Zixuan Gao, Siyuan Yang, Shulin Peng, Xiang Tao, Ting Yu, Yan Wang, Qingli Li |
| 2025 | Efficient Semantic Codec for Real-time Vibrotactile Transmission. Runjie Wang, Kemi Chen, Shuijie Li, Mingkai Chen, Tiesong Zhao |
| 2025 | Efficient Trajectory Space-Time Super-Resolution for Fast Live-cell Imaging. Ruian He, Zixian Zhang, Ri Cheng, Weimin Tan, Bo Yan |
| 2025 | Efficient Video Anomaly Detection via Scene-Dependent Memory Assisted Inter-Frame RGB Difference Reconstruction. Han Hu, Wenli Du, Bing Wang |
| 2025 | EgoHierMask: Hierarchical Semantic-Prior Guided Masked Autoencoder for Egocentric Action Recognition. Jiang Shao, Xinbo Zhao, Xiaochun Zou, Xiaolin Ye |
| 2025 | EgoMusic: An Egocentric Augmented Reality Glasses Dataset for Music. Alessandro Ragano, Carl Timothy Tolentino, Kata Szita, Dan Barry, Davoud Shariat Panah, Niall Murray, Andrew Hines |
| 2025 | EgoPrompt: Prompt Learning for Egocentric Action Recognition. Huaihai Lyu, Chaofan Chen, Yuheng Ji, Changsheng Xu |
| 2025 | ElaSleepNet: Exploring an Elastic Multimodal Neural Network for Sleep Staging via Temporal and Contextual Consistency Learning. Qi Shen, Junchang Xin, Bing Tian Dai, Shudi Zhang, Xinyao Liu, Zhiqiong Wang |
| 2025 | EmIT: Emotional Interaction control in Text-to-image diffusion models. Haofan Zhang, Shangfei Wang |
| 2025 | Embodied Ink: A Multisensory Reinterpretation of Chinese Calligraphy Through Digital Twins and Immersive Realities. Anna Borou Yu, Jiajian Min |
| 2025 | Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning. Baining Zhao, Ziyou Wang, Jianjie Fang, Chen Gao, Fanhang Man, Jinqiang Cui, Xin Wang, Xinlei Chen, Yong Li, Wenwu Zhu |
| 2025 | EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler. Hao Wang, Xiaobao Wei, Xiaoan Zhang, Jianing Li, Chengyu Bai, Ying Li, Ming Lu, Wenzhao Zheng, Shanghang Zhang |
| 2025 | EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation. Cheng Zhang, Hongxia Xie, Bin Wen, Songhan Zuo, Ruoxuan Zhang, Wen-Huang Cheng |
| 2025 | EmoDETective: Detecting, Exploring, and Thinking Emotional Cause in Videos. Xuandong Huang, Yuzhe Zhou, Jiashu Li, Shiqian Lu, Shangfei Wang |
| 2025 | EmoSym: A Symbiotic Framework for Unified Emotional Understanding and Generation via Latent Reasoning. Yijie Zhu, Yibo Lyu, Zitong Yu, Rui Shao, Kaiyang Zhou, Liqiang Nie |
| 2025 | EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting. Guanrou Yang, Chen Yang, Qian Chen, Ziyang Ma, Wenxi Chen, Wen Wang, Tianrui Wang, Yifan Yang, Zhikang Niu, Wenrui Liu, Fan Yu, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen |
| 2025 | Emotion across Modalities and Cultures: Multilingual Multimodal Emotion-Cause Analysis with Memory-inspired Framework. Dan Wu, Xincheng Ju, Dong Zhang, Shoushan Li, Erik Cambria, Guodong Zhou |
| 2025 | Emotion in a Bottle: Information Bottleneck Guided Disentanglement for Emotion Domain Adaptation. Jiankun Zhu, Sicheng Zhao, Lulu Tian, Jing Jiang, Xi Chen, Hongxun Yao |
| 2025 | Emotion-Qwen-VL: A Fully Fine-Tuned Multimodal Large Language Model for Micro-Expression Visual Question Answering. Yujing Wang, Ruotong Fang, Xing Huang, Zhiyuan Han, Xiaoqing Lin, Yuhao Shan, Tong Chen |
| 2025 | EmotionalCanines: A Dataset for Analysis of Arousal and Valence in Dog Vocalization. Tuan M. Dang, Theron S. Wang, Hridayesh Lekhak, Kenny Q. Zhu |
| 2025 | End-to-End Multiple Object Tracking with Dynamic Scene Perception. Ruonan Wei, Yuntao Wang, Siyan Fang, Yuehuan Wang |
| 2025 | Energy-based Deep Incomplete Multi-View Clustering. Ziyu Wang, Yiming Du, Rui Ning, Lusi Li |
| 2025 | Enhanced Dual-Pixel Image Reflection Removal via Gaussian Splatting. Kailong Yu, Liyuan Pan, Liu Liu, Wei Liang |
| 2025 | Enhanced Motion-aware Latent Diffusion Models for Video Frame Interpolation. Zhilin Huang, Chujun Qin, Yifei Xing, Wenming Yang |
| 2025 | Enhancing Democratic Mediation through Norm-Awareness in Generative Agent Societies. Tianjiao Xu, Hao Fu, Suiyang Zhang, Jianhua Yin, Tian Gan, Liqiang Nie |
| 2025 | Enhancing Diffusion Model Stability for Image Restoration via Gradient Management. Hongjie Wu, Mingqin Zhang, Linchao He, Ji-Zhe Zhou, Jiancheng Lv |
| 2025 | Enhancing Endoscopic Image Retrieval via Self-Supervised Learning and Large VLM-Based Re-ranking. Khoa Tran, Linh Ly, Duy Khanh Ho, Ngoc Hoang Luong |
| 2025 | Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models. Yu-Wei Zhan, Fan Liu, Xin Luo, Xin-Shun Xu, Liqiang Nie, Mohan Kankanhalli |
| 2025 | Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation Data. Xun Zhu, Fanbin Mo, Zheng Zhang, Jiaxi Wang, Yiming Shi, Ming Wu, Chuang Zhang, Miao Li, Ji Wu |
| 2025 | Enhancing Multi-view Open-set Learning via Ambiguity Uncertainty Calibration and View-wise Debiasing. Zihan Fang, Zhiyong Xu, Lan Du, Shide Du, Zhiling Cai, Shiping Wang |
| 2025 | Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization. Huiyi Chen, Jiawei Peng, Kaihua Tang, Xin Geng, Xu Yang |
| 2025 | Enhancing Multimodal Personality Assessment with LLM-Augmented Hierarchical Fusion. Longjiang Yang, Cong Yu, Chenxi Huang, Fengyu Zhang, Ran Liu, Zhuofan Wen, Shun Chen, Hailiang Yao, Bin Liu, Zheng Lian, Jianhua Tao |
| 2025 | Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning. Hongfei Xue, Yufeng Tang, Hexin Liu, Jun Zhang, Xuelong Geng, Lei Xie |
| 2025 | Enhancing Pseudo-Boxes via Data-Level LiDAR-Camera Fusion for Unsupervised 3D Object Detection. Mingqian Ji, Jian Yang, Shanshan Zhang |
| 2025 | Enhancing Small-Scale Dataset Expansion with Triplet-Connection-based Sample Re-Weighting. Ting Xiang, Changjian Chen, Zhuo Tang, Qifeng Zhang, Fei Lyu, Li Yang, Jiapeng Zhang, Kenli Li |
| 2025 | Enhancing Sports Experiences Through Video-Based Interactions. João Diogo |
| 2025 | Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models. Nanxing Hu, Xiaoyue Duan, Jinchao Zhang, Guoliang Kang |
| 2025 | Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration. Yicheng Pan, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Quan Liu, Jianqing Gao, Feng Ma |
| 2025 | Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice Deepfakes. Zhou Feng, Jiahao Chen, Chunyi Zhou, Yuwen Pu, Qingming Li, Tianyu Du, Shouling Ji |
| 2025 | Ensuring Responses Contain Appropriate Images: Timing Judgment for Multimodal Responses. Hao Yang, Tian Zheng, Yanyan Zhao, Bing Qin |
| 2025 | Entity Graph Alignment and Visual Reasoning for Multimodal Fake News Detection. Guoyi Li, Die Hu, Xiaomeng Fu, Qirui Tang, Yulei Wu, Xiaodan Zhang, Honglei Lyu |
| 2025 | Entity-Level Alignment with Prompt-Guided Adapter for Remote Sensing Image-Text Retrieval. Shuoshuo Li, Shuli Cheng, Liejun Wang |
| 2025 | Epipolar Consistency-based Network for Structure-Aware LF Semantic Segmentation. Chen Gao, Youfang Lin, Wenbin Wang, Shuo Zhang |
| 2025 | EvRAW: Event-guided Structural and Color Modeling for RAW-to-sRGB Image Reconstruction. Wenli Zheng, Huiyuan Fu, Xicong Wang, Hao Kang, Chuanming Wang, Jin Liu, Zekai Xu, Heng Zhang, Huadong Ma |
| 2025 | Evaluating Perceptual Color Preferences in Smartphone Photography: Dataset and Challenges. Zhihua Wang, Weixia Zhang, Wei Zhou, Xiaohong Liu, Guangtao Zhai, Patrick Le Callet |
| 2025 | Evaluating Visual Quality of Autostereoscopic 3D Displays via a Multimodal Parameter Perception Network. Liqian Zhang, Feng Yuan, Haoran Xie, Fu Lee Wang, Zhaoqing Pan |
| 2025 | Evaluating and Mitigating Sycophancy in Large Vision-Language Models. Jiayi Gao, Huaiwen Zhang |
| 2025 | Evaluating the Evaluators: Towards Human-aligned Metrics for Missing Markers Reconstruction. Taras Kucherenko, Derek Peristy, Judith Bütepage |
| 2025 | Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection Attacks. Yurun Chen, Xueyu Hu, Keting Yin, Juncheng Li, Shengyu Zhang |
| 2025 | Evaluation of Egyptian Hieroglyph Classification Across Diverse Writing Styles. Maksim Golyadkin, Valeria Rubanova, Aleksandr Utkov, Dmitry Nikolotov, Ilya Makarov |
| 2025 | Event Chain-Driven Communication Strategy Generation for News Videos. Qinglan Wei, Ruiqi Xue, Mingyue Liao, Long Ye |
| 2025 | Event Consistency-aware Robust Fake News Detection. Liyuan Cao, Zihang Guo, Huaiwen Zhang |
| 2025 | Event-Enriched Image Analysis Grand Challenge At ACM Multimedia 2025. Thien-Phuc Tran, Minh-Quang Nguyen, Minh-Triet Tran, Tam V. Nguyen, Trong-Le Do, Duy-Nam Ly, Viet-Tham Huynh, Khanh-Duy Le, Mai-Khiem Tran, Trung-Nghia Le |
| 2025 | EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction. Qile Su, Shoutai Zhu, Shuai Zhang, Baoyu Liang, Chao Tong |
| 2025 | EventLip: Enhancing Event-Based Lip Reading via Frequency-Aware Spatiotemporal Hypergraph Modeling. Xueyi Zhang, Jialu Sun, Chengwei Zhang, Xianghu Yue, Tianfang Xiao, Siqi Cai, Mingrui Lao, Haizhou Li |
| 2025 | EventVAD: Training-Free Event-Aware Video Anomaly Detection. Yihua Shao, Haojin He, Sijie Li, Siyu Chen, Xinwei Long, Fanhu Zeng, Yuxuan Fan, Muyang Zhang, Ziyang Yan, Ao Ma, Xiaochen Wang, Hao Tang, Yan Wang, Shuyan Li |
| 2025 | Evidential Remote Physiological Measurement via Uncertainty-aware Fusion of Video and RF. Jieyi Ge, Zhaodong Sun, Wei Peng, Chenhang Ying, Yuwei Chen, Kui Ren, Xiaobai Li |
| 2025 | EvoVLMA: Evolutionary Vision-Language Model Adaptation. Kun Ding, Ying Wang, Shiming Xiang |
| 2025 | Ex Pede Herculem, Predicting Global Actionness Curve from Local Clips. Xu Chen, Yang Li, Yahong Han, Jialie Shen |
| 2025 | ExDA: Towards Universal Detection and Plug-and-Play Attribution of AI-Generated Ex-Regulatory Images. Wenpeng Mu, Zheng Li, Qiang Xu, Xinghao Jiang, Tanfeng Sun |
| 2025 | Excavating the Most Critical Gaussians: Sparse Selection and Structural Optimization for Efficient 3DGS Compression. Yang Hu, Jingui Ma, Yucheng Yang, Jie Liang, Jinbo Yan, Jiahao Wu, Jiayu Yang, Yang Deng, Ronggang Wang |
| 2025 | ExpStar: Towards Automatic Commentary Generation for Multi-discipline Scientific Experiments. Jiali Chen, Yujie Jia, Zihan Wu, Jinyu Yang, Jianpeng Chen, Xusen Hei, Jiayuan Xie, Yi Cai, Qing Li |
| 2025 | Explaining Listener Reactions: Personality-Guided Facial Response Generation with Cross-Modal Attention. Peng Wang, Pujun Xue, Xiaofeng Liu, Tongjuan Ji |
| 2025 | Explicit Context Reasoning with Supervision for Visual Tracking. Fansheng Zeng, Bineng Zhong, Haiying Xia, Yufei Tan, Xiantao Hu, Liangtao Shi, Shuxiang Song |
| 2025 | ExplorAR: Assisting Older Adults to Learn Smartphone Apps through AR-powered Trial-and-Error with Interactive Guidance. Jiawei Li, Linjie Qiu, Zhiqing Wu, Qiongyan Chen, Ziyan Wang, Mingming Fan |
| 2025 | Exploring Adapter Design Tradeoffs for Low Resource Music Generation. Atharva Mehta, Shivam Chauhan, Monojit Choudhury |
| 2025 | Exploring Fourier Prior and Event Collaboration for Low-Light Image Enhancement. Chunyan She, Fujun Han, Chengyu Fang, Shukai Duan, Lidan Wang |
| 2025 | Exploring Global Correlations via Polarity Memory for Multispectral Demosaicing. Mengzu Liu, Junwei Xu, Tao Huang, Fangfang Wu, Le Dong, Xin Li, Weisheng Dong |
| 2025 | Exploring Multimodal Prompts For Unsupervised Continuous Anomaly Detection. Mingle Zhou, Jiahui Liu, Jin Wan, Gang Li, Min Li |
| 2025 | Exploring Palette based Color Guidance in Diffusion Models. Qianru Qiu, Jiafeng Mao, Xueting Wang |
| 2025 | Extending Lifelog Retrieval to Multi-stream Video Retrieval at the CASTLE Challenge 2025. Quang-Linh Tran, Hoang-Bao Le, Thang-Long Nguyen-Ho, Graham Healy, Liting Zhou, Allie Tran |
| 2025 | Eye-based Emotion Recognition via Event-Driven Sparse Transformers. Zixuan Wan, Jiqing Zhang, Yushan Wang, Hu Lin, Yafei Wang, Zetian Mi, Xin Yang, Xianping Fu, Huibing Wang |
| 2025 | EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR. Zihao Ding, Cheng-Tse Lee, Mufeng Zhu, Tao Guan, Yuan-Chun Sun, Cheng-Hsin Hsu, Yao Liu |
| 2025 | EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model. Sijing Li, Tianwei Lin, Lingshuai Lin, Wenqiao Zhang, Jiang Liu, Xiaoda Yang, Juncheng Li, Yucheng He, Xiaohui Song, Jun Xiao, Yueting Zhuang, Beng Chin Ooi |
| 2025 | Eyes on the Road, Mind Beyond Vision: Context-Aware Multi-modal Enhanced Risk Anticipation. Jiaxun Zhang, Haicheng Liao, Yumu Xie, Chengyue Wang, Yanchen Guan, Bin Rao, Zhenning Li |
| 2025 | F-DDIM: A Featurized Denoising Diffusion Implicit Model for Facial Image Steganography. Liqi Yan, Xuebin Li, Jianhui Zhang, Fangli Guan, Kanglei Peng, Pan Li |
| 2025 | FA Jiahao Wang, Fang Liu, Licheng Jiao, Hao Wang, Shuo Li, Lingling Li, Puhua Chen, Xu Liu, Xinyi Wang |
| 2025 | FAB-Attack: Fabric-Aware Adversarial Attacks on Person Detectors under Motion Blur. Jiaqi Hou, Kewei Zhang, Tianyu Yang, Chengyu Jia, Qiqi Lin, Hui Wei, Zheng Wang |
| 2025 | FACE: A Dual-Template and Adaptive Curriculum Framework for Unsupervised Text-Based Person Search. Xiaoxuan Mu, Haoyu Tang, Han Jiang, Tianyuan Liang, Qinghai Zheng, Jihua Zhu |
| 2025 | FAME: Fusion-Aware Multi-modal Ensemble for Social Media Popularity Prediction. Yan Zhuang, Wei Bai, Yanru Zhang, Minhao Liu, Jiawen Deng, Fuji Ren |
| 2025 | FAMRD: Frequency-Aware Multimodal Reverse Distillation for Industrial Anomaly Detection. Qiyin Zhong, Xianglin Qiu, Xiaolei Wang, Zhen Zhang, Gang Liu, Jimin Xiao |
| 2025 | FATE: A Prompt-Tuning-Based Semi-Supervised Learning Framework for Extremely Limited Labeled Data. Hezhao Liu, Yang Lu, Mengke Li, Yiqun Zhang, Shreyank N. Gowda, Chen Gong, Hanzi Wang |
| 2025 | FCG: High-Throughput JPEG Heterogeneous Inference with Hybrid Parallel Pipeline on Mobile Devices. Youbo Mao, Ziyang Kang, Pengfei Li, Jiyao Chen, Zenglin Yang, Zhijun Li |
| 2025 | FCM-RT: Real-Time Feature Coding for Machines. Ashan Perera, Md Eimran Hossain Eimon, Juan Merlos, Velibor Adzic, Hari Kalva, Borko Furht |
| 2025 | FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning. Zhuozhao Hu, Kaishen Yuan, Xin Liu, Zitong Yu, Yuan Zong, Jingang Shi, Huanjing Yue, Jingyu Yang |
| 2025 | FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching. Hui Wang, Shujie Liu, Lingwei Meng, Jinyu Li, Yifan Yang, Shiwan Zhao, Haiyang Sun, Yanqing Liu, Haoqin Sun, Jiaming Zhou, Yan Lu, Yong Qin |
| 2025 | FFCBA: Feature-based Full-target Clean-label Backdoor Attacks. Yangxu Yin, Honglong Chen, Yudong Gao, Peng Sun, Liantao Wu, Zhe Li, Weifeng Liu |
| 2025 | FG-Midiformer: A Symbolic Music Understanding Model towards Fine-Grained Learning of Multi-Attributes. Haonan Cheng, Junwei Zhang, Hengyan Huang, Long Ye |
| 2025 | FGRFlow: Learning Fine-Grained Rigidity Scene Flow from 4D Radar Point Cloud. Mingliang Zhai, Yiheng Wang, Haidong Hu, Chi-Man Pun, Hao Gao |
| 2025 | FITMM: Adaptive Frequency-Aware Multimodal Recommendation via Information-Theoretic Representation Learning. Wei Yang, Rui Zhong, Yiqun Chen, Shixuan Li, Heng Ping, Chi Lu, Peng Jiang |
| 2025 | FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model. Lingzhou Mu, Baiji Liu, Ruonan Zhang, Guiming Mo, Jiawei Jin, Kai Zhang, Haozhi Huang |
| 2025 | FORGET ME: Federated Unlearning for Face Generation Models. Fan Qi, Ao Liu, Zixin Zhang, Changsheng Xu |
| 2025 | FRED: The Florence RGB-Event Drone Dataset. Gabriele Magrini, Niccolò Marini, Federico Becattini, Lorenzo Berlincioni, Niccolò Biondi, Pietro Pala, Alberto Del Bimbo |
| 2025 | FSCDiff: Frequency-Spatial Entangled Conditional Diffusion model for Underwater Salient Object Detection. Hua Li, Gaowei Lin, Zhiyuan Li, Sam Kwong, Runmin Cong |
| 2025 | FVQ: A Large-Scale Dataset and an LMM-based Method for Face Video Quality Assessment. Sijing Wu, Yunhao Li, Ziwen Xu, Yixuan Gao, Huiyu Duan, Wei Sun, Guangtao Zhai |
| 2025 | FaceCluster: Interactive Photo Organization with Enhanced Face Recognition. Alexander Filonenko, Ilya Makarov, Andrey V. Savchenko |
| 2025 | FaceInsight: A Multimodal Large Language Model for Face Perception. Jingzhi Li, Changjiang Luo, Ruoyu Chen, Hua Zhang, Wenqi Ren, Jianhou Gan, Xiaochun Cao |
| 2025 | Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online Media. Van-Hoang Phan, Tung-Duong Le-Duc, Long-Khanh Pham, Anh-Thu Le, Quynh-Huong Dinh-Nguyen, Dang-Quan Vo, Hoang-Quoc Nguyen-Son, Anh-Duy Tran, Dang Vu, Minh-Son Dao |
| 2025 | Factorized Transformer Hashing with Adaptive Routing for Large-scale Image Retrieval. Yadong Huo, Qibing Qin, Wenfeng Zhang, Lei Huang, Jie Nie |
| 2025 | FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis. Mengchao Wang, Qiang Wang, Fan Jiang, Yaqi Fan, Yunpeng Zhang, Yonggang Qi, Kun Zhao, Mu Xu |
| 2025 | Farther Than Mirror: Explore Pattern-Compensated Depth of Mirror with Temporal Changes for Video Mirror Detection. Zhaohu Xing, Lihao Liu, Tian Ye, Sixiang Chen, Yijun Yang, Guang Liu, Lei Zhu |
| 2025 | Fast3D: Accelerating 3D Multi-modal Large Language Models for Efficient 3D Scene Understanding. Wencan Huang, Daizong Liu, Wei Hu |
| 2025 | FastRSR: Efficient and Accurate Road Surface Reconstruction in Bird's Eye View. Yuting Zhao, Yuheng Ji, Xiaoshuai Hao, Shuxiao Li |
| 2025 | FeatShield: Isolating Malicious Feature Extractors for Backdoor-Robust Federated Learning. Zhou Tan, De Li, Yirui Huang, Jia-Li Yin, Ximeng Liu |
| 2025 | FedAPT: Federated Adversarial Prompt Tuning for Vision-Language Models. Kun Zhai, Siheng Chen, Xingjun Ma, Yu-Gang Jiang |
| 2025 | FedBAP: Backdoor Defense via Benign Adversarial Perturbation in Federated Learning. Xinhai Yan, Libing Wu, Zhuangzhuang Zhang, Bingyi Liu, Lijuan Huo, Jing Wang |
| 2025 | FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated Learning. Yubin Zheng, Pak-Hei Yeung, Jing Xia, Tianjie Ju, Peng Tang, Weidong Qiu, Jagath C. Rajapakse |
| 2025 | FedRog: Robust Federated Graph Classification for Strong Heterogeneity and High-Noise Scenarios. De Li, Zhou Tan, Qiyu Li, Zeming Gan, Tiange Xia, Jinyan Wang, Xianxian Li |
| 2025 | Federated Incomplete Multi-view Clustering with Individual Structure Preservation and Central Representation Tensorization. Yan Li, Xingchen Hu, Jiyuan Liu, Zhong Liu |
| 2025 | Fighting Fire with Fire (F3): A Training-free and Efficient Visual Adversarial Example Purification Method in LVLMs. Yudong Zhang, Ruobing Xie, Yiqing Huang, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Di Wang, Yu Wang |
| 2025 | Financial Models meets Generative Art: Black-Scholes-Inspired Concept Blending in Text-to-Image Diffusion. Divya Kothandaraman, Ming Lin, Dinesh Manocha |
| 2025 | Find True Collaborators: Banzhaf Index-based Cross View Alignment for Partially View-aligned Clustering. Shanghui Deng, Xiao Zheng, Chang Tang, Kun Sun, Yuanyuan Liu, Xinwang Liu |
| 2025 | Fine-grained Zero-Shot Object Detection. Hongxu Ma, Chenbo Zhang, Lu Zhang, Jiaogen Zhou, Jihong Guan, Shuigeng Zhou |
| 2025 | Fine-tuning Bias Neurons for Fair Text-to-Image Generation. Fan Qi, Zhan Wang, Changsheng Xu, Huaiwen Zhang |
| 2025 | FineBadminton: A Multi-Level Dataset for Fine-Grained Badminton Video Understanding. Xusheng He, Wei Liu, Shanshan Ma, Qian Liu, Chenghao Ma, Jianlong Wu |
| 2025 | FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning. Haodong Chen, Haojian Huang, Xinxiang Yin, Dian Shao |
| 2025 | FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos. Rui Chen, Lei Sun, Jing Tang, Geng Li, Xiangxiang Chu |
| 2025 | FingerVeinSyn-5M: A Million-Scale Dataset and Benchmark for Finger Vein Recognition. Yifan Wang, Jie Gui, Baosheng Yu, Qi Li, Zhenan Sun, Juho Kannala, Guoying Zhao |
| 2025 | FlexGaussian: Flexible and Cost-Effective Training-Free Compression for 3D Gaussian Splatting. Boyuan Tian, Qizhe Gao, Siran Xianyu, Xiaotong Cui, Minjia Zhang |
| 2025 | Flexible Multi-view Clustering with Dynamic Views Generation. Yalan Qin, Nan Pu, Hanzhou Wu, Zhaoxin Fan |
| 2025 | Flip is Better than Noise: Unbiased Interest Generation for Multimedia Recommendation. Yue He, Jingxi Xie, Fengling Li, Lei Zhu, Jingjing Li |
| 2025 | FloorplanSBS: Synthesizing Vector Floorplans by Patch-Based Floorplan Segmentation. Wenming Wu, Tianlei Sheng, Gaofeng Zhang, Liping Zheng |
| 2025 | FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing. Gaoxiang Cong, Liang Li, Jiadong Pan, Zhedong Zhang, Amin Beheshti, Anton van den Hengel, Yuankai Qi, Qingming Huang |
| 2025 | FlowTrack: Integrating Adjacent-Frame Motion Tracking and Adaptive Prediction for Robust Semi-Supervised VOS. Duolin Wang, Guanyu Xing, Yanli Liu |
| 2025 | Flowing Crowd to Count Flows: A Self-Supervised Framework for Video Individual Counting. Feng-Kai Huang, Bo-Lun Huang, Li-Wu Tsao, Jhih-Ciang Wu, Hong-Han Shuai, Wen-Huang Cheng |
| 2025 | FluidGS: Physics Informed Gaussian Splatting for Dynamic Fluid Reconstruction from Sparse Views. Youchen Xie, Chen Li, Sheng Qiu, Zhi-Jun Wang, Chenhui Li, Yibo Zhao, Zan Gao, Changbo Wang |
| 2025 | Focus Where It Matters: LLM-Guided Regional Identification for Instruction-based Image Editing. Minho Park, Youngjoo Jo, Jae-Hyeok Lee, JiYong Lee, Dong-oh Kang, Yong Man Ro |
| 2025 | Focus on Generalization: Improving Adversarial Transferability via Bi-Level Bias Mitigation. Yiqiang Guo, Lei Zhong, Bin Chen, Jia-Li Yin, Xiaolei Liu, Shouling Ji |
| 2025 | Focus on the Object: Gradient-based Feature Modulation for Camouflaged Object Segmentation. Naisong Luo, Yuan Wang, Yuwen Pan, Rui Sun |
| 2025 | FocusTrack: One-Stage Focus-and-Suppress Framework for 3D Point Cloud Object Tracking. Sifan Zhou, Jiahao Nie, Ziyu Zhao, Yichao Cao, Xiaobo Lu |
| 2025 | FoodLogAthl-218: Constructing a Real-World Food Image Dataset Using Dietary Management Applications. Mitsuki Watanabe, Sosuke Amano, Kiyoharu Aizawa, Yoko Yamakata |
| 2025 | Foresail: LLM Sensor Knowledge Empowered Status-guided Network for Multivariate Time-series Classification. Yuhan Jing, Bo He, Haifeng Sun, Qi Qi, Zirui Zhuang, Lei Zhang, Jianxin Liao, Jingyu Wang |
| 2025 | FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents. Bobo Li, Yuheng Wang, Hao Fei, Juncheng Li, Wei Ji, Mong-Li Lee, Wynne Hsu |
| 2025 | Formula Spotting Based on Synergy Perception and Representation Mining. Gang Pan, Hongen Liu, Di Sun |
| 2025 | Forward-Only Continual Learning. Jiao Chen, Jiayi He, Fangfang Chen, Zuohong Lv, Jianhua Tang |
| 2025 | Fourier Self-Adaptation for Transferring General Pretrained Models to Specific Domains. Lei Liu, Xiangdong Su, Guanglai Gao |
| 2025 | FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks. Tianyi Wang, Harry Cheng, Ming-Hui Liu, Mohan Kankanhalli |
| 2025 | Free-Mask: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing. Bo Gao, Jianhui Wang, Xinyuan Song, Yangfan He, Fangxu Xing, Tianyu Shi |
| 2025 | FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation. Yuxuan Jiang, Zehua Chen, Zeqian Ju, Chang Li, Weibei Dou, Jun Zhu |
| 2025 | FreeCAD: A Multimodal Framework for 3D CAD Model Generation from Free-Form Prompts. Dawei Lin, Meng Yuan, Ziming Wang, Tieru Wu, Yuanning Liu |
| 2025 | FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors. Chenxi Li, Weijie Wang, Qiang Li, Nicu Sebe, Bruno Lepri, Weizhi Nie |
| 2025 | Freq-RWKV: Granularity-Aware Spatial-Frequency Synergy via Dual-Domain Recurrent Scanning for Pan-sharpening. Xueheng Li, Xuanhua He, Tao Hu, Jie Zhang, Man Zhou, Chengjun Xie, Yingying Wang, Bo Huang |
| 2025 | Frequency Domain Distributed Perturbations: Towards Query-Efficient Black-Box Adversarial Video Attack. Teng Jin, Ziwen He, Zhangjie Fu, Songping Wang, Yueming Lyu, Yufei Shi |
| 2025 | Frequency Meets Semantics: Text-Visual Fusion with Directional Spectral Enhancement for Salient Object Detection in Optical Remote Sensing Images. Lamei Di, Bin Zhang, Yiming Wang, Wenxia Zhang |
| 2025 | Frequency Regulation for Exposure Bias Mitigation in Diffusion Models. Meng Yu, Kun Zhan |
| 2025 | Frequency-aware Correlation Discovering and Spatial Forgery Clue Distilling for Synthetic Image Detection. Jiehua Zhang, Liang Li, Chenggang Yan, Wei Ke, Yihong Gong |
| 2025 | Frequency-refined Graph Convolution Network with Cross-modal Wavelet Denoising for Recommendation. Feiyu Peng, Chaobo He, Junwei Cheng, Huijuan Hu, Wenkai Zhang, Youda Mo |
| 2025 | From Captions to Rewards (CaReVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models. Muzhi Dai, Jiashuo Sun, Zhiyuan Zhao, Shixuan Liu, Rui Li, Junyu Gao, Xuelong Li |
| 2025 | From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models. Zhaoxi Mu, Rilin Chen, Andong Li, Meng Yu, Xinyu Yang, Dong Yu |
| 2025 | From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous Driving. Xinyu Xia, Xingjun Ma, Yunfeng Hu, Ting Qu, Hong Chen, Xun Gong |
| 2025 | From Guesswork to Guarantee: Towards Faithful Multimedia Web Forecasting with TimeSieve. Songning Lai, Ninghui Feng, Jiechao Gao, Hao Wang, Haochen Sui, Xin Zou, Jiayu Yang, Wenshuo Chen, Lijie Hu, Hang Zhao, Xuming Hu, Yutao Yue |
| 2025 | From Hemoglobin to MOS: Towards Neuro-Based QoE Assessment Using fNIRS. Natalia Jakubiec, Lucjan Janowski |
| 2025 | From Individuals to Crowds: Dual-Level Public Response Prediction in Social Media. Jinghui Zhang, Kaiyang Wan, Longwei Xu, Ao Li, Zongfang Liu, Xiuying Chen |
| 2025 | From Language to Instance: Generative Visual Prompting for Zero-shot Camouflaged Object Detection. Zihou Zhang, Hao Li, Zhengwei Yang, Zechao Hu, Liang Li, Zheng Wang |
| 2025 | From Model Diagram to Code: A Benchmark Dataset and Multi-Agent Framework. Mengzhen Wang, Xunbin Huang, Jiayuan Xie, Shukai Ma, Jiale Men, Dayong Liang, Yi Cai |
| 2025 | From Outline to Detail: An Hierarchical End-to-end Framework for Coherent and Consistent Visual Novel Generation and Assembly. Yilin Zhang, Yanyan Wei, Zhao Zhang, Jicong Fan, Haijun Zhang, Shuicheng Yan |
| 2025 | From Pixels to Semantics: A Novel MLLM-Driven Approach for Explainable Tampered Text Detection. Guitao Xu, Ziqi Yi, Peirong Zhang, Jiahuan Cao, Shihang Wu, Lianwen Jin |
| 2025 | From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training. Jinwen Wang, Youfang Lin, Xiaobo Hu, Siyu Yang, Sheng Han, Shuo Wang, Kai Lv |
| 2025 | From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models. Yuying Shang, Xinyi Zeng, Yutao Zhu, Xiao Yang, Zhengwei Fang, Jingyuan Zhang, Jiawei Chen, Zinan Liu, Yu Tian |
| 2025 | From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users. Shahroz Tariq, Simon S. Woo, Priyanka Singh, Irena Irmalasari, Saakshi Gupta, Dev Gupta |
| 2025 | From Semantics, Scene to Instance-awareness: Distilling Foundation Model for Open-vocabulary Grounded Situation Recognition. Chen Cai, Tianyi Liu, Jianjun Gao, Wenyang Liu, Kejun Wu, Ruoyu Wang, Yi Wang, Soo Chin Liew |
| 2025 | From Subtle Hints to Grand Expressions - Mastering Fine-grained Emotions with Dynamic Multimodal Analysis. Qinfu Xu, Liyuan Pan, Shaozu Yuan, Yiwei Wei, Chunlei Wu |
| 2025 | FutureGS: Structured Gaussian Fields for Future-Aware Dynamic Scene Modeling. Mingyang Ding, Zhan Wang, Jiachen Wang, Tingting Han, Xinyuan Hu, Jiajun Ding, Min Tan, Zhenzhong Kuang |
| 2025 | G2LFormer: Global-to-Local Query Enhancement for Robust Table Structure Recognition. Haosheng Cai, Yang Xue |
| 2025 | GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought Reasoning. Bo Liu, Xiangyu Zhao, Along He, Yidi Chen, Huazhu Fu, Xiao-Ming Wu |
| 2025 | GENEA Workshop 2025: The 6th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents. Taras Kucherenko, Alice Delbosc, Rajmund Nagy, Laura B. Hensel, Youngwoo Yoon, Oya Çeliktutan, Gustav Eje Henter |
| 2025 | GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts. Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Chenyang Li, Hanyuan Chen, Jin-Peng Lan, Jun-Yan He, Bin Luo, Yifeng Geng |
| 2025 | GM-DF: Generalized Multi-Scenario Deepfake Detection. Yingxin Lai, Hongyang Wang, Jing Yang, Xiangui Kang, Bin Li, Linlin Shen, Zitong Yu |
| 2025 | GMML: Gradient-Modulated Robustness for Imbalance-Aware Multimodal Learning. Zikai Zhang, Xu Zhang, Ziyi Li, Yidong Li, Yuanzhouhan Cao |
| 2025 | GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs. Xiaorong Zhu, Ziheng Jia, Jiarui Wang, Xiangyu Zhao, Haodong Duan, Xiongkuo Min, Jia Wang, Zicheng Zhang, Guangtao Zhai |
| 2025 | GOES: 3D Gaussian-based One-shot Head Animation with Any Emotion and Any Style. Chuhang Ma, Shuai Tan, Junjie Wei, Ye Pan |
| 2025 | GPT-ReID: Learning Fine-grained Representation with GPT for Text-based Person Retrieval. Xudong Wang, Lei Tan, Pingyang Dai, Liujuan Cao, Rongrong Ji |
| 2025 | GTHNA: Local-global Graph Transformer with Memory Reconstruction for Holistic Node Anomaly Evaluation. Mingkang Li, Xuexiong Luo, Yue Zhang, Yaoyang Li, Fu Lin |
| 2025 | GUI-Narrator: Detecting and Captioning Computer GUI Actions. Qinchen Wu, Difei Gao, Qinghong Lin, Zhuoyu Wu, Mike Zheng Shou |
| 2025 | GalaxAlign: Mimicking Citizen Scientists' Multimodal Guidance for Galaxy Morphology Analysis. Ruoqi Wang, Haitao Wang, Qiong Luo |
| 2025 | Gamma: Toward Generic Image Assessment with Mixture of Assessment Experts. Hantao Zhou, Rui Yang, Longxiang Tang, Guanyi Qin, Runze Hu, Xiu Li |
| 2025 | Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective. Yan Zhang, Gangyan Zeng, Daiqing Wu, Huawen Shen, Binbin Li, Yu Zhou, Can Ma, Xiaojun Bi |
| 2025 | GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting. Lei Yao, Yi Wang, Yi Zhang, Moyun Liu, Lap-Pui Chau |
| 2025 | Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers Estimation. Konstantin Egorov, Stepan Botman, Pavel Blinov, Galina Zubkova, Anton Ivaschenko, Alexander Kolsanov, Andrey V. Savchenko |
| 2025 | Gaze-Adaptive Foveation for Remote Rendered VR. Adhi Widagdo, Teemu Kämäräinen, Ahmad Alhilal, Matti Siekkinen, Cheng-Hsin Hsu |
| 2025 | Gen4Track: A Tuning-free Data Augmentation Framework via Self-correcting Diffusion Model for Vision-Language Tracking. Jiawei Ge, Xinyu Zhang, Jiuxin Cao, Xuelin Zhu, Weijia Liu, Qingqing Gao, Biwei Cao, Kun Wang, Chang Liu, Bo Liu, Chen Feng, Ioannis Patras |
| 2025 | GenStream: Semantic Streaming Framework for Generative Reconstruction of Human-centric Media. Emanuele Artioli, Daniele Lorenzi, Shivi Vats, Farzad Tashtarian, Christian Timmerer |
| 2025 | GenWardrobe: A Fully Generative System for Travel Fashion Wardrobe Construction. Peng Jin, Yilin Wen, Mingzhe Yu, Yunshan Ma, Rong Zheng, Jintu Fan, Chong Wah Ngo |
| 2025 | Generalizable Audio Deepfake Detection via Risk-Aware Style Alignment and Structural Empirical Risk Minimization. Mingru Yang, Yanmei Gu, Qianhua He, Peirong Zhang, Haolin He, Zhiming Wang, Huijia Zhu, Jian Liu, Weiqiang Wang |
| 2025 | Generalizable Engagement Estimation in Conversation via Domain Prompting and Parallel Attention. Yangchen Yu, Yin Chen, Jia Li, Peng Jia, Yu Zhang, Li Dai, Zhenzhen Hu, Meng Wang, Richang Hong |
| 2025 | Generalizing to New Area: Self-Distillation Curriculum Learning for Fine-Grained Cross View Localization. Fenghao Tian, Mingtao Feng, Jianqiao Luo, Zijie Wu, Longlong Mei, Lijie Yang, Weisheng Dong, Yaonan Wang |
| 2025 | Generate Aligned Anomaly: Region-Guided Few-Shot Anomaly Image-Mask Pair Synthesis for Industrial Inspection. Yilin Lu, Jianghang Lin, Linhuang Xie, Kai Zhao, Yansong Qu, Shengchuan Zhang, Liujuan Cao, Rongrong Ji |
| 2025 | Generating 3D Hair Strands from Images with Diverse Styles and Viewpoints. Pengyu Long, Zijun Zhao, Min Ouyang, Qingcheng Zhao, Wei Yang, Lan Xu, Jingyi Yu |
| 2025 | Generating Negative Samples for Multi-Modal Recommendation. Yanbiao Ji, Dan Luo, Chang Liu, Shaokai Wu, Jing Tong, Qichen He, Deyi Ji, Hongtao Lu, Yue Ding |
| 2025 | Generative AI for Multimedia Communication: Recent Advances, An Information-Theoretic Framework, and Future Opportunities. Yili Jin, Xue Liu, Jiangchuan Liu |
| 2025 | Generative Flow Networks for Personalized Multimedia Systems: A Case Study on Short Video Feeds. Yili Jin, Ling Pan, Rui-Xiao Zhang, Jiangchuan Liu, Xue Liu |
| 2025 | Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos. Haowen Gao, Liang Pang, Shicheng Xu, Leigang Qu, Tat-Seng Chua, Huawei Shen, Xueqi Cheng |
| 2025 | Generative Multi-Sensory Meditation: Exploring Immersive Depth and Activation in Virtual Reality. Yuyang Jiang, Binzhu Xie, Lina Xu, Xiaokang Lei, Shi Qiu, Luwen Yu, Pan Hui |
| 2025 | Generative Semantic Probing for Vision-Language Models via Hierarchical Feature Optimization. He Wang, Longquan Dai, Shihao Pu, Shaomeng Wang, Jinhui Tang |
| 2025 | Genesis: A Large-Scale Benchmark for Multimodal Large Language Model in Emotional Causality Analysis. Yulong Li, Yuxuan Zhang, Rui Chen, Feilong Tang, Zhixiang Lu, Ming Hu, Jianghao Wu, Haochen Xue, Mian Zhou, Chong Li, Jionglong Su, Imran Razzak |
| 2025 | Geo-CF2Net: Geometry-Prior Cross-Frequency Interactive Fusion Network for 3D Human Action Recognition. Zhaoyu Chen, Qian Huang, Xing Li, Yunfei Zhang, Shihao Han, Ge Gao, Yirui Wu, Xin Li, Ziyang Yin |
| 2025 | GeoMag: A Vision-Language Model for Pixel-level Fine-Grained Remote Sensing Image Parsing. Xianzhi Ma, Jianhui Li, Changhua Pei, Hao Liu |
| 2025 | GeoQE: Enhancing Quality of Experience in Point Cloud Streaming. Junzhe Zhang, Chengfeng Han, Dandan Ding, Zhan Ma |
| 2025 | GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions. Jo-Ku Cheng, Zeren Zhang, Ran Chen, Jingyang Deng, Ziran Qin, Jinwen Ma |
| 2025 | Geometric Gradient Divergence Modulation for Imbalanced Multimodal Learning. Disen Hu, Xun Jiang, Zhe Sun, Hao Yang, Chong Peng, Peng Yan, Heng Tao Shen, Xing Xu |
| 2025 | Gloss Matters: Unlocking the Potential of Non-Autoregressive Sign Language Translation. Zhihao Wang, Shiyu Liu, Zhiwei He, Kangjie Zheng, Liangying Shao, Junfeng Yao, Jinsong Su |
| 2025 | Google Industry Seminar: Video Processing in the New Age of AI. Balu Adsumilli, Jianle Chen, In Suk Chong, Yilin Wang |
| 2025 | Gradient-Aware Revitalization of Non-Effective Samples in Medical Image Segmentation. Shiying Lin, Rong Hu, Zuoyong Li, Qinghua Lin, Jiawei Wu, Changqing Zhang |
| 2025 | Granular Music Attribute Transformation with Proximal Policy Optimization Adapters for Diffusion Model. Kunsheng Ma, Fan Qi, Changsheng Xu |
| 2025 | Graph Canvas for Controllable 3D Scene Generation. Libin Liu, Shen Chen, Sen Jia, Jingzhe Shi, Can Jin, Zongkai Wu, Jenq-Neng Hwang, Lei Li |
| 2025 | Graph Unlearning Meets Influence-aware Negative Preference Optimization. Qiang Chen, Zhongze Wu, Ang He, Xi Lin, Shuo Jiang, Shan You, Chang Xu, Yi Chen, Xiu Su |
| 2025 | Graph-Guided Dual-Level Augmentation for 3D Scene Segmentation. Hongbin Lin, Yifan Jiang, Juangui Xu, Jesse Jiaxi Xu, Yi Lu, Zhengyu Hu, Ying-Cong Chen, Hao Wang |
| 2025 | Graph-Perceptron with Semantic Fidelity for No-Reference Super-Resolution Image Quality Assessment. Lei Chen |
| 2025 | Graph-based Approximate Nearest Neighbor Search by Deep Reinforcement Routing. Mingjie Li, Junhao Lin, Dian Ouyang, Ying Zhang, Wei Wang |
| 2025 | GraphSplat: Sparse-View Generalizable 3D Gaussian Splatting is Worth Graph of Nodes. Zeyang Bai, Yunbiao Wang, Dongbo Yu, Jun Xiao, Lupeng Liu |
| 2025 | GraphVideoAgent: Enhancing Long-form Video Understanding with Entity Relation Graphs. Meng Chu, Yicong Li, Tat-Seng Chua |
| 2025 | GraphWorld: Ultra-fast Graph Engine for World-Wide Web Searching. Xinbiao Gan, Qiang Zhang, Tiejun Li, Chunye Gong, Kai Lu |
| 2025 | GroMo25: ACM Multimedia 2025 Grand Challenge for Plant Growth Modeling with Multiview Images. Shreya Bansal, Ruchi Bhatt, Amanpreet Chander, Rupinder Kaur, Malya Singh, Mohan Kankanhalli, Abdulmotaleb El Saddik, Mukesh Saini |
| 2025 | Ground and Reconstruct: Entity-Region Bidirectional Alignment Pre-Training for Low-Resource GMNER. Runwei Situ, Yi Cai, Yong Xu, Jiexin Wang |
| 2025 | Grounding Emotion Recognition with Visual Prototypes: VEGA - Revisiting CLIP in MERC. Guanyu Hu, Dimitrios Kollias, Xinyu Yang |
| 2025 | GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset. Sahar Nasirihaghighi, Negin Ghamsarian, Leonie Peschek, Matteo Munari, Heinrich Husslein, Raphael Sznitman, Klaus Schoeffmann |
| 2025 | HAFUNet: A Hierarchical Attention Fusion Network for Monocular Depth Estimation Integrating Event and Frame Data. Siyuan Zhang, Xiaoping Wang, Jiang Li, Weibin Feng, Xin Zhan, Hongzhi Huang |
| 2025 | HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection. Jialei Cui, Jianwei Du, Yanzhe Li, Lei Gao, Hui Jiang, Chenfu Bao |
| 2025 | HAN: Korean Heritage Augmented Narrative Visual-Language Description Dataset. Sunghyun Moon, Aidyn Zhakatayev, Seungjae Lee |
| 2025 | HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided Drones. Hao Ruan, Jinliang Lin, Yingxin Lai, Zhiming Luo, Shaozi Li |
| 2025 | HDCFN: Haze Distribution-aware Cross-modal Fusion Network for Infrared-guided Dense Haze Removal in UAVs. Junwei Zhao, Qianchun Luo, Shiliang Zhang, Shen Gao, Jie Wu |
| 2025 | HEALTH+: Empowering Individuals via Unifying Health Data. Sujaya Maiyya, Shantanu Sharma, Avinash Kumar |
| 2025 | HEAR: A Holistic Extraction and Agentic Reasoning Framework for Document Understanding. Longfeng Chen, Zheng Xiao, Juyuan Wang, Zeyu Huang, Yawen Zeng, Jin Xu |
| 2025 | HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction. Jie Qin, Wei Yang, Yan Su, Yiran Zhu, Weizhen Li, Yunyue Pan, Chengchang Pan, Honggang Qi |
| 2025 | HGAC Zongxing Zhao, Shenzhi Yang, Xingkai Yao, Yuying Wang, Zhongqiu Chen, Xiaofang Zhang |
| 2025 | HGC-Avatar: Hierarchical Gaussian Compression for Streamable Dynamic 3D Avatars. Haocheng Tang, Ruoke Yan, Xinhui Yin, Qi Zhang, Xinfeng Zhang, Siwei Ma, Wen Gao, Chuanmin Jia |
| 2025 | HGCF: Hierarchical Geometry-Color Fusion for Multimodal Industrial Anomaly Detection. Min Li, Jinghui He, Jiachen Li, Delong Han, Jin Wan, Gang Li |
| 2025 | HKD4VLM: A Progressive Hybrid Knowledge Distillation Framework for Robust Multimodal Hallucination and Factuality Detection in VLMs. Zijian Zhang, Xuecheng Wu, Danlei Huang, Siyu Yan, Chong Peng, Xuezhi Cao |
| 2025 | HL-EAI: A Multimodal Framework Enabling Emotional Reciprocity in Human-AI Strategic Decision-Making. Mikhail Mozikov, Daniil Orekhov, Ivan Nasonov, Konstantin Baltsat, Vladislav Pedashenko, Dmitrii Abramov, Nikita Severin, Yury Maximov, Andrey V. Savchenko, Ilya Makarov |
| 2025 | HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation. Pei Liu, Xin Liu, Ruoyu Yao, Junming Liu, Siyuan Meng, Ding Wang, Jun Ma |
| 2025 | HOLA: Enhancing Audio-visual Deepfake Detection via Hierarchical Contextual Aggregations and Efficient Pre-training. Xuecheng Wu, Heli Sun, Danlei Huang, Xinyi Yin, Yifan Wang, Hao Wang, Jia Zhang, Fei Wang, Peihao Guo, Suyu Xing, Junxiao Xue, Liang He |
| 2025 | HOPE: Hierarchical Fusion for Optimized and Personality-Aware Estimation of Depression. Hanlei Shi, Yu Liu, Haoxun Li, Yuxuan Ding, Jiaxi Hu, Leyuan Qu, Taihao Li |
| 2025 | HOPNet: Learning Hand-Object-Person Interaction Network for Hand Contact State Detection. Wei Li, Yizhao Wan, Xiao Wu, Jianshuai Wang, Penglin Dai, Zhaoquan Yuan |
| 2025 | HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation. Weihuang Lin, Yiwei Ma, Xiaoshuai Sun, Shuting He, Jiayi Ji, Liujuan Cao, Rongrong Ji |
| 2025 | HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval. Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, Weili Guan |
| 2025 | HVEval: Towards Unified Evaluation of Human-Centric Video Generation and Understanding. Sijing Wu, Yunhao Li, Huiyu Duan, Yanwei Jiang, Yucheng Zhu, Guangtao Zhai |
| 2025 | HairShifter: Consistent and High-Fidelity Video Hair Transfer via Anchor-Guided Animation. Wangzheng Shi, Yinglin Zheng, Yuxin Lin, Jianmin Bao, Ming Zeng, Dong Chen |
| 2025 | HandCraft: Tactile-Informed Hand-Object Dynamics Capture and Realistic Rendering. Hongyang Lin, Kuixiang Shao, Peijun Xu, Zhuoyang Bu, Yuyang Jiao, Ziyuan Tang, Chenxi Xiao, Jingyi Yu |
| 2025 | HandSolo: A Mid-Air Hand Pose Interaction Method Based on Disentangled Degrees-of-Hand-Freedom. Songpei Xu, Xuri Ge, Chaitanya Kaul, Roderick Murray-Smith |
| 2025 | Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities. Rui Liu, Haolin Zuo, Zheng Lian, Hongyu Yuan, Qi Fan |
| 2025 | HarmoniVox: Painting Voices to Match the Avatar's Soul. Songtao Zhou, Xiaoyu Qin, Yixuan Zhou, Qixin Wang, Zeyu Jin, Zixuan Wang, Zhiyong Wu, Jia Jia |
| 2025 | Harnessing Multimodal Large Language Models for Personalized Product Search with Query-aware Refinement. Beibei Zhang, Yanan Lu, Ruobing Xie, Zongyi Li, Siyuan Xing, Tongwei Ren, Fen Lin |
| 2025 | HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection. Han Wang, Zhuoran Wang, Roy Ka-Wei Lee |
| 2025 | HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning. Chuhang Zheng, Chunwei Tian, Jie Wen, Daoqiang Zhang, Qi Zhu |
| 2025 | Heterogeneous Encoder Fusion with KAN Decoder for Group Engagement Modeling via 8× Sliding Pipelines. Yuefeng Zou, Hui Zhang, Jun Yu, Keda Lu, Lingsi Zhu, Fengzhao Sun, Bo Wang, Kun Yao, Jianqing Sun, Jiaen Liang |
| 2025 | Hi-Motion: Hierarchical Intention Guided Conditional Motion Synthesis. Le Han, Kaixuan Chen, Minchen Ye, Nenggan Zheng |
| 2025 | HiDream-I1: An Open-Source High-Efficient Image Generative Foundation Model. Qi Cai, Yehao Li, Yingwei Pan, Ting Yao, Tao Mei |
| 2025 | HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs. Zhaolin Cai, Fan Li, Ziwei Zheng, Yanjun Qin |
| 2025 | HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation. Wenqi Dong, Bangbang Yang, Zesong Yang, Yuan Li, Tao Hu, Hujun Bao, Yuewen Ma, Zhaopeng Cui |
| 2025 | HierMEQA: A Relationship-Aware Hierarchical Framework for Consistent Micro-Expression Visual Question Answering. Lingsi Zhu, Yanjun Chi, Jun Yu, Gongpeng Zhao, Yuefeng Zou, Fengzhao Sun, Xilong Lu |
| 2025 | Hierarchical Disentanglement of Cognitive States for Enhanced Cognitive Diagnosis. Hengnian Gu, Zhifu Chen, Jin Peng Zhou, Dongdai Zhou |
| 2025 | Hierarchical Meta-prototypes Network for Few-shot Action Recognition. Xiaoyu Chen, Yigang Cen, Wanru Xu, Yue Zhang, Yi Jin, Yidong Li, Linna Zhang |
| 2025 | Hierarchical Multi-Feature Extraction and Aggregation for Micro-Action Recognition. Zhichao Xia, Yichi Zhang, Yanjun Chi, Lingsi Zhu, Mohan Jing, Jun Yu |
| 2025 | Hierarchical Spatiotemporal Context Aggregation and Speckle-aware Deformable Convolution for Echocardiography Video Segmentation. Jingxing Guo, Guilian Chen, Yimu Sun, Huisi Wu, Jing Qin |
| 2025 | Hierarchical Vision-Language Reasoning for Multimodal Multiple-Choice Question Answering. Ao Zhou, Zebo Gu, Tenghao Sun, Jiawen Chen, Mingsheng Tu, Zifeng Cheng, Yafeng Yin, Zhiwei Jiang, Qing Gu |
| 2025 | High-Performance Discriminative Tracking with Spatio-Temporal Template Fusion. Xuedong He, Huiying Xu, Xinzhong Zhu, Hongbo Li |
| 2025 | Higher-Order Vision-Language Fusion for Video Popularity Prediction. Kele Xu, Qisheng Xu, Binli Luo, Han Zhou, Zengming Lin, Hui Geng, Xianhan Tan |
| 2025 | HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation. Haiyang Zhou, Wangbo Yu, Jiawen Guan, Xinhua Cheng, Yonghong Tian, Li Yuan |
| 2025 | HoloTrace: LLM-based Bidirectional Causal Knowledge Graph for Edge-Cloud Video Anomaly Detection. Hanling Wang, Qing Li, Li Chen, Haidong Kang, Fei Ma, Yong Jiang |
| 2025 | Hot-Swap MarkBoard: An Efficient Black-box Watermarking Approach for Large-scale Model Distribution. Zhicheng Zhang, Peizhuo Lv, Mengke Wan, Jiang Fang, Diandian Guo, Yezeng Chen, Yinlong Liu, Wei Ma, Jiyan Sun, Liru Geng |
| 2025 | How Generative AI Understands the Balance of Energy, Efficiency, and Human Experience. Tomoya Sawada |
| 2025 | How2Compress: Scalable and Efficient Edge Video Analytics via Adaptive Granular Video Compression. Yuheng Wu, Thanh-Tung Nguyen, Lucas Liebe, Quang Tau, Pablo Espinosa Campos, Jinghan Cheng, Dongman Lee |
| 2025 | Hubness Reduction with Dual Bank Sinkhorn Normalization for Cross-Modal Retrieval. Zhengxin Pan, Haishuai Wang, Fangyu Wu, Peng Zhang, Jiajun Bu |
| 2025 | Human Motion Generation in 3D Scenes from Open-Ended Textual Instructions with MLLM Planning. Siyi Qian, Jian Fang, Yuzhou Mao, Yayun Zou, Wentao Zhang, Haiwei Xue |
| 2025 | Human vs AI: How Digital Human News Anchors Affect Our Cognitive Processes? Yan-Kai Liu, Shunyang Yao, Tao Xi, Bao-Liang Lu, Wei-Long Zheng |
| 2025 | Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric. Zhichao Zhang, Wei Sun, Xinyue Li, Yunhao Li, Qihang Ge, Jun Jia, Zicheng Zhang, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, Guangtao Zhai |
| 2025 | HumanPrinter: Reconstructing 3D Human from a Single Image Like a 3D Printer. Leyuan Liu, Shen Chen, Jingying Chen |
| 2025 | HyMoENet: Mixture-of-Experts Enhanced CNN-Transformer Hybrid Framework for Classifying Anatomical Sites in Endoscopic ENT Images. Trong-Nhan Nguyen, Luan L. M. Nguyen, Phat-Dat To, Tran-Quoc Duy Nguyen, Anh-Huy Nguyen, Tuan Pham-Dang, Chu Lam Nguyen, Duy V. M. Nguyen |
| 2025 | HybridPlane: A General 4D Representation for Dynamic Scene Reconstruction. Ru Jia, Xiaoqian Liang, Xubin Duan, Jianji Wang, Nanning Zheng |
| 2025 | HydraMamba: Multi-Head State Space Model for Global Point Cloud Learning. Kanglin Qu, Pan Gao, Qun Dai, Yuanhao Sun |
| 2025 | I Huilin Chen, Miaomiao Cai, Fan Liu, Zhiyong Cheng, Richang Hong, Meng Wang |
| 2025 | I-C Attack: In-place and Cross-pixel Augmentations for Highly Transferable Transformation-based Attacks. Jiaming Liang, Chi-Man Pun |
| 2025 | I2CR: Intra- and Inter-modal Collaborative Reflections for Multimodal Entity Linking. Ziyan Liu, Junwen Li, Kaiwen Li, Tong Ruan, Chao Wang, Xinyan He, Zongyu Wang, Xuezhi Cao, Jingping Liu |
| 2025 | ICAS: Detecting Training Data from Autoregressive Image Generative Models. Hongyao Yu, Yixiang Qiu, Yiheng Yang, Hao Fang, Tianqu Zhuang, Jiaxin Hong, Bin Chen, Hao Wu, Shu-Tao Xia |
| 2025 | ICE: Intercede Concept Erasure in Text-to-Image Diffusion Models. Yizhou Lin, Nisha Huang, Kaer Huang, Henglin Liu, Yiqiang Yan, Jie Guo, Tong-Yee Lee, Xiu Li |
| 2025 | ICS-MR: Interactive Conversation Scenarios for Assessment of Mixed Reality Communication. Felix Immohr, Gareth Rendle, Annika Neidhardt, Anton Benjamin Lammert, Bernd Froehlich, Alexander Raake |
| 2025 | IDPFlow: A No-Code Agentic Framework for Multimodal Intelligent Document Processing. Goutham Vignesh, Harikrishnan P. M., Siddartha Reddy, Saisubramaniam Gopalakrishnan, Vishal Vaddina |
| 2025 | IFS-Light: An Interactive Framework for Single-view Face Relighting with both Facial and Lighting Consistency. Shuyang Wang, Chunxiao Li, Anlong Ming |
| 2025 | IM-POI: Bridging ID and Multi-modal Gaps in Next POI Recommendation. Siyuan Huang, Jiahui Jin, Xin Lin, Xigang Sun, Yukun Ban |
| 2025 | INDS: Incremental Named Data Streaming for Real-Time Point Cloud Video. Ruonan Chai, Yixiang Zhu, Xinjiao Li, Jiawei Li, Zili Meng, Dirk Kutscher |
| 2025 | IPCMoE: Integrating Perceptual Cues with Mixture-of-Experts for Joint Low-Light Image Enhancement and Deblurring. Yuezhou Li, Yuzhen Niu, Huangbiao Xu, Hui Da, Rui Xu, Wenxi Liu |
| 2025 | ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting. Yu Zhang, Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Tao Jin, Zhou Zhao |
| 2025 | IXR '25: 3rd International Workshop on Interactive eXtended Reality. Irene Viola, Silvia Rossi, Marta Orduna, Maria Torres Vega |
| 2025 | Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation. Wenhao Li, Xiu Su, Jingyi Wu, Feng Yang, Yang Liu, Yi Chen, Shan You, Chang Xu |
| 2025 | Identity-Preserving Facial Aesthetic Enhancement via Hierarchical Prompt Learning and Pivotal Tuning. Fangli Ying, Zhihong Zhang, Liting Zhou, Cathal Gurrin, Jinhai Wang |
| 2025 | Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations. Yuji Wang, Moran Li, Xiaobin Hu, Ran Yi, Jiangning Zhang, Han Feng, Weijian Cao, Yabiao Wang, Chengjie Wang, Lizhuang Ma |
| 2025 | Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement. Jiayi Gao, Changcheng Hua, Qingchao Chen, Yuxin Peng, Yang Liu |
| 2025 | Identity-Preserving Video Generation Challenge. Yiheng Zhang, Zhaofan Qiu, Qi Cai, Yehao Li, Fuchen Long, Yingwei Pan, Ting Yao, Tao Mei |
| 2025 | Illustration Layout Generation for Slide Enhancement with Pixel-based Diffusion Model. Zhaoyun Jiang, Jiaqi Guo, Shakie Liu, Chao Han, Ting Liu, Jian-Guang Lou, Dongmei Zhang |
| 2025 | Image Captioning with Multimodal Guidance and Search Space Optimization. Yimou Guo, Yaochen Li, Jingze Liu, Jiahui Feng, Haoyi Lou, Zhimin Chen, Yuan Gao, Yuanqi Su |
| 2025 | Image Retargeting based on Text Region Awareness. Gang Pan, Meihua Liu, Lei Zhou, Jiahao Wang, Di Sun |
| 2025 | Imagining Vision From Language for Few-Shot Class-Incremental Learning. Shuo Li, Xingchen Liu, Fang Liu, Licheng Jiao, Jiahao Wang, Xinyan Huang, Yanbiao Ma, Puhua Chen, Lingling Li, Xu Liu, Xuejian Gou |
| 2025 | Immunizing Images from Text to Image Editing via Adversarial Cross-Attention. Matteo Trippodo, Federico Becattini, Lorenzo Seidenari |
| 2025 | Impact of Stickers on Multimodal Sentiment and Intent in Social Media: A New Task, Dataset and Baseline. Yuanchen Shi, Fang Kong, Longyin Zhang |
| 2025 | Implementation of Visualizer for Beats and Scratches. Masatoshi Hamanaka |
| 2025 | Implicit Retinex Decomposition with Chromaticity Disentanglement for Low-Light Image Enhancement. Mufan Liu, Wu Ran, Zhiquan He, Zuojie Xie, Hong Lu, Peirong Ma |
| 2025 | Improving Compositional Generalization in Cross-Embodiment Learning via Mixture of Disentangled Prototypes. Ren Wang, Xin Wang, Tongtong Feng, Xinyue Gong, Guangyao Li, Yu-Wei Zhan, Qing Li, Wenwu Zhu |
| 2025 | Improving Identity Preservation in Video Generation with Multi-Branch Models. Jiahao Xu, Jianjie Luo, Zhenguo Yang |
| 2025 | Incorporating the Refractory Period into Spiking Neural Networks through Spike-Triggered Threshold Dynamics. Yang Li, Xinyi Zeng, Zhe Xue, Pinxian Zeng, Zikai Zhang, Yan Wang |
| 2025 | Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models. Yiming Wu, Zhenghao Chen, Huan Wang, Dong Xu |
| 2025 | Infrared and Visible Image Fusion with Language-Driven Loss in CLIP Embedding Space. Yuhao Wang, Lingjuan Miao, Zhiqiang Zhou, Lei Zhang, Yajun Qiao |
| 2025 | Infusing AI Art with Cultural Authenticity Through the Culture-Specific LoRA. Zuona Chen, James She |
| 2025 | Ingredients-Guided and Nutrients-Prompted Network for Food Nutrition Estimation. Donglin Zhang, Boyuan Ma, Xiaojun Wu, Josef Kittler |
| 2025 | Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts. YongXiang Hua, Haoyu Cao, Zhou Tao, Bocheng Li, Zihao Wu, Chaohu Liu, Linli Xu |
| 2025 | InstructCrop: Teaching Multimodal Large Language Models to Crop Aesthetic Images. Xiangfei Sheng, Pangu Xie, Weidong Zou, Pengfei Chen, Tong Zhu, Leida Li |
| 2025 | InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing. Kun-Hsiang Lin, Yu-Wen Tseng, Kang-Yang Huang, Jhih-Ciang Wu, Wen-Huang Cheng |
| 2025 | InstructStep: Fine-Grained Localization of Step Content and Relation in Instructional Video. Wangsheng He, Wanru Xu, Ping Guo, Zhenjiang Miao, Yi Tian |
| 2025 | Intelligent Immersification in the Metaverse: AI-Driven Immersive Multimedia. Aik Beng Ng, Yethoven Tukimin, Jeannie S. Lee, Megani Rajendran, Chek Tien Tan, Indriyati Atmosukarto |
| 2025 | IntentVC 2025: The ACM Multimedia Grand Challenge on Intention-Oriented Controllable Video Captioning. Takahiro Komamizu, Marc A. Kastner, Yasutomo Kawanishi, Trung Thanh Nguyen, Junan Chen |
| 2025 | IntentVCNet: Bridging Spatio-Temporal Gaps for Intention-Oriented Controllable Video Captioning. Tianheng Qiu, Jingchun Gao, Jingyu Li, Huiyi Leong, Xuan Huang, Xi Wang, Xiaocheng Zhang, Kele Xu, Lan Zhang |
| 2025 | Inter-Task Weaving in Image Enhancement: From a New Unified Architecture to a Better Meta-Representation Learning. Nan An, Siqi Xu, Long Ma, Zhu Liu, Guangchao Han, Tengyu Ma, Risheng Liu |
| 2025 | InterAnimate: Taming Region-Aware Diffusion Model for Realistic Human Interaction Animation. Yukang Lin, Yan Hong, Zunnan Xu, Xindi Li, Chao Xu, Chuanbiao Song, Ronghui Li, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang, Xiu Li |
| 2025 | InterMind: Doctor-Patient-Family Interactive Depression Assessment Empowered by Large Language Models. Zhiyuan Zhou, Jilong Liu, Sanwang Wang, Shijie Hao, Yanrong Guo, Richang Hong |
| 2025 | Interact-Custom: Customized Human Object Interaction Image Generation. Zhu Xu, Zhaowen Wang, Yuxin Peng, Yang Liu |
| 2025 | InteractGuide: LLM-Enhanced Multimodal Reasoning for User-Centric Interaction Recommendations in AR-HRI Authoring. Yunqiang Pei, Hongrong Yang, Kaiyue Zhang, Guoqing Wang, Peng Wang, Chaoning Zhang, Yang Yang, Heng Tao Shen |
| 2025 | InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects. Xinhao Cai, Minghang Zheng, Xin Jin, Yang Liu |
| 2025 | Interactive Retrieval System for Multi-Stream Collections: multiXview at CASTLE 2025 Interactive Grand Challenge. Omar Shahbaz Khan, Ujjwal Sharma, Gonçalo Marcelino, Aaron Duane, Stevan Rudinac, Marcel Worring, Björn Þór Jónsson |
| 2025 | Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis. Trong-Thang Pham, Anh Nguyen, Zhigang Deng, Carol C. Wu, Hien Nguyen, Ngan Le |
| 2025 | Inverse-Tone-Mapped HDR Video Quality Assessment for Broadcast Television: A Comprehensive Dataset and SDR-Referenced Method. Leidong Fan, Qian Zhang, Qing Li |
| 2025 | Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models. Zejian Li, Yize Li, Chenye Meng, Zhongni Liu, Ling Yang, Shengyuan Zhang, Guang Yang, Changyuan Yang, Zhiyuan Yang, Lingyun Sun |
| 2025 | Investigating Domain Gaps for Indoor 3D Object Detection. Zijing Zhao, Zhu Xu, Qingchao Chen, Yuxin Peng, Yang Liu |
| 2025 | JPEG-RAE: Reversible Adversarial Example for Privacy and Copyright Protection of JPEG Images. Dahao Fu, Jiangqun Ni, Jian Zhang |
| 2025 | JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering. Renmiao Chen, Shiyao Cui, Xuancheng Huang, Chengwei Pan, Victor Shea-Jay Huang, Qinglin Zhang, Xuan Ouyang, Zhexin Zhang, Hongning Wang, Minlie Huang |
| 2025 | Joint Holistic and Lesion Controllable Mammogram Synthesis via Gated Conditional Diffusion Model. Xin Li, Kaixiang Yang, Qiang Li, Zhiwei Wang |
| 2025 | Joint Test-time Adaptation with Refined Pseudo-labels and Latent Score Matching. Yijie Yang, Lianyong Qi, Weiming Liu, Fan Wang, Jing Du, Yuwen Liu, Xiaolong Xu, Qiang Ni, Wanchun Dou, Xiaokang Zhou |
| 2025 | K-Space Bispectrum Steganography for Robust Unlearnable Data. Jiahao Li, Yiqiang Chen, Yunbing Xing, Yang Gu, Xiangyuan Lan |
| 2025 | KAID: Knowledge-Aware Interactive Distillation for Vision-Language Models. Da Zhang, Feiyu Wang, Bingyu Li, Zhiyuan Zhao, Junyu Gao, Xuelong Li |
| 2025 | KDTalker++: Controllable Talking Portrait Generation with Audio, Text, and Expression Editing. Chaolong Yang, Yinuo Guo, Kai Yao, Yuyao Yan, Jie Sun, Kaizhu Huang |
| 2025 | KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection. Peican Zhu, Yubo Jing, Le Cheng, Keke Tang, Yangming Guo |
| 2025 | KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual Features. Ivan Kukanov, Jun Wah Ng |
| 2025 | Kinematic Enhanced Hypergraph Convolutional Network for Skeleton-based Human Action Recognition with LLM Training Guides. Nan Ma, Beining Sun, Yiheng Han, Genbao Xu |
| 2025 | Knowledge Negative Distillation: Circumventing Overfitting to Unlock More Generalizable Deepfake Detection. Jipeng Liu, Haichao Shi, Yaru Zhang, Xiao-Yu Zhang |
| 2025 | Knowledge Regularized Negative Feature Tuning of Vision-Language Models for Out-of-Distribution Detection. Wenjie Zhu, Yabin Zhang, Xin Jin, Wenjun Zeng, Lei Zhang |
| 2025 | LAVA Grand Challenge 2025: Benchmarking Japanese-English Document Understanding with Large Vision-Language Models. Daichi Sato, Duc Minh Vo, Khan Md. Anwarus Salam, Hidenori Shoji, Yuma Matsuoka, Takara Taniguchi, Kaito Baba, Hideki Nakayama |
| 2025 | LDW: Label Divergence Weighting for Multimodal Sentiment Analysis. Quanqi Du, Loic De Langhe, Els Lefever, Véronique Hoste |
| 2025 | LEAF-Mamba: Local Emphatic and Adaptive Fusion State Space Model for RGB-D Salient Object Detection. Lanhu Wu, Zilin Gao, Hao Fei, Mong-Li Lee, Wynne Hsu |
| 2025 | LEGO: A Lightweight and Efficient Multiple-Attribute Unlearning Framework for Recommender Systems. Fengyuan Yu, Yuyuan Li, Xiaohua Feng, Junjie Fang, Tao Wang, Chaochao Chen |
| 2025 | LEHA-CVQAD: Dataset To Enable Generalized Video Quality Assessment of Compression Artifacts. Aleksandr Gushchin, Maksim Smirnov, Dmitriy S. Vatolin, Anastasia Antsiferova |
| 2025 | LES-CLIP: A Lightweight Emotion-Sensitive Adaptation of CLIP for Precise Similar Emotion Discrimination. Xiao Fu, Pengyu Wang, Wei Xi, Kun Zhao, Jiadong Feng, Jizhong Zhao |
| 2025 | LFMamba: Focal Stack-aware State Space Modeling for Light Field Salient Object Detection. Xinbo Geng, Fan Shi, Xu Cheng, Chen Jia, Meng Zhao, Shengyong Chen |
| 2025 | LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural Cracks. Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Mengfei Shi, Xia Xie, Shengyong Chen |
| 2025 | LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis. Hao Sun, Fenggen Yu, Huiyao Xu, Tao Zhang, Changqing Zou |
| 2025 | LLM-Grounded Diffusion for Cross-Domain Recommendation. Kuan Liu, Ke Wang, Ji Zhang, Gang Zhou |
| 2025 | LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning. Shibo Sun, Xue Li, Donglin Di, Mingjie Wei, Lanshun Nie, Weinan Zhang, Dechen Zhan, Yang Song, Lei Fan |
| 2025 | LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs. Zitong Xu, Huiyu Duan, Bingnan Liu, Guangji Ma, Jiarui Wang, Liu Yang, Shiqi Gao, Xiaoyu Wang, Jia Wang, Xiongkuo Min, Guangtao Zhai, Weisi Lin |
| 2025 | LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs. Woo Yi Yang, Jiarui Wang, Sijing Wu, Huiyu Duan, Yuxin Zhu, Liu Yang, Kang Fu, Guangtao Zhai, Xiongkuo Min |
| 2025 | LSC-ADL: An Activity of Daily Living (ADL)-Annotated Lifelog Dataset Generated via Semi-Automatic Clustering. Duy-Khang Ho, Minh-Quan Ho-Le, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran |
| 2025 | LSFDNet: A Single-Stage Fusion and Detection Network for Ships Using SWIR and LWIR. Yanyin Guo, Runxuan An, Junwei Li, Zhiyuan Zhang |
| 2025 | LUMOS: A Lumbar Multimodal Osteoporosis Screening Dataset with X-ray and CT images. Keyue Shi, Qianqian Shen, Zhaoming Ye, Liangjun Jiang, Jiajun Bu, Haishuai Wang |
| 2025 | LVLM-HBA: Large Vision-Language Model with Cross-Modal Alignment for Human Behavior Analysis. Jun Yu, Xilong Lu, Lingsi Zhu, Qiang Ling |
| 2025 | LVLM-MIR: Large Vision-Language Model with Parameter-Efficient Fine-Tuning for Multimodal Interleaved Reasoning. Jun Yu, Xilong Lu, Cong Wang, Qiang Ling |
| 2025 | LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation. Hanning Chen, Yang Ni, Wenjun Huang, Hyunwoo Oh, Yezi Liu, Tamoghno Das, Mohsen Imani |
| 2025 | LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video Creation. Wenhui Song, Hanhui Li, Jiehui Huang, Panwen Hu, Yuhao Cheng, Long Chen, Yiqiang Yan, Xiaodan Liang |
| 2025 | Label Prediction Inherited Hashing for Cross-Modal Retrieval: Applying Supervised Hashing to Unsupervised Tasks. Kaihang Jiang, Wai Keung Wong, Jianyang Qin, Xiaozhao Fang, Jie Wen, Bingzhi Chen, Hongbo Gao |
| 2025 | Label-Semantics-Guided Multi-View Multi-Label Learning via High-Order Semantic Fusion. Kaixiang Wang, Xiaojian Ding, Wanqi Yang, Ming Yang |
| 2025 | Language-Driven 3D Human Pose Estimation in Multi-Person Scenarios: A New Dataset and Approach. Tingrui Shen, Bangzhen Liu, Zhirun Fan, Shiting Zhang, Weifeng Pan, Fan Sun, Dan Cao, Shengfeng He |
| 2025 | Large Generative Models Meet Multimodal Applications (LGM Zheng Wang, Qianqian Chen, Yiyang Luo, Zhiqiu Ye, Shi Wei, Hanwang Zhang, Tat-Seng Chua |
| 2025 | Large-Small Model Synergy with Multimodal Fine-Grained Heuristics for Knowledge-Based Visual Question Answering. Zhongfan Sun, Kan Guo, Yongli Hu, Daxin Tian, Qingqing Gao, Jiapu Wang, Junbin Gao, Yanfeng Sun, Baocai Yin |
| 2025 | LargeMvC-Net: Anchor-based Deep Unfolding Network for Large-scale Multi-view Clustering. Shide Du, Chunming Wu, Zihan Fang, Wendi Zhao, Yilin Wu, Changwei Wang, Shiping Wang |
| 2025 | Latent Diffusion Unlearning: Protecting Against Unauthorized Personalization Through Trajectory Shifted Perturbations. Naresh Kumar Devulapally, Shruti Agarwal, Tejas Gokhale, Vishnu Suresh Lokhande |
| 2025 | Latent Interactiveness Field for Non-Contact Human Object Interaction Detection. Xiang Huang, Ao Luo, Xiao Wu, Zhaoquan Yuan |
| 2025 | Latent Space Consistency for Sparse-View CT Reconstruction. Duoyou Chen, Yunqing Chen, Can Zhang, Zhou Wang, Cheng Chen, Ruoxiu Xiao |
| 2025 | Lava: Language Driven Scalable and Versatile Traffic Video Analytics. Yanrui Yu, Tianfei Zhou, Jiaxin Sun, Lianpeng Qiao, Lizhong Ding, Ye Yuan, Guoren Wang |
| 2025 | Layer Separation: Towards Adjustable Joint Space Width Images Synthesis. Haolin Wang, Yafei Ou, Prasoon Ambalathankandy, Gen Ota, Pengyu Dai, Masayuki Ikebe, Kenji Suzuki, Tamotsu Kamishima |
| 2025 | Leader is Guided: Interactive Motion Generation via Lead-Follow Paradigm and Trajectory Guidance. Runqi Wang, Caoyuan Ma, Jian Zhao, Hanrui Xu, Dongfang Sun, Haoyang Chen, Lin Xiong, Zheng Wang, Xuelong Li |
| 2025 | Learn 3D VQA Better with Active Selection and Reannotation. Shengli Zhou, Yang Liu, Feng Zheng |
| 2025 | Learned Single-Pass Multitasking Perceptual Graphics for Immersive Displays. Doga Yilmaz, He Wang, Towaki Takikawa, Duygu Ceylan, Kaan Aksit |
| 2025 | Learning Adaptive Node Selection with External Attention for Human Interaction Recognition. Chen Pang, Xuequan Lu, Qianyu Zhou, Lei Lyu |
| 2025 | Learning Arbitrary-Scale RAW Image Downscaling with Wavelet-based Recurrent Reconstruction. Yang Ren, Hai Jiang, Wei Li, Menglong Yang, Heng Zhang, Zehua Sheng, Qingsheng Ye, Shuaicheng Liu |
| 2025 | Learning Discrepant Transformations for Face Privacy Protection. Chenda Wei, Haoyue Wang, Zhenxing Qian, Sheng Li, Xinpeng Zhang, Jian Liu |
| 2025 | Learning Evidential Delta Denoising Scores for Video Editing. Yufan Hu, Kunlin Yang, Junyu Gao, Bin Fan, Hongmin Liu |
| 2025 | Learning Hierarchical Cross-modal Association with Intra-modal Context for Text-Image Person Retrieval. Yifei Deng, Chenglong Li, Futian Wang, Jin Tang |
| 2025 | Learning Invariant Discriminative Patterns for Unified Anomaly Detection. Chengcheng Xing, Yanyu Xu, Yonghui Xu, Lizhen Cui |
| 2025 | Learning Long-Range Action Representation by Two-Stream Mamba Pyramid Network for Figure Skating Assessment. Fengshun Wang, Qiurui Wang, Peilin Zhao |
| 2025 | Learning New Concepts, Remembering the Old: Continual Learning for Multimodal Concept Bottleneck Models. Songning Lai, Mingqian Liao, Zhangyi Hu, Jiayu Yang, Wenshuo Chen, Hongru Xiao, Jianheng Tang, Haicheng Liao, Yutao Yue |
| 2025 | Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search. Fan Hu, Zijie Xin, Xirong Li |
| 2025 | Learning Structural Priors via Laplacian RWKV Diffusion with Light-Effect Dataset for Nighttime Visibility Enhancement. Dirui Xie, Xiaofang Hu, Zihan Wei, Zhengqiqi Yang, Yanlian Jiang, Yue Zhou |
| 2025 | Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters. Nian Liu, Zilong Zhang, Zi Wang, Tengyu Liu, Hongzhao Xie, Xinyi Tong, Libin Liu, Yaodong Yang, Zhaofeng He |
| 2025 | Learning from Heterogeneity: Generalizing Dynamic Facial Expression Recognition via Distributionally Robust Optimization. Feng-Qi Cui, Anyang Tong, Jinyang Huang, Jie Zhang, Dan Guo, Zhi Liu, Meng Wang |
| 2025 | Learning the Anchors with Similar Distributions to Original Data for Multi-view Clustering. Junpu Zhang, Shengju Yu, Suyuan Liu, Siwei Wang, Miaomiao Li, Xinwang Liu, En Zhu, Kunlun He |
| 2025 | Learning to Be a Doctor: Searching for Effective Medical Agent Architectures. Yangyang Zhuang, Wenjia Jiang, Jiayu Zhang, Ze Yang, Joey Tianyi Zhou, Chi Zhang |
| 2025 | Less is More: High-value Data Selection for Visual Instruction Tuning. Zikang Liu, Kun Zhou, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen |
| 2025 | Let Your Video Listen to Your Music! - Beat-Aligned, Content-Preserving Video Editing with Arbitrary Music. Xinyu Zhang, Dong Gong, Zicheng Duan, Anton van den Hengel, Lingqiao Liu |
| 2025 | Leveraging Multimodal Data and Side Users for Diffusion Cross-Domain Recommendation. Fan Zhang, Jinpeng Chen, Huan Li, Senzhang Wang, Yuan Cao, Kaimin Wei, Jianxiang He, Feifei Kou, Jinqing Wang |
| 2025 | Lightweight Medical Image Restoration via Integrating Reliable Lesion-Semantic Driven Prior. Pengcheng Zheng, Kecheng Chen, Jiaxin Huang, Bohao Chen, Ju Liu, Yazhou Ren, Xiaorong Pu |
| 2025 | Lightweight Relational Proposal Network with Dual-Branch Distillation for Video Moment Retrieval. Yujia Zhu, Hao Yang, Yibo Zhao, Chunjie Ma, Weili Guan, Zan Gao |
| 2025 | Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit. Yang Zhao, Shusheng Li, Xueshang Feng |
| 2025 | Like or Not to Like: An Usecase of Vietnamese Street Food Videos on YouTube. Duy X. Nguyen, Hoang V. Hoan, Ninh A. Vu, Loc T. Nguyen, Trung T. Phan |
| 2025 | Listening to the Unspoken: Exploring '365' Aspects of Multimodal Interview Performance Assessment. Jia Li, Yang Wang, Wenhao Qian, Jialong Hu, Zhenzhen Hu, Richang Hong, Meng Wang |
| 2025 | LoCo: Training-Free Layout-to-Image Synthesis with Localized Constraints. Peiang Zhao, Han Li, Ruiyang Jin, S. Kevin Zhou |
| 2025 | LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models. Shangqing Tu, Yucheng Wang, Daniel Zhang-Li, Yushi Bai, Jifan Yu, Yuhao Wu, Lei Hou, Huiqin Liu, Zhiyuan Liu, Bin Xu, Juanzi Li |
| 2025 | LooBox: Loose-box-supervised 3D Tumor Segmentation with Self-correcting Bidirectional Learning. Tianzhong Lan, Zhang Yi, Xiuyuan Xu, Min Zhu |
| 2025 | Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning. Yian Li, Wentao Tian, Yang Jiao, Tianwen Qian, Na Zhao, Bin Zhu, Jingjing Chen, Yu-Gang Jiang |
| 2025 | Look Beyond: Two-Stage Scene View Generation via Panorama and Video Diffusion. Xueyang Kang, Zhengkang Xiang, Zezheng Zhang, Kourosh Khoshelham |
| 2025 | Low-light Image Enhancement Quality Assessment: A Real-World Dataset and An Objective Method. Chunyi Li, Bo Hu, Taiyang Chen, Leida Li, Lihuo He, Xinbo Gao |
| 2025 | Low-light Invariant Representation Learning for Visible-Infrared Person Re-identification. Dengwen Wang, Guanyu Xing, Yanli Liu |
| 2025 | M2PE-Diff: Music-to-Pose Encoder for Dance Video Generation Leveraging Latent Diffusion Framework. Nokap Tony Park |
| 2025 | MAC 2025: The 2nd Micro-Action Analysis Grand Challenge. Kun Li, Dan Guo, Xiaobai Li, Haoyu Chen, Pengyu Liu, Fei Wang, Jingjing Hu, Guoying Zhao, Meng Wang |
| 2025 | MADPHash: Manipulation-Aware Deep Perceptual Hashing using Feature Consistency. Lizhi Xiong, Peipeng Yu, Yue Wu |
| 2025 | MAGNeT: Multimodal Adaptive Gaussian Networks for Intent Inference in Moving Target Selection across Complex Scenarios. Xiangxian Li, Yawen Zheng, Baiqiao Zhang, Yijia Ma, Xianhui Cao, Juan Liu, Yulong Bian, Jin Huang, Chenglei Yang |
| 2025 | MAP: Parameter-Efficient Tuning for Referring Expression Comprehension via Multi-Modal Adaptive Positional Encoding. Ruilin Yao, Yi Rong, Tianyu Zou, Bo Zhang, Jian Li, Shengwu Xiong, Shili Xiong |
| 2025 | MARA: A Multimodal Adaptive Retrieval-Augmented Framework for Document Question Answering. Hui Wu, Haoquan Zhai, Yuchen Li, Hengyi Cai, Peirong Zhang, Yidan Zhang, Lei Wang, Chunle Wang, Yingyan Hou, Shuaiqiang Wang, Dawei Yin |
| 2025 | MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation. Ruicheng Zhang, Yu Sun, Zeyu Zhang, Jinai Li, Xiaofan Liu, Hoi Fan Au, Haowei Guo, Puxin Yan |
| 2025 | MAXplain: A Multi-Agent System for Interactive Multimodal Hate Speech Detection. Nils Riekers, Marten Risius, Tong Chen |
| 2025 | MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models. Qiyan Zhao, Xiaofeng Zhang, Yiheng Li, Yun Xing, Xiaosong Yuan, Feilong Tang, Sinan Fan, Xuhang Chen, Da-Han Wang, Xu-Yao Zhang |
| 2025 | MCHM25: Multimedia Computing for Health and Medicine. Wei Zhou, Hadi Amirpour, Li Yu, Jungong Han, Richang Hong, Paul L. Rosin |
| 2025 | MCOD: The First Challenging Benchmark for Multispectral Camouflaged Object Detection. Yang Li, Tingfa Xu, Shuyan Bai, Peifu Liu, Jianan Li |
| 2025 | MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics. Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, Yongwei Li |
| 2025 | MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding. Chang Liu, Ye Pan, Chenyang Ding, Susanto Rahardja, Xiaokang Yang |
| 2025 | MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering. Xinqi Fan, Jingting Li, John See, Moi Hoon Yap, Wen-Huang Cheng, Xiaobai Li, Xiaopeng Hong, Su-Jing Wang, Adrian K. Davison |
| 2025 | MER 2025: When Affective Computing Meets Large Language Models. Zheng Lian, Rui Liu, Kele Xu, Bin Liu, Xuefei Liu, Yazhou Zhang, Xin Liu, Yong Li, Zebang Cheng, Haolin Zuo, Ziyang Ma, Xiaojiang Peng, Xie Chen, Ya Li, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao |
| 2025 | MERIA: Empathetic Response Generation via Parallel Disentanglement and Uncertainty-Gated Fusion. Chenhao Dang, Zeyuan Zhu |
| 2025 | MESH - Understanding Videos Like Human: Measuring Hallucinations in Large Video Models. Garry Yang, Zizhe Chen, Man Hon Wong, Haoyu Lei, Yongqiang Chen, Zhenguo Li, Kaiwen Zhou, James Cheng |
| 2025 | MFFI: Multi-Dimensional Face Forgery Image Dataset for Real-World Scenarios. Changtao Miao, Yi Zhang, Man Luo, Weiwei Feng, Kaiyuan Zheng, Qi Chu, Tao Gong, Jianshu Li, Yunfeng Diao, Wei Zhou, Joey Tianyi Zhou, Xiaoshuai Hao |
| 2025 | MGHFT: Multi-Granularity Hierarchical Fusion Transformer for Cross-Modal Sticker Emotion Recognition. Jian Chen, Yuxuan Hu, Haifeng Lu, Wei Wang, Min Yang, Chengming Li, Xiping Hu |
| 2025 | MGVC: MLLM-Guided Video Captioning for the IntentVC Challenge. Zhipeng Yu, Qianqian Xu, Yangbangyan Jiang, Pinci Yang, Qingming Huang |
| 2025 | MIG-COW: Transferable Adversarial Attacks on Deepfake Detectors via Gradient Decomposition. Wonjune Seo, Joonhyuk Baek, Yeseong Jung, Saerom Park |
| 2025 | MIGE: Mutually Enhanced Multimodal Instruction-Based Image Generation and Editing. Xueyun Tian, Wei Li, Bingbing Xu, Yige Yuan, Yuanzhuo Wang, Huawei Shen |
| 2025 | MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models. Jiale Li, Mingrui Wu, Zixiang Jin, Hao Chen, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Rongrong Ji |
| 2025 | MINDEV: Multi-modal Integrated Diffusion Framework for Video Reconstruction from EEG Signals. Shuai Huang, Yongxiong Wang, Huan Luo, Haodong Jing, Chendong Qin, Jingqun Tang |
| 2025 | MIPS: A Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction. Jiaxi Wang, Yaosen Min, Xun Zhu, Miao Li, Ji Wu |
| 2025 | MIRA: A Novel Framework for Fusing Modalities in Medical RAG. Jinhong Wang, Tajamul Ashraf, Zongyan Han, Jorma Laaksonen, Rao Muhammad Anwer |
| 2025 | MIRAGE25: ACM MM25 Multimodal Interleaved Reasoning and Generation Challenge. Dong Chen, Fei Gao, Zhengqing Hu, Xiaojun Chang |
| 2025 | MISP-QEKS: A Large-Scale Dataset with Multimodal Cues for Query-by-Example Keyword Spotting. Shifu Xiong, Hang Chen, Shi Cheng, Kai Shen, Hengshun Zhou, Genshun Wan, Chenyue Zhang, Kewei Li, Jun Du, Lirong Dai |
| 2025 | MLLMs Meet Person Re-identification. Mengying Duan, He Li, Mang Ye |
| 2025 | MM-HSD: Multi-Modal Hate Speech Detection in Videos. Berta Céspedes-Sarrias, Carlos Collado-Capell, Pablo Rodenas-Ruiz, Olena Hrynenko, Andrea Cavallaro |
| 2025 | MM-Prompt: Multi-modality and Multi-granularity Prompts for Few-Shot Segmentation. Hang Xiong, Runmin Cong, Jinpeng Chen, Chen Zhang, Feng Li, Huihui Bai, Sam Kwong |
| 2025 | MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks. Wenqi Zeng, Yuqi Sun, Chenxi Ma, Weimin Tan, Bo Yan |
| 2025 | MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning. Ziliang Gan, Dong Zhang, Haohan Li, Yang Wu, Xueyuan Lin, Ji Liu, Haipang Wu, Chaoyou Fu, Zenglin Xu, Rongjunchen Zhang, Yong Dai |
| 2025 | MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks. Lei Zhang, Xin Zhou, Chaoyue He, Di Wang, Yi Wu, Hong Xu, Wei Liu, Chunyan Miao |
| 2025 | MMF-SV: A Multi-Modal Feature Fusion-Based Structural Variant Caller. Zeyu Xia, Canqun Yang, Haoang Chi, Tao Tang, Weiming Xiang, Yingbo Cui |
| 2025 | MMFood'25: 1st International Workshop on Multi-modal Food Computing. Lipika Dey, Marianna Obrist, Stavroula G. Mougiakakou |
| 2025 | MMPro: A Decoupled Perception-Thinking-Execution Framework for Secure GUI Agent. Benlong Wu, Yuang Qi, Xiuwei Shang, Weiming Zhang, Nenghai Yu, Kejiang Chen |
| 2025 | MORE: Multi-Organ Medical Image REconstruction Dataset. Shaokai Wu, Yapan Guo, Yanbiao Ji, Jing Tong, Yuxiang Lu, Mei Li, Suizhi Huang, Yue Ding, Hongtao Lu |
| 2025 | MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models. Yiyan Ji, Haoran Chen, Qiguang Chen, Chengyue Wu, Libo Qin, Wanxiang Che |
| 2025 | MPI-CD: Multi-Path Information Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models. Jiacheng Ruan, Zongyun Zhang, Jingsheng Gao, Wenzhen Yuan, Ting Liu, Yuzhuo Fu |
| 2025 | MPPR: Memory-Prior-based Prompt Refinement in Continuous Space for Advanced Text-to-Image Generation. Zhibing Zhang, Jiantao Lin, Cangqi Zhou, Rui Xia |
| 2025 | MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Static Quantization. Jiangyong Yu, Sifan Zhou, Dawei Yang, Shuoyu Li, Shuo Wang, Xing Hu, Chen Xu, Zukang Xu, Changyong Shu, Zhihang Yuan |
| 2025 | MRAC 2025: 3rd International Workshop on Multimodal, Generative and Responsible Affective Computing. Zheng Lian, Shreya Ghosh, Erik Cambria, Zhixi Cai, Guoying Zhao, Abhinav Dhall, Björn W. Schuller, Roland Goecke, Jianhua Tao, Tom Gedeon |
| 2025 | MRBench: A Multi-Image Reasoning Benchmark with Adaptive Knowledge Retrieval. Wenxi Huang, Xiaojun Chen, Qin Zhang, Ting Wan, Ziqi Liu, Liangjie Zhang |
| 2025 | MRED-14: A Benchmark for Low-Energy Residential Floor Plan Generation with 14 Flexible Inputs. Pengyu Zeng, Jun Yin, Haoyuan Sun, Yuqin Dai, Maowei Jiang, Miao Zhang, Shuai Lu |
| 2025 | MS-DETR: Towards Effective Video Moment Retrieval and Highlight Detection by Joint Motion-Semantic Learning. Hongxu Ma, Guanshuo Wang, Fufu Yu, Qiong Jia, Shouhong Ding |
| 2025 | MS-Road: Towards Spatiotemporal-Consistent Large-Scale Road Reconstruction. Ze Huang, Zhongyang Xiao, Mingliang Song, Yu Fang, Hongyuan Yuan, Kevin Li Sun, Li Zhang |
| 2025 | MSC: A Marine Wildlife Dataset for Video Understanding with Grounded Segmentation and Clip-Level Captions. Quang-Trung Truong, Yuk-Kwan Wong, Vo Hoang Kim Tuyen Dang, Rinaldi Gotama, Duc Thanh Nguyen, Sai-Kit Yeung |
| 2025 | MSITrack: A Challenging Benchmark for Multispectral Single Object Tracking. Tao Feng, Tingfa Xu, Haolin Qin, Tianhao Li, Shuaihao Han, Xuyang Zou, Zhan Lv, Jianan Li |
| 2025 | MSMA'2025: The 1st International Workshop on Multi-Sensorial Media and Applications. Tiesong Zhao, Qian Liu, Zhisheng Yan |
| 2025 | MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation. Hui Li, Pengfei Yang, Juanyang Chen, Le Dong, Yanxin Chen, Quan Wang |
| 2025 | MT-DPCQA: A Multimodal Time-aware Learning Approach for No-Reference Dynamic Point Cloud Quality Assessment. Swarna Chakraborty, Mylène C. Q. Farias |
| 2025 | MUDI: A Multimodal Biomedical Dataset for Understanding Pharmacodynamic Drug-Drug Interactions. Tung-Lam Ngo, Ba-Hoang Tran, Duy-Cat Can, Trung-Hieu Do, Oliver Y. Chén, Hoang-Quynh Le |
| 2025 | MUWS 2025: The 4th International Workshop on Multimodal Human Understanding for the Web and Social Media. Sherzod Hakimov, David Semedo, Eric Müller-Budack, Marc A. Kastner, Takahiro Komamizu |
| 2025 | MVISU-Bench: Benchmarking Mobile Agents for Real-World Tasks by Multi-App, Vague, Interactive, Single-App and Unethical Instructions. Zeyu Huang, Juyuan Wang, Longfeng Chen, Boyi Xiao, Leng Cai, Yawen Zeng, Jin Xu |
| 2025 | MVP: Winning Solution to SMP Challenge 2025 Video Track. Liliang Ye, Yunyao Zhang, Yafeng Wu, Yi-Ping Phoebe Chen, Junqing Yu, Wei Yang, Zikai Song |
| 2025 | MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment. Yanyun Pu, Kehan Li, Zeyi Huang, Zhijie Zhong, Kaixiang Yang |
| 2025 | MaXsive: High-Capacity and Robust Training-Free Generative Image Watermarking in Diffusion Models. Poyuan Mao, Cheng-Chang Tsai, Chun-Shien Lu |
| 2025 | Manipulating Multimodal Agents via Cross-Modal Prompt Injection. Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, Aishan Liu, Xianglong Liu |
| 2025 | MarkSplatter: Generalizable Watermarking for 3D Gaussian Splatting Model via Splatter Image Structure. Xiufeng Huang, Ziyuan Luo, Qi Song, Ruofei Wang, Renjie Wan |
| 2025 | MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical Contexts. Hao Liang, Linzhuang Sun, Minxuan Zhou, Zirong Chen, Meiyi Qiang, Mingan Lin, Tianpeng Li, Fan Yang, Zenan Zhou, Wentao Zhang |
| 2025 | Mavors: Multi-granularity Video Representation for Multimodal Large Language Model. Yang Shi, Jiaheng Liu, Yushuo Guan, Zhenhua Wu, Yuanxing Zhang, Zihao Wang, Weihong Lin, Jingyun Hua, Zekun Wang, Xinlong Chen, Bohan Zeng, Wentao Zhang, Fuzheng Zhang, Wenjing Yang, Di Zhang |
| 2025 | Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs. Chang Gao, Kang Zhao, Runqi Wang, Jianfei Chen, Liping Jing |
| 2025 | McGE '25: The 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice. Cheng Jin, Mingli Song, Rui Wang, Xingjiao Wu |
| 2025 | MeDKCoOp: Dual Knowledge-guided Graph Prompt Learning for Biomedical Vision-Language Models. Yijun Wang, Siying Wu, Lubin Gan, Zheyu Zhang, Jing Zhang, Zhangchi Hu, Huyue Zhu, Peixi Wu, Xiaoyan Sun |
| 2025 | MeGraS: An Open-Source Store for Multimodal Knowledge Graphs. Luca Rossetto, Florian Ruosch |
| 2025 | MedAI Hub: A Multimodal Medical Data Platform with Evolutionary Image Enhancement and Graph-Driven Literature Retrieval. Guoming Wang |
| 2025 | MedAlign: Enhancing Combinatorial Medication Recommendation with Multi-modality Alignment. Hang Lv, Zixuan Guo, Zijie Wu, Yanchao Tan, Guofang Ma, Zhigang Lin, Xiping Chen, Hong Cheng, Carl Yang |
| 2025 | MediSee: Reasoning-Based Pixel-Level Perception in Medical Images. Qinyue Tong, Ziqian Lu, Jun Liu, Yangming Zheng, Zhe-Ming Lu |
| 2025 | Media Integrity and Literacy in the Age of GenAI & Deepfakes. Christoph Bregler |
| 2025 | Medical Vision-Language Pre-training with Multimodal Variational Masked Autoencoder for Robust Medical VQA. Dexuan Xu, Yanyuan Chen, Yu Huang, Shihao E, Yiwei Lou, Yongzhi Cao, Hanpin Wang, Meikang Qiu |
| 2025 | MelodyEdit: Zero-shot Music Editing with Disentangled Inversion Control. Huadai Liu, Jialei Wang, Xiangtai Li, Wen Wang, Qian Chen, Rongjie Huang, Yang Liu, Jiayang Xu, Zhou Zhao, Wei Xue |
| 2025 | Merging-Resistant Watermarking for LoRA Modules. Na Zhao, Kejiang Chen, Yuang Qi, Kai Zeng, Weiming Zhang, Nenghai Yu |
| 2025 | MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving. Aishan Liu, Jiakai Wang, Tianyuan Zhang, Hainan Li, Jiangfan Liu, Siyuan Liang, Yilong Ren, Xianglong Liu, Dacheng Tao |
| 2025 | Meta-Illustrator: Transferring Illustrations from 2D Interactive Image Space to 3D Immersive Exploration Space. Richen Liu, Lingyu Sun, Xuefeng Huang, Yiran Li, Jiang Zhang, Siru Chen, Zhouhao Wu, Ayush Kumar, Chufan Lai |
| 2025 | Meta-Knowledge Path Augmentation for Multi-Hop Reasoning on Satellite Commonsense Multi-Modal Knowledge Graphs. Qian Li, Siyuan Liang, Yuzheng Zhang, Cheng Ji, Zongyu Chang, Shangguang Wang |
| 2025 | MetaWild: A Multimodal Dataset for Animal Re-Identification with Environmental Metadata. Yuzhuo Li, Di Zhao, Tingrui Qiao, Yihao Wu, Bo Pang, Yun Sing Koh |
| 2025 | Method and Applications of Solid-State Lidar Modeling for X-in-the-Loop Testing of Autonomous Vehicles. Cheng Peng, Zhen Wang |
| 2025 | MindFuse: Towards GenAI Explainability in Marketing Strategy Co-Creation. Aleksandr Farseev, Marlo Ongpin, Qi Yang, Ilia Gossoudarev, Yu-Yi Chu-Farseeva, Sergey I. Nikolenko |
| 2025 | MindSpeak: A Real-Time BCI System for Silent Speech. Jinzhao Zhou, Daniel Leong, Zehong Cao, Thomas Do, Sheng-Fu Liang, Tzyy-Ping Jung, Chin-Teng Lin |
| 2025 | MiraGe: Multimodal Discriminative Representation Learning for Generalizable AI-Generated Image Detection. Kuo Shi, Jie Lu, Shanshan Ye, Guangquan Zhang, Zhen Fang |
| 2025 | Mirage. Xuanyang Huang, Wei Huang |
| 2025 | Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe Retrieval. Qing Wang, Chong-Wah Ngo, Yu Cao, Ee-Peng Lim |
| 2025 | Mitigating Delivery Artifacts in Real-World Video Super-Resolution. Jiaxin Peng, Siwang Zhou, Chengqing Li, Yucheng Li, Dunyun Chen |
| 2025 | Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language Models. Mingyu Fu, Wei Suo, Ji Ma, Lin Yuanbo Wu, Peng Wang, Yanning Zhang |
| 2025 | Mitigating Long-tail Distribution in Oracle Bone Inscriptions: Dataset, Model, and Benchmark. Jinhao Li, Zijian Chen, Runze Jiang, Tingzhu Chen, Changbo Wang, Guangtao Zhai |
| 2025 | Mitigating Query Selection Bias in Referring Video Object Segmentation. Dingwei Zhang, Dong Zhang, Jinhui Tang |
| 2025 | Mitigating Stereotypes in Text-to-Image Generation: A Novel Perspective of Selective Neural Suppression. Junlei Zhou, Jiashi Gao, Xinwei Guo, Haiyan Wu, Quanying Liu, Xiangyu Zhao, Hongxin Wei, Xin Yao, Xuetao Wei |
| 2025 | Mitigating the Evolving Semantic Entanglement in Continual Learning of Vision-Language Models. Yiliang Zhu, Dayan Wu, Qinghang Su, Zexian Yang, Zheng Lin, Weiping Wang |
| 2025 | Mixanthropy: Holographic Metamorphic Clouds. Meichun Cai, Yiou Wang |
| 2025 | MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians. Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, Ming Lu |
| 2025 | Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization. Changtao Miao, Qi Chu, Tao Gong, Zhentao Tan, Zhenchao Jin, Wanyi Zhuang, Man Luo, Honggang Hu, Nenghai Yu |
| 2025 | MoCERNet: A Modality-Complete Modeling Framework for Emotion Recognition in Physiological Signals under Imperfect Modal Matching. Tianzuo Xin, Jing Wang, Xiyuan Jin, Xiaojun Ning, Zhiyang Feng, Youfang Lin |
| 2025 | MoCount: Motion-Based Repetitive Action Counting. Ruocheng Gu, Sen Jia, Yule Ma, Jinqin Zhong, Jenq-Neng Hwang, Lei Li |
| 2025 | MoTAS: MoE-Guided Feature Selection from TTS-Augmented Speech for Enhanced Multimodal Alzheimer's Early Screening. Yongqi Shao, Bingxin Mei, Cong Tan, Hong Huo, Tao Fang |
| 2025 | Mobile U-ViT: Revisiting Large Kernel and U-shaped ViT for Efficient Medical Image Segmentation. Fenghe Tang, Bingkun Nian, Jianrui Ding, Wenxin Ma, Quan Quan, Chengqi Dong, Jie Yang, Wei Liu, S. Kevin Zhou |
| 2025 | Modal Symbiosis: Variational Alignment Unveils New Horizons in Multimodal Representation Learning. Zeyan Li, Cankun Guo, Yin Tang |
| 2025 | Modality-Aligned Hierarchical Attention Network for Multi-Modal Popularity Prediction on Social Media. Wenzheng Hou, Weixin Li |
| 2025 | Modeling and Identifying Distractors with Curriculum for Robust 3D Gaussian Splatting. Ruiqi Li, Yiu-ming Cheung |
| 2025 | ModuleTeam: Open-Set Multi-Conditional Image Generation with Training-Free Latent Mixture of Any Control Module. Yuwei Zhou, Xin Wang, Hong Chen, Yipeng Zhang, Zeyang Zhang, Wenwu Zhu |
| 2025 | Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction. Wenyu Li, Sidun Liu, Peng Qiao, Yong Dou |
| 2025 | MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory. Ana Carolina Condez, Diogo Tavares, João Magalhães |
| 2025 | Motion Matters: Motion-guided Modulation Network for Skeleton-based Micro-Action Recognition. Jihao Gu, Kun Li, Fei Wang, Yanyan Wei, Zhiliang Wu, Hehe Fan, Meng Wang |
| 2025 | Motion-Aware Adaptive Pixel Pruning for Efficient Local Motion Deblurring. Wei Shang, Dongwei Ren, Wanying Zhang, Pengfei Zhu, Qinghua Hu, Wangmeng Zuo |
| 2025 | MotionRefineNet: Fine-Grained Pose Sequence Smoothing and Refinement. Haolun Li, Weihuang Liu, Jiateng Liu, Zhenhua Tang, Chi-Man Pun, Qiguang Miao, Feng Xu, Hao Gao |
| 2025 | MuCodec: Ultra Low-Bitrate Music Codec for Music Generation. Yaoxun Xu, Hangting Chen, Jianwei Yu, Wei Tan, Shun Lei, Zhiwei Lin, Rongzhi Gu, Zhiyong Wu |
| 2025 | MuMMy: Multimodal Dataset supporting VLM-based Egyptology Research Assistant. Maksim Golyadkin, Innokentiy Humonen, Valeria Rubanova, Danil Kalin, Ianis Plevokas, Dmitry Nikolotov, Aleksandr Utkov, Nikita Sidelnikov, Petr Ivanov, Ekaterina Bureeva, Ekaterina Alexandrova, Ilya Makarov |
| 2025 | Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition. Zihao Wang, Shulei Ji, Le Ma, Yuhang Jin, Shun Lei, Jianyi Chen, Haoying Fu, Roger B. Dannenberg, Kejun Zhang |
| 2025 | Multi-Agent Amodal Completion: Direct Synthesis with Fine-Grained Semantic Guidance. Hongxing Fan, Lipeng Wang, Haohua Chen, Zehuan Huang, Jiangtao Wu, Lu Sheng |
| 2025 | Multi-Agent System for Comprehensive Soccer Understanding. Jiayuan Rao, Zifeng Li, Haoning Wu, Ya Zhang, Yanfeng Wang, Weidi Xie |
| 2025 | Multi-Dimensional Text-to-Face Image Quality Assessment Using LLM: Database and Method. Yixuan Gao, Xiongkuo Min, Jinliang Han, Yuqin Cao, Sijing Wu, Yunze Dou, Guangtao Zhai |
| 2025 | Multi-Domain Enhancement via Residual Interwoven Transfer in Cross-Domain Sequential Recommendation. Qingtian Bian, Tieying Li, Marcus Vinícius de Carvalho, Jiaxing Xu, Hui Fang, Yiping Ke |
| 2025 | Multi-Information Hierarchical Fusion Transformer with Local Alignment and Global Correlation for Micro-Expression Recognition. Jinsheng Wei, Jialiang Sun, Guanming Lu, Jingjie Yan, Dong Zhang |
| 2025 | Multi-Layer Gaussian Splatting for Single-Image Feed-Forward Spatial Scene Reconstruction. Shanding Diao, Yang Zhao, Yuan Chen, Zhao Zhang, Wei Jia, Ronggang Wang |
| 2025 | Multi-Level CLS Token Fusion for Contrastive Learning in Endoscopy Image Classification. Y. Hop Nguyen, Doan Anh Phan Huu, Trung Thai Tran, Nhat Nam Mai, Van Toi Giap, Thao Thi Phuong Dao, Trung-Nghia Le |
| 2025 | Multi-Level Segment Fusion Based on Adaptive Time-Window Selection for Multimodal Personality-Aware Elderly Depression Detection. Yuyun Liu, Kaifei Zhang, Yinghao Ma, Xiaolin Xu, Tianhua Qi, Wenming Zheng, Cheng Lu, Yuan Zong |
| 2025 | Multi-Modal Gradual Domain Osmosis: Stepwise Dynamic Learning with Batch Matching for Gradual Domain Adaptation. Zixi Wang, Yubo Huang, Jingzehua Xu, Jinzhu Wei, Shuai Zhang, Xin Lai |
| 2025 | Multi-Modal Retrieval Augmented Visual Understanding and Generation. Zhucun Xue |
| 2025 | Multi-Modal Semantic Parsing for the Interpretation of Tombstone Inscriptions. Xiao Zhang, Johan Bos |
| 2025 | Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors. Guotao Liang, Juncheng Hu, Ximing Xing, Jing Zhang, Qian Yu |
| 2025 | Multi-State Tracker: Enhancing Efficient Object Tracking via Multi-State Specialization and Interaction. Shilei Wang, Gong Cheng, Pujian Lai, Dong Gao, Junwei Han |
| 2025 | Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts. Yangyang Xu, Xi Ye, Duo Su |
| 2025 | Multi-Task Gaze Communication Understanding. Cheng Peng, Oya Çeliktutan |
| 2025 | Multi-Task Label Discovery via Hierarchical Task Tokens for Partially Annotated Dense Predictions. Jingdong Zhang, Hanrong Ye, Xin Li, Wenping Wang, Dan Xu |
| 2025 | Multi-Width Neural Network-Assisted Hierarchical Federated Learning in Heterogeneous Cloud-Edge-Device Computing. Haizhou Wang, Guobing Zou, Fei Xu, Yangguang Cui, Tongquan Wei |
| 2025 | Multi-faceted Complementary Learning for Incomplete Multi-view Multi-label Classification. Xinyu Xiao, Peixi Peng, Qiang Wang, Chao Xing, Shuhan Qi |
| 2025 | Multi-level SSL Feature Gating for Audio Deepfake Detection. Hoan My Tran, Damien Lolive, Aghilas Sini, Arnaud Delhay, Pierre-François Marteau, David Guennec |
| 2025 | Multi-modal Prototype Guided Few-shot Object Detection. Chenbo Zhang, Bing Huangfu, Hongxu Ma, Jihong Guan, Shuigeng Zhou |
| 2025 | Multi-round Mutual Emotion-Cause Pair Extraction for Emotion-Attributed Video Captioning. Cheng Ye, Weidong Chen, Peipei Song, Xinyan Liu, Lei Zhang, Zhendong Mao |
| 2025 | Multi-view Clustering Based on Probabilistic Tensor Regression. Yichen Bao, Yuxuan Liu, Yu Duan, Jing Li, Quanxue Gao |
| 2025 | Multi-view Collaborative Representation Learning from Noisy Labels for VHR Imagery Classification. Guangfei Li, Quanxue Gao, Yu Lei, Yichen Bao, Qianqian Wang |
| 2025 | Multi-view Graph Clustering with Dual Relation Optimization for Remote Sensing Data. Renxiang Guan, Junhong Li, Siwei Wang, Wenxuan Tu, Miaomiao Li, En Zhu, Xinwang Liu, Ping Chen |
| 2025 | Multi-view Graph Clustering with Dual Structure Awareness for Remote Sensing Data. Xin Peng, Bowen Liu, Renxiang Guan, Wenxuan Tu |
| 2025 | Multi-view Hashing Classification. Yuhang Lan, Shilin Xu, Chao Su, Run Ye, Dezhong Peng, Yuan Sun |
| 2025 | MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene Reconstruction. Bate Li, Houqiang Zhong, Zhengxue Cheng, Qiang Hu, Qiang Wang, Li Song, Wenjun Zhang |
| 2025 | MultiMediate '25: Cross-cultural Multi-domain Engagement Estimation. Daksitha Senel Withanage Don, Marius Funk, Michal Balazia, Huajian Qiu, Shogo Okada, François Brémond, Jan Alexandersson, Andreas Bulling, Elisabeth André, Philipp Müller |
| 2025 | MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind. Zheng Zhang, Nuoqian Xiao, Qi Chai, Deheng Ye, Hao Wang |
| 2025 | MultiRef: Controllable Image Generation with Multiple Visual References. Ruoxi Chen, Dongping Chen, Siyuan Wu, Sinan Wang, Shiyun Lang, Peter Sushko, Gaoyang Jiang, Yao Wan, Ranjay Krishna |
| 2025 | Multifractal Comparison of Billboard and AI-Generated Music. Kevin Kailun Zhang, Ying Sun, Hui Xiong |
| 2025 | Multimedia Verification Through Multi-Agent Deep Research Multimodal Large Language Models. Huy Hoan Le, Van Sy Thinh Nguyen, Thi Le Chi Dang, Vo Thanh Khang Nguyen, Truong Thanh Hung Nguyen, Hung Cao |
| 2025 | Multimodal Content Creation, Consumption and Distribution. Ting Yao |
| 2025 | Multimodal Decomposed Distillation with Instance Alignment and Uncertainty Compensation for Thermal Object Detection. Yanfeng Liu, Lefei Zhang |
| 2025 | Multimodal Dual Population Evolutionary Reinforcement Learning. Yao Zhang, Ping Huang, Rui Zhang |
| 2025 | Multimodal Emotion Recognition with Missing Modality via a Unified Multi-task Pre-training Framework. Ziyi Li, Wei-Long Zheng, Bao-Liang Lu |
| 2025 | Multimodal LLMs Can Reason about Aesthetics in Zero-Shot. Ruixiang Jiang, Chang Wen Chen |
| 2025 | Multimodal Learning for Spatio-Temporal Data Mining. Siru Zhong, Xixuan Hao, Hao Miao, Yan Zhao, Qingsong Wen, Roger Zimmermann, Yuxuan Liang |
| 2025 | Multimodal Markup Document Models for Graphic Design Completion. Kotaro Kikuchi, Ukyo Honda, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi |
| 2025 | Multimodal Time Series Alignment for Error Detection in Human Robot Interactions. Xun Jiang, Shuangle Li, Chong Liu, Xing Xu |
| 2025 | Multiple Appropriate Facial Reaction Generation Based on Multi-View Transformation of Speaker Video. Jiajian Huang, Zitong Yu |
| 2025 | Multiple Queries with Multiple Keys: A Precise Prompt Matching Paradigm for Prompt-based Continual Learning. Dunwei Tu, Huiyu Yi, Yuchi Wang, Baile Xu, Jian Zhao, Furao Shen |
| 2025 | Multiverse Through Deepfakes: The MultiFakeVerse Dataset of Person-Centric Visual and Conceptual Manipulations. Parul Gupta, Shreya Ghosh, Tom Gedeon, Thanh-Toan Do, Abhinav Dhall |
| 2025 | MusFlow: Multimodal Music Generation via Conditional Flow Matching. Jiahao Song, Yuzhao Wang |
| 2025 | Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning. Jiayun Hu, Yueyi He, Tianyi Liang, Changbo Wang, Chenhui Li |
| 2025 | NDM: A Noise-driven Detection and Mitigation Framework against Implicit Sexual Intentions in Text-to-Image Generation. Yitong Sun, Yao Huang, Ruochen Zhang, Huanran Chen, Shouwei Ruan, Ranjie Duan, Xingxing Wei |
| 2025 | NEXUS-O: An Omni-Perceptive and -Interactive Model for Language, Audio, and Vision. Che Liu, Yingji Zhang, Dong Zhang, Weijie Zhang, Chenggong Gong, Yu Lu, Shilin Zhou, Ziliang Gan, Ziao Wang, Haipang Wu, Ji Liu, André Freitas, Qifan Wang, Zenglin Xu, Rongjunchen Zhang, Yong Dai |
| 2025 | NIVM: Real-time View Morphing via Neural Implicit Function. Tung-I Chen, Dae Yeol Lee, Guan-Ming Su, Mohammad Hajiesmaili, Ramesh K. Sitaraman |
| 2025 | NaME: A Natural Micro-expression Dataset for Micro-expression Recognition in the Wild. Jiateng Liu, Hengcan Shi, Haiwen Liang, Xiaolin Xu, Yuan Zong, Yaonan Wang, Wenming Zheng |
| 2025 | Nature-1k: The Raw Beauty of Nature in 4K at 60FPS. Mohammad Ghasempour, Hadi Amirpour, Christian Timmerer |
| 2025 | NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving. Qucheng Peng, Chen Bai, Guoxiang Zhang, Bo Xu, Xiaotong Liu, Xiaoyin Zheng, Chen Chen, Cheng Lu |
| 2025 | Neighbor Contrastive Learning with Weakened Consensus Graph for Deep Multi-View Clustering. Kai Zhu, Jun Yin |
| 2025 | Neural Additive Adapters for Interpretable Nutrition Prediction. Vitalii Emelianov, Niki Martinel |
| 2025 | Neural Video Compression with In-Loop Contextual Filtering and Out-of-Loop Reconstruction Enhancement. Yaojun Wu, Chaoyi Lin, Yiming Wang, Semih Esenlik, Zhaobin Zhang, Kai Zhang, Li Zhang |
| 2025 | NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images. Yue Guo, Haoxiang Liao, Haibin Ling, Bingyao Huang |
| 2025 | Next Phase of Research on Multimodal Foundation Models: From Alignments to Content Generation and Quality Assessment. Tat-Seng Chua |
| 2025 | Noise Self-Correction via Relation Propagation for Robust Cross-Modal Retrieval. Ruoxuan Li, Xiangyu Wu, Yang Yang |
| 2025 | Noise-Aware Decoding with Salient Region Enhancing for Zero-Shot Image Captioning. Yuxin Xie, Dongyue Chen, Yue Zhu, Tong Jia, Shizhuo Deng |
| 2025 | Noise-Optimized Distribution Distillation for Dataset Condensation. Tongfei Liu, Yufan Liu, Bing Li, Weiming Hu, Yuming Li, Chenguang Ma |
| 2025 | Noise-Robust Cross-modal Learning for Reliable 2D-3D Retrieval. Ao Yang, Yanglin Feng, Yuan Sun, Dezhong Peng, Guiduo Duan, Yang Qin |
| 2025 | Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation. Jiahao Li, Yang Lu, Yachao Zhang, Fangyong Wang, Yuan Xie, Yanyun Qu |
| 2025 | OCR-Critic: Aligning Multimodal Large Language Models' Perception through Critical Feedback. Qiuna Tan, Runqi Qiao, Guanting Dong, Yifan Zhang, Minhui Wu, Jiapeng Wang, Miaoxuan Zhang, Yida Xu, Chong Sun, Chen Li, Honggang Zhang |
| 2025 | OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval. Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Xuemeng Song, Liqiang Nie |
| 2025 | OGDepth: Leveraging Object Guidance in Diffusion Models for Enhanced Monocular Depth Estimation. Wenzheng Yang, Songwei Pei, Bingfeng Liu, Qian Li, Shangguang Wang |
| 2025 | OIMGC-Net: Optimization-inspired Interpretable Multi-view Graph Clustering Network. Renjie Lin, Jiacheng Li, Shide Du, Shiping Wang, Le Zhang |
| 2025 | OMAR-RQ: Open Music Audio Representation Model Trained with Multi-Feature Masked Token Prediction. Pablo Alonso-Jiménez, Pedro Ramoneda, Recep Oguz Araz, Andrea Poltronieri, Dmitry Bogdanov |
| 2025 | OTR: Synthesizing Overlay Text Dataset for Text Removal. Jan Zdenek, Wataru Shimoda, Kota Yamaguchi |
| 2025 | OV-DAVEL: Towards Open-Vocabulary Dense Audio-Visual Event Localization in Untrimmed Videos. Jiale Yu, Baopeng Zhang, Zhu Teng, Jianping Fan |
| 2025 | OV-VOD: Open-Vocabulary Video Object Detection. Zhihong Zheng, Yang Cao, Junlong Gao, Hanzi Wang |
| 2025 | ObjCtrl: Object-based Control Relaxation for Conditional Text-to-Image Generation. Xinlong Zhang, Zejian Li, Wei Li, Xiaoyu Zhang, Jia Wei, Chengyu Lin, Yongchuan Tang |
| 2025 | Object-Preserving Counterfactual Diffusion Augmentation for Single-Domain Generalized Object Detection. Hongda Qin, Xiao Lu, Zhiyong Wei, Ningjiang Chen |
| 2025 | Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction. Hyungjun Doh, Dong In Lee, Seunggeun Chi, Pin-Hao Huang, Kwonjoon Lee, Sangpil Kim, Karthik Ramani |
| 2025 | OinkTrack: An Ultra-Long-Term Dataset for Multi-Object Tracking and Re-Identification of Group-Housed Pigs. Feng-Kai Huang, Hong-Wei Xu, Chu-Chuan Lee, Hong-Yi Tu, Hong-Han Shuai, Wen-Huang Cheng |
| 2025 | Omni Liu Yang, Huiyu Duan, Yucheng Zhu, Xiaohong Liu, Lu Liu, Zitong Xu, Guangji Ma, Xiongkuo Min, Guangtao Zhai, Patrick Le Callet |
| 2025 | Omni-LLaMA-AD: A Unified Model for Open-Set Visual Anomaly Detection. Rongyu Zhang, Zhanbin Hu, Jiamu Wang, Qiang Zhu |
| 2025 | OmniDoctor: Towards LLM-centric Lifelong Learning for New Emerging Medical VQA Tasks. Na Jiang, Wenhui Zheng, Xuqian Gu, Jingjing Wang |
| 2025 | OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving. Tao Tang, Enhui Ma, Xia Zhou, Letian Wang, Tianyi Yan, Xueyang Zhang, Kun Zhan, Peng Jia, Xianpeng Lang, Jia-Wang Bian, Kaicheng Yu, Xiaodan Liang |
| 2025 | One Size Fits All? A Modular Adaptive Sanitization Kit (MASK) for Customizable Privacy-Preserving Phone Scam Detection. Kangzhong Wang, Zitong Shen, Youqian Zhang, MK Michael Cheung, Xiapu Luo, Grace Ngai, Eugene Yujun Fu |
| 2025 | Online Continual Learning via Dynamic Expandable Recursive Model. Fei Ye, Adrian G. Bors |
| 2025 | Online Cross-Modal Hashing with Multi-Level Memory. Wentao Fan, Chao Zhang, Chunlin Chen, Huaxiong Li |
| 2025 | OnlineHOI: Towards Online Human-Object Interaction Generation and Perception. Yihong Ji, Yunze Liu, Yiyao Zhuo, Weijiang Yu, Fei Ma, Joshua Zhexue Huang, Fei Yu |
| 2025 | OoDDINO: A Multi-level Framework for Anomaly Segmentation on Complex Road Scenes. Yuxing Liu, Ji Zhang, Xuchuan Zhou, Jingzhong Xiao, Huimin Yang, Jiaxin Zhong |
| 2025 | Open-CD: A Comprehensive Toolbox for Change Detection. Kaiyu Li, Jiawei Jiang, Chengxi Han, Yupeng Deng, Keyan Chen, Zhuo Zheng, Hao Chen, Ziyuan Liu, Yuantao Gu, Zhengxia Zou, Zhenwei Shi, Sheng Fang, Deyu Meng, Zhi Wang, Xiangyong Cao |
| 2025 | Open-Set Image Tagging with Multi-Grained Text Supervision. Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang |
| 2025 | Open-Source Multimedia Retrieval with vitrivr-engine. Ralph Gasser, Rahel Arnold, Laura Rettig, Heiko Schuldt, Raphael Waltenspül, Luca Rossetto |
| 2025 | Open-Vocabulary 3D Affordance Understanding via Functional Text Enhancement and Multilevel Representation Alignment. Lin Wu, Wei Wei, Peizhuo Yu, Jianglin Lan |
| 2025 | Open3D-VQA: A Benchmark for Embodied Spatial Concept Reasoning with Multimodal Large Language Model in Open Space. Weichen Zhang, Zile Zhou, Xin Zeng, Xuchen Liu, Jianjie Fang, Chen Gao, Jinqiang Cui, Yong Li, Xinlei Chen, Xiao-Ping Zhang |
| 2025 | Open3DSearch: Zero-Shot Precise Retrieval of 3D Shapes Using Text Descriptions. Xiong Li, Yikang Yan, Zhenyu Wen, Qin Yuan, Fangda Guo, Zhen Hong, Ye Yuan |
| 2025 | OpenAPV: Open Collaborative Innovation in Professional Video Ecosystem. Minsoo Park, Youngkwon Lim, Yangwoo Kim, Sam Richards, Min Woo Park, Kwang Pyo Choi |
| 2025 | OpenEvents V1: Large-Scale Benchmark Dataset for Multimodal Event Grounding. Hieu Nguyen, Phuc-Tan Nguyen, Thien-Phuc Tran, Minh-Quang Nguyen, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le |
| 2025 | OpenMVC: An Open-Source Library for Learning-based Multi-view Compression. Huiming Zheng, Wei Gao |
| 2025 | OpenMap: Instruction Grounding via Open-Vocabulary Visual-Language Mapping. Danyang Li, Zenghui Yang, Guangpeng Qi, Songtao Pang, Guangyong Shang, Qiang Ma, Zheng Yang |
| 2025 | OpenMoCap: Rethinking Optical Motion Capture under Real-world Occlusion. Chen Qian, Danyang Li, Xinran Yu, Zheng Yang, Qiang Ma |
| 2025 | Optimal Feature Embedding for Document Large Visual Language Model. Fan Yang, Ling Deng, Zhiyong Gan, Qisheng He, Yuanbo Fang, Xiangmin Xu, Shuangping Huang, Tianshui Chen |
| 2025 | Outlier-Aware Model Merging for Efficient Multitask Inference. Qiyuan Zhu, Lujun Li, Dezhi Li, Jiacheng Liu, Pengyu Cheng, Yucheng Xu, Sirui Han, Yike Guo |
| 2025 | Overfitted Point Cloud Attribute Codec Using Sparse Hierarchical Implicit Neural Representations. Zhe Sun, Qiang Xu, Qi Zhang, Shan Liu, Ge Li |
| 2025 | Overview of the First CASTLE Grand Challenge at ACM Multimedia 2025. Luca Rossetto, Werner Bailer, Cathal Gurrin, Duc-Tien Dang-Nguyen, Klaus Schoeffmann, Allie Tran |
| 2025 | P Jingrou Wu, Haoxian Liu, Jin Zhang, Dan Wang, Jing Jiang |
| 2025 | PA-HOI: A Physics-Aware Human and Object Interaction Dataset. Ruiyan Wang, Lin Zuo, Zonghao Lin, Qiang Wang, Zhengxue Cheng, Rong Xie, Jun Ling, Li Song |
| 2025 | PAF: Prototype Adaptive Fusion for Test-Time Adaptation of Vision-Language Models. Si Chen, Yujia Chen, Xiaotian Yin, Xin Liu, Huakai Lai, Tianzhu Zhang |
| 2025 | PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles. Tianshun Han, Benjia Zhou, Ajian Liu, Yanyan Liang, Du Zhang, Zhen Lei, Jun Wan |
| 2025 | PET-GPRA: Rethinking PET with Gradient-Aware Prompting and Router-Free Adapters for Few-shot Class-Incremental Learning. Yishu Liu, Zhiming Chen, Desen Wang, Xiaoling Luo, Bingzhi Chen, Guangming Lu |
| 2025 | PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion. Zhiwei Zhang, Ruikai Xu, Weijian Zhang, Zhizhong Zhang, Xin Tan, Jingyu Gong, Yuan Xie, Lizhuang Ma |
| 2025 | PG-Agent: An Agent Powered by Page Graph. Weizhi Chen, Ziwei Wang, Leyang Yang, Sheng Zhou, Xiaoxuan Tang, Jiajun Bu, Yong Li, Wei Jiang |
| 2025 | PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum. Shiqi Zhang, Sha Zhang, Jiajun Deng, Yedong Shen, Mingxiao Ma, Yanyong Zhang |
| 2025 | PIRA: Pan-CDN Intra-video Resource Adaptation for Short Video Streaming. Chunyu Qiao, Tong Liu, Yucheng Zhang, Zhiwei Fan, Pengjin Xie, Zhen Wang, Liang Liu |
| 2025 | PLATO-TTA: Prototype-Guided Pseudo-Labeling and Adaptive Tuning for Multi-Modal Test-Time Adaptation of 3D Segmentation. Jianxiang Xie, Yao Wu, Yachao Zhang, Xiaopei Zhang, Yuan Xie, Yanyun Qu |
| 2025 | PLATO: Generating Objects from Part Lists via Synthesized Layouts. Amruta Muthal, Varghese P. Kuruvilla, Ravi Kiran Sarvadevabhatla |
| 2025 | PLGeo: A Patch-level Framework to Overcome Orientation Discrepancies in Cross-view Geo-localization. Yiru Li, Yingying Zhu |
| 2025 | PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning. Yingjie Xi, Jian Jun Zhang, Xiaosong Yang |
| 2025 | PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation. Sihan Zhao, Zixuan Wang, Tianyu Luan, Jia Jia, Wentao Zhu, Jiebo Luo, Junsong Yuan, Nan Xi |
| 2025 | PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process. Shiqi Jiang, Xinpeng Li, Xi Mao, Changbo Wang, Chenhui Li |
| 2025 | PRE-MAP: Personalized Reinforced Eye-tracking Multimodal LLM for High-Resolution Multi-Attribute Point Prediction. Hanbing Wu, Ping Jiang, Anyang Su, Chenxu Zhao, Tianyu Fu, Minghui Wu, Beiping Tan, Huiying Li |
| 2025 | PREMISE: Individual Preference-aware Multi-modal Cooperation for Survival Prediction. Jiaqi Cui, Yilun Li, Xi Wu, Jiliu Zhou, Yan Wang |
| 2025 | PRIME: Prototype-Driven Class Incremental Learning for Medical Image Segmentation. Shengqian Zhu, Chengrong Yu, Wenbo Qi, Jiafei Wu, Ying Song, Guangjun Li, Zhang Yi, Xiaogang Xu, Junjie Hu |
| 2025 | PRINTER: Deformation-Aware Adversarial Learning for Virtual IHC Staining with In Situ Fidelity. Yizhe Yuan, Bingsen Xue, Bangzheng Pu, Chengxiang Wang, Cheng Jin |
| 2025 | PRISM: A Benchmark for Unveiling Cross-modal Knowledge Inconsistency in Large Vision-Language Models. Mingjie Wei, Wei-Nan Zhang, Chen Zhang, Yifeng Ding, Donglin Di, Lei Ren, Wei Chen, Ting Liu |
| 2025 | PTalker: Personalized Speech-Driven 3D Talking Head Animation via Style Disentanglement and Modality Alignment. Bin Wang, Yang Xu, Huan Zhao, Hao Zhang, Zixing Zhang |
| 2025 | PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning. Yibo Lyu, Rui Shao, Gongwei Chen, Yijie Zhu, Weili Guan, Liqiang Nie |
| 2025 | Pair-wise Confidence Difference-based Pseudo-Label Selection for Universal Mismatched Steganalysis. Fan Wang, Zhangjie Fu, Xiang Zhang, Ziqiang Li, Ziwen He, Manyu Wang |
| 2025 | Parameter-Efficient Variational AutoEncoder for Multimodal Multi-Interest Recommendation. Nhu-Thuat Tran, Hady W. Lauw |
| 2025 | Pask: Providing Answer before AsKing toward Proactive AI agent. Zhifei Xie, Hu Zongzheng, Guibin Zhang, Jialin Zhang, Yue Liao, Chunyan Miao, Shuicheng Yan |
| 2025 | PatAug: Augmentation of Augmentation for Test-Time Adaptation. Xinyao Li, Dan Zhang, Zhekai Du, Lei Zhu, Zhi Chen, Jingjing Li |
| 2025 | PatchWiper: Leveraging Dynamic Patch-Wise Parameters for Real-World Visible Watermark Removal. Zihao Mo, Junye Chen, Chaowei Fang, Guanbin Li |
| 2025 | Pathology-Aware Prototype Evolution via LLM-Driven Semantic Disambiguation for Multicenter Diabetic Retinopathy Diagnosis. Chunzheng Zhu, Yangfang Lin, Jialin Shao, Jianxin Lin, Yijun Wang |
| 2025 | Pathology-Aware Reconstruction with Discriminative Knowledge Boosting Alignment for Chest X-ray Vision-Language Pre-training. Lihong Qiao, Shiyi Gao, Yucheng Shu, Bin Xiao, Weisheng Li, Xinbo Gao |
| 2025 | Perceptual Visual Quality Assessment in Multimedia Communication. Wei Zhou, Hadi Amirpour |
| 2025 | PeriodVOS: Learning Periodic Patterns for Unsupervised Video Object Segmentation via Adaptive Contextual Coupling. Jiaqing Fan, Hanwen Qian, Mengjuan Jiang, Fanzhang Li |
| 2025 | Permission to Dance: An End-to-End Dance Enhancement System from Dance Capture to Analysis. Jungsu Kim, Jungwoo Huh, Yeseung Park, Seongjean Kim, Jeongwook Choi, Sanghoon Lee |
| 2025 | Personality Prediction via Multimodal Fusion with Sentiment Analysis Enhancement. Xuerui Cheng, Feng Chen, Jun Xie, Kanokphan Lertniphonphan, Yi Liu, Zhepeng Wang |
| 2025 | Perspective from a Higher Dimension: Can 3D Geometric Priors Help Visual Floorplan Localization? Bolei Chen, Jiaxu Kang, Haonan Yang, Ping Zhong, Jianxin Wang |
| 2025 | PgM: Partitioner Guided Modal Learning Framework. Guimin Hu, Yi Xin, Lijie Hu, Zhihong Zhu, Hasti Seifi |
| 2025 | Phase Distribution Matters: On the Importance of Phase Distribution Alignment (PDA) in Holographic Applications. Seungmi Choi, Taehwa Lee, Jun Yeong Cha, Suhyun Jo, Hyunmin Ban, Kwan-Jung Oh, Hyunsuk Ko, Hui Yong Kim |
| 2025 | PhonoFence: A Cross-Task Defense Framework for DeepFake via Phoneme-Level Adversarial Perturbations. Zhaolin Wei, Xiuwen Shi, Dengpan Ye, Yuhan Lin, Zhigang Wang, Jiacheng Deng, Ziyi Liu, Long Tang |
| 2025 | Phys4DGen: Physics-Compliant 4D Generation with Multi-Material Composition Perception. Jiajing Lin, Zhenzhong Wang, Dejun Xu, Shu Jiang, Yunpeng Gong, Min Jiang |
| 2025 | Phys4DRT: Physics-based 4D Generation for Real-Time Interaction with Time-Frequency Supervision. Yuntian Xiao, Shoulong Zhang, Zihang Zhang, Jiahao Cui, Yan Wang, Shuai Li |
| 2025 | PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments. Minghao Zou, Qingtian Zeng, Yongping Miao, Shangkun Liu, Zilong Wang, Hantao Liu, Wei Zhou |
| 2025 | Physics-Coupled Frequency Dynamic Adaptation Network for Domain Generalized Underwater Object Detection. Linxuan Luo, Pan Mu, Cong Bai |
| 2025 | Physics-Guided Sonar Image Fine-grained Recognition under Scarce Annotations. Chengzhou Li, Xiaokang Liu, Qi Jia, Jinyuan Liu, Zhiying Jiang, Longhan Feng, Yu Liu, Zhongxuan Luo, Xin Fan |
| 2025 | Physics-Informed Representation Alignment for Sparse Radio-Map Reconstruction. Haozhe Jia, Wenshuo Chen, Zhihui Huang, Lei Wang, Hongru Xiao, Nanqian Jia, Keming Wu, Songning Lai, Bowen Tian, Yutao Yue |
| 2025 | PiMMNet: Introducing Multi-Modal Precipitation Nowcasting via a Physics-informed Perspective. Demin Yu, Wenchuan Du, Kenghong Lin, Xutao Li, Yunming Ye, Chuyao Luo, Xunlai Chen |
| 2025 | Pindrop it! Audio and Visual Deepfake Countermeasures for Robust Detection and Fine-Grained Localization. Nicholas Klein, Hemlata Tak, James Fullwood, Krishna Regmi, Leonidas Spinoulas, Ganesh Sivaraman, Tianxiang Chen, Elie Khoury |
| 2025 | Polarimetric Monocular Gaussian Splatting SLAM for Dense Surface Reconstruction. Haitao Wang, Sijia Wen, Bo Guo |
| 2025 | Position-LoRA: Enhanced Relation Customization through Structural Prior in Initial Latent Noise. Yiming Li, Peng Zhou, Xiaokang Qin, Hongwei Hu, Jun Sun, Yi Xu |
| 2025 | Positional Prompt Tuning for Efficient 3D Representation Learning. Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei |
| 2025 | Positive Style Accumulation: A Style Screening and Continuous Utilization Framework for Federated DG-ReID. Xin Xu, Chaoyue Ren, Wei Liu, Wenke Huang, Bin Yang, Zhixi Yu, Kui Jiang |
| 2025 | Pre-Forgettable Models: Prompt Learning as a Native Mechanism for Unlearning. Rutger Hendrix, Giovanni Patanè, Leonardo G. Russo, Simone Carnemolla, Federica Proietto Salanitri, Giovanni Bellitto, Concetto Spampinato, Matteo Pennisi |
| 2025 | Pretraining Large Brain Language Model for Active BCI: Silent Speech. Jinzhao Zhou, Zehong Cao, Yiqun Duan, Connor Barkley, Daniel Leong, Xiaowei Jiang, Quoc-Toan Nguyen, Ziyi Zhao, Thomas Do, Yu-Cheng Chang, Sheng-Fu Liang, Chin-Teng Lin |
| 2025 | PriCAF: Privacy-Preserving Contribution Assessment in Federated Learning Before Model Training. Yixin Xu, Hao Wu, Jingzhou Zhu, Fengyuan Xu, Sheng Zhong |
| 2025 | Prior-Constrained Relevant Feature driven Image Fusion with Hybrid Feature via Mode Decomposition. Bingfeng Liu, Songwei Pei, Shuhuai Wang, Wenzheng Yang, Qian Li, Shangguang Wang |
| 2025 | Prior-Free Augmentation for Cloth-Changing Person Re-Identification. Jiajun Zhang, Xin Li, Si Wu, Yong Xu, Yaowei Wang |
| 2025 | Prior-oriented Anchor Learning with Coalesced Semantics for Multi-View Clustering. Jinjia Peng, Tianhang Cheng, Guangqi Jiang, Huibing Wang |
| 2025 | PrivEdit: A Zero-Shot Interactive Image Privacy Editing System. Xiao Chen, Wenrui He, Meng Wang, Zhanbin Hu, Chaoquan Shen, Qiang Zhu |
| 2025 | Proactive Deepfake Detection via Self-Verifiable Semantic Watermarking. Peiqi Jiang, Bohan Lei, Yuhao Sun, Lingyun Yu, Zhineng Chen, Hongtao Xie, Yongdong Zhang |
| 2025 | Probabilistic Mixture of Hyperbolic Mamba for Few-Shot Class-Incremental Learning. Yawen Cui, Wenbin Zou, Huiping Zhuang, Yi Wang, Lap-Pui Chau |
| 2025 | Proceedings of the 33rd ACM International Conference on Multimedia, MM 2025, Dublin, Ireland, October 27-31, 2025. Cathal Gurrin, Klaus Schoeffmann, Min Zhang, Luca Rossetto, Stevan Rudinac, Duc-Tien Dang-Nguyen, Wen-Huang Cheng, Phoebe Chen, Jenny Benois-Pineau |
| 2025 | Progressive Large-Scale Modeling via Temporal-Spatial Focus Connector for Micro-Action Recognition. Qiankun Li, Qiupu Chen, Huabao Chen, Feng He, Depeng Li, Zhigang Zeng |
| 2025 | Progressive Learning with Human Feedback for Personalized Adaptive Video Streaming. Zhaohui Jiang, Xuening Feng, Tianchi Huang, Ruixiao Zhang, Paul Weng, Yifei Zhu |
| 2025 | Progressive Representation Learning for Weakly-Supervised Camouflaged Object Detection. Shuyong Gao, Qianyu Guo, Yu'ang Feng, Chunyuan Chen, Xujun Wei, Yan Wang, Wenqiang Zhang |
| 2025 | Prompt-Softbox-Prompt: A Free-Text Embedding Control for Image Editing. Yitong Yang, Yinglin Wang, Tian Zhang, Jing Wang, Shuting He |
| 2025 | PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting. Hohyun Na, Seunghoo Hong, Simon S. Woo |
| 2025 | Protecting Copyright of Medical Pre-trained Language Models: Training-Free Backdoor Model Watermarking. Cong Kong, Rui Xu, Jiawei Chen, Zhaoxia Yin |
| 2025 | Prototype-Guided Representation Projection for Multi-Domain Multi-Task Recommendation. Binrui Wu, Haochen Sui, Jiaye Lin, Jiechao Gao, Ting Xu, Keyan Jin, Xuesong Zhang |
| 2025 | Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis. Yifan Yang, Shujie Liu, Jinyu Li, Yuxuan Hu, Haibin Wu, Hui Wang, Jianwei Yu, Lingwei Meng, Haiyang Sun, Yanqing Liu, Yan Lu, Kai Yu, Xie Chen |
| 2025 | PurifyGen: A Risk-Discrimination and Semantic-Purification Model for Safe Text-to-Image Generation. Zongsheng Cao, Yangfan He, Anran Liu, Jun Xie, Zhepeng Wang, Feng Chen |
| 2025 | Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection. Luosheng Xu, Dalin Zhang, Zhaohui Song |
| 2025 | Pushing the Limit of Binarized Neural Network for Image Super Resolution with Smooth Information Transmission. Weimin Cheng, Zhenyu Wang, Tao Huang, Fangfang Wu, Weisheng Dong |
| 2025 | PySimPace v2.0: An Easy-to-Use Simulation Tool with Machine Learning Pipelines for Realistic MRI Motion Artifact Generation. Snehil Kumar, Neil Vaughan, Zeyu Fu, Heather Wilson |
| 2025 | Quality Text, Robust Vision: The Role of Language in Enhancing Visual Robustness of Vision-Language Models. Futa Waseda, Saku Sugawara, Isao Echizen |
| 2025 | Quantifying Samples with Invariance for Source-Free Class Incremental Domain Adaptation. Zhiyu Ye, Guowen Li, Haoyuan Liang, Zixi Wang, Shilei Cao, Yushan Lai, Juepeng Zheng |
| 2025 | Quantifying Structural Aesthetic Features and Personality Trait Preferences in Tiancheng Liu, Jiayi Ye, Shumeng Zhang, Kang Zhang, Chen Liang |
| 2025 | Quantization Meets OOD: Generalizable Quantization-aware Training from a Flatness Perspective. Jiacheng Jiang, Yuan Meng, Chen Tang, Han Yu, Qun Li, Zhi Wang, Wenwu Zhu |
| 2025 | Quantum Interference-Inspired Who-What-Where Composite-Semantics Instance Search for Story Videos. Zijun Xu, Jiahao Guo, Chunjie Zhang, Zhongyuan Wang, Chunxia Xiao, Chao Liang |
| 2025 | Query-Based Audio-Visual Temporal Forgery Localization with Register-Enhanced Representation Learning. Xiaodong Zhu, Suting Wang, Junqi Yang, Yuhong Yang, Weiping Tu, Zhongyuan Wang |
| 2025 | Query-Focused Multimodal Summarization with Gate-Guided Mixture-of-Experts. Jiajun Han, Xuran Yang, Hui Zhang |
| 2025 | Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning. Xun Li, Rodrigo Santa Cruz, Mingze Xi, Hu Zhang, Madhawa Perera, Ziwei Wang, Ahalya Ravendran, Brandon J. Matthews, Feng Xu, Matt Adcock, Dadong Wang, Jiajun Liu |
| 2025 | Querying Autonomous Vehicle Point Clouds: Enhanced by 3D Object Counting with CounterNet. Xiaoyu Zhang, Zhifeng Bao, Hai Dong, Ziwei Wang, Jiajun Liu |
| 2025 | RA-Touch: Retrieval-Augmented Touch Understanding with Enriched Visual Data. Yoorhim Cho, Hongyeob Kim, Semin Kim, Youjia Zhang, YunSeok Choi, Sungeun Hong |
| 2025 | RAIDX: A Retrieval-Augmented Generation and GRPO Reinforcement Learning Framework for Explainable Deepfake Detection. Tianxiao Li, Zhenglin Huang, Haiquan Wen, Yiwei He, Shuchang Lyu, Baoyuan Wu, Guangliang Cheng |
| 2025 | RATopo: Improving Lane Topology Reasoning via Redundancy Assignment. Han Li, Shaofei Huang, Longfei Xu, Yulu Gao, Beipeng Mu, Si Liu |
| 2025 | RCQoEA-360VR: Real-time Continuous QoE Scores for HMD-based 360° VR Dataset. Sowmya Vijayakumar, Tong Xue, Abdallah El Ali, Irene Viola, Ronan Flynn, Peter Corcoran, Pablo César, Niall Murray |
| 2025 | REA-Listener: Real-Time Listening Head Generation with Dynamic Emotion Modeling and Flexible Modality Adaptation. Sizhe Zhao, Chenyang Wang, Weiyu Zhao, Zonglin Li, Ming Li, Shengping Zhang |
| 2025 | REACT 2025: the Third Multiple Appropriate Facial Reaction Generation Challenge. Siyang Song, Micol Spitale, Xiangyu Kong, Hengde Zhu, Cheng Luo, Cristina Palmero, Germán Barquero, Sergio Escalera, Michel F. Valstar, Mohamed Daoudi, Tobias Baur, Fabien Ringeval, Andrew Howes, Elisabeth André, Hatice Gunes |
| 2025 | REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative Diagnosis. Duy-Cat Can, Quang-Huy Tang, Huong Ha, Binh T. Nguyen, Oliver Y. Chén |
| 2025 | REMOTE: A Unified Multimodal Relation Extraction Framework with Multilevel Optimal Transport and Mixture-of-Experts. Xinkui Lin, Yongxiu Xu, Minghao Tang, Shilong Zhang, Hongbo Xu, Hao Xu, Yubin Wang |
| 2025 | RGC-VQA: An Exploration Database for Robotic-Generated Video Quality Assessment. Jianing Jin, Jiangyong Ying, Huiyu Duan, Liu Yang, Sijing Wu, Yunhao Li, Yushuo Zheng, Xiongkuo Min, Guangtao Zhai |
| 2025 | RIFTCast: A Template-Free End-to-End Multi-View Live Telepresence Framework and Benchmark. Domenic Zingsheim, Markus Plack, Hannah Dröge, Janelle Pfeifer, Patrick Stotko, Matthias B. Hullin, Reinhard Klein |
| 2025 | RQ-Rec: Residual Quantized Hierarchical Preference Modeling for Cross-Domain Recommendation. Yingjun Dai, Ahmed El-Roby |
| 2025 | RSFomer: Time Series Transformer for Robust Sports Action Recognition. Yongan Guo, Zhongyan Zhou, Yuao Wang, Na Zhu, Xuyun Zhang, Hongwang Xiao, Yuan Miao, Bo Li |
| 2025 | RSVLM-QA: A Benchmark Dataset for Remote Sensing Vision Language Model-based Question Answering. Xing Zi, Jinghao Xiao, Yunxiao Shi, Xian Tao, Jun Li, Ali Braytee, Mukesh Prasad |
| 2025 | RTR-GS: 3D Gaussian Splatting for Inverse Rendering with Radiance Transfer and Reflection. Yongyang Zhou, Fanglue Zhang, Zichen Wang, Lei Zhang |
| 2025 | RUN: A Case for Cross-Layer Networked Virtual Reality. Yufeng Chen, Umakant Kulkarni, Voicu Popescu, Sonia Fahmy |
| 2025 | RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion. Wenzhe He, Xiaojun Chen, Wentang Chen, Hongyu Wang, Ying Liu, Ruihui Li |
| 2025 | RWKV3D: An RWKV-Based Model with Multiple Training Strategies for Point Cloud Analysis. Chenglong Sun, Shijie Pang, Yuzheng Wang, Lizhe Qi |
| 2025 | RadLAS: A Foundation Model for Interpretable Radiography Image Analysis with Lesion-Aware Self-Supervised Pre-training. Yihang Liu, Ying Wen, Longzhen Yang, Lianghua He, Heng Tao Shen |
| 2025 | Radar-Mamba: 4D Millimeter-Wave Point Cloud Enhancement via State Space Models. Hong Gao, Xiangkai Xu, Tianqi Zhu, Xiugang Dong, Yiming Bao, Min-Ling Zhang |
| 2025 | Re-Activating Frozen Primitives for 3D Gaussian Splatting. Yuxin Cheng, Binxiao Huang, Wenyong Zhou, Taiqiang Wu, Zhengwu Liu, Graziano Chesi, Ngai Wong |
| 2025 | Re-examining Concept-based Explainable Models for Multimodal Interpretative Tasks. Julie Tores, Elisa Ancarani, Rémy Sun, Lucile Sassatelli, Hui-Yin Wu, Frédéric Precioso |
| 2025 | ReCap: Event-Aware Image Captioning with Article Retrieval and Semantic Gaussian Normalization. Thinh-Phuc Nguyen, Thanh-Hai Nguyen, Gia-Huy Dinh, Lam-Huy Nguyen, Minh-Triet Tran, Trung-Nghia Le |
| 2025 | ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension. Yizhi Hu, Zezhao Tian, Xingqun Qi, Chen Su, Bingkun Yang, Junhui Yin, Muyi Sun, Man Zhang, Zhenan Sun |
| 2025 | ReactDiff: Fundamental Multiple Appropriate Facial Reaction Diffusion Model. Cheng Luo, Siyang Song, Siyuan Yan, Zhen Yu, Zongyuan Ge |
| 2025 | Reactffusion: Physical Contact-guided Diffusion Model for Reaction Generation. Zihang Zhang, Shoulong Zhang, Yan Wang, Shuai Li |
| 2025 | Reading Between the Channels: Knowledge-Augmented Medical Time Series Classification. Xiaoyan Yuan, Wei Wang, Junxin Chen, Xiping Hu |
| 2025 | Real-Time EEG Emotion Recognition from Dynamic Mixed Spatiotemporal Graph Learning. Yue Pan, Cunbo Li, Peiyang Li, Fali Li, Feng Wan, Dezhong Yao, Zehong Cao, Peng Xu |
| 2025 | Real-Time SSL Sperm Whale Click Detector: Interactive Web Demo. Anvar Iskhakov, Viktor Kovalev, Vladislav Naumov, Ilya Makarov |
| 2025 | Real-time GenAI Solutions for Video Streaming in Low-bandwidth Settings. Claudio Baecchi, Matteo Bruni, Fabio Clabot, Marco Bertini |
| 2025 | RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking. Shuo Yang, Yuqin Dai, Guoqing Wang, Xinran Zheng, Jinfeng Xu, Jinze Li, Zhenzhe Ying, Weiqiang Wang, Edith C. H. Ngai |
| 2025 | RealHD: A High-Quality Dataset for Robust Detection of State-of-the-Art AI-Generated Images. Hanzhe Yu, Yun Ye, Jintao Rong, Qi Xuan, Chen Ma |
| 2025 | RealText: Realistic Text Image Generation based on Glyph and Scene Aware Inpainting. Zihou Liu, Dongming Zhang, Jing Zhang, Jun Li, Yongdong Zhang |
| 2025 | RealVG: Unleashing MLLMs for Training-Free Spatio-Temporal Video Grounding in the Wild. Hongchen Wei, Zhenzhong Chen |
| 2025 | Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based Psychoanalysis. Xueqi Ma, Yanbei Jiang, Sarah M. Erfani, James Bailey, Weifeng Liu, Krista A. Ehinger, Jey Han Lau |
| 2025 | Reasoning and Planning for Multimodal Large Language Models: A Multilingual and Cross-Domain Exploration. Sarmistha Das, Akash Ghosh, Sriparna Saha, Koustava Goswami, K. J. Joseph |
| 2025 | RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation. Ruoxuan Zhang, Jidong Gao, Bin Wen, Hongxia Xie, Chenming Zhang, Hong-Han Shuai, Wen-Huang Cheng |
| 2025 | RecipeRAG: Advancing Recipe Generation with Reinforced Retrieval Augmented Generation. Jinghan Yang, Zhenbo Xu, Dehua Ma, Liu Liu, Fei Liu, Gong Huang, Zhaofeng He |
| 2025 | Reconstructing the Experience of Nüshu Culture: An Exploration via Multimodal Mixed Reality Systems. Zheyu Feng, Boya Liu, Zhonghe Ruan, Xinyi Zhang, Zihan Gao |
| 2025 | Referring Expression Instance Retrieval and A Strong End-to-End Baseline. Xiangzhao Hao, Kuan Zhu, Hongyu Guo, Haiyun Guo, Ning Jiang, Quan Lu, Ming Tang, Jinqiao Wang |
| 2025 | Referring Multi-Object Tracking in Satellite Videos: A New Benchmark and Baseline. Peirong Zhang, Yidan Zhang, Hanru Shi, Dianyu Wang, Xiaoxuan Liu, Lei Wang |
| 2025 | Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation. Shouxing Ma, Yawen Zeng, Shiqing Wu, Guandong Xu |
| 2025 | Regist3R: Incremental Registration with Stereo Foundation Model. Sidun Liu, Wenyu Li, Peng Qiao, Yong Dou |
| 2025 | Regularizing Subspace Redundancy of Low-Rank Adaptation. Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, Yunzhi Zhuge, Shuai Hao, Xu Jia, Lu Zhang, Ying Zhang, Huchuan Lu |
| 2025 | Regulatory Focus Theory Induced Micro-Expression Analysis with Structured Representation Learning. Bohao Zhang, Haoxin Xu, Jingzhong Lin, Changbo Wang, Gaoqi He |
| 2025 | Reliable Cross-modal Alignment via Prototype Iterative Construction. Xiang Ma, Litian Xu, Lexin Fang, Caiming Zhang, Lizhen Cui |
| 2025 | Relightable and Dynamic Gaussian Avatar Reconstruction from Monocular Video. Seonghwa Choi, Moonkyeong Choi, Mingyu Jang, Jaekyung Kim, Jianfei Cai, Wen-Huang Cheng, Sanghoon Lee |
| 2025 | Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors. Bing Wang, Ximing Li, Mengzhe Ye, Changchun Li, Bo Fu, Jianfeng Qu, Lin Yuanbo Wu |
| 2025 | RemoteSAM: Towards Segment Anything for Earth Observation. Liang Yao, Fan Liu, Delong Chen, Chuanyi Zhang, Yijun Wang, Ziyun Chen, Wei Xu, Shimin Di, Yuhui Zheng |
| 2025 | Reproducibility Companion Paper: Enhancing Model Interpretability with Local Attribution over Global Exploration. Zhiyu Zhu, Zhibo Jin, Jiayu Zhang, Fang Chen, Jianlong Zhou, Vijay John, Florian Spiess |
| 2025 | Reproducibility Companion Paper: Maskable Retentive Network for Video Moment Retrieval. Jingjing Hu, Dan Guo, Meng Wang, Jiaxi Li, Fei Liu |
| 2025 | Reproducibility Companion Paper: NIF: A Fast Implicit Image Compression with Bottleneck Layers and Modulated Sinusoidal Activations. Lorenzo Catania, Dario Allegra, Luigi Capogrosso, Thu Nguyen |
| 2025 | Reproducibility Companion Paper: Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks. Hamed Alimohammadzadeh, Shahram Ghandeharizadeh, Federico Cunico, Joshua Springer |
| 2025 | Research and Standardization Trends in Compression and Transmission Technologies for 3D Point Cloud. Keisuke Nonaka |
| 2025 | ResearchPulse: Building Method-Experiment Chains through Multi-Document Scientific Inference. Qi Chen, Jingxuan Wei, Zhuoya Yao, Haiguang Wang, Gaowei Wu, Bihui Yu, Siyuan Li, Cheng Tan |
| 2025 | Residual Prior-driven Frequency-aware Network for Image Fusion. Zheng Guan, Xue Wang, Wenhua Qian, Peng Liu, Runzhuo Ma |
| 2025 | Retaining Temporal Semantics and Relation Topologies for Continual Weakly-Supervised Audio-Visual Video Parsing. Jie Fu, Bingkun Bao |
| 2025 | Rethinking Diffusion Bridge Model with Dual Alignments for Medical Image Synthesis. Jinbao Wei, Yuhang Chen, Zhijie Wang, Gang Yang, Shimin Tao, Jian Gao, Aiping Liu, Xun Chen |
| 2025 | Rethinking Individual Fairness in Deepfake Detection. Aryana Hou, Li Lin, Justin Li, Shu Hu |
| 2025 | Rethinking Occlusion in FER: A Semantic-Aware Perspective and Go Beyond. Huiyu Zhai, Xingxing Yang, Yalan Ye, Chenyang Li, Bin Fan, Changze Li |
| 2025 | Rethinking the Reliability of Evidence in End-to-End Fact-Checking from the Causal Perspective. Xubo Liu, Wenya Guo, Ruxue Yan, Xumeng Liu, Ying Zhang, Ru Zhou |
| 2025 | Retrieval Augmented 3D Garment Generation from Single Image. Qixun Zeng |
| 2025 | Revealing Latent Information: A Physics-inspired Self-supervised Pre-training Framework for Noisy and Sparse Events. Lin Zhu, Ruonan Liu, Xiao Wang, Lizhi Wang, Hua Huang |
| 2025 | Reversible Privacy Preserving on Vision-Language Models via Adversarial Multimodal Key. Peng Ying, Zhongnian Li, Meng Wei, Xinzheng Xu |
| 2025 | Revisiting Data Auditing in Large Vision-Language Models. Hongyu Zhu, Sichu Liang, Wenwen Wang, Boheng Li, Tongxin Yuan, Fangqi Li, Hanyi Wang, Shi-Lin Wang, Zhuosheng Zhang |
| 2025 | Rhythm Gate: Invisible Conversations in the Elevator - Echoes of Material, Behavior, Memory and Transformation. Xia Liu, Xiao Zhang |
| 2025 | Rig-Reconstruct-Render (R Yuxuan Xiong, Ye Chen, Yue Shi, Zhangli Hu, Bingbing Ni |
| 2025 | RoboAfford: A Dataset and Benchmark for Enhancing Object and Spatial Affordance Learning in Robot Manipulation. Yingbo Tang, Lingfeng Zhang, Shuyi Zhang, Yinuo Zhao, Xiaoshuai Hao |
| 2025 | RoboSax Melody Slot Machine. Masatoshi Hamanaka, Gou Koutaki |
| 2025 | RoboSoft'25: The 1st International Workshop on Vision-Language in Soft Robot. Ziyu Wei, Luting Wang, Chen Gao, Hongliang Huang, Jiaqi Liu, Li Wen, Si Liu |
| 2025 | Robust Gaussian Surface Reconstruction with Semantic Aware Progressive Propagation. Yusen Wang, Huan Zhou, Yu Jiang, Chunxia Xiao |
| 2025 | Robust Modality-Incomplete Anomaly Detection: A Modality-Instructive Framework with Benchmark. Bingchen Miao, Wenqiao Zhang, Juncheng Li, Wangyu Wu, Siliang Tang, Zhaocheng Li, Haochen Shi, Jun Xiao, Yueting Zhuang |
| 2025 | Robust Multi-view Clustering via Pseudo Label Guided Universum Learning. Zhenxi Wang, Zongyao Yin, Yujie Hou, Xianchuan Yu |
| 2025 | Robust Photo-Realistic Hand Gesture Generation: from Single View to Multiple View. Qifan Fu, Xu Chen, Muhammad Asad, Shanxin Yuan, Changjae Oh, Gregory G. Slabaugh |
| 2025 | Robust Single Image Sand Removal by Leveraging Uncertainty-aware SAM Priors and Prompt Learning with Refined Perceptual Loss. Bingcai Wei, Hui Liu, Chuang Qian, Zijian Li, Wangyu Wu, Zijie Meng |
| 2025 | Robust Tensor Learning with Graph Diffusion for Scalable Multi-view Graph Clustering. Jiale Zou, Yan Chen, Bingbing Jiang, Peng Zhou, Liang Du, Lei Duan, Yuhua Qian |
| 2025 | Robust Understanding of Human-robot Social Interactions through Multimodal Distillation. Tongfei Bian, Mathieu Chollet, Tanaya Guha |
| 2025 | RobustVisH: Robust Visual-Haptic Cross-Modal Recognition under Transmission Interference. Rouqi Zhang, Chengdi Lu, Hancheng Lu, Yang Cao, Tiesong Zhao |
| 2025 | Robustness as Architecture: Designing IQA Models to Withstand Adversarial Perturbations. Igor N. Meleshin, Anna Chistyakova, Anastasia Antsiferova, Dmitriy S. Vatolin |
| 2025 | Rodecon-net: Medical Image Segmentation via Robust Decoupling and Contrast-enhanced Fusion. Yongquan Xue, Zhaoru Guo, Zhaozhao Su, Chong Peng, Jun Feng, Pan Zhou, Marcin Pietron, Xiyuan Wang, Liejun Wang, Panpan Zheng |
| 2025 | Roll Your Eyes: Gaze Redirection via Explicit 3D Eyeball Rotation. YoungChan Choi, Hengfei Wang, Yihua Cheng, Boeun Kim, Hyung Jin Chang, Younggeun Choi, Sang-Il Choi |
| 2025 | Rule Meets Learning: Confidence-Aware Multi-View Fusion for Self-Supervised 3D Hand Pose Estimation. Pengfei Ren, Jingyu Wang, Haifeng Sun, Qi Qi, Jing Wang, Jianxin Liao |
| 2025 | S Yuqi Chen, Xiubo Liang, Yu Zhao, Hongzhi Wang, Weidong Geng |
| 2025 | SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware Alignment. Guoxin Zang, Xue Li, Donglin Di, Lanshun Nie, Dechen Zhan, Yang Song, Lei Fan |
| 2025 | SAKR-Edit: Scene-Aware Knowledge Reasoning for Text-to-Image Editing. Jiawen Wang, Jianjun Li, Zhiyuan Ma, Ruixia Bai |
| 2025 | SALVG: Latent Variable Gene Augmented Graph Learning for Multi-View Clustering in Spatial Transcriptomics. Zeyu Zhu, Ke Liang, Lingyuan Meng, Xingchen Hu, Xinwang Liu, Wanwei Liu, Kunlun He |
| 2025 | SAM based Region-Word Clustering and Inference Score Adjusting for Open-Vocabulary Object Detection. Qiuyu Liang, Yongqiang Zhang |
| 2025 | SAM-Guided Semantic Knowledge Fusion for Visible-Infrared Object Detection. Ting Li, Songtao Li, Shuaifeng Li, Xiaolin Qin, Maoyuan Zhao, Luping Ji, Mao Ye |
| 2025 | SAM-TTT: Segment Anything Model via Reverse Parameter Configuration and Test-Time Training for Camouflaged Object Detection. Zhenni Yu, Li Zhao, Guobao Xiao, Xiaoqin Zhang |
| 2025 | SAMVSR: Leveraging Semantic Priors to Zone-Focused Mamba for Video Snow Removal. Hongtao Wu, Yifeng Wu, Jiaxuan Jiang, Chengyu Wu, Hong Wang, Yefeng Zheng |
| 2025 | SAT: Supervisor Regularization and Animation Augmentation for Two-process Monocular Texture 3D Human Reconstruction. Gangjian Zhang, Jian Shu, Nanjie Yao, Hao Wang |
| 2025 | SCID-Compress900: A Multi-Scene Dataset of 4K and 1080P Screen Content Images for Image Compression Research. Huiming Zheng, Linjie Zhou, Wei Gao |
| 2025 | SCOL: Style Code Orchestration in Latent Space for Proactive Face-Swapping Defense. Eungi Lee, Jae Hyun Yoon, Seok Bong Yoo |
| 2025 | SD-VSum: A Method and Dataset for Script-Driven Video Summarization. Manolis Mylonas, Evlampios Apostolidis, Vasileios Mezaris |
| 2025 | SDART: Spatial Dart AR Simulation with Hand-Tracked Input. Milad Ghanbari, Wei Zhou, Cosmin Stejerean, Christian Timmerer, Hadi Amirpour |
| 2025 | SDG-MLLM: Injecting Structured Dialogue Graphs into MLLM for Multimodal Conversational Aspect-Based Sentiment Analysis. Xinjing Liu, Pengyue Lin, Xinyu Tu, Wenqi Jia, Chen Jiang, Ruifan Li |
| 2025 | SDP: Spectral-Decomposed Prompting for Continual Learning. Siqi Song, Limin Yu, Jimin Xiao |
| 2025 | SDVPT: Semantic-Driven Visual Prompt Tuning for Open-world Object Counting. Yiming Zhao, Guorong Li, Laiyun Qing, Amin Beheshti, Jian Yang, Quan Z. Sheng, Yuankai Qi, Qingming Huang |
| 2025 | SE2E: Recognizing Emotion behind Societal Behavior. Wending Xiong, Ruimin Hu, Lingfei Ren, Xixi Li, Dengshi Li |
| 2025 | SEAR: A Multimodal Dataset for Analyzing AR-LLM-Driven Social Engineering Behaviors. Tianlong Yu, Chenghang Ye, Zheyu Yang, Ziyi Zhou, Cui Tang, Zui Tao, Jun Zhang, Kailong Wang, Liting Zhou, Yang Yang, Ting Bi |
| 2025 | SG-FSL: Cross-Domain Few-Shot Learning with Style-Decoupled Augmentation and Gradient-Conflict Adjustment. Yunyu Zou, Yishu Liu, Jun Liang, Bingzhi Chen |
| 2025 | SGM-Transformer: Rethinking Gradient Information Loss and Compensation in Spiking Neural Networks. Xiubo Liang, Hongzhi Wang, Zigen Li, Jinxing Han, Yu Zhao, Weidong Geng |
| 2025 | SHALE: A Scalable Benchmark for Fine-grained Hallucination Evaluation in LVLMs. Bei Yan, Zhiyuan Chen, Yuecong Min, Jie Zhang, Jiahao Wang, Xiaozhen Wang, Shiguang Shan |
| 2025 | SIDA: Synthetic Image Driven Zero-shot Domain Adaptation. Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim |
| 2025 | SLAM- Mingrui Li, Dong Li, Sijia Hu, Kangxu Wang, Zhenjun Zhao, Hongyu Wang |
| 2025 | SLGaussian: Fast Language Gaussian Splatting in Sparse Views. Kangjie Chen, Bingquan Dai, Minghan Qin, Dongbin Zhang, Peihao Li, Yingshuang Zou, Haoqian Wang |
| 2025 | SLIVeR: A Narrative VR Experience for Immersive Lifelog Exploration. Liang Xu, Songkai Jia, Cathal Gurrin, Allie Tran |
| 2025 | SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis. Zhuojun Wu, Dong Liu, Juan Liu, Yechen Wang, Linxi Li, Liwei Jin, Hui Bu, Pengyuan Zhang, Ming Li |
| 2025 | SMPV: Social Media Prediction for Videos. Bo Wu, Peiye Liu, Qiushi Huang, Zhaoyang Zeng, Jia Wang, Bei Liu, Jiebo Luo, Wen-Huang Cheng |
| 2025 | SOMIN: An Explainable AI and LLM Platform for Real-Time, Data-Driven Digital Marketing Strategy. Aleksandr Farseev |
| 2025 | SP-Mamba: Spatial-Perception State Space Model for Unsupervised Medical Anomaly Detection. Rui Pan, Ruiying Lu |
| 2025 | SPAN: Continuous Modeling of Suspicion Progression for Temporal Intention Localization. Xinyi Hu, Yuran Wang, Ruixu Zhang, Yue Li, Wenxuan Liu, Zheng Wang |
| 2025 | SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion. Zhiwen Yang, Yuxin Peng |
| 2025 | SSAIM: Not All Self-Attentions Contain Effective Spatial Structure in Diffusion Models for Text-to-Image Editing. Zhenbo Yu, Jimin Dai, Yingzhen Zhang, Jian Yang, Lei Luo |
| 2025 | ST-SAM: SAM-Driven Self-Training Framework for Semi-Supervised Camouflaged Object Detection. Xihang Hu, Fuming Sun, Jiazhe Liu, Feilong Xu, Xiaoli Zhang |
| 2025 | STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models. Mahiro Ukai, Shuhei Kurita, Nakamasa Inoue |
| 2025 | SUMAC '25: 7th Workshop on analySis, Understanding and proMotion of heritAge Contents: Advances in Machine Learning, Signal Processing, Multimodal Techniques and Human-machine Interaction. Valérie Gouet-Brunet, Edgar Roman-Rangel, Li Weng |
| 2025 | SUVIS: A Depth- and Motion-Encoded Stereoscopic System for Communicating Forecast Uncertainty. Le Liu, Shizhou Zhang, Di Xu |
| 2025 | SVD: Spatial Video Dataset. Mohammad Hossein Izadimehr, Milad Ghanbari, Guodong Chen, Wei Zhou, Xiaoshuai Hao, Mallesham Dasari, Christian Timmerer, Hadi Amirpour |
| 2025 | SVDGNet: Shapley Value-Based Weight Adjustment for Unsupervised Image Style Transfer. Yi Han, Yaochen Li, Peijun Chen, Wenlong Zhou, Jinhuo Yang, Jintao Chang |
| 2025 | SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation. Hanqi Chen, Zhongyin Zhao, Ye Chen, Zhujin Liang, Bingbing Ni |
| 2025 | SVGen: Interpretable Vector Graphics Generation with Large Language Models. Feiyu Wang, Zhiyuan Zhao, Yuandong Liu, Da Zhang, Junyu Gao, Hao Sun, Xuelong Li |
| 2025 | SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation. Siqi Chen, Xinyu Dong, Haolei Xu, Xingyu Wu, Fei Tang, Hang Zhang, Yuchen Yan, Linjuan Wu, Wenqi Zhang, Guiyang Hou, Yongliang Shen, Weiming Lu, Yueting Zhuang |
| 2025 | SaP-Bot: A Multimodal Large-Language Model for End-to-End Same-Product Identification. Yixuan Zhou, Yulu Tian, Wenliang Zhong, Xingbin Yu, Heng Tao Shen, Xing Xu |
| 2025 | Safe Semantics, Unsafe Interpretations: Tackling Implicit Reasoning Safety in Large Vision-Language Models. Wei Cai, Jian Zhao, Yuchu Jiang, Tianle Zhang, Xuelong Li |
| 2025 | Safe-BVAR: Text-to-Image Generative Watermarking for Bitwise Visual AutoRegressive Model. Shengjiu Dai, Xiujian Liang, Sheng Li, Zhenxing Qian, Xinpeng Zhang |
| 2025 | SafeCFG: Controlling Harmful Features with Dynamic Safe Guidance for Safe Generation. Jiadong Pan, Liang Li, Hongcheng Gao, Zheng-Jun Zha, Qingming Huang, Jiebo Luo |
| 2025 | SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation. Hao Ye, Mengshi Qi, Zhaohong Liu, Liang Liu, Huadong Ma |
| 2025 | Saliency-Guided Adaptive Random Diffusion for Remote Sensing Images Restoration with Cloud and Haze. Wanting Zhang, Jingxuan Zhang, Libao Zhang |
| 2025 | Sample-level Adaptive Knowledge Distillation for Action Recognition. Ping Li, Chenhao Ping, Wenxiao Wang, Mingli Song |
| 2025 | Scalable Multi-view Clustering based on Tight Anchor Distribution. Yawei Chen, Huibing Wang, Mingze Yao, Jinjia Peng, Guangqi Jiang, Jiqing Zhang |
| 2025 | Scalable One-step Unaligned Multi-view Clustering via Joint High-Order Correlation Learning. Hongyu Jiang, Yuxin Huo, Sirou Sheng, Hong Tao, Chenping Hou |
| 2025 | Scalable Unpaired Multi-View Clustering via Anchor-Driven High-Throughput Encoding. Junyu Chen, Jiawei Peng, Yuan Sun, Jian Dai, Xingfeng Li, Zhenwen Ren |
| 2025 | Scaling Laws for Data-Efficient Visual Transfer Learning. Wenxuan Yang, Qingqv Wei, Chenxi Ma, Weimin Tan, Bo Yan |
| 2025 | Scattering-Conditioned Diffusion Models for Multiple Appropriate Facial Reaction Generation. Qirong Mao, Qiwei Wu, Na Liu, Yakui Ding, Lijian Gao |
| 2025 | Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE. Yiying Yang, Fukun Yin, Jiayuan Fan, Wanzhang Li, Xin Chen, Gang Yu |
| 2025 | Screen Content Video Dataset and Benchmark. Nickolay Safonov, Rakhmanov Mikhail, Dmitriy S. Vatolin |
| 2025 | ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use. Kaixin Li, Ziyang Meng, Hongzhan Lin, Ziyang Luo, Yuchen Tian, Jing Ma, Zhiyong Huang, Tat-Seng Chua |
| 2025 | SeMi: When Imbalanced Semi-Supervised Learning Meets Mining Hard Examples. Yin Wang, Zixuan Wang, Hao Lu, Zhen Qin, Hailiang Zhao, Guanjie Cheng, Xin Du, Ge Su, Li Kuang, MengChu Zhou, Shuiguang Deng |
| 2025 | Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security. Muzhi Dai, Shixuan Liu, Zhiyuan Zhao, Junyu Gao, Hao Sun, Xuelong Li |
| 2025 | See Different, Think Better: Visual Variations Mitigating Hallucinations in LVLMs. Ziyun Dai, Xiaoqiang Li, Shaohua Zhang, Yuanchen Wu, Jide Li |
| 2025 | See Through the Occlusions: Few-Shot Gaussian Splatting with Layered Amodal Supervision. Gwon-Jung Kim, Du Yeol Lee, Jae Hong Yang, Chae-Eun Rhee |
| 2025 | Seeing Through Ambiguity: Effective Video-guided Machine Translation via Chaotic Fusion and Causally Aligned Spatio-temporal Attention. Jiawei Zheng, Feiyan Liu, Xiaoli Wang |
| 2025 | Seeing from Magic Mirror: Contrastive Learning from Reconstruction for Pose-based Gait Recognition. Shibei Meng, Saihui Hou, Yang Fu, Xuecai Hu, Junzhou Huang, Yongzhen Huang |
| 2025 | Seeing the Overlooked: Bio-Visual Inspired Weak Saliency Feedback Transformer for Person Re-identification. Changshuo Wang, Shuting He, Xiang Fang, Fangzhe Nan, Prayag Tiwari |
| 2025 | Seeing the Undefined: Chain-of-Action for Generative Semantic Labels. Meng Wei, Zhongnian Li, Peng Ying, Xinzheng Xu |
| 2025 | Seg-Wild: Interactive Segmentation based on 3D Gaussian Splatting for Unconstrained Image Collections. Yongtang Bao, Chengjie Tang, Yuze Wang, Haojie Li |
| 2025 | SegTraj: A Segmented-Trajectory-Aware Spatio-Temporal Graph Convolutional Network for Social Group Detection. Xiongwei Dang, Wenxuan Liu, Xian Zhong, Zheng Wang |
| 2025 | Segmenting Objectiveness and Task-awareness Unknown Region for Autonomous Driving. Mi Zheng, Guanglei Yang, Zitong Huang, Zhenhua Guo, Kevin Han, Wangmeng Zuo |
| 2025 | Selective Shift: Towards Personalized Domain Adaptation in Multi-Agent Collaborative Perception. Hui Zhang, Yiteng Xu, Yonglin Tian, Yidong Li, Tiago H. Falk, Fei-Yue Wang |
| 2025 | Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models. Wanying Wang, Zeyu Ma, Han Zheng, Xin Tan, Mingang Chen |
| 2025 | Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation. Longzhen Yang, Zhangkai Ni, Ying Wen, Yihang Liu, Lianghua He, Heng Tao Shen |
| 2025 | Self-Supervised Human Mesh Recovery from Partial Point Cloud via a Self-Improving Loop. Chang Su, Beihong Jin, Fusang Zhang, Siheng Li, Zhi Wang |
| 2025 | Self-Supervised Vision Graph Neural Networks Based on Contrastive Learning. Yuzhen Li, Yuehui Han, Jianjun Qian, Jian Yang |
| 2025 | SemGesture: Synthesizing Semantically Enhanced and Coherent Gestures. Pengsheng Liu, Zhaojie Chu, Xiaofen Xing, Xiangmin Xu |
| 2025 | Semantic-Aware Hard Negative Mining for Medical Vision-Language Contrastive Pretraining. Yongxin Li, Ying Cheng, Yaning Pan, Wen He, Qing Wang, Rui Feng, Xiaobo Zhang |
| 2025 | SemanticGarment: Semantic-Controlled Generation and Editing of 3D Gaussian Garments. Ruiyan Wang, Zhengxue Cheng, Zonghao Lin, Jun Ling, Yuzhou Liu, Yanru An, Rong Xie, Li Song |
| 2025 | Semantics-Driven Contrastive Learning for Real-World Depth Super Resolution. Xinchen Ye, Aokai Zhang, Rui Xu |
| 2025 | SenseCam and Isotyping: The Challenges and Benefits of Working with New Hardware. Steve Hodges |
| 2025 | Sentence-level Segmentation for Long Sign Language Videos with Captions. Bowen Guo, Shiwei Gan, Yafeng Yin, Xiao Liu, Zhiwei Jiang, Shunmei Meng |
| 2025 | SepVAMark: Deep Separable Visual-Audio Fusion Watermarking for Source Tracing and Deepfake Detection. Chuan Zhang, Zihan Li, Zihao Xu, Xuhao Ren, Liehuang Zhu |
| 2025 | Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models. Huijie Liu, Jingyun Wang, Shuai Ma, Jie Hu, Xiaoming Wei, Guoliang Kang |
| 2025 | Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis. Zihao Liu, Mingwen Ou, Zunnan Xu, Jiaqi Huang, Haonan Han, Ronghui Li, Xiu Li |
| 2025 | SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero-Shot 3D Visual Grounding. Jiawen Lin, Shiran Bian, Yihang Zhu, Wenbin Tan, Yachao Zhang, Yuan Xie, Yanyun Qu |
| 2025 | Sequence-Event Semantic Consistent Learning for Text-to-Motion Retrieval. Haoyu Shi, Huaiwen Zhang |
| 2025 | Sera: Separated Coarse-to-fine Representation Alignment for Cross-subject EEG-based Emotion Recognition. Zhihao Jia, Meiyan Xu, Jingyuan Wang, Ziyu Jia, Yong Li, Xinliang Zhou, Chenyu Liu, Junfeng Yao, Yi Ding |
| 2025 | Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking. Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Chunyang Cheng, Tao Zhou, Xiaojun Wu, Josef Kittler |
| 2025 | Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts. Leyang Li, Shilin Lu, Yan Ren, Adams Wai-Kin Kong |
| 2025 | Severe Light, Textureless Sight: A Benchmark for Extreme Exposure Correction. Bo Wang, Jin Liu, Huiyuan Fu, Xin Wang, Heng Zhang, Huadong Ma |
| 2025 | Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation. Xiangyu Zheng, Songcheng He, Wanyun Li, Xiaoqiang Li, Wei Zhang |
| 2025 | ShieldIR: Privacy-Preserving Unsupervised Cross-Domain Image Retrieval via Dual Protection Transformation. Zixin Tang, Haihui Fan, Jinchao Zhang, Hui Ma, Xiaoyan Gu, Bo Li, Weiping Wang |
| 2025 | ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs. Shiyao Cui, Qinglin Zhang, Xuan Ouyang, Renmiao Chen, Zhexin Zhang, Yida Lu, Hongning Wang, Han Qiu, Minlie Huang |
| 2025 | Short-LVLM: Compressing and Accelerating Large Vision-Language Models by Pruning Redundant Layers. Ji Ma, Wei Suo, Peng Wang, Yanning Zhang |
| 2025 | Show and Polish: Reference-Guided Identity Preservation in Face Video Restoration. WenKang Han, Wang Lin, Yiyun Zhou, Qi Liu, Shulei Wang, Chang Yao, Jingyuan Chen |
| 2025 | SiFMimicEvader: Evading Fake Voice Detection with Adversarial Neural Mimicry Attacks. Xuan Hai, Xin Liu, Zihao Zhang, Ziyao Yu, Xiangzhen Kong, Song Li, Weina Niu, Rui Zhou, Qingguo Zhou |
| 2025 | Signal-SGN: A Spiking Graph Convolutional Network for Skeleton Action Recognition via Learning Temporal-Frequency Dynamics. Naichuan Zheng, Yuchen Du, Hailun Xia, Zeyu Liang |
| 2025 | SimViews: An Interactive Multi-Agent System Simulating Visitor-to-Visitor Conversational Patterns to Present Diverse Perspectives of Artifacts in Virtual Museums. Mingyang Su, Chao Liu, Jingling Zhang, Shuang Wu, Mingming Fan |
| 2025 | Simple but Effective: Sub-Volume Contrastive Learning for Class-Imbalanced Semi-Supervised 3D Medical Image Segmentation. Xianrun Xu, Baoyao Yang, Wanyun Li, Jingsong Lin, Yufei Xu |
| 2025 | Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model. Zihao Wang, Ruibin Yuan, Ziqi Geng, Hengjia Li, Xingwei Qu, Xinyi Li, Songye Chen, Haoying Fu, Roger B. Dannenberg, Kejun Zhang |
| 2025 | Single Domain Generalization for Multimodal Cross-Cancer Prognosis via Dirac Rebalancer and Distribution Entanglement. Jia-Xuan Jiang, Jiashuai Liu, Hongtao Wu, Yifeng Wu, Zhong Wang, Qi Bi, Yefeng Zheng |
| 2025 | Single Trajectory Distillation for Accelerating Image and Video Style Transfer. Sijie Xu, Runqi Wang, Wei Zhu, Dejia Song, Nemo Chen, Xu Tang, Yao Hu |
| 2025 | SizeGS: Size-aware Compression of 3D Gaussian Splatting via Mixed Integer Programming. Shuzhao Xie, Jiahang Liu, Weixiang Zhang, Shijia Ge, Sicheng Pan, Chen Tang, Yunpeng Bai, Cong Zhang, Xiaoyi Fan, Zhi Wang |
| 2025 | Skeleton Compression and Complementary Enhanced Fusion Under Branch-Stage Supervision for Human Action Recognition. Qin Li, Congcong Xiao, Limei Liu, Han Peng, Junfeng Yang |
| 2025 | Skynet-V1: Towards Early Warning of Video Abnormal Events via A Spatial-temporal Causal-enhanced MoE Framework. Junxiao Ma, Jingjing Wang, Min Zhang, Guodong Zhou |
| 2025 | Slot Attention with Re-Initialization and Self-Distillation. Rongzhen Zhao, Yi Zhao, Juho Kannala, Joni Pajarinen |
| 2025 | Small Stickers, Big Meanings: A Multilingual Sticker Semantic Understanding Dataset with a Gamified Approach. Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang |
| 2025 | SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding. Qianqian Sun, Jixiang Luo, Dell Zhang, Xuelong Li |
| 2025 | SmokeBench: A Real-World Dataset for Surveillance Image Desmoking in Early-Stage Fire Scenes. Wenzhuo Jin, Qianfeng Yang, Xianhao Wu, Hongming Chen, Pengpeng Li, Xiang Chen |
| 2025 | Smooth Online Multiple Appropriate Facial Reaction Generation. Weicheng Xie, Chunlin Yan, Siyang Song, Zitong Yu, Linlin Shen, Laizhong Cui |
| 2025 | So Long: Interactive Storytelling, Embodying Collective Historical Memory, and Participatory Archiving in a VR Voyage. Tianxing Zhou, Chengkai Xu, Xinyue Yao |
| 2025 | Solving Critical Real-World Business Challenges - NEC's Industrial Research Model in the AI Era. Yasunori Mochizuki |
| 2025 | SonicGauss: Position-Aware Physical Sound Synthesis for 3D Gaussian Representations. Chunshi Wang, Hongxing Li, Yawei Luo |
| 2025 | Sovereign & Shared: Frugally Scalable Multilingual-Multimodal AI for Bharat. Maneesh Kumar Singh |
| 2025 | SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation. Kien T. Pham, Yingqing He, Yazhou Xing, Qifeng Chen, Long Chen |
| 2025 | Spark LLM and the Scientific Research it empowers: Practice and Thoughts. Xin Li |
| 2025 | Sparse4DGS: Flow-Geometry Assisted 4D Gaussian Splatting for Dynamic Sparse View Synthesis. Dongdong Hu, Yang Zhou, Xiaofeng Huang, Haibing Yin, Zhu Li |
| 2025 | Spatial Imputation Drives Cross-Domain Alignment for EEG Classification. Hongjun Liu, Chao Yao, Yalan Zhang, Xiaokun Wang, Xiaojuan Ban |
| 2025 | Spatial-Aware Multi-Modal Information Fusion for Food Nutrition Estimation. Dongjian Yu, Weiqing Min, Xin Jin, Qian Jiang, Shuqiang Jiang |
| 2025 | Spatial-Frequency Mamba Collaborative Learning Network for Infrared Small Target Detection. Yongji Li, Luping Wang |
| 2025 | Spatial-Temporal Decomposition and Alignment in Controllable Video-to-Music Generation. Weitao You, Heda Zuo, Junxian Wu, Dengming Zhang, Zhibin Zhou, Lingyun Sun |
| 2025 | Spatiotemporal Degradation-Aware 3D Gaussian Splatting for Realistic Underwater Scene Reconstruction. Shaohua Liu, Ning Gao, Zuoya Gu, Hongkun Dou, Yue Deng, Hongjue Li |
| 2025 | SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching. Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Fei Ren, Shaobo Wang, Kaixin Li, Linfeng Zhang |
| 2025 | SpecSolver: Solving Spatial-Spectral Fusion via Semantic Transformer. Wei Li, Junwei Zhu, Honghui Xu, Jiawei Jiang, Jianwei Zheng |
| 2025 | SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake Detection. Inzamamul Alam, Md Tanvir Islam, Simon S. Woo |
| 2025 | Specify Privacy Yourself: Assessing Inference-Time Personalized Privacy Preservation Ability of Large Vision-Language Models. Xingqi Wang, Xiaoyuan Yi, Xing Xie, Jia Jia |
| 2025 | Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation. Wenrui Liu, Qian Chen, Wen Wang, Guanrou Yang, Weiqin Li, Minghui Fang, Jialong Zuo, Xiaoda Yang, Tao Jin, Jin Xu, Zemin Liu, Yafeng Chen, Jionghao Bai, Zhifang Guo |
| 2025 | Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for Imbalanced Multi-modal Learning. Jiangrong Shen, Yulin Xie, Qi Xu, Gang Pan, Huajin Tang, Badong Chen |
| 2025 | SpineBench: Benchmarking Multimodal LLMs for Spinal Pathology Analysis. Chenghanyu Zhang, Zekun Li, Peipei Li, Xing Cui, Shuhan Xia, Weixiang Yan, Yiqiao Zhang, Qianyu Zhuang |
| 2025 | SplatPose: On-Device Outdoor AR Pose Estimation Using Gaussian Splatting. Weiwu Pang, Rajrup Ghosh, Jiawei Yang, Ziyu Wei, Branden Leong, Yue Wang, Ramesh Govindan |
| 2025 | Stable Diffusion-Based Approach for Human De-Occlusion. Seung Young Noh, Ju Yong Chang |
| 2025 | StePO-Rec: Towards Personalized Outfit Styling Assistant via Knowledge-Guided Multi-Step Reasoning. Yuxi Bi, Yunfan Gao, Haofen Wang |
| 2025 | Stealthy-AE: Generating Stealthy Adversarial Examples through Online Social Networks. Ziming Zhao, Zhaoxuan Li, Tingting Li, Fan Zhang |
| 2025 | Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation. Chao Yin, Hao Li, Kequan Yang, Jide Li, Pinpin Zhu, Xiaoqiang Li |
| 2025 | Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction. Xiufeng Huang, Ka Chun Cheung, Runmin Cong, Simon See, Renjie Wan |
| 2025 | StereoINR: Cross-View Geometry Consistent Stereo Super Resolution with Implicit Neural Representation. Yi Liu, Xinyi Liu, Yi Wan, Panwang Xia, Qiong Wu, Yongjun Zhang |
| 2025 | StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation. Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li |
| 2025 | StoryCrafter: Instance-Aligned Multi-Character Storytelling with Diffusion Policy Learning. Ruiqi Dong, Wenjing Pang, Chenjie Pan, Hengyang Lu, Chenyou Fan |
| 2025 | StrandDesigner: Towards Practical Strand Generation with Sketch Guidance. Na Zhang, Moran Li, Chengming Xu, Han Feng, Xiaobin Hu, Jiangning Zhang, Weijian Cao, Chengjie Wang, Yanwei Fu |
| 2025 | Streaming 3DGS Virtual Worlds in 6DoF over Next-Generation Networks. Yuan-Chun Sun |
| 2025 | StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA. Yuhang Hu, Zhenyu Yang, Shihan Wang, Shengsheng Qian, Bin Wen, Fan Yang, Tingting Gao, Changsheng Xu |
| 2025 | Streamlining Virtual KOL Generation Through Modular Generative AI Architecture. Tan-Hiep To, Duy-Khang Nguyen, Minh-Triet Tran, Trung-Nghia Le |
| 2025 | Structured Prompting and LLM Ensembling for Multimodal Conversational Aspect-based Sentiment Analysis. Zhiqiang Gao, Shihao Gao, Zixing Zhang, Yihao Guo, Hongyu Chen, Jing Han |
| 2025 | SyMuPe: Affective and Controllable Symbolic Music Performance. Ilya Borovik, Dmitrii Gavrilev, Vladimir Viro |
| 2025 | Symmetrical Awareness Generation for Pelvic Image Segmentation. Yize Song, Yunqing Chen, Zhou Wang, Cheng Chen, Ruoxiu Xiao |
| 2025 | SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning. Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim |
| 2025 | SynergyAmodal: Deocclude Anything with Text Control. Xinyang Li, Chengjie Yi, Jiawei Lai, Mingbao Lin, Yansong Qu, Shengchuan Zhang, Liujuan Cao |
| 2025 | SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models. Zheng Liu, Hao Liang, Bozhou Li, Wentao Xiong, Chong Chen, Conghui He, Wentao Zhang, Bin Cui |
| 2025 | Synthesizing 3D Scenes via Diffusion Model that Incorporates Indoor Scene Characteristics. Liang Yue, Shao-Kui Zhang, Lin Yuan, Yi-Tao Chen, Zirui Zhou, Song-Hai Zhang |
| 2025 | Synthetic-to-Real Camouflaged Object Detection. Zhihao Luo, Luojun Lin, Zheng Lin |
| 2025 | T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge Retrieval. Dong Li, Yichen Niu, Ying Ai, Xiang Zou, Biqing Qi, Jianxing Liu |
| 2025 | T23D-QA: An Open Dataset and Benchmark for Text-driven 3D Generation Quality Assessment. Haohui Li, Bowen Qu, Wei Gao |
| 2025 | T2UE: Generating Unlearnable Examples from Text Descriptions. Xingjun Ma, Hanxun Huang, Tianwei Song, Ye Sun, Yifeng Gao, Yu-Gang Jiang |
| 2025 | T2VParser: Adaptive Decomposition Tokens for Partial Alignment in Text to Video Retrieval. Yili Li, Gang Xiong, Gaopeng Gou, Xiangyan Qu, Jiamin Zhuang, Zhen Li, Junzheng Shi |
| 2025 | TAMER: Interest Tree Augmented Modality Graph Recommender for Multimodal Recommendation. Fanshen Meng, Zhenhua Meng, Ru Jin, Yuli Chen, Rongheng Lin, Budan Wu |
| 2025 | TAP: Parameter-efficient Task-Aware Prompting for Adverse Weather Removal. Hanting Wang, Shengpeng Ji, Shulei Wang, Hai Huang, Xiao Jin, Qifei Zhang, Tao Jin |
| 2025 | TASR: Timestep-Aware Diffusion Model for Image Super-Resolution. Qinwei Lin, Xiaopeng Sun, Yu Gao, Yujie Zhong, Zheng Zhao, Dengjie Li, Haoqian Wang |
| 2025 | TF-ATM: Training-Free Adaptive Token Merging. Xin Zhang, Weiying Xie, Yunsong Li, Xiaoyu Chen, Tianlin Hui, Jitao Ma, Leyuan Fang |
| 2025 | TFPA: Text Features Guided Dynamic Parameter Adjustment for Few Shot Action Recognition. Hanyu Guo, Suzhou Que, Junlong Gao, Hanzi Wang |
| 2025 | TNT-GS: Truncated and Tailored Gaussian Splatting. Xiaofeng Liu, Guanchen Meng, Chongyang Feng, Risheng Liu, Zhongxuan Luo, Xin Fan |
| 2025 | TPDepth: Leveraging Text Prompts with ControlNet to Boost Diffusion-based Depth Estimation. Yu Liu, Kun Sun, Chang Tang, Yuhua Qian, Xin Li |
| 2025 | TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors. Mingwei Li, Pu Pang, Hehe Fan, Hua Huang, Yi Yang |
| 2025 | TV-RAG: A Temporal-aware and Semantic Entropy-Weighted Framework for Long Video Retrieval and Understanding. Zongsheng Cao, Yangfan He, Anran Liu, Jun Xie, Feng Chen, Zhepeng Wang |
| 2025 | TabiMed: Tabularizing Medical Images for Few-Shot In-Context Diagnosis. Wanying Zhou, Yuqi Sun, Yu Ling, Zhen Xing, Chenxi Ma, Weimin Tan, Bo Yan |
| 2025 | Tackling Device Data Distribution Real-time Shift via Prototype-based Parameter Editing. Zheqi Lv, Wenqiao Zhang, Kairui Fu, Qi Tian, Shengyu Zhang, Jiajie Su, Jingyuan Chen, Kun Kuang, Fei Wu |
| 2025 | Talk, Imagine, Evolve: A Unified Multimodal Agent for Seamless Visual Generation and Editing. Zhaofan Qiu, Zijian Gong, Yingwei Pan, Ting Yao, Tao Mei |
| 2025 | Talking Head Generation via Viewpoint and Lighting Simulation Based on Global Representation. Biao Dong, Lei Zhang |
| 2025 | Taming Anomalies with Down-Up Sampling Networks: Group Center Preserving Reconstruction for 3D Anomaly Detection. Hanzhe Liang, Jie Zhang, Tao Dai, Linlin Shen, Jinbao Wang, Can Gao |
| 2025 | Target-Guided Bayesian Flow Networks for Quantitatively Constrained CAD Generation. Wenhao Zheng, Chenwei Sun, Wenbo Zhang, Jiancheng Lv, Xianggen Liu |
| 2025 | Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts. Wenju Sun, Qingyong Li, Wen Wang, Yangliao Geng, Boyang Li |
| 2025 | Teaching AI to Feel: A Collaborative, Full-Body Exploration of Emotive Communication. Esen K. Tütüncü, Lissette Lemus, Kris Pilcher, Holger Sprengel, Jordi Sabater-Mir |
| 2025 | Team RoMa @ AADD-2025: On the Generation of Transferable and Visually Imperceptible Adversarial Attacks Against Deepfake Detectors. Nicolas Göller, Lukas Graner, Raphael Antonius Frick, Niklas Bunzel |
| 2025 | TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection. Zhiming Ma, Peidong Wang, Minhua Huang, Jinpeng Wang, Kai Wu, Xiangzhao Lv, Yachun Pang, Yin Yang, Wenjie Tang, Yuchen Kang |
| 2025 | Temporal-Conditioned Symbolic Alignment for Controllable Text-to-Music Generation. Zihao Zhang, Xingjiao Wu, Junjie Xu, Tianlong Ma, Tangren Yao, Wen Wu, Liang He |
| 2025 | Temporal-coded Spiking Transformer. Qian Sun, Chengzhuo Lu, Wenyu Chen, Wenjie Wei, Jingya Wang, Jieyuan Zhang, Xiaoli Liu, Yalan Ye, Yang Yang, Malu Zhang |
| 2025 | Tensor-based Opposing yet Complementary Learning for Multi-view Multi-label Feature Selection. Pingting Hao, Huijie Zhang, Yongshan Zhang |
| 2025 | Test-Time Adaptation for Text-Based Person Search. Kai Niu, Liucun Shi, Ke Han, Qinzi Zhao, Yue Wu, Yanning Zhang |
| 2025 | Test-Time Adaptation of Medical Vision-Language Models with Mixture of Modality Experts. Hancong Wang, Yue Yu, Hairong Zheng, Tong Zhang |
| 2025 | Test-Time Model Adaptation for Quantized Neural Networks. Zeshuai Deng, Guohao Chen, Shuaicheng Niu, Hui Luo, Shuhai Zhang, Yifan Yang, Renjie Chen, Wei Luo, Mingkui Tan |
| 2025 | Test-time Graph OOD Detection via Dynamic Dictionary Expansion and OOD Score Calibration. Yue Hou, Yingke Su, Junran Wu, Ke Xu |
| 2025 | Text Prompted Spatiotemporal Sequence Prediction with Text-Vision Prompt Refiner and Masked Diffusion Transformers. Yechao Xu, Zhengxing Sun, Qian Li, Yunhan Sun |
| 2025 | Text as Any-Modality for Zero-Shot Classification by Consistent Prompt Tuning. Xiangyu Wu, Feng Yu, Yang Yang, Jianfeng Lu |
| 2025 | Text-Promptable Propagation for Referring Medical Image Sequence Segmentation. Runtian Yuan, Mohan Chen, Jilan Xu, Ling Zhou, Qingqiu Li, Yuejie Zhang, Rui Feng, Tao Zhang, Shang Gao |
| 2025 | Text-Visual Semantic Constrained AI-Generated Image Quality Assessment. Qiang Li, Qingsen Yan, Haojian Huang, Peng Wu, Haokui Zhang, Yanning Zhang |
| 2025 | Text-to-Image Generation with Multi-modal Knowledge Graph Construction and Retrieval. Jiawei Meng, Zhengmao Yang, Zhiqiang Liu, Shaokai Chen, Zhizhen Liu, Wen Zhang, Huajun Chen |
| 2025 | Text2Weight: Bridging Natural Language and Neural Network Weight Spaces. Bowen Tian, Wenshuo Chen, Zexi Li, Songning Lai, Jiemin Wu, Yutao Yue |
| 2025 | TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting. Zhicong Wu, Hongbin Xu, Gang Xu, Ping Nie, Zhixin Yan, Jinkai Zheng, Liangqiong Qu, Ming Li, Liqiang Nie |
| 2025 | Textual and Visual Guided Task Adaptation for Source-Free Cross-Domain Few-Shot Segmentation. Jianming Liu, Wenlong Qiu, Haitao Wei |
| 2025 | The 2025 Grand Challenge on Multimedia Verification: Foundations and Overview. Duc-Tien Dang-Nguyen, Morten Dahlback Langfeldt, Henrik Brattli Vold, Silje Førsund, Minh-Son Dao, Sohail Ahmed Khan, Kha-Luan Pham, Marc Gallofré Ocaña, Minh-Triet Tran, Anh-Duy Tran |
| 2025 | The ACM Multimedia 2025 Grand Challenge of Avatar-based Multimodal Empathetic Conversation. Han Zhang, Hao Fei, Hong Han, Lizi Liao, Erik Cambria, Min Zhang |
| 2025 | The ACM Multimedia 2025 Grand Challenge of Multimodal Conversational Aspect-based Sentiment Analysis. Meng Luo, Hao Fei, Bobo Li, Shengqiong Wu, Qian Liu, Soujanya Poria, Erik Cambria, Mong-Li Lee, Wynne Hsu |
| 2025 | The ACM Multimedia 2025 Grand Challenge of Truthful and Responsible Multimodal Learning. Xudong Han, Kai Liu, Yanlin Li, Hao Li, Zheng Wang |
| 2025 | The Best is Yet to Come: Graph Convolution in the Testing Phase for Multimodal Recommendation. Jinfeng Xu, Zheyu Chen, Shuo Yang, Jinze Li, Edith C. H. Ngai |
| 2025 | The Birth of Vision Language. Aminul Islam, Md. Mustakin Alam, Shaker Islam |
| 2025 | The CASTLE 2024 Dataset: Advancing the Art of Multimodal Understanding. Luca Rossetto, Werner Bailer, Duc-Tien Dang-Nguyen, Graham Healy, Björn Þór Jónsson, Onanong Kongmeesub, Hoang-Bao Le, Stevan Rudinac, Klaus Schöffmann, Florian Spiess, Allie Tran, Minh-Triet Tran, Quang-Linh Tran, Cathal Gurrin |
| 2025 | The Devil in the Stego Image: Far from Being Usable in Real-World Scenarios. Huanqi Wu, Huangbiao Xu, Xiao Ke |
| 2025 | The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework. Feiran Liu, Yuzhe Zhang, Xinyi Huang, Yinan Peng, Xinfeng Li, Lixu Wang, Yutong Shen, Ranjie Duan, Simeng Qin, Xiaojun Jia, Qingsong Wen, Wei Dong |
| 2025 | The First MPDD Challenge: Multimodal Personality-aware Depression Detection. Changzeng Fu, Zelin Fu, Qi Zhang, Xinhe Kuang, Jiacheng Dong, Kaifeng Su, Yikai Su, Wenbo Shi, Junfeng Yao, Yuliang Zhao, Shiqi Zhao, Jiadong Wang, Siyang Song, Chaoran Liu, Yuichiro Yoshikawa, Björn W. Schuller, Hiroshi Ishiguro |
| 2025 | The Overlooked Matters: Revisiting Background, Prototype, and Activation in Few-Shot Medical Image Segmentation. Yucheng Shu, Yaohui Wang, Lihong Qiao, Feiyan Li, Bin Xiao, Weisheng Li, Xinbo Gao |
| 2025 | Themis: Toward Stable Near-Zero Queuing Delay in Congestion Control for Low-Latency Interactive Video Streaming. Feida Liu, Yifan Wang, Jiaqi Zheng, Boxi Liu, Guihai Chen |
| 2025 | ThermVision: Exploring FLUX for Synthesizing Hyper-Realistic Thermal Face Data and Animations via Image to Video Translation. Muhammad Ali Farooq, Waseem Shariff, Peter Corcoran |
| 2025 | Through Someone Else's Eyes: Lifelogging Meets Narrative Virtual Reality. Liang Xu, Songkai Jia, Cathal Gurrin, Monica Ward, Allie Tran |
| 2025 | Through The Mirage, Sky Meets Oculus: Rethinking Human-AI Romantic Relationships in a Posthumanist Context. Mingdong Song, Yufei Huang |
| 2025 | TiP4GEN: Text to Immersive Panorama 4D Scene Generation. Ke Xing, Hanwen Liang, Dejia Xu, YuYang Yin, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei |
| 2025 | TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos. Linli Yao, Yicheng Li, Yuancheng Wei, Lei Li, Shuhuai Ren, Yuanxin Liu, Kun Ouyang, Lean Wang, Shicheng Li, Sida Li, Lingpeng Kong, Qi Liu, Yuanxing Zhang, Xu Sun |
| 2025 | TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation. Ling You, Wenxuan Huang, Xinni Xie, Xiangyi Wei, Bangyan Li, Shaohui Lin, Yang Li, Changbo Wang |
| 2025 | TimesBERT: A BERT-Style Foundation Model for Time Series Understanding. Haoran Zhang, Yong Liu, Yunzhong Qiu, Haixuan Liu, Zhongyi Pei, Jianmin Wang, Mingsheng Long |
| 2025 | TinyServe: Query-Aware Cache Selection for Efficient LLM Serving. Dong Liu, Yanxuan Yu |
| 2025 | To Advance People's Well-Being: Human health sensing, analysis, and applications. Terumi Umematsu |
| 2025 | To Remember, To Adapt, To Preempt: A Stable Continual Test-Time Adaptation Framework for Remote Physiological Measurement in Dynamic Domain Shifts. Shuyang Chu, Jingang Shi, Xu Cheng, Haoyu Chen, Xin Liu, Jian Xu, Guoying Zhao |
| 2025 | TolerantECG: A Foundation Model for Imperfect Electrocardiogram. Huynh Dang Nguyen, Trong-Thang Pham, Ngan Le, Van Nguyen |
| 2025 | TongGu-VL: Advancing Visual-Language Understanding in Chinese Classical Studies through Parameter Sensitivity-Guided Instruction Tuning. Jiahuan Cao, Yang Liu, Peirong Zhang, Yongxin Shi, Kai Ding, Lianwen Jin |
| 2025 | Topic Guided Multi-faceted Semantic Disentanglement for CTR prediction. Fengxin Li, Zhiqian Yin, Hongyan Liu, Jingcai Guo, Jun He, Yi Li, Chao Zhou, Jun Zhang, Haijie Gu |
| 2025 | TopoImages: Incorporating Local Topology Encoding into Deep Learning Models for Medical Image Classification. Pengfei Gu, Hongxiao Wang, Yejia Zhang, Huimin Li, Chaoli Wang, Danny Chen |
| 2025 | Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation. Zhenghao Zhang, Junchao Liao, Xiangyu Meng, Long Qin, Weizhi Wang |
| 2025 | Toward Fast and Exact Machine Learning Platform for Big Data. Yasuhiro Fujiwara |
| 2025 | Toward Reliable Emotion Recognition: Alleviating Label Noise and Reducing Uncertain Prediction. Chengzhe Wang, Wenqing Ji, Chenyang Li, Tongjie Pan, Yalan Ye |
| 2025 | Toward Robust Deepfake Detection: A Proactive Method Based on Watermarking and Knowledge Distillation. Chunpeng Wang, Wenlong Ma, Li Zou, Zhiqiu Xia, Qi Li, Bin Ma, Yunan Liu |
| 2025 | Toward Robust Signed Graph Learning through Joint Input-Target Denoising. Junran Wu, Beng Chin Ooi, Ke Xu |
| 2025 | Toward a Training-Free Plug-and-Play Refinement Framework for Infrared and Visible Image Registration and Fusion. Yating Liu, Yang Zou, Xingyuan Li, Xingyue Zhu, Kaiqi Han, Zhiying Jiang, Long Ma, Jinyuan Liu |
| 2025 | Towards Blind Bitstream-corrupted Video Recovery: A Visual Foundation Model-driven Framework. Tianyi Liu, Kejun Wu, Chen Cai, Yi Wang, Kim-Hui Yap, Lap-Pui Chau |
| 2025 | Towards Consumer-Grade Cybersickness Prediction: Multi-Model Alignment for Real-Time Vision-Only Inference. Yitong Zhu, Zhuowen Liang, Yiming Wu, Tangyao Li, Yuyang Wang |
| 2025 | Towards Culturally Fair Multimodal Generation: Quantifying and Mitigating Orientalist Biases in Text-to-Visual Models. Yifan Zeng, Fangzhou Dong, Jian Zhao, Peijia Zheng, Jian Li, Huiyu Zhou |
| 2025 | Towards Effective Open-set Graph Class-incremental Learning. Jiazhen Chen, Zheng Ma, Sichao Fu, Mingbin Feng, Tony S. Wirjanto, Weihua Ou |
| 2025 | Towards Explainable Fake Image Detection with Multi-Modal Large Language Models. Yikun Ji, Yan Hong, Jiahui Zhan, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang |
| 2025 | Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis. Miaosen Luo, Yuncheng Jiang, Sijie Mai |
| 2025 | Towards Explainable Partial-AIGC Image Quality Assessment. Jiaying Qian, Ziheng Jia, Zicheng Zhang, Zeyu Zhang, Guangtao Zhai, Xiongkuo Min |
| 2025 | Towards Fine-Grained Human Motion Video Captioning. Guorui Song, Guocun Wang, Zhe Huang, Jing Lin, Xuefei Zhe, Jian Li, Haoqian Wang |
| 2025 | Towards Generalized Physical Occlusion Detection On Documents. Yiang Zhu, Haoyue Wang, Zhenxing Qian, Sheng Li, Xinpeng Zhang, Jian Liu |
| 2025 | Towards Good Generalizations for Diffusion Generated Image Detection Using Multiple Reconstruction Contrastive Learning. Wanyi Zhuang, Qi Chu, Tao Gong, Changtao Miao, Nenghai Yu |
| 2025 | Towards Harmless Multimodal Assistants with Blind Preference Optimization. Yongqi Li, Lu Yang, Jian Wang, Runyang You, Wenjie Li, Liqiang Nie |
| 2025 | Towards Hazardous Activity Recognition for A Novel Real-World Dataset. Shehzad Ali, Md Tanvir Islam, Ik Hyun Lee, Mingfu Xiong, Minh-Son Dao, Saeed Anwar, Sambit Bakshi, Khan Muhammad |
| 2025 | Towards High Robust Vision-Language Large Models: Benchmark and Method. Minyi Zhao, Yi Liu, Wensong He, Bingzhe Yu, Yuxi Mi, Shuigeng Zhou |
| 2025 | Towards Measuring and Modeling Geometric Structures in Time Series Forecasting via Image Modality. Mingyang Yu, Xiahui Guo, Peng Chen, Zhenkai Li, Yang Shu |
| 2025 | Towards Modality Generalization: A Benchmark and Prospective Analysis. Xiaohao Liu, Xiaobo Xia, Zhuo Huang, See-Kiong Ng, Tat-Seng Chua |
| 2025 | Towards Multi-Scenario Forecasting of Building Electricity Loads with Multimodal Data. Yongzheng Liu, Siru Zhong, Gefeng Luo, Weilin Ruan, Yuxuan Liang |
| 2025 | Towards Open-world Generalized Deepfake Detection: General Feature Extraction via Unsupervised Domain Adaptation. Midou Guo, Qilin Yin, Wei Lu, Xiangyang Luo |
| 2025 | Towards Perfection: Building Inter-component Mutual Correction for Retinex-based Low-light Image Enhancement. Luyang Cao, Han Xu, Jian Zhang, Lei Qi, Jiayi Ma, Yinghuan Shi, Yang Gao |
| 2025 | Towards Robust Multimodal Domain Generalization via Modality-Domain Joint Adversarial Training. Hongzhao Li, Hualei Wan, Liangzhi Zhang, Mingyuan Jiu, Shupan Li, Mingliang Xu, Muhammad Haris Khan |
| 2025 | Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion. Zongye Zhang, Bohan Kong, Qingjie Liu, Yunhong Wang |
| 2025 | Towards Space and Semantics: Object-Purified Representation Learning for Multi-Label Image Classification. Haifeng Zhao, Shuo Xu, Leilei Ma, Yufei Zhang, Lei Wang, Dengdi Sun |
| 2025 | Towards Temporal-Aware Multi-Modal Retrieval Augmented Generation in Finance. Fengbin Zhu, Junfeng Li, Liangming Pan, Wenjie Wang, Fuli Feng, Chao Wang, Huanbo Luan, Tat-Seng Chua |
| 2025 | Towards Training-Free Open-World Classification with 3D Generative Models. Xinzhe Xia, Weiguang Zhao, Yuyao Yan, Guanyu Yang, Rui Zhang, Kaizhu Huang, Xi Yang |
| 2025 | Towards Universal Perception through Language-Guided Open-World Object Detection. Zihan Wang, Yunhang Shen, Yuan Fang, Zuwei Long, Ke Li, Xing Sun, Jiao Xie, Shaohui Lin |
| 2025 | Towards a Global Spatial-Temporal Food Memory: A Vision for Privacy-Preserving Collaborative Multimedia Analysis. Zhihao Hao, Bob Zhang, Haisheng Li |
| 2025 | Towards a New Paradigm of Visual Signal Compression. Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin |
| 2025 | Towards a Universal Query Representation for Multimodal Information Retrieval. Luca Rossetto, Heiko Schuldt, Ralph Gasser |
| 2025 | Toxicity Begets Toxicity: Unraveling Conversational Chains in Political Podcasts. Naquee Rizwan, Nayandeep Deb, Sarthak Roy, Vishwajeet Singh Solanki, Kiran Garimella, Animesh Mukherjee |
| 2025 | Tractography-Guided Dual-Label Collaborative Learning for Multi-Modal Cranial Nerves Parcellation. Lei Xie, Junxiong Huang, Yuanjing Feng, Qingrun Zeng |
| 2025 | Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs. Shaohui Dai, Yansong Qu, Zheyan Li, Xinyang Li, Shengchuan Zhang, Liujuan Cao |
| 2025 | Traits Run Deep: Enhancing Personality Assessment via Psychology-Guided LLM Representations and Multimodal Apparent Behaviors. Jia Li, Yichao He, Jiacheng Xu, Tianhao Luo, Zhenzhen Hu, Richang Hong, Meng Wang |
| 2025 | Transfer Attack for Bad and Good: Explain and Boost Adversarial Transferability across Multimodal Large Language Models. Hao Cheng, Erjia Xiao, Jiayan Yang, Jinhao Duan, Yichi Wang, Jiahang Cao, Qiang Zhang, Le Yang, Kaidi Xu, Jindong Gu, Renjing Xu |
| 2025 | Transform your Smartphone in a Real-time Sonagram Player. Jean-Denis Durou, Jean Mélou, Yvain Quéau, Gilles Azzaro, Hugo Pauget Ballesteros, Gabriel Gournay, Achille Jeanvoine, Clément Lacire, Floriane Payen, Julie Remenant |
| 2025 | Tree of Prompts: Aligning Hierarchical Visual Prior for Continual Generalized Category Discovery. Yiqing Hao, Yangru Huang, Yi Jin, Tao Wang, Yidong Li, Yigang Cen |
| 2025 | Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree. Qi Peng, Jialin Cui, Jiayuan Xie, Yi Cai, Qing Li |
| 2025 | TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP. Fan Li, Zanyi Wang, Zeyi Huang, Guang Dai, Jingdong Wang, Mengmeng Wang |
| 2025 | TriGS: Tri-consistency 3D Gaussian Splatting from Sparse and Unposed Views. Chi Huang, Qi Zhang, Qian Zhang, Nan Li, Yipu Gong, Xiaowei Wang, Wei Feng |
| 2025 | TrueCount: Improving Open-World Object Counting with Visual-Language Models and Dynamic Multi-Modal Inputs. Ziqiang Shi, Rujie Liu, Jun Takahashi, Shan Jiang |
| 2025 | TrustCLIP: Learning from Noisy Labels via Semantic Label Verification and Trust-aligned Gradient Projection. Xueyi Zhang, Peiyin Zhu, Yuan Liao, Xiyu Wang, Mingrui Lao, Siqi Cai, Yanming Guo, Haizhou Li |
| 2025 | Trusted Open-World Multi-View Classification with Dynamic Opinion Aggregation. Zhicheng Dong, Xiaodong Yue, Yufei Chen, Yuxian Zhou |
| 2025 | Try Harder: Hard Sample Generation and Learning for Cloth-Changing Person Re-ID. Hankun Liu, Yujian Zhao, Guanglin Niu |
| 2025 | Turing Patterns for Multimedia: Reaction-Diffusion Multi-Modal Fusion for Language-Guided Video Moment Retrieval. Xiang Fang, Wanlong Fang, Wei Ji, Tat-Seng Chua |
| 2025 | Twin Co-Adaptive Dialogue for Progressive Image Generation. Jianhui Wang, Yangfan He, Yan Zhong, Xinyuan Song, Jiayi Su, Yuheng Feng, Ruoyu Wang, Hongyang He, Wenyu Zhu, Xinhang Yuan, Miao Zhang, Keqin Li, Jiaqi Chen, Tianyu Shi, Xueqian Wang |
| 2025 | Two-Stage Approach Using Pretrained Language Models for Question Answering on Japanese Document Images. Mizuki Yamano, Keito Fukuoka, Hisashi Miyamori |
| 2025 | Two-View Correspondence Pruning via Channel-Spatial Interaction and Bidirectional Consensus Interaction. Xiangui Huang, Taotao Lai, Yizhang Liu, Shuyuan Lin, Zuoyong Li |
| 2025 | Tyee: A Unified, Modular, and Fully-Integrated Configurable Toolkit for Intelligent Physiological Health Care. Tao Zhou, Lingyu Shu, Zixing Zhang, Jing Han |
| 2025 | U-MERE: Unconstrained Multimodal Entity and Relation Extraction with Collaborative Modeling and Order-Sensitive Optimization. Wei Jia, Li Jin, Kaiwen Wei, Yuying Shang, Nayu Liu, Zhicong Lu, Qing Liu, Linhao Zhang, Jiang Zhong, Yanfeng Hu |
| 2025 | UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents. Jianqiang Xiao, Yuexuan Sun, Yixin Shao, Boxi Gan, Rongqiang Liu, Yanjin Wu, Weili Guan, Xiang Deng |
| 2025 | UEMM-Air: Enable UAVs to Undertake More Multi-modal Tasks. Liang Yao, Fan Liu, Shengxiang Xu, Chuanyi Zhang, Shimin Di, Xing Ma, Jianyu Jiang, Zequan Wang, Jun Zhou |
| 2025 | UIS-Mamba: Exploring Mamba for Underwater Instance Segmentation via Dynamic Tree Scan and Hidden State Weaken. Runmin Cong, Zongji Yu, Hao Fang, Haoyan Sun, Sam Kwong |
| 2025 | UMSD: High Realism Motion Style Transfer via Unified Mamba-based Diffusion. Ziyun Qian, Zeyu Xiao, Xingliang Jin, Dingkang Yang, Mingcheng Li, Zhenyi Wu, Dongliang Kou, Peng Zhai, Lihua Zhang |
| 2025 | UR-MAT: A Multimodal, Material-Aware Synthetic Dataset of Urban Scenarios. Debora Russo, Nicola Mazzocca, Valeria Vittorini |
| 2025 | UVG-CWI-DQPC: Dual-Quality Point Cloud Dataset for Volumetric Video Applications. Guillaume Gautier, Xuemei Zhou, Thong Nguyen, Jack Jansen, Louis Fréneau, Marko Viitanen, Uyen Phan, Jani Käpylä, Irene Viola, Alexandre Mercat, Pablo César, Jarno Vanne |
| 2025 | UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space. Yong Liu, Jinshan Pan, Yinchuan Li, Qingji Dong, Chao Zhu, Yu Guo, Fei Wang |
| 2025 | Uncertainty-Guided Face Matting for Occlusion-Aware Face Transformation. Hyebin Cho, Jaehyup Lee |
| 2025 | Understand, Refine and Summarize: Multi-View Knowledge Progressive Enhancement Learning for Fake News Video Detection. Zhi Zeng, Jiaying Wu, Minnan Luo, Xiangzheng Kong, Zihan Ma, Guang Dai, Qinghua Zheng |
| 2025 | Uni-DocDiff: A Unified Document Restoration Model Based on Diffusion. Fangmin Zhao, Weichao Zeng, Zhenhang Li, Dongbao Yang, Binbin Li, Xiaojun Bi, Yu Zhou |
| 2025 | Uni-Layout: Integrating Human Feedback in Unified Layout Generation and Evaluation. Shuo Lu, Yanyin Chen, Wei Feng, Jiahao Fan, Fengheng Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Ching Law, Jian Liang |
| 2025 | Uni-Sight: An E2E Vision-Language-Action System Unifying Multi-View Alignment and Multi-Modal Fusion. Daixun Li, Sibo He, Jiayun Tian, Yusi Zhang, Weiying Xie, Mingxiang Cao, Donglai Liu, Zirui Li, Tianlin Hui, Rui Huang, Yunsong Li |
| 2025 | UniAD: Integrating Geometric and Semantic Cues for Unified Anomaly Detection. Xiaodong Wang, Hongmin Hu, Fei Yan, Junwen Lu, Zhiqiang Zeng, Weidong Hong, Zhedong Zheng |
| 2025 | UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing. Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, Jiang Bian |
| 2025 | UniEmotion: A Unified Framework for Multimodal Emotion Recognition with Iterative Consensus-based Training. Yanjie Sun, Wuyang Chen, Yong Dou |
| 2025 | UniFlowRestore: A General Video Restoration Framework via Flow Matching and Prompt Guidance. Shuning Sun, Yu Zhang, Chen Wu, Dianjie Lu, Guijuan Zhang, Yang Wen, Zhuoran Zheng |
| 2025 | UniMTR: Unified Recognition of Dual-style Traditional Mongolian Scripts via Contrastive Representation Alignment. Chenyang Zhou, Monghjaya Ha, Chao Tang, Licheng Wu |
| 2025 | UniRGB-IR: A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning. Maoxun Yuan, Bo Cui, Tianyi Zhao, Jiayi Wang, Shan Fu, Xue Yang, Xingxing Wei |
| 2025 | UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models. Jinke Li, Jiarui Yu, Chenxing Wei, Hande Dong, Qiang Lin, Liangjing Yang, Zhicai Wang, Yanbin Hao |
| 2025 | UniTalker: Conversational Speech-Visual Synthesis. Yifan Hu, Rui Liu, Yi Ren, Xiang Yin, Haizhou Li |
| 2025 | Unicorn: Unified Neural Image Compression with One Number Reconstruction. Qi Zheng, Haozhi Wang, Zihao Liu, Jiaming Liu, Zhijian Hao, Bu Chen, Min Li, Rui Wan, Peiye Liu, Yanheng Lu, Dimin Niu, Jinjia Zhou, Minge Jing, Yibo Fan |
| 2025 | Unified Dual-Strategy Framework for Multi-Task Visual Question Answering. Shuoping Yang, Jun Yu |
| 2025 | Unified Medical Image Segmentation with State Space Modeling Snake. Ruicheng Zhang, Haowei Guo, Kanghui Tian, Jun Zhou, Mingliang Yan, Zeyu Zhang, Shen Zhao |
| 2025 | Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards. Song Yan, Hui Wei, Jinlong Fei, Guoliang Yang, Zhengyu Zhao, Zheng Wang |
| 2025 | Unknown Pixel Mask Based Fine-tuning of 2D Inpainting Models for Unbounded 3D Scene Generation from a Single Image. Dezhi Zheng, Kaijun Deng, Xianxu Hou, Jinbao Wang, Xiaoqin Wang, Linlin Shen |
| 2025 | Unleashing the Power of Data Generation in One-Pass Outdoor LiDAR Localization. Yidong Chen, Qi Li, Yuyang Yang, Wen Li, Sheng Ao, Cheng Wang |
| 2025 | Unlocking Financial Insights: An advanced Multimodal Summarization with Multimodal Output Framework for Financial Advisory Videos. Sarmistha Das, R. E. Zera Marveen Lyngkhoi, Sriparna Saha, Alka Maurya |
| 2025 | Unlocking Joint Image Deraining and Low-Light Enhancement: Benchmark and Baseline. Liang Cheng, Hao Wang, Chenwei Wu, Haochen You, Xianhao Wu |
| 2025 | Unsupervised Cross-Modal Person Search via Progressive Diverse Text Generation. Feng Chen, Jielong He, Yang Liu, Heng Liu, Zhe Chen, Yaxiong Wang |
| 2025 | Unsupervised Cross-view Message Passing Method for Multi-view Graph Clustering. Ziming Quan, Penglei Wang, Danyang Wu, Jin Xu |
| 2025 | Unsupervised Dual-Domain Memory Model for Time Series Anomaly Detection. Mingle Zhou, Xingli Wang, Jiachen Li, Delong Han, Gang Li |
| 2025 | Unsupervised Ego- and Exo-centric Dense Procedural Activity Captioning via Gaze Consensus Adaptation. Zhaofeng Shi, Heqian Qiu, Lanxiao Wang, Qingbo Wu, Fanman Meng, Hongliang Li |
| 2025 | Unsupervised Similarity-Fusion Transformer Hashing for Multimodal Retrieval. Zhan Yang, Binghong Chen, Jiajun Tang, Yinan Li |
| 2025 | Unveiling Open-set Noise: Theoretical Insights into Label Noise. Chen Feng, Nicu Sebe, Georgios Tzimiropoulos, Miguel R. D. Rodrigues, Ioannis Patras |
| 2025 | Unveiling the Impact of Multi-modal Content in Multi-modal Recommender Systems. Guipeng Xv, Xinyu Li, Yi Liu, Chen Lin, Xiaoli Wang |
| 2025 | VAEmo: Efficient Representation Learning for Visual-Audio Emotion With Knowledge Injection. Hao Cheng, Zhiwei Zhao, Yichao He, Zhenzhen Hu, Jia Li, Meng Wang, Richang Hong |
| 2025 | VER-Bench: Evaluating MLLMs on Reasoning with Fine-Grained Visual Evidence. Chenhui Qiang, Zhaoyang Wei, Xumeng Han, Zipeng Wang, Siyao Li, Xiangyuan Lan, Jianbin Jiao, Zhenjun Han |
| 2025 | VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control. Lifeng Lin, Rongfeng Lu, Quan Chen, Haofan Ren, Ming Lu, Yaoqi Sun, Chenggang Yan, Anke Xue |
| 2025 | VIDEA-8K-60FPS Dataset: 8K 60FPS Video Sequences for Analysis and Development. Tariq Al Shoura, Ali Mollaahmadi Dehaghi, Reza Razavi, Mohammad Moshirpour |
| 2025 | VIHand: Enhancing 3D Hand Pose Estimation with Visual-Inertial Benchmark. Xinyi Wang, Pengfei Ren, Haoyang Zhang, Xin Sheng, Da Li, Liang Xie, Yue Gao, Erwei Yin |
| 2025 | VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference. Pengfei Jiang, Hanjun Li, Linglan Zhao, Fei Chao, Ke Yan, Shouhong Ding, Rongrong Ji |
| 2025 | VL-DynaRefine: A Vision-Language Dynamic Refinement Approach for Visual Reasoning. Jing Ma, Haochen Sun, Zeyuan Zang, Fangxiang Feng, Caixia Yuan, Lei Ren, Huixing Jiang, Wei Chen, Xiaojie Wang |
| 2025 | VLHP: Learning Discriminative Vision-Language Hybrid Prototypes for Weakly Supervised Semantic Segmentation. Jingyuan Fang, Yang Ning, Xiushan Nie, Xinfeng Liu, Zhiyong Cheng |
| 2025 | VLM-based Prompts as the Optimal Assistant for Unpaired Histopathology Virtual Staining. Zizhi Chen, Xinyu Zhang, Minghao Han, Yizhou Liu, Ziyun Qian, Weifeng Zhang, Xukun Zhang, Jingwei Wei, Lihua Zhang |
| 2025 | VLMPlanner: Integrating Visual Language Models with Motion Planning. Zhipeng Tang, Sha Zhang, Jiajun Deng, Chenjie Wang, Guoliang You, Yuting Huang, Xinrui Lin, Yanyong Zhang |
| 2025 | VLN-ChEnv: Vision-language Navigation in Changeable Environments. Shubo Liu, Hongsheng Zhang, Qian Qiao, Qi Wu, Peng Wang |
| 2025 | VQA Ziheng Jia, Zicheng Zhang, Jiaying Qian, Haoning Wu, Wei Sun, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Xiongkuo Min |
| 2025 | VRMusicStage: A System for Converting Fixed-Camera Music Stage Videos into Immersive VR Content. Seungkyu Leem, Seokhyun Jeong, Yeonho Cho, Yoonjae Lee, Jungjin Lee |
| 2025 | VSumMamba: Mamba Empowered Efficient Video Summarization with Multi-Scale Spatial-Temporal Modeling. Yamiao Ding, Tianrui Liu, Zhizhou Lu, Jun-Jie Huang, Wentao Zhao, Xinwang Liu, Meng Wang |
| 2025 | VaF-LangSplat: Voxel-Aware Fusion Language Gaussian Splatting. Changzhou Li, Xinyu Yang, Weiguo Yang, Xinyi Li |
| 2025 | Valor32k-AVQA v2.0: Open-Ended Audio-Visual Question Answering Dataset and Benchmark. Ines Riahi, Abduljalil Radman, Zixin Guo, Rachid Hedjam, Jorma Laaksonen |
| 2025 | Vector-Quantized Vision Foundation Models for Object-Centric Learning. Rongzhen Zhao, Vivienne Huiling Wang, Juho Kannala, Joni Pajarinen |
| 2025 | Venus: Generating Large-scale mmWave Radar Data via Few 2D Videos for Gesture Recognition While Lying Down. Yue Ling, Dong Zhao, Kaikai Deng, Kangwen Yin, Zixiao He, Yizong Wang, Huadong Ma |
| 2025 | Versatile Multimodal Controls for Expressive Talking Human Animation. Zheng Qin, Ruobing Zheng, Yabing Wang, Tianqi Li, Zixin Zhu, Sanping Zhou, Ming Yang, Le Wang |
| 2025 | ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models. Yongheng Zhang, Xu Liu, Ruihan Tao, Qiguang Chen, Hao Fei, Wanxiang Che, Libo Qin |
| 2025 | ViTraj: Learning Dual-Side Representations for Vehicle-Infrastructure Cooperative Trajectory Prediction. Shengzhe You, Libo Weng, Fei Gao |
| 2025 | VibeSpace: Automatic Generation of Data and Vector Embeddings for Arbitrary Domains and Cross-domain Mappings using LLMs. Kipp Freud, Daniel E. Collins, Delmiro D. Sampaio Neto, Grant Stevens |
| 2025 | VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity Recognition. Zhuming Wang, Yihao Zheng, Jiarui Li, Yaofei Wu, Yan Huang, Zun Li, Lifang Wu, Liang Wang |
| 2025 | VidIQ: Inference-Aware Neural Codecs for Quality-Enhanced, Real-Time Video Analytics. Andong Zhu, Sheng Zhang, Xiaohang Shi, Hesheng Sun, Yu Liang, Zhuzhong Qian, Han Zheng, Xiaokun Wang, Ning Jiang |
| 2025 | Video Content Restoration in the Wild: Challenges and Opportunities. Guan-Ming Su |
| 2025 | Video Instance Segmentation by Weighted Structure Inference. Zheyun Qin, Deng Yu, Yang Shi, Qiangchang Wang, Zhumin Chen |
| 2025 | Video Lecture Analysis Toolkit: An Open-Source Framework for Interactive Learning. Travis Seng, Axel Carlier, Wei Tsang Ooi |
| 2025 | Video Question Answering and Beyond. Yicong Li, Junbin Xiao, Angela Yao, Tat-Seng Chua |
| 2025 | Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought. Shuyi Zhang, Xiaoshuai Hao, Yingbo Tang, Lingfeng Zhang, Pengwei Wang, Zhongyuan Wang, Hongxuan Ma, Shanghang Zhang |
| 2025 | Video-Level Multimodal Relation Extraction with Event-Entity Semantic Consistency. Zefan Zhang, Weiqi Zhang, Kailong Suo, Yanhui Li, Tian Bai |
| 2025 | Video-based Transparent Object Segmentation via Temporal Feature Aggregation. Zhen Wang, Dongyuan Li, Yaozu Wu, Peide Zhu, Shiyin Tan, Renhe Jiang |
| 2025 | Video-to-Image Affordance Grounding via Visual Conceptual Learning. Zhiyuan Fan, Keyi Liang |
| 2025 | VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering. Yiran Meng, Junhong Ye, Wei Zhou, Guanghui Yue, Xudong Mao, Ruomei Wang, Baoquan Zhao |
| 2025 | ViewGauss: A Head Movement Dataset for 6DoF Gaussian Splatting Video Viewing. Zhixia Zhao, Qiyue Li, Jie Li, Richang Hong, Zhi Liu |
| 2025 | ViewSparsifier: Killing Redundancy in Multi-View Plant Phenotyping. Robin-Nico Kampa, Fabian Deuser, Konrad Habel, Norbert Oswald |
| 2025 | VisAug: Facilitating Speech-Rich Web Video Navigation and Engagement with Auto-Generated Visual Augmentations. Baoquan Zhao, Xiaofan Ma, Qianshi Pang, Ruomei Wang, Fan Zhou, Shujin Lin |
| 2025 | Vision Transformer with Sparse Scan Prior. Yuguang Zhang, Qihang Fan, Huaibo Huang |
| 2025 | Visual Context Window Extension: A New Perspective for Long Video Understanding. Hongchen Wei, Zhenzhong Chen |
| 2025 | Visual Grounding with Attention-Driven Constraint Balancing. Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan |
| 2025 | Visual Instance-aware Prompt Tuning. Xi Xiao, Yunbei Zhang, Xingjian Li, Tianyang Wang, Xiao Wang, Yuxiang Wei, Jihun Hamm, Min Xu |
| 2025 | Visual Localization using Hybrid Feature Grid and Learned Weighted Global Point Cloud. Junyi Wang, Yue Qi |
| 2025 | Visual Perception Uncertainty Learning for Hallucination Detection in Large Vision-Language Models. Runze Zhao, Fuqing Zhu, Jizhong Han, Songlin Hu |
| 2025 | Visual-Enhanced Multimodal Framework for Flexible Job Shop Scheduling Problem. Peng Zhao, Zhiguang Cao, Di Wang, Wen Song, Wei Pang, You Zhou, Yuan Jiang |
| 2025 | Visual-informed Silent Video Identity Conversion. Yifan Liu, Yu Fang, Zhouhan Lin |
| 2025 | WFF: Wavelet-based Information Fusion for Multimodal Knowledge Graph Link Prediction. Xiaodi Xu, Lijie Li, Ye Wang, Tao Ren, Tian Qiao |
| 2025 | WMamba: Wavelet-based Mamba for Face Forgery Detection. Siran Peng, Tianshuo Zhang, Li Gao, Xiangyu Zhu, Haoyuan Zhang, Kai Pang, Zhen Lei |
| 2025 | WYSIWYG: What You See Is Where Your Gaze. Raphaëlle Lemaire, Azamat Kaibaldiyev, Eléonore Mariette, Débora Viglieri, Alexis Lechervy, Fabrice Maurel, Gaël Dias, Jérémie Pantin, Gaëtane Blaizot, Véronique Agin, Nicolas Poirel, Eric Bui, Hervé Platel, Denis Vivien, Youssef Chahir |
| 2025 | Walking-with-Portals vs. Teleport in VR: Why Walking and Portals Matter in Small Spaces. Ana Rita Rebelo, Pedro A. Ferreira, André Tomás Ribeiro, Rui Nóbrega |
| 2025 | Watch, Skip, Repeat: Hotspot-Aware Joint Optimization for Video Streaming. Daoxu Sheng, Qi Qi, Jingyu Wang, Jianxin Liao |
| 2025 | WaveCL: Wavelet Calibration Learning for Referring Video Object Segmentation. Ran Chen, Taiyi Su, Hanli Wang |
| 2025 | Wavelet-GS: 3D Gaussian Splatting with Wavelet Decomposition. Beizhen Zhao, Yifan Zhou, Sicheng Yu, Zijian Wang, Hao Wang |
| 2025 | Waymo-3DSkelMo: A Multi-Agent 3D Skeletal Motion Dataset for Pedestrian Interaction Modeling in Autonomous Driving. Guangxun Zhu, Shiyu Fan, Hang Dai, Edmond S. L. Ho |
| 2025 | Wearable Music2Emotion: Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion. Sha Zhao, Song Yi, Yangxuan Zhou, Jiadong Pan, Jiquan Wang, Jie Xia, Shijian Li, Shurong Dong, Gang Pan |
| 2025 | WeatherBench: A Real-World Benchmark Dataset for All-in-One Adverse Weather Image Restoration. Qiyuan Guan, Qianfeng Yang, Xiang Chen, Tianyu Song, Guiyue Jin, Jiyu Jin |
| 2025 | WetCat: Enabling Automated Skill Assessment in Wet-Lab Cataract Surgery Videos. Negin Ghamsarian, Raphael Sznitman, Klaus Schoeffmann, Jens Kowal |
| 2025 | What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation. Jianghang Lin, Yue Hu, Jiangtao Shen, Yunhang Shen, Liujuan Cao, Shengchuan Zhang, Rongrong Ji |
| 2025 | When Headlines Meet Minds: Empowering News Recommendations with Social Simulator. Yanwei Xie, Weizhi Nie, Lanjun Wang, Hongshuo Tian, Changtai Shi, An-An Liu |
| 2025 | Where Views Meet Curves: Virtual Anchors for Hyperbolic Multi-View Graph Diffusion. Jielong Lu, Zhihao Wu, Jiajun Yu, Qianqian Shen, Jiajun Bu, Haishuai Wang |
| 2025 | Where Watermark Meets Beauty: Expert-Guided Aesthetic Visible Watermarking for Digital Artworks. Changjuan Ran, Fang Liu, Runqi Fang, Xiangyu Meng, Shenglan Cui, Yunfan Ye |
| 2025 | WhiADD: Semantic-Acoustic Fusion for Robust Audio Deepfake Detection. Jianqiao Cui, Bingyao Yu, Qihao Wang, Fei Meng, Jiwen Lu |
| 2025 | Why Generate When You Can Transform? Unleashing Generative Attention for Dynamic Recommendation. Yuli Liu, Wenjun Kong, Weizhi Ma, Cheng Luo |
| 2025 | Why is a Bird's Caption a Good Demonstration? Towards Effective Multimodal In-Context Learning without Dedicated Data. Junlin Fang, Wenya Wang, Lingli Zhang, Fengmao Lv |
| 2025 | Wild3A: Novel View Synthesis from Any Dynamic Images in Seconds. Mingrui Li, Shuhao Zhai, Zibing Zhao, Luyue Sun, Xinxiao Wang, Dong Li, Shuhong Liu, Hongyu Wang |
| 2025 | Will AI Make Agencies Obsolete? Rethinking the Future of Advertising. Aleksandr Farseev |
| 2025 | WisWheat: A Three-Tiered Vision-Language Dataset for Wheat Management. Bowen Yuan, Selena Song, Javier Fernandez, Yadan Luo, Mahsa Baktashmotlagh, Zijian Wang |
| 2025 | XReco Platform and RAI News Media Demonstrator. Roberto Iacoviello, Alberto Ciprian, Alberto Messina, Maurizio Montagnuolo, Davide Zappia |
| 2025 | Zero Matrix guided Adaptive Image Vaccine against Diffusion Model-based Multitask. Yujiang Li, Zhili Zhou, Ruohan Meng, Baowei Wang, Xiaojuan Wang, Cheng Qiao, Jiantao Zhou |
| 2025 | Zero in on the Target: A Composite Robust Model for Retrieving Information in Traffic Data to Discover Network Attacks. Ziang Li, Chengxiang Si, Zhenyu Cheng |
| 2025 | Zero-Shot Multimodal Fact-Checking with Conceptual Reasoning. Guoyi Li, Die Hu, Haozhe Li, Qirui Tang, Xiaomeng Fu, Yulei Wu, Xiaodan Zhang, Honglei Lyu |
| 2025 | Zero-shot Compositional Action Recognition with Neural Logic Constraints. Gefan Ye, Lin Li, Kexin Li, Jun Xiao, Long Chen |
| 2025 | diveXplore - An Open-Source Software for Modern Video Retrieval with Image/Text Embeddings. Mario Leopold, Farzad Tashtarian, Klaus Schoeffmann |
| 2025 | eSkinHealth: A Multimodal Dataset for Neglected Tropical Skin Diseases. Janet Wang, Xin Hu, Yunbei Zhang, Diabate Almamy, Vagamon Bamba, Konan Amos Sébastien Koffi, Koffi Aubin Yao, Zhengming Ding, Jihun Hamm, Rie Roselyne Yotsu |
| 2025 | Ægis: AI-Enhanced OSINT for Multimedia Verification. Minh-Anh Pham, Anh-Tai Pham-Nguyen, Anh-Duy Le, Duc-Tuan Luu, Thanh-Hai Tran, Anh-Duy Tran, Duc-Tien Dang-Nguyen |