| 2025 | A Comparative Study of Conversational and Conventional Search Methods for Image Retrieval. Anastasiia Potiagalova, Joemon J. Jose, Benjamin R. Cowan, Gareth J. F. Jones |
| 2025 | A Mixed-Methods Investigation of XR Security Warnings - Lessons Learned. Junyi Zou, Riccardo Bovo, Ali Hamza, Georgios Loukas |
| 2025 | A New Pipeline for Extracting and Clustering Sub-Images from Unannotated Complex Image Datasets. Chafic Abou Akar, Christian Beddawi, Marc Kamradt, Abdallah Makhoul |
| 2025 | A Survey of Information Disorder on Video-Sharing Platforms. Meiyu Li, Wei Ai, Naeemul Hassan |
| 2025 | Accelerating Vector Search at Scale: BAM-ANN with Batch-Aware Memory-Disk Hybrid Indexing. M. M. Mahabubur Rahman, Jelena Tesic |
| 2025 | AgriPotential: A Novel Multi-Spectral and Multi-Temporal Remote Sensing Dataset for Agricultural Potentials. Mohammad El Sakka, Caroline De Pourtales, Lotfi Chaâri, Josiane Mothe |
| 2025 | An Experimental Study on Generating Plausible Textual Explanations for Video Summarization. Thomas Eleftheriadis, Evlampios Apostolidis, Vasileios Mezaris |
| 2025 | Anonymisation of Visual Lifelogs using Diffusion Models and Large Language Models. Minh-Quang Le, Graham Healy, Liting Zhou, Cathal Gurrin |
| 2025 | BandNaviHD: Band-Member Backtrack Interface Based on Member History Information. Masatoshi Hamanaka |
| 2025 | Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes. Nirmal Elamon, Rouzbeh Davoudi |
| 2025 | Breaking the 2D Dependency: What Limits 3D-Only Open-Vocabulary Scene Understanding. Domenico D'Orsi, Fabio Carrara, Fabrizio Falchi, Nicola Tonellotto |
| 2025 | DSI-3D: Differentiable Search Index for Point Clouds Retrieval. Chahine-Nicolas Zede, Laurent Caraffa, Valérie Gouet-Brunet |
| 2025 | Dialogue-AV: A Dialogue-Attended Audiovisual Dataset. Luís Vilaça, Paula Viana, Yi Yu |
| 2025 | Does CLIP Perceive Art the Same Way We Do? Andrea Asperti, Leonardo Dessì, Maria Chiara Tonetti, Nico Wu |
| 2025 | Dual-Objective Adversarial Disentanglement for Protecting Speech Data used for Diagnosing Parkinson's Disease. Mehtab Ur Rahman, Martha A. Larson, Louis ten Bosch, Cristian Tejedor García |
| 2025 | ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval. Guanqi Zhan, Yuanpei Liu, Kai Han, Weidi Xie, Andrew Zisserman |
| 2025 | EcoStream: A Resource Utilization and Power Consumption Dataset in Multimedia Streaming for Sustainability Analysis. Tariq Al Shoura, Reza Razavi, Mohammad Moshirpour |
| 2025 | Enhancing Vision-Language Model Pre-Training with Image-Text Pair Pruning Based on Word Frequency. Mingliang Liang, Martha A. Larson |
| 2025 | Evaluating the Recognisability of AI-Generated Familiar Images in a Closed Environment with a Gamified Approach. Marc Gallofré Ocaña, Balázs Mosolygó, Bahareh Fatemi |
| 2025 | Examining Performance Disparities Between Expert and Novice Users in Interactive Video Retrieval. Omar Shahbaz Khan, Ujjwal Sharma, Stevan Rudinac, Björn Þór Jónsson |
| 2025 | Explanatory Interactive Machine Learning for Bias Mitigation in Visual Gender Classification. Nathanya Queby Satriani, Djordje Slijepcevic, Markus Schedl, Matthias Zeppelzauer |
| 2025 | Exploring the Effect of Size, Architecture and Fine-Tuning Hyperparameters on Large Visual-Language Model Adaptation for Video Memorability Prediction. David Luna-García, Iván Martín-Fernández, Sergio Esteban Romero, Manuel Gil-Martín, Fernando Fernández Martínez |
| 2025 | FPN-Based Multi-Scale Feature Fusion for Robust 3D Pedestrian Detection in Crowded Scenes. Kiyotaka Matsue, Kenta Umene, Nghia Dao, Hieu Nguyen, Manh Phan |
| 2025 | Facilitating Interactive Image Labelling Using Fine-Tuned SAM2. Hermann Fürntratt, Werner Bailer |
| 2025 | First-Person Human Sensing for Upper Limb Neuroprosthesis Control: 6D Pose Estimation of Objects to Grasp. Ander Etxezarreta, Jenny Benois-Pineau, Renaud Péteri, Lucas Bardisbanian, Aymar de Rugy |
| 2025 | Fusion of Global and Local Features with Multi-Inverted Indices for Image Retrieval. Li Weng, Xizhe Wang, Qianneng Wang, Bingya Wu |
| 2025 | GTR: General Handwritten Lines Text Recognition Dataset. Xu Ji, Haizhao Sun, Yu Ning, Ming Wu, Chuang Zhang |
| 2025 | GeMix: Conditional GAN-Based Mixup for Improved Medical Image Augmentation. Hugo Carlesso, Maria Eliza Patulea, Moncef Garouani, Radu Tudor Ionescu, Josiane Mothe |
| 2025 | GenFlow: Interactive Modular System for Image Generation. Duc-Hung Nguyen, Huu-Phuc Huynh, Minh-Triet Tran, Trung-Nghia Le |
| 2025 | Historical Postcard Stamp Content Understanding. Matthieu Pelingre, Salvatore Tabbone |
| 2025 | Hockey2D: A Keypoint-Based Framework for Ice Hockey Rink Localization and Object Mapping. Mehdi Houshmand Sarkhoosh, Cise Midoglu, Saeed Shafiee Sabet, Tomas Kupka, Pål Halvorsen |
| 2025 | International Conference on Content-Based Multimedia Indexing, CBMI 2025, Dublin, Ireland, October 22-24, 2025 |
| 2025 | Label-Efficient Skeleton-Based Recognition with Stable Graph Convnets. Hichem Sahbi |
| 2025 | Lip Reading Across Languages: A Cross-Modal Framework Leveraging Foundation Models. Ruxandra Tapu, Bogdan Mocanu |
| 2025 | MERCI: A Multimodal Dataset for Personalised and Emotionally-Aware Dialogues. Mohammed Althubyani, Zhijin Meng, Shengyuan Xie, Francisco Cruz, Imran Razzak, Mukesh Prasad, Eduardo B. Sandoval, A. Baki Kocaballi |
| 2025 | MI-Cap: A Multi-Modal Interpretable Model for Video Captioning. Antoine Hanna-Asaad, Decky Aspandi-Latif, Titus Zaharia |
| 2025 | MMMS: Multi-Modal Multi-Surface Interactive Segmentation. Robin Schön, Julian Lorenz, Katja Ludwig, Daniel Kienle, Rainer Lienhart |
| 2025 | MSS: A Multilingual Spoofed Speech Dataset with Code-Switching for Anti-Spoofing Measures. Muhammad Hamza, Hafsa Ilyas, Junaid Mir, Ali Javed, Muhammad Haroon Yousaf, Ahmed Zoha |
| 2025 | Masked Spikformer: Gaussian based and Random Spike Masking for Energy-Efficient Spiking Transformers. Oumaima Marsi, Sebastien Ambellouis, José Mennesson, Cyril Meurie, Anthony Fleury, Charles Tatkeu |
| 2025 | Media Search: A Multi-Stage Image Retrieval Framework with Enriched Image Captioning. Ayse Vildan Nurdag, Mete Mert Birdal, Yusuf Yazici, Baris Özcan, Erkut Arican |
| 2025 | Melanoma Segmentation with SAM-Like Models: Assessing the Influence and Limits of Bounding Box Input. Nicolas Martin, Philippe Mulhem, Jean-Pierre Chevallet |
| 2025 | Mitigating Shortcut Learning in Online Action Detection and Anticipation via Cross-Modal Semantic Alignment. Sensen Wang, Yuehu Liu, Chi Zhang |
| 2025 | Multi-modal Context Reranking for Lifelog Question Answering. Quang-Linh Tran, Ly-Duyen Tran, Binh T. Nguyen, Gareth J. F. Jones, Cathal Gurrin |
| 2025 | MultiHuSE: A Multimodal Dataset for Humour Styles and Emotions. Mary Ogbuka Kenneth, Foaad Khosmood, Abbas Edalat |
| 2025 | Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks. Jonas Geiger, Marta Moscati, Shah Nawaz, Markus Schedl |
| 2025 | NN Watermarking for Face Segmentation Task. Carl De Sousa Trias, Mihai Mitrea |
| 2025 | Novice-Friendly Video Retrieval in Mixed Reality with Vitrivr-VR. Florian Spiess, Heiko Schuldt |
| 2025 | Personalizing Retrieval Using Joint Embeddings; or "the Return of Fluffy". Bruno Korbar, Andrew Zisserman |
| 2025 | Predicting Moral Values in Lyrics Through Audio. Charalampos Saitis, Ben Heyderman, Vjosa Preniqi, Kyriaki Kalimeri, Johan Pauwels |
| 2025 | Probabilistic Fusion Model for Multi-Label Media Content Classification. Javier Carreno, Khuong An Nguyen, Zhiyuan Luo, Andrew Fish |
| 2025 | ReViewQwen: An Explainable Vision-Language Model for Discrepancy Detection in Multimodal E-Commerce Reviews. Sandeep Kalari, Mohan Sunkara, Dominik Soós, Vikas Ashok, Ravi Mukkamala |
| 2025 | Rethinking Wine Tasting for Chinese Consumers: A Service Design Approach Enhanced by Multimodal Personalization. Xinyang Shan, Yuanyuan Xu, Tian Xia, Yin-Shan Lin |
| 2025 | Robust Multimedia Verification of Cheapfakes and Deepfakes via External Context Leveraging. Minh-Nhat Nguyen, Trong-Nghia Tran, Minh-Triet Tran, Duc-Tien Dang-Nguyen, Trong-Le Do |
| 2025 | Semi-Supervised Approach to Detect Human Discontent from Real-Life Behaviour Data. Elena Vildjiounaite, Vesa Kyllönen, Johanna Kallio, Pauli Räsänen |
| 2025 | SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models. Zhengxu Tang, Zizheng Wang, Luning Wang, Zitao Shuai, Chenhao Zhang, Siyu Qian, Yirui Wu, Bohao Wang, Haosong Rao, Zhenyu Yang, Chenwei Wu |
| 2025 | SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding. Sushant Gautam, Cise Midoglu, Vajira Lasantha Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah |
| 2025 | TREB: Temporal Refinement of Egocentric Body Pose. Bruno Henriques, Benjamin Allaert, Nicolas Sutton-Charani, Pierre R. L. Slangen, Jean-Philippe Vandeborre |
| 2025 | TSalV360: A Method and Dataset for Text-driven Saliency Detection in 360-Degrees Videos. Ioannis Kontostathis, Evlampios Apostolidis, Vasileios Mezaris |
| 2025 | Text-Oriented Image Query Representation for Zero-Shot Composed Image Retrieval. Pavan K. Rachabathuni, Andrea Ciamarra, Roberto Caldelli, Marco Bertini |
| 2025 | Toward Content-Based Indexing and Retrieval of Head and Neck CT With Abscess Segmentation. Thao Thi Phuong Dao, Tan-Cong Nguyen, Trong-Le Do, Truong Hoang Viet, Nguyen Chi Thanh, Huynh Nguyen Thuan, Do Vo Cong Nguyen, Minh-Khoi Pham, Mai-Khiem Tran, Viet-Tham Huynh, Trong-Thuan Nguyen, Trung-Nghia Le, Thanh-Nhan Vo, Tam V. Nguyen, Minh-Triet Tran, Thanh Dinh Le |
| 2025 | Toward an Energy-Efficient and Explainable Neural Network Architecture for Detection of Breast Cancer in Mammography. Alireza Siyavashi, Christian Herglotz |
| 2025 | Towards Graph-Based Federated Learning: ModelNet - A ResNet-based Model Classification Dataset. Abhisek Ray, Lukas Esterle |
| 2025 | TrueEar: A Lightweight and Accurate Fake Voice Detector for Mobile Devices. Cameron Baird, Ke Li, Dan Lin |
| 2025 | U-Cker: Initial Development of an Interactive Video Retrieval System for Novice Users. Kazuya Ueki, Ryo Muto, Takuya Wada, Ryota Akaba, Genesis Faith Fernandez |
| 2025 | Understanding Indoor Context in an Office Environment: An Empirical Study on Air Stuffiness Perception. Johanna Kallio, Jussi Liikka, Satu-Marja Mäkelä, Atte Kinnula, Elena Vildjiounaite |
| 2025 | Vision Projector: Improving Zero-Shot Composed Image Retrieval at Inference. Hoang-Bao Le, Allie Tran, Binh T. Nguyen, Liting Zhou, Cathal Gurrin |
| 2025 | VoiceVision: AI-Powered Speaker-Aware Cropping and Content Indexing for Multi-Speaker Videos. Mehdi Houshmand Sarkhoosh, Cise Midoglu, Saeed Shafiee Sabet, Tomas Kupka, Pål Halvorsen |
| 2025 | Zero-Shot Vision-Language Model for Event Detection in Smart Surveillance. Younes Kebour, Smaïl Niar, Nacim Ihaddadene, Abdelghani Bekrar, Hammouda Elbez |