Long Papers
SPAE: Spatial Preservation-based Autoencoder for ADHD functional brain networks modelling
Learning and Fusing Multi-Scale Representations for Accurate Arbitrary-Shaped Scene Text Recognition
ASCS-Reinforcement Learning: A Cascaded Framework for Accurate 3D Hand Pose Estimation
Joint Geometric-Semantic Driven Character Line Drawing Generation
Explaining Image Aesthetics Assessment: An Interactive Approach
AVForensics: Audio-driven Deepfake Video Detection with Masking Strategy in Self-supervision
MMSF: A Multimodal Sentiment-Fused Method to Recognize Video Speaking Style
Cross-View Sample-Enriched Graph Contrastive Learning Network for Personalized Micro-video Recommendation
Hypernymization of Named Entity-rich captions for grounding-based multi-modal pretraining
Reference-Limited Compositional Zero-Shot Learning
Explicit Knowledge Integration for Knowledge-Aware Visual Question Answering about Named Entities
MuseHash: Supervised Bayesian Hashing for Multimodal Image Representation
Towards Shape-regularized Learning for Mitigating Texture Bias in CNNs
Less is More: Decoupled High-Semantic Encoding for Action Recognition
Intra-inter Modal Attention Blocks for RGB-D Semantic Segmentation
Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval
CMMT: Cross-Modal Meta-Transformer for Video-Text Retrieval
Modeling Functional Brain Networks with Multi-Head Attention-based Region-Enhancement for ADHD Classification
Learning From Expert: Vision-Language Knowledge Distillation for Unsupervised Cross-Modal Hashing Retrieval
EMP: Emotion-guided Multi-modal Fusion and Contrastive Learning for Personality Traits Recognition
We Are Not So Similar: Alleviating User Representation Collapse in Social Recommendation
Shot Retrieval and Assembly with Text Script for Video Montage Generation
A Recurrent Neural Network based Generative Adversarial Network for Long Multivariate Time Series Forecasting
Graph Interactive Network with Adaptive Gradient for Multi-Modal Rumor Detection
Integrative Multi-Modal Computing for Personal Health Navigation
Attention-based Video Virtual Try-On
Multi-granularity Separation Network for Text-Based Person Retrieval with Bidirectional Refinement Regularization
CurveSDF: Binary Image Vectorization Using Signed Distance Fields
Edge Enhanced Image Style Transfer via Transformers
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset
Symbol Location-Aware Network for Improving Handwritten Mathematical Expression Recognition
A Dual-branch Enhanced Multi-task Learning Network for Multimodal Sentiment Analysis
Not Only Generative Art: Stable Diffusion for Content-Style Disentanglement in Art Analysis
Towards Practical Consistent Video Depth Estimation
Exploration of Lightweight Single Image Denoising with Transformers and Truly Fair Training
Improving Image Encoders for General-Purpose Nearest Neighbor Search and Classification
Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation
TDEC: Deep Embedded Image Clustering with Transformer and Distribution Information
TsP-Tran: Two-Stage Pure Transformer for Multi-Label Image Retrieval
Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion
Knowledge-Aware Causal Inference Network for Visual Dialog
Unlocking Potential of 3D-aware GAN for More Expressive Face Generation
Multi-channel Convolutional Neural Network for Precise Meme Classification
RIP-NeRF: Learning Rotation-Invariant Point-based Neural Radiance Field for Fine-grained Editing and Compositing
A Multi-Teacher Assisted Knowledge Distillation Approach for Enhanced Face Image Authentication
SIGMA-DF: Single-Side Guided Meta-Learning for Deepfake Detection
Label-wise Deep Semantic-Alignment Hashing for Cross-Modal Retrieval
Dual-Modality Co-Learning for Unveiling Deepfake in Spatio-Temporal Space
Dual-Stream Multimodal Learning for Topic-Adaptive Video Highlight Detection
Zero-shot Sketch-based Image Retrieval with Adaptive Balanced Discriminability and Generalizability
FaceLivePlus: A Unified System for Face Liveness Detection and Face Verification
Raising User Awareness about the Consequences of Online Photo Sharing
TAGM: Task-Aware Graph Model for Few-shot Node Classification
Predicting Tweet Engagement with Graph Neural Networks
A Robust Deep Learning Enhanced Monocular SLAM System for Dynamic Environments
Learning with Adaptive Knowledge for Continual Image-Text Modeling
FedPcf: An Integrated Federated Learning Framework with Multi-Level Prospective Correction Factor
Short Papers
Graph Contrastive Learning on Complementary Embedding for Recommendation
Recommendation of Mix-and-Match Clothing by Modeling Indirect Personal Compatibility
Strong-Weak Cross-View Interaction Network for Stereo Image Super-Resolution
Improving Generalization for Multimodal Fake News Detection
Text-to-Image Fashion Retrieval with Fabric Textures
More Than Simply Masking: Exploring Pre-training Strategies for Symbolic Music Understanding
Video Retrieval for Everyday Scenes With Common Objects
TNOD: Transformer Network with Object Detection for Tag Recommendation
MemeFier: Dual-stage modality fusion for image meme classification
Multi-view Contrastive Learning with Additive Margin for Adaptive Nasopharyngeal Carcinoma Radiotherapy Prediction
CNNs with Multi-Level Attention for Domain Generalization
Deep Enhanced-Similarity Attention Cross-modal Hashing Learning
Improving Query and Assessment Quality in Text-Based Interactive Video Retrieval Evaluation
Offensive Tactics Recognition in Broadcast Basketball Videos Based on 2D Camera View Player Heatmaps
Escaping local minima in deep reinforcement learning for video summarization
SOFA: Style-based One-shot 3D Facial Animation Driven by 2D landmarks
A Comparison of Video Browsing Performance between Desktop and Virtual Reality Interfaces
CLAP: Contrastive Language-Audio Pre-training Model for Multi-modal Sentiment Analysis
Multimodal Topic Segmentation of Podcast Shows with Pre-trained Neural Encoders
Tweaking EfficientDet for frugal training
Technical Demonstrations
CalorieCam360: Simultaneous Eating Action Recognition of Multiple People Using Omnidirectional Camera
VISIONE: A Large-Scale Video Retrieval System with Advanced Search Functionalities
MAAM: Media Asset Annotation and Management
navigu.net: NAvigation in Visual Image Graphs gets User-friendly
Cross-Language Music Recommendation Exploration
Brave New Ideas
Framing the News: From Human Perception to Large Language Model Inferences
Doctoral Symposium Papers
Dual-Path Semantic Construction Network for Composed Query-Based Image Retrieval
Reproducibility Papers
Reproducibility Companion Paper: MeTILDA – Platform for Melodic Transcription in Language Documentation and Application
ICDAR ’23 Papers
Towards Multimodal Spatio-Temporal Transformer-based Models for Congestion Prediction
Procedural Driving Skill Coaching from More Skilled Drivers to Safer Drivers: A Review
User-irrelevant Cross-domain Association Analysis for Cross-domain Recommendation with Transfer Learning
Leveraging Cross-Modals for Cheapfakes Detection
MTSS: Movie Trailers Surveillance System using Social Media Analytics and Public Mood
CG-GNN: A Novel Compiled Graphs-based Feature Extraction Method for Enterprise Social Networks
Mask-conditioned latent diffusion for generating gastrointestinal polyp images
A Joint Scene Text Recognition and Visual Appearance Model for Protest Issue Classification
Detecting Cheapfakes using Self-Query Adaptive-Context Learning
MAD ’23 Papers
Autoencoder-based Data Augmentation for Deepfake Detection
SpoTNet: A spoofing-aware Transformer Network for Effective Synthetic Speech Detection
Synthetic Misinformers: Generating and Combating Multimodal Misinformation
In the Spotlight: The Russian Government’s Use of Official Twitter Accounts to Influence Discussions About its War in Ukraine
Synthetic Speech Detection through Audio Folding
Examining European Press Coverage of the No-Vax Movement: A NLP framework
Improving Synthetically Generated Images Detection in Cross-Concept Settings
LSC ’23 Papers
LifeLens: Transforming Lifelog Search with Innovative UX/UI Design
Voxento 4.0: A More Flexible Visualisation and Control for Lifelogs
E-LifeSeeker: An Interactive Lifelog Search Engine for LSC’23
MEMORIA: A Memory Enhancement and MOment RetrIeval Application for LSC 2023
MyEachtra: Event-based Interactive Lifelog Retrieval System for LSC’23
MemoriEase: An Interactive Lifelog Retrieval System for LSC’23
Multi-Mode Clustering for Graph-Based Lifelog Retrieval
Memento 3.0: An Enhanced Lifelog Search Engine for LSC’23
Lifelog Discovery Assistant: Suggesting Prompts and Indexing Event Sequences for FIRST at LSC 2023
lifeXplore at the Lifelog Search Challenge 2023
LifeInsight: An Interactive Lifelog Retrieval System with Comprehensive Spatial Insights and Query Assistance
The Best of Both Worlds: Lifelog Retrieval with a Desktop-Virtual Reality Hybrid System