YeoDong Youn, Probabilistic Representation Learning for Improved Cross-modal Retrieval Using Density-wise Similarity, Master's Thesis, Department of Industrial and Systems Engineering, KAIST, 2023
Abstract
For cross-modal retrieval tasks, building a joint representation space for data samples from different modalities has been a common practice, especially in the vision and language domains. Two characteristics of image-caption pairs make this task especially challenging: the multiplicity of matches and the partiality of matching pairs. Given an image or a caption, there are multiple positive captions or images, and for each positive image-caption pair, the caption conveys only the key concepts of interest while ignoring other components. Previous research, which is based on learning pointwise embeddings in a deterministic way, fails both to capture these one-to-many correspondences and to correctly calibrate the semantic intersection between arbitrary image-caption pairs. This paper proposes a generalized method that learns the representations of images and captions as probability distributions in the joint representation space and explicitly models cross-modal uncertainty with differential entropy. The probabilistic embeddings are learned parametrically by fusing visual and text head modules onto a pretrained visual-text encoder, and they are trained in two stages. Through extensive qualitative experiments on the MS-COCO and Flickr30K datasets, the paper demonstrates the benefit of probabilistic representations by showing how cross-modal uncertainty can measure the multiplicity within each sample and how density-wise similarity preserves the partial similarity of each image-caption pair.
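The abstract does not state which distribution family or which similarity function the thesis uses. As a rough illustration only, the sketch below assumes diagonal Gaussian embeddings and uses the expected-likelihood kernel (the integral of the product of two densities) as one possible density-based similarity. The function names and toy values are hypothetical and are not taken from the thesis.

```python
import math

def gaussian_entropy(variances):
    """Differential entropy (in nats) of a diagonal Gaussian.

    Assumption: embeddings are diagonal Gaussians; the thesis may use
    a different family.
    """
    d = len(variances)
    return 0.5 * d * math.log(2 * math.pi * math.e) + 0.5 * sum(
        math.log(v) for v in variances
    )

def log_expected_likelihood(mu1, var1, mu2, var2):
    """Log of the integral of N(x; mu1, var1) * N(x; mu2, var2) over x.

    For Gaussians this equals the log density of N(mu1 - mu2; 0, var1 + var2),
    evaluated per dimension and summed. It is one example of a
    density-based similarity, not necessarily the one in the thesis.
    """
    total = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        s = v1 + v2
        total += -0.5 * (math.log(2 * math.pi * s) + (m1 - m2) ** 2 / s)
    return total

# Toy 2-D embeddings (made-up values for illustration).
image_mu, image_var = [0.0, 0.0], [1.0, 1.0]
near_caption_mu, far_caption_mu = [0.1, -0.1], [2.0, 2.0]
caption_var = [0.5, 0.5]

near_score = log_expected_likelihood(image_mu, image_var, near_caption_mu, caption_var)
far_score = log_expected_likelihood(image_mu, image_var, far_caption_mu, caption_var)
```

Under these assumptions, a larger variance yields a higher differential entropy, which is the sense in which entropy can act as an uncertainty score for a sample with many plausible matches, and `near_score` exceeds `far_score` because the nearer caption's density overlaps more with the image's.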
@mastersthesis{Youn:2023,
author = {YeoDong Youn},
advisor = {Il-Chul Moon},
title = {Probabilistic Representation Learning for Improved Cross-modal Retrieval Using Density-wise Similarity},
school = {KAIST},
year = {2023}
}