Add ImageCoDe: Image Retrieval from Contextual Descriptions #13

xhluca · 2022-04-07T19:42:31Z

To be presented at ACL 2022. Link: https://arxiv.org/abs/2203.15867

Abstract:

The ability to integrate context, including perceptual and temporal cues, plays a pivotal role in grounding the meaning of a linguistic utterance. In order to measure to what extent current vision-and-language models master this ability, we devise a new multimodal challenge, Image Retrieval from Contextual Descriptions (ImageCoDe). In particular, models are tasked with retrieving the correct image from a set of 10 minimally contrastive candidates based on a contextual description. As such, each description contains only the details that help distinguish between images. Because of this, descriptions tend to be complex in terms of syntax and discourse and require drawing pragmatic inferences. Images are sourced from both static pictures and video frames. We benchmark several state-of-the-art models, including both cross-encoders such as ViLBERT and bi-encoders such as CLIP, on ImageCoDe. Our results reveal that these models dramatically lag behind human performance: the best variant achieves an accuracy of 20.9 on video frames and 59.4 on static pictures, compared with 90.8 in humans. Furthermore, we experiment with new model variants that are better equipped to incorporate visual and temporal context into their representations, which achieve modest gains. Our hope is that ImageCoDe will foster progress in grounded language understanding by encouraging models to focus on fine-grained visual differences.

yuewang-cuhk · 2022-10-11T07:20:16Z

Hello, I have received your email and will look at it as soon as possible.

xhluca · 2022-12-06T18:45:04Z

@yuewang-cuhk Hi, I hope you are well. Have you had a chance to review this pull request?

Update README.md

4882fc9
