SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022
Baselines for the Zero Resource Speech Challenge using visually grounded models of spoken language, 2021 edition
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data. Accepted to ICASSP 2024, Self-supervision in Audio, Speech, and Beyond (SASB) workshop.
Library for training visually-grounded models of spoken language understanding.
Code for the paper "Textual supervision for visually grounded spoken language understanding".
Code used in my Master's thesis