efthymisgeo/multimodal-masking

This repo contains source code for the MultiModal Masking (M^3) Interspeech 2021 paper.

Description

This repo contains the source code for the INTERSPEECH 2021 paper "M3: MultiModal Masking applied to sentiment analysis".

Introduction

This paper presents M^3, a generic lightweight layer which can be embedded in multimodal architectures without any modifications and without any additional learnable parameters. M^3 takes as input representations from various modalities, e.g. text, audio, and visual. It then randomly either masks one of them or leaves the overall representation unaffected. M^3 is applied at every time step in the multimodal sequence, acting as a form of regularization.
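The masking scheme above can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the authors' implementation: the function names, the `p_mask` parameter, and the zero-masking choice are assumptions made for the sketch.

```python
import numpy as np

def m3_mask(modalities, p_mask=0.2, rng=None):
    """Apply M^3-style masking to one time step.

    modalities: list of 1-D arrays, one representation per modality
    (e.g. text, audio, visual). With probability p_mask, one randomly
    chosen modality is zeroed out; otherwise all representations pass
    through unchanged. No learnable parameters are involved.
    """
    rng = rng if rng is not None else np.random.default_rng()
    out = [m.copy() for m in modalities]
    if rng.random() < p_mask:
        idx = rng.integers(len(out))  # pick one modality to mask
        out[idx][:] = 0.0
    return out

def m3_sequence(seq, p_mask=0.2, rng=None):
    """Apply the mask independently at every time step of a sequence.

    seq: list of time steps, each a list of per-modality vectors.
    """
    rng = rng if rng is not None else np.random.default_rng()
    return [m3_mask(step, p_mask, rng) for step in seq]
```

Because the layer only masks inputs, it can be dropped in front of any fusion module without changing the rest of the architecture.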

Prerequisites

Dependencies

Setup

  • Clone repo with CMU Multimodal SDK submodule
# git version < 2.13
git clone --recursive https://github.com/efthymisgeo/multimodal-masking.git

# git version >= 2.13
git clone --recurse-submodules https://github.com/efthymisgeo/multimodal-masking.git
  • Create virtualenv and install dependencies
# Ensure your python version is >= 3.7.3

pip install poetry
poetry install
  • Download data using CMU Multimodal SDK
mkdir -p data
python cmusdk.py data/

M3 Experiments

  • (Optional) Activate the poetry environment and add the CMU Multimodal SDK to the PYTHONPATH
poetry shell
export PYTHONPATH=$PYTHONPATH:./CMU-MultimodalSDK
  • Reproduce the results in Table 1 of the paper
python experiments/main.py --config configs/m3-rnn-hard-0.2-before.yaml --m3_sequential --m3_masking --use-mmdrop-before --gpus 1 --offline
  • Reproduce the best results, illustrated in Table 2
python experiments/main.py --config configs/m3-rnn-drop-text-0.6-hard-0.2-before.yaml --m3_sequential --m3_masking --use-mmdrop-before --gpus 1 --offline
  • For further experimentation we suggest creating custom config .yaml files under the configs folder and running
python experiments/main.py --config configs/<myconf.yaml> --offline --gpus 1
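As a starting point, a custom config might look like the sketch below. The exact schema is defined by experiments/main.py and the shipped configs, so every key here is a hypothetical field name inferred from the config filenames (e.g. m3-rnn-hard-0.2-before.yaml), not the repo's actual format:

```yaml
# configs/my-experiment.yaml -- hypothetical field names for illustration
model: rnn
m3:
  masking: hard       # masking scheme, as suggested by "hard" in the filename
  p_mask: 0.2         # masking probability, as suggested by "0.2"
  apply_before: true  # apply M^3 before fusion ("before" in the filename)
```

Compare against the shipped .yaml files under configs/ for the authoritative key names before running.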

Reference

If you find our work useful for your research, please use the following citation:

@inproceedings{georgiou21_interspeech,
  author={Efthymios Georgiou and Georgios Paraskevopoulos and Alexandros Potamianos},
  title={{M3: MultiModal Masking Applied to Sentiment Analysis}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2876--2880},
  doi={10.21437/Interspeech.2021-1739}
}

TODOs

  • Upload pickle with features
