This is the official code for the ACL-23 paper Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling.
🎉 Visit the project page: mmre-page
Relation extraction (RE) determines the semantic relation between a pair of subject and object entities in a given text. Multimodal relation extraction (MRE) has recently been introduced, where additional visual sources are added to the textual RE task to enhance relation inference.
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges: internal-information over-utilization and external-information under-exploitation. To combat these, we propose a novel framework that simultaneously implements the ideas of internal-information screening and external-information exploiting. First, we represent the fine-grained semantic structures of the input image and text with visual and textual scene graphs, which are fused into a unified cross-modal graph (CMG). Based on the CMG, we perform structure refinement under the guidance of the graph information bottleneck principle, actively denoising the less-informative features. Next, we perform topic modeling over the input image and text, incorporating the latent multimodal topic features to enrich the contexts. On the benchmark MRE dataset, our system strikingly boosts the current best model by over 7 points in F1 score. Further in-depth analyses reveal the great potential of our method for the task.
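For intuition, the graph information bottleneck used for structure refinement follows the generic information bottleneck recipe: learn a compressed representation Z of the input X that stays maximally predictive of the target Y. In its standard form (a sketch of the principle, not necessarily the exact objective used in the paper), one maximizes

    I(Z; Y) - beta * I(Z; X)

where beta trades prediction against compression (cf. the --beta flag in the training command below).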
To set up the environment for MMRE from scratch:
conda create -n mmre python=3.8
conda activate mmre
# install pytorch
conda install pytorch cudatoolkit -c pytorch -y
# install dependency
pip install -r requirements.txt
pip install -e .
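As an optional sanity check that PyTorch and CUDA were installed correctly (adjust to your own setup), you can run:

# should print the torch version and True if a GPU is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"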
The MRE dataset that we used in our experiments comes from MEGA.
To obtain the parsed textual scene graphs (TSG) and visual scene graphs (VSG), we provide tools in the TSG and VSG directories. Please follow the steps in the corresponding README.md.
To train the model, run run.sh:
bash run.sh
or
CUDA_VISIBLE_DEVICES=2 python -u run.py \
--pretrain_name=${PRETRAIN_NAME} \
--dataset_name=${DATASET_NAME} \
--num_epochs=30 \
--batch_size=16 \
--lr_pretrained=2e-5 \
--lr_main=2e-4 \
--warmup_ratio=0.01 \
--eval_begin_epoch=10 \
--seed=1234 \
--do_train \
--max_seq=40 \
--max_obj=40 \
--beta=0.01 \
--temperature=0.1 \
--eta1=0.8 \
--eta2=0.7 \
--neighbor_num=2 \
--topic_keywords_number=10 \
--topic_number=10 \
--save_path="ckpt"
Before testing the model, replace the --do_train flag with --do_test:
CUDA_VISIBLE_DEVICES=2 python -u run.py \
--pretrain_name=${PRETRAIN_NAME} \
--dataset_name=${DATASET_NAME} \
--batch_size=16 \
--do_test \
--load_path="ckpt"
If you use this work, please kindly cite:
@inproceedings{WUAcl23MMRE,
author = {Shengqiong Wu and Hao Fei and Yixin Cao and Lidong Bing and Tat-Seng Chua},
title = {Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling},
booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics},
year = {2023},
}
This code builds on the following projects: MKGformer; contextualized-topic-models; Transformer; CLIP.
The code is released under the Apache License 2.0 for non-commercial use only.