Introduction

A partial implementation of "Hierarchical Boundary-Aware Neural Encoder for Video Captioning". The C3D part is implemented but does not work well.

Difference from the original paper: the authors sampled one frame out of every 5, while I sample a fixed number of frames using np.linspace.
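To make the difference concrete, here is a minimal sketch of the two sampling schemes. The function names and the truncation to integer indices are assumptions for illustration, not necessarily what video.py does.

```python
import numpy as np


def sample_every_k(num_frames, k=5):
    """Indices obtained by taking one frame every k frames (the paper's scheme)."""
    return np.arange(0, num_frames, k)


def sample_fixed(num_frames, num_samples):
    """Indices of a fixed number of frames spread evenly over the whole video."""
    # linspace includes both endpoints, so the first and last frames are kept.
    # Truncating to int is an assumption; rounding would also be reasonable.
    return np.linspace(0, num_frames - 1, num_samples).astype(int)


if __name__ == "__main__":
    print(sample_every_k(23))   # number of indices grows with video length
    print(sample_fixed(23, 5))  # always 5 indices, whatever the video length
```

With fixed-count sampling every video yields a feature sequence of the same length, which is convenient for batching, at the cost of a variable temporal stride.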

Requirements

Pretrained Model

VGG16 pretrained on ImageNet [PyTorch version]: https://download.pytorch.org/models/vgg16-397923af.pth
ResNet50 pretrained on ImageNet [PyTorch version]: https://s3.amazonaws.com/pytorch/models/resnet50-19c8e357.pth
C3D pretrained on Sports1M [ported from Keras]: http://imagelab.ing.unimore.it/files/c3d_pytorch/c3d.pickle

Datasets

Packages

torch
torchvision
numpy
scikit-image
nltk
h5py
pandas
future # python2 only
tensorboard_logger # for using tensorboard to view training loss

You can use:

    (sudo) pip2 install -r requirements.txt

to install all the above packages.

Usage

Preparing Data

First, create soft links to the dataset folder and the pretrained models. For example:

    mkdir datasets
    ln -s YOUR_MSVD_DATASET_PATH datasets/MSVD
    mkdir models
    ln -s YOUR_RES50_MODEL_PATH models/

Some details can be found in args.py.

Note: If you use the MSR-VTT dataset, some extra steps are needed. MSR-VTT ships the train_val and test video data as two separate zip files, and the annotations are split the same way. Merge the two parts of the video data into one directory and the two annotation files into one JSON file, then set the msrvtt_video_root and msrvtt_anno_json_path variables in args.py accordingly.
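The annotation merge can be sketched as below. This assumes each annotation file is a JSON object whose list-valued fields hold the per-video and per-caption entries; check the actual files before relying on it. The helper names and the file paths passed to them are hypothetical.

```python
import json


def merge_annotations(first, second):
    """Merge two annotation dicts: concatenate list fields, keep the rest from `first`."""
    merged = dict(first)
    for key, value in second.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            # List fields (assumed to be per-video / per-caption entries) are concatenated.
            merged[key] = merged[key] + value
        elif key not in merged:
            # Fields present only in the second file are copied over.
            merged[key] = value
    return merged


def merge_annotation_files(path_a, path_b, out_path):
    """Read two annotation JSON files and write the merged result to out_path."""
    with open(path_a) as fa, open(path_b) as fb:
        merged = merge_annotations(json.load(fa), json.load(fb))
    with open(out_path, "w") as out:
        json.dump(merged, out)
```

After merging, point msrvtt_anno_json_path at the output file.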

Then we can:

Prepare video feature:
```
 python2 video.py
```
Prepare caption feature and dataset split:
```
 python2 caption.py
```

Training

Before training the model, make sure PyTorch can use a GPU to accelerate computation. Training parameters such as batch size and learning rate are set in args.py.
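A quick way to check GPU availability is sketched here. The helper name is hypothetical; it relies only on torch.cuda.is_available() and returns False when PyTorch is not installed.

```python
def gpu_available():
    """Return True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no GPU can be used through it.
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("CUDA available: {}".format(gpu_available()))
```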

Train:
```
 python2 train.py
```

Evaluating

Evaluate:
```
 python2 evaluate.py best
```
Sample some examples:
```
 python2 sample.py
```

Trained Model

On MSVD: https://mega.nz/#!5pRQEaZC!zmCkfjtmqAIEMUgoT0_PFX9Ame-oNAO5SU0brIm_lqI

On MSR-VTT: https://mega.nz/#!Q0RHXYLa!2svrqHyjXaMx59aMho4GujNCnLECHyaoWnkmjHWbwUo

If you use the trained model, please make a directory named 'results', and then put the trained models into this directory.

Results

Quantitative

The following table shows the performance of this implementation (using ResNet50) on the MSVD and MSR-VTT datasets. B1 to B4 are BLEU-1 to BLEU-4, M is METEOR, and Cr is CIDEr.

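| Dataset | B1 | B2 | B3 | B4 | M | Cr |
| --- | --- | --- | --- | --- | --- | --- |
| MSVD | 79.8 | 65.5 | 55.1 | 44.8 | 31.6 | 69.4 |
| MSVD (+C3D) | 79.1 | 65.4 | 54.4 | 43.2 | 30.2 | 61.6 |
| MSR-VTT | 78.7 | 63.2 | 49.2 | 36.8 | 26.7 | 41.2 |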
