
video_autoencoder

Video LSTM auto-encoder built with PyTorch, based on https://arxiv.org/pdf/1502.04681.pdf

This project auto-encodes videos by feeding feature vectors of consecutive video frames into an LSTM auto-encoder.
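The sketch below is a minimal illustration of this idea (not the repository's exact model): an encoder LSTM compresses a sequence of per-frame feature vectors into its final hidden state, and a decoder LSTM reconstructs the sequence from that state. The feature dimension (2048) matches the ResNet50 features described later; the hidden size is an arbitrary choice.

```python
import torch
import torch.nn as nn

class LSTMAutoEncoder(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x):
        # x: (batch, seq_len, feat_dim) frame feature vectors
        _, (h, c) = self.encoder(x)            # summarize the whole sequence
        dec_in = torch.zeros_like(x)           # decode conditioned on the encoder state
        out, _ = self.decoder(dec_in, (h, c))
        return self.output(out)                # reconstructed sequence

model = LSTMAutoEncoder()
batch = torch.randn(4, 6, 2048)                # 4 samples of 6 consecutive frames
recon = model(batch)
loss = nn.MSELoss()(recon, batch)              # reconstruction objective
```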

Training data

The training data is a collection of cow images sampled from several videos. One image is kept every 50 frames, and 6 consecutive sampled images form one training sample. Since the videos run at 30 frames per second, one training sample summarizes roughly 10 seconds of movement. The output movement vectors of the model are then analyzed to study the cows' movements.
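A rough sketch of this sampling scheme, assuming OpenCV and a hypothetical video file name: keep one frame every 50 frames and group 6 of them into one training sample.

```python
import cv2

def sample_frames(video_path, step=50, frames_per_sample=6):
    cap = cv2.VideoCapture(video_path)
    kept, samples = [], []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:                    # keep one frame every `step` frames
            kept.append(frame)
            if len(kept) == frames_per_sample: # 6 kept frames = one training sample
                samples.append(kept)
                kept = []
        idx += 1
    cap.release()
    return samples

samples = sample_frames("cow_video.mp4")       # hypothetical file name
```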

Preprocessing

The images are vectorized with a CNN (ResNet50) before being fed to the LSTM auto-encoder. Here, the 2048-dimensional output vector of ResNet50's last layer before the classifier is used, so every image is first transformed into a 2048-dimensional vector. The ResNet50 is also fine-tuned on the training data.
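A sketch of this feature-extraction step, assuming torchvision's ResNet50 and a hypothetical frame file name: dropping the classification head leaves the 2048-dimensional pooled features per image.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

resnet = models.resnet50(pretrained=True)
resnet.fc = nn.Identity()          # keep the 2048-d vector before the classifier
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("frame_000050.jpg")).unsqueeze(0)
with torch.no_grad():
    feat = resnet(img)             # shape: (1, 2048)
```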

Learning curve

The following learning curve shows that the auto-encoder is trained well.

Learned vector patterns

The movements of cows across consecutive frames are vectorized by the model. The changes in color indicate different cow movements.

t-SNE dimension reduction

t-SNE dimension reduction of the learned vectors at different perplexity values.
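A sketch of this step, assuming the learned sequence vectors are stacked into a NumPy array saved under a hypothetical file name; the perplexity is swept over a few values.

```python
import numpy as np
from sklearn.manifold import TSNE

vectors = np.load("learned_vectors.npy")       # hypothetical file of model outputs

embeddings = {}
for perplexity in (5, 30, 50):
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(vectors)
```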

Clustering of dimension reduction results

The labels of different colors are generated automatically by various clustering algorithms. Different labels correspond to different movements of the cows.
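A sketch of this clustering step, assuming a 2-D t-SNE embedding (e.g. saved from the sketch above under a hypothetical file name); k-means and Ward-linkage agglomerative clustering are two of the algorithms that could be applied here, with three clusters matching the labels listed below.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

emb = np.load("tsne_embedding.npy")            # hypothetical saved t-SNE result
kmeans_labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(emb)
ward_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(emb)
```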

Relations between generated labels and original image data

The second clustering result in the Ward column is used to analyze the labels and their corresponding images. It can be seen that different labels correspond to different movements of the cows.

The summary of movements can be listed as follows:

| label | movement |
|-------|----------|
| 0 | head and limbs |
| 1 | head and belly (including cow-birth) |
| 2 | head |