Temporal-Relations-of-Informative-Frames-in-Action-Recognition

In this paper, we detect actions using transfer learning + RNNs in three steps:

  1. First, we use a frame selection algorithm to avoid frame redundancy in videos, as explained in the paper Adaptive Frame Selection In Two-Dimensional Convolutional Neural Network Action Recognition; the code can be found here (Code).

  2. Next, we use this repository (Feature extraction) to extract spatial features from each selected frame with a pre-trained ResNet-50, yielding one spatial feature vector per selected frame (a minimal sketch is given after this list).

  3. Finally, we use a temporal pooling method to divide each video into 4 parts, producing strong spatial-temporal feature vectors for each video. After feature extraction, RNN models are trained to classify the actions. Moreover, using leave-one-out cross-validation (LOOCV) gives reliable results, since every video of UCF11 is used for both training and evaluation. Sketches of the pooling, the RNN classifier, and the LOOCV loop follow below.
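As a rough illustration of step 2, the following minimal PyTorch sketch extracts one 2048-dimensional spatial feature vector per selected frame by dropping ResNet-50's final classification layer. It assumes torchvision's pre-trained weights; the function and variable names here are hypothetical and not the actual API of the feature-extraction repository linked above.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Pre-trained ResNet-50 with the final fully connected layer removed, so the
# output of the global average pool (2048-d) is the per-frame feature vector.
# (torchvision >= 0.13 API; older versions use pretrained=True instead.)
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(frames):
    """frames: list of PIL.Image (the selected frames of one video).
    Returns a (num_frames, 2048) tensor, one feature vector per frame."""
    batch = torch.stack([preprocess(f) for f in frames])
    return feature_extractor(batch).flatten(1)
```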
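Step 3's temporal pooling and RNN classifier could look like the sketch below. It assumes each video's per-frame features are average-pooled within four equal temporal segments, and uses a plain GRU as the classifier; neither choice is necessarily the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

def temporal_pool(features, num_parts=4):
    """Divide the per-frame features of one video into `num_parts` temporal
    segments and average-pool each, yielding a (num_parts, feature_dim)
    spatial-temporal descriptor. Assumes at least `num_parts` frames."""
    chunks = torch.chunk(features, num_parts, dim=0)
    return torch.stack([c.mean(dim=0) for c in chunks])

class ActionRNN(nn.Module):
    """GRU over the pooled segment sequence, followed by a linear classifier.
    UCF11 has 11 action classes, hence num_classes=11."""
    def __init__(self, feature_dim=2048, hidden_dim=512, num_classes=11):
        super().__init__()
        self.rnn = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):          # x: (batch, 4, feature_dim)
        _, h = self.rnn(x)         # h: (1, batch, hidden_dim)
        return self.fc(h[-1])      # logits: (batch, num_classes)
```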
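Finally, the LOOCV evaluation amounts to holding out one video at a time and training on the rest. The sketch below uses scikit-learn's LeaveOneOut splitter; `train_and_evaluate` is a hypothetical helper standing in for the actual training loop.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# X: (num_videos, 4, 2048) pooled descriptors, y: (num_videos,) class labels.
# train_and_evaluate (hypothetical) trains a fresh ActionRNN on the training
# split and returns 1 if the held-out video is classified correctly, else 0.
def loocv_accuracy(X, y, train_and_evaluate):
    hits = [train_and_evaluate(X[tr], y[tr], X[te], y[te])
            for tr, te in LeaveOneOut().split(X)]
    return float(np.mean(hits))
```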

Architecture