In recent years, neural networks have fueled dramatic advances in image captioning. Researchers are looking for more challenging applications for computer vision and Sequence to Sequence modeling systems. They seek to describe the world in human terms. I have implemented three different architectures from simple Encoder Decoders to Transformers with Multi-Head Attention.
Datasets :-