Visual Question Answering

You can find the ipython notebooks for this project here.

Table of Content

Introduction
Demo
Model Overview
Technical Aspect
Running the app locally
Project Directory Tree
Technologies Used
To Do
Contributions / Bug
License

Introduction

A simple Flask app to generate answer given an image and a natural language question about the image. The app uses a deep learning model, trained with Tensorflow, behind the scenes.

Demo

Link - https://youtu.be/pah91J4MnzI

Model Overview

Recent developments in Deep Learning has paved the way to accomplish tasks involving multimodal learning. Visual Question Answering (VQA) is one such challenge which requires high-level scene interpretation from images combined with language modelling of relevant Q&A. Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. This is a Keras implementation of one such end-to-end system to accomplish the task.

The model architecture is based on the paper Hierarchical Question-Image Co-Attention for Visual Question Answering.

Technical Aspect

The model used in the app is trained on VQA 2.0 dataset. The accuracy of the paper on this dataset is 54%. The model used in the Flask app has an accuracy of 49.20%.

Running the app locally

The Code is written in Python 3.7. If you don't have Python installed you can find it here. If you are using a lower version of Python you can upgrade using the pip package, ensuring you have the latest version of pip.

First, clone this project to your local machine:

git clone https://github.com/arya46/VQA_HieCoAtt.git

# change the working directory
cd VQA_HieCoAtt

Then install the required packages and libraries. Run the following command:

pip install -r requirements.txt

Everything is set now. Use the following command to launch the app:

python main.py

The app will run on http://localhost:8080/ in the browser.

Project Directory Tree

├── models 
│   ├── arch.py   #contains the model final model architecture
│   └── layers.py #contains the custom layers
├── pickles 
│   ├── complete_model.h5  #the trained Keras model
│   ├── labelencoder.pkl   #LabelEncoder object
│   └── text_tokenizer.pkl #Keras tokenizer object
├── templates 
│   ├── index.html
│   └── error.html 
├── utils 
│   ├── helper_functions.py
│   └── load_pickles.py
├── LICENSE
├── README.md
├── main.py
└── requirements.txt

Technologies Used

Programming Language: Python
ML Tools/Libraries: Keras, Tensorflow, Scikit Learn, Numpy, Pandas
Web Tools/Libraries: Flask, HTML

To Do

Deployement on Heroku

Contributions / Bug

If you want to contribute to this project, or want to report a bug, kindly open an issue here.

License

LICENSE

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Visual Question Answering

Table of Content

Introduction

Demo

Model Overview

Technical Aspect

Running the app locally

Project Directory Tree

Technologies Used

To Do

Contributions / Bug

License

Files

README.md

Latest commit

History

README.md

File metadata and controls

Visual Question Answering

Table of Content

Introduction

Demo

Model Overview

Technical Aspect

Running the app locally

Project Directory Tree

Technologies Used

To Do

Contributions / Bug

License