An attempt at creating a chatbot utilising the Retriever-Generator approach for Open-Domain Question-Answering (QA).

ThusharaN/SciBot


A ChatBot for Science!

Built with: Python, Flask, Wikipedia API, Sentence Transformers, Transformers, YAKE

About the project

The idea behind this project was to build a chatbot that mimics a Retriever-Generator approach to Open-Domain Question-Answering (QA). The QA pipeline unfolds in three key stages:

1. YAKE does Keyword Extraction 🔑

  • Leveraging YAKE, our Keyword Extractor, we identify important keywords from the questions.
  • Configurations, like the number of keywords and maximum words per keyword, shape the process.
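The keyword step can be sketched as follows. This is a minimal sketch, not the code in qamodel.py: the config key names (language, max_ngram_size, num_keywords) and both helper names are assumptions made for illustration. In current YAKE releases, KeywordExtractor takes n for the maximum words per keyword and top for the number of keywords, and extract_keywords returns (keyword, score) pairs where a lower score means a more relevant keyword.

```python
def yake_kwargs(config):
    """Map hypothetical config keys onto yake.KeywordExtractor arguments."""
    return {
        "lan": config.get("language", "en"),
        "n": config.get("max_ngram_size", 2),   # maximum words per keyword
        "top": config.get("num_keywords", 3),   # number of keywords to return
    }


def extract_keywords(question, config):
    """Return keyword strings for a question, most relevant first."""
    import yake  # third-party; imported lazily so yake_kwargs works without it

    extractor = yake.KeywordExtractor(**yake_kwargs(config))
    # YAKE yields (keyword, score) pairs; a lower score means higher relevance.
    scored = extractor.extract_keywords(question)
    return [keyword for keyword, _score in sorted(scored, key=lambda pair: pair[1])]
```

Keeping the argument mapping in its own function makes it easy to check how a change in the configuration alters the extractor without loading YAKE itself.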

2. Get more Context! 📖

  • Unearthing answers requires context. We adopt a two-pronged strategy.
  • We pinpoint Wikipedia articles related to the selected keywords, essentially turning Wikipedia into our knowledge base, i.e., our domain.
  • A straightforward similarity mechanism filters and merges relevant articles to craft a relevant context for our questions.
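The filter-and-merge idea can be illustrated with a small, dependency-free sketch. Note the substitution: the project lists Sentence Transformers, whereas this example uses a plain bag-of-words cosine similarity as a stand-in so that it runs with the standard library only. The function names, the threshold, and the article limit are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_context(question, articles, threshold=0.2, max_articles=2):
    """Keep the articles most similar to the question and merge them."""
    query = bag_of_words(question)
    scored = [(cosine(query, bag_of_words(text)), text) for text in articles]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    kept = [text for score, text in ranked if score >= threshold][:max_articles]
    return " ".join(kept)
```

With an embedding model, only bag_of_words and cosine would change; the ranking, thresholding, and merging logic stays the same.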

3. RoBERTa finds the Answer 🤖

  • Armed with the question and context, RoBERTa steps in to predict the answer.
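Put together, the three stages might be wired up as in the sketch below. It is an illustration under stated assumptions rather than the project's actual code: the function names are hypothetical, and the checkpoint deepset/roberta-base-squad2 is one publicly available RoBERTa question-answering model chosen for the example, since this README does not say which checkpoint SciBot uses.

```python
def make_roberta_reader(model_name="deepset/roberta-base-squad2"):
    """Build a reader backed by a Hugging Face question-answering pipeline."""
    from transformers import pipeline  # third-party; downloads the model on first use

    qa = pipeline("question-answering", model=model_name)

    def read(question, context):
        # The pipeline returns a dict with "answer", "score", "start" and "end".
        return qa(question=question, context=context)["answer"]

    return read


def answer_question(question, extract, retrieve, read,
                    fallback="Sorry, I could not find an answer."):
    """Run the three stages: keywords -> context -> answer."""
    keywords = extract(question)
    context = retrieve(question, keywords)
    if not context:
        # Without any retrieved context the reader has nothing to extract from.
        return fallback
    return read(question, context)
```

Passing the stages in as callables keeps each one replaceable, so the pipeline can be exercised with stubs before any model is downloaded.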

Core Functions

qamodel.py

  • Defines and initializes the ScienceChatBot class with configuration data loaded from config.yaml.
  • Implements functions for keyword extraction, fetching Wikipedia articles, filtering and combining article content, and predicting answers based on user questions.

NOTE: Detailed documentation about the individual functions can be found within the qamodel.py file.

chatbot.py

  • Renders the HTML template for the chatbot interface.
  • Initializes ScienceChatBot and predicts answers for user questions using the predict_answer method.
  • Processes user input, gets the predicted answers, and returns the responses in JSON format.
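A Flask module with these responsibilities could look roughly like the sketch below. The route path /predict, the form field user_input, the response key, and the helper names are assumptions for illustration; the real chatbot.py may differ.

```python
def build_response(user_input, predict):
    """Turn raw user input into the JSON-serialisable payload sent to the UI."""
    question = (user_input or "").strip()
    if not question:
        return {"response": "Please ask a question."}
    return {"response": predict(question)}


def create_app(predict):
    """Create the Flask app; `predict` maps a question string to an answer string."""
    from flask import Flask, jsonify, render_template, request  # third-party

    app = Flask(__name__)

    @app.route("/")
    def index():
        # Expects templates/index.html next to this module.
        return render_template("index.html")

    @app.route("/predict", methods=["POST"])
    def predict_route():
        # "user_input" is a hypothetical form field name.
        return jsonify(build_response(request.form.get("user_input"), predict))

    return app
```

In a module meant to be started with flask run, one would typically bind the result at module level, for example app = create_app(bot.predict_answer) with bot being a ScienceChatBot instance, so that Flask can discover the app object.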

index.html

  • HTML template for the custom SciBot UI.
  • Uses internal styling (an embedded stylesheet) to render the interface, and AJAX requests to send user input to the backend and receive the chatbot's output.

config.yaml

  • Specifies configuration settings for SciBot.

qatest.py

  • Contains unit test cases to test the basic functionalities of SciBot.

How to run?

Before diving into the project, let's set up the groundwork. First, activate the project's virtual environment using poetry:

poetry shell

Afterward, install the essential dependencies:

poetry install

With these steps complete, the project can be run using the Custom UI!

Our own Custom UI!

As a full-stack engineer, I couldn't help but include a basic UI built using Flask for a richer experience. Follow these steps to interact with the SciBot UI:

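Set FLASK_APP to the app module and FLASK_ENV to the development environment (FLASK_ENV was deprecated in Flask 2.2 and removed in 2.3; on newer versions, use the --debug option of the flask command instead):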

export FLASK_APP=chatbot
export FLASK_ENV=development

Then, launch the Flask app:

flask run

The app will then be served at http://127.0.0.1:5000/.

A demo featuring the Custom UI: [demo GIF] (the GIF may render slower than the actual speed!)

Evaluating on SciQ

The chatbot was tasked with answering a sample of questions from the SciQ dataset. Two aspects of the responses can be evaluated:

  • Context Quality Assessment: Examining the effectiveness of the chatbot in retrieving relevant context following keyword extraction, and
  • Model Accuracy Assessment: Evaluating the precision of the model in predicting accurate answers based on the retrieved context.

NOTE: Since SciBot itself fetches the context for the question through keyword extraction, the context supplied with the SciQ dataset is NOT used.

| Question | Actual Answer | Predicted Answer |
|---|---|---|
| Through which process are plants able to make their own food? | photosynthesis | photosynthesis |
| Each specific polypeptide has a unique linear sequence of which acids? | amino | amino acids |
| What is the most common type of anemia? | iron-def | Iron-deficiency anemia |
| What is the process by which the nucleus of a eukaryotic cell divides? | mitosis | mitosis |
| What mineral is used in jewelry because of its striking greenish-blue color? | turquoise | malachite |
| What are hydrocarbons most important use? | fuel | fuels and chemicals |
| When a hypothesis is repeatedly confirmed, what can it then become? | theory | part of a theory |
| The effect of acetylcholine in heart muscle is inhibitory rather than what? | excitatory | excitatory |
| What is process of producing eggs in the ovary called? | oogenesis | meiosis |
| A phase diagram plots pressure and what else? | temperature | temperature |
| Energy resources can be put into two categories - renewable or? | nonrenewable | non-renewable |
| Who proposed the theory of evolution by natural selection? | darwin | Charles Darwin & Alfred Russel Wallace |
| What is the term for the secretion of saliva? | salivation | spit |
| Caffeine and alcohol are two examples of what type of drug? | psychoactive | stimulant |
| Sometimes referred to as air, what do we call the mixture of gases that surrounds the planet? | atmosphere | The atmosphere of Earth |
| Who was the first person known to use a telescope to study the sky? | galileo | Galileo Galilei |
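To put a number on answer agreement, one option is to compare normalized strings. The sketch below is not part of the project and is not how the results above were produced; the two metrics (exact match, and a lenient check that the actual answer appears inside the prediction) and the function names are assumptions. The sample rows are copied from the table above.

```python
import re


def normalize(answer):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", answer.lower())


def score(pairs):
    """Return (exact, lenient) match rates for (actual, predicted) pairs."""
    exact = lenient = 0
    for actual, predicted in pairs:
        gold, guess = normalize(actual), normalize(predicted)
        exact += gold == guess
        # Lenient: the actual answer appears somewhere inside the prediction.
        lenient += gold in guess
    total = len(pairs) or 1  # avoid dividing by zero on an empty sample
    return exact / total, lenient / total


# Four rows copied from the results table.
sample = [
    ("photosynthesis", "photosynthesis"),
    ("amino", "amino acids"),
    ("turquoise", "malachite"),
    ("nonrenewable", "non-renewable"),
]
exact_rate, lenient_rate = score(sample)
```

String matching of this kind only speaks to the second aspect, model accuracy. Judging context quality would still require inspecting the retrieved articles themselves.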

Unit Tests

The unit test cases can be executed by running the following command:

python -m unittest qatest.py
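For readers unfamiliar with unittest, a test in this style might look like the sketch below. The actual cases in qatest.py are not shown in this README, so the stub class and both test methods are hypothetical; only the predict_answer method name comes from the description of chatbot.py.

```python
import unittest


class StubBot:
    """Hypothetical stand-in for ScienceChatBot that needs no network or model."""

    def predict_answer(self, question):
        return "photosynthesis" if "plants" in question else ""


class PredictAnswerTest(unittest.TestCase):
    def setUp(self):
        self.bot = StubBot()

    def test_known_question_returns_answer(self):
        question = "Through which process are plants able to make their own food?"
        self.assertEqual(self.bot.predict_answer(question), "photosynthesis")

    def test_unknown_question_returns_empty_string(self):
        self.assertEqual(self.bot.predict_answer("What is turquoise?"), "")


def run_tests():
    """Run the test case programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PredictAnswerTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Swapping StubBot for a real ScienceChatBot instance would turn these into integration tests that depend on Wikipedia access and the downloaded model.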

The latest code has been tested against these test cases locally. [Screenshot of the test results, taken 2024-01-16]
