Voicebot

This is an end-to-end voicebot that aims to answer open-domain questions. It is intended to be used as a benchmarking tool.

Design

Requirements and Setup

python 3.6
pytorch (1.1.0)
tensorflow (1.12)
wikipedia (1.14)
deepspeech (0.5.0)
spacy (2.1.5)
gingerit (0.8.0)
pytorch-pretrained-bert (0.6.2)
playsound (1.2.2)
sounddevice (0.3.13)
soundfile (0.10.2)
inflect (2.1)
librosa (0.7.0)
matplotlib (3.1.1)
unidecode (1.1.1)
numpy (1.17.0)

We recommend running this in a virtual environment to avoid version conflicts with packages such as numpy that other projects may also depend on.
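Since the versions above are pinned, it can help to compare them against what is actually installed before loading any models. The following is a minimal sketch, not part of the project: the `PINNED` dictionary copies only a subset of the list above, the helper names are hypothetical, and the distribution name `torch` for pytorch is an assumption about how the package was installed.

```python
# Subset of the pinned versions from the requirements list (illustrative).
# "torch" is assumed to be the installed distribution name for pytorch.
PINNED = {
    "torch": "1.1.0",
    "tensorflow": "1.12",
    "wikipedia": "1.14",
    "deepspeech": "0.5.0",
    "spacy": "2.1.5",
    "numpy": "1.17.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.6 / 3.7

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_matches(wanted, found):
    """True if `found` equals `wanted` or is a patch release of it (1.12 vs 1.12.3)."""
    found = found.split("+")[0]  # drop local build tags such as "+cpu"
    return found == wanted or found.startswith(wanted + ".")


def find_mismatches(pinned, lookup=installed_version):
    """Return {name: (wanted, found)} for every missing or mismatched package."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found is None or not version_matches(wanted, found):
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` inside the virtual environment returns an empty dictionary when every listed package matches; a `found` value of `None` means the package is not installed.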

You can install whichever spaCy NER model you prefer (we used 'en_core_web_md') with:

python -m spacy download en_core_web_md

Note: Run this in an elevated command prompt with Admin permissions.

You will also require the following models

An info.txt file is located in every directory where a specific model is required. Extract each model and place its contents in the corresponding project folder: BERT, DeepSpeech/Models, and Tacotron_TTS/tacotron-models-data. WaveRNN should be extracted under the Vocoder_WaveRNN folder.

Open-domain QA also requires an internet connection to retrieve information from Wikipedia.
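Because the models take a while to load, it may be worth confirming that Wikipedia is reachable before starting. The sketch below uses only the standard library; the function name `can_reach` and the default host `en.wikipedia.org` on port 443 are illustrative assumptions, not something the project defines.

```python
import socket


def can_reach(host="en.wikipedia.org", port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

A `False` result only says that this one connection attempt failed; it does not distinguish a missing network from a firewall or proxy that blocks the request.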

Running the program

Run the Voicebot file to start the application. You will be prompted to select the TTS system of your choice after the other models have loaded.

The WaveRNN + Tacotron combination is very resource-heavy and produces poor results on systems with 8GB of RAM. The speech it produces sounds a lot more natural, but often has garbage audio towards the end. The standalone Tacotron is much lighter and does not degrade as badly on systems with fewer resources.

Once the TTS has been loaded, you will be prompted to select the running mode: either a microphone for live input audio, or a folder of audio files to test against. To add your own audio to the testing set, place the wav file in the test_audio folder. For best results, use an American male voice, with a normal or slow speed setting from a site like this.
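Before adding your own recordings, you may want to check their format. The following sketch lists the wav files in a folder and flags any that differ from 16 kHz, mono, 16-bit PCM. This README does not state the required format; that default is an assumption based on what the pre-trained DeepSpeech English models generally expect, and the function names are hypothetical.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Read basic header information from a PCM wav file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "seconds": wav.getnframes() / float(rate),
        }


def check_test_audio(folder="test_audio", expected_rate=16000):
    """Return {file name: [problems]}; an empty list means the file looks fine."""
    report = {}
    for path in sorted(Path(folder).glob("*.wav")):
        try:
            info = describe_wav(path)
        except wave.Error as exc:
            # Raised for non-PCM or otherwise unreadable wav files.
            report[path.name] = [f"unreadable: {exc}"]
            continue
        problems = []
        if info["channels"] != 1:
            problems.append("not mono")
        if info["sample_rate"] != expected_rate:
            problems.append(
                f"sample rate is {info['sample_rate']} Hz, expected {expected_rate} Hz"
            )
        if info["sample_width_bytes"] != 2:
            problems.append("not 16-bit")
        report[path.name] = problems
    return report
```

Running `check_test_audio()` from the project root inspects the test_audio folder; pass a different `expected_rate` if your models were trained on another sample rate.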

Running on Windows 10

Run the VoiceBot-windows.py file. Outputs are written to the '/Vocoder_WaveRNN/WaveRNN_outputs' or '/Tacotron_TTS/Tacotron_outputs' subfolder, depending on the TTS system selected.

Running on Ubuntu

Run the VoiceBot-linux.py file.

Note: The playsound and sounddevice libraries are not compatible with Ubuntu, so audio cannot be recorded or played from the console. On Ubuntu, VoiceBot works only with questions pre-recorded in the 'test_audio' folder. Outputs are written to the '/Vocoder_WaveRNN/WaveRNN_outputs' or '/Tacotron_TTS/Tacotron_outputs' subfolder, depending on the TTS system selected.
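The choice between the two entry scripts can be automated. This is a minimal sketch of a hypothetical helper, not something shipped with the project; it assumes only the two script names given above and treats every other operating system as unsupported.

```python
import platform

# Entry scripts named in this README, keyed by the value of platform.system().
SCRIPTS = {
    "Windows": "VoiceBot-windows.py",
    "Linux": "VoiceBot-linux.py",
}


def script_for(system=None):
    """Return the entry script for the given (or current) operating system."""
    system = system or platform.system()
    try:
        return SCRIPTS[system]
    except KeyError:
        raise RuntimeError(f"No VoiceBot script is provided for {system!r}")
```

To launch the result from the project root, pass it to the current interpreter, for example with `subprocess.call([sys.executable, script_for()])`.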

References

Demo video

Link to demo video here: https://drive.google.com/file/d/16pFeDjqDOCkVXW0cc09l_mkuxqgQjo8s/view?usp=drive_web
