LIVA is a project aimed at creating a local intelligent voice assistant that leverages the power of large language models (LLMs) to understand and respond to user queries in natural language. This project provides a framework for building a voice-controlled interface that integrates speech recognition, natural language processing, and text-to-speech synthesis.
- Speech-to-text conversion for transcribing user input
- Interaction with large language models (LLMs) for understanding user queries
- Text-to-speech synthesis for generating responses
- Customizable settings for specifying model configurations and API endpoints
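At a high level, these stages are chained into a listen-transcribe-generate-speak loop. The sketch below is illustrative only and is not LIVA's actual implementation: the transformers ASR pipeline and the OpenAI-compatible chat client are real APIs (shown here with the default model and endpoint values mentioned later in this README), while record_audio and speak are hypothetical placeholders.

```python
# Illustrative speech -> LLM -> speech loop; NOT LIVA's actual code.
from openai import OpenAI
from transformers import pipeline

stt = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def record_audio() -> str:
    """Hypothetical placeholder: record the microphone, return a WAV file path."""
    raise NotImplementedError

def speak(text: str) -> None:
    """Hypothetical placeholder: synthesize `text` and play it aloud."""
    raise NotImplementedError

while True:
    # 1. Speech-to-text: transcribe the user's utterance.
    transcript = stt(record_audio())["text"]
    # 2. LLM: send the transcript to the OpenAI-compatible chat endpoint.
    completion = client.chat.completions.create(
        model="mistral:instruct",
        messages=[{"role": "user", "content": transcript}],
    )
    # 3. Text-to-speech: speak the model's reply back to the user.
    speak(completion.choices[0].message.content)
```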
- Clone the repository to your local machine:
  git clone https://github.com/LuciAkirami/liva.git
- Navigate to the project directory:
  cd liva
- Install the latest PyTorch version:
  pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- Install the required dependencies:
  pip install -r requirements.txt
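After installing the dependencies, you can optionally confirm that the CUDA-enabled PyTorch build can see your GPU. This quick check is not part of the project itself:

```python
# Confirm that the CUDA 11.8 PyTorch build installed above detects a GPU.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```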
- Run the main.py script with the desired command-line arguments:
  python main.py --url <Chat Endpoint URL> --model-id <LLM Model ID> --api-key <API Key> --stt-model <Speech2Text Model>
  - --url: Specify the URL of the OpenAI-compatible chat endpoint.
  - --model-id: Provide the ID of the LLM model that powers the text generator.
  - --api-key: Provide the API key for the corresponding chat endpoint URL.
  - --stt-model: Specify the Speech2Text model for converting speech to text.
- Once the script is running, speak into the microphone to interact with LIVA. The assistant will transcribe your speech, process it using the specified LLM model, and generate a response.

Example with the arguments written out:
  python main.py --url http://localhost:11434/v1 --model-id mistral:instruct --api-key ollama --stt-model openai/whisper-base.en

These are also the default values, so running the script without any arguments is equivalent:
  python main.py
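Before launching LIVA, you can check which model IDs your chat endpoint actually serves so that --model-id matches one of them. This is a small diagnostic sketch using the official openai Python client against the default local endpoint above; it is not part of the project:

```python
# List the model IDs exposed by the OpenAI-compatible endpoint, so a valid
# value can be passed to --model-id.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
for model in client.models.list():
    print(model.id)
```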
- Python 3.10.3 (using conda to manage the environment is recommended)
Contributions to LIVA are welcome! If you have ideas for new features, improvements, or bug fixes, feel free to open an issue or submit a pull request.
You may encounter errors about missing CUDA/cuDNN libraries (for example, libcudnn_ops_infer.so.8) when running Whisper.
Ensure that libcudnn_ops_infer.so.8 is on your library path. You can add the cuDNN directory shipped inside the environment's site-packages using the following command:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/{user_name}/miniconda3/envs/{env_name}/lib/{python_version}/site-packages/nvidia/cudnn/lib/
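If you are not sure where that cuDNN directory lives in your environment, the following snippet (an illustrative helper, not part of LIVA) searches the active environment's site-packages for the library:

```python
# Locate libcudnn_ops_infer.so.8 under the active environment's site-packages
# so you know which directory to append to LD_LIBRARY_PATH.
import glob
import sysconfig

site_packages = sysconfig.get_paths()["purelib"]
matches = glob.glob(f"{site_packages}/nvidia/cudnn/lib/libcudnn_ops_infer.so*")
print("\n".join(matches) if matches else f"Not found under {site_packages}")
```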
You may encounter issues installing PyAudio due to missing dependencies.
Install PortAudio and then install PyAudio using pip:
sudo apt-get install portaudio19-dev
pip install PyAudio
Alternatively, you can use conda to install PyAudio, which will also install PortAudio:
conda install PyAudio
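Once PyAudio is installed, you can confirm that it detects a microphone by listing the input-capable devices it sees. This is a quick diagnostic, not part of the project:

```python
# List the input-capable audio devices PyAudio can see, to confirm that the
# installation works and a microphone is available.
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:
        print(f"{i}: {info['name']} ({int(info['maxInputChannels'])} input channels)")
pa.terminate()
```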
To use the SpeechT5 tokenizer, ensure that the sentencepiece package is installed.
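A quick way to verify the dependency is to import it and, optionally, load a SpeechT5 tokenizer. The checkpoint name below (microsoft/speecht5_tts) is only an example and may not be the one LIVA uses:

```python
# Verify that sentencepiece is installed; the SpeechT5 tokenizer cannot load
# without it.
import sentencepiece
print("sentencepiece", sentencepiece.__version__)

# Optional: confirm the tokenizer itself loads (downloads the tokenizer files).
# "microsoft/speecht5_tts" is an example checkpoint, not necessarily LIVA's.
from transformers import SpeechT5Tokenizer
tokenizer = SpeechT5Tokenizer.from_pretrained("microsoft/speecht5_tts")
print(type(tokenizer).__name__)
```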
You may encounter ALSA errors even if you are not using the sounddevice library directly.
Importing sounddevice in your code, even if it is otherwise unused, can help resolve these errors:
import sounddevice
Ensure that your system's audio configuration is correct and that all necessary audio devices are properly configured.
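To see what your system's audio configuration looks like from Python, you can ask sounddevice to enumerate the devices it finds; this is a diagnostic snippet, not part of LIVA:

```python
# Print every audio device sounddevice/PortAudio can see, plus the defaults,
# to confirm the system's audio configuration.
import sounddevice as sd

print(sd.query_devices())
print("Default (input, output) devices:", sd.default.device)
```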
If you encounter any other issues not covered here, please refer to the documentation of the specific libraries or tools you are using. Additionally, searching online forums and communities for similar issues can often provide helpful insights and solutions. If the problem persists, feel free to open an issue on the project's GitHub repository for further assistance.
This project is licensed under the MIT License.