- BullyAI uses Groq to generate a narrative response to a caption and then uses Play.ht to convert that response to speech and play it back. See Screen and Webcam for the images that are used to generate the captions.
- Python
- OpenCV
- LangChain
- Sentence Transformers
- Groq API
- Play.ht API
- Pydub
- PyAudio
- python-dotenv
- Clone the repository.
- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up the environment variables:
  - Create a `.env` file in the root directory of the project.
  - Add the following environment variables to the `.env` file:

    ```
    GROQ_API_KEY=<your-groq-api-key>
    PLAYHT_API_KEY=<your-playht-api-key>
    PLAYHT_USER_ID=<your-playht-user-id>
    ```
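Once the `.env` file exists, the keys can be loaded at startup with python-dotenv. A minimal sketch, assuming the file sits in the working directory; the `missing_keys` helper and its message are illustrative, not part of bullyAI.py:

```python
import os

# The three keys the application needs before it can talk to Groq and Play.ht.
REQUIRED_KEYS = ("GROQ_API_KEY", "PLAYHT_API_KEY", "PLAYHT_USER_ID")

def missing_keys(env):
    """Return the names of required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

try:
    # python-dotenv copies the .env entries into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # fall back to variables already exported in the shell

absent = missing_keys(os.environ)
if absent:
    print("Missing environment variables:", ", ".join(absent))
```

Failing fast on missing keys gives a clearer error than a mid-run HTTP 401.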
- Install the Ollama vision-language model (VLM):

  ```bash
  ollama pull 0ssamaak0/xtuner-llava:phi3-mini-int4
  ```
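Caption requests go to Ollama's local REST API (`POST /api/generate`), with each frame sent as a base64-encoded image. A sketch of the payload, assuming Ollama's default port 11434; the one-sentence prompt is illustrative, not taken from bullyAI.py:

```python
import base64

def build_caption_request(image_bytes, model="0ssamaak0/xtuner-llava:phi3-mini-int4"):
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": "Describe this image in one sentence.",  # illustrative prompt
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a token stream
    }

# Usage (requires a running Ollama server):
# import requests
# resp = requests.post("http://localhost:11434/api/generate",
#                      json=build_caption_request(frame_bytes))
# caption = resp.json()["response"]
```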
- Get API keys:
  - Groq API (free)
  - Play.ht API ($5)
  - Clone the voice of your choosing.
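Play.ht's v2 text-to-speech REST endpoint authenticates with the API key and user ID from `.env`. A sketch of the request headers; the header names follow Play.ht's v2 docs, but treat the exact scheme as an assumption and verify against their current API reference:

```python
def playht_headers(api_key, user_id):
    """HTTP headers for Play.ht's v2 TTS REST endpoint.
    Header names are assumed from Play.ht's v2 docs; verify before relying on them."""
    return {
        "AUTHORIZATION": api_key,   # the Play.ht secret API key
        "X-USER-ID": user_id,       # the Play.ht account user ID
        "Content-Type": "application/json",
        "Accept": "audio/mpeg",     # ask for MP3 audio back
    }
```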
- Run the main script:

  ```bash
  python bullyAI.py
  ```
- Interact with the application:
  - The application will capture images from your webcam periodically.
  - It will generate captions for the images using the Ollama model.
  - The captions will be processed by the Groq API to generate a narrative response.
  - The narrative response will be converted to speech using the Play.ht API and played back.
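The caption-to-narration step is a single chat completion against the Groq API. A minimal sketch using the `groq` SDK; the system prompt wording and the `llama3-8b-8192` model name are assumptions, not taken from bullyAI.py:

```python
def build_narration_messages(caption):
    """Assemble the chat messages asking Groq for a taunting narration of a caption.
    The system prompt here is illustrative, not the one bullyAI.py uses."""
    return [
        {"role": "system",
         "content": "You are a sarcastic narrator. Mock whatever the caption describes."},
        {"role": "user", "content": f"Webcam caption: {caption}"},
    ]

# Usage (requires the groq package and GROQ_API_KEY in the environment):
# from groq import Groq
# client = Groq()
# reply = client.chat.completions.create(
#     model="llama3-8b-8192",  # assumed model; bullyAI.py may use another
#     messages=build_narration_messages(caption),
# )
# narration = reply.choices[0].message.content
```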
- Stop the application:
  - Press `q` in the webcam preview window to stop the application.
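The quit key works because `cv2.waitKey` returns an integer key code on each frame; masking the low byte and comparing it to `ord('q')` is the usual OpenCV idiom. A sketch of the check; the preview loop in the comments is illustrative, not bullyAI.py's actual loop:

```python
def should_quit(key_code):
    """True when the key code returned by cv2.waitKey matches 'q'.
    waitKey returns -1 when no key was pressed; only the low byte
    carries the character, so mask with 0xFF before comparing."""
    return key_code != -1 and (key_code & 0xFF) == ord("q")

# Typical preview loop:
# import cv2
# cap = cv2.VideoCapture(0)
# while True:
#     ok, frame = cap.read()
#     if not ok:
#         break
#     cv2.imshow("webcam", frame)
#     if should_quit(cv2.waitKey(1)):
#         break
# cap.release()
# cv2.destroyAllWindows()
```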
- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Make your changes.
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature-branch`).
- Open a pull request.