Voice-GPT is a Python application that serves as a virtual assistant you can talk to by voice. It combines speech recognition, text-to-speech conversion, and natural language processing services so that users can hold a spoken conversation with the assistant.
The project consists of several components, including:
- User Interface (UI): A graphical user interface built using the wxPython library, where users can initiate and participate in conversations with the virtual assistant.
- Audio Recorder: This component records audio input from the user, which is then converted into text for further processing.
- Speech-to-Text Converter: Converts recorded audio into text using a machine learning model, enabling the virtual assistant to understand spoken language.
- Text-to-Speech Converter: Converts text responses generated by the virtual assistant into audio for playback to the user.
- Conversation Engine: Sends user queries and prompts to the OpenAI GPT-3 model and returns the generated responses, which lets the assistant hold a coherent conversation.
- Audio Player: Plays audio responses generated by the Text-to-Speech Converter to provide a spoken response to the user.
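To show how data flows between these components, here is a minimal sketch of one conversational turn. It is not taken from the project's code: the VoicePipeline class and its handle_turn method are hypothetical names, and each stage is a plain callable that could be backed by the real libraries (pyaudio, whisper, gtts, the OpenAI API) or, as in the demo at the bottom, by stubs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoicePipeline:
    """Hypothetical wiring of the components for a single conversational turn."""

    transcribe: Callable[[bytes], str]   # Speech-to-Text Converter
    respond: Callable[[str], str]        # Conversation Engine
    synthesize: Callable[[str], bytes]   # Text-to-Speech Converter
    play: Callable[[bytes], None]        # Audio Player
    transcript: List[str] = field(default_factory=list)

    def handle_turn(self, recorded_audio: bytes) -> str:
        """Transcribe the recording, get a reply, speak it, and log the exchange."""
        user_text = self.transcribe(recorded_audio)
        reply_text = self.respond(user_text)
        self.play(self.synthesize(reply_text))
        self.transcript.append(f"User: {user_text}")
        self.transcript.append(f"Assistant: {reply_text}")
        return reply_text


if __name__ == "__main__":
    # Demo with stubs in place of the real speech and language services.
    played: List[bytes] = []
    pipeline = VoicePipeline(
        transcribe=lambda audio: audio.decode("utf-8"),
        respond=lambda text: f"You said: {text}",
        synthesize=lambda text: text.encode("utf-8"),
        play=played.append,
    )
    print(pipeline.handle_turn(b"hello"))
```

Keeping each stage behind a simple callable makes it possible to test the conversation flow without a microphone, speakers, or network access.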
Before running Voice-GPT, make sure the following dependencies are installed:
- Python 3.x
- wxPython
- pydub
- whisper
- gtts
- pyaudio
- Access to the OpenAI GPT-3 API (requires an API key)
- Other dependencies mentioned in the code comments
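Some of these packages are imported under a different name than the one you install (wxPython is imported as wx, for example). A quick way to confirm the environment is ready is to check whether each module can be found. The standalone script below is a sketch, not part of the repository; the package-to-module mapping is an assumption based on the list above, and it does not verify the API key.

```python
import importlib.util
from typing import Dict, List

# Assumed mapping from the package names listed above to their import names.
REQUIRED_MODULES: Dict[str, str] = {
    "wxPython": "wx",
    "pydub": "pydub",
    "whisper": "whisper",
    "gtts": "gtts",
    "pyaudio": "pyaudio",
    "openai": "openai",
}


def missing_dependencies(modules: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```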
Clone this repository to your local machine:
git clone https://github.com/kristo-godari/voice-gpt.git
Install the required Python libraries using pip:
pip install -r requirements.txt
Obtain an OpenAI GPT-3 API key by signing up on the OpenAI website: https://beta.openai.com/signup/
Then create a configuration file named role-play-conversation.properties in the config directory with the following contents:
[text]
initial-prompt = Hello, how can I assist you today?
text-to-speach-language = en
openai-api-key = YOUR_OPENAI_API_KEY
Replace YOUR_OPENAI_API_KEY with the API key you obtained from OpenAI.
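The [text] line is an INI-style section header, so a file in this format can be read with Python's standard configparser module. The sketch below assumes that format; parse_settings and load_settings are hypothetical helper names and may not match how main.py actually reads its settings. The key text-to-speach-language is used exactly as spelled in the configuration above.

```python
import configparser
from pathlib import Path

# Assumed location, per the setup step above.
CONFIG_PATH = Path("config") / "role-play-conversation.properties"


def parse_settings(text: str) -> dict:
    """Parse the file contents; "[text]" is treated as an INI section header."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["text"]
    return {
        "initial_prompt": section.get("initial-prompt"),
        # Key spelled exactly as in the configuration file ("speach").
        "tts_language": section.get("text-to-speach-language", fallback="en"),
        "openai_api_key": section.get("openai-api-key"),
    }


def load_settings(path: Path = CONFIG_PATH) -> dict:
    """Read the properties file from disk and parse it."""
    return parse_settings(path.read_text(encoding="utf-8"))
```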
Start the application with the following command:
python main.py
This opens the graphical user interface, where you can talk to the virtual assistant as follows:
- Click the "Reply" button to start recording your voice.
- Speak your message or question to the virtual assistant.
- Click the "Stop recording and send reply" button to stop recording.
- The virtual assistant processes your query and replies with both text and spoken audio.
- The conversation continues, and you can ask additional questions or provide instructions.
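Because the conversation continues across turns, earlier exchanges have to be sent back to the model each time: a completion model such as GPT-3 only sees what is in the current prompt. One way to do this is sketched below. The Conversation class, the User/Assistant prompt layout, and the max_turns limit are assumptions for illustration, not the project's actual prompt format; the initial-prompt value from the configuration file serves as the assistant's opening line.

```python
from typing import List, Tuple


class Conversation:
    """Hypothetical helper that keeps the dialogue and renders it as one prompt."""

    def __init__(self, initial_prompt: str, max_turns: int = 10) -> None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.initial_prompt = initial_prompt
        self.max_turns = max_turns
        self.turns: List[Tuple[str, str]] = []

    def add_exchange(self, user_text: str, assistant_text: str) -> None:
        """Record a completed turn, keeping only the most recent max_turns."""
        self.turns.append((user_text, assistant_text))
        del self.turns[:-self.max_turns]

    def build_prompt(self, new_user_text: str) -> str:
        """Render history plus the new message, ending where the model should continue."""
        lines = [f"Assistant: {self.initial_prompt}"]
        for user_text, assistant_text in self.turns:
            lines.append(f"User: {user_text}")
            lines.append(f"Assistant: {assistant_text}")
        lines.append(f"User: {new_user_text}")
        lines.append("Assistant:")
        return "\n".join(lines)
```

The string returned by build_prompt would be sent to the model, and the model's reply stored with add_exchange before the next turn. Capping the history keeps the prompt from growing without bound.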
The application provides the following features:
- Voice input: Users speak to the virtual assistant, and their speech is converted to text for processing.
- Text input: Users can also type text directly into the application to interact with the virtual assistant.
- Natural language understanding: The application utilizes the OpenAI GPT-3 model to understand and generate human-like responses.
- Text-to-speech conversion: Responses from the virtual assistant are converted to audio for a more natural conversational experience.
- Multithreading: Audio recording and processing run on separate threads, so the interface stays responsive while the assistant listens and responds.
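The recording flow (start on "Reply", stop on "Stop recording and send reply") maps naturally onto a worker thread plus a stop flag. In this hypothetical sketch, read_chunk stands in for a blocking microphone read such as a PyAudio stream read, and the mono, 16-bit, 16 kHz WAV format is an assumption rather than the project's actual recording settings.

```python
import io
import threading
import wave
from typing import Callable, List, Optional


class BackgroundRecorder:
    """Hypothetical recorder that collects audio chunks on a worker thread."""

    def __init__(self, read_chunk: Callable[[], bytes]) -> None:
        self._read_chunk = read_chunk          # stand-in for a blocking mic read
        self._stop = threading.Event()
        self._frames: List[bytes] = []
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        """Begin recording off the UI thread (e.g. when "Reply" is clicked)."""
        self._stop.clear()
        self._frames = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Keep reading until the stop flag is set.
        while not self._stop.is_set():
            self._frames.append(self._read_chunk())

    def stop(self) -> bytes:
        """Stop recording, wait for the worker, and return the audio as WAV bytes."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)       # assumed mono
            wav.setsampwidth(2)       # assumed 16-bit samples
            wav.setframerate(16000)   # assumed 16 kHz sample rate
            wav.writeframes(b"".join(self._frames))
        return buffer.getvalue()
```

In a wxPython application, results produced on a worker thread should be handed back to the UI thread, for example with wx.CallAfter, rather than updating widgets directly from the worker.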
If you run into problems, first check that all required dependencies are installed and that the role-play-conversation.properties file is configured correctly. Make sure your system's microphone is configured and working. You also need a stable internet connection, because responses are generated remotely through the OpenAI GPT-3 API, and gtts also relies on an online Google service for speech synthesis.
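For the configuration check, a small validator can catch the most common mistakes: a missing [text] section, a missing or empty key, or the placeholder API key left in place. This is a hypothetical helper, not part of the project, and it assumes the INI-style layout shown in the setup instructions.

```python
import configparser
from typing import List

REQUIRED_KEYS = ("initial-prompt", "text-to-speach-language", "openai-api-key")
PLACEHOLDER_KEY = "YOUR_OPENAI_API_KEY"


def config_problems(text: str) -> List[str]:
    """Return human-readable problems found in the properties file contents."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error as exc:
        return [f"Could not parse file: {exc}"]
    if not parser.has_section("text"):
        return ["Missing [text] section"]
    problems = [f"Missing or empty key: {key}"
                for key in REQUIRED_KEYS
                if not parser.get("text", key, fallback="").strip()]
    if parser.get("text", "openai-api-key", fallback="").strip() == PLACEHOLDER_KEY:
        problems.append("openai-api-key still contains the placeholder value")
    return problems
```

An empty list means the file passed these checks; it does not confirm that the API key itself is valid.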
Possible future improvements:
- Implement additional conversational features and expand the capabilities of the virtual assistant.
- Add user authentication and personalization to tailor responses to user preferences.
- Improve error handling and give users more informative feedback.
- Improve the graphical user interface to make it more user-friendly.
Contributions to this project are welcome! Feel free to open issues or submit pull requests to help improve the project.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.