Sr. No. | Topic | Link |
---|---|---|
0. | Introduction and "Why" of the project | Link will come here |
1. | Setup and Installation | Link will come here |
2. | Features | Another Link |
3. | Demo and Application Screenshots | Another Link |
4. | Approach and Implementation* | Another Link |
5. | Recent Updates and Future Directions | Another Link |
6. | Contributions | Another Link |
7. | Issues/Troubleshooting | Another Link |
The primary objective of this project is to showcase NLP & LLM's capability to quickly summarise long meetings and help you and your organisation automate the task of delegating Minutes of Meeting(MoM) emails. It uses a high level 2 step approach where step 1 corresponds to conversion of any audio/video file into a text conversation. Step 2 utilises text produced by step 1 and generate Minutes of meeting and detailed summary notes. These minutes of meeting will be editable piece of text. Once you finalise the MoM, you can use it further as per your requirement.
The long term objective for this repository is also to develop a real time python web-application which can attend meetings for you and also provide you MoM at the end of the meeting. Taking baby steps and trying to get to long-term by starting a short term objective.
For your Information: I am working on fine-tuning custom LLMs and development. Please be patient while the whole project is completely stable. I will add training & inference code once completed. Do โญ this repository if you need to know latest updates. ๐ Appreciate your time.
Before proceeding, ensure you have the following installed:
- Ubuntu 22.04 or latest.
- Python (v3.10 or higher)
- A virtual environment tool like
virtualenv
orvenv
.
Let's begin installation steps now.
-
Clone the GitHub Repository
Open your terminal or command prompt and navigate to the directory where you want to clone the repository. Then run:
git clone https://github.com/inboxpraveen/LLM-Minutes-of-Meeting cd LLM-Minutes-of-Meeting
-
Install Requirements
It's a good practice to create a virtual environment before installing dependencies to avoid potential conflicts with other Python projects. If you're using
virtualenv
, you can set up a new environment as follows:## Create a python virtual environment and activate it. # Install the required packages after activating: pip install -r requirements.txt ## After this, let's install Llama-Cpp-Python binding which will be used to interact with LLMs. ## Run the following line if you are using it on a CPU. pip install llama-cpp-python ## Run the following line if you are using GPU (T4, A100, A10, or H100), or any Nvidia Cuda based GPU Drivers. CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python ## If you are on Mac or any other GPU types, you can refer the following links and setup the Llama-Cpp-Python https://llama-cpp-python.readthedocs.io/en/stable/#installation-configuration https://llama-cpp-python.readthedocs.io/en/stable/install/macos/
-
Setup RabbitMQ & Celery Background Job Processing
Use the following link to setup RabbitMQ on your machine. Follow the directions till step 5 and save your
admin-username
&password
.Setup RabbitMQ on Ubuntu 22.04
Once you have successfully setup RabbitMQ, then setup redis-server and celery. Use the following command to setup and install them.
sudo apt-get update -y ## Try with apt-get. If it does not install, then run with apt. sudo apt-get install redis-server -y ## If the above does not work, try this: sudo apt install redis-server -y
-
Run Application and Parallel Run Celery Task
First, start the Flask application:
cd /path/to/project/
and then open app.py file inside you code editor and modify the following line.
Line 18: broker='amqp://<user>:<password>$@localhost:5672//'
## Update <user> with "your-admin-username".
## Update <password> with "your-admin-password"
## Eg: broker='amqp://admin:hello_world$@localhost:5672//'
### IMPORTANT NOTE: If your password contains '@' symbol, you will need to convert it because it is the default delimiter in broker settings. Example if your password has @ symbol inside it would be.
## broker='amqp://admin:hello%40world$@localhost:5672//' -- where the original password was "hello@world", we represent it as 'hello%40world'
After you have updated the file, you will run the setup.py
file to setup directories and download of models. If you want to change the configurations of which models you wish to use, you can change them appropriately based on your infrastructure size and system capacity. The following table shows which models we support currently in this project but we will be adding new LLMs support as we see them fit and open-source.
Speech Models Supported
Model Name | Model Size | Memory Required (RAM or vRAM) |
---|---|---|
distil-whisper/distil-large-v3 | 3.1 GB | 4 GB |
distil-whisper/distil-large-v2 | 3.1 GB | 4 GB |
distil-whisper/distil-medium.en | 1.6 GB | 2 GB |
distil-whisper/distil-small.en | 680 MB | 900 MB |
openai/whisper-large-v3 | 6.2 GB | 7.5 GB |
openai/whisper-large-v2 | 6.2 GB | 7.5 GB |
openai/whisper-large-v1 | 6.2 GB | 7.5 GB |
openai/whisper-medium | 3.2 GB | 4.5 GB |
openai/whisper-small (default) | 980 MB | 1.7 GB |
LLMs Supported
Model Name | Model Size | Memory Required |
---|---|---|
QuantFactory/Phi-3-mini-4k-instruct-GGUF (default) | 1 GB - 8 GB | 2 GB - 14 GB |
QuantFactory/Phi-3-mini-128k-instruct-GGUF | 1 GB - 8 GB | 2.5 GB - 16 GB |
bartowski/Phi-3-medium-128k-instruct-GGUF | 3 GB - 14 GB | 6 GB - 18 GB |
You will need to modify the global_varibables.py
file with the model name you choose and then run setup.py
file which will automatically down the models you choose.
Line 32: DEFAULT_SPEECH_MODEL = "openai/whisper-small"
...
Line 46: DEFAULT_SUMMARY_MODEL = ("QuantFactory/Phi-3-mini-4k-instruct-GGUF", "Phi-3-mini-4k-instruct.Q5_0.gguf")
### After update the above lines as per your need, run the setup.py
python setup.py
In a new terminal window (ensure your virtual environment is activated here as well), start the App and Celery worker:
python app.py # ensure your environment is activated
# and then in new terminal, run the following.
celery -A app.celery worker --loglevel=info -f celery.logs
-
Upload Recording to Form
Open your web browser and navigate to the Flask application's URL (usually
http://127.0.0.1:5000
). Use the interface to upload your meeting recording. -
Get Latest Status and Wait for It to Complete
After uploading the recording, you can check the status of the processing. This could be implemented as a status page or a progress bar in your application. Wait until the processing is complete.
-
See the Final Processed Minutes of Meeting (MoM)
Once the processing is complete, the application should display the final minutes of the meeting. You can view, edit (if the feature is available), and save the MoM for your reference.
-
Effortlessly convert audio and video files to accurate text transcripts: These can also be used to summarise, generate action items, understanding work-flows, and resource planning.
-
Keyword highlighting and topic tagging for quick reference: Extracting topics and finding relevant contents to skip through meetings and listen to only specific topics which is of your interest.
-
Export minutes in various formats, including PDF and plain text: Allows you to export meeting transcripts, summaries, topic & keywords, action items, etc into documents which can be utilized in project planning and management frameworks. Also eliminates your need to manually write and generate templates.
-
User-friendly interface for easy customization and integration: Easy to tweak which ever open-source or closed source model you want to choose.
The core functionality revolves around processing meeting recordings submitted via the home page of the web application. Once a recording is submitted, a background task is initiated using Celery, which performs two primary operations: speech-to-text conversion and generating minutes of the meeting from the converted text.
The flowchart you've shared outlines a detailed process for handling and processing media files, particularly focusing on audio and video inputs to generate transcriptions and summaries. Letโs break down each step and describe the high-level solutions involved in this workflow:
- Media Types: Supports mp3, wav, mp4 files.
- Action: Users upload their media files to the system.
- Purpose: To keep users informed about the status of their upload and processing.
- Implementation: Use an asynchronous notification API to send real-time updates to the user.
- Action: The system reads the uploaded file to determine the type and content.
- Audio:
- Convert to 16 kHz: Standardize audio sample rate for consistent processing.
- Transcribe: Convert audio speech to text.
- Video:
- Extract Audio and Frames (1 Frame/Second): Separate audio track and video frames for processing.
- Short Summary Per Frame: Generate a brief summary for each extracted frame.
- Action: Combine all short summaries into a single comprehensive transcription of the video content.
- Purpose: Handle limitations of the processing language model which might have a maximum token input limit.
- Implementation: If the transcription exceeds the token limit, split the content into manageable parts.
- Generate Video MoM (Minutes of Meeting): If the input is a video, generate a detailed summary or minutes from the transcription.
- Recursive Processing: For longer content, recursively summarize to condense the information effectively.
- Action: Produce a final summary and minutes of the meeting document based on the transcribed and processed text.
- Purpose: Combine summaries from different chunks (if split previously) into a final comprehensive document.
- Integration with Notification API: Inform the user that the processing is complete and provide access to the generated summaries or MoM documents.
- Back-end: Python, Flask
- Asynchronous Task Queue: Redis, Celery
- Speech-to-Text: Whisper, Faster-Whisper, Distil-Whisper
- LLM for Text Processing: Phi3, Gemma 2, Llama 3
- Frontend: HTML, CSS, JavaScript
- Corporate Meetings: Enhances productivity by providing quick and accurate minutes for various corporate gatherings, board meetings, and team discussions.
- Educational Institutions: Useful for lecturers and students to transcribe and summarize lectures, seminars, and group discussions.
- Legal and Medical Fields: Helps in accurately documenting legal proceedings, interviews, and patient consultations.
- Accessibility: Assists individuals with disabilities, especially those who have difficulties in note-taking, by providing an automated way to capture and summarize spoken content.
- Event Coverage: Useful for journalists and event organizers to transcribe speeches, presentations, and panel discussions, aiding in report creation and event documentation.
- Integration with video conferencing tools for direct recording capture.
- Multi-language support for speech-to-text conversion.
- Enhanced summarization features tailored to specific meeting types (e.g., technical, business strategy).
- Real-time transcription and summarization capabilities.
- User customization options for formatting the minutes.
- Home Screen.
- New Minutes of Meeting Dialogue
- Upload Video/Audio File.
- Notification Center - Started Processing
- Notification Center - In Prgress Real Time Updates
- Notification Center - Completed Processing
- Final Minutes of Meeting Page
- Notification Center - Multiple File Status
- Notification Center - Multiple File Status
- Notification Center - Multiple File Status
Before proceeding, ensure you have the following installed:
- Ubuntu 22.04 or latest.
- Python (v3.10 or higher)
- A virtual environment tool like
virtualenv
orvenv
.
- Ensure all environment variables required by the application and Celery are correctly set.
- Check for any error messages in the Flask and Celery terminal outputs.
- Make sure the versions of Python and the packages in
requirements.txt
are compatible.
In Phase 2 of our project, we plan to enable real-time meeting transcription. Join us in shaping the future of efficient and collaborative meetings!
๐ Follow me for updates on Phase 2 development and other enhancements to make your meetings even more productive.
๐ฉโ๐ป Encouraging contributions from the community to make this tool a game-changer for meetings everywhere. Contribute your ideas and expertise to help us achieve real-time transcription!