Transcribe, summarize, and create smart clips from video and audio content.
- Transcription: Transcribe audio using WhisperX
- Smart Summarization: Generate concise summaries of video content, tailored to different purposes:
  - Meeting Minutes
  - Podcast Summaries
  - Lecture Notes
  - Interview Highlights
  - General Content Summaries
- Intelligent Clip Creation: Automatically create clips of key moments and topics discussed in the video.
- Multi-format Support: Process various video and audio file formats.
- Cloud Integration: Utilizes AWS S3 for efficient file handling and processing.
- Python 3.8+
- AWS CLI configured with appropriate permissions
- FFmpeg installed on your system
- Node.js and npm (for running the frontend GUI)
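Before installing, it can help to confirm these prerequisites are actually available. The short Python sketch below is not part of the repository; it simply assumes the default executable names (`ffmpeg`, `aws`, `node`, `npm`) and checks that they are on your PATH.

```python
# Optional sanity check (not part of this project): verify the prerequisites
# listed above. Assumes the default executable names: ffmpeg, aws, node, npm.
import shutil
import sys

if sys.version_info < (3, 8):
    sys.exit(f"Python 3.8+ is required, found {sys.version.split()[0]}")

missing = [tool for tool in ("ffmpeg", "aws", "node", "npm") if shutil.which(tool) is None]
if missing:
    sys.exit("Missing prerequisites: " + ", ".join(missing))

print("Python version and external tools look good.")
```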
- Clone the repository:
  `git clone https://github.com/sidedwards/ai-video-summarizer.git`
  `cd ai-video-summarizer`
- Set up the backend:
  - Create and activate a virtual environment:
    `python -m venv .venv`
    `source .venv/bin/activate` (on Windows, use `.venv\Scripts\activate`)
  - Install the required dependencies:
    `pip install -r requirements.txt`
  - Set up your configuration:
    - Copy `config/config-example.yaml` to `config/config.yaml`
    - Edit `config/config.yaml` with your API keys and preferences
- Set up the frontend (optional, for GUI usage):
  - Navigate to the frontend directory:
    `cd frontend`
  - Install the required dependencies:
    `npm install`
- To use the command-line interface, run the CLI script:
  `python backend/cli.py`
- Follow the prompts to select a video file and choose the type of summary you want to generate.
- The generated summary files will be saved in a directory named after the input video file.
- To use the web GUI, start the backend server:
  `python backend/server.py`
- Start the frontend development server in a new terminal window:
  `cd frontend`
  `npm run dev`
- Open your web browser and navigate to http://localhost:5173 to access the AI Video Summarizer GUI.
- Use the web interface to upload a video file, select the desired summary type, and start the processing.
- Once the processing is complete, you can download the generated summary files as a zip archive.
Edit `config/config.yaml` to set:
- AWS CLI path and S3 bucket name
- Replicate API key and model version
- Anthropic API key and model choice
- Other customizable parameters
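The exact key names in `config/config.yaml` are defined by `config/config-example.yaml`; the sketch below only illustrates how a backend script might read the settings listed above, using placeholder key names rather than the project's real ones.

```python
# Illustrative sketch only -- the key names below are placeholders; see
# config/config-example.yaml for the actual structure used by this project.
import yaml  # pip install pyyaml

with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

aws_cli_path = config.get("aws_cli_path")               # path to the AWS CLI binary
s3_bucket = config.get("s3_bucket")                     # S3 bucket used for file handling
replicate_api_key = config.get("replicate_api_key")     # Replicate API key
replicate_model = config.get("replicate_model_version") # WhisperX model version on Replicate
anthropic_api_key = config.get("anthropic_api_key")     # Anthropic API key
anthropic_model = config.get("anthropic_model")         # Anthropic model choice
```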
Roadmap:
- Web-based GUI
- Basic CLI
- More LLM options
- Export options for various document formats (PDF, DOCX, etc.)
Contributions are welcome! Please feel free to submit a Pull Request.
This project uses WhisperX, an advanced version of OpenAI's Whisper model, for transcription. WhisperX offers:
- Accelerated transcription
- Advanced speaker diarization
- Improved accuracy in speaker segmentation
The WhisperX model is run via the Replicate API, based on https://github.com/sidedwards/whisperx.
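For reference, this is roughly what a WhisperX call through the Replicate Python client looks like. The model reference, version hash, and input fields below are placeholders, not necessarily the ones this project uses; check the Replicate model page and your `config/config.yaml` for the real values.

```python
# Rough sketch of running a WhisperX model on Replicate. The model reference
# and input fields are placeholders -- consult your Replicate model's page
# for the exact version hash and input schema.
import replicate  # pip install replicate; requires REPLICATE_API_TOKEN in the environment

output = replicate.run(
    "some-user/whisperx:<version-hash>",             # placeholder model reference
    input={
        "audio": open("extracted_audio.mp3", "rb"),  # audio extracted from the video (e.g. via FFmpeg)
        "diarize": True,                             # hypothetical flag enabling speaker diarization
    },
)
print(output)  # typically transcript segments with timestamps and speaker labels
```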