This project demonstrates the feasibility of real-time, auto-generated captions on the web, supporting the Auto-Generated Captions API Proposal. Inspired by feedback and encouragement from Adam Argyle and the Chrome Team, it shows how live captions can be added to web videos dynamically.
- Real-time Captioning: Captions are generated on the fly as the video plays.
- Modes: Three modes are available in the vanilla version:
  - Static Mode: Fixed captions displayed without scrolling.
  - Scroll Mode: Captions scroll dynamically as they are generated (used in the Chrome extension).
  - Append Mode: Captions accumulate below the video.
- HuggingFace and Transformers JS: Leveraging advanced machine learning models for accurate transcription.
- Chrome Extension: Focused on Scroll Mode for seamless captioning directly in the browser.
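The difference between the three modes can be pictured as a small state update that decides which caption lines stay visible. This is only a minimal sketch, not the project's code: `MODES`, `applyCaption`, and `maxLines` are hypothetical names, and treating Static Mode as "show only the latest caption" is an assumption about its behavior.

```javascript
// Hypothetical sketch: each mode decides which caption lines stay visible.
// MODES, applyCaption, and maxLines are illustrative names, not from the project.
const MODES = Object.freeze({ STATIC: 'static', SCROLL: 'scroll', APPEND: 'append' });

function applyCaption(mode, visibleLines, newLine, maxLines = 2) {
  switch (mode) {
    case MODES.STATIC:
      // Static (assumed): only the latest caption is shown, in a fixed spot.
      return [newLine];
    case MODES.SCROLL:
      // Scroll: keep a small window; older lines drop out as new ones arrive.
      return [...visibleLines, newLine].slice(-maxLines);
    case MODES.APPEND:
      // Append: keep every caption, building a running transcript.
      return [...visibleLines, newLine];
    default:
      throw new Error(`Unknown caption mode: ${mode}`);
  }
}

// Usage: feed captions in as they arrive.
let lines = [];
for (const text of ['Hello', 'and welcome', 'to the demo']) {
  lines = applyCaption(MODES.SCROLL, lines, text);
}
// lines is now ['and welcome', 'to the demo']
```

Keeping the mode logic in a pure function like this would let the same code drive both a page overlay and a transcript panel, with rendering handled separately.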
This project was developed to:
- Validate the practicality of the `autogenerate` attribute proposed for the `<track>` element.
- Enhance web accessibility by making captions a default feature: only 0.5% of videos have them, according to the Web Almanac by HTTP Archive.
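One way to see why the attribute looks practical: timestamped transcription output can be turned into WebVTT, the caption format a `<track>` element already consumes. The sketch below is an illustration under stated assumptions, not the proposal's or the project's implementation. `toVttTimestamp` and `chunksToWebVtt` are hypothetical helpers, and the chunk shape `{ timestamp: [start, end], text }` is assumed to match what a Transformers.js speech-recognition pipeline returns when timestamps are requested.

```javascript
// Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toVttTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Build a WebVTT document from chunks shaped like
// { timestamp: [start, end], text } (assumed shape, see the note above).
function chunksToWebVtt(chunks) {
  const cues = chunks
    // Skip empty text and chunks whose end time is missing or not after the start.
    .filter((c) => c.text.trim() !== '' && c.timestamp[1] !== null && c.timestamp[1] > c.timestamp[0])
    .map((c) => `${toVttTimestamp(c.timestamp[0])} --> ${toVttTimestamp(c.timestamp[1])}\n${c.text.trim()}`);
  return ['WEBVTT', ...cues].join('\n\n') + '\n';
}
```

In a browser, the resulting string could be wrapped in a `Blob` with type `text/vtt` and assigned to a track's `src`, or the same chunks could be added as cues to a text track directly.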
Important
This project can help showcase Chrome's Gemini Nano API as a potential backend for AI-powered captioning. With the release of the Prompt API, the Writer and Rewriter APIs, and the Translate API, to name a few, auto-generated captions could be another powerful addition to the window.ai APIs.
The vanilla version supports:
- Static Mode
- Scroll Mode
- Append Mode
To use:
- Open the provided examples.
- Play the video.
- Observe the auto-generated captions in the mode of your choice.
The extension implements Scroll Mode only. To install and try it:
- Clone the repository locally.
- Visit `chrome://extensions` in your Chrome browser.
- Click "Load unpacked" and select the cloned repository folder.
- Visit a website whose videos can be fetched.
- Play a video to see auto-captions appear dynamically.
Note: The extension only works on websites that permit fetching the video.
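A rough way to check this restriction up front is to try fetching the video URL and treat a rejected or failed request as "not permitted". The following is a minimal sketch under assumptions, not the extension's actual logic: `canFetchVideo` is a hypothetical helper, and `fetchImpl` is an injectable parameter added only so the check can be exercised outside a browser.

```javascript
// Hypothetical helper: resolves to true when the video URL can be fetched.
// fetchImpl is injectable so the check can be tried outside a browser.
async function canFetchVideo(url, fetchImpl = globalThis.fetch) {
  const controller = new AbortController();
  try {
    // A blocked cross-origin request or a network failure rejects the promise.
    const response = await fetchImpl(url, { signal: controller.signal });
    return response.ok;
  } catch {
    return false;
  } finally {
    // Only the status matters here, so stop downloading the body.
    controller.abort();
  }
}
```

On a page, this could be called with a video element's `currentSrc` before starting transcription. Streams exposed as `blob:` URLs through Media Source Extensions generally cannot be fetched this way, which would be one reason a site falls outside what the extension supports.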
- Install and Set Up:
  - For the vanilla version: Load the example files in a local environment.
  - For the Chrome extension: Clone the repository and load it into Chrome as described above.
- Play Video:
  - Captions start automatically when a video is played.
- Switch Modes (Vanilla Only):
  - Modify the code to toggle between Static, Scroll, and Append modes.
- Gemini Nano Integration: Chrome’s Gemini Nano API is a promising tool for improving transcription efficiency and accuracy. Future iterations will incorporate Gemini Nano for enhanced performance.
- Live Caption Support: Extend functionality to support live-streamed videos, enabling real-time accessibility.
- Cross-Browser Compatibility: Collaborate with other browser vendors to standardize the feature across platforms.
- Auto-Generated Captions API Proposal
- HuggingFace and Transformers JS
- Inspiration from the Chrome Team (Thomas Steiner and Adam Argyle)
- Gemini Nano API by Chrome (for future versions)
Special thanks to:
- The Chrome Team
- HuggingFace community for their ML tools