Add more output formats to pass to whisper.cpp #116

Open
wants to merge 1 commit into base: main
42 changes: 25 additions & 17 deletions README.md
@@ -7,12 +7,12 @@ Node.js bindings for OpenAI's Whisper model.
## Features

- Automatically convert the audio to WAV format with a 16000 Hz frequency to support the whisper model.
- Output transcripts to (.txt .srt .vtt)
- Output transcripts to (.txt .srt .vtt .json .wts .lrc)
- Optimized for CPU (Including Apple Silicon ARM)
- Timestamp precision to single word
- Split on word rather than on token (Optional)
- Translate from source language to english (Optional)
- Convert audio formet to wav to support whisper model
- Convert audio format to wav to support whisper model


## Installation
@@ -21,16 +21,16 @@ Node.js bindings for OpenAI's Whisper model.

```bash
sudo apt update
sudo apt install build-essential
sudo apt install build-essential
```

1. Install nodejs-whisper with npm
2. Install nodejs-whisper with npm

```bash
npm i nodejs-whisper
```

2. Download whisper model
3. Download whisper model

```bash
npx nodejs-whisper download
@@ -50,18 +50,22 @@ const filePath = path.resolve(__dirname, 'YourAudioFileName')
await nodewhisper(filePath, {
modelName: 'base.en', //Downloaded models name
autoDownloadModelName: 'base.en', // (optional) autodownload a model if model is not present
verbose?: boolean
removeWavFileAfterTranscription?: boolean
withCuda?: boolean // (optional) use cuda for faster processing
verbose: false, // (optional) output more debugging information
removeWavFileAfterTranscription: false, // (optional) remove wav file once transcribed
withCuda: false, // (optional) use cuda for faster processing
whisperOptions: {
outputInCsv: false, // get output result in csv file
outputInJson: false, // get output result in json file
outputInJsonFull: false, // get output result in json file including more information
outputInLrc: false, // get output result in lrc file
outputInSrt: true, // get output result in srt file
outputInText: false, // get output result in txt file
outputInVtt: false, // get output result in vtt file
outputInSrt: true, // get output result in srt file
outputInCsv: false, // get output result in csv file
translateToEnglish: false, //translate from source language to english
wordTimestamps: false, // Word-level timestamps
outputInWords: false, // get output result in wts file for karaoke
translateToEnglish: false, // translate from source language to english
wordTimestamps: false, // word-level timestamps
timestamps_length: 20, // amount of dialogue per timestamp pair
splitOnWord: true, //split on word rather than on token
splitOnWord: true, // split on word rather than on token
},
})

@@ -93,10 +97,14 @@ const MODELS_LIST = [
}

interface WhisperOptions {
outputInCsv?: boolean
outputInJson?: boolean
outputInJsonFull?: boolean
outputInLrc?: boolean
outputInSrt?: boolean
outputInText?: boolean
outputInVtt?: boolean
outputInSrt?: boolean
outputInCsv?: boolean
outputInWords?: boolean
translateToEnglish?: boolean
timestamps_length?: number
wordTimestamps?: boolean
@@ -105,7 +113,7 @@ const MODELS_LIST = [

```

## Run Locally
## Run locally

Clone the project

@@ -131,7 +139,7 @@ Start the server
npm run dev
```

Build Project
Build project

```bash
npm run build
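Note on the README change: the comments on the new options imply which file type each one produces. The sketch below collects that mapping in one place. It is only an illustration under stated assumptions: `TranscriptOutputs`, `OUTPUT_EXTENSIONS`, and `expectedExtensions` are hypothetical names that are not part of nodejs-whisper, and the extensions are taken from the README comments in this diff rather than checked against whisper.cpp.

```typescript
// Hypothetical local mirror of the output-related fields of WhisperOptions.
interface TranscriptOutputs {
    outputInCsv?: boolean
    outputInJson?: boolean
    outputInJsonFull?: boolean
    outputInLrc?: boolean
    outputInSrt?: boolean
    outputInText?: boolean
    outputInVtt?: boolean
    outputInWords?: boolean
}

// Extension per option, as described by the README comments in this PR (assumption).
const OUTPUT_EXTENSIONS: Record<keyof TranscriptOutputs, string> = {
    outputInCsv: '.csv',
    outputInJson: '.json',
    outputInJsonFull: '.json', // README: "json file including more information"
    outputInLrc: '.lrc',
    outputInSrt: '.srt',
    outputInText: '.txt',
    outputInVtt: '.vtt',
    outputInWords: '.wts',
}

// Returns the distinct extensions a caller can expect for the enabled options.
function expectedExtensions(options: TranscriptOutputs): string[] {
    const keys = Object.keys(OUTPUT_EXTENSIONS) as Array<keyof TranscriptOutputs>
    return keys
        .filter(key => options[key] === true)
        .map(key => OUTPUT_EXTENSIONS[key])
        .filter((ext, index, all) => all.indexOf(ext) === index) // drop duplicates
}
```

With the README's example options (`outputInSrt: true`, everything else `false`), this helper would return only `.srt`.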
8 changes: 6 additions & 2 deletions src/WhisperHelper.ts
@@ -34,10 +34,14 @@ export const constructCommand = (filePath: string, args: IOptions): string => {

const constructOptionsFlags = (args: IOptions): string => {
let flags = [
args.whisperOptions?.outputInCsv ? '-ocsv ' : '',
args.whisperOptions?.outputInJson ? '-oj ' : '',
args.whisperOptions?.outputInJsonFull ? '-ojf ' : '',
args.whisperOptions?.outputInLrc ? '-olrc ' : '',
args.whisperOptions?.outputInSrt ? '-osrt ' : '',
args.whisperOptions?.outputInText ? '-otxt ' : '',
args.whisperOptions?.outputInVtt ? '-ovtt ' : '',
args.whisperOptions?.outputInSrt ? '-osrt ' : '',
args.whisperOptions?.outputInCsv ? '-ocsv ' : '',
args.whisperOptions?.outputInWords ? '-owts ' : '',
args.whisperOptions?.translateToEnglish ? '-tr ' : '',
args.whisperOptions?.wordTimestamps ? '-ml 1 ' : '',
args.whisperOptions?.timestamps_length ? `-ml ${args.whisperOptions.timestamps_length} ` : '',
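Note on the `constructOptionsFlags` change: the list of ternaries grows by one line per output format. A table-driven variant might be easier to keep in sync with `WhisperOptions`. The following is a minimal sketch, not the PR's implementation: `OutputOptions`, `OUTPUT_FLAGS`, and `buildOutputFlags` are hypothetical names, and the flag strings are copied from the hunk above rather than checked against the whisper.cpp CLI. It covers only the output-format flags and leaves `-tr` and `-ml` handling as it is in the diff.

```typescript
// Hypothetical local mirror of the output-related fields of WhisperOptions.
interface OutputOptions {
    outputInCsv?: boolean
    outputInJson?: boolean
    outputInJsonFull?: boolean
    outputInLrc?: boolean
    outputInSrt?: boolean
    outputInText?: boolean
    outputInVtt?: boolean
    outputInWords?: boolean
}

// Option name -> flag, in the same order as the array in this PR.
const OUTPUT_FLAGS: ReadonlyArray<[keyof OutputOptions, string]> = [
    ['outputInCsv', '-ocsv'],
    ['outputInJson', '-oj'],
    ['outputInJsonFull', '-ojf'],
    ['outputInLrc', '-olrc'],
    ['outputInSrt', '-osrt'],
    ['outputInText', '-otxt'],
    ['outputInVtt', '-ovtt'],
    ['outputInWords', '-owts'],
]

// Returns the flags for every enabled option, separated by single spaces.
function buildOutputFlags(options: OutputOptions = {}): string {
    return OUTPUT_FLAGS.filter(([key]) => options[key] === true)
        .map(([, flag]) => flag)
        .join(' ')
}
```

Because the order comes from the table rather than from the caller's object, `buildOutputFlags({ outputInSrt: true, outputInJson: true })` yields `-oj -osrt`.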
8 changes: 6 additions & 2 deletions src/types.ts
@@ -1,8 +1,12 @@
export interface WhisperOptions {
outputInCsv?: boolean
outputInJson?: boolean
outputInJsonFull?: boolean
outputInLrc?: boolean
outputInSrt?: boolean
outputInText?: boolean
outputInVtt?: boolean
outputInSrt?: boolean
outputInCsv?: boolean
outputInWords?: boolean
translateToEnglish?: boolean
language?: string
timestamps_length?: number