This guide installs whisper.cpp as a simple transcriber and subtitle generator on macOS with Apple Silicon.
Install Xcode, then select its developer directory:
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
Install Conda (via Miniforge):
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
And if you don't want conda to be activated by default:
conda config --set auto_activate_base false
Then you can activate and deactivate conda:
conda activate
conda deactivate
Create a Conda environment:
conda create -n py310-whisper python=3.10 -y
conda activate py310-whisper
Install dependencies:
pip install ane_transformers
pip install openai-whisper
pip install coremltools
Clone whisper.cpp:
mkdir -p ~/.apps && cd ~/.apps
git clone https://github.com/ggerganov/whisper.cpp.git && cd whisper.cpp
Build for Apple Silicon:
make clean
WHISPER_COREML=1 make -j
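A quick sanity check after building, assuming the build leaves the main binary in the repo root as in the steps below:

```shell
# Confirm the build produced an executable main binary in the repo root.
[ -x ./main ] && ./main --help
```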
Generate and build the model you need; tiny works well:
./models/generate-coreml-model.sh tiny.en && make tiny.en
./models/generate-coreml-model.sh tiny && make tiny
./models/generate-coreml-model.sh base.en && make base.en
./models/generate-coreml-model.sh base && make base
./models/generate-coreml-model.sh small.en && make small.en
./models/generate-coreml-model.sh small && make small
./models/generate-coreml-model.sh medium.en && make medium.en
./models/generate-coreml-model.sh medium && make medium
./models/generate-coreml-model.sh large-v1 && make large-v1
./models/generate-coreml-model.sh large-v2 && make large-v2
./models/generate-coreml-model.sh large-v3 && make large-v3
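To sanity-check a conversion, the Core ML script is expected to leave an encoder .mlmodelc directory under models/ next to the ggml file; the exact directory name pattern here is an assumption based on the model names above:

```shell
# Check that the Core ML encoder for a given model was generated.
# The ggml-<model>-encoder.mlmodelc name pattern is an assumption.
model="tiny.en"
if [ -d "models/ggml-${model}-encoder.mlmodelc" ]; then
  echo "Core ML encoder for ${model} is in place"
else
  echo "Core ML encoder for ${model} is missing" >&2
fi
```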
Model file names in models/ are the following:
ggml-tiny.en.bin
ggml-tiny.bin
ggml-base.en.bin
ggml-base.bin
ggml-small.en.bin
ggml-small.bin
ggml-medium.en.bin
ggml-medium.bin
ggml-large-v1.bin
ggml-large-v2.bin
ggml-large-v3.bin
Try whisper.cpp with your desired arguments:
./main \
-m models/ggml-medium.bin \
-l es \
-otxt \
-ovtt \
-osrt \
-olrc \
-owts \
-f /path/to/test.wav
To create an audio file compatible with whisper.cpp, you can use FFmpeg:
ffmpeg \
-i audio.m4a \
-ar 16000 \
-ac 1 \
-c:a pcm_s16le \
test.wav
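If you have a folder of recordings, the same conversion can be looped; naming each output after the input basename with a .wav extension is my assumption:

```shell
# Convert every .m4a in the current folder to 16 kHz mono 16-bit WAV,
# the format whisper.cpp expects.
for f in *.m4a; do
  ffmpeg -i "$f" -ar 16000 -ac 1 -c:a pcm_s16le "${f%.m4a}.wav"
done
```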
Create the transcribe bash file:
mkdir -p ~/.bin && cd ~/.bin
touch transcribe
chmod +x transcribe
nano transcribe
The transcribe bash file:
#!/usr/bin/env bash
# Modify the internal field separator so filenames with spaces survive
IFS=$'\t\n'
if [ -z "$1" ]; then
  echo
  echo "ERROR! No input file specified."
  echo
  exit 1
fi
~/.apps/whisper.cpp/main \
  -m ~/.apps/whisper.cpp/models/ggml-tiny.bin \
  -l es \
  -otxt \
  -ovtt \
  -osrt \
  -olrc \
  -owts \
  --print-colors \
  -f "$1"
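For the transcribe command to work from anywhere, ~/.bin must be on your PATH. A minimal sketch, assuming zsh (the macOS default) and a ~/.zshrc profile:

```shell
# Put ~/.bin on the PATH so the transcribe script is found anywhere.
export PATH="$HOME/.bin:$PATH"
# To make it permanent, append the line above to ~/.zshrc.
transcribe /path/to/test.wav
```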
Install Vosk
conda activate
pip install vosk
Help and available models:
vosk-transcriber --help
vosk-transcriber --list-models
Do a transcription to SRT, using the Spanish model, with warn log level:
vosk-transcriber \
-n vosk-model-es-0.42 \
--log-level warn \
-i "audio.m4a" \
-t srt \
-o transcription.srt
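The same command can be looped over a folder of recordings; naming each .srt after the input basename is my assumption:

```shell
# Transcribe every .m4a in the current folder to SRT with Vosk,
# reusing the Spanish model and log level from the command above.
for f in *.m4a; do
  vosk-transcriber -n vosk-model-es-0.42 --log-level warn \
    -i "$f" -t srt -o "${f%.m4a}.srt"
done
```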
Extra notes: