- Create a GCP account.
- Enable the Cloud Speech-to-Text API.
- Save your API key and export it as an environment variable in your shell: export API_KEY=XXXXXX
- Create a bucket in Cloud Storage and upload your mono (single-channel) .flac files.
- Make each file accessible through a public link. (Not ideal, but it's OK for learning purposes.)
- If your file is in .mp4 format, convert it to a mono .flac file.
- You can convert a stereo file to mono using ffmpeg. Download and install it first.
- Use the following command to convert (-ac 1 sets the output to one audio channel):
ffmpeg -i location_of_your_file/audio_file.flac -ac 1 mono.flac
- Open the .flac file in QuickTime Player and note its sample rate; this is the value for sampleRateHertz in request.json.
- Create a request.json file:
{ "config": { "encoding": "FLAC", "sampleRateHertz": 44100, "languageCode": "en-US" }, "audio": { "uri": "gs://your_app_api/mono.flac" } }
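
The request file above can also be generated from the shell. The following is a minimal sketch, not part of the original steps: `write_request` and `detect_sample_rate` are hypothetical helper names, `ffprobe` is assumed to be installed alongside ffmpeg, and the bucket URI is the placeholder used above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Google tooling.

# Print the sample rate of the first audio stream (assumes ffprobe is installed).
detect_sample_rate() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=sample_rate \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Write request.json for a mono FLAC file stored in Cloud Storage.
#   $1 = sample rate in Hz, $2 = gs:// URI of the audio file
write_request() {
  sample_rate="$1"
  gcs_uri="$2"
  cat > request.json <<EOF
{
  "config": {
    "encoding": "FLAC",
    "sampleRateHertz": ${sample_rate},
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "${gcs_uri}"
  }
}
EOF
}

# Placeholder values taken from the example above.
write_request 44100 "gs://your_app_api/mono.flac"
```

If ffprobe is available, the hard-coded rate could be replaced with `write_request "$(detect_sample_rate mono.flac)" "gs://your_app_api/mono.flac"`, which avoids reading the value off by hand.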
- If the audio is longer than 1 minute, use the speech:longrunningrecognize endpoint; otherwise use speech:recognize.
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://speech.googleapis.com/v1/speech:longrunningrecognize?key=${API_KEY}"
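- Fetch the result of the long-running operation, replacing OPERATION_NAME with the name returned by the previous request, and save the transcript to output.txt (requires jq):
curl -s "https://speech.googleapis.com/v1/operations/OPERATION_NAME?key=${API_KEY}" | jq -r '.response.results[].alternatives[].transcript' > output.txt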
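
The last two steps can be tied together in one script. This is a rough sketch under stated assumptions rather than a definitive implementation: the function names are hypothetical, the 10-second polling interval and 60-attempt limit are arbitrary choices, and curl, jq, `API_KEY`, and request.json are assumed to be set up as described above.

```shell
#!/bin/sh
# Sketch only: function names, polling interval, and attempt limit are assumptions.

BASE_URL="https://speech.googleapis.com/v1"

# Choose the endpoint from the audio length in whole seconds.
pick_endpoint() {
  if [ "$1" -gt 60 ]; then
    echo "speech:longrunningrecognize"
  else
    echo "speech:recognize"
  fi
}

# Submit request.json to the given endpoint and print the raw JSON response.
submit_request() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data-binary @request.json \
    "${BASE_URL}/$1?key=${API_KEY}"
}

# Poll an operation until it reports done, then write the transcript to output.txt.
# Returns 1 if the operation has not finished after 60 attempts.
wait_for_transcript() {
  operation_name="$1"
  attempts=0
  while [ "$attempts" -lt 60 ]; do
    result=$(curl -s "${BASE_URL}/operations/${operation_name}?key=${API_KEY}")
    done_flag=$(printf '%s' "$result" | jq -r '.done // false')
    if [ "$done_flag" = "true" ]; then
      printf '%s' "$result" \
        | jq -r '.response.results[].alternatives[].transcript' > output.txt
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 10
  done
  return 1
}

# Example usage for a 90-second recording (uncomment to run against the API):
# endpoint=$(pick_endpoint 90)
# name=$(submit_request "$endpoint" | jq -r '.name')
# wait_for_transcript "$name"
```

Polling is only needed for speech:longrunningrecognize; a speech:recognize call returns the results directly in its response, so the `.response` prefix in the jq filter would not apply there.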