Transcribe an audio/video file using the Google Cloud Speech API

Create a GCP account.
Enable the Cloud Speech API.
Create an API key, save it, and export it as an environment variable in your shell: export API_KEY=XXXXXX
Create a bucket in Cloud Storage and upload your mono (single-channel) .flac files.
Make the uploaded files publicly readable through a public link. (Not ideal, but acceptable for learning purposes.)
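The same setup can be scripted with the gcloud and gsutil command-line tools instead of the Cloud Console. This is a minimal sketch, not the only way to do it. It assumes both tools are installed and authenticated against your project, and that the bucket uses fine-grained (ACL) access control, because gsutil acl ch fails on buckets with uniform bucket-level access. The bucket name, the file name, the run and setup_speech_bucket helpers, and the DRY_RUN switch are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the setup steps using the gcloud/gsutil CLIs.
# Set DRY_RUN=1 to print the commands without executing them.

run() {
  # Echo the command, then run it unless DRY_RUN=1.
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

setup_speech_bucket() {
  local bucket="$1" audio_file="$2"
  # Enable the Speech API for the current gcloud project.
  run gcloud services enable speech.googleapis.com
  # Create the bucket and upload the mono .flac file.
  run gsutil mb "gs://${bucket}"
  run gsutil cp "$audio_file" "gs://${bucket}/"
  # Make the uploaded object publicly readable (learning purposes only;
  # assumes fine-grained ACLs, not uniform bucket-level access).
  run gsutil acl ch -u AllUsers:R "gs://${bucket}/$(basename "$audio_file")"
}
```

For example, DRY_RUN=1 setup_speech_bucket your_app_api mono.flac prints the four commands without running them; leave DRY_RUN unset to execute them.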

Create a transcript for the audio/video file

If your file is an .mp4, convert it to a mono (single-channel) .flac file.
You can also use ffmpeg to downmix a stereo file to mono. Download and install it first.
Use the following command to convert (-ac 1 sets a single audio channel; the input can also be the .mp4 file directly):
ffmpeg -i location_of_your_file/audio_file.flac -ac 1 mono.flac
Open the .flac file in QuickTime Player and note its sample rate; you need it for sampleRateHertz in the next step.
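If you would rather not open QuickTime Player, ffprobe (installed alongside ffmpeg) can report the sample rate, the channel count, and the duration, and the duration decides which endpoint to call later. This is a minimal sketch, assuming ffmpeg and ffprobe are on your PATH; all function names are hypothetical, and the 60-second cutoff reflects the one-minute limit on speech:recognize.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for preparing and inspecting audio with ffmpeg/ffprobe.

# Convert any input ffmpeg can read (e.g. an .mp4) to a mono .flac file.
# -vn drops video streams; -ac 1 downmixes to a single channel.
to_mono_flac() {
  ffmpeg -i "$1" -vn -ac 1 "$2"
}

# Print one field of the first audio stream (or of the container) as a bare value.
probe_value() {
  local entries="$1" file="$2"
  ffprobe -v error -select_streams a:0 -show_entries "$entries" \
    -of default=noprint_wrappers=1:nokey=1 "$file"
}

audio_sample_rate() { probe_value stream=sample_rate "$1"; }
audio_channels()    { probe_value stream=channels "$1"; }
audio_duration()    { probe_value format=duration "$1"; }

# Choose the endpoint from a duration in seconds (may be fractional).
pick_endpoint() {
  if awk -v d="$1" 'BEGIN { exit !(d + 0 > 60) }'; then
    echo longrunningrecognize
  else
    echo recognize
  fi
}
```

After to_mono_flac video.mp4 mono.flac, audio_channels mono.flac should print 1, audio_sample_rate mono.flac gives the value for sampleRateHertz, and pick_endpoint "$(audio_duration mono.flac)" prints either recognize or longrunningrecognize.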
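Create a request.json file, replacing 44100 with your file's sample rate and gs://your_app_api/mono.flac with your bucket and file name:
{
  "config": {
    "encoding": "FLAC",
    "sampleRateHertz": 44100,
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://your_app_api/mono.flac"
  }
}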
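Rather than editing request.json by hand for each file, you can generate it from the values you found above. This is a minimal sketch using a shell heredoc; write_request is a hypothetical helper, the URI and sample rate are the same placeholders as in the JSON above, and en-US is only a default. It writes languageCode, the field name used in the v1 REST reference; under the standard proto3 JSON mapping the snake_case spelling language_code should also be accepted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Speech API request body for a FLAC file in GCS.
# Usage: write_request GCS_URI SAMPLE_RATE [LANGUAGE_CODE] > request.json
write_request() {
  local uri="$1" rate="$2" lang="${3:-en-US}"
  cat <<EOF
{
  "config": {
    "encoding": "FLAC",
    "sampleRateHertz": ${rate},
    "languageCode": "${lang}"
  },
  "audio": {
    "uri": "${uri}"
  }
}
EOF
}
```

For example: write_request gs://your_app_api/mono.flac 44100 > request.json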
If the audio is longer than 1 minute, use the speech:longrunningrecognize endpoint; otherwise use speech:recognize.
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://speech.googleapis.com/v1/speech:longrunningrecognize?key=${API_KEY}"
The long-running call returns an operation name. Query that operation until it reports that it is done, then extract the transcript into output.txt with jq:
curl -s "https://speech.googleapis.com/v1/operations/OPERATION_NAME?key=${API_KEY}" | jq -r '.response.results[].alternatives[].transcript' > output.txt
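The speech:longrunningrecognize call returns an operation object whose name field is the OPERATION_NAME used above, and the operation includes "done": true once the transcript is ready. The function below sketches that submit-then-poll flow. It is a minimal sketch under stated assumptions: transcribe_long is a hypothetical helper, API_KEY must already be exported, and the 10-second poll interval is an arbitrary choice. It reads the name and done fields with sed and grep so the polling itself does not need jq; the final JSON is then piped to the same jq filter as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit request.json to speech:longrunningrecognize,
# poll the returned operation until it reports "done": true, then print the
# final operation JSON. Assumes API_KEY is set; POLL_SECONDS defaults to 10.
SPEECH_API="https://speech.googleapis.com/v1"

transcribe_long() {
  local request_file="${1:-request.json}" response name
  response=$(curl -s -X POST -H "Content-Type: application/json" \
    --data-binary "@${request_file}" \
    "${SPEECH_API}/speech:longrunningrecognize?key=${API_KEY}")
  # The first response is an operation object such as {"name": "1234567890"}.
  name=$(printf '%s\n' "$response" | sed -n 's/.*"name": *"\([^"]*\)".*/\1/p' | head -n 1)
  if [ -z "$name" ]; then
    printf 'no operation name in response:\n%s\n' "$response" >&2
    return 1
  fi
  while :; do
    response=$(curl -s "${SPEECH_API}/operations/${name}?key=${API_KEY}")
    if printf '%s\n' "$response" | grep -q '"done": *true'; then
      printf '%s\n' "$response"
      return 0
    fi
    # A top-level error (for example a bad API key) would never become done.
    if printf '%s\n' "$response" | grep -q '"error"'; then
      printf 'polling failed:\n%s\n' "$response" >&2
      return 1
    fi
    sleep "${POLL_SECONDS:-10}"
  done
}
```

Usage: transcribe_long request.json | jq -r '.response.results[].alternatives[].transcript' > output.txt. If the finished operation carries an error field instead of response, the jq filter will fail, so inspect the JSON in that case. For files under a minute sent to speech:recognize, the results come back directly in the response, so the filter becomes '.results[].alternatives[].transcript'.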
