Related blog post: Watson Speech to Text language model customization.
This project contains a bash script automation example for the IBM Cloud Watson Speech to Text service.
The automation contains two flows:
- Basic usage: extract the text from an audio file saved in FLAC format, using a base language model.
- Customization: adapt an existing language model to a specific domain, in this example drums ;-)
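The two flows differ mainly in whether a custom language model is referenced in the recognition request. The following is a minimal sketch, not the code from this repository: the helper names `build_recognize_url` and `recognize_flac` are hypothetical, and it assumes the service's `/v1/recognize` endpoint with the `model` and `language_customization_id` query parameters.

```shell
# Hypothetical helper: build the recognize URL for either flow.
# $1 = service URL, $2 = base model name, $3 = optional customization ID
build_recognize_url() {
  url="$1/v1/recognize?model=$2"
  if [ -n "${3:-}" ]; then
    # Customization flow: point the request at the trained custom model.
    url="${url}&language_customization_id=$3"
  fi
  printf '%s\n' "$url"
}

# Hypothetical helper: send a FLAC file to the service with cURL.
# $1 = API key, $2 = path to the FLAC file, $3 = URL from build_recognize_url
recognize_flac() {
  curl -s -X POST -u "apikey:$1" \
    --header "Content-Type: audio/flac" \
    --data-binary @"$2" \
    "$3"
}
```

For the basic flow, the third argument of `build_recognize_url` is simply omitted, so the request uses only the base model.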
Note: If you record your own voice, for example in M4A format, you can convert the M4A file to FLAC format for free with Converio.
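As an alternative to an online converter, the conversion can also be done locally. This sketch assumes that `ffmpeg` is installed, which is not a prerequisite of this project; the helper names `flac_name_for` and `convert_to_flac` are hypothetical.

```shell
# Hypothetical helper: derive the FLAC file name from an input file name
# by replacing the last extension with ".flac".
flac_name_for() {
  printf '%s.flac\n' "${1%.*}"
}

# Convert one audio file to FLAC (assumes ffmpeg is available on the PATH).
# $1 = input file, for example my-voice.m4a
convert_to_flac() {
  ffmpeg -i "$1" "$(flac_name_for "$1")"
}
```

Calling `convert_to_flac my-voice.m4a` would write `my-voice.flac` next to the input file.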
Prerequisites:
- The IBM Cloud CLI is installed.
- A Watson Speech to Text service instance with a Plus plan has been created.
- The cURL command-line tool is installed on the local computer.
Execute the following steps to run the example.
git clone https://github.com/thomassuedbroecker/watson-stt-invocation.git
cd watson-stt-invocation
cp ./code/.env-template ./code/.env
ROOTFOLDER="YOUR_PATH"
RESOURCE_GROUP="default"
REGION="us-south"
APIKEY="YOUR_IBMCLOUD_APIKEY"
S2T_SERVICE_INSTANCE_NAME="YOUR_S2T_SERVICE_NAME"
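Before running the script, it can help to confirm that no placeholder value is left in the `.env` file. The check below is a minimal sketch and not part of the repository; the function name `check_env` is hypothetical, and it assumes that unedited values still start with `YOUR_`, as in the template above.

```shell
# Hypothetical pre-flight check for the .env file (not part of the repository).
# $1 = path to the .env file, for example ./code/.env
check_env() {
  env_file="$1"
  if [ ! -f "$env_file" ]; then
    echo "Missing $env_file" >&2
    return 1
  fi
  # Load the variables into the current shell.
  . "$env_file"
  for name in ROOTFOLDER RESOURCE_GROUP REGION APIKEY S2T_SERVICE_INSTANCE_NAME; do
    # Read the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    case "$value" in
      ""|YOUR_*)
        echo "$name is not set in $env_file" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

A possible usage is `check_env ./code/.env && sh code/use-speech-to-text.sh`, so the script only starts when every variable has a real value.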
sh code/use-speech-to-text.sh
- Example output
#*******************
# Customization flow
#*******************
#------------------
# Create and train a Custom Language Model
#------------------
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 160 100 61 100 99 170 277 --:--:-- --:--:-- --:--:-- 458
customization_id: {"customization_id": "7868e363-4afa-4d64-96fd-c506774eebca"}
{"customizations": [{
"owner": "d3443a47-877c-496d-95b9-f62bce50bb38",
"base_model_name": "en-US_BroadbandModel",
"customization_id": "7868e363-4afa-4d64-96fd-c506774eebca",
"dialect": "en-US",
"versions": ["en-US_BroadbandModel.v2020-01-16"],
"created": "2022-11-18T13:32:44.945Z",
"name": "MyDrums-1",
"description": "MyDrums-demo",
"progress": 0,
"language": "en-US",
"updated": "2022-11-18T13:32:44.945Z",
"status": "pending"
}]}
{}
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 104 100 104 0 0 319 0 --:--:-- --:--:-- --:--:-- 330
Response: {
"out_of_vocabulary_words": 1,
"total_words": 43,
"name": "drums1",
"status": "analyzed"
}
Status: %-15s ( %d )
analyzed 10
{"corpora": [{
"out_of_vocabulary_words": 1,
"total_words": 43,
"name": "drums1",
"status": "analyzed"
}]}
Train ...
{}
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 449 100 449 0 0 1537 0 --:--:-- --:--:-- --:--:-- 1586
Response: {
"owner": "d3443a47-XXX-XXXX-95b9-f62bce50bb38",
"base_model_name": "en-US_BroadbandModel",
"customization_id": "7868e363-XXX-XXXX-96fd-c506774eebca",
"dialect": "en-US",
"versions": ["en-US_BroadbandModel.v2020-01-16"],
"created": "2022-11-XXX-XXXX",
"name": "MyDrums-1",
"description": "MyDrums-demo",
"progress": 0,
"language": "en-US",
"updated": "2022-11-XXX-XXXX",
"status": "training"
}
Status (training)
Status: %-15s ( %d )
training 10
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 452 100 452 0 0 1293 0 --:--:-- --:--:-- --:--:-- 1333
Response: {
"owner": "d3443a47-XXX-XXXX-95b9-f62bce50bb38",
"base_model_name": "en-US_BroadbandModel",
"customization_id": "7868e363-XXX-XXXX-96fd-c506774eebca",
"dialect": "en-US",
"versions": ["en-US_BroadbandModel.v2020-01-16"],
"created": "2022-11-XXX-XXXX",
"name": "MyDrums-1",
"description": "MyDrums-demo",
"progress": 100,
"language": "en-US",
"updated": "2022-11-XXX-XXXX",
"status": "available"
}
Status (available)
Status: %-15s ( %d )
available 20
{"words": [{
"display_as": "paradiddles",
"sounds_like": ["paradiddles"],
"count": 1,
"source": ["drums1"],
"word": "paradiddles"
}]}
{
"owner": "d3443a47-XXX-XXXX-95b9-f62bce50bb38",
"base_model_name": "en-US_BroadbandModel",
"customization_id": "7868e363-XXX-XXXX-96fd-c506774eebca",
"dialect": "en-US",
"versions": ["en-US_BroadbandModel.v2020-01-16"],
"created": "2022-11-XXX-XXXX",
"name": "MyDrums-1",
"description": "MyDrums-demo",
"progress": 100,
"language": "en-US",
"updated": "2022-11-XXX-XXXX",
"status": "available"
}
#------------------
# Verify a trained model by using an audio
#------------------
customization_id: 7868e363-XXX-XXXX-96fd-c506774eebca
basic_model: en-US_BroadbandModel
Test audio ...
{
"result_index": 0,
"results": [
{
"final": true,
"alternatives": [
{
"transcript": "it's great to play the drums The hi hat is something very special ",
"confidence": 0.98
}
]
},
{
"final": true,
"alternatives": [
{
"transcript": "it forms the basis for many rhythms syncopations are sometimes distributed with paradiddles and they are creating a fantastic rhythm together with the snare and the bass drum and a splash ",
"confidence": 0.94
}
]
}
]
}
#*******************
# Basic flow
#*******************
{
"result_index": 0,
"results": [
{
"final": true,
"alternatives": [
{
"transcript": "hi this is my test for Watson ",
"confidence": 0.94
}
]
},
{
"final": true,
"alternatives": [
{
"transcript": "speech to text ",
"confidence": 0.99
}
]
},
{
"final": true,
"alternatives": [
{
"transcript": "check it out ",
"confidence": 0.99
}
]
}
]
}
...
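The output above shows the script waiting for the custom model to move from `training` to `available`, and then printing the transcripts. The following sketch illustrates how such responses could be handled with standard tools. It is an assumption about the approach, not the repository's code: the helper names are hypothetical, the 10-second interval is chosen for illustration, and a JSON tool such as `jq` would be more robust than `sed`.

```shell
# Hypothetical helper: print every value of one string field from a
# pretty-printed JSON response read from stdin, one value per line.
# $1 = field name, for example status or transcript
json_string_values() {
  sed -n 's/.*"'"$1"'": *"\([^"]*\)".*/\1/p'
}

# Hypothetical polling loop: wait until the custom model is "available".
# $1 = API key, $2 = service URL, $3 = customization ID
wait_until_available() {
  attempts=0
  while [ "$attempts" -lt 60 ]; do
    status=$(curl -s -X GET -u "apikey:$1" "$2/v1/customizations/$3" \
      | json_string_values status | head -n 1)
    echo "Status: $status"
    if [ "$status" = "available" ]; then
      return 0
    fi
    if [ "$status" = "failed" ]; then
      return 1
    fi
    attempts=$((attempts + 1))
    sleep 10
  done
  # Give up after the maximum number of attempts.
  return 1
}
```

Piping a recognition response into `json_string_values transcript` prints one transcript per line, which is convenient for comparing the base model with the customized model.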
List of used API calls: