I have combined the phoneme sets for all three languages (English, Chinese, and Japanese) and started fine-tuning on a dataset that contains speech in all three.
The base model I use is the Chinese and English base.
However, after 500 epochs, Chinese sounds good and English sounds good, but Japanese sounds unnatural.
My understanding is that the phonemes are correct, but the tone is just not how Japanese is spoken.
What can I do to improve this?
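For anyone trying to reproduce the setup, here is a minimal sketch of one way the combined phoneme set described above might be built. It is not taken from the project's code: the function `merge_symbols`, the lists `ZH_EN_BASE` and `JA_SYMBOLS`, and the symbols in them are all hypothetical placeholders. The idea is to keep every symbol from the Chinese and English base at its original index, so any pretrained embedding rows still line up, and to append only the unseen Japanese symbols at the end.

```python
from typing import Dict, Iterable, List


def merge_symbols(base: List[str], *extra: Iterable[str]) -> List[str]:
    """Append symbols not already in `base`, keeping base order and indices."""
    merged = list(base)
    seen = set(base)
    for group in extra:
        for sym in group:
            if sym not in seen:
                seen.add(sym)
                merged.append(sym)
    return merged


# Hypothetical, heavily truncated inventories, for illustration only.
ZH_EN_BASE = ["_", "a", "i", "sh", "AH0", "T"]
JA_SYMBOLS = ["a", "i", "N", "q", "ky"]

symbols = merge_symbols(ZH_EN_BASE, JA_SYMBOLS)

# Lookup table used when converting phoneme sequences to model input IDs.
symbol_to_id: Dict[str, int] = {s: i for i, s in enumerate(symbols)}
```

In this sketch, symbols that appear in more than one language (such as `a`) share a single ID. Whether sharing is desirable, or whether each language should get its own tagged copy of the symbol, is a separate design choice that the sketch does not settle.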
Hello, may I ask which pre-trained model you used for fine-tuning? How long did you train? How is the config set up? The model I trained cannot produce complete sentences, and the speech is very strange.
Here is a sample of the Japanese output: https://soundcloud.com/michael-lin-674069136/japanese-test