You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have a similar project that was developed before I discovered your project. I like that your project uses whisper and VOSK as VOSK seems to be less accurate. Something I had developed that I think may benefit your project is transcript based matching accuracy improvement.
Basically, it does a diff and if a word is incorrect, it will fix the word match while retaining the timestamp. This is useful if you have a known accurate transcript such as a pdf of an audiobook. Just figured I'd share and hope it can be of help.
def improve_match_with_transcript(results_list, transcript_word_array): # attempts to improve accuracy of the match using transcript
print("using transcript")
improved_detection = []
results_words_only_list = []#only words not start, end, or anything else.
for item in results_list:#combine our results list with words only
results_words_only_list.append(item.get('word'))
#differ = difflib.Differ()
html_table = difflib.HtmlDiff().make_table(
fromlines=transcript_word_array,
tolines=results_words_only_list,
fromdesc="Transcript",
todesc="Results",
context=False # Show surrounding context
)
print(f"table:'{html_table}'")
soup = BeautifulSoup(html_table, 'html.parser')# Parse the HTML using BeautifulSoup
# Find the table element by its class name
table = soup.find('table', class_='diff')
# Initialize an empty list to store the rows
result_rows = []
# Find all the rows in the table body
rows = table.tbody.find_all('tr')
#for row in rows:
for i in range(0, len(rows), 1):
cells = rows[i].find_all('td')
# Check if the first cell is 'n' or 't'
if cells[0].text.strip() == 'n' or cells[0].text.strip() == 't':
# Extract the data from the cells
result_line_index = cells[4].text.strip()
# if the word wasn't detected in our original result, we must continue anyways because we have no start and end
if result_line_index == "":
continue
transcript = cells[2].text.strip()
results = cells[5].text.strip()
# Create a dictionary for the row data
row_data = {
'Transcript': transcript,
'Results': results,
'Res_index' : result_line_index
}
# Append the row data to the list
result_rows.append(row_data)
improved_detection = results_list
for item in result_rows:
results = item["Results"]
transcript_word = item["Transcript"]
res_index = item["Res_index"]
#print(f"trans: {transcript_word} res: {results} res_index: {res_index}")
list_index = int(res_index)-1
#print(f"Transcript: {transcript_word}, Res_index: {res_index}, List index: {list_index}")
improved_detection[list_index]['word'] = transcript_word
#print(f"improved detection: {improved_detection}")
return improved_detection
The text was updated successfully, but these errors were encountered:
I have a similar project that was developed before I discovered your project. I like that your project uses whisper and VOSK as VOSK seems to be less accurate. Something I had developed that I think may benefit your project is transcript based matching accuracy improvement.
https://github.com/therealmichaelberna/FoulFilter/blob/main/word_finder.py
Basically, it does a diff and if a word is incorrect, it will fix the word match while retaining the timestamp. This is useful if you have a known accurate transcript such as a pdf of an audiobook. Just figured I'd share and hope it can be of help.
The text was updated successfully, but these errors were encountered: