ChenHsieh/IOB_seminar_transcript

Transcripts of the Spring 2023 IOB seminars at UGA, auto-generated by the OpenAI Whisper large model.

Description

All transcripts of the IOB Spring seminars as of April 2023, in several file formats, are in the out folder of the GitHub repo.

The scripts used to generate the transcripts are in the scripts folder. I first used yt_dlp.sh to download the YouTube videos and convert them to MP3 format (thanks, yt-dlp):

yt-dlp -x --audio-format mp3 -o '%(title)s.%(ext)s' {youtube link}

Then I used prep_whisper_job.py to generate one command per audio file. It writes cmd.sh, which I used to submit many jobs running Whisper on UGA's Sapelo2 cluster:

whisper --model large -o out -- './{filename}';
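
As a rough illustration of this step, the sketch below builds one Whisper command per MP3 file in the same shape as the command above. It is not the actual prep_whisper_job.py; the function names `build_whisper_commands` and `write_cmd_file` are hypothetical, and quoting filenames with `shlex.quote` is an assumption added so that video titles containing spaces or quotes stay intact.

```python
import shlex
from pathlib import Path


def build_whisper_commands(audio_dir, model="large", out_dir="out"):
    """Return one whisper command line per .mp3 file in audio_dir, sorted by name.

    Hypothetical helper, not the repository's prep_whisper_job.py.
    """
    commands = []
    for path in sorted(Path(audio_dir).glob("*.mp3")):
        # Quote the relative path so titles with spaces or quotes survive the shell.
        quoted = shlex.quote(f"./{path.name}")
        commands.append(f"whisper --model {model} -o {out_dir} -- {quoted};")
    return commands


def write_cmd_file(commands, cmd_path="cmd.sh"):
    """Write the commands to a shell file, one per line."""
    Path(cmd_path).write_text("\n".join(commands) + "\n", encoding="utf-8")
```

Calling `write_cmd_file(build_whisper_commands("."))` from the folder holding the MP3 files would produce a cmd.sh with one line per file. How those lines are then submitted as cluster jobs depends on the scheduler setup and is not covered here.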

Finally, I used prep_jekyll_page.py to generate a Markdown file for each transcript so that they can be viewed on this GitHub page.
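
A minimal sketch of what such a page generator could look like, assuming each transcript is a plain-text .txt file and each page only needs Jekyll front matter with a layout and a title. The function names, the `default` layout, and the one-page-per-.txt mapping are all assumptions for illustration, not the contents of the real prep_jekyll_page.py.

```python
from pathlib import Path


def render_page(title, transcript_text, layout="default"):
    """Return a Jekyll page: YAML front matter followed by the transcript body.

    The layout name is an assumed placeholder.
    """
    # Escape backslashes and double quotes so the title is a valid YAML string.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    front_matter = f'---\nlayout: {layout}\ntitle: "{safe_title}"\n---\n'
    return f"{front_matter}\n{transcript_text.strip()}\n"


def write_pages(transcript_dir, page_dir):
    """Write one .md page per .txt transcript and return the paths written."""
    page_dir = Path(page_dir)
    page_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for txt in sorted(Path(transcript_dir).glob("*.txt")):
        page = page_dir / f"{txt.stem}.md"
        text = txt.read_text(encoding="utf-8")
        page.write_text(render_page(txt.stem, text), encoding="utf-8")
        written.append(page)
    return written
```

Using the file stem as the page title keeps the video title from the download step visible on the site without any extra metadata.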

Status

I will update this when I want to. Please feel free to use the transcripts for your own purposes, or contact me about other interesting projects.

links: https://y.at/💻🌲🎓🚀🌕
