Yay, I found a link that I’d been having a hard time finding! I vaguely remembered coming across a web-based audio recording and cleanup tool that displayed two sections of your script at a time (current and upcoming) and used speech recognition to detect when you moved on to the next segment or repeated yourself, cleaning up the audio by keeping the last take and removing silences. It used Levenshtein distance to compare the phonemes from the speech recognition results with the phonemes of the expected text. I made https://github.com/sachac/subed-record inspired by it (minus the clever speech recognition bit). It came to mind again when I started making more videos, but I couldn’t remember the name of the project. While cleaning up my notes for livestreaming, I came across my note on it. Here’s that project: https://github.com/stevenwaterman/narration.studio
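The phoneme-matching idea is simple enough to sketch. Here’s roughly how I imagine it working, assuming you already have phoneme sequences for both the expected text and the recognized speech (the function names and the threshold are just my own made-up illustration, not narration.studio’s actual code):

```python
def levenshtein(a: list[str], b: list[str]) -> int:
    """Edit distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def close_enough(expected: list[str], heard: list[str],
                 threshold: float = 0.3) -> bool:
    """Treat a take as matching a segment if the normalized distance is small."""
    if not expected:
        return not heard
    return levenshtein(expected, heard) / len(expected) <= threshold

# "hello" with one phoneme misrecognized still counts as a match
print(close_enough(["HH", "AH", "L", "OW"], ["HH", "EH", "L", "OW"]))  # True
```

Comparing phonemes instead of words means the matching shrugs off spelling-level recognition mistakes, which is the clever part I’d want to keep.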
The web-based version is gone, and the code doesn’t run as is. The JSON version of the CMU Pronouncing Dictionary is no longer at https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict , but I think I can get it from https://github.com/words/cmu-pronouncing-dictionary . Might be fun to dust it off and see what I can do with the ideas, especially now that OpenAI Whisper gives us pretty good speech recognition.
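Even if I only get my hands on the plain-text cmudict.dict rather than a JSON version, turning it into a lookup table shouldn’t be hard. A quick sketch, based on what I remember of the format (one "word PH ON EMES" entry per line, alternates marked like "word(2)", stress as a trailing digit on vowels; worth double-checking against whichever copy actually gets downloaded):

```python
import re

def parse_cmudict(text: str) -> dict[str, list[str]]:
    """Map each word to its first pronunciation, with stress digits stripped."""
    pronunciations: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments, if any
        if not line:
            continue
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word)  # "hello(2)" -> "hello"
        if word not in pronunciations:        # keep only the first variant
            pronunciations[word] = [re.sub(r"\d$", "", p) for p in phones]
    return pronunciations

sample = """\
hello HH AH0 L OW1
hello(2) HH EH0 L OW1
world W ER1 L D
"""
print(parse_cmudict(sample)["hello"])  # ['HH', 'AH', 'L', 'OW']
```

That would give me the expected-phoneme side of the comparison; Whisper (or whatever recognizer) would supply the other side.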