One step closer to figuring out live autocaptions that might be semi-tweakable! I adapted some code from to let me also automatically save the JSON and text for further processing. I think I’ll be able to use start-process in Emacs to have it listen to my audio and put the text in a buffer, so we can get live notes during streaming or braindumping. If I can use ALSA to pipe audio into the process, I might be able to rig it up to send lines to an IRC channel using ERC or overwrite a text overlay that OBS uses, so a future EmacsConf might even have autocaptions for live talks. Bonus points if I can someday figure out how to correct misrecognized words on the fly, either by pattern-matching on common errors or by having a quick way to replace a word or two…
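
Here is a rough sketch of what the pattern-matching half of that idea could look like. It is not the code I adapted. It assumes the recognizer prints one line of text at a time to standard output, and the correction table and the names `CORRECTIONS`, `PATTERN`, and `correct` are made up for illustration. It reads lines on stdin and writes corrected lines to stdout, so it could sit between the recognizer and whatever consumes the captions.

```python
import re
import sys

# Hypothetical correction table: misrecognized phrase -> intended text.
# A real table would be built up from errors noticed in actual transcripts.
CORRECTIONS = {
    "e max": "Emacs",
    "emax": "Emacs",
    "org mode": "Org Mode",
    "i r c": "IRC",
}

# One case-insensitive pattern covering every key. Longer phrases are listed
# first so they take priority over shorter alternatives at the same position.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(key) for key in sorted(CORRECTIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def correct(line: str) -> str:
    """Replace known misrecognitions in a single line of caption text."""
    return PATTERN.sub(lambda match: CORRECTIONS[match.group(1).lower()], line)


def main() -> None:
    # Flush after every line so a live consumer sees text as soon as it arrives.
    for line in sys.stdin:
        sys.stdout.write(correct(line))
        sys.stdout.flush()


if __name__ == "__main__":
    main()
```

Because it is a plain stdin-to-stdout filter, the same script should work whether the output ends up in an Emacs buffer, an IRC channel, or a text file that OBS reads. Fixing a word or two by hand would still need something interactive on top of this.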

