One thing I've found challenging about the Whisper APIs is that it performs quite poorly when trying to do "realtime transcription" - I played around with some of the whisper.cpp stuff to get it running, and with the tiny model, I was almost able to get reliable transcriptions, but it seems like other than static mp3 files, it is a Hard Problem [tm] that will need further work to get really good.
My use case was to try to make an AI assistant that would transcribe my audio requests and then turn that into a payload for one of the GPT-X APIs
My use case was to try to make an AI assistant that would transcribe my audio requests and then turn that into a payload for one of the GPT-X APIs