
New AI feature coming soon: Speech-to-text

Written by Corina Inés Chouciño

Ever wish you could instantly transcribe an interview, a video or a podcast? We are about to launch speech-to-text models for all of the Nordic languages.

In line with our mission to build AI features that understand media as humans do, we are making remarkable progress in speech-to-text technology.

Last week we added "speaker ID" to the speech-to-text AI feature. Speaker ID makes it easy to see who is saying what when you read the transcript of a video or audio file.
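To make the idea concrete, here is a minimal Python sketch of how speaker-labelled segments could be turned into a readable transcript. It is an illustration only, not the MediaCatch implementation: the `Segment` type, its fields and the `format_transcript` helper are hypothetical names, and the speaker labels are assumed to come from an upstream speaker ID step.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed stretch of audio (hypothetical structure)."""
    speaker: str  # label assumed to come from a speaker ID step, e.g. "Speaker 1"
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # transcribed words for this stretch


def format_transcript(segments):
    """Return one line per speaker turn, merging consecutive segments by the same speaker."""
    lines = []
    current_speaker = None
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.speaker == current_speaker:
            # Same speaker keeps talking: extend the current turn.
            lines[-1] += " " + seg.text
        else:
            # A new speaker starts a new turn.
            lines.append(f"{seg.speaker}: {seg.text}")
            current_speaker = seg.speaker
    return lines


if __name__ == "__main__":
    demo = [
        Segment("Speaker 1", 0.0, 2.0, "Good evening."),
        Segment("Speaker 1", 2.0, 4.0, "Welcome to the debate."),
        Segment("Speaker 2", 4.0, 6.0, "Thank you."),
    ]
    print("\n".join(format_transcript(demo)))
```

Merging consecutive segments from the same speaker keeps the transcript organised by turn rather than by raw audio chunk, which is usually easier to read.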

In the context of the general election in Denmark, we applied and tested it on the party leaders' debate.

Notice the "speaker ID" at the top of the video (and the subtitles generated by our speech-to-text model, not least!).
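As a rough illustration of how timed, speaker-labelled text can become subtitles, the sketch below writes cues in the widely used SubRip (SRT) subtitle format. We are not claiming this is how the video above was produced: the `to_srt` and `srt_timestamp` helpers and the cue layout `(start, end, speaker, text)` are assumptions made for the example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues) -> str:
    """Build SRT text from (start, end, speaker, text) tuples (hypothetical layout)."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(cues, start=1):
        # Each SRT cue is: a running number, a time range, then the subtitle text.
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_srt([
        (0.0, 2.5, "Speaker 1", "Good evening."),
        (2.5, 5.0, "Speaker 2", "Thank you."),
    ]))
```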

The MediaCatch speech-to-text technology can be applied to any piece of audio-carrying content, making it visually understandable and easy to digest at speed.

Some of the challenges: Since the first version was launched, speech-to-text technology has faced considerable challenges with accuracy. We have been steadily improving performance on background noise, punctuation placement, capitalization, formatting, word timing, speaker identification and terminology, among other areas.

At the same time, the MediaCatch speech-to-text AI feature is being trained to understand different tones, sentiments and contexts across audio-carrying content.

If you are interested in learning more about the speech-to-text feature, or about how AI can help you supercharge your company, processes and workflows, you can subscribe to our newsletter or drop us a message via the contact form.


Or contact Carsten Lakner to get more information