OpenAI adds gpt-4o-mini-tts, a text-to-speech model that it says delivers more nuanced and realistic-sounding speech, and two speech-to-text models to its API

LinkedIn:
Marc Manara: 2025 is the year of voice... and agents.. and well, voice agents. OpenAI launched 3 new audio models today - 2 new speech-to-text models and a new text-to-speech model. …
Olivier Godement: Voice AI agents are getting real and fun! We're launching new audio models and tools to make it easy to build capable voice agents. …

TechCrunch 2025-03-21 Kyle Wiggers

Discussion

@implicator on bluesky
Claude gets web-smart. Real-time search + AI synthesis = game changer. Financial analysts, sales teams, researchers: Your AI assistant just learned to time travel. The knowledge cutoff era ends now 🔍 — www.implicator.ai/claude-gets- ... tip @techmeme.com
@fry69.dev on bluesky
Unrelated to OpenAI, here is an interesting text-to-speech model/generator which supports “emotion” sounds like <laugh>, <chuckle>, <sigh>, etc. Demo space on HuggingFace: huggingface.co/spaces/prith...
@implicator on bluesky
OpenAI drops next-gen voice models that actually work. Scary-good transcription + AI voices with personality. Plus video tech on deck. Silicon Valley's game of catch-up begins 🎯 — www.implicator.ai/openai-upgra... tip @techmeme.com
@openaidevs on x
Three new state-of-the-art audio models in the API: 🗣️ Two speech-to-text models—outperforming Whisper 💬 A new TTS model—you can instruct it *how* to speak 🤖 And the Agents SDK now supports audio, making it easy to build voice agents. Try TTS now at https://openai.fm/.
@jijosunny Jijo Sunny on x
Today @OpenAI dropped 3 impressive voice models, and we were the first to test them internally (thanks!). Bottom line: It's the best STT model by far, and we've tested them all. I was pleasantly surprised by how well it handled context and nuance in smaller languages like Malayalam.
@samratmansingh Samrat Man Singh on x
OpenAI's new TTS looks (and sounds) pretty great for the price. Also, I hope this pushes other providers to just price API usage by minute. Every other TTS provider (ElevenLabs, Cartesia, etc.) currently has monthly credits pricing.
@juberti Justin Uberti on x
Lots of new audio stuff today:
- ASR: gpt-4o-transcribe with SoTA performance
- TTS: gpt-4o-mini-tts with playground at https://openai.fm/
- Realtime API: new noise reduction and semantic VAD
- Agents SDK: add voice to an agent with 10 LOC
Details: https://platform.openai.com/ ..…
