2025-08-29
Voice-only programming with the new OpenAI Realtime API ... I spend a lot of time these days pair programming with LLMs. Often I'm talking rather than typing. This “voice dictation” use case has become an important vibe benchmark for me. Being able to create text input just by ...
ZDNET
OpenAI makes its Realtime API generally available with features like MCP support and debuts gpt-realtime, its most advanced speech-to-speech model, in the API
@liodakis: Congrats to @pbbakkum on shipping gpt-realtime! It's been awesome watching him and the multimodal team sweat the details and get to a GA quality multimodal mode...
2025-02-05
Google's full release of Gemini 2.0 Flash is a great thing for the voice AI ecosystem. Up to this point, almost every production voice AI agent has used GPT-4o. Voice AI apps need an LLM with fast TTFT, good instruction following, reliable function calling, and natural
The Verge
Google releases Gemini 2.0 Flash via its API, an experimental Gemini 2.0 Pro version via its apps, Gemini 2.0 Flash Thinking, and 2.0 Flash-Lite in AI Studio
Gemini 2.0 AI updates include cheaper access for developers, and AI that can use other Google apps like YouTube.