GPT-4o Voice Assistant is Finally Here and More AI Use Cases

OpenAI has raised the bar yet again. While the news at the Spring Update event didn’t involve any forays into search engine territory, OpenAI won the hearts and minds of many with its new GPT-4o model. It’s fast, smooth, and with an improved Voice Mode, it feels eerily like the AI assistant from Spike Jonze’s 2013 film Her.
But more importantly, it’s a big step toward the kind of smartphone voice assistant that ChatGPT is aiming to be, and is now far better suited for. Here’s everything you need to know about GPT-4o, the Voice Mode upgrades on ChatGPT, and what they mean for the industry.
GPT-4o ('o' for omni) is the company's new flagship model and also its first to combine text, vision, and audio. It has GPT-4 level intelligence but is faster and more efficient. The previous version of Voice Mode chained together three separate models: one transcribed speech to text, GPT-3.5 or GPT-4 reasoned over that text, and a third converted the reply back into audio. Along the way, much of the core GPT-4 level intelligence was lost, along with cues like tone of voice and background sounds. This is where GPT-4o differs.
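To see why chaining models loses information, here is a minimal conceptual sketch. The function names are hypothetical stand-ins, not OpenAI's actual APIs; the point is that in a cascade, anything the transcription step discards can never reach the reasoning step.

```python
# Conceptual sketch: a cascaded voice pipeline vs. an end-to-end "omni" model.
# All functions below are hypothetical illustrations, not real OpenAI APIs.
# We fake an audio payload as "words|tone" to show what each stage can see.

def transcribe(audio: str) -> str:
    # Stage 1: speech-to-text. Tone, pauses, and background sounds
    # are discarded here; only the words survive.
    return audio.split("|")[0]

def think(text: str) -> str:
    # Stage 2: GPT-4 level reasoning, but it only ever sees plain text.
    return f"Reply to: {text}"

def speak(text: str) -> str:
    # Stage 3: text-to-speech, guessing at intonation it never heard.
    return f"[audio] {text}"

def old_voice_mode(audio: str) -> str:
    # The pre-GPT-4o cascade: three hops, information lost at each seam.
    return speak(think(transcribe(audio)))

def omni_voice_mode(audio: str) -> str:
    # GPT-4o style: one model receives the audio directly, so the tone
    # is still available when it composes its spoken reply.
    words, tone = audio.split("|")
    return f"[audio, matching {tone} tone] Reply to: {words}"

print(old_voice_mode("how are you|cheerful"))   # tone never reaches the reply
print(omni_voice_mode("how are you|cheerful"))  # tone survives end to end
```

In the cascaded version, the word "cheerful" never appears in the output because it was stripped out at transcription; in the end-to-end version it does. That, in miniature, is the argument for training one model across all three modalities.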
GPT-4o is the first model trained end-to-end across the three modalities of text, vision, and audio, and the first to power Voice Mode on its own. And it shows. In one of the demos, people at OpenAI got ChatGPT on two phones to talk to each other and sing songs.