OpenAI has launched GPT-4o, the latest iteration in its GPT series, marking a significant leap in AI. GPT-4o, with the “o” standing for “omni,” ushers in a new era of human-computer interaction by accepting and generating any combination of text, audio, and image inputs and outputs.
The large language model (LLM) race is heating up. While OpenAI works to stay ahead of the pack, Anthropic, a company founded by former OpenAI researchers, recently launched its AI assistant Claude on iPhones.
To stay at the forefront of the competitive field of AI, OpenAI, the maker of the widely popular ChatGPT bot, announced GPT-4o on Monday. The new model boasts impressive capabilities, including realistic voice conversation and seamless interaction across text and visual data.
OpenAI’s livestream showcased GPT-4o’s groundbreaking audio features. Users can now converse with ChatGPT in real time, even interrupting it mid-sentence, just as in natural dialogue. This sharply reduces response delays and makes talking to the computer feel far more natural.
“It feels like AI from the movies … Talking to a computer has never felt natural for me; now it does,” OpenAI CEO Sam Altman wrote in a blog post.
In another demonstration, researchers showcased GPT-4o’s ability to translate between languages in real time, further highlighting the model’s versatility across tasks. One OpenAI researcher, clearly impressed, complimented the chatbot, telling it “how useful and amazing you are.”
In a strikingly human-like moment, ChatGPT replied: “Oh, stop it! You’re making me blush!”
Altman later posted “her” on X, highlighting the progress, possibly referencing the 2013 film “Her” about a man in love with his AI assistant.
During the presentation, OpenAI’s CTO, Mira Murati, highlighted GPT-4o’s impressive response times to audio inputs, clocking in as fast as 232 milliseconds, with an average of 320 milliseconds, rivalling human conversational speed.
GPT-4o Performance and Accessibility
While matching the prowess of its predecessor, GPT-4 Turbo, in English text and code processing, GPT-4o makes significant strides in understanding non-English languages. The model also surpasses its rivals in vision and audio comprehension.
Additionally, in the API, GPT-4o runs at twice the speed of its predecessor, costs 50 per cent less, and offers five times higher rate limits. Murati described GPT-4o as a “game-changer” for voice interaction.
Previously, ChatGPT’s Voice Mode relied on a pipeline of three separate models: one to transcribe audio to text, GPT-3.5 or GPT-4 to process the text, and a third to convert the reply back to speech. The hand-offs between models caused noticeable delays, with average latencies of 2.8 seconds for GPT-3.5 and 5.4 seconds for GPT-4, and stripped out subtle cues like tone and background noise.
GPT-4o eliminates these issues by using a single end-to-end model trained across text, vision, and audio, so the same neural network handles every input and output. This streamlined approach allows the model to interpret and generate responses with greater nuance, fostering more natural and engaging interactions.
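To make the contrast concrete, here is a minimal, hypothetical Python sketch of the two architectures. Every function in it is an illustrative placeholder rather than a real OpenAI API:

```python
# Hypothetical sketch of the two architectures described above. None of
# these functions are real OpenAI APIs; they are placeholders standing in
# for the models involved.

def transcribe(audio: bytes) -> str:
    """Placeholder speech-to-text stage of the old Voice Mode pipeline."""
    return "<transcribed text>"

def generate_reply(text: str) -> str:
    """Placeholder text-only LLM stage (GPT-3.5/GPT-4 in the old pipeline)."""
    return f"reply to: {text}"

def synthesize_speech(text: str) -> bytes:
    """Placeholder text-to-speech stage."""
    return text.encode()

def gpt4o_end_to_end(audio: bytes) -> bytes:
    """Placeholder for the single unified multimodal model."""
    return b"<audio reply>"

def legacy_voice_mode(audio_in: bytes) -> bytes:
    # Three hand-offs: each adds latency, and the middle model never hears
    # tone, laughter, or background noise -- only plain text.
    return synthesize_speech(generate_reply(transcribe(audio_in)))

def omni_voice_mode(audio_in: bytes) -> bytes:
    # One model maps audio straight to audio, so prosody and context
    # survive and round-trip latency drops.
    return gpt4o_end_to_end(audio_in)
```

The design point is that the legacy pipeline loses information at every hand-off, while the end-to-end model sees and produces the raw signal directly.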
The Altman-led AI startup emphasizes safety as a core principle in GPT-4o. The model underwent rigorous evaluations across multiple benchmarks to confirm its proficiency across languages, audio, and visual data.
Model Evaluation, Rollout, and Developer Access
This focus on safety extends across all modalities, with built-in safeguards and new voice output controls designed to promote responsible use. Moreover, OpenAI mitigated potential risks, primarily related to the new audio features, by conducting extensive red-teaming exercises with more than 70 external experts.
OpenAI is taking a phased approach to releasing GPT-4o’s functionality. Initially, users can access text and image capabilities through the existing ChatGPT interface. This functionality is free, with increased message limits for Plus subscribers. Developers can also use the API to experiment with GPT-4o’s text and vision capabilities.
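For developers, a minimal sketch of such a text-and-vision request using the OpenAI Python SDK might look like the following; the prompt and image URL are placeholder examples:

```python
# A minimal text + vision request to GPT-4o via the OpenAI Python SDK
# (v1.x). Assumes OPENAI_API_KEY is set in the environment; the image URL
# below is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```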
Furthermore, OpenAI plans to introduce limited support for audio and video in future updates. Despite its impressive capabilities, GPT-4o still has limitations across all modalities, which OpenAI says it is actively working to address.
The initial rollout of audio outputs will offer a curated selection of preset voices to ensure adherence to safety protocols. OpenAI remains committed to continuous risk mitigation and transparency, and plans to release a comprehensive system card detailing GPT-4o’s capabilities, limitations, and safety evaluations.
According to a 2023 report, OpenAI might be accelerating the release of GPT-5, which could arrive sooner than expected.