OpenAI Takes the Spotlight with the Launch of GPT-4o
Just a day before Google I/O, OpenAI stole the show by unveiling its latest model, GPT-4o. While matching GPT-4's level of intelligence, the new model is considerably more capable at speech and video processing, offering users an experience close to conversing with a real human.
The uniqueness of GPT-4o can be glimpsed from its name: the “o” stands for omni, Latin for “all” or “every,” signifying the model’s ability to reason across text, audio, and video. “We’re excited to announce the release of GPT-4o, our new flagship model capable of real-time inference for audio, video, and text,” OpenAI stated in a press release.
Approaching human response capabilities, “like AI in movies”
While GPT-4 could also recognize images and convert text to speech, OpenAI previously split these functions across separate models, resulting in longer response times. GPT-4o integrates them all into a single model, which OpenAI calls an omnimodel. Compared with its predecessor, GPT-4 Turbo, GPT-4o performs similarly on English text and code, but shows significant improvements in languages other than English. The API is also faster and costs up to 50% less.
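As a rough sketch of what the cheaper, faster API access looks like in practice, the snippet below builds a request in the shape used by OpenAI’s chat-completions endpoint, targeting the `gpt-4o` model named in the announcement. The actual network call is left commented out, since it requires an API key; the example prompt is purely illustrative.

```python
# Sketch: the shape of a GPT-4o request through OpenAI's
# chat-completions API. Only the payload is constructed here;
# the network call itself (commented out) needs an API key.

payload = {
    "model": "gpt-4o",  # the new flagship model from the announcement
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Translate 'good morning' into French."},
    ],
}

# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# response = client.chat.completions.create(**payload)
# print(response.choices[0].message.content)
```

Because GPT-4o is a single omnimodel, the same endpoint handles text today and, as the voice features roll out, audio as well, rather than chaining separate transcription and synthesis models.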
OpenAI has stated that GPT-4o achieves human-like response times, making conversations feel more natural: it can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds. By comparison, voice mode with GPT-3.5 and GPT-4 averaged 2.8 seconds and 5.4 seconds, respectively.
In OpenAI’s demonstration, GPT-4o was able to provide real-time translation, enabling smooth conversations between individuals speaking different languages.
Image / YouTube
Beyond real-time translation, GPT-4o can deliver bedtime stories in a more expressive, lively voice, or walk users through simple math problems in a conversational manner that closely resembles human speech.
According to OpenAI, GPT-4o can “understand” the user’s expressions and tone, responding appropriately and switching quickly between registers: it can sound mechanical one moment and sing with feeling the next. Mira Murati, OpenAI’s Chief Technology Officer, explained that GPT-4o’s development was inspired by how humans converse. “When you stop talking, it’s my turn to speak. I can understand your tone and respond. It’s that natural, rich, and interactive,” she said.
“The new speech (and video) mode is the best computer interface I’ve seen, like AI in the movies,” added Sam Altman, OpenAI’s CEO, in a blog post. “I’m a little surprised at how real it feels; reaching human-level response times and expressiveness turns out to be a big change.”
The demonstration was not flawless: MIT Technology Review noted that GPT-4o occasionally interrupted people or made unsolicited comments about a presenter’s attire. However, it quickly returned to normal once the demonstrator corrected it.
Murati revealed that with the power of the omnimodel, future versions of GPT will be able to do things like watch a sports broadcast and explain the rules of the game to users, going well beyond simple tasks such as translating text in images.
OpenAI stated that GPT-4o will be available to free-tier users, while paid subscribers will get message limits up to five times higher than the free tier. A subscription voice service built on GPT-4o is expected to open for testing next month. That GPT-4o can be offered for free reflects OpenAI’s success in driving down costs.
However, citing concerns about potential misuse, OpenAI said the voice capabilities will not be immediately available to all API users; they will first roll out to a small group of trusted partners in the coming weeks.
ChatGPT Desktop App Launches, GPT Store Opens for Free
Alongside GPT-4o’s major upgrades in speech and video, OpenAI also announced a refreshed ChatGPT web UI, with a more conversational main interface and message presentation. Murati emphasized that even as the models grow more complex, she wants interacting with the AI to become simpler, clearer, and more effortless, so users can focus on collaborating with ChatGPT rather than on the UI.
OpenAI also introduced a ChatGPT desktop app, initially available for macOS, with a Windows version scheduled to launch later this year. Notably, with reports that OpenAI’s negotiations with Apple over an AI partnership are nearing completion, the macOS app arriving ahead of other platforms prompted speculation.
OpenAI announces the macOS version of the ChatGPT application.
Image / OpenAI
Furthermore, OpenAI has made the GPT Store, launched earlier this year, free for all users. The platform lets developers build customized chatbots and publish them in the store for others to use; free users now gain access to features that were previously exclusive to paid subscribers.
Sources:
OpenAI, TechCrunch, MIT Technology Review