Key Point 1:
In The Washington Post's AI reading test, Claude achieved the highest score, performing steadily and without "hallucinations," with ChatGPT close behind; overall, however, the AI scores were relatively low.
Key Point 2:
Different AI systems exhibited varying levels of understanding across literary, legal, scientific, and political domains, with inconsistent performance.
Key Point 3:
Experts believe that AI currently cannot replace human reading, especially in handling important documents, and can only serve as an auxiliary tool.
Which AI Reads Best?
As of 2025, generative AI has introduced numerous features focused on data integration, such as Google’s Notebook LM and various Deep Research capabilities. These functions rely on the AI models’ “reading ability” and their reasoning capabilities after data input.
Regarding the reading capabilities of five mainstream AI models, The Washington Post’s test results indicated that Claude, developed by Anthropic, performed the best, achieving the top overall score and being the only AI that did not exhibit any “hallucinations” (where AI fabricates information). OpenAI’s ChatGPT came in second.
That said, irrespective of the individual ratings, The Washington Post's results show that current AI still has significant deficiencies in deep understanding and analysis: even the top overall score was only about 70 out of 100, roughly a C-minus in typical U.S. academic grading, and the other models scored well below that. AI's reading comprehension clearly has considerable room for improvement.
AI’s Strengths Vary: Claude Excels in Law, ChatGPT in Literature
The Washington Post evaluated five AIs: Claude, ChatGPT, Copilot, Meta AI, and Google's Gemini. The testing covered four major fields (literary fiction, legal contracts, medical research, and political speeches), and experts in each field conducted blind evaluations of the AI responses. The results are as follows:
Literary Field: ChatGPT 7.8; Claude 7.3; Meta AI 4.3; Copilot 3.5; Gemini 2.3.
Legal Field: Claude 6.9; Gemini 6.1; Copilot 5.4; ChatGPT 5.3; Meta AI 2.6.
Health Sciences Field: Claude 7.7; ChatGPT 7.2; Copilot 7.0; Gemini 6.5; Meta AI 6.0.
Political Field: ChatGPT 7.2; Claude 6.2; Meta AI 5.2; Gemini 5.0; Copilot 3.7.
Overall Scores:
Claude: 69.9
ChatGPT: 68.4
Gemini: 49.7
Copilot: 49.0
Meta AI: 45.0
In summary, Claude edged out ChatGPT by a narrow margin, while Gemini, Copilot, and Meta AI all fell below the 50-point mark. Notably, Claude was the only AI that did not produce hallucinations.
The documents tested included the novel "The Jackal's Mistress" in the literary field; medical papers on COVID-19 and Parkinson's disease; a leasing agreement and a construction contract in the legal field; and transcripts of Trump speeches in the political domain.
The results demonstrated significant discrepancies in AI performance across different professional fields. For instance, ChatGPT excelled in literature and politics but lagged in understanding legal documents; Claude achieved the highest scores in legal and health sciences.
However, even Claude, the best performer overall, did not top the literary category, and Gemini's literary comprehension was criticized by reviewers as "inaccurate, misleading, and hasty," as though it were trying to bluff its way through.
It is worth noting that, apart from Claude, the other four AIs fabricated information to varying degrees during testing. This suggests that AI's ability to read long texts remains limited: generated summaries often omit crucial information, or overemphasize positive content while neglecting negative details.
Note 1: The original test was conducted from April to May 2025, using the following AI versions: ChatGPT-4o, Gemini 2.0 Flash, Claude 3 Sonnet, Llama 4, Copilot for Microsoft 365.
Note 2: Evaluators scored each AI response on a 10-point scale, with scores in each subject area being the average of all ratings. The total score was presented on a 100-point scale, with equal weight assigned to the four subject areas.
Expert Summary: AI Cannot Replace Human Reading
Although some AIs demonstrated impressive capabilities on specific analytical tasks, such as ChatGPT's summaries of novels and reviews, or Claude's suggestions for revising legal documents and its insights into medical papers, the experts overall remained cautious about current AI's reading comprehension.
For example, corporate lawyer Sterling Miller, who participated in the evaluation, noted that AI's handling of legal documents is not reliable enough to replace professional lawyers, while novelist Chris Bohjalian remarked that AI responses sometimes resemble "robots wearing human masks," pretending to understand when they do not.
The journalist who led the test suggested that anyone using AI to assist with reading should compare results from at least two tools, and that important documents affecting one's personal interests should still be read carefully in person.
In summary, AI can currently serve as an auxiliary tool, such as helping to quickly grasp new topics or interpret specialized terminology, but one should not fully rely on its results.
This article is collaboratively reprinted from: Digital Times
Editor-in-chief: Li Xiantai
The initial draft of this article was generated by AI and was organized and edited by Li Xiantai.
Source: The Washington Post