Key Point 1:
In The Washington Post's AI reading test, Claude achieved the highest score, performing steadily and without "hallucinations," with ChatGPT close behind; overall, however, the AI scores were relatively low.
Key Point 2:
Different AI systems exhibited varying levels of understanding across literary, legal, scientific, and political domains, with inconsistent performance.
Key Point 3:
Experts believe that AI currently cannot replace human reading, especially in handling important documents, and can only serve as an auxiliary tool.
Which AI Reads Best?
As of 2025, generative AI has introduced numerous features focused on data integration, such as Google’s Notebook LM and various Deep Research capabilities. These functions rely on the AI models’ “reading ability” and their reasoning capabilities after data input.
Regarding the reading capabilities of five mainstream AI models, The Washington Post’s test results indicated that Claude, developed by Anthropic, performed the best, achieving the top overall score and being the only AI that did not exhibit any “hallucinations” (where AI fabricates information). OpenAI’s ChatGPT came in second.
That said, irrespective of the individual ratings, The Washington Post's results show that current AI still has significant deficiencies in deep understanding and analysis: even the top overall score was only about 70 out of 100, roughly a C-minus in typical U.S. academic grading, and the other models scored well below that. AI's reading comprehension clearly has considerable room for improvement.
AI’s Strengths Vary: Claude Excels in Law, ChatGPT in Literature
The Washington Post evaluated five AIs: Claude, ChatGPT, Copilot, Meta AI, and Google's Gemini. The testing covered four major fields (literary fiction, legal contracts, medical research, and political speeches), and experts in each field conducted blind evaluations of the AI responses. The results are as follows:
Literary Field: ChatGPT 7.8; Claude 7.3; Meta AI 4.3; Copilot 3.5; Gemini 2.3.
Legal Field: Claude 6.9; Gemini 6.1; Copilot 5.4; ChatGPT 5.3; Meta AI 2.6.
Health Sciences Field: Claude 7.7; ChatGPT 7.2; Copilot 7.0; Gemini 6.5; Meta AI 6.0.
Political Field: ChatGPT 7.2; Claude 6.2; Meta AI 5.2; Gemini 5.0; Copilot 3.7.
Overall Scores:
Claude: 69.9
ChatGPT: 68.4
Gemini: 49.7
Copilot: 49.0
Meta AI: 45.0
In summary, Claude edged out ChatGPT by a narrow margin, while Gemini, Copilot, and Meta AI all fell below the 50-point mark. Notably, Claude was the only AI that did not produce hallucinations.
The documents tested included the novel "The Jackal's Mistress" in the literary field; medical papers on COVID-19 and Parkinson's disease; a leasing agreement and a construction contract in the legal field; and transcripts of Trump speeches in the political domain.
The results demonstrated significant discrepancies in AI performance across different professional fields. For instance, ChatGPT excelled in literature and politics but lagged in understanding legal documents; Claude achieved the highest scores in legal and health sciences.
However, even Claude, the best performer overall, did not top the literary category, and Gemini's literary comprehension was criticized by reviewers as "inaccurate, misleading, and hasty," as though it were trying to bluff its way through.
It is worth noting that, apart from Claude, the other four AIs fabricated information to varying degrees during testing. This suggests that AI's ability to read long texts remains limited: generated summaries often omit crucial information, or overemphasize positive content while neglecting negative details.
Note 1: The original test was conducted from April to May 2025, using the following AI versions: ChatGPT-4o, Gemini 2.0 Flash, Claude 3 Sonnet, Llama 4, Copilot for Microsoft 365.
Note 2: Evaluators scored each AI response on a 10-point scale, with scores in each subject area being the average of all ratings. The total score was presented on a 100-point scale, with equal weight assigned to the four subject areas.
Expert Summary: AI Cannot Replace Human Reading
Although some AIs demonstrated impressive capabilities on specific analytical tasks, such as ChatGPT's summaries of novels and reviews, or Claude's suggestions for revising legal documents and its insights into medical papers, the experts overall remained cautious about current AI's reading comprehension.
For example, corporate lawyer Sterling Miller, who participated in the evaluation, noted that AI's handling of legal documents is not reliable enough to replace professional lawyers, while novelist Chris Bohjalian remarked that AI responses sometimes resemble "robots wearing human masks," pretending to understand when they do not.
The journalist who led the test suggested that anyone using AI to assist with reading should compare results from at least two tools, and that important documents affecting one's personal interests should still be read carefully in person.
In summary, AI can currently serve as an auxiliary tool, such as helping to quickly grasp new topics or interpret specialized terminology, but one should not fully rely on its results.
This article is collaboratively reprinted from: Digital Times
Editor-in-chief: Li Xiantai
The initial draft of this article was generated by AI and was organized and edited by Li Xiantai.
Source: The Washington Post