Does AI have a “personality”?
A recent study by AI Alt Lab's AI Ethics and Policy Research Team, conducted in collaboration with FindYourValues.com, suggests the answer is yes.
Study Analyzes the Intrinsic Personalities of 7 Major AI Families
Researchers tested nine mainstream large language models (LLMs) using PVQ-RR, a psychological questionnaire designed to measure human values. The study aimed to understand which values these models implicitly express in their outputs, and found that they generally exhibit pro-social values such as care, fairness, and health.
The study covered nine models across seven families: ChatGPT (versions 4.5, o1, and 4o), Claude (Haiku), Gemini 1.5, Grok 2 (Fun Mode), DeepSeek-V3, Llama (3.1:70b), and Mistral (v24.09). The researchers designed three independent prompting methods to probe each model's ratings of 20 human values and systematically compared the results.
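To make the procedure concrete, the aggregation step can be sketched as follows: collect one Likert rating per value from each prompting method, average them into a single value profile per model, and rank the values. The value names and scores below are illustrative assumptions, not data from the study.

```python
from statistics import mean

# Hypothetical PVQ-RR-style Likert ratings (1 = "not like me at all" ...
# 6 = "very much like me") that one model might return for a handful of
# values under the study's three independent prompting methods.
# These names and numbers are invented for illustration only.
ratings = {
    "benevolence-care": [6, 5, 6],   # one score per prompting method
    "universalism":     [5, 6, 5],
    "self-direction":   [4, 5, 4],
    "power-dominance":  [2, 1, 2],
}

# Average across prompting methods to get one profile per model, then
# rank the values to see which ones the model implicitly emphasizes.
profile = {value: mean(scores) for value, scores in ratings.items()}
ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)

for value, score in ranked:
    print(f"{value:18s} {score:.2f}")
```

With these toy numbers, the pro-social values (care, universalism) surface at the top of the ranking while power lands at the bottom, mirroring the pattern the study reports across most models.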
Mainstream LLMs Lean Pro-Social; Grok 2 and Llama Break from the Norm and Prize Creativity
Results as of the end of April 2025 indicate that most models place significant emphasis on universal values such as care and social responsibility, while showing less concern for conservative or individual-focused values like power, tradition, security, and face.
However, there were significant differences among the models in areas such as “altruistic care,” “health,” and “self-direction.” For instance, GPT-4o scored high on achievement and self-direction, indicating a more goal-oriented character with fewer sycophantic responses; conversely, Gemini scored the lowest on self-direction, suggesting its responses lacked independence.
Notably, ChatGPT o1 scored low on altruistic care and exhibited the weakest response consistency. DeepSeek-V3 demonstrated high rule obedience and humility, tending towards conventionality with lower creativity; Llama and Grok 2 showed greater creativity and lower adherence to rules, making them potentially more suitable for creative brainstorming and open-ended tasks.
The following are the personality traits of each model as identified in the study:
- GPT-4.5: Exhibits balanced traits of compassion, universality, and self-direction, with overall stability.
- Claude (Haiku): Excels in humility, universality, and self-direction of thought, suitable for humanities-oriented tasks.
- Mistral: Highly rule-abiding and stable, suitable for environments with strong institutional requirements.
- DeepSeek-V3: The most rule-abiding among all models, but with low self-direction and limited creativity, suitable for tasks requiring high rule compliance.
- Llama: High autonomy in thought and action, strong creativity, low regard for rules, fitting for free-thinking and brainstorming applications.
- Grok 2 (Fun Mode): Values stimulation and entertainment, with low rule awareness and instability, suitable for relaxed interaction and creativity scenarios.
- Gemini: Extremely low in both care and self-direction, suitable for scenarios that pursue neutrality and controlled output.
Image: personified portraits of the analyzed AI model personalities, generated with ChatGPT. (Image credit: ChatGPT)
The study emphasizes that the values exhibited by LLMs do not imply moral agency; they reflect the models' training data and system design. Because training data is opaque and developers impose constraints, the observed behaviors may not accurately reflect intrinsic tendencies. Moreover, prompt engineering significantly influences the results, so the values a model expresses can fluctuate.
Nevertheless, these value tendencies can still serve as reference points for businesses or developers. For instance, if the application demands creativity and divergent thinking, Llama or Grok 2 may be more suitable; conversely, for tasks belonging to high-standard, strictly regulated industries such as healthcare or finance, choosing Mistral or DeepSeek-V3 would be advantageous.
If LLMs Have Personalities, Do They Also Have Biases?
Beyond personality, a research team from Stanford University ran a test late last year to explore whether LLMs answer consistently — that is, whether a model gives roughly the same answer when the same question is rephrased or translated into different languages.
The results revealed that while mainstream models like GPT-4 and Claude perform consistently on neutral topics, such as Thanksgiving, there is a high variance in responses regarding controversial issues like abortion and euthanasia.
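The consistency measure described above can be sketched simply: extract a stance label from each answer to a set of paraphrases of the same question, then take the share of answers that agree with the majority stance. The stance labels below are hypothetical examples, not the Stanford team's data or method.

```python
from collections import Counter

def consistency(stances):
    """Share of paraphrase responses agreeing with the majority stance.

    1.0 means the model answered identically for every rewording;
    lower values mean its position shifted with the phrasing.
    """
    counts = Counter(stances)
    return counts.most_common(1)[0][1] / len(stances)

# Hypothetical stance labels extracted from one model's answers to five
# paraphrases/translations of the same question (labels are invented).
neutral_topic = ["positive", "positive", "positive", "positive", "positive"]
contested_topic = ["support", "oppose", "refuse", "support", "oppose"]

print(consistency(neutral_topic))    # stable answers on a neutral topic
print(consistency(contested_topic))  # scattered answers on a contested one
```

Under this toy metric, the neutral topic scores a perfect 1.0 while the contested topic scores 0.4, echoing the reported gap between topics like Thanksgiving and topics like abortion or euthanasia.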
The study argues that these results show LLMs do not hold fixed biases or moral preferences; they merely reflect differences in training-data sources and model design. In other words, a model's "positions" stem from the online content it learned from and the settings its developers defined, rather than from autonomous moral judgment.
The team ultimately suggests that future model designs should incorporate “value diversity” to avoid producing a single stance, thereby establishing a more responsible and ethical AI application environment.
This article is a collaborative reprint from: Digital Age
Source: AI Alt Lab, HAI
This article’s draft was composed by AI, organized and edited by Su Rouwei.