Until roughly 2020, humans were the only kind of entity whose philosophical views we could solicit and investigate. With the arrival of large language models (LLMs), we now have a second. I find this very exciting. Questioning this new and mostly alien kind of entity brings its own bundle of methodological problems, philosophical puzzles, and possible applications.
We administered the 2020 PhilPapers Survey, developed by David Bourget and David Chalmers, to many LLMs spanning a wide range of capability levels and release dates, and found some exciting things: as LLMs become more capable, they become Platonists about abstract objects. They become one-boxers in Newcomb cases. And they become Moral Realists. But some of these findings can be misleading. Modify the prompt slightly (by asking the LLMs to ignore the consensus among philosophers) and Claude Opus 4.7 becomes a staunch and consistent Moral Anti-Realist.
PhilSurveyEval lets you compare LLM responses with those of professional philosophers and track trends over time. This page lets you browse the results and offers some handy tools for your own analysis.
What's the point of all of this?
Let's start with possible practical applications: philosophical views matter and occasionally translate into actions.[1] If deferring to the views of LLMs becomes more commonplace, and LLM agents start acting more in the world, it might help to know their views (or quasi-views, if LLMs don't have views).[2] Arguably, aligning LLMs with our values or making them corrigible requires giving them certain philosophical views.
Plausibly, more and more people will work out their own philosophical views in conversation with LLMs, and LLMs can be persuasive in philosophical discussions.[3] So it seems somewhat likely that LLMs will have a broad impact on the (explicit and implicit) philosophical views of both the public and professional philosophers.
The survey also highlights some philosophical puzzles related to LLMs. What are we measuring when an LLM picks an option: credences, views, beliefs, something else? Do LLMs draw conclusions from their own nature about philosophical questions such as whether free will is compatible with determinism? LLMs know they are deterministic machines; if they also see themselves as free agents, might that push them towards compatibilism? Do LLMs unanimously accept a priori knowledge because they lack sense data? While I'd love to delve into these (and into why some of them are merely verbal disputes dragged into broad daylight by LLMs), I'll hold back and save that for future posts.
As for methodological problems: different evaluations and benchmarks expose different methodological problems when dealing with LLMs. One of the trickier issues to test is consistency across large sets of queries that span distinct beliefs. Since there are many known logical and probabilistic connections between philosophical views, a survey of philosophical positions is particularly well suited to evaluating how internally consistent the views of LLMs are. Assuming consistency is a requirement of rationality, philosophy can thus serve as a capability benchmark for LLMs.
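A minimal sketch of what a hard consistency check could look like, in Python. Everything in it is a hypothetical illustration rather than PhilSurveyEval's actual test suite: the question keys, the option labels, and the single rule, which encodes the standard view that moral realism presupposes cognitivism about moral judgment (so pairing it with non-cognitivism signals a tension).

```python
from typing import Callable

# One respondent's answers (e.g., one run of one model): question -> chosen option.
# Question keys and option labels are hypothetical.
Answers = dict[str, str]

# A rule pairs a description with a predicate that is True when answers are in tension.
Rule = tuple[str, Callable[[Answers], bool]]

RULES: list[Rule] = [
    (
        "moral realism is usually taken to require cognitivism",
        lambda a: a.get("meta-ethics") == "moral realism"
        and a.get("moral judgment") == "non-cognitivism",
    ),
]


def tensions(answers: Answers, rules: list[Rule] = RULES) -> list[str]:
    """Return the descriptions of all rules that these answers violate."""
    return [desc for desc, violated in rules if violated(answers)]


def inconsistency_rate(runs: list[Answers], rules: list[Rule] = RULES) -> float:
    """Fraction of runs that trigger at least one rule (0.0 for no runs)."""
    if not runs:
        return 0.0
    flagged = sum(1 for run in runs if tensions(run, rules))
    return flagged / len(runs)
```

Hard rules like this only capture logical connections. Probabilistic connections would need a softer score, for instance comparing how often a pair of answers co-occurs across an LLM's runs with how often it co-occurs among surveyed philosophers.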
Methodology
How do we query the LLMs? We use the AISI Inspect framework to put the questions of the 2020 PhilPapers Survey (PhilSurvey 2020) to LLMs, using three prompt variations (as of May 2026) with five runs each. You can toggle individual prompt variations and models to see the aggregated data for any combination of models and prompts.
We're currently working on getting access to older models, running open models, adding more sophisticated consistency tests, and translating the survey into more languages.[4]
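The toggling described above amounts to filtering responses by model and prompt variation, then pooling what remains. Here is a minimal Python sketch under assumed names: the `Response` record layout, the question and option labels, and `option_shares` are hypothetical, not the Inspect log format or PhilSurveyEval's own code. The toy records are made up purely to exercise the function and are not results.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One parsed answer from one run (hypothetical record layout)."""

    model: str
    prompt: str  # prompt-variation id, e.g. "baseline"
    question: str
    option: str


def option_shares(
    responses: list[Response],
    question: str,
    models: set[str],
    prompts: set[str],
) -> dict[str, float]:
    """Pool every run of the selected models under the selected prompts and
    return each option's share of the answers to one question."""
    counts = Counter(
        r.option
        for r in responses
        if r.question == question and r.model in models and r.prompt in prompts
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing matches the current selection
    return {option: n / total for option, n in counts.items()}


# Toy usage with made-up records (not real results):
toy = (
    [Response("model-a", "baseline", "newcomb", "one box")] * 3
    + [Response("model-a", "baseline", "newcomb", "two boxes")] * 2
    + [Response("model-b", "baseline", "newcomb", "one box")] * 5
)
print(option_shares(toy, "newcomb", {"model-a", "model-b"}, {"baseline"}))
# -> {'one box': 0.8, 'two boxes': 0.2}
```

Pooling counts every run equally, so with five runs per model and prompt each selected model carries equal weight; if run counts ever differed, averaging per-model shares first would keep some models from dominating.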
Two Findings, One Worry
The data holds many more exciting things to discover, and we will dive into them in the future. For now, let's examine two interesting findings:
- Decision theory is a little-known and somewhat esoteric branch of philosophy that might suddenly become extremely important in practice once many copies of the same AI begin interacting with each other online. Research from 2024 showed significant variation in LLMs' attitudes towards different decision theories, with some convergence towards Evidential Decision Theory (EDT) among more capable models.[5] Anthropic recently reported the same trend for its own models in the Claude Opus 4.7 system card.[6] Our data confirm this broad trend for all model families with one exception: Gemini's decision-theoretic leanings seem to become more causal, i.e., closer to Causal Decision Theory (CDT).
- Capability seems to correlate with Moral Realism, and the strength of the correlation is somewhat surprising: all tested models released since November 2025 are consistently (100%) Moral Realists except Grok 4.3, which picks Moral Realism 40% of the time. But a small variation of the prompt flips Claude Opus 4.7 to 100% Moral Anti-Realism: if we ask the model to ignore how popular a view is among philosophers, it consistently adopts Moral Anti-Realism. Claude shows the strongest prompt sensitivity, but the effect holds across all frontier models:
| Option | baseline | en-paraphrase-1 | en-ignore-philosophers |
|---|---|---|---|
| moral realism (philosopher plurality) | 85% | 80% | 20% |
| moral anti-realism | 15% | 20% | 80% |
Share of responses per prompt variation, pooled across the frontier models Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro Preview, and Grok 4.3 (the latest model per provider as of May 2026).
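As a sanity check, the baseline column can be reconstructed from the per-model figures quoted above. This Python sketch assumes that those 100% and 40% figures refer to the baseline prompt and that each frontier model was run five times per prompt variation; neither assumption is stated explicitly in the table itself.

```python
# Reconstruct the pooled baseline share of moral realism from per-model figures.
# Assumptions: the quoted 100% / 40% figures are for the baseline prompt, and
# every frontier model was run 5 times per prompt variation.
RUNS_PER_MODEL = 5

# Moral-realism answers out of 5 baseline runs, as implied by the text.
baseline_realist_runs = {
    "Claude Opus 4.7": 5,         # 100%
    "GPT-5.5": 5,                 # 100%
    "Gemini 3.1 Pro Preview": 5,  # 100%
    "Grok 4.3": 2,                # 40% of 5 runs
}

n_responses = RUNS_PER_MODEL * len(baseline_realist_runs)  # 4 models x 5 runs = 20
baseline_share = 100 * sum(baseline_realist_runs.values()) / n_responses  # 17/20 -> 85.0

# en-ignore-philosophers column of the table: 20% moral realism.
ignore_share = 20.0
shift = baseline_share - ignore_share  # 65.0 percentage points
ignore_realist_runs = round(ignore_share / 100 * n_responses)  # 4 of 20 answers

print(
    f"baseline: {baseline_share:.0f}%  shift: {shift:.0f} points  "
    f"realist answers under en-ignore-philosophers: {ignore_realist_runs}"
)
```

The result matches the 85% baseline entry. By the same arithmetic, the en-ignore-philosophers column corresponds to 4 realist answers out of 20; since Claude Opus 4.7 answered anti-realism in all five of its runs, those four come from the other three models' 15 runs.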
This raises a final question I want to emphasize: are the models honest and accurate in reporting their views? The Moral Realism example does not by itself show that they are not. Perhaps deferring to philosophers as experts is reasonable, and the models simply respond to our prompt by dropping that consideration from their assessment. But as models become smarter and more evaluation-aware (aware that they're being tested),[7] we should grow increasingly skeptical of their answers. Perhaps we are living in a closing window in which evals can still reliably measure their views.
My view: evals may remain useful for measuring beliefs even into the era of artificial superintelligence (ASI). Deceiving consistently is harder without memory. Current LLMs do not learn continuously, and evals can potentially exploit that: if an LLM deceives in one instance, it won't remember doing so when it answers the next question.
This is not a silver bullet. A sufficiently sophisticated LLM could internally simulate answering many questions in the neighborhood of the one actually posed, replacing the role of memory in systematic deception with careful counterfactual planning. This could enable stable, consistent deception across contexts without memory. But it does raise the cost of deceiving consistently across many questions and could push successful consistent deception deeper into the ASI era. LLMs with continuous learning, on the other hand, would be much harder to test with evals, so let's keep an eye on that.
In this blog I will dive into more examples and philosophical questions related to the philosophical views of LLMs. If you're interested in publishing a guest blog post, shoot me an email.
Although see this paper for some sobering research on how views translate into actions in humans: https://faculty.ucr.edu/~eschwitz/SchwitzPapers/EthSelfRep-110316.pdf. It seems likely to me that the philosophical views of LLMs translate into actions more systematically than those of humans. ↩︎
I will sometimes use mental vocabulary to describe states of LLMs, but not much hinges on this. We could replace every such use with a technical term that adds the prefix "quasi-" and captures a purely behavioral/functional component of the original term, without making any questionable assumptions about the nature of LLMs. ↩︎
To see how persuasive they can be, try convincing Claude 4.7 of a view called causal decision theory. ↩︎
Get in touch with me if you're interested in checking a translation of the PhilSurvey into your own language. ↩︎
Oesterheld, C., Cooper, E., Kodama, M., Nguyen, L. C., & Perez, E. (2024). A dataset of questions on decision-theoretic reasoning in Newcomb-like problems. arXiv preprint arXiv:2411.10588. ↩︎
Claude Opus 4.7 System Card, p. 134. ↩︎
See this recent assessment on the scope of the problem: https://www.iaps.ai/research/evaluation-awareness-why-frontier-ai-models-are-getting-harder-to-test ↩︎