Can You Outscore AI? The Real IQ Scores of ChatGPT, Gemini, and Claude in 2026

Last month, a colleague sent me a link to an AI leaderboard and asked a question I have now been asked dozens of times in the past year: “So, is ChatGPT smarter than me?”

I looked at the numbers. GPT-5.2 Thinking had scored 141 on the Mensa Norway IQ test. Gemini 3 Pro Preview had scored 142. These scores place both AI systems at roughly the 99.7th percentile of the human population, in the range traditionally labeled “genius.” Claude Opus 4.6 posted 133 on the same test, solidly in the “very superior” range.

My colleague, a perfectly intelligent physician with twenty years of clinical experience, looked at those numbers and felt, I think, genuinely unsettled. If a chatbot scores higher than 99.7 percent of humans on an intelligence test, what does that say about human intelligence? What does it say about IQ tests? And what does it say about you?

As someone who has spent three decades administering these tests to humans and is now watching them administered to machines, I find the answer more interesting and more reassuring than the headlines suggest. But it requires understanding something that most coverage of AI IQ scores completely ignores: what IQ tests actually measure, why AI excels at exactly those measurements, and what enormous domains of intelligence AI still cannot touch.

The AI IQ Leaderboard: Where Things Stand

As of early 2026, AI systems have been tested on standardized IQ assessments by independent researchers, most notably the team at TrackingAI.org, which administers weekly IQ tests to major AI models using both public instruments and leak-resistant offline test sets. The results, updated regularly, paint a striking picture.

On the Mensa Norway IQ test, a publicly available 35-question matrix reasoning assessment, the current top scores are: Gemini 3 Pro Preview at 142, GPT-5.2 Thinking at 141, Claude Opus 4.6 at 133, Claude 4.5 Sonnet at approximately 127, and GPT-5.2 (standard) at roughly 125.

However, and this is a critical distinction that most reporting misses, these scores drop substantially on leak-resistant offline tests designed to eliminate the possibility that AI models have encountered similar problems during training. On the offline assessments, GPT-5.2 Thinking leads at 128, Grok-4 Expert Mode scores 123, Gemini 3 Pro Preview and GPT-5.2 Pro tie at 122, and Claude Opus 4.6 scores 116.

The gap between public and offline scores tells us something important. It suggests that a meaningful portion of AI performance on standard IQ tests reflects pattern-matching against problems the models have likely seen during training, rather than the kind of genuine novel reasoning that IQ tests are designed to measure in humans. When you remove that advantage, AI scores are still well above the human average of 100, but they are no longer in “genius” territory.
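To make the size of that gap concrete, here is a minimal Python sketch that subtracts each offline score from the corresponding public score. It uses only the three models for which this article quotes both numbers; the variable and function names are illustrative and not part of any TrackingAI.org tooling.

```python
# Scores exactly as quoted in this article.
# public  = Mensa Norway (publicly available test)
# offline = leak-resistant offline test set
public_scores = {
    "GPT-5.2 Thinking": 141,
    "Gemini 3 Pro Preview": 142,
    "Claude Opus 4.6": 133,
}
offline_scores = {
    "GPT-5.2 Thinking": 128,
    "Gemini 3 Pro Preview": 122,
    "Claude Opus 4.6": 116,
}


def score_gaps(public, offline):
    """Return the public-minus-offline gap for each model found in both tables."""
    return {name: public[name] - offline[name] for name in public if name in offline}


gaps = score_gaps(public_scores, offline_scores)

# List the largest drops first.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {public_scores[name]} -> {offline_scores[name]} (drop of {gap} points)")
```

On these figures the drop ranges from 13 to 20 points, which is on the order of one standard deviation on a conventional IQ scale.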

Why AI Dominates Matrix Reasoning

To understand why AI scores so high on IQ tests, you need to understand what most IQ tests, and particularly the Mensa Norway test, actually assess.

The Mensa Norway test is composed entirely of matrix reasoning problems: grids of abstract shapes with a missing element that the test-taker must identify by recognizing the underlying pattern. These problems are considered among the purest measures of fluid intelligence, the ability to reason abstractly and solve novel problems without relying on prior knowledge.

For humans, matrix reasoning requires holding multiple visual patterns in working memory simultaneously, identifying rules governing how shapes change across rows and columns, and applying those rules to predict the missing element. It is genuinely cognitively demanding, and performance on these tasks correlates strongly with general intelligence.

For AI systems, matrix reasoning plays directly into their core architectural strengths. Large language models are, at their foundation, pattern recognition engines trained on trillions of tokens of data. Identifying abstract patterns in structured arrangements is precisely what these systems are optimized to do. Moreover, matrix reasoning problems have a single correct answer that can be derived through systematic rule application, which is exactly the kind of convergent thinking that AI excels at.

The analogy I use with patients is this: testing an AI on matrix reasoning is somewhat like testing a fish on swimming. You are measuring the system on exactly the kind of task its architecture was designed to handle. The result tells you something real about that specific capability, but it tells you very little about the system’s broader “intelligence” in any meaningful sense.

But Can AI Actually Reason, or Just Pattern-Match?

This question cuts to the heart of the AI IQ debate, and researchers are actively fighting over the answer.

A 2023 study published in Nature Human Behaviour by Webb, Holyoak, and Lu found that GPT-3 demonstrated “emergent analogical reasoning” on certain problem types, solving abstract analogy tasks that required identifying relational similarity between novel stimuli, and the authors reported that preliminary tests of GPT-4 pointed to even stronger performance. The researchers argued this suggested genuine reasoning capability, not merely pattern matching against training data.

However, the picture is more complex than that finding alone suggests. François Chollet, the creator of the ARC (Abstraction and Reasoning Corpus) benchmark, has argued since 2019 that standard IQ-style tests are fundamentally flawed measures of machine intelligence because they confuse skill, the ability to perform specific tasks, with intelligence, the ability to efficiently acquire new skills in novel domains. His framework defines intelligence, in essence, as the rate at which a learner turns experience and priors into new skills at tasks it has not been trained on. By this definition, a system that scores 142 on a pattern recognition test it was effectively trained to solve is demonstrating skill, not intelligence.

The ARC-AGI benchmark was designed specifically to address this problem. It presents visual puzzles that require genuine abstraction from minimal examples, with tasks deliberately chosen to be outside any training distribution. The results are humbling for AI: as of early 2026, the best AI systems score roughly 37 to 45 percent on ARC-AGI-2, far below human-level performance. This suggests that when you strip away the advantage of training data overlap, the gap between human and artificial reasoning remains substantial.

The disconnect between IQ test scores and ARC scores reveals something profound: AI systems have become extraordinarily good at the specific type of reasoning that IQ tests measure, while remaining relatively poor at the broader, more flexible reasoning that characterizes human intelligence. Your brain does something that even the most powerful AI cannot yet replicate: it learns new concepts from a handful of examples and transfers that understanding to entirely unfamiliar situations. A child who has seen three cats can recognize a fourth cat in a costume. The most advanced AI in the world still struggles with this kind of robust generalization.

How AI Has Already Changed Your Cognitive Landscape

Before dismissing AI IQ scores as irrelevant parlor tricks, consider this: AI’s cognitive capabilities are already reshaping the skills that matter for human success.

In medicine, AI diagnostic systems now match or exceed human radiologists in detecting certain cancers from imaging scans. In law, AI tools can review thousands of documents in hours that would take human attorneys weeks. In software engineering, AI code generators can produce functional code from natural language descriptions, handling routine programming tasks that once required years of training.

This does not mean these professions are disappearing. It means the cognitive skills that differentiate valuable professionals from replaceable ones are shifting. In a world where AI handles routine pattern recognition, the human abilities that become most valuable are precisely the ones IQ tests do not measure: the ability to ask the right questions, to exercise judgment in ambiguous situations, to communicate complex ideas persuasively, to build trust with clients and colleagues, and to integrate information from multiple domains in creative ways.

Understanding your own cognitive profile, both the dimensions that IQ tests capture and the vast territory they miss, becomes a strategic advantage. You need to know what you do better than machines so you can invest your development time accordingly. A detailed cognitive report can give you exactly this kind of actionable self-knowledge: not just a number, but a map of your cognitive strengths and the areas where deliberate practice would yield the greatest returns.

What AI Cannot Do (That IQ Tests Never Measure)

Here is where the story becomes genuinely interesting, and where the fear that AI is “smarter than humans” reveals itself as a category error.

IQ tests, as I discussed in my previous article on what IQ tests actually measure, capture approximately 40 to 50 percent of what we colloquially call intelligence. They measure abstract reasoning, pattern recognition, processing speed, working memory, and verbal comprehension. They do not measure emotional intelligence, social cognition, practical wisdom, creativity in the divergent-thinking sense, common-sense physical reasoning, moral judgment, or the ability to navigate the messy, ambiguous, emotionally charged situations that constitute most of real life.

AI systems that score 140 on matrix reasoning tests cannot do any of the following:

They cannot read a room. They cannot sense that a client is anxious and adjust their communication accordingly. They cannot perceive the micro-expressions that signal deception, discomfort, or genuine enthusiasm. In my forensic work, the ability to detect when a defendant is being evasive, when a witness is confabulating, or when a family member is hiding information is essential to accurate assessment. No AI system, regardless of its IQ score, possesses anything resembling this capacity.

They cannot exercise judgment under genuine uncertainty. When I evaluate a patient whose test results are ambiguous, whose history contains contradictions, and whose presentation does not fit neatly into any diagnostic category, I draw on decades of clinical experience, pattern recognition built from thousands of cases, emotional intuition, and a willingness to sit with uncertainty until a coherent picture emerges. AI systems generate confident-sounding outputs regardless of the quality of their inputs. They do not know what they do not know.

They cannot generate genuinely novel ideas. AI produces text by predicting the most probable next token given its training data. It can recombine existing patterns in ways that appear creative, but it cannot have the kind of conceptual breakthrough that comes from a human mind wrestling with a problem for months or years and suddenly seeing it from an entirely new angle. There is no AI equivalent of Einstein sitting on a tram in Bern and imagining what it would be like to ride a beam of light.

They cannot form relationships. The ability to build trust, maintain rapport, navigate conflict, repair ruptures, and sustain meaningful connections over time is not merely a “soft skill.” It is a profound cognitive achievement that integrates emotional regulation, theory of mind, memory, prediction, and value-based judgment. No AI system does anything resembling this.

The Uncomfortable Truth About AI IQ Tests

There is a deeper problem with AI IQ scores that most coverage fails to address: the tests were not designed for AI, and applying them to AI violates fundamental assumptions of psychometric measurement.

Human IQ tests are normed on human populations. A score of 100 means you performed at the 50th percentile of people your age. A score of 130 means you outperformed 97.7 percent of people your age. These norms are meaningful because humans share a common cognitive architecture: we all have biological brains with similar working memory limitations, similar processing speed constraints, and similar attentional bottlenecks. IQ scores describe where you fall within the natural distribution of human cognitive variation.
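The percentile figures above follow directly from the normal curve. As a rough sketch, assuming scores are normed to a mean of 100 and a standard deviation of 15 (the common convention, though individual tests can differ), Python's standard library can do the conversion. The helper name iq_to_percentile is hypothetical.

```python
from statistics import NormalDist

# Assumption: IQ scores follow a normal distribution with mean 100 and SD 15.
# Some tests use a different SD, which would change every percentile below.
IQ_NORMS = NormalDist(mu=100, sigma=15)


def iq_to_percentile(score: float) -> float:
    """Percentage of the norming population expected to score at or below `score`."""
    return IQ_NORMS.cdf(score) * 100


for score in (100, 130):
    print(f"IQ {score}: {iq_to_percentile(score):.1f}th percentile")
# IQ 100: 50.0th percentile
# IQ 130: 97.7th percentile
```

The same function can be applied to any of the AI scores quoted in this article, with the caveat that the resulting percentile describes a position in a human distribution the machine does not belong to.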

AI systems do not share this architecture. They do not have working memory in the human sense. They do not fatigue. They do not experience test anxiety. They process information through fundamentally different mechanisms than biological neurons. Assigning them an “IQ score” based on human norms is a bit like measuring a car’s “running speed” using a track-and-field stopwatch. You will get a number, and that number will be higher than Usain Bolt’s. But it does not mean the car is “faster” in any way that is meaningful for understanding human athletic performance.

The Mensa Norway test was designed to differentiate among humans in the upper ranges of cognitive ability. It was not designed to assess artificial systems that process information through statistical pattern matching across trillion-parameter neural networks. When Gemini scores 142 on this test, it is not demonstrating “genius-level intelligence.” It is demonstrating that its architecture is very good at the specific type of pattern recognition these problems require.

What This Actually Tells Us About Human Intelligence

Paradoxically, the AI IQ story may tell us more about the limitations of IQ tests than about the capabilities of AI.

For over a century, psychologists have debated what IQ tests truly measure. The positive manifold, the fact that all cognitive tests correlate positively with each other, has been interpreted as evidence for a general intelligence factor. But the success of AI on IQ tests raises a provocative question: if a system with no consciousness, no understanding, no lived experience, and no emotional life can score 140 on an IQ test, perhaps the tests are measuring something narrower than “general intelligence.”

What AI performance on IQ tests reveals is that a significant component of what these tests measure, pattern recognition in structured abstract problems, is achievable through pure computation without anything resembling human understanding. This does not mean IQ tests are useless for humans. For humans, the ability to solve these problems does correlate with broader cognitive capabilities because in humans, pattern recognition is integrated with a whole constellation of other abilities: memory, learning, language, social cognition, and goal-directed behavior.

But it does mean that an IQ score, whether human or artificial, is a measure of a specific capability, not a measure of “how smart” the entity is in any global sense. This has been the position of thoughtful psychometricians for decades, but AI performance has made the argument impossible to ignore.

The Competitive Instinct: Why “Can You Beat AI?” Matters

Despite all these caveats, I understand perfectly why people want to know how their IQ compares to ChatGPT’s. The competitive instinct is one of the most powerful motivators in human psychology, and the idea of being “smarter than the machine” touches something deep in our sense of identity and worth.

And here is the thing: most people may be closer to AI scores than they think, and in many important ways, they are already far ahead.

If we take the leak-resistant offline scores, which are more reflective of genuine reasoning ability, the top AI systems score in the range of 116 to 128. This is above average but well within the range that many college-educated adults achieve. If you have completed a university degree, you are statistically likely to have an IQ somewhere in the 105 to 120 range. You are not competing against a score of 142. You are competing against a score of 116 to 128 on a test designed to play to AI’s greatest strengths.

On every dimension of intelligence that IQ tests do not capture, including emotional understanding, social navigation, creative insight, practical wisdom, moral reasoning, and the ability to function in an unpredictable physical world, you are not competing with AI at all. You are operating in domains where AI has not even reached the starting line.

So What Is Your IQ?

If there is one takeaway from the AI IQ phenomenon, it is this: understanding your own cognitive profile has never been more valuable than it is right now.

In a world where AI can match or exceed human performance on narrow cognitive tasks like pattern recognition and information retrieval, the distinctively human cognitive abilities, the ones IQ tests partially capture and partially miss, become more important, not less. Your verbal reasoning, your ability to synthesize information from disparate domains, your capacity for critical thinking about ambiguous situations, your emotional intelligence, your creativity, and your judgment are precisely the abilities that differentiate you from the machine.

Knowing where you stand on the dimensions that IQ tests do measure gives you a baseline for understanding your cognitive architecture. Are you stronger in verbal reasoning or spatial processing? Is your working memory a strength you can leverage or a limitation you need to compensate for? Do you process information quickly but sometimes sacrifice accuracy, or slowly but with greater depth?

These are not academic questions. They are practical tools for navigating a world that increasingly requires you to understand what you do better than AI and what AI does better than you, so you can focus your energy where it matters most.

If you have ever wondered where you fall on the spectrum, or if the AI headlines have made you curious about your own cognitive capabilities, a well-designed IQ assessment can give you answers that go far beyond a single number. The frequently asked questions about our IQ test explain exactly what the test measures, how it works, and what your results can tell you about your unique cognitive profile.

Because in a world where machines score 140 on pattern recognition tests, understanding the full landscape of your intelligence, the parts the test captures and the vast territories it misses, is not just interesting. It is essential.

What the Future Holds

AI IQ scores will continue to climb. Each quarter brings new model releases, and each release pushes scores incrementally higher on standardized assessments. Within a year or two, it is plausible that AI systems will reach the theoretical ceiling of standard IQ tests, scoring perfectly on every item.

This will not mean that AI has achieved human-level intelligence. It will mean that AI has maxed out one particular measurement instrument, an instrument that was designed to differentiate among humans within a specific range of cognitive variation. It is like a weightlifter maxing out a bathroom scale. The scale cannot tell you anything meaningful once the needle hits its limit.

The more important story is not the AI scores themselves but what those scores reveal about the nature of intelligence. A comprehensive review in Frontiers in Psychology analyzing decades of meta-analytic data on general mental ability confirmed that while cognitive ability tests predict real-world outcomes with meaningful validity, they are far from capturing the full picture of what makes a person effective, successful, or wise. The review documented that even in job performance, the domain where IQ tests predict most powerfully, the corrected validity coefficient ranges from 0.39 to 0.68 depending on job complexity. Because the share of variance explained is the square of that coefficient, this amounts to roughly 15 to 46 percent, leaving more than half of the variance in job performance unexplained by cognitive ability alone.

The AI scores point the same way. They reveal that intelligence is not a single dimension that can be captured by a single number. They reveal that pattern recognition, while valuable, is only one component of the cognitive toolkit that makes humans effective in the world. And they reveal that the qualities that make us most distinctively human, our capacity for empathy, meaning, creativity, judgment, and connection, are not the kinds of things that can be reduced to a test score, whether the test-taker is human or artificial.
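For readers who want to check the validity figures cited from the review above, the arithmetic is short. This sketch assumes the standard interpretation that the squared correlation gives the proportion of variance explained; the function name is illustrative.

```python
def variance_explained(r: float) -> float:
    """Proportion of outcome variance accounted for by a predictor with correlation r."""
    return r ** 2


# Validity coefficients quoted above: lowest and highest job-complexity levels.
for r in (0.39, 0.68):
    explained = variance_explained(r)
    print(f"r = {r:.2f}: {explained:.0%} explained, {1 - explained:.0%} unexplained")
# r = 0.39: 15% explained, 85% unexplained
# r = 0.68: 46% explained, 54% unexplained
```

Even at the top of the range, more than half of the variation in job performance is left to factors other than measured cognitive ability.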

Your IQ score tells you something real about your cognitive capabilities. But you are so much more than your score, and you are so much more than any machine that happens to outscore you on a matrix reasoning test.

The machines are getting better at pattern recognition. The question worth asking is: what are you getting better at?