Every time a new AI model drops, the same conversation erupts: is it smarter than the last one? Will it take my job? Which benchmark did it crush?
These are the wrong questions. They treat intelligence as a single axis — more or less of it — when the reality is far more nuanced. The difficulty of a problem is not a single dimension. It is at least six distinct dimensions, and AI is automating them on wildly different timelines.
Understanding this framework will not just change how you evaluate the next model release. It will change how you evaluate your own career.
The Myth of Monolithic Intelligence
When Google released Gemini 3.1 Pro in February 2026 and it led on 13 of 16 head-to-head benchmarks against Anthropic’s Claude Opus 4.6 and OpenAI’s o3-ultra, the natural reaction was to declare it “the smartest model.” On ARC AGI-2 — a test specifically designed to measure novel reasoning that cannot be solved through pattern matching — Gemini 3.1 Pro scored 18.2%, nearly doubling the previous best of 9.8% in a single generation, the largest jump in novel reasoning capability ever measured.
Impressive. But here is the uncomfortable follow-up question: does that make it the best model for your work?
Almost certainly not. Because your work is probably not bottlenecked by novel reasoning. It is bottlenecked by something else entirely — coordination, ambiguity, domain expertise, sustained effort, or emotional intelligence. These are all different kinds of hard, and each is being automated on a completely different schedule.
The critical mistake most professionals and most organizations make is evaluating AI capability as a monolith. “Is AI getting smarter?” is as imprecise as asking “is this athlete getting better?” without specifying whether you mean their sprint speed, their endurance, their strategic thinking, or their ability to perform under pressure. A chess grandmaster and a trauma surgeon are both brilliant. Their brilliance has almost nothing in common.
The same is true for the problems AI is learning to solve.
The Six Types of Hard
Type 1: Reasoning Hard
Can you solve this mathematical proof? Can you find the logical flaw in this legal argument? Can you trace through a complex distributed system to locate the bug?
Reasoning-hard problems require sustained chains of logical inference. They demand the ability to hold multiple variables in mind simultaneously, follow implications across many steps, and arrive at conclusions that are verifiably correct or incorrect.
Automation timeline: Now. This is the type of difficulty that current AI models are automating fastest. Gemini 3.1 Pro’s Deep Think capability — extended reasoning chains where the model works through a problem step by step, sometimes for minutes, before producing an answer — represents a step change in pure reasoning. The model has disproved open mathematical conjectures that professional mathematicians had been unable to resolve and caught errors in published peer-reviewed scientific papers.
If your value to your organization is primarily that you reason more rigorously than your colleagues, you are in the category most directly challenged by current AI advances.
Type 2: Effort Hard
Can you review 500 contracts looking for the same liability clause? Can you process 10,000 data entries checking for inconsistencies? Can you maintain focus for eight hours on a task that is tedious but not intellectually demanding?
Effort-hard problems are not cognitively complex. They are volumetrically demanding. The difficulty is sustaining attention and accuracy across repetitive execution at scale.
Automation timeline: Now, via agentic models. The current generation of agentic AI systems — models that can sustain work for hours or days on extended tasks — was built precisely for this. Opus 4.6 and OpenAI’s Codex handle sustained effort across long tasks: code generation across entire repositories, document review at scale, data processing pipelines that run for hours.
If your work is primarily effort-hard, the automation is not coming. It is here.
Type 3: Coordination Hard
Can you align twelve stakeholders across four departments to agree on a product specification? Can you manage dependencies across three engineering teams working on interconnected systems? Can you navigate the competing priorities of a matrix organization to get a project funded?
Coordination-hard problems require understanding multiple perspectives, managing information asymmetries, negotiating trade-offs between parties with different incentives, and maintaining alignment across time as circumstances change.
Automation timeline: Early stages, slow progress. Multi-agent AI systems are beginning to tackle coordination problems — orchestrating workflows across multiple AI agents, managing handoffs, resolving conflicts between agent outputs. But coordination in organizations involves politics, trust, implicit agreements, and social dynamics that current AI systems barely model. A multi-agent system can coordinate task execution. It cannot navigate the unspoken dynamics of a cross-departmental budget negotiation.
Type 4: Domain Expertise Hard
Can you diagnose this rare autoimmune condition given an atypical symptom presentation? Can you evaluate this M&A deal given your knowledge of how regulators in this specific jurisdiction have ruled on similar transactions? Can you predict how this composite material will behave under sustained thermal cycling?
Domain-expertise-hard problems require knowledge accumulated through years of practice — not just information, but the pattern recognition that comes from having seen hundreds of cases, made mistakes, and developed intuition that goes beyond what is written in textbooks.
Automation timeline: Slow, with a persistent gap. AI models are getting better at simulating domain expertise through training data. A frontier model can discuss cardiology or contract law with impressive fluency. But there is a meaningful gap between “has read about it” and “has lived it.” A senior engineer does not debug faster because they reason better than a junior engineer. They debug faster because they have seen that exact stack trace before, they know the library’s undocumented quirks, and they remember the production incident from 2019 that had the exact same root cause.
A veteran M&A attorney does not evaluate deals better because they are smarter. They evaluate them better because they have closed 300 deals and internalized which representations and warranties actually get litigated versus which ones are boilerplate that nobody enforces.
This “lived it” knowledge is being slowly absorbed into training data, but the gap remains real — particularly in domains with thin published literature, where the most valuable knowledge exists as institutional memory and professional intuition rather than written text.
Type 5: Ambiguity Hard
Can you determine what the client actually wants when they say they want it to “feel more premium”? Can you decide which product feature to build next when market signals are contradictory? Can you make a hiring decision when two candidates are strong in completely different dimensions?
Ambiguity-hard problems are fundamentally different from the preceding types. The difficulty is not computational. It is that there is no objectively correct answer. The best response depends on values, context, organizational strategy, stakeholder dynamics, and judgment calls that cannot be resolved through more information or better reasoning.
Automation timeline: Very slow. This is where human judgment remains most irreplaceable. AI systems can generate options, surface relevant data, and even articulate trade-offs. But the act of making a judgment call under genuine uncertainty — where reasonable people would disagree, where the answer depends on values rather than logic — remains stubbornly human.
It is also, not coincidentally, where the most valuable knowledge work happens. The decisions that shape organizations are not the ones with clear right answers. They are the ones that require navigating ambiguity.
Type 6: Emotional Intelligence Hard
Can you tell that a team member is approaching burnout even though they insist they are fine? Can you navigate the politics of a board meeting where two directors have conflicting agendas they have not stated? Can you deliver devastating performance feedback in a way that motivates rather than demoralizes?
Emotional-intelligence-hard problems require reading people — their unspoken concerns, their emotional states, their social dynamics, their motivations — and responding in ways that are situationally appropriate and relationally effective.
Automation timeline: Furthest out. Current AI systems can simulate empathy in text. They can use appropriate tone and express concern. But actual emotional intelligence — the real-time reading of a room, the navigation of interpersonal dynamics, the judgment about when to push and when to back off — requires a kind of situated social awareness that is nowhere near automation.
Why This Framework Changes Everything
The practical power of this framework is that it transforms vague anxiety about AI (“will it take my job?”) into a specific, actionable analysis.
Step one: Audit your work by problem type. Spend a week categorizing every task you do — not by topic, but by what makes it hard. Is it reasoning hard? Effort hard? Coordination hard? Domain expertise hard? Ambiguity hard? Emotional intelligence hard?
Most professionals will discover that the tasks they spend the most time on are not the same tasks that make them most valuable. A financial analyst may spend 60% of their time on effort-hard tasks (data processing, report compilation) and 20% on ambiguity-hard tasks (investment recommendations under uncertainty). The 60% is highly automatable today. The 20% is the reason they have a job.
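To make the audit concrete, here is a minimal sketch of what tallying a week of work might look like. The tasks, hours, and the two-type "automatable now" grouping are entirely illustrative assumptions, not data from the article — the point is only the mechanics of the exercise: log each task with the type of difficulty that dominates it, then look at where your time actually goes.

```python
from collections import defaultdict

# Hypothetical one-week task log: (task, hours, dominant difficulty type).
# The six type labels follow the framework; tasks and hours are invented.
TASK_LOG = [
    ("compile weekly report",        12, "effort"),
    ("clean transaction data",        9, "effort"),
    ("debug pricing model",           5, "reasoning"),
    ("align stakeholders on spec",    6, "coordination"),
    ("investment recommendation",     5, "ambiguity"),
    ("mentor junior analyst",         3, "emotional intelligence"),
]

# Types the framework marks as automatable "now".
AUTOMATABLE_NOW = {"reasoning", "effort"}

def audit(task_log):
    """Return each difficulty type's share of total hours, as percentages."""
    hours = defaultdict(float)
    for _task, h, kind in task_log:
        hours[kind] += h
    total = sum(hours.values())
    return {kind: round(100 * h / total, 1) for kind, h in hours.items()}

def automatable_share(task_log):
    """Percentage of time spent on types being automated today."""
    shares = audit(task_log)
    return round(sum(v for k, v in shares.items() if k in AUTOMATABLE_NOW), 1)

if __name__ == "__main__":
    for kind, pct in audit(TASK_LOG).items():
        print(f"{kind}: {pct}%")
    print(f"automatable now: {automatable_share(TASK_LOG)}%")
```

For this invented log, 65% of the week lands in the two most automatable types — which is exactly the kind of gap between "where my time goes" and "why I am valuable" the audit is meant to expose.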
Step two: Stop asking “which model is best.” The question is meaningless without a problem type qualifier. For pure reasoning — mathematical proofs, logical analysis, complex debugging — Gemini 3.1 Pro with Deep Think is likely the strongest current option. For daily knowledge work — drafting, summarizing, conversational analysis — Claude and ChatGPT remain excellent and the differences are marginal. For sustained effort tasks — long code generation, document review, data processing — agentic models are purpose-built.
Asking “which model is best” is like asking “which vehicle is best” without specifying whether you need to cross an ocean, navigate a city, or haul freight.
Step three: Invest in your least-automatable capabilities. If your primary value to your organization is reasoning, you are in a race against models that are doubling their novel reasoning scores in single generations. The trajectory is clear. You will not lose that race next quarter, perhaps not next year, but the direction is unmistakable.
If your primary value is navigating ambiguity, building consensus, making judgment calls under uncertainty, and leveraging deep domain expertise developed over years, you are in a substantially stronger position. These capabilities are the hardest to automate and the most valuable in organizational contexts.
The Organizational Implications
This framework has consequences beyond individual careers. It reshapes how organizations should think about AI deployment.
Most companies deploy AI based on where the technology is most impressive in demos — reasoning and effort tasks. They automate report generation, data analysis, content drafting. These are genuine productivity gains. But they are also the lowest-value applications of the framework.
The highest-value applications are in the coordination, ambiguity, and domain expertise layers — using AI to augment (not replace) human capabilities in the areas where decisions are most consequential. An AI system that helps a medical team consider differential diagnoses they might have missed (augmenting domain expertise) or that surfaces contradictory market signals for a product team to evaluate (structuring ambiguity) creates value at a different order of magnitude than one that drafts emails faster.
Organizations that deploy AI only against reasoning-hard and effort-hard problems will capture incremental efficiency. Organizations that learn to deploy AI as a complement to human capabilities in ambiguity-hard and coordination-hard domains will capture strategic advantage.
The Uncomfortable Truth About Benchmarks
This framework also explains why AI benchmarks are increasingly misleading as indicators of real-world impact. Benchmarks overwhelmingly measure reasoning-hard and effort-hard capabilities. They test mathematical ability, coding proficiency, factual recall, and logical deduction. These are the most automatable types of difficulty.
Benchmarks do not measure — and structurally cannot measure — the types of difficulty that matter most in organizational contexts: coordination under political complexity, judgment under genuine ambiguity, domain intuition built from years of practice, emotional intelligence in high-stakes interpersonal situations.
When a model’s benchmark scores double, the correct response is not “AI is twice as capable.” It is “AI is twice as capable at the specific types of difficulty that benchmarks measure, which are also the types being automated fastest, which are also the types that contribute least to the most valuable knowledge work.”
That is a less exciting headline. It is a more accurate one.
What Comes Next
The six-type framework is not static. The boundaries will shift. Domain expertise is being eroded faster in fields with rich published literature (law, medicine) than in fields with thin documentation (specialized manufacturing, niche consulting). Coordination capability is advancing as multi-agent architectures mature. Even ambiguity handling will improve as models develop better representations of organizational context.
But the hierarchy of automation difficulty is unlikely to invert. Reasoning and effort will remain the easiest to automate. Emotional intelligence and ambiguity will remain the hardest. And the professionals and organizations that understand this hierarchy — who invest in the capabilities at the hard-to-automate end and use AI to handle the rest — will be the ones who thrive in the years ahead.
The question is no longer whether AI is getting smarter. It is getting smarter at specific types of smartness, on specific timelines, with specific implications for specific kinds of work. The framework is the strategy.
🧭 Decision Radar
| Dimension | Assessment |
|---|---|
| Relevance for Algeria | High — Algerian professionals and enterprises need a framework to assess actual AI vulnerability rather than reacting to benchmark headlines |
| Infrastructure Ready? | Partial — AI tools available, but organizational capacity to audit work by problem type is undeveloped |
| Skills Available? | Partial — Algerian professionals are strong in domain expertise (oil & gas, agriculture, Mediterranean construction) which is among the slowest to automate |
| Action Timeline | 6-12 months |
| Key Stakeholders | Individual professionals, HR directors, university career services, CTOs evaluating AI deployment strategy |
| Decision Type | Strategic |
Quick Take: Algerian professionals should audit their own work using the six-type framework. Those whose value lies in deep domain expertise — particularly in sectors like hydrocarbons, agriculture, and regional regulatory knowledge — have more runway than they may think. Those whose value is primarily reasoning or effort should urgently diversify their skill portfolio.
Sources & Further Reading
- ARC AGI-2 Benchmark: Novel Reasoning Evaluation
- Google DeepMind: Gemini 3.1 Pro Technical Report (February 2026)
- McKinsey Global AI Survey 2025-2026: The State of AI
- Deloitte 2026 State of AI in the Enterprise Report
- Google Research: Deep Think Extended Reasoning Chains (2026)
- World Economic Forum: Future of Jobs Report 2025