AI Adoption Strategy: Why 95% of Projects Fail (and the Scoping Discipline That Beats the Odds)

MIT found 95% of enterprise AI pilots deliver zero ROI. The fix is a scoping discipline that maps dependencies and picks the right AI for the job.

Why 95% of Enterprise AI Projects Fail to Ship Value

In August 2025, MIT published a report called The GenAI Divide: State of AI in Business 2025. The headline finding was direct. Ninety-five percent of integrated enterprise AI pilots delivered zero measurable impact on profit or loss, despite $30 to $40 billion in combined spending during the year. Only 5% of projects produced a real business result.

Gartner's forecast for agentic AI points in the same direction: more than 40% of agentic AI projects will be canceled by the end of 2027, driven by escalating costs, unclear business value, and inadequate risk controls.

The problem is not the technology. Foundation models, agent frameworks, and vector databases are more capable than they were a year ago. The problem is how enterprises decide what to build, in what order, and with which tool.

Jacob Andra, CEO of the AI consulting firm Talbot West, framed the pattern on The Security Podcast of Silicon Valley. "The most common issue we see is a client who has already decided the answer is an LLM, and they want us to help them wrap a workflow around it. That is scoping in reverse."

This article walks through the scoping discipline that separates the 5% that ship value from the 95% that don't. It is grounded in the reasoning Andra and Talbot West co-founder Stephen Karafiath use when they evaluate real engagements.

Improper Scoping Is the Root Cause

The MIT study broke down how the 5% of successful AI projects differed from the failing 95%. One finding stood out. Strategic partnerships, where an outside team helped scope the work, deployed successfully 67% of the time. Internal builds, where the company scoped and executed alone, deployed successfully only 33% of the time.

The gap is not vendor magic. Outside scoping teams ask harder questions earlier. They force clients to define the workflow in detail before selecting a tool. They insist on a measurable baseline before building anything.

Karafiath, who spent 30 years in enterprise software at GE and Oracle and led Oracle's developer innovations team, described the failure mode. "People see a demo, get excited, and then try to reverse engineer a real business problem onto the capability they just saw. The problem they end up solving is not the problem that was hurting them."

Three scoping mistakes repeat across failed projects:

  • Solving a problem the business is not actually paying to solve. The pilot produces a cool output, but no P&L line changes.

  • Ignoring the data pipeline that feeds the model. The AI works on a clean sample and fails on production data.

  • Treating AI as a single capability instead of a toolkit. The team commits to an LLM before asking whether an LLM is the right fit.

A disciplined scoping pass catches all three before the first line of code.

Map Dependencies Before You Pick a Tool

Before selecting a model or a vendor, the scoping team needs to draw the full dependency map of the workflow the AI will touch. Andra calls this "finding the iceberg under the demo."

A dependency map has four layers:

Data dependencies. Where does the input data live? How clean is it? Who owns it? What happens when the source schema changes? Most pilots break when the real data behaves differently than the training sample.

Process dependencies. What steps happen before and after the AI call? What systems consume the output? If the AI changes the output format, how many downstream systems break?

Human dependencies. Who reviews the AI's output? Who overrides it? Who is accountable when it is wrong? Projects that skip this layer tend to produce outputs no human trusts enough to act on.

Security and compliance dependencies. What data classification is touched? What audit trail is required? How will the AI's decisions be explained to regulators? The cost of these dependencies is often larger than the model cost itself, especially in regulated sectors like defense, finance, and healthcare.

The dependency map is also the place where AI agent permissions and least privilege decisions need to be made. Agentic systems inherit every risk in every system they are allowed to touch. Map the graph first, then decide which edges the agent is allowed to use.
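
To make the map concrete, here is a minimal Python sketch. The names and fields are illustrative assumptions, not part of Talbot West's methodology: each dependency records its layer and owner, and every edge defaults to "agent may not touch," so least privilege is the starting position rather than an afterthought.

    from dataclasses import dataclass, field

    @dataclass
    class Dependency:
        # One edge in the workflow's dependency graph.
        system: str                     # e.g. "erp_invoices", "payment_approval"
        layer: str                      # "data", "process", "human", or "security"
        owner: str                      # who is accountable for this dependency
        agent_may_touch: bool = False   # least privilege: deny by default

    @dataclass
    class ScopedWorkflow:
        name: str
        outcome_metric: str             # the number the business agrees to measure
        dependencies: list[Dependency] = field(default_factory=list)

        def allowed_edges(self) -> list[str]:
            # Edges an agent is explicitly permitted to use.
            return [d.system for d in self.dependencies if d.agent_may_touch]

    # Hypothetical invoice-coding workflow.
    invoice_coding = ScopedWorkflow(
        name="invoice_coding",
        outcome_metric="hours of manual coding per 1,000 invoices",
        dependencies=[
            Dependency("erp_invoices", "data", "finance_ops", agent_may_touch=True),
            Dependency("payment_approval", "process", "controller"),   # stays human-only
            Dependency("audit_log", "security", "compliance", agent_may_touch=True),
        ],
    )
    print(invoice_coding.allowed_edges())   # ['erp_invoices', 'audit_log']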

The output of this step is a scoped problem statement. Not a tool choice, not a vendor short list. A clear statement of the workflow, its dependencies, and the outcome the business is willing to measure.

Know When an LLM Is the Wrong Answer

Only after the dependency map is complete does the team pick the AI approach. This is where the 95% of failed projects go wrong most often. They assume the answer is a large language model because LLMs are what leadership has heard of.

"LLMs are one tool in a toolbox," Andra said. "They are extraordinary at some things. They are genuinely bad at others. If you build your whole stack around the assumption that the LLM will figure it out, you will lose money."

Four categories of problems often produce better results without an LLM as the primary engine:

  • Deterministic decisions with known rules. Credit approvals, tax calculations, eligibility checks. A rules engine beats an LLM on accuracy, speed, and auditability.

  • Numeric forecasting on structured data. Demand planning, fraud scoring, capacity modeling. Classical machine learning and Bayesian models have been solving these for decades with lower cost and better interpretability.

  • Structured extraction from documents with a fixed schema. Invoice fields, contract clauses, lab results. Purpose-built extraction models with rule-based post-processing outperform LLMs on precision and recall.

  • Anything that requires a numeric guarantee. Compliance thresholds, risk limits, regulatory reports. LLMs are probabilistic by design. If the answer must be correct every time, the LLM is not the answer.

Karafiath pulled an example from his time at Oracle. "We built Bayesian fraud detection that ran against banking transactions in the 1990s. It still runs. It catches fraud reliably. It is explainable to a regulator in about one sentence. No LLM is going to replace that with better results. But I see teams try."

The right pattern is often a blend: an LLM on the front end for natural language interaction, classical models for the numeric work, and a symbolic or rules layer for the decisions that must be correct. This is the same ensemble pattern that shows up in secure AI coding workflows, where the LLM suggests code but deterministic analysis catches the vulnerabilities the LLM introduces.
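
As a rough illustration of that blend, here is a minimal Python sketch. The function names, fields, and thresholds are assumptions for the example, not anything prescribed on the podcast: the LLM only converts language into structured fields, a classical model produces the score, and a deterministic rules layer owns the final decision.

    def parse_request(text: str) -> dict:
        # LLM front end: turn free text into structured fields.
        # A real system would call a model API here; this is a placeholder.
        return {"applicant_id": "A-102", "amount": 18_000.0}

    def score_risk(features: dict) -> float:
        # Classical model (gradient boosting, Bayesian, etc.); placeholder score.
        return 0.12

    def decide(features: dict, risk: float) -> str:
        # Rules layer: deterministic, auditable, and correct every time.
        if features["amount"] > 50_000:
            return "refer_to_human"      # hard limit no model can override
        if risk > 0.20:
            return "decline"
        return "approve"

    def handle(text: str) -> str:
        features = parse_request(text)   # probabilistic: language in
        risk = score_risk(features)      # statistical: numbers in the middle
        return decide(features, risk)    # deterministic: decision out

    print(handle("Requesting an $18,000 equipment loan for applicant A-102"))  # approve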

The Human Layer: Change Management and Training

Even a technically perfect AI deployment fails if the humans downstream do not adopt it. This is the layer every failed pilot we have seen underweights.

Three failure patterns show up:

No one is trained to use the output. The model produces a score, a summary, or a recommendation, and the people it was built for keep doing the workflow the old way. The project dashboard shows usage at 3%.

No one is trained to override the output. Staff do not know when to trust the AI and when to push back. Either they rubber-stamp everything, which lets errors through, or they reject everything, which kills the ROI.

No one owns the feedback loop. The model drifts. Nobody notices because no one is responsible for watching.

Change management is a line item, not a footnote. A realistic AI adoption strategy budgets for training, for shadow-run periods where humans and AI produce outputs in parallel, and for a named owner who holds the model accountable after go-live.
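
A shadow run does not need heavy tooling. A minimal sketch, with illustrative field names: log the human decision and the AI's decision side by side, act only on the human's, and track the agreement rate that the named owner reviews before go-live.

    import csv
    from datetime import datetime, timezone

    def log_shadow_case(case_id: str, human_decision: str, ai_decision: str,
                        path: str = "shadow_run.csv") -> None:
        # Append one human-vs-AI comparison; only the human decision is acted on.
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([
                datetime.now(timezone.utc).isoformat(),
                case_id,
                human_decision,
                ai_decision,
                human_decision == ai_decision,
            ])

    def agreement_rate(path: str = "shadow_run.csv") -> float:
        # Share of shadow cases where the AI matched the human of record.
        with open(path) as f:
            rows = list(csv.reader(f))
        return sum(row[4] == "True" for row in rows) / len(rows) if rows else 0.0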

Andra put the principle plainly. "If your pilot succeeds but your people do not adopt it, you have built a science project, not a business system."

The same principle shows up in security tooling adoption, where shift-left security programs fail not because the tools are bad but because the human workflow was never redesigned around them.

What a Disciplined AI Adoption Framework Looks Like

Talbot West uses a framework called APEX, short for AI Prioritization and EXecution. It is one example of how a disciplined scoping process can run. The structure is useful even if the name is not.

APEX runs in three rounds:

Round 1: Problem cataloging. List every workflow where AI could plausibly help. Score each on business value, data readiness, and organizational readiness. Do not pick a tool. Do not pick a vendor. Just list and score.

Round 2: Dependency and fit analysis. For the top candidates, map the full dependency graph. Identify the right AI class for each, including the option "not AI." Some problems are better solved with software engineering, not AI.

Round 3: Pilot design with a measurable baseline. Design the narrowest possible pilot. Define the baseline metric. Set the success threshold before starting. Time-box it. If the pilot hits the threshold, scale it. If not, kill it and move on.
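
For teams that want Round 1 and Round 3 to be mechanical rather than a matter of opinion, here is a minimal Python sketch. The weights, scores, and thresholds are illustrative assumptions, not part of APEX.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        # One workflow from the Round 1 catalog, scored 1-5 on each axis.
        name: str
        business_value: int
        data_readiness: int
        org_readiness: int

        def priority(self) -> float:
            # Illustrative weighting; tune to your own portfolio.
            return (0.5 * self.business_value
                    + 0.3 * self.data_readiness
                    + 0.2 * self.org_readiness)

    def pilot_verdict(baseline: float, observed: float, threshold_gain: float) -> str:
        # Round 3 gate: scale only if the pre-agreed improvement threshold is met.
        gain = (baseline - observed) / baseline    # e.g. reduction in handling time
        return "scale" if gain >= threshold_gain else "kill"

    catalog = [
        Candidate("invoice_coding", business_value=4, data_readiness=5, org_readiness=3),
        Candidate("contract_review", business_value=5, data_readiness=2, org_readiness=2),
    ]
    for c in sorted(catalog, key=lambda c: c.priority(), reverse=True):
        print(c.name, round(c.priority(), 2))

    print(pilot_verdict(baseline=12.0, observed=8.0, threshold_gain=0.25))   # scale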

The pattern shows up in other disciplined AI programs. It mirrors the way successful security teams build programs: define the risk, design the smallest intervention, measure honestly, expand what works. Illumio's experience building a security product from scratch makes the same point: ambitious outcomes come from narrow, well-scoped starts.

The three-round structure costs more upfront than "pick a tool and prototype." It saves 10 times that cost by not shipping another project into the 95% bucket.

Listen to the Full Episode

Jacob Andra and Stephen Karafiath joined Jon McLachlan and Sasha Sinkevich, co-founders of YSecurity and Cyberbase.ai, on The Security Podcast of Silicon Valley to discuss why most enterprise AI projects stall and what disciplined scoping looks like in practice.

The full conversation covers the MIT 95% failure data, the APEX framework, the LLM-versus-other-AI decision, and the lessons Talbot West has learned from scoping AI engagements for aerospace, defense, and mid-market clients.


Meet the hosts

Jon McLachlan

Co-Founder, YSecurity & Cyberbase

Questions founders and engineers actually ask, with decisions, not theater.

Sasha Sinkevich

Co-Founder, YSecurity & Cyberbase

Pushes past surface answers into architecture, tradeoffs, and what scales.

The Security Podcast of Silicon Valley

jon@thesecuritypodcastofsiliconvalley.com
