
You Can't Skip the Baseline: The Hidden Risk in Your AI Data Foundation

Updated: Apr 22




The hardest part of AI strategy isn’t the technology. It’s convincing anyone to fix what’s underneath before they’ve already failed.


The real risk isn’t messy data. It’s losing your baseline.


When teams hit data problems (messy inputs, misaligned definitions, unreliable signals), the instinct is to move forward anyway. New tools. New tracking. Cleaner schemas built from scratch.


And honestly, I get it. Legacy data wasn’t designed for how we want to use AI today. Fixing it is slow, cross-functional, and nearly impossible to prioritize when leadership is pushing for outcomes.


But here’s the problem that doesn’t get talked about enough: when you move forward without addressing what’s already there, you don’t just work around the mess. You lose your baseline.


And without a baseline:

  • you can’t prove improvement

  • you can’t measure lift

  • you can’t tell if AI is working, or just different


So teams end up in a strange place. Cleaner systems. Better tools. No way to show impact.
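To make the baseline point concrete: lift is only defined relative to a "before" measurement. A minimal sketch, with entirely hypothetical numbers, of what you can and can't compute:

```python
def relative_lift(baseline_rate: float, current_rate: float) -> float:
    """Relative lift of the current metric over a preserved baseline."""
    if baseline_rate == 0:
        raise ValueError("Cannot compute lift against a zero baseline")
    return (current_rate - baseline_rate) / baseline_rate

# With a baseline, even an imperfect one, the change is quantifiable:
baseline = 0.032   # conversion rate from historical (messy) data
current = 0.041    # conversion rate after the AI rollout
print(f"Lift: {relative_lift(baseline, current):+.1%}")  # Lift: +28.1%

# Without `baseline`, `current` is just a number. Cleaner, maybe.
# Better? There is no way to say.
```

If the historical record is gone, the first argument to that function doesn't exist, and neither does the answer.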


A more realistic approach to your AI data foundation


You don’t need to fix all your historical data. Most teams won’t, and that’s okay. But you do need something to benchmark against. Here’s how I’ve seen teams navigate it:


Keep enough historical data for a directional baseline.


You don’t need everything. You need enough to establish a before. Even imperfect historical data gives you a reference point that clean, new data alone can’t provide.


Build clean signals going forward.


New tracking, cleaner schemas, and more reliable inputs are all worth building. Just don’t treat them as a replacement for the past. Treat them as the start of a better record.


Or start smaller than you think you need to.


This is the one that actually works most often. Pick one segment, one customer journey, one use case you can measure end to end. Align on the outcome, agree on what success looks like, define the signals you need to track it. Then build from there.


Not everything needs to be fixed. But something needs to be comparable.
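What "start smaller" can look like on paper: one use case, one agreed outcome, and the signals needed to track it, each with a named owner. Every name and threshold below is illustrative, not a prescription:

```python
# A hypothetical single-use-case measurement plan. The journey, outcome,
# and team names are assumptions for illustration only.
measurement_plan = {
    "use_case": "cart-abandonment email journey",  # one journey, end to end
    "outcome": "recovered checkout rate",          # the single agreed outcome
    "success": "lift vs. 90-day historical baseline",
    "signals": [
        {"name": "email_sent", "owner": "lifecycle team"},
        {"name": "email_clicked", "owner": "lifecycle team"},
        {"name": "checkout_completed", "owner": "ecommerce team"},
    ],
}

# Each signal has an owner, so "who owns this data?" is answered upfront.
for signal in measurement_plan["signals"]:
    print(f'{signal["name"]}: owned by {signal["owner"]}')
```

The point isn't the format; it's that outcome, success criterion, and signal ownership are written down before anything gets built.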


What I’ve seen in practice


In my own experience working across marketing and ecommerce, two problems come up constantly.


The first is that the same metric means different things to different teams. What counts as “engaged” in one system isn’t what counts as “engaged” in another. What one team calls a conversion, another team tracks differently. No one is wrong exactly, but no one is aligned either. And when you try to build on top of that, the cracks show up fast.


The second is that no one knows what data actually exists, or who owns it. You assume a signal is being captured somewhere. You find out it isn’t. Or it is, but three teams have three versions of it and none of them have been reconciled.


I saw this play out directly when I put together a measurement framework for a team, mapping the full funnel from email engagement through to sales outcomes, defining what to track, who owned it, and how to connect the dots across systems. Their response when they saw it was immediate: “This is exactly what we need. But we’ll never get there.”


That’s the real problem. It’s not that teams don’t understand the value of a solid data foundation. It’s that the path feels too slow, too cross-functional, and too hard to prioritize.

And without that foundation, there’s no reliable way to define a baseline, agree on whether something is working, or support the kind of hyper-personalization that AI actually requires.


Why this keeps getting skipped


Even when teams know the foundation matters, it’s still a hard sell. The work doesn’t look like progress. Aligning definitions, mapping journeys, establishing measurement, none of it demos well. It doesn’t generate quick wins. And it competes with initiatives that feel more visible and more urgent.


So it gets pushed aside. Leadership wants to move fast. Teams stay focused on execution. The foundational work sits in the middle, owned by everyone and no one.


What’s changed with AI is the consequence. These gaps used to create inefficiency. Now they create a hard stop. Personalization that doesn’t land. Automation that doesn’t scale. Insights that can’t be trusted.


One consultant I spoke with put it plainly: teams only come around after the failed attempts. The foundational work is nearly impossible to sell upfront, so most organizations end up doing it anyway, just later and at higher cost.


Closing


Use AI to speed up the foundational work itself where it can. Building a solid AI data foundation isn't glamorous. But don’t skip the part that tells you whether it’s actually working. That part isn’t a formality. It’s the only thing that lets you prove the rest was worth it.

 
 
 
