Bringing AI to an analytics team without losing rigour

My team writes SQL faster than we did a year ago, drafts documentation in minutes, and gets a running start on analysis we used to begin from a blank page. None of that is the point. The point is that the numbers we hand to the business are still right — and we can still explain how we got them. Here's how I try to keep both true at once.

There's a quiet trap in rolling AI out to a data team. The tools make everyone faster, and "faster" feels like progress, so it's tempting to declare victory and move on. But an analytics team's product isn't volume — it's trusted numbers and sound judgement. If AI raises throughput while quietly eroding the rigour underneath, you haven't improved the team. You've just industrialised the production of plausible-looking mistakes.

So when I think about AI for my team, I'm not optimising for "more AI". I'm optimising for better, faster analytics that is still trustworthy. That's a higher bar, and it changes how you introduce the tools. Below is the approach I've settled on — practical, not preachy, and very much a work in progress.

Start with a traffic-light policy, not a ban list

The first thing people want from leadership is clarity: where am I encouraged to use this, and where will I get in trouble? A long list of prohibitions kills enthusiasm and pushes usage into the shadows, where you have no visibility at all. So instead of a ban list, I use a simple traffic-light framing — green, amber, red — that anyone can recall without opening a document.

One policy people can recall from memory: green encouraged, amber needs a human check, red never.

The lines that matter most are the red ones, and they're deliberately few. Nothing touching personally identifiable information or client data gets pasted into a tool we don't control. No production credentials. And no final number reaches a stakeholder on the strength of an AI output alone — a person has to have checked it. Everything green is genuinely encouraged; I want people leaning into it. Amber is the interesting middle, and it's where the real culture work happens.

Every AI output is a draft until a human checks it

If I could install one belief into the team, it would be this: an AI output is a draft, not an answer. It's a confident, well-formatted, occasionally completely wrong first pass that still needs a human to own it. Large language models are built to produce fluent, plausible text, which means a wrong answer arrives with exactly the same confidence as a right one — there's no tell in the tone. The well-documented tendency of these systems to generate fluent but false content is precisely why verification can't be optional in an analytics setting.

The danger isn't that AI is wrong. It's that AI is wrong confidently, in clean prose, at the exact moment you're tired and the deadline is close.

Verification only sticks if it's cheap and habitual, so we try to make checking part of the workflow rather than a separate chore. A query the model wrote gets run against a known result before it's trusted. A summary of a dataset gets spot-checked against the raw rows. A number gets reconciled against a source the team already trusts. The standard I keep repeating is simple: if you couldn't defend this output to a sceptical colleague — or to an auditor — it isn't finished yet.
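To make "run it against a known result" concrete, here is a minimal sketch of that habit. It assumes a tiny fixture table whose correct answer was worked out by hand. The table, the figures, the draft query and the function names are all hypothetical stand-ins, not our production setup.

```python
import sqlite3

# Hypothetical fixture: three orders whose January total we can add up by hand.
FIXTURE_ROWS = [
    (1, "2024-01-05", 120.0),
    (2, "2024-01-17", 80.0),
    (3, "2024-02-02", 50.0),
]
KNOWN_JANUARY_REVENUE = 200.0  # 120 + 80, checked by hand

# Stand-in for a query drafted by an AI tool (assumed for illustration).
DRAFT_QUERY = """
    SELECT SUM(amount)
    FROM orders
    WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01'
"""


def run_against_fixture(query):
    """Run a single-value query against an in-memory copy of the fixture."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", FIXTURE_ROWS)
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


def matches_known_result(query, expected, tolerance=0.005):
    """True only if the query reproduces the hand-checked figure."""
    actual = run_against_fixture(query)
    return actual is not None and abs(actual - expected) <= tolerance


if __name__ == "__main__":
    print(matches_known_result(DRAFT_QUERY, KNOWN_JANUARY_REVENUE))
```

A draft that forgets the date filter returns 250.0 on this fixture and fails the check, and that is the kind of mistake the habit is meant to catch before the query touches real data.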

The rule that travels well: you can use AI to get to an answer faster. You cannot use AI to avoid understanding the answer. If you can't explain why the output is correct, you don't ship it; you go back and understand it first.

Teach two skills: prompting with context, and spotting confident-wrong

"AI fluency" gets thrown around loosely, but for a data team I think it comes down to two concrete, teachable skills — and they pull in opposite directions, which is the point.

The first is getting more out of the tool: giving it real context instead of a one-line ask. That means the schema, the business definitions, a sample of the data (synthetic where the real rows are sensitive), and the actual constraint you're working under. In my experience, most disappointing outputs are a context failure rather than a model failure. People who learn to front-load context get noticeably better drafts, and they get there faster.

The second is distrusting the tool well: developing a nose for the confident-wrong answer. The fabricated column that sounds like it belongs in the table. The join that silently changes the grain and inflates every total. The statistical claim stated with certainty and zero basis. This is the skill that protects rigour, and it's the harder one to teach — so we practise it deliberately, swapping real examples of plausible-but-wrong outputs we've each caught. Nothing builds healthy scepticism faster than watching a teammate nearly ship a beautifully formatted error.
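The grain-changing join is the easiest of these to show in a few lines. The sketch below uses a made-up two-table example (the orders and order_items tables and their values are invented for illustration) to show how joining an order-level amount to item-level rows inflates the total, and one cheap check that catches it.

```python
import sqlite3

# Shared join clause so the total and the row count use the same join.
JOIN_SQL = "FROM orders o JOIN order_items i ON i.order_id = o.order_id"


def build_demo_db():
    """Hypothetical data: order 1 has two items, order 2 has one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
        CREATE TABLE order_items (order_id INTEGER, sku TEXT);
        INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
        INSERT INTO order_items VALUES (1, 'A'), (1, 'B'), (2, 'C');
        """
    )
    return conn


def scalar(conn, sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


def true_total(conn):
    """Revenue at the correct grain: one row per order."""
    return scalar(conn, "SELECT SUM(amount) FROM orders")


def joined_total(conn):
    """Revenue after the join: order 1's amount is counted once per item."""
    return scalar(conn, f"SELECT SUM(o.amount) {JOIN_SQL}")


def join_preserves_grain(conn):
    """Cheap check: the join should return exactly one row per order."""
    base_rows = scalar(conn, "SELECT COUNT(*) FROM orders")
    joined_rows = scalar(conn, f"SELECT COUNT(*) {JOIN_SQL}")
    return joined_rows == base_rows


if __name__ == "__main__":
    conn = build_demo_db()
    print(true_total(conn), joined_total(conn), join_preserves_grain(conn))
```

On this data the true total is 150.0 and the joined total is 250.0, with nothing in the query output to flag the difference. Comparing row counts before and after the join is one simple way to notice; it is not the only one.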

Fluency that helps

Pasting schema, definitions and a data sample as context
Asking the model to show its reasoning so you can audit it
Treating every output as a draft to verify
Using AI to learn a concept faster, then applying judgement

Fluency that hurts

Copy-pasting output straight into a report unchecked
Trusting a number because it "looks about right"
Letting the tool define metrics instead of the business
Using AI to skip the understanding, not just the typing

Protect the data: know what never goes in the box

The fastest way to turn an AI rollout into an incident is a careless paste. For an analytics team this is the highest-stakes line, because we handle exactly the data that shouldn't leave the building: personal information, client records, commercially sensitive figures. The guidance here can't be a vague "be careful" — it has to be specific enough to act on under pressure.

Concretely, we don't paste personally identifiable information, customer or client data, credentials, or sensitive business figures into tools that aren't sanctioned and contractually covered for it. When someone genuinely needs the model's help on real data, the move is to work with structure and synthetic stand-ins — share the schema and a few fake rows that mirror the shape, not the live data — or use an approved, enterprise-controlled environment where the data-handling terms are clear. The mindset I try to instil: treat any external tool like a public channel until proven otherwise, and let the privacy and security of the data decide what's allowed, not convenience.
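As a rough illustration of "schema plus fake rows", the sketch below generates synthetic records from a column-to-type mapping. The schema, the column names and the value ranges are assumptions made up for the example. A generator like this mirrors the shape of a table only, and it is no substitute for a proper review of what is safe to share.

```python
import random
import string
from datetime import date, timedelta

# Hypothetical schema: column name -> simple type label.
SCHEMA = {
    "customer_id": "int",
    "email": "email",
    "signup_date": "date",
    "monthly_spend": "float",
}


def fake_value(col_type, rng):
    """Produce one made-up value of the requested type."""
    if col_type == "int":
        return rng.randint(1, 99999)
    if col_type == "float":
        return round(rng.uniform(0, 500), 2)
    if col_type == "date":
        offset = timedelta(days=rng.randint(0, 364))
        return (date(2024, 1, 1) + offset).isoformat()
    if col_type == "email":
        local = "".join(rng.choices(string.ascii_lowercase, k=8))
        return f"{local}@example.com"  # example.com is reserved for documentation
    raise ValueError(f"unsupported column type: {col_type}")


def synthetic_rows(schema, n, seed=0):
    """Build n fake rows that match the schema; no real data is read."""
    rng = random.Random(seed)
    return [
        {name: fake_value(col_type, rng) for name, col_type in schema.items()}
        for _ in range(n)
    ]


if __name__ == "__main__":
    for row in synthetic_rows(SCHEMA, 3):
        print(row)
```

The rows are generated from the schema alone, so nothing from the live table can leak into the prompt. The fixed seed keeps the output reproducible, which helps when a colleague wants to rerun the same conversation with the tool.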

A path, not a memo

You don't get a verification culture by announcing one. It comes from giving people a sequence they can actually follow, so here's roughly the order I take a team through.

1. Publish the traffic lights. Make green/amber/red unambiguous and short enough to remember. Clarity beats a forty-page policy nobody reads.
2. Make verification the default, not an extra step. Bake "is this checked?" into reviews and definitions of done, so the question gets asked every time without anyone being the bad cop.
3. Train on context and on catching confident-wrong. Run real sessions on prompting with context and on spotting plausible errors, using examples from your own work.
4. Lock down data handling. Be explicit about what never gets pasted and offer a sanctioned alternative, so people aren't choosing between productivity and safety.
5. Measure value honestly. Track where AI genuinely helped and where it created rework, and be willing to say "this one didn't pay off."

Measure value honestly — including the costs

It's easy to report AI wins and quietly skip the times it sent someone down a wrong path for an afternoon. If you only count the saves, you'll over-trust the tool and slowly lower your guard — which is the exact opposite of what an analytics team needs. So I try to keep the scoreboard honest: where did AI clearly speed us up, where did it create rework, and where did a near-miss get caught only because someone verified properly? That last column isn't a failure of the policy. It's the policy working as designed, and it's worth celebrating as loudly as the wins.

The honest accounting also keeps expectations grounded. Some tasks get dramatically faster with AI; others barely move, or get slower once you factor in the checking. Naming that out loud protects the team from the hype cycle and from each other's FOMO. We adopt the tool where it earns its place and we're relaxed about the places it doesn't.
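One lightweight way to keep a scoreboard like this is a plain log with three outcome labels and a signed hours figure. The sketch below is only an illustration: the labels, the field names and the sample entries are all invented, and a shared spreadsheet would do the same job.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical outcome labels mirroring the three scoreboard columns.
OUTCOMES = ("speed_up", "rework", "near_miss_caught")


@dataclass(frozen=True)
class Entry:
    task: str
    outcome: str
    hours_saved: float  # positive = time saved, negative = time lost


def summarise(entries):
    """Count entries per outcome and add up the signed hours."""
    for entry in entries:
        if entry.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {entry.outcome}")
    counts = Counter(entry.outcome for entry in entries)
    return {
        "counts": {outcome: counts.get(outcome, 0) for outcome in OUTCOMES},
        "net_hours_saved": sum(entry.hours_saved for entry in entries),
    }


if __name__ == "__main__":
    # Made-up entries, purely to show the shape of the log.
    log = [
        Entry("draft churn query", "speed_up", 2.0),
        Entry("cohort definition", "rework", -3.0),
        Entry("revenue join", "near_miss_caught", 0.5),
    ]
    print(summarise(log))
```

Keeping the rework entries in the same total as the wins is the design choice that matters: a net figure that can go negative is harder to spin than a list of saves.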

The goal was never "more AI"

If I do this well, the team won't talk much about AI at all. It'll just be another capable tool in the kit — used heavily where it helps, set aside where it doesn't, and never trusted blindly. The deliverables that leave my team will be faster to produce and exactly as defensible as they were before: clean numbers, clear lineage, sound judgement, a human who can stand behind every figure.

That's the whole game. Not more AI — better, faster, still-trustworthy analytics. The rigour was always the product. AI is just a faster way to get there, as long as you never let it become a faster way to be wrong.