Data privacy in the age of AI: what to keep close

AI tools are now in everyone's daily workflow, which means the question is no longer "should I use them?" but "what am I about to paste in?" Most of the privacy risk I see isn't dramatic — it's an analyst dropping a customer export into a consumer chatbot to "just clean it up." This is a grounded guide to staying useful without being careless. Not paranoid, just deliberate.

I use AI every day for real work, so I'm not here to tell you to switch it off. The opposite: the people who get the most out of these tools are usually the ones who've thought about where the data goes once it leaves their hands. A few simple habits cover almost all of it, and none of them slow you down once they're muscle memory.

Classify before you paste

The single most useful habit is a half-second mental check before anything goes into a prompt: what kind of data is this? You don't need a formal taxonomy — most data falls cleanly into one of four buckets, and the bucket tells you what's allowed where.

[Figure: the four buckets (Public: already published; Internal: team-only, low harm; Confidential: contracts, finance; PII: people, regulated) gated to three destinations (any AI tool, consumer chatbots OK; enterprise tier only, with DPA and training opt-out; redact, or run local, anonymize first). The bucket decides the destination, not your deadline.]
A simple gate: the more sensitive the data, the fewer places it's allowed to go.

Public data — anything already on the open internet — can go anywhere; there's nothing to protect. Internal data is mundane company material that would cause little harm if it leaked, but still shouldn't be fed into training corpora. Confidential covers contracts, unreleased numbers, and commercial terms. PII — names, emails, IDs, anything that identifies a real person — is the category that carries legal weight and deserves the most care. The rule of thumb is short: never send confidential or personal data to a consumer chatbot.
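If it helps to see the gate written down, here is a minimal sketch in Python. The names (`Bucket`, `Destination`, `may_send`) are hypothetical, and the mapping is one strict reading of the buckets above rather than a standard: in particular, it assumes raw PII stays local until it has been redacted and reclassified.

```python
from enum import Enum


class Bucket(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    PII = "pii"


class Destination(Enum):
    CONSUMER_CHATBOT = "consumer_chatbot"
    ENTERPRISE_TIER = "enterprise_tier"  # assumes a DPA and training opt-out
    LOCAL_MODEL = "local_model"          # data never leaves your environment


# Assumed policy: adjust to your own organisation's rules.
ALLOWED = {
    Bucket.PUBLIC: {
        Destination.CONSUMER_CHATBOT,
        Destination.ENTERPRISE_TIER,
        Destination.LOCAL_MODEL,
    },
    Bucket.INTERNAL: {Destination.ENTERPRISE_TIER, Destination.LOCAL_MODEL},
    Bucket.CONFIDENTIAL: {Destination.ENTERPRISE_TIER, Destination.LOCAL_MODEL},
    # Strict assumption: raw PII stays local; redact first, then reclassify.
    Bucket.PII: {Destination.LOCAL_MODEL},
}


def may_send(bucket: Bucket, destination: Destination) -> bool:
    """Return True if data in this bucket may go to this destination."""
    return destination in ALLOWED[bucket]
```

The point of writing it as a lookup table is that the decision is made once, calmly, and not re-argued at the moment of pasting.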

Consumer vs enterprise: it's a different contract

The free or personal tier of an AI product and its business tier can look identical in the chat window, but the terms underneath are not the same. With most consumer tiers, your inputs may be used to improve the provider's models unless you turn that off, and your controls over retention are limited. Business and enterprise tiers typically reverse those defaults: your data isn't used for training, it's governed by a data processing agreement (DPA), and you get clearer retention and deletion terms. The major providers document this split openly — it's worth reading the page for whichever tool you've standardised on, for example OpenAI's enterprise privacy notes or Anthropic's privacy centre.

The same model can be a privacy liability or a perfectly safe tool. The difference isn't the model — it's the tier you're on and the data you feed it.

If your organisation is going to use AI seriously, the cleanest move is to put everyone on a business tier with training opt-out switched on, and make the personal accounts the exception rather than the default. It removes the hardest judgement call — "is this one OK to paste?" — from the moment of pressure, which is exactly when people get it wrong.

For the most sensitive work, keep it on the machine

There's a tier above "enterprise with a DPA," and that's data that simply never leaves your environment. Open-weight models now run well on a laptop or a controlled server, and for the genuinely sensitive tasks — drafting over a confidential contract, exploring raw HR or health records — a local model means the data never touches a third party at all. It's not the right tool for everything; the very best models still live in the cloud. But for the small slice of work where the answer to "can this go to a vendor?" is a firm no, on-device is the honest answer.

Redact before you send

Most of the time you don't need the sensitive part of the data to get the help you want. If you're asking AI to write a SQL query, summarise a complaint, or draft a reply, the customer's actual name and account number add nothing — so strip them first. Replace real identifiers with placeholders (CUSTOMER_A, ACME_LTD, +971-XXX), keep the structure, and you get the same quality of answer with none of the exposure. Anonymisation is the cheapest privacy control there is, and it's almost always available if you pause to look for it.
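As a rough illustration of placeholder redaction, the sketch below swaps known names for placeholders and masks anything shaped like an email or phone number. The function name `redact` and the two patterns are my own assumptions; the regexes are deliberately simple and will miss plenty of identifiers, so treat this as a first pass before a human look, not a guarantee.

```python
import re
from typing import Mapping, Optional

# Simple patterns that catch common shapes, not every possible identifier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str, known_names: Optional[Mapping[str, str]] = None) -> str:
    """Replace known names, emails and phone numbers with placeholders."""
    # Known names first, so the same person always maps to the same placeholder.
    for name, placeholder in (known_names or {}).items():
        text = re.sub(re.escape(name), placeholder, text, flags=re.IGNORECASE)
    text = EMAIL_RE.sub("EMAIL_REDACTED", text)
    text = PHONE_RE.sub("PHONE_REDACTED", text)
    return text


complaint = (
    "Jane Doe (jane.doe@example.com, +971 50 123 4567) "
    "disputes invoice 42 from Acme Ltd."
)
safe = redact(complaint, {"Jane Doe": "CUSTOMER_A", "Acme Ltd": "ACME_LTD"})
print(safe)
```

The structure of the complaint survives, which is usually all the model needs to draft a reply or a summary.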

Common slip: The classic mistake is pasting a full spreadsheet export (orders, emails, phone numbers and all) into a consumer tool to "reformat it." You almost never needed the personal columns for the task. Delete or mask the columns you don't need before the data leaves your laptop, not after.
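One way to make that a habit is an allowlist: name the columns the task needs and drop everything else before the export goes anywhere. This is a small sketch using Python's standard `csv` module; the function name `keep_columns` and the sample column names are made up for illustration.

```python
import csv
import io


def keep_columns(csv_text: str, allowed: list[str]) -> str:
    """Return a copy of the CSV containing only the allowlisted columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not in the allowlist.
    writer = csv.DictWriter(
        out, fieldnames=allowed, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


export = (
    "order_id,email,phone,amount\n"
    "1001,a@example.com,+971501234567,250\n"
    "1002,b@example.com,+971507654321,90\n"
)
print(keep_columns(export, ["order_id", "amount"]))
```

An allowlist is the safer default than a blocklist here: a new personal column added to the export next month is dropped automatically instead of slipping through.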

It's not just etiquette — it's an obligation

When the data is about customers or employees, privacy stops being a personal preference and becomes a duty you're holding on someone else's behalf. Under GDPR-style regimes — and the UAE has its own data-protection framework — personal data has to be processed lawfully, kept to what's necessary, and not quietly shipped off to systems and regions nobody agreed to. I won't lecture on the statutes; the practical version is simpler. If you'd be uncomfortable explaining to the person why their data ended up in a particular tool, that discomfort is the signal. Treat other people's data the way you'd want yours treated, and most of the compliance follows.

Questions to ask any AI vendor

Before a tool gets near real data, a handful of questions sort the safe options from the ones to avoid. You're not looking for perfect answers — you're looking for clear ones.

  • Do you train on our data? The answer for a business tier should be a plain no, ideally as the default rather than a setting you have to switch on.
  • How long is data retained, and can we control it? Look for stated retention windows and a real deletion path.
  • Where is it processed and stored? Region matters for regulated data — "somewhere in the cloud" isn't an answer.
  • Will you sign a DPA? If a vendor won't put data terms in writing, that is the answer.
  • Who are the sub-processors? Your data is only as private as the weakest link they hand it to.
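If you review several tools, it can help to record the answers in one place so the gaps are obvious. A minimal sketch, assuming hypothetical names (`VendorAnswers`, `open_issues`) and treating any missing answer as an unclear one:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VendorAnswers:
    # None means the vendor gave no clear answer.
    trains_on_our_data: Optional[bool] = None
    retention_days: Optional[int] = None
    processing_region: Optional[str] = None
    signs_dpa: Optional[bool] = None
    subprocessors_listed: Optional[bool] = None


def open_issues(v: VendorAnswers) -> list[str]:
    """List the questions that still lack a clear, acceptable answer."""
    issues = []
    if v.trains_on_our_data is not False:
        issues.append("training: need a plain 'no'")
    if v.retention_days is None:
        issues.append("retention: no stated window")
    if not v.processing_region:
        issues.append("region: not specified")
    if v.signs_dpa is not True:
        issues.append("DPA: not confirmed in writing")
    if v.subprocessors_listed is not True:
        issues.append("sub-processors: not disclosed")
    return issues
```

An empty list does not mean the vendor is safe; it only means every question got a clear answer that you can now judge.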

The short checklist

None of this needs to be heavy. Here's the version I actually run in my head, in order, every time:

  • Classify first. Public, internal, confidential, or PII? Decide before you type.
  • Match the tool to the tier. Sensitive data goes only to enterprise tiers with training opt-out — or stays local.
  • Redact what you can. Mask names, IDs and accounts; you rarely need them for the task.
  • Check the contract, not the chat box. Confirm the DPA, retention and region before the first real upload.
  • If in doubt, keep it close. When you can't answer "where does this go?", don't send it.

That's the whole discipline. Used this way, AI is a genuine accelerator and the privacy cost is close to zero — because the sensitive material never went anywhere it shouldn't. The goal isn't to be afraid of these tools. It's to know, every time, exactly what you're handing over and to whom.

Oleksandr Tverdokhlieb
Data Analytics Manager · Dubai — building data platforms, automation and applied AI.