Rules first, code second: customer segmentation you can actually audit

Ask five people in a business what a "high-value customer" is and you'll get five answers. Now ask whose definition the monthly churn report uses. Usually nobody can say — it's buried in a spreadsheet someone built two years ago, or in code only one analyst understands. That gap, between what the business means and what the numbers actually compute, is where customer analytics quietly loses people's trust.

When I built a customer-analytics system — modelled on CRM work I delivered for a Nespresso distributor — I started from a single principle: rules first, code second. The segmentation logic should be readable by the business, owned by the business, and auditable by anyone. The code's job is just to apply it, deterministically, every month.

The model: RFMT, and what to leave out

Every customer is scored each month on independent dimensions built on an RFMT model — Recency, Frequency, Monetary, and Tenure. RFM is a classic; the Tenure dimension matters in a subscription-like consumables business because a six-month customer and a six-year customer who spend the same are not the same risk.

The more important modelling decision was what to exclude. In this business, engagement is driven by consumable purchases — the coffee, not the machine. So device and accessory sales are deliberately left out of the segmentation. A customer who buys one expensive machine and never reorders isn't "high value"; they're a churn risk wearing a big first invoice. Getting that exclusion right changes who lands in every tier.
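
To make the exclusion concrete, here is a minimal sketch of scoring one customer on the four dimensions from consumable purchases only. The Purchase type, the category labels, and the choice to measure Tenure from the first consumable order are all assumptions for illustration, not the system's actual schema.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical category label; the real product taxonomy is not given here.
CONSUMABLE = "consumable"

@dataclass(frozen=True)
class Purchase:
    customer_id: str
    day: date
    amount: float
    category: str  # e.g. "consumable", "device", "accessory"

def rfmt(purchases, as_of):
    """Return (recency_days, frequency, monetary, tenure_days) for one customer.

    Only consumable purchases on or before as_of are counted.
    Returns None when there are no such purchases.
    """
    counted = [p for p in purchases if p.category == CONSUMABLE and p.day <= as_of]
    if not counted:
        return None
    days = [p.day for p in counted]
    recency = (as_of - max(days)).days   # days since the last consumable order
    frequency = len(counted)             # number of consumable orders
    monetary = sum(p.amount for p in counted)
    tenure = (as_of - min(days)).days    # assumption: tenure starts at first consumable order
    return recency, frequency, monetary, tenure

history = [
    Purchase("c1", date(2024, 1, 10), 900.0, "device"),      # ignored by design
    Purchase("c1", date(2024, 2, 1), 40.0, "consumable"),
    Purchase("c1", date(2024, 5, 1), 60.0, "consumable"),
]
scores = rfmt(history, date(2024, 6, 30))
machine_only = rfmt(history[:1], date(2024, 6, 30))
```

In this sketch the 900.0 machine invoice never reaches the Monetary score, and a customer who only ever bought a machine gets no score at all.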

The hard part of segmentation isn't the maths. It's agreeing what the words mean — and then making the code agree too.

What it produces

From those dimensions, each customer gets a small set of decision-ready signals every month:

Value tier — Diamond, Platinum, Gold, Silver, Bronze, derived from average monthly consumption, order size and six-month order frequency, with a passive state for inactive customers.
Activity status — active vs. inactive, based on whether they purchased in the trailing 12 months.
Lifecycle events — New, Lost, and Reactivated flags, recomputed each month from cumulative purchase windows.
Churn risk — high-value customers showing the early shape of lapsing, flagged for intervention before they go quiet.
Trends — how the value-tier mix and lifecycle shifts move across rolling 12-month periods.
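
As a rough sketch of how the activity and lifecycle signals above could be derived, the function below works on a customer's purchase months alone. The 12-month activity window comes from the text; the exact definitions of New, Lost and Reactivated used here are my assumptions for illustration, and the real spec may define them differently.

```python
ACTIVE_WINDOW_MONTHS = 12  # from the text: a purchase in the trailing 12 months

def month_index(year, month):
    """Map a calendar month to a single integer so months can be subtracted."""
    return year * 12 + (month - 1)

def is_active(purchase_months, as_of):
    """True if any purchase falls in the 12 months ending at as_of (inclusive)."""
    return any(as_of - ACTIVE_WINDOW_MONTHS < m <= as_of for m in purchase_months)

def lifecycle(purchase_months, as_of):
    """Return the set of lifecycle flags for one customer in month as_of."""
    seen = {m for m in purchase_months if m <= as_of}
    flags = set()
    if not seen:
        return flags
    active_now = is_active(seen, as_of)
    active_before = is_active(seen, as_of - 1)
    if min(seen) == as_of:
        flags.add("New")            # assumed: first-ever purchase is this month
    elif active_now and not active_before:
        flags.add("Reactivated")    # assumed: active again after an inactive month
    if active_before and not active_now:
        flags.add("Lost")           # assumed: just dropped out of the activity window
    return flags

jan = month_index(2023, 1)
```

Because the flags depend only on the purchase months and the month being scored, replaying any past month gives the same answer every time.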

The churn-risk flag is the one that pays for the whole system. Catching a Gold customer drifting toward Lost — while there's still a relationship to save — is worth far more than a tidy retrospective of who already left.
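
A churn-risk rule can be very small. The sketch below is hypothetical: which tiers count as high-value, the 60-day warning threshold and the 365-day cutoff are placeholder values I chose for illustration, not the thresholds of the real system.

```python
# All three constants are illustrative assumptions, not the real spec.
HIGH_VALUE_TIERS = frozenset({"Diamond", "Platinum", "Gold"})
WARNING_DAYS = 60   # start flagging after this many days without a reorder
LOST_DAYS = 365     # beyond this the customer is treated as already inactive

def churn_risk(tier, days_since_last_order):
    """Return a readable reason if the customer is at risk, otherwise None."""
    if tier not in HIGH_VALUE_TIERS:
        return None
    if WARNING_DAYS <= days_since_last_order < LOST_DAYS:
        return (f"{tier} customer, no reorder in {days_since_last_order} days "
                f"(rule: >= {WARNING_DAYS})")
    return None

reason = churn_risk("Gold", 75)
```

Returning the reason as text, rather than a bare boolean, keeps the flag explainable to whoever has to act on it.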

Rules as data, not as code

Here's the part I care about most. The thresholds — what counts as Diamond, what window defines churn, how long inactivity lasts before "Lost" — live in a business-logic specification, not scattered through the code. That spec is the single source of truth, and it ships as documentation right next to the dashboard, so anyone can audit exactly how a number was produced.

Two things fall out of that:

It's brand-configurable. Each brand can tune its own thresholds, and the entire history recalculates deterministically from the new rules. No re-coding, no "we think it's roughly this."
It's auditable. When someone disputes a number — and they will — the answer is a documented rule, not "let me check the script." Disagreements move from the code to the definition, which is exactly where a business should argue about them.
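
One way to picture rules as data is a per-brand spec that the code only reads. The sketch below is an assumption-heavy illustration: the JSON layout, the brand names, the field names and every number are invented, and order size is left out to keep it short.

```python
import json

# Hypothetical spec. In practice this would live in a file owned by the business.
SPEC = json.loads("""
{
  "brand_a": {"tiers": [
    {"name": "Diamond",  "min_monthly_units": 200, "min_orders_6m": 5},
    {"name": "Platinum", "min_monthly_units": 120, "min_orders_6m": 4},
    {"name": "Gold",     "min_monthly_units": 60,  "min_orders_6m": 3},
    {"name": "Silver",   "min_monthly_units": 30,  "min_orders_6m": 2},
    {"name": "Bronze",   "min_monthly_units": 0,   "min_orders_6m": 0}
  ]},
  "brand_b": {"tiers": [
    {"name": "Diamond",  "min_monthly_units": 300, "min_orders_6m": 6},
    {"name": "Platinum", "min_monthly_units": 180, "min_orders_6m": 5},
    {"name": "Gold",     "min_monthly_units": 100, "min_orders_6m": 4},
    {"name": "Silver",   "min_monthly_units": 40,  "min_orders_6m": 2},
    {"name": "Bronze",   "min_monthly_units": 0,   "min_orders_6m": 0}
  ]}
}
""")

def value_tier(spec, brand, monthly_units, orders_6m, active=True):
    """Return the first tier (highest first) whose thresholds the customer meets."""
    if not active:
        return "Passive"  # the passive state for inactive customers
    tiers = spec[brand]["tiers"]
    for tier in tiers:
        if (monthly_units >= tier["min_monthly_units"]
                and orders_6m >= tier["min_orders_6m"]):
            return tier["name"]
    return tiers[-1]["name"]  # fall back to the lowest tier

tier_a = value_tier(SPEC, "brand_a", monthly_units=80, orders_6m=3)
tier_b = value_tier(SPEC, "brand_b", monthly_units=80, orders_6m=3)
```

The same customer lands in a different tier under each brand's thresholds, and the function itself never changes. A dispute about the result becomes a dispute about a number in the spec.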

If a stakeholder can't open a document and see why a customer was flagged as a "Gold churn risk," the segmentation isn't finished, no matter how good the model is. Auditability is a feature, not an afterthought.

Built for millions of rows

The architecture is a deterministic, three-stage pipeline — raw data in, monthly snapshots and a static dashboard out, no server required at runtime.

Ingest — customers, products and transactions (or a seeded synthetic generator) load into normalized SQLite tables.
Compute — parameterized, set-based SQL runs the rules; a Python orchestrator replays them month by month to build a full 12-month history in one reproducible build.
Publish — snapshot JSON exports drive a Plotly.js dashboard, and the specs render to HTML beside it.
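
The compute stage can be sketched with Python's built-in sqlite3 module: one parameterized, set-based statement, replayed once per month by a small loop. The table names, columns and the 12-calendar-month window are assumptions made for this example, not the project's actual schema.

```python
import calendar
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id TEXT, order_date TEXT, amount REAL);
CREATE TABLE snapshots (
    month_end TEXT, customer_id TEXT, orders_12m INTEGER, spend_12m REAL,
    PRIMARY KEY (month_end, customer_id)
);
""")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    ("c1", "2023-05-10", 20.0),
    ("c1", "2024-01-15", 30.0),
    ("c2", "2023-02-01", 50.0),
])

# One set-based statement; the month being scored is passed in as parameters.
SNAPSHOT_SQL = """
INSERT OR REPLACE INTO snapshots (month_end, customer_id, orders_12m, spend_12m)
SELECT :last, customer_id, COUNT(*), SUM(amount)
FROM transactions
WHERE order_date BETWEEN :first AND :last
GROUP BY customer_id
"""

def month_window(year, month, months=12):
    """Return ISO (first_day, last_day) of the window of `months` ending in year-month."""
    last = date(year, month, calendar.monthrange(year, month)[1])
    idx = year * 12 + (month - 1) - (months - 1)
    first = date(idx // 12, idx % 12 + 1, 1)
    return first.isoformat(), last.isoformat()

def replay(conn, months):
    """Rebuild the snapshot for every (year, month) in order."""
    for year, month in months:
        first, last = month_window(year, month)
        conn.execute(SNAPSHOT_SQL, {"first": first, "last": last})
    conn.commit()

replay(conn, [(2024, 1), (2024, 2), (2024, 3)])
rows = conn.execute(
    "SELECT month_end, customer_id, orders_12m, spend_12m "
    "FROM snapshots ORDER BY month_end, customer_id"
).fetchall()
```

Because each month's rows are keyed by month and customer, running replay again overwrites them with identical values instead of duplicating them.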

The key performance idea is to pre-aggregate, not query live. Rather than scanning millions of transactions on every dashboard load, the system collapses them into compact monthly per-customer snapshots. Millions of rows become query-ready summaries, so dashboard load time depends on the size of the snapshots rather than on the size of the raw transaction history. And because it's all plain set-based SQL, it's easy to read and verify.
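
On the publish side, the same idea can go one step further: the dashboard only needs counts per tier per month, not one row per customer. The sketch below shows one possible shape for such an export. The row format and the JSON layout are assumptions; the real export format is not described here.

```python
import json
from collections import Counter

def tier_mix_payload(snapshot_rows):
    """Collapse (month, customer_id, tier) rows into a small JSON document.

    The output holds one count per tier per month, so its size grows with the
    number of months and tiers, not with the number of transactions.
    """
    counts = Counter((month, tier) for month, _customer_id, tier in snapshot_rows)
    months = sorted({month for month, _tier in counts})
    tiers = sorted({tier for _month, tier in counts})
    payload = {
        "months": months,
        "series": {t: [counts.get((m, t), 0) for m in months] for t in tiers},
    }
    return json.dumps(payload, sort_keys=True)

rows = [
    ("2024-01", "c1", "Gold"),
    ("2024-01", "c2", "Silver"),
    ("2024-02", "c1", "Gold"),
    ("2024-02", "c2", "Gold"),
]
payload = json.loads(tier_mix_payload(rows))
```

A static page can load a file like this and chart it directly, with no database behind it.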

Why not a clustering model?

I get asked this a lot: why rules instead of k-means or a churn classifier? Two reasons. First, explainability is the product here. A sales lead acts on "this Gold customer hasn't reordered in 60 days," not on "cluster 4, probability 0.71." Second, the rules are the institutional knowledge: they encode how this business actually thinks about its customers. A model that's 3% sharper but that nobody trusts or can adjust is worse than a transparent rule everyone owns. There's absolutely a place for ML on top of this; it just shouldn't be the thing the business has to take on faith.

If you're building your own

Write the definitions down first. Literally write the spec before the SQL. If you can't state the rule in a sentence, you don't understand it yet.
Decide what doesn't count. Exclusions (one-off devices, internal accounts, returns) shape the segments as much as inclusions.
Make it deterministic and replayable. Same data plus same rules should always give the same history — that's what makes it auditable and what lets you change a threshold without fear.
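
The replayability point can be checked mechanically. A minimal sketch, assuming a toy build function and a single hypothetical threshold named gold_min_spend: fingerprint the full history, then confirm that the same data and rules give the same fingerprint, and that a changed rule gives a different one.

```python
import hashlib
import json

def build_history(transactions, rules):
    """Toy deterministic build: tier each customer's monthly spend by one threshold."""
    totals = {}
    for customer_id, month, amount in transactions:
        key = (month, customer_id)
        totals[key] = totals.get(key, 0.0) + amount
    return [
        {
            "month": month,
            "customer_id": customer_id,
            "tier": "Gold" if total >= rules["gold_min_spend"] else "Bronze",
        }
        for (month, customer_id), total in sorted(totals.items())  # fixed output order
    ]

def fingerprint(history):
    """Hash a canonical JSON form of the history so two builds can be compared."""
    canonical = json.dumps(history, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

tx = [("c1", "2024-01", 40.0), ("c1", "2024-01", 30.0), ("c2", "2024-01", 10.0)]
rules = {"gold_min_spend": 60}

first_build = fingerprint(build_history(tx, rules))
second_build = fingerprint(build_history(list(reversed(tx)), rules))
stricter_build = fingerprint(build_history(tx, {"gold_min_spend": 80}))
```

If the first two fingerprints ever differ, something non-deterministic has crept into the build, such as unordered output or a dependence on the current date.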

The whole system is open source and ships with a live, synthetic-data demo, so the logic can be inspected end to end without exposing any real customer data. That openness isn't incidental — it's the same principle as the spec. If the rules are the point, you should be able to read them.