11,734 buyers.
13 distinct groups.
Here's where the money lives.
Patternbank's buyers are not one audience. We've split them into thirteen named groups using twelve years of transaction data. Some are gone. Some are ours to keep. One small group is slipping away — and it's worth more than the rest combined.
The five families
Before we look at the thirteen segments individually, here's how they group. This is the picture leadership should hold in their head.
Heroes
Our base. Small group, outsized revenue. They keep coming back at full price.
Climbers
Buyers still getting to know us. Recently active, building a relationship — they could become Heroes if cultivated.
Slipping
Used to be our base. Started drifting. Not gone, just absent. The biggest revenue recovery opportunity on the platform.
Already Lost
More than half our buyer base. Already gone. Stop spending money trying to bring most of them back. Mass-touch only.
Trend Audience
Different product, different conversation. They buy seasonal Trend Reports — never a Design. Don't mix them with Design-buyer campaigns.
Slipping VIPs is the priority of priorities.
1,099 buyers who used to spend £2,894 each. They've drifted away over the past 1-5 years. Recovering even 10% of them at half their old pace adds more than three months of platform revenue per year — every year going forward. Reactivation isn't a side bet. It's the headline initiative.
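As a back-of-envelope check on that claim (the cohort size, the 10% recovery rate, and the half-pace factor are taken from the text; treating the £2,894 historic spend as the per-buyer value to recover is our reading):

```python
# Back-of-envelope value of reactivating Slipping VIPs.
slipping_vips = 1099        # cohort size (from the segmentation)
historic_spend = 2894.0     # former spend per buyer, GBP
recovery_rate = 0.10        # "even 10% of them"
pace_factor = 0.5           # "at half their old pace"

recovered = slipping_vips * recovery_rate * pace_factor * historic_spend
print(round(recovered))     # recovered spend, GBP
```

Roughly £159k per year. If that is "more than three months of platform revenue", it implies annual platform revenue somewhere below about £640k, which is worth sanity-checking against the real figure.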
How we got here.
Two parallel methods. Twelve years of data. Seven design decisions. The approach is deliberately overbuilt for a small platform — because the cost of segmenting wrong (confidently wrong campaigns) is higher than the cost of segmenting carefully.
Two tracks, one segmentation
We ran two independent analyses in parallel from the same raw transaction data. Each catches what the other misses.
Industry-standard frameworks, calibrated to Patternbank's data
We applied the buyer-segmentation rule frameworks marketers use across e-commerce — Recency, Frequency, Monetary value — and calibrated the thresholds against Patternbank's actual buyer distribution.
The output: twelve named segments plus the Trend Report carve-out. Every buyer falls into exactly one. Every segment is explainable to anyone in the business.
This is what we ship to Nina. Repeatable, refreshable, easy to act on.
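A minimal sketch of the Track A idea: band thresholds cut from quantiles of the observed buyer distribution rather than fixed industry numbers. Column names and the four-band split are illustrative, not the production rubric:

```python
import pandas as pd

def rfm_bands(tx: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Score each buyer on Recency, Frequency, Monetary value (1 = worst, 4 = best),
    with band edges calibrated from the buyers' own distribution."""
    per_buyer = tx.groupby("user_id").agg(
        last_purchase=("created_at", "max"),
        frequency=("order_id", "nunique"),
        monetary=("amount_gbp", "sum"),
    )
    per_buyer["recency_days"] = (as_of - per_buyer["last_purchase"]).dt.days
    # Quartile cuts from the data itself; low recency (recent purchase) scores high.
    per_buyer["R"] = pd.qcut(per_buyer["recency_days"], 4, labels=[4, 3, 2, 1])
    # rank(method="first") breaks ties so qcut always has distinct edges.
    per_buyer["F"] = pd.qcut(per_buyer["frequency"].rank(method="first"), 4, labels=[1, 2, 3, 4])
    per_buyer["M"] = pd.qcut(per_buyer["monetary"], 4, labels=[1, 2, 3, 4])
    return per_buyer
```

The point of the quantile cuts is that the bands stay meaningful when the distribution shifts: a refresh re-derives them rather than inheriting stale thresholds.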
Clustering algorithms — let the data speak
We used clustering algorithms (k-means, agglomerative hierarchical, density-based) to find natural groupings without imposing rules. No predefined segments. No assumptions about what the answer should look like.
The output: five commercial archetypes — Casual Shoppers, Core Shoppers, Premium Lovers, Deal Seekers, Power Shoppers.
This keeps Track A honest. Where they agree, we're catching reality. Where they disagree, we either fix the rules or learn something new.
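In sketch form, the k-means leg looks like the following (the real run also used agglomerative and density-based methods as cross-checks; the feature matrix and k=5 here simply mirror the five archetypes):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_buyers(features: np.ndarray, k: int = 5, seed: int = 0) -> np.ndarray:
    """Standardise behavioural features, then k-means into k natural groupings.
    Scaling matters: spend, frequency, and recency live on very different scales,
    and unscaled k-means would let the largest-magnitude feature dominate."""
    X = StandardScaler().fit_transform(features)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
```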
The signals we used
Seven dimensions of behaviour, each interpretable on its own. Three core (the spine of the segmentation), four overlay (filters Nina layers on top).
Where rules and clusters agree
The reconciliation matrix. Each cell shows what percentage of a Track A segment landed in each Track B cluster. Strong concentrations (60%+) mean the rule is catching real behaviour. Diffuse rows mean the segment cuts across multiple natural groupings — overlay territory.
Read the matrix: Champions are 92% in Core+Power Shoppers (they're broad-taste buyers). One-and-Done is 84% in Casual Shoppers (the rule catches them cleanly). Where Track A and Track B diverge — particularly around discount behaviour and premium buying — those signals became overlays rather than segment-defining rules.
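The matrix itself is just a row-normalised cross-tabulation of the two assignments. In pandas terms (the shape of the per-buyer assignment table is assumed):

```python
import pandas as pd

def reconciliation_matrix(assignments: pd.DataFrame) -> pd.DataFrame:
    """Rows: Track A segments. Columns: Track B clusters.
    Each cell: % of that segment's buyers landing in that cluster."""
    return (pd.crosstab(assignments["segment"], assignments["cluster"],
                        normalize="index") * 100).round(1)
```

Every row sums to 100, so a concentrated row (one cell at 60%+) means the rule and the data agree; a diffuse row flags overlay territory.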
The seven design decisions
At Phase 5, we walked through seven decisions in dependency order. Each was anchored in the cross-tab evidence. Each is documented with rationale and reversibility notes.
What we explicitly chose not to do
Discipline matters. Seven things we considered and dropped.
- Spot-check buyer sign-off — the rubric ships on statistical merit + clustering cross-validation. Tracing 3-5 named buyers wouldn't have changed any decision.
- A "premium subscription" overlay — the database flag's product meaning is unclear. Until clarified, it contributes nothing.
- Sub-splitting "Long Gone" into 5-10 years vs 10+ years — defer until a campaign requires the distinction.
- Promoting Discount-Dependency to a first-class segment — would override Champions/Loyal for buyers who happen to redeem coupons. Wrong trade.
- Adding new segments for marginal commercial distinctions — held at 13. Every additional segment is a tax on Nina's mental model.
- Per-transaction FX precision in the rubric — deferred. The 3% bound is well below band-boundary spacing.
- Designer-affinity sub-segments — left for Phase 6+. Possible future overlay if specific designer campaigns need it.
How we use it.
The segmentation is the input, not the output. The output is a calmer, sharper marketing operation: who we email, how often, what we offer them, where we route them, and crucially — who we leave alone.
Priority tiers · what to activate first
Not all segments justify equal effort. The four-tier priority is a pragmatic ranking based on cohort size × per-buyer ARPU × likely conversion lift.
The headline three
Important, secondary
Light-touch tail
Different audience
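The tier ranking described above reduces to a single score per segment. A sketch of the comparison (the cohort sizes, ARPUs, and lift figures below are placeholders for illustration, not measured values):

```python
def priority_score(cohort_size: int, arpu_gbp: float, expected_lift: float) -> float:
    """Expected incremental revenue: size x per-buyer value x conversion lift."""
    return cohort_size * arpu_gbp * expected_lift

# Hypothetical inputs, to show the shape of the trade-off: a mid-sized cohort
# with very high per-buyer value can outrank both bigger and "healthier" ones.
segments = {
    "Slipping VIPs": (1099, 2894.0, 0.05),
    "Champions":     (400, 1800.0, 0.02),
    "One-and-Done":  (3000, 45.0, 0.01),
}
ranked = sorted(segments, key=lambda s: priority_score(*segments[s]), reverse=True)
# "Slipping VIPs" ranks first under these inputs.
```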
Per-segment detail · click any segment to expand
Slice recipes · the cohorts you actually target
Where overlays earn their keep. These are the cuts where segment × overlay gives you a precision-targeted cohort that no single segment could.
Refresh cycle · how it stays alive
Re-run the SQL view. Buyers shift between segments naturally as recency moves. ~10 minutes of work plus a dashboard refresh.
Re-run Track B clustering. If the natural groupings have shifted, review whether the rules need adjustment. Not for reassignment — for honesty.
Re-derive the breadth and exclusivity thresholds from the new buyer distribution. Slow-moving signals; annual is enough.
A reactivation campaign for Slipping VIPs.
What we need to build.
The segmentation is shippable today as static SQL. To make it operationally durable — quarterly refresh, dashboard, marketing-tool integration — there's a small engineering lift. None of it requires schema changes. Most is canonical patterns reused from existing infrastructure.
What we deliver as artifacts
Source-of-truth SQL
- phase-5-rubric-v1-final.sql: CTE-based segmentation logic. Returns user_id → segment + RFM band + monetary value. ~150 lines, single MySQL query against existing tables.
- phase-3-track-a-distributions.sql: distribution queries that calibrated the band thresholds. Reproducible.
- phase-3-5-followups.sql: diagnostic queries (uncategorised bucket analysis, Design-only mean recalibration).
Reference data + analysis
- cluster-assignments-final.csv: 9,212 rows. Per-buyer cluster assignment from Track B. Pseudonymous user_id only — no PII.
- fx-rates.csv: Bank of England daily FX rates 2014-03-24 → 2026-04-17. Basis for all GBP normalisation.
- final-segmentation-v1.md: human-readable spec. Segment definitions, slice recipes, refresh cadence.
- decisions-log.md: Phase 5 design decisions with full rationale. The audit trail.
What needs to be built
Five engineering tasks, sized roughly. None require schema migrations.
- buyer_segments_current: materialised view (or scheduled table refresh) that runs the rubric SQL against live data. Output columns: user_id, segment, segment_family, monetary_band, frequency_band, recency_band, premium_tier, discount_tier, breadth_tier, engagement_tier. Indexed on user_id and segment.
- Segment dashboard on top of buyer_segments_current. Tabs: segment headcount over time, mean spend / frequency / recency by segment, overlay distribution within segments, period-over-period migration matrix, slice recipe quick-pulls.
- FX precision check against fx-rates.csv: recompute segment assignments at per-transaction precision and confirm the 3% bound holds in practice. One-off check, not ongoing.
Maintenance cadence
- Weekly: the materialised view refreshes. Buyers migrate naturally between segments as recency advances — no human intervention required.
- Quarterly: re-run the Track B clustering pipeline (Python venv, ~3 hours). Compare new cluster shapes to v1.1. Flag drift if any cluster shifts >15%.
- Annually: re-derive R/F/M band quantiles from the latest buyer distribution. Update breadth and exclusivity overlay thresholds. Document changes in the versioned doc.
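One way to implement the drift flag (a sketch: we read "shifts >15%" as a relative change in a cluster's share of the buyer base, and that interpretation is ours, not the source's):

```python
def flag_drift(prev_counts: dict, new_counts: dict, threshold: float = 0.15) -> list:
    """Flag clusters whose share of the buyer base moved by more than
    `threshold` (relative) between the previous and current clustering runs.
    Clusters that appear only in the new run are not covered by this sketch."""
    prev_total = sum(prev_counts.values())
    new_total = sum(new_counts.values())
    flagged = []
    for cluster in prev_counts:
        prev_share = prev_counts[cluster] / prev_total
        new_share = new_counts.get(cluster, 0) / new_total
        if abs(new_share - prev_share) / prev_share > threshold:
            flagged.append(cluster)
    return flagged
```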
Data dependencies
All dependencies are on existing tables. No schema changes required. Two optional improvements would unlock additional analysis: a lightboxes.created_at column (currently missing — limits behavioural analysis of saved-design lifecycle), and a per-transaction FX rate field (would close the 3% precision gap permanently).
Mailchimp integration · putting segments into campaigns
Today, Mailchimp doesn't know about Patternbank's segmentation. Building a sync pipeline lets Nina target campaigns by segment and overlay directly inside the Mailchimp UI — same workflow she uses today, just with sharper audiences. No custom marketing tools required.
Tag model
Each buyer gets one segment tag, one family tag, up to four overlay tags, plus typed merge fields for spend, frequency, last purchase quarter.
Sync pipeline
Weekly Python job reads buyer_segments_current → calls Mailchimp API → upserts tags + merge fields. Runs in <5 minutes for ~12K buyers.
Saved Segments
Mailchimp Saved Segments configured to mirror the slice recipes from Tab 3. Nina selects "Slipping VIPs × Broad Taste" from a dropdown — no SQL, no manual list export.
Measurement loop
Mailchimp campaign metrics (open, click, conversion) flow back into the analytics view. Per-segment activation rate becomes measurable; the rubric stays honest.
- segment: seg-slipping-vips · seg-champions · seg-new-starters ...
- family: fam-heroes · fam-climbers · fam-slipping · fam-lost · fam-trend
- premium_tier: prem-active · prem-lapsed · prem-mixed · prem-none
- discount_tier: disc-never · disc-occasional · disc-acq-oneshot · disc-repeat
- breadth_tier: breadth-narrow · breadth-wide · breadth-na
- engagement_tier: eng-vhigh · eng-high · eng-mid · eng-low · eng-vlow
- Merge fields: MEAN_SPEND · LIFETIME_F · LAST_Q (this-quarter · 9-20q-ago · 20q+-ago)

Tags drive audience filtering (mutually exclusive within a tag family — a buyer is in exactly one segment, one family, one premium tier, etc). Merge fields drive personalisation tokens in copy and conditional content blocks.
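A sketch of how one buyer row becomes an update payload. Mailchimp's API addresses existing list members by the MD5 hash of the lowercased email address, which is what keeps the pipeline update-only; the row keys below are illustrative, not the view's exact column names:

```python
import hashlib

def subscriber_hash(email: str) -> str:
    """Mailchimp identifies list members by the MD5 of the lowercased email."""
    return hashlib.md5(email.lower().encode("utf-8")).hexdigest()

def build_update(row: dict) -> dict:
    """Turn one buyer_segments_current row into tag + merge-field payloads:
    one segment tag, one family tag, four overlay tags, three merge fields."""
    tags = [row["segment"], row["family"], row["premium_tier"],
            row["discount_tier"], row["breadth_tier"], row["engagement_tier"]]
    return {
        "hash": subscriber_hash(row["email"]),
        "tags": [{"name": t, "status": "active"} for t in tags if t],
        "merge_fields": {
            "MEAN_SPEND": row["mean_spend"],
            "LIFETIME_F": row["lifetime_frequency"],
            "LAST_Q": row["last_purchase_quarter"],
        },
    }
```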
Privacy + consent

Email is the join key — already in Mailchimp via existing opt-ins. Tags + merge fields are metadata about already-consented contacts. The pipeline only updates contacts who already exist in Mailchimp; it never imports new contacts from the production DB. Buyers who delete their Patternbank account get tags removed in the next sync. Standard Mailchimp data residency applies (EU or US, depending on account region).
- 1 day: documentation for Nina (slice recipes → Saved Segment names)
Database improvements we'd recommend
The segmentation works on existing data. Several missing fields are limiting what we can do next — particularly around behavioural analysis. Three tiers, sized by what each unlocks.
- New timestamp columns: lightboxes.created_at, lightbox_items.created_at, follows.created_at, products.set_to_premium_at.
- Per-transaction FX rate: a column on transactions, or a date-indexed FX table.
- users.first_purchase_at: denormalised column. Today this is MIN(transactions.created_at) — requires a self-join across ~12K users. A pre-computed column makes cohort-tenure queries instant and simplifies the engagement-rate overlay calculation.
- New target_segment field on discounts.
- users.buyer_status: explicit enum.
- baskets.abandoned_at: explicit timestamp.
Recommended sequencing: Tier 1 first (lightbox + follows timestamps especially — these are paired and likely require similar migrations). Tier 2 alongside the Phase 6 Looker dashboard work. Tier 3 when convenient.
Privacy & data handling
Defensive defaults
- All persisted artifacts are aggregate-only. Segment-level counts, percentages, means. No per-buyer rows in any committed file.
- Per-buyer cluster CSV uses pseudonymous integer user_id only. No emails, names, addresses, phone numbers.
- The buyer_segments_current view stores user_id + segment tags only. No PII duplicated. Marketing tools query the view; they don't get the raw buyer table.
- The DB connection used during Phases 1-5 is read-only via a dedicated pb_readonly account, accessed via SSH tunnel only. The credentials and tunnel can be revoked in under five minutes if ever required.
Explore the segments.
Pick segments, layer overlays, simulate campaigns. Every count below comes from the v1-final rubric run on Patternbank's actual transaction history. Nothing is mocked.
All 11,734 buyers across all 13 segments. Use the filters at left to narrow the cohort and surface the structure beneath.
How the selected cohort distributes across the four overlay dimensions. Highlighted rows are over-represented (≥1.5×) compared to the universe average. Faded rows are under-represented.
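The highlighting rule is a lift calculation: an overlay value's share inside the selected cohort divided by its share in the whole buyer universe (a sketch; the 1.5× threshold comes from the text above):

```python
def overlay_lift(cohort_counts: dict, universe_counts: dict) -> dict:
    """Lift per overlay value: (share within cohort) / (share in universe).
    >= 1.5 means over-represented (highlighted); well below 1.0 means faded."""
    c_total = sum(cohort_counts.values())
    u_total = sum(universe_counts.values())
    return {
        k: (cohort_counts.get(k, 0) / c_total) / (universe_counts[k] / u_total)
        for k in universe_counts
    }
```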