Internal Methodology Reference
How KwantumLabs moves from interview transcripts to defensible market segments and audience-specific recommendations for marketing, sales, and product teams.
Steps 3 and 4 are newly added to the existing pipeline. Everything else builds on the coding infrastructure already in place.
The foundational principle: Segmentation purpose must drive variable selection — not the other way around. Before any data is collected or analyzed, identify which decisions this segmentation needs to inform and which audiences will act on the findings. The method follows the purpose (Yankelovich & Meer, 2006).
Decisions made here determine what recommendations are possible. Problems at this stage cannot be fixed in analysis.
Before writing a single interview question, identify which downstream audiences will use the segmentation findings. Each audience needs a different type of variable from the data.
Marketing: needs to know what different buyers respond to and what language they use.
Sales: needs to identify which type of buyer they are speaking with and what to say differently.
Product: needs to know which capabilities matter to different buyers and what blocks adoption.
If your interview guide only covers one audience, you can only make recommendations to one audience. The gap will be visible in the deliverable and the client will notice.
| Intended Segments (k) | Minimum N | Preferred N | Max Defining Variables |
|---|---|---|---|
| 2 segments | 40 | 60 | 4-6 |
| 3 segments | 60 | 90 | 6-9 |
| 4 segments | 80 | 120 | 8-12 |
| 5 segments | 100 | 150 | 10-15 |
Pre-specify k before data collection. Record the expected number of segments and your rationale. If statistical criteria during analysis point to a different k, that is fine — but document why you deviated. Choosing k after seeing the data to fit a narrative is circular and reduces validity.
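The sample-size table and the 10:1 ratio rule can be encoded as a pre-flight check. A minimal sketch; the thresholds are the document's, the function and dictionary names are illustrative:

```python
# Thresholds from the sample-size table; names are hypothetical.
SAMPLE_RULES = {
    2: {"min_n": 40, "preferred_n": 60, "max_vars": 6},
    3: {"min_n": 60, "preferred_n": 90, "max_vars": 9},
    4: {"min_n": 80, "preferred_n": 120, "max_vars": 12},
    5: {"min_n": 100, "preferred_n": 150, "max_vars": 15},
}

def check_sample(k: int, n: int, n_vars: int) -> list[str]:
    """Return warnings for a pre-specified k, sample size n,
    and count of defining variables."""
    rules = SAMPLE_RULES[k]
    warnings = []
    if n < rules["min_n"]:
        warnings.append(f"N={n} is below the minimum {rules['min_n']} for k={k}")
    elif n < rules["preferred_n"]:
        warnings.append(f"N={n} is below the preferred {rules['preferred_n']} for k={k}")
    if n_vars > rules["max_vars"]:
        warnings.append(f"{n_vars} defining variables exceeds {rules['max_vars']} for k={k}")
    if n_vars > 0 and n / n_vars < 10:
        warnings.append(f"only {n / n_vars:.1f} participants per variable (need >= 10)")
    return warnings
```

Run this at study design time and again before clustering; an empty list means the design clears the table above.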
Capture firmographic variables in the screener, not the interview. Screener data is cleaner, consistently formatted, and does not depend on interview recall or coding.
| Field | Why it belongs in the screener |
|---|---|
| Company size | Primary firmographic clustering variable; must be verified, not self-reported in conversation |
| Seniority level | Predicts budget authority and decision-making role |
| Budget authority | Defines who can actually make the purchase decision |
| Current tool brand(s) | Competitive context; becomes a profiling variable |
| Product adoption flag | The outcome variable. Must come from the screener — not inferred from the interview — to avoid circularity in cluster validation |
This step belongs to the existing KwantumLabs pipeline. It is not new; it is documented separately in the How to Code Transcripts reference.
Two extractors independently pull meaning units from each response. A synthesizer merges both extractions and groups meaning units into themes. A validator stress-tests the codebook for coherence and coverage.
Output: codebook.json — human-reviewed before proceeding to application.
An inclusive coder and a conservative coder independently code each participant against the codebook. An arbiter resolves disagreements. Cohen's Kappa is computed per code and overall.
Output: coded/final_codes.json — one object per participant with firmographic fields and theme arrays.
What comes out of this step: A JSON file with one object per participant. Fields include screener firmographics (company size, seniority, current tool), theme presence arrays from interview questions, and any ordinal responses captured in structured questions. This data is rich but not yet structured for cluster analysis — that is what Steps 3 and 4 do.
Going from 20-30+ coded theme variables to 5-8 composite dimensions suitable for cluster analysis. This step is required — it is not optional cleanup.
Why raw themes can't go into clustering: With many variables relative to participants, distance measures become unreliable. Variables that happen to correlate inflate the weight of whatever they share. The resulting clusters are unstable and hard to reproduce. You need at least 10 participants per clustering variable.
The loquacity bias problem: Theme codes are binary (mentioned or not). Participants who talk more will have more codes as "present" — not because they have more needs, but because they generated more text. Without aggregation, you risk clusters that separate "people who talked a lot" from "people who gave brief answers."
| Step | What you do | Rule |
|---|---|---|
| 1. Variance filter | For each binary theme code, calculate the proportion of participants coded positive | Exclude any theme below 20% or above 80% — move to profiling |
| 2. Group redundant themes | Identify themes that tend to co-occur in the same participants — they measure the same underlying construct | Give each group a dimension name representing the underlying construct, not the individual themes |
| 3. Create composite scores | For binary theme groups: dimension = 1 if any component theme is present, 0 if none. For ordinal groups: standardize then average | OR logic for binary; standardize + average for ordinal or mixed |
| 4. Check the ratio | Count total defining variables and divide N by that count | Must be ≥ 10 participants per variable. At N=70: maximum 7 variables |
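The four steps above can be sketched in code. The column names and the theme grouping below are hypothetical; the thresholds (20-80% prevalence, OR logic, 10:1 ratio) are the document's:

```python
import pandas as pd

def variance_filter(themes: pd.DataFrame, lo=0.20, hi=0.80):
    """Step 1: split binary theme codes into keep vs. move-to-profiling."""
    prevalence = themes.mean()
    keep = prevalence[(prevalence >= lo) & (prevalence <= hi)].index.tolist()
    profiling = [c for c in themes.columns if c not in keep]
    return keep, profiling

def or_composite(themes: pd.DataFrame, groups: dict[str, list[str]]) -> pd.DataFrame:
    """Step 3 for binary groups: dimension = 1 if any component theme
    is present, 0 if none (OR logic)."""
    return pd.DataFrame(
        {dim: themes[cols].max(axis=1) for dim, cols in groups.items()}
    )

# Toy example with three hypothetical themes for one composite dimension:
themes = pd.DataFrame({
    "accuracy_concern":    [1, 0, 1, 0],
    "rigor_requirement":   [0, 0, 1, 1],
    "quality_frustration": [0, 0, 0, 0],
})
dims = or_composite(themes, {"quality_orientation": list(themes.columns)})
# participants 1, 3, and 4 score positive on the composite; participant 2 does not
```

Note that the variance filter would also have flagged `quality_frustration` (0% prevalence) for the profiling bucket before grouping.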
| Dimension Name | Source | Component Themes / Fields |
|---|---|---|
| Organizational Complexity | Screener (ordinal) | Company size tier, UX maturity, stakeholder breadth |
| Decision Authority | Screener (ordinal) | Seniority level, budget authority, procurement involvement |
| Quality Orientation | Interview (composite binary) | Accuracy concern, research rigor requirement, output quality frustration, AI trust concern |
| Budget Constraint | Interview (composite binary) | Price sensitivity, small-team discount need, budget authority limitation |
| Openness to Change | Interview + screener (composite binary) | Switching motivation: capability gap, competitive context: augmenting, evaluation mode: active |
| Workflow Integration Need | Interview (composite binary) | Integration requirement, tool consolidation motivation, stakeholder sharing need |
Result: 6 dimensions for N=70 = 11.7 participants per variable, above the 10:1 minimum. The original 30 theme attributes are reduced to 6 dimensions without losing meaningful signal.
Every coded variable must be assigned to one of three buckets before any analysis begins. This classification is a deliberate decision, not a default.
Defining variables: go into the cluster analysis and determine which cluster each participant falls into. Must pass all three decision tests.
Typical examples: Company size, seniority, switching motivation, openness to change, budget constraint composite, quality orientation composite
Outcome variables: do NOT go into clustering. Used after clustering to validate that segments predict something useful. These are what you are trying to explain.
Typical examples: Product adoption, likelihood to switch, NPS, willingness to pay, current satisfaction rating
Profiling variables: do NOT go into clustering. Used after clustering to describe and communicate each segment to the client; their job is to make segments communicable.
Typical examples: Job title, industry, current tool brand, verbatim quotes, decision rules, feature priorities
Critical rule: Outcome variables must never enter clustering. Product adoption, likelihood to buy, current tool brand — if you put these into clustering, the algorithm groups people by the thing you are trying to predict. The resulting segments will tell you nothing about why buyers behave the way they do.
A variable must pass all three tests to be a defining variable. Fail any one and it goes to profiling.
| Test | Question to ask | Rule if it fails |
|---|---|---|
| 1. Variance | Does this variable vary across participants? (Binary: is it between 20-80% positive?) | Move to profiling — it describes the sample but doesn't differentiate it |
| 2. Purpose | If two participants differ on this variable, would marketing, sales, or product do anything differently for each? | Move to profiling — it is descriptive color, not a strategic differentiator |
| 3. Redundancy | Is this variable already captured by another variable in the defining set (they tend to co-occur)? | Consolidate into a composite — both measuring the same dimension inflates that dimension's weight |
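The redundancy test can be automated as a first pass before human review. For binary 0/1 variables, Pearson correlation equals the phi coefficient, so a plain `corr()` call suffices. The 0.6 cutoff below is an illustrative assumption, not a documented threshold:

```python
import pandas as pd

def redundant_pairs(df: pd.DataFrame, cutoff: float = 0.6):
    """Flag pairs of binary defining variables that co-occur strongly.
    Pearson correlation on 0/1 columns is the phi coefficient."""
    corr = df.corr()
    pairs = []
    cols = df.columns
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if abs(corr.iloc[i, j]) >= cutoff:
                pairs.append((cols[i], cols[j], round(corr.iloc[i, j], 2)))
    return pairs
```

Flagged pairs are candidates for consolidation into a composite; a researcher still decides whether they measure the same underlying construct.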
| Variable Type | Typical Bucket | Rationale |
|---|---|---|
| Company size (screener) | Defining | Strongly predicts needs and purchase behavior in B2B markets |
| Seniority level (screener) | Defining | Predicts budget authority, decision role, evaluation criteria |
| Product adoption flag (screener) | Outcome | This is what you are trying to predict — never a defining variable |
| Current tool brand (screener) | Profiling | Describes current state; is an outcome of past purchase, not a driver of future needs |
| Binary theme presence (interview) | Profiling (default) or Defining if passes all 3 tests | Too granular alone; aggregate into composite dimensions first |
| Switching motivation (interview) | Defining | Strongly predicts evaluation mode and openness to new tools |
| Satisfaction rating (interview) | Outcome | Measures current state, not an underlying buyer need |
| Feature value rating (interview) | Profiling or Defining | Defining for the 1-2 most differentiating features; profiling for the rest |
| Verbatim quotes | Profiling only | Illustrative; never quantitative |
Answer all six questions before running any cluster analysis. If you cannot answer yes to all of them, stop and resolve the issue first.
| Question | If yes | If no |
|---|---|---|
| Is N sufficient for the pre-specified k? | Proceed | Do not proceed. Report the sample size constraint to the client. Consider consolidating to a lower k that the sample can support. |
| Has the variance filter been applied to all binary theme codes? | Proceed | Apply the filter now. Any theme below 20% or above 80% prevalence must be moved to profiling before continuing. |
| Are outcome variables explicitly excluded from the defining variable set? | Proceed | Remove them now. List them in the profiling dataset for post-clustering validation. |
| Is the total defining variable count within the N/10 ratio? | Proceed | Remove the weakest differentiators until the ratio is met. Weakest = lowest variance or weakest theoretical connection to purchase behavior. |
| Do the defining variables include at least one firmographic variable? | Proceed | Add company size or seniority as a defining variable. Pure needs-based segmentation produces segments sales cannot identify without a full interview. |
| Has a human researcher reviewed the dimension groupings? | Proceed | Get a second researcher to review the groupings before running. Dimension groupings are a theoretical claim — they should not be made by one person without review. |
Choosing the right distance metric, selecting k, and understanding the simultaneous approach.
The defining variable set will contain a mix of ordinal (company size encoded as 1-3), continuous, and binary (composite dimension scores) variables. Euclidean distance assumes all variables are continuous and comparable in scale — it is incorrect for mixed types. Gower distance handles each variable type appropriately: ordinal variables by rank, binary by Dice coefficient, continuous by normalized absolute difference. It is the standard choice for mixed-type interview data.
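A minimal Gower distance sketch for a mixed defining-variable set: numeric and ordinal columns use range-normalized absolute difference, and binary columns are treated asymmetrically (a shared absence carries no information, so both-zero pairs are dropped from the average). The column names are hypothetical; for production work a dedicated implementation (e.g. the Python `gower` package or R's `daisy()`) is preferable:

```python
import numpy as np
import pandas as pd

def gower_distance(df: pd.DataFrame, binary_cols: set[str]) -> np.ndarray:
    """Pairwise Gower distance over mixed numeric/ordinal/binary columns."""
    n = len(df)
    dist = np.zeros((n, n))
    weight = np.zeros((n, n))
    for col in df.columns:
        x = df[col].to_numpy(dtype=float)
        d = np.abs(x[:, None] - x[None, :])
        if col in binary_cols:
            # asymmetric binary: both-absent pairs contribute no weight
            both_zero = (x[:, None] == 0) & (x[None, :] == 0)
            w = (~both_zero).astype(float)
        else:
            rng = x.max() - x.min()
            d = d / rng if rng > 0 else np.zeros((n, n))
            w = np.ones((n, n))
        dist += d * w
        weight += w
    out = dist / np.maximum(weight, 1e-12)
    np.fill_diagonal(out, 0.0)
    return out

# Hypothetical two-variable example: an ordinal tier and a composite binary.
df = pd.DataFrame({"size_tier": [1, 2, 3], "quality_orientation": [1, 0, 1]})
D = gower_distance(df, binary_cols={"quality_orientation"})
```

The resulting matrix is symmetric with a zero diagonal and feeds directly into any clustering routine that accepts a precomputed distance matrix.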
| Criterion type | Method | Guidance |
|---|---|---|
| Statistical | Silhouette score, BIC (if using model-based clustering), gap statistic | Run k=2 through k=6. The k with the highest silhouette score (or lowest BIC) is the statistical optimum. This is the starting point, not the final answer. |
| Practical | Substantiality check: minimum segment size | No segment should represent fewer than 8-10% of the sample. At N=70, that means no segment with fewer than 6-7 participants. If a k produces a segment below this threshold, consolidate to a lower k. |
| Interpretive | Researcher review of segment profiles | Do the segments make strategic sense? Are they meaningfully different from each other in ways that would lead to different marketing, sales, or product decisions? If two segments look almost identical, merge them. |
The 3-4 B2B expectation: Practitioners (B2B International) observe that B2B markets, after applying the substantiality filter, typically yield 3-4 actionable segments. This is an empirical observation, not a rule. But if your analysis is pointing to k=7 or k=8 at N=70, that is a signal to investigate whether you have too many defining variables or whether the distance matrix is being distorted by high dimensionality.
Sequential segmentation (split by firmographics first, then find needs-based sub-segments within each group) leaves you with roughly 20-33 participants per firmographic tier at N=70-100. Finding stable sub-segments within 20 people is not reliable.
The simultaneous approach feeds all defining variables — firmographic and needs-based together — into a single clustering run using all N observations for every grouping decision. Firmographic variables are clustering inputs, not pre-filters. The resulting segments are naturally hybrid, defined by both who a buyer is and what they need.
Before presenting segments to a client, every segment solution must pass Kotler's five criteria and a bootstrap stability test.
| Criterion | Definition | Common failure mode |
|---|---|---|
| Measurable | Size and characteristics can be quantified | Segment defined by latent attitudes with no way to measure prevalence in the broader market |
| Substantial | Large enough to warrant a distinct strategy | Segments with n<5 in a 70-person study; any segment below 8-10% of sample |
| Accessible | Can be reached through distinct marketing and sales actions | No channel or media profile; no observable identifier that sales can use without a full interview |
| Differentiable | Responds differently to the marketing mix | Two segments that share the same core pain points and the same evaluation criteria |
| Actionable | Effective programs can be designed for each segment | No clear recommendation attached to a segment for any of the three audiences |
Dolnicar, Grun & Leisch (2018) argue that bootstrap stability analysis is non-negotiable before reporting segment solutions. Without it, you cannot know whether the segments are a feature of the population or an artifact of this particular sample.
| Step | Action |
|---|---|
| 1 | Draw 200+ bootstrap resamples of the data (sampling with replacement) |
| 2 | Re-run the cluster analysis on each resample using the same k and distance metric |
| 3 | Measure how consistently the same participants cluster together across resamples (Jaccard or Rand index) |
| 4 | Report the stability index in the methodology section of the deliverable |
Stability threshold: A stability index above 0.75 is adequate. Below 0.6 means the cluster solution is unreliable — different samples would produce different segments. If stability is below threshold, consolidate to a simpler k.
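The bootstrap procedure can be sketched as a clusterwise Jaccard loop (after Hennig's method, as implemented in R's `fpc::clusterboot`). `cluster_fn` stands in for the real clustering routine and maps a distance matrix to integer labels; all names here are illustrative, and the toy demo at the bottom uses a threshold rule in place of actual clustering:

```python
import numpy as np

def _jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

def bootstrap_stability(D, cluster_fn, n_boot=200, seed=0):
    """Per-cluster mean Jaccard agreement across bootstrap resamples.
    Values above 0.75 indicate a stable cluster."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    base = cluster_fn(D)
    k = len(np.unique(base))
    scores, counts = np.zeros(k), np.zeros(k)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        labels = cluster_fn(D[np.ix_(idx, idx)])  # recluster the resample
        present = set(idx.tolist())
        for c in range(k):
            orig = {i for i in present if base[i] == c}
            if not orig:
                continue  # cluster absent from this resample
            best = max(
                _jaccard(orig, {int(idx[i]) for i in range(n) if labels[i] == c2})
                for c2 in np.unique(labels)
            )
            scores[c] += best
            counts[c] += 1
    return scores / np.maximum(counts, 1)

# Toy demo: two well-separated groups on a line; "clustering" is just a
# distance threshold to the first point, standing in for the real routine.
x = np.array([0.0, 0.2, 0.4, 0.6, 9.0, 9.2, 9.4, 9.6])
D = np.abs(x[:, None] - x[None, :])
stab = bootstrap_stability(D, lambda M: (M[0] > 5).astype(int), n_boot=50)
```

Report the per-cluster values, not just the average: one dissolving cluster can hide behind an acceptable mean.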
Each segment must have at least two observable identifiers — signals a sales rep can assess from LinkedIn, a company website, or the first five minutes of a discovery call — without needing to conduct a full research interview. If you cannot name two observable identifiers for a segment, it fails the Accessible criterion and cannot be used for sales targeting.
| Observable signal type | Where to find it |
|---|---|
| Company size | LinkedIn, public data, company website |
| Seniority and job title | LinkedIn |
| Industry and company type | LinkedIn, company website |
| Current tech stack | G2, Capterra, job postings |
| Buying signals | Recent job postings, funding announcements, company growth signals |
| Discovery call signals | Current tool pain point, whether evaluating to replace or augment |
One segmentation. Three translations. The same cluster solution gets described differently for marketing, sales, and product — each in the language and format that audience can act on.
Every segment delivered to a client should include a one-page profile structured to serve all three audiences.
| Field | Content | Audience |
|---|---|---|
| Segment name | Short, memorable name capturing the core motivation (e.g., "The Insight Purist") | All |
| Size | N in study sample, estimated % of addressable market | All |
| Firmographic fingerprint | Typical company size, seniority, industry, buying role | Sales |
| Observable identifiers | 2-3 signals visible before or in the first 5 minutes of a conversation | Sales |
| Core pain point | In their own words — use a representative verbatim quote | Marketing |
| Evaluation criteria | What a tool must do for them to consider it; what triggers elimination | Marketing + Sales |
| Top feature priority | The capability that matters most and most differentiates this segment | Product |
| Product adoption rate | % of this segment currently using the client's product (from outcome variable) | All |
| Strategic priority | High / Medium / Low — based on adoption rate, segment size, and fit with client's strategy | All |
| Comparison type | Test | When to use |
|---|---|---|
| Two proportions (segment A vs. B on a binary outcome) | Fisher's exact test | Any 2x2 comparison; preferred when cell sizes are small |
| Multiple groups on a binary outcome | Chi-square test | 3+ group comparison; use with caution if any expected cell count is below 5 |
| One proportion vs. a known benchmark | Binomial test | Comparing interview finding to a known external rate (e.g., Gong data) |
| Cluster solution predictive validity | AUC-ROC with permutation test | Does segment membership predict the outcome variable better than chance? |
| Cluster stability | Bootstrap Jaccard or Rand index | Does the same cluster solution emerge consistently across resamples? |
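The two most common proportion tests from the table map directly onto `scipy.stats`. The counts below are hypothetical, for illustration only:

```python
from scipy.stats import fisher_exact, binomtest

# Two segments compared on a binary outcome (hypothetical counts):
# Segment A: 18 of 24 adopted; Segment B: 6 of 22 adopted.
contingency = [[18, 6],   # Segment A: adopted, not adopted
               [6, 16]]   # Segment B: adopted, not adopted
odds_ratio, p_fisher = fisher_exact(contingency)

# One proportion vs. a known external benchmark of 30% adoption:
p_binom = binomtest(k=18, n=24, p=0.30).pvalue
```

Fisher's exact test is used rather than chi-square because interview-scale cell counts are small; `binomtest` requires scipy >= 1.7.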
| Variable | Bucket | Reason |
|---|---|---|
| Company size | Defining | Predicts needs and purchase behavior |
| Seniority | Defining | Predicts budget authority and decision role |
| Switching motivation (capability gap) | Defining | Predicts evaluation mode and openness to change |
| Product adoption | Outcome | What you are trying to predict — never a clustering input |
| Current tool brand | Profiling | Describes past purchase; not a driver of future needs |
| NPS / satisfaction | Outcome | Post-adoption metric; measures outcome, not underlying need |
| Individual theme codes (raw) | Profiling (default) | Too granular; aggregate into composite dimensions first |
| Composite needs dimension | Defining (if passes 3 tests) | Aggregated signal with sufficient variance and purpose link |
| Verbatim quotes | Profiling only | Illustrative; never quantitative |