Internal Methodology Reference

Segmentation Workflow

How KwantumLabs moves from interview transcripts to defensible market segments and audience-specific recommendations for marketing, sales, and product teams.

The Complete Workflow

Eight Steps from Transcripts to Recommendations

Steps 3 and 4 are newly added to the existing pipeline. Everything else builds on the coding infrastructure already in place.

1. Study Design
2. Transcript Coding
3. Dimension Reduction (new)
4. Variable Classification (new)
5. Pre-Analysis Checks
6. Cluster Analysis
7. Segment Validation
8. Recommendations

The foundational principle: Segmentation purpose must drive variable selection — not the other way around. Before any data is collected or analyzed, identify which decisions this segmentation needs to inform and which audiences will act on the findings. The method follows the purpose (Yankelovich & Meer, 2006).

Step 1: Study Design

Decisions made here determine what recommendations are possible. Problems at this stage cannot be fixed in analysis.

The Three-Audience Test

Before writing a single interview question, identify which downstream audiences will use the segmentation findings. Each audience needs a different type of variable from the data.

Marketing

Needs to know what different buyers respond to and what language they use.

  • Pain points in the participant's own words
  • Evaluation criteria and decision filters
  • What messaging resonates or triggers rejection
  • What "winning" looks like in their mind
Sales

Needs to identify which type of buyer they are speaking with and what to say differently.

  • Observable signals: current tool situation
  • Buying timeline and who else is involved
  • What triggered the evaluation
  • Switching motivation: capability gap vs. status quo
Product

Needs to know which capabilities matter to different buyers and what blocks adoption.

  • Feature value hierarchy (what must the tool do?)
  • Table-stakes requirements and disqualifiers
  • Current workarounds and friction points
  • What would accelerate purchase or adoption

If your interview guide only covers one audience, you can only make recommendations to one audience. The gap will be visible in the deliverable and the client will notice.

Sample Size Requirements

Intended Segments (k) | Minimum N | Preferred N | Max Defining Variables
2 segments | 40 | 60 | 4-6
3 segments | 60 | 90 | 6-9
4 segments | 80 | 120 | 8-12
5 segments | 100 | 150 | 10-15

Pre-specify k before data collection. Record the expected number of segments and your rationale. If statistical criteria during analysis point to a different k, that is fine — but document why you deviated. Choosing k after seeing the data to fit a narrative is circular and reduces validity.
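The table follows a regular pattern: roughly 20 participants per intended segment at minimum, 30 preferred, and between 2k and 3k defining variables. A small helper makes the lookup explicit (a sketch extrapolated from the table; the function name is ours, not part of the pipeline):

```python
def sample_size_plan(k: int) -> dict:
    """Sample-size guidance for k pre-specified segments.

    Encodes the rule-of-thumb pattern from the table above:
    minimum N = 20 per segment, preferred N = 30 per segment,
    and between 2k and 3k defining variables.
    """
    if k < 2:
        raise ValueError("segmentation needs at least 2 intended segments")
    return {
        "min_n": 20 * k,
        "preferred_n": 30 * k,
        "max_defining_vars": (2 * k, 3 * k),
    }
```

For example, `sample_size_plan(3)` reproduces the k=3 row: minimum N of 60, preferred 90, and 6-9 defining variables.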

Required Screener Fields

Capture firmographic variables in the screener, not the interview. Screener data is cleaner, consistently formatted, and does not depend on interview recall or coding.

Field | Why it belongs in the screener
Company size | Primary firmographic clustering variable; must be verified, not self-reported in conversation
Seniority level | Predicts budget authority and decision-making role
Budget authority | Defines who can actually make the purchase decision
Current tool brand(s) | Competitive context; becomes a profiling variable
Product adoption flag | The outcome variable. Must come from the screener — not inferred from the interview — to avoid circularity in cluster validation
Step 2: Coding the Transcripts

The existing KwantumLabs pipeline. This step is not new — it is documented separately in the How to Code Transcripts reference.

Discovery Phase (4 agents)

Two extractors independently pull meaning units from each response. A synthesizer merges both extractions and groups meaning units into themes. A validator stress-tests the codebook for coherence and coverage.

Output: codebook.json — human-reviewed before proceeding to application.

Application Phase (3 agents)

An inclusive coder and a conservative coder independently code each participant against the codebook. An arbiter resolves disagreements. Cohen's Kappa is computed per code and overall.

Output: coded/final_codes.json — one object per participant with firmographic fields and theme arrays.

What comes out of this step: A JSON file with one object per participant. Fields include screener firmographics (company size, seniority, current tool), theme presence arrays from interview questions, and any ordinal responses captured in structured questions. This data is rich but not yet structured for cluster analysis — that is what Steps 3 and 4 do.
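The shape of that file can be illustrated with a small sketch. Field names below are hypothetical, chosen only to show the structure; the real schema is whatever the coding pipeline emits:

```python
import json

# Hypothetical shape of ONE participant object in coded/final_codes.json.
# Field names are illustrative; the real schema is defined by the pipeline.
participant = {
    "participant_id": "P017",
    "firmographics": {            # screener-sourced, not interview-coded
        "company_size": 2,        # ordinal tier
        "seniority": 3,           # ordinal level
        "current_tool": "ToolX",  # profiling variable
        "product_adoption": True, # outcome variable
    },
    "themes": ["accuracy_concern", "integration_requirement"],  # presence codes
    "ordinal_responses": {"satisfaction": 4},
}

record = json.loads(json.dumps(participant))  # round-trips as plain JSON
```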

Step 3 (New): Dimension Reduction

Going from 20-30+ coded theme variables to 5-8 composite dimensions suitable for cluster analysis. This step is required — it is not optional cleanup.

Why raw themes can't go into clustering: With many variables relative to participants, distance measures become unreliable. Variables that happen to correlate inflate the weight of whatever they share. The resulting clusters are unstable and hard to reproduce. You need at least 10 participants per clustering variable.

The loquacity bias problem: Theme codes are binary (mentioned or not). Participants who talk more will have more codes as "present" — not because they have more needs, but because they generated more text. Without aggregation, you risk clusters that separate "people who talked a lot" from "people who gave brief answers."

The Four-Step Reduction Process

Step | What you do | Rule
1. Variance filter | For each binary theme code, calculate the proportion of participants coded positive | Exclude any theme below 20% or above 80% prevalence — move it to profiling
2. Group redundant themes | Identify themes that tend to co-occur in the same participants — they measure the same underlying construct | Give each group a dimension name representing the underlying construct, not the individual themes
3. Create composite scores | For binary theme groups: dimension = 1 if any component theme is present, 0 if none. For ordinal groups: standardize then average | OR logic for binary; standardize + average for ordinal or mixed
4. Check the ratio | Count total defining variables and divide N by that count | Must be ≥ 10 participants per variable. At N=70: maximum 7 variables
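The four steps above can be sketched in code. This is a minimal illustration of the variance filter, the OR-logic binary composite, and the 10:1 ratio check; function names and toy data are ours:

```python
def variance_filter(theme_matrix, low=0.20, high=0.80):
    """Step 1: keep binary themes whose prevalence lies in [low, high].

    theme_matrix maps theme name -> list of 0/1 values, one per participant.
    Returns (kept_for_clustering, moved_to_profiling).
    """
    kept, profiling = {}, {}
    for theme, values in theme_matrix.items():
        p = sum(values) / len(values)
        (kept if low <= p <= high else profiling)[theme] = values
    return kept, profiling

def or_composite(theme_matrix, components):
    """Step 3 (binary rule): dimension = 1 if ANY component theme is present."""
    n = len(next(iter(theme_matrix.values())))
    return [int(any(theme_matrix[t][i] for t in components)) for i in range(n)]

def ratio_ok(n_participants, n_defining_vars, min_ratio=10):
    """Step 4: require at least 10 participants per defining variable."""
    return n_participants / n_defining_vars >= min_ratio

# Toy data: 5 participants, 3 binary theme codes
themes = {
    "accuracy_concern": [1, 0, 1, 1, 0],  # 60% prevalence -> kept
    "rare_theme":       [1, 0, 0, 0, 0],  # 20% -> kept (at the boundary)
    "universal_theme":  [1, 1, 1, 1, 1],  # 100% -> profiling
}
kept, profiling = variance_filter(themes)
quality = or_composite(themes, ["accuracy_concern", "rare_theme"])
```

At N=70, `ratio_ok(70, 6)` passes and `ratio_ok(70, 8)` fails, matching the "maximum 7 variables" rule.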

Example: Maze Study Dimension Groupings

Dimension Name | Source | Component Themes / Fields
Organizational Complexity | Screener (ordinal) | Company size tier, UX maturity, stakeholder breadth
Decision Authority | Screener (ordinal) | Seniority level, budget authority, procurement involvement
Quality Orientation | Interview (composite binary) | Accuracy concern, research rigor requirement, output quality frustration, AI trust concern
Budget Constraint | Interview (composite binary) | Price sensitivity, small-team discount need, budget authority limitation
Openness to Change | Interview + screener (composite binary) | Switching motivation: capability gap; competitive context: augmenting; evaluation mode: active
Workflow Integration Need | Interview (composite binary) | Integration requirement, tool consolidation motivation, stakeholder sharing need

Result: 6 dimensions at N=70 yields 11.7 participants per variable, above the 10:1 minimum. The original ~30 coded attributes are reduced to 6 dimensions without losing meaningful signal.
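For the ordinal screener dimensions (Organizational Complexity, Decision Authority), the standardize-then-average rule can be sketched as follows; field values are toy data and names are ours:

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize one ordinal variable to mean 0, sd 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values] if s else [0.0] * len(values)

def ordinal_composite(*variables):
    """Standardize each component variable, then average the z-scores
    per participant (the rule for ordinal or mixed dimension groups)."""
    standardized = [zscore(v) for v in variables]
    return [mean(col) for col in zip(*standardized)]

# Toy "Organizational Complexity" from hypothetical screener fields
company_size_tier = [1, 2, 3, 2, 1]
ux_maturity       = [2, 2, 3, 1, 1]
stakeholder_span  = [1, 3, 3, 2, 1]
org_complexity = ordinal_composite(company_size_tier, ux_maturity, stakeholder_span)
```

Standardizing first keeps a wide-range field from dominating the composite; the participant scoring highest on every component ends up with the highest composite score.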

Step 4 (New): Variable Classification

Every coded variable must be assigned to one of three buckets before any analysis begins. This classification is a deliberate decision, not a default.

Defining Variables

Go into the cluster analysis. Determine which cluster each participant falls into. Must pass all three decision tests.

Typical examples: Company size, seniority, switching motivation, openness to change, budget constraint composite, quality orientation composite

Outcome Variables

Do NOT go into clustering. Used after clustering to validate that segments predict something useful. These are what you are trying to explain.

Typical examples: Product adoption, likelihood to switch, NPS, willingness to pay, current satisfaction rating

Profiling Variables

Do NOT go into clustering. Used after clustering to describe and communicate each segment to the client. Make segments communicable.

Typical examples: Job title, industry, current tool brand, verbatim quotes, decision rules, feature priorities

Critical rule: Outcome variables must never enter clustering. Product adoption, likelihood to buy, current tool brand — if you put these into clustering, the algorithm groups people by the thing you are trying to predict. The resulting segments will tell you nothing about why buyers behave the way they do.

The Three Decision Tests

A variable must pass all three tests to be a defining variable. Fail any one and it goes to profiling.

Test | Question to ask | Rule if it fails
1. Variance | Does this variable vary across participants? (Binary: is it between 20-80% positive?) | Move to profiling — it describes the sample but doesn't differentiate it
2. Purpose | If two participants differ on this variable, would marketing, sales, or product do anything differently for each? | Move to profiling — it is descriptive color, not a strategic differentiator
3. Redundancy | Is this variable already captured by another variable in the defining set (do they tend to co-occur)? | Consolidate into a composite — two variables measuring the same dimension inflate that dimension's weight
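The redundancy test can be checked numerically: for two binary defining variables, a high co-occurrence correlation suggests one underlying construct. A sketch using the phi coefficient (the 0.7 threshold and all names are our illustrative choices):

```python
from math import sqrt

def phi(a, b):
    """Phi coefficient between two binary variables (equivalent to
    Pearson r on 0/1 data). High |phi| flags likely redundancy."""
    n = len(a)
    n11 = sum(x and y for x, y in zip(a, b))
    n1, m1 = sum(a), sum(b)
    den = sqrt(n1 * (n - n1) * m1 * (n - m1))
    return (n * n11 - n1 * m1) / den if den else 0.0

def redundant_pairs(variables, threshold=0.7):
    """Pairs of defining variables whose |phi| exceeds the threshold:
    candidates for consolidation into a single composite dimension."""
    names = list(variables)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(phi(variables[a], variables[b])) > threshold]
```

Any flagged pair goes back through Step 3's grouping logic rather than entering clustering as two separate variables.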

Variable Type Reference

Variable Type | Typical Bucket | Rationale
Company size (screener) | Defining | Strongly predicts needs and purchase behavior in B2B markets
Seniority level (screener) | Defining | Predicts budget authority, decision role, evaluation criteria
Product adoption flag (screener) | Outcome | This is what you are trying to predict — never a defining variable
Current tool brand (screener) | Profiling | Describes current state; an outcome of past purchase, not a driver of future needs
Binary theme presence (interview) | Profiling by default; Defining if it passes all 3 tests | Too granular alone; aggregate into composite dimensions first
Switching motivation (interview) | Defining | Strongly predicts evaluation mode and openness to new tools
Satisfaction rating (interview) | Outcome | Measures current state, not an underlying buyer need
Feature value rating (interview) | Profiling or Defining | Defining for the 1-2 most differentiating features; profiling for the rest
Verbatim quotes | Profiling only | Illustrative; never quantitative
Step 5: Pre-Analysis Checks

Answer all six questions before running any cluster analysis. If you cannot answer yes to all of them, stop and resolve the issue first.

Question | If yes | If no
Is N sufficient for the pre-specified k? | Proceed | Do not proceed. Report the sample size constraint to the client. Consider consolidating to a lower k that the sample can support.
Has the variance filter been applied to all binary theme codes? | Proceed | Apply the filter now. Any theme below 20% or above 80% prevalence must be moved to profiling before continuing.
Are outcome variables explicitly excluded from the defining variable set? | Proceed | Remove them now. List them in the profiling dataset for post-clustering validation.
Is the total defining variable count within the N/10 ratio? | Proceed | Remove the weakest differentiators until the ratio is met. Weakest = lowest variance or weakest theoretical connection to purchase behavior.
Do the defining variables include at least one firmographic variable? | Proceed | Add company size or seniority as a defining variable. Pure needs-based segmentation produces segments sales cannot identify without a full interview.
Has a human researcher reviewed the dimension groupings? | Proceed | Get a second researcher to review the groupings before running. Dimension groupings are a theoretical claim — they should not be made by one person without review.
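The mechanically verifiable subset of these checks can be wired into a single gate function. This is a sketch under our naming assumptions, reusing the 20-per-segment minimum-N rule of thumb from Step 1 and the 10:1 ratio:

```python
def preflight(n, k, defining, outcomes, firmographics):
    """Return failure messages for the mechanically checkable gates
    (empty list = proceed). defining/outcomes are variable-name lists;
    firmographics are the screener-sourced variable names.
    """
    failures = []
    if n < 20 * k:  # minimum-N rule of thumb from Step 1
        failures.append(f"N={n} is below the minimum {20 * k} for k={k}")
    if len(defining) > n // 10:  # 10 participants per defining variable
        failures.append(f"{len(defining)} defining variables exceeds N/10 = {n // 10}")
    leaked = sorted(set(defining) & set(outcomes))
    if leaked:  # outcome variables must never enter clustering
        failures.append(f"outcome variables in the defining set: {leaked}")
    if not set(defining) & set(firmographics):
        failures.append("no firmographic variable in the defining set")
    return failures
```

The two review-based questions (variance filter applied, human review of groupings) stay manual; the gate only automates what can be computed from the variable lists.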
Step 6: Running the Cluster Analysis

Choosing the right distance metric, selecting k, and understanding the simultaneous approach.

Distance Metric: Always Use Gower Distance

The defining variable set will contain a mix of ordinal (company size encoded as 1-3), continuous, and binary (composite dimension scores) variables. Euclidean distance assumes all variables are continuous and comparable in scale — it is incorrect for mixed types. Gower distance handles each variable type appropriately: ordinal variables by rank, binary by Dice coefficient, continuous by normalized absolute difference. It is the standard choice for mixed-type interview data.
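A minimal per-pair Gower implementation illustrates the idea. This sketch simplifies the binary case to simple mismatch (full Gower handles asymmetric binary variables with a Dice-style treatment, plus weights and missing data); names and toy values are ours:

```python
def gower_distance(x, y, types, ranges):
    """Gower distance between two participants over mixed variables.

    types[i]  : "num" for ordinal/continuous, "bin" for binary
    ranges[i] : observed range of variable i (used for "num" only)

    Simplification: binary variables are scored by simple mismatch;
    full Gower treats asymmetric binary variables with a Dice-style
    coefficient and supports per-variable weights and missing data.
    """
    total = 0.0
    for i, t in enumerate(types):
        if t == "num":
            total += abs(x[i] - y[i]) / ranges[i] if ranges[i] else 0.0
        else:
            total += 0.0 if x[i] == y[i] else 1.0
    return total / len(types)

# Two hypothetical participants: (company size tier 1-3, quality 0/1, budget 0/1)
a, b = (1, 1, 0), (3, 1, 1)
d = gower_distance(a, b, types=("num", "bin", "bin"), ranges=(2, 1, 1))
```

In practice a library implementation would compute the full pairwise matrix (e.g. `daisy` in R's cluster package, or a Python `gower` package); the sketch only shows the averaging of per-variable dissimilarities that makes mixed types comparable.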

Selecting k: Three Criteria

Criterion type | Method | Guidance
Statistical | Silhouette score, BIC (if using model-based clustering), gap statistic | Run k=2 through k=6. The k with the highest silhouette score (or lowest BIC) is the statistical optimum. This is the starting point, not the final answer.
Practical | Substantiality check: minimum segment size | No segment should represent fewer than 8-10% of the sample. At N=70, that means no segment with fewer than 6-7 participants. If a k produces a segment below this threshold, consolidate to a lower k.
Interpretive | Researcher review of segment profiles | Do the segments make strategic sense? Are they meaningfully different from each other in ways that would lead to different marketing, sales, or product decisions? If two segments look almost identical, merge them.

The 3-4 B2B expectation: Practitioners (B2B International) observe that B2B markets, after applying the substantiality filter, typically yield 3-4 actionable segments. This is an empirical observation, not a rule. But if your analysis is pointing to k=7 or k=8 at N=70, that is a signal to investigate whether you have too many defining variables or whether the distance matrix is being distorted by high dimensionality.
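The practical criterion is easy to automate. A sketch of the substantiality check, using the lenient 8% end of the floor (names are ours):

```python
def substantial(labels, min_share=0.08):
    """True if every segment holds at least min_share of the sample
    (the 8-10% substantiality floor; 0.08 is the lenient end).
    labels: one cluster label per participant."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return all(c / n >= min_share for c in counts.values())
```

At N=70, a 10-person segment (14%) passes while a 5-person segment (7%) fails, signalling consolidation to a lower k.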

Why Simultaneous (Not Sequential) at Our Scale

Sequential segmentation (split by firmographics first, then find needs-based sub-segments within each group) leaves you with roughly 20-33 participants per firmographic tier at N=70-100. Finding stable sub-segments within 20 people is not reliable.

The simultaneous approach feeds all defining variables — firmographic and needs-based together — into a single clustering run using all N observations for every grouping decision. Firmographic variables are clustering inputs, not pre-filters. The resulting segments are naturally hybrid, defined by both who a buyer is and what they need.

Step 7: Evaluating Segment Quality

Before presenting segments to a client, every segment solution must pass Kotler's five criteria and a bootstrap stability test.

Kotler's Five Criteria

Criterion | Definition | Common failure mode
Measurable | Size and characteristics can be quantified | Segment defined by latent attitudes with no way to measure prevalence in the broader market
Substantial | Large enough to warrant a distinct strategy | Segments with n<5 in a 70-person study; any segment below 8-10% of sample
Accessible | Can be reached through distinct marketing and sales actions | No channel or media profile; no observable identifier that sales can use without a full interview
Differentiable | Responds differently to the marketing mix | Two segments that share the same core pain points and the same evaluation criteria
Actionable | Effective programs can be designed for each segment | No clear recommendation attached to a segment for any of the three audiences

Bootstrap Stability Testing

Dolnicar, Grün & Leisch (2018) argue that bootstrap stability analysis is non-negotiable before reporting segment solutions. Without it, you cannot know whether the segments are a feature of the population or an artifact of this particular sample.

Step | Action
1 | Draw 200+ bootstrap resamples of the data (sampling with replacement)
2 | Re-run the cluster analysis on each resample using the same k and distance metric
3 | Measure how consistently the same participants cluster together across resamples (Jaccard or Rand index)
4 | Report the stability index in the methodology section of the deliverable

Stability threshold: A stability index above 0.75 is adequate. Below 0.6 means the cluster solution is unreliable — different samples would produce different segments. If stability is below threshold, consolidate to a simpler k.
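The whole procedure can be sketched with a pluggable clustering function. Here `cluster_fn` stands in for the real Gower-plus-clustering run, and clusters are matched across resamples by best-overlap Jaccard (a sketch, not the production pipeline):

```python
import random

def jaccard(set_a, set_b):
    """Jaccard similarity between two sets of participant indices."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

def bootstrap_stability(data, cluster_fn, k, n_boot=200, seed=0):
    """Mean best-match Jaccard between the original clusters and the
    clusters found on bootstrap resamples. cluster_fn(rows) -> one
    label in range(k) per row. Above 0.75 is adequate; below 0.6 the
    solution is unreliable."""
    rng = random.Random(seed)
    base = cluster_fn(data)
    base_sets = [{i for i, lab in enumerate(base) if lab == c} for c in range(k)]
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(data)) for _ in data]  # resample with replacement
        labels = cluster_fn([data[i] for i in idx])
        boot_sets = [{idx[j] for j, lab in enumerate(labels) if lab == c}
                     for c in range(k)]
        # match each original cluster to its best-overlapping bootstrap cluster
        scores.append(sum(max(jaccard(b, s) for s in boot_sets)
                          for b in base_sets) / k)
    return sum(scores) / n_boot
```

A real run would pass the coded participant rows and a closure around the Gower matrix and clustering call; the returned mean is the stability index reported in the deliverable. Note that resampling with replacement caps the index below 1.0 even for perfectly separated data, since each resample omits some participants.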

The Identifiability Requirement

Each segment must have at least two observable identifiers — signals a sales rep can assess from LinkedIn, a company website, or the first five minutes of a discovery call — without needing to conduct a full research interview. If you cannot name two observable identifiers for a segment, it fails the Accessible criterion and cannot be used for sales targeting.

Observable signal type | Where to find it
Company size | LinkedIn, public data, company website
Seniority and job title | LinkedIn
Industry and company type | LinkedIn, company website
Current tech stack | G2, Capterra, job postings
Buying signals | Recent job postings, funding announcements, company growth signals
Discovery call signals | Current tool pain point; whether evaluating to replace or augment
Step 8: Reporting to Three Audiences

One segmentation. Three translations. The same cluster solution gets described differently for marketing, sales, and product — each in the language and format that audience can act on.

For Marketing
  • The pain point in the segment's own words (use actual quotes)
  • What they are trying to accomplish (job-to-be-done framing)
  • What messaging resonates vs. triggers rejection
  • Which channels and content formats reach this segment
  • The one or two proof points that matter most to them
For Sales
  • The 2-3 observable identifiers (firmographics + one behavioral signal)
  • What they say on a discovery call that reveals their segment
  • What to lead with in the pitch
  • Which objections to expect and how to handle them
  • When to pursue vs. deprioritize based on segment fit
For Product
  • Top 2-3 capability requirements (what the tool must do)
  • Table-stakes blockers (what disqualifies a tool immediately)
  • What would move this segment from consideration to purchase
  • What roadmap investment would increase adoption for this segment
  • What they currently work around — friction the product can remove

Segment Profile Template

Every segment delivered to a client should include a one-page profile structured to serve all three audiences.

Field | Content | Audience
Segment name | Short, memorable name capturing the core motivation (e.g., "The Insight Purist") | All
Size | N in study sample; estimated % of addressable market | All
Firmographic fingerprint | Typical company size, seniority, industry, buying role | Sales
Observable identifiers | 2-3 signals visible before or in the first 5 minutes of a conversation | Sales
Core pain point | In their own words — use a representative verbatim quote | Marketing
Evaluation criteria | What a tool must do for them to consider it; what triggers elimination | Marketing + Sales
Top feature priority | The capability that matters most and most differentiates this segment | Product
Product adoption rate | % of this segment currently using the client's product (from the outcome variable) | All
Strategic priority | High / Medium / Low — based on adoption rate, segment size, and fit with client's strategy | All
Quick Reference

Pre-Analysis Checklist and Decision Tables

Before You Run a Segmentation — 10-Point Checklist

Statistical Test Reference

Comparison type | Test | When to use
Two proportions (segment A vs. B on a binary outcome) | Fisher's exact test | Any 2x2 comparison; preferred when cell sizes are small
Multiple groups on a binary outcome | Chi-square test | 3+ group comparison; use with caution if any expected cell count is below 5
One proportion vs. a known benchmark | Binomial test | Comparing an interview finding to a known external rate (e.g., Gong data)
Cluster solution predictive validity | AUC-ROC with permutation test | Does segment membership predict the outcome variable better than chance?
Cluster stability | Bootstrap Jaccard or Rand index | Does the same cluster solution emerge consistently across resamples?
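Fisher's exact test for a 2x2 table needs only the standard library, via the hypergeometric distribution. A sketch of the common two-sided version (summing all tables with the same margins that are no more probable than the observed one); the example counts are hypothetical:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]],
    e.g. adopters/non-adopters in segment A vs. segment B."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_table(x):
        # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (p_table(x) for x in range(lo, hi + 1))
               if p <= p_obs + 1e-12)

# e.g. adoption 8/20 in segment A vs. 2/20 in segment B (hypothetical counts)
p = fisher_exact_2x2(8, 12, 2, 18)
```

Because the test is exact, it is safe at the small per-segment cell counts typical of a 70-person study, where chi-square expected-count assumptions break down.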

Variable Classification at a Glance

Variable | Bucket | Reason
Company size | Defining | Predicts needs and purchase behavior
Seniority | Defining | Predicts budget authority and decision role
Switching motivation (capability gap) | Defining | Predicts evaluation mode and openness to change
Product adoption | Outcome | What you are trying to predict — never a clustering input
Current tool brand | Profiling | Describes past purchase; not a driver of future needs
NPS / satisfaction | Outcome | Post-adoption metric; measures outcome, not underlying need
Individual theme codes (raw) | Profiling (default) | Too granular; aggregate into composite dimensions first
Composite needs dimension | Defining (if it passes all 3 tests) | Aggregated signal with sufficient variance and purpose link
Verbatim quotes | Profiling only | Illustrative; never quantitative