Internal Methodology Reference
How KwantumLabs moves from interview transcripts to defensible market segments and audience-specific recommendations for marketing, sales, and product teams.
Steps 3 and 4 are newly added to the existing pipeline. Everything else builds on the coding infrastructure already in place.
The foundational principle: Segmentation purpose must drive variable selection — not the other way around. Before any data is collected or analyzed, identify which decisions this segmentation needs to inform and which audiences will act on the findings. The method follows the purpose (Yankelovich & Meer, 2006).
Decisions made here determine what recommendations are possible. Problems at this stage cannot be fixed in analysis.
Before writing a single interview question, identify which downstream audiences will use the segmentation findings. Each audience needs a different type of variable from the data.
Marketing: needs to know what different buyers respond to and what language they use.
Sales: needs to identify which type of buyer they are speaking with and what to say differently.
Product: needs to know which capabilities matter to different buyers and what blocks adoption.
If your interview guide only covers one audience, you can only make recommendations to one audience. The gap will be visible in the deliverable and the client will notice.
| Intended Segments (k) | Minimum N | Preferred N | Max Defining Variables |
|---|---|---|---|
| 2 segments | 40 | 60 | 4-6 |
| 3 segments | 60 | 90 | 6-9 |
| 4 segments | 80 | 120 | 8-12 |
| 5 segments | 100 | 150 | 10-15 |
Pre-specify k before data collection. Record the expected number of segments and your rationale. If statistical criteria during analysis point to a different k, that is fine — but document why you deviated. Choosing k after seeing the data to fit a narrative is circular and reduces validity.
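The sample-size table and the 10:1 ratio rule can be encoded as a pre-flight check. A minimal sketch; the thresholds are the document's, the function and dictionary names are illustrative:

```python
# Thresholds from the sample-size table; names are hypothetical.
SAMPLE_RULES = {
    2: {"min_n": 40, "preferred_n": 60, "max_vars": 6},
    3: {"min_n": 60, "preferred_n": 90, "max_vars": 9},
    4: {"min_n": 80, "preferred_n": 120, "max_vars": 12},
    5: {"min_n": 100, "preferred_n": 150, "max_vars": 15},
}

def check_sample(k: int, n: int, n_vars: int) -> list[str]:
    """Return warnings for a pre-specified k, sample size n,
    and count of defining variables."""
    rules = SAMPLE_RULES[k]
    warnings = []
    if n < rules["min_n"]:
        warnings.append(f"N={n} is below the minimum {rules['min_n']} for k={k}")
    elif n < rules["preferred_n"]:
        warnings.append(f"N={n} is below the preferred {rules['preferred_n']} for k={k}")
    if n_vars > rules["max_vars"]:
        warnings.append(f"{n_vars} defining variables exceeds {rules['max_vars']} for k={k}")
    if n_vars > 0 and n / n_vars < 10:
        warnings.append(f"only {n / n_vars:.1f} participants per variable (need >= 10)")
    return warnings
```

Run this at study design time and again before clustering; an empty list means the design clears the table above.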
Capture firmographic variables in the screener, not the interview. Screener data is cleaner, consistently formatted, and does not depend on interview recall or coding.
| Field | Why it belongs in the screener |
|---|---|
| Company size | Primary firmographic clustering variable; must be verified, not self-reported in conversation |
| Seniority level | Predicts budget authority and decision-making role |
| Budget authority | Defines who can actually make the purchase decision |
| Current tool brand(s) | Competitive context; becomes a profiling variable |
| Product adoption flag | The outcome variable. Must come from the screener — not inferred from the interview — to avoid circularity in cluster validation |
This step belongs to the existing KwantumLabs pipeline. It is not new; it is documented separately in the How to Code Transcripts reference.
Two extractors independently pull meaning units from each response. A synthesizer merges both extractions and groups meaning units into themes. A validator stress-tests the codebook for coherence and coverage.
Output: codebook.json — human-reviewed before proceeding to application.
An inclusive coder and a conservative coder independently code each participant against the codebook. An arbiter resolves disagreements. Cohen's Kappa is computed per code and overall.
Output: coded/final_codes.json — one object per participant with firmographic fields and theme arrays.
What comes out of this step: A JSON file with one object per participant. Fields include screener firmographics (company size, seniority, current tool), theme presence arrays from interview questions, and any ordinal responses captured in structured questions. This data is rich but not yet structured for cluster analysis — that is what Steps 3 and 4 do.
Going from 20-30+ coded theme variables to 5-8 composite dimensions suitable for cluster analysis. This step is required — it is not optional cleanup.
Why raw themes can't go into clustering: With many variables relative to participants, distance measures become unreliable. Variables that happen to correlate inflate the weight of whatever they share. The resulting clusters are unstable and hard to reproduce. You need at least 10 participants per clustering variable.
The loquacity bias problem: Theme codes are binary (mentioned or not). Participants who talk more will have more codes as "present" — not because they have more needs, but because they generated more text. Without aggregation, you risk clusters that separate "people who talked a lot" from "people who gave brief answers."
| Step | What you do | Rule |
|---|---|---|
| 1. Variance filter | For each binary theme code, calculate the proportion of participants coded positive | Exclude any theme below 20% or above 80% — move to profiling |
| 2. Group redundant themes | Identify themes that tend to co-occur in the same participants — they measure the same underlying construct | Give each group a dimension name representing the underlying construct, not the individual themes |
| 3. Create composite scores | For binary theme groups: dimension = 1 if any component theme is present, 0 if none. For ordinal groups: standardize then average | OR logic for binary; standardize + average for ordinal or mixed |
| 4. Check the ratio | Count total defining variables and divide N by that count | Must be ≥ 10 participants per variable. At N=70: maximum 7 variables |
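The four steps above can be sketched in code. The column names and the theme grouping below are hypothetical; the thresholds (20-80% prevalence, OR logic, 10:1 ratio) are the document's:

```python
import pandas as pd

def variance_filter(themes: pd.DataFrame, lo=0.20, hi=0.80):
    """Step 1: split binary theme codes into keep vs. move-to-profiling."""
    prevalence = themes.mean()
    keep = prevalence[(prevalence >= lo) & (prevalence <= hi)].index.tolist()
    profiling = [c for c in themes.columns if c not in keep]
    return keep, profiling

def or_composite(themes: pd.DataFrame, groups: dict[str, list[str]]) -> pd.DataFrame:
    """Step 3 for binary groups: dimension = 1 if any component theme
    is present, 0 if none (OR logic)."""
    return pd.DataFrame(
        {dim: themes[cols].max(axis=1) for dim, cols in groups.items()}
    )

# Toy example with three hypothetical themes for one composite dimension:
themes = pd.DataFrame({
    "accuracy_concern":    [1, 0, 1, 0],
    "rigor_requirement":   [0, 0, 1, 1],
    "quality_frustration": [0, 0, 0, 0],
})
dims = or_composite(themes, {"quality_orientation": list(themes.columns)})
# participants 1, 3, and 4 score positive on the composite; participant 2 does not
```

Note that the variance filter would also have flagged `quality_frustration` (0% prevalence) for the profiling bucket before grouping.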
| Dimension Name | Source | Component Themes / Fields |
|---|---|---|
| Organizational Complexity | Screener (ordinal) | Company size tier, UX maturity, stakeholder breadth |
| Decision Authority | Screener (ordinal) | Seniority level, budget authority, procurement involvement |
| Quality Orientation | Interview (composite binary) | Accuracy concern, research rigor requirement, output quality frustration, AI trust concern |
| Budget Constraint | Interview (composite binary) | Price sensitivity, small-team discount need, budget authority limitation |
| Openness to Change | Interview + screener (composite binary) | Switching motivation: capability gap, competitive context: augmenting, evaluation mode: active |
| Workflow Integration Need | Interview (composite binary) | Integration requirement, tool consolidation motivation, stakeholder sharing need |
Result: 6 dimensions for N=70 = 11.7 participants per variable, above the 10:1 minimum. The original 30 theme attributes are reduced to 6 dimensions without losing meaningful signal.
Every coded variable must be assigned to one of three buckets before any analysis begins. This classification is a deliberate decision, not a default.
Defining variables: go into the cluster analysis and determine which cluster each participant falls into. Must pass all three decision tests.
Typical examples: Company size, seniority, switching motivation, openness to change, budget constraint composite, quality orientation composite
Outcome variables: do NOT go into clustering. Used after clustering to validate that segments predict something useful. These are what you are trying to explain.
Typical examples: Product adoption, likelihood to switch, NPS, willingness to pay, current satisfaction rating
Profiling variables: do NOT go into clustering. Used after clustering to describe and communicate each segment to the client; their job is to make segments communicable.
Typical examples: Job title, industry, current tool brand, verbatim quotes, decision rules, feature priorities
Critical rule: Outcome variables must never enter clustering. Product adoption, likelihood to buy, current tool brand — if you put these into clustering, the algorithm groups people by the thing you are trying to predict. The resulting segments will tell you nothing about why buyers behave the way they do.
A variable must pass all three tests to be a defining variable. Fail any one and it goes to profiling.
| Test | Question to ask | Rule if it fails |
|---|---|---|
| 1. Variance | Does this variable vary across participants? (Binary: is it between 20-80% positive?) | Move to profiling — it describes the sample but doesn't differentiate it |
| 2. Purpose | If two participants differ on this variable, would marketing, sales, or product do anything differently for each? | Move to profiling — it is descriptive color, not a strategic differentiator |
| 3. Redundancy | Is this variable already captured by another variable in the defining set (they tend to co-occur)? | Consolidate into a composite — both measuring the same dimension inflates that dimension's weight |
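The redundancy test can be automated as a first pass before human review. For binary 0/1 variables, Pearson correlation equals the phi coefficient, so a plain `corr()` call suffices. The 0.6 cutoff below is an illustrative assumption, not a documented threshold:

```python
import pandas as pd

def redundant_pairs(df: pd.DataFrame, cutoff: float = 0.6):
    """Flag pairs of binary defining variables that co-occur strongly.
    Pearson correlation on 0/1 columns is the phi coefficient."""
    corr = df.corr()
    pairs = []
    cols = df.columns
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if abs(corr.iloc[i, j]) >= cutoff:
                pairs.append((cols[i], cols[j], round(corr.iloc[i, j], 2)))
    return pairs
```

Flagged pairs are candidates for consolidation into a composite; a researcher still decides whether they measure the same underlying construct.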
| Variable Type | Typical Bucket | Rationale |
|---|---|---|
| Company size (screener) | Defining | Strongly predicts needs and purchase behavior in B2B markets |
| Seniority level (screener) | Defining | Predicts budget authority, decision role, evaluation criteria |
| Product adoption flag (screener) | Outcome | This is what you are trying to predict — never a defining variable |
| Current tool brand (screener) | Profiling | Describes current state; is an outcome of past purchase, not a driver of future needs |
| Binary theme presence (interview) | Profiling (default) or Defining if passes all 3 tests | Too granular alone; aggregate into composite dimensions first |
| Switching motivation (interview) | Defining | Strongly predicts evaluation mode and openness to new tools |
| Satisfaction rating (interview) | Outcome | Measures current state, not an underlying buyer need |
| Feature value rating (interview) | Profiling or Defining | Defining for the 1-2 most differentiating features; profiling for the rest |
| Verbatim quotes | Profiling only | Illustrative; never quantitative |
Answer all six questions before running any cluster analysis. If you cannot answer yes to all of them, stop and resolve the issue first.
| Question | If yes | If no |
|---|---|---|
| Is N sufficient for the pre-specified k? | Proceed | Do not proceed. Report the sample size constraint to the client. Consider consolidating to a lower k that the sample can support. |
| Has the variance filter been applied to all binary theme codes? | Proceed | Apply the filter now. Any theme below 20% or above 80% prevalence must be moved to profiling before continuing. |
| Are outcome variables explicitly excluded from the defining variable set? | Proceed | Remove them now. List them in the profiling dataset for post-clustering validation. |
| Is the total defining variable count within the N/10 ratio? | Proceed | Remove the weakest differentiators until the ratio is met. Weakest = lowest variance or weakest theoretical connection to purchase behavior. |
| Do the defining variables include at least one firmographic variable? | Proceed | Add company size or seniority as a defining variable. Pure needs-based segmentation produces segments sales cannot identify without a full interview. |
| Has a human researcher reviewed the dimension groupings? | Proceed | Get a second researcher to review the groupings before running. Dimension groupings are a theoretical claim — they should not be made by one person without review. |
Choosing the right distance metric, selecting k, and understanding the simultaneous approach.
The defining variable set will contain a mix of ordinal (company size encoded as 1-3), continuous, and binary (composite dimension scores) variables. Euclidean distance assumes all variables are continuous and comparable in scale — it is incorrect for mixed types. Gower distance handles each variable type appropriately: ordinal variables by rank, binary by Dice coefficient, continuous by normalized absolute difference. It is the standard choice for mixed-type interview data.
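A minimal Gower distance sketch for a mixed defining-variable set: numeric and ordinal columns use range-normalized absolute difference, and binary columns are treated asymmetrically (a shared absence carries no information, so both-zero pairs are dropped from the average). The column names are hypothetical; for production work a dedicated implementation (e.g. the Python `gower` package or R's `daisy()`) is preferable:

```python
import numpy as np
import pandas as pd

def gower_distance(df: pd.DataFrame, binary_cols: set[str]) -> np.ndarray:
    """Pairwise Gower distance over mixed numeric/ordinal/binary columns."""
    n = len(df)
    dist = np.zeros((n, n))
    weight = np.zeros((n, n))
    for col in df.columns:
        x = df[col].to_numpy(dtype=float)
        d = np.abs(x[:, None] - x[None, :])
        if col in binary_cols:
            # asymmetric binary: both-absent pairs contribute no weight
            both_zero = (x[:, None] == 0) & (x[None, :] == 0)
            w = (~both_zero).astype(float)
        else:
            rng = x.max() - x.min()
            d = d / rng if rng > 0 else np.zeros((n, n))
            w = np.ones((n, n))
        dist += d * w
        weight += w
    out = dist / np.maximum(weight, 1e-12)
    np.fill_diagonal(out, 0.0)
    return out

# Hypothetical two-variable example: an ordinal tier and a composite binary.
df = pd.DataFrame({"size_tier": [1, 2, 3], "quality_orientation": [1, 0, 1]})
D = gower_distance(df, binary_cols={"quality_orientation"})
```

The resulting matrix is symmetric with a zero diagonal and feeds directly into any clustering routine that accepts a precomputed distance matrix.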
| Criterion type | Method | Guidance |
|---|---|---|
| Statistical | Silhouette score, BIC (if using model-based clustering), gap statistic | Run k=2 through k=6. The k with the highest silhouette score (or lowest BIC) is the statistical optimum. This is the starting point, not the final answer. |
| Practical | Substantiality check: minimum segment size | No segment should represent fewer than 8-10% of the sample. At N=70, that means no segment with fewer than 6-7 participants. If a k produces a segment below this threshold, consolidate to a lower k. |
| Interpretive | Researcher review of segment profiles | Do the segments make strategic sense? Are they meaningfully different from each other in ways that would lead to different marketing, sales, or product decisions? If two segments look almost identical, merge them. |
The 3-4 B2B expectation: Practitioners (B2B International) observe that B2B markets, after applying the substantiality filter, typically yield 3-4 actionable segments. This is an empirical observation, not a rule. But if your analysis is pointing to k=7 or k=8 at N=70, that is a signal to investigate whether you have too many defining variables or whether the distance matrix is being distorted by high dimensionality.
Sequential segmentation (split by firmographics first, then find needs-based sub-segments within each group) leaves you with roughly 20-33 participants per firmographic tier at N=70-100. Finding stable sub-segments within 20 people is not reliable.
The simultaneous approach feeds all defining variables — firmographic and needs-based together — into a single clustering run using all N observations for every grouping decision. Firmographic variables are clustering inputs, not pre-filters. The resulting segments are naturally hybrid, defined by both who a buyer is and what they need.
Before presenting segments to a client, every segment solution must pass Kotler's five criteria and a bootstrap stability test.
| Criterion | Definition | Common failure mode |
|---|---|---|
| Measurable | Size and characteristics can be quantified | Segment defined by latent attitudes with no way to measure prevalence in the broader market |
| Substantial | Large enough to warrant a distinct strategy | Segments with n<5 in a 70-person study; any segment below 8-10% of sample |
| Accessible | Can be reached through distinct marketing and sales actions | No channel or media profile; no observable identifier that sales can use without a full interview |
| Differentiable | Responds differently to the marketing mix | Two segments that share the same core pain points and the same evaluation criteria |
| Actionable | Effective programs can be designed for each segment | No clear recommendation attached to a segment for any of the three audiences |
Dolnicar, Grun & Leisch (2018) argue that bootstrap stability analysis is non-negotiable before reporting segment solutions. Without it, you cannot know whether the segments are a feature of the population or an artifact of this particular sample.
| Step | Action |
|---|---|
| 1 | Draw 200+ bootstrap resamples of the data (sampling with replacement) |
| 2 | Re-run the cluster analysis on each resample using the same k and distance metric |
| 3 | Measure how consistently the same participants cluster together across resamples (Jaccard or Rand index) |
| 4 | Report the stability index in the methodology section of the deliverable |
Stability threshold: A stability index above 0.75 is adequate. Below 0.6 means the cluster solution is unreliable — different samples would produce different segments. If stability is below threshold, consolidate to a simpler k.
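The bootstrap procedure can be sketched as a clusterwise Jaccard loop (after Hennig's method, as implemented in R's `fpc::clusterboot`). `cluster_fn` stands in for the real clustering routine and maps a distance matrix to integer labels; all names here are illustrative, and the toy demo at the bottom uses a threshold rule in place of actual clustering:

```python
import numpy as np

def _jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

def bootstrap_stability(D, cluster_fn, n_boot=200, seed=0):
    """Per-cluster mean Jaccard agreement across bootstrap resamples.
    Values above 0.75 indicate a stable cluster."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    base = cluster_fn(D)
    k = len(np.unique(base))
    scores, counts = np.zeros(k), np.zeros(k)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        labels = cluster_fn(D[np.ix_(idx, idx)])  # recluster the resample
        present = set(idx.tolist())
        for c in range(k):
            orig = {i for i in present if base[i] == c}
            if not orig:
                continue  # cluster absent from this resample
            best = max(
                _jaccard(orig, {int(idx[i]) for i in range(n) if labels[i] == c2})
                for c2 in np.unique(labels)
            )
            scores[c] += best
            counts[c] += 1
    return scores / np.maximum(counts, 1)

# Toy demo: two well-separated groups on a line; "clustering" is just a
# distance threshold to the first point, standing in for the real routine.
x = np.array([0.0, 0.2, 0.4, 0.6, 9.0, 9.2, 9.4, 9.6])
D = np.abs(x[:, None] - x[None, :])
stab = bootstrap_stability(D, lambda M: (M[0] > 5).astype(int), n_boot=50)
```

Report the per-cluster values, not just the average: one dissolving cluster can hide behind an acceptable mean.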
Each segment must have at least two observable identifiers — signals a sales rep can assess from LinkedIn, a company website, or the first five minutes of a discovery call — without needing to conduct a full research interview. If you cannot name two observable identifiers for a segment, it fails the Accessible criterion and cannot be used for sales targeting.
| Observable signal type | Where to find it |
|---|---|
| Company size | LinkedIn, public data, company website |
| Seniority and job title | LinkedIn |
| Industry and company type | LinkedIn, company website |
| Current tech stack | G2, Capterra, job postings |
| Buying signals | Recent job postings, funding announcements, company growth signals |
| Discovery call signals | Current tool pain point, whether evaluating to replace or augment |
One segmentation. Three translations. The same cluster solution gets described differently for marketing, sales, and product — each in the language and format that audience can act on.
Every segment delivered to a client should include a one-page profile structured to serve all three audiences.
| Field | Content | Audience |
|---|---|---|
| Segment name | Short, memorable name capturing the core motivation (e.g., "The Insight Purist") | All |
| Size | N in study sample, estimated % of addressable market | All |
| Firmographic fingerprint | Typical company size, seniority, industry, buying role | Sales |
| Observable identifiers | 2-3 signals visible before or in the first 5 minutes of a conversation | Sales |
| Core pain point | In their own words — use a representative verbatim quote | Marketing |
| Evaluation criteria | What a tool must do for them to consider it; what triggers elimination | Marketing + Sales |
| Top feature priority | The capability that matters most and most differentiates this segment | Product |
| Product adoption rate | % of this segment currently using the client's product (from outcome variable) | All |
| Strategic priority | High / Medium / Low — based on adoption rate, segment size, and fit with client's strategy | All |
| Comparison type | Test | When to use |
|---|---|---|
| Two proportions (segment A vs. B on a binary outcome) | Fisher's exact test | Any 2x2 comparison; preferred when cell sizes are small |
| Multiple groups on a binary outcome | Chi-square test | 3+ group comparison; use with caution if any expected cell count is below 5 |
| One proportion vs. a known benchmark | Binomial test | Comparing interview finding to a known external rate (e.g., Gong data) |
| Cluster solution predictive validity | AUC-ROC with permutation test | Does segment membership predict the outcome variable better than chance? |
| Cluster stability | Bootstrap Jaccard or Rand index | Does the same cluster solution emerge consistently across resamples? |
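The two most common proportion tests from the table map directly onto `scipy.stats`. The counts below are hypothetical, for illustration only:

```python
from scipy.stats import fisher_exact, binomtest

# Two segments compared on a binary outcome (hypothetical counts):
# Segment A: 18 of 24 adopted; Segment B: 6 of 22 adopted.
contingency = [[18, 6],   # Segment A: adopted, not adopted
               [6, 16]]   # Segment B: adopted, not adopted
odds_ratio, p_fisher = fisher_exact(contingency)

# One proportion vs. a known external benchmark of 30% adoption:
p_binom = binomtest(k=18, n=24, p=0.30).pvalue
```

Fisher's exact test is used rather than chi-square because interview-scale cell counts are small; `binomtest` requires scipy >= 1.7.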
| Variable | Bucket | Reason |
|---|---|---|
| Company size | Defining | Predicts needs and purchase behavior |
| Seniority | Defining | Predicts budget authority and decision role |
| Switching motivation (capability gap) | Defining | Predicts evaluation mode and openness to change |
| Product adoption | Outcome | What you are trying to predict — never a clustering input |
| Current tool brand | Profiling | Describes past purchase; not a driver of future needs |
| NPS / satisfaction | Outcome | Post-adoption metric; measures outcome, not underlying need |
| Individual theme codes (raw) | Profiling (default) | Too granular; aggregate into composite dimensions first |
| Composite needs dimension | Defining (if passes 3 tests) | Aggregated signal with sufficient variance and purpose link |
| Verbatim quotes | Profiling only | Illustrative; never quantitative |