Net Promoter Score works for products with large user bases and repeated transactions. A consulting engagement has neither, and using NPS to measure engagement quality produces a number that doesn’t track the thing anyone should care about.
Satisfaction Scores Measure the Wrong Thing During the Hardest Phases
The core problem with NPS for consulting is that the score measures how the engagement felt, and during the hardest phases of the work, how it felt is inversely correlated with how much value it produced. When the experience is the product, the metric needs to capture whether the experience produced lasting change, the kind that shows up in how the team operates six months later.
The engagements that score highest on satisfaction surveys are often the ones where the consulting team told the client what they wanted to hear: the analysis confirmed the existing strategy, and the recommendations aligned with what leadership had already decided. The engagements that produce the most lasting value are uncomfortable in the middle. The pre-mortem surfaces risks that leadership was avoiding; the dependency analysis shows the approved timeline is unrealistic. These are the moments that matter, because the alternative is discovering the same problems during execution at ten times the cost. But those moments don’t feel good while they’re happening.
A survey administered at the end captures the final emotional state, not the cumulative value. The score reflects the last mile, not whether the plan holds up during execution.
Small Samples and Bad Timing Make the Score Unreliable
NPS works through aggregation. A consumer product with ten thousand users can identify meaningful patterns, but a consulting firm that runs forty engagements per year has a sample of forty, where a single client flipping from promoter to detractor moves the score by five points. At that sample size the score lacks the statistical power to tell you anything reliable about performance.
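To make the sensitivity concrete, here is a minimal sketch in Python (the scores are invented for illustration): it computes NPS from raw 0–10 responses, shows the five-point swing from one flipped respondent at n = 40, and uses a bootstrap to show how wide the range of plausible scores is at that sample size.

```python
import random

def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

random.seed(0)

# Hypothetical year of forty engagements: thirty promoters, ten passives.
scores = [9] * 28 + [10] * 2 + [8] * 10
print(f"Baseline NPS (n=40):    {nps(scores):+.0f}")   # +75

# One client flips from promoter (9) to detractor (4): a 5-point swing.
flipped = [4] + scores[1:]
print(f"After one flip:         {nps(flipped):+.0f}")  # +70

# Bootstrap resampling shows the spread you get from sampling noise alone.
boot = sorted(nps(random.choices(scores, k=40)) for _ in range(10_000))
print(f"95% bootstrap interval: [{boot[249]:+.0f}, {boot[9749]:+.0f}]")
```

The bootstrap interval at n = 40 is wide enough to swallow any plausible year-over-year change, which is the formal version of the complaint above: the movement a firm celebrates or agonizes over is mostly noise.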
Ask at the end of the engagement, when the final readout produced energy and the team felt ownership, and the score is high; ask six months later, when the team has discovered that the operating model doesn’t work in practice, and the score drops. The six-month score is more meaningful because it reflects durable value. Most firms administer the survey at closing, when the score will be highest; the timing produces better numbers and less useful information.
There’s also a buyer-user split. The SVP who signed the statement of work evaluates the engagement based on strategic artifacts for the board, while the workstream leads evaluate it based on whether the process respected their time and produced something they could execute. A single NPS score blends these into one number that represents neither perspective.
Unprompted Referral Behavior Is a More Meaningful Signal
The metric that matters for professional services is referral behavior: would the client recommend us to a peer without being asked? This is different from the NPS question, which asks about hypothetical willingness on a scale. A client who sends an unprompted referral to a peer VP is staking their professional reputation on the recommendation; that behavior requires a level of confidence that a survey score can’t manufacture. Firms that understand they’re selling the experience of going through the work tend to generate more of these referrals.
Tracking referral behavior is harder than administering a survey. It requires monitoring whether past clients make introductions or bring us into new engagements at their next company. The signal is sparse and lagged, but it’s far more meaningful than a number on a post-engagement survey.
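There is no survey instrument for this; one workable approach is a hand-maintained log of referral events, reviewed per engagement cohort. A minimal sketch, with illustrative fields and entries rather than a prescribed schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ReferralEvent:
    """One observed referral behavior, logged when it surfaces."""
    client: str       # past client who made the introduction
    closed: date      # when their engagement ended
    observed: date    # when the referral surfaced
    kind: str         # "peer_intro" or "repeat_at_new_company"

# Illustrative entries; in practice these come from CRM notes and inbound email.
log = [
    ReferralEvent("Acme, VP Ops", date(2023, 3, 31), date(2023, 11, 6), "peer_intro"),
    ReferralEvent("Northwind, SVP", date(2023, 6, 30), date(2024, 4, 2), "repeat_at_new_company"),
]

# The signal is lagged by design: months pass between close and referral.
for e in log:
    months = (e.observed - e.closed).days / 30.4
    print(f"{e.client}: {e.kind} after {months:.0f} months")
```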
A Qualitative Debrief Produces Actionable Feedback
For firms that want structured feedback without the distortions of NPS, the better approach is a qualitative debrief three to six months after the engagement closes. A thirty-minute conversation with the program lead and two workstream leads, structured around two questions:
- What from the engagement is still in use, and what the team stopped using and why
- What they would change if they ran the planning process again
The answers tell us more about our performance than any numerical score. “We still use the decision log every steering committee meeting” signals the artifact was designed well. “We stopped using the operating model within a month because the meeting cadence was too heavy” signals that the operating model design needs rework.
This feedback is specific and tied to outcomes. It tells us what worked, what didn’t, and what to change. The question worth asking is not how the engagement scored but what stays after you leave.