NPS Deep Dive: A Test of Voice vs. Survey Responses

Net Promoter Score (NPS) might be the most celebrated metric in business, almost to the point of dogma. Nearly two-thirds of Fortune 1000 companies use the single NPS question ("How likely are you to recommend us?") to track customer experience. People either swear by it or criticize it for what it misses. And honestly, both camps have a point.

On one hand, NPS has been overused and misapplied, often leading companies to draw shaky conclusions from correlations with detractor behavior. On the other hand, it's a simple, standardized benchmark that helps track progress over time and compare performance against peers—and there’s real value in that.

But NPS alone doesn’t explain why someone gave a score. It misses the deeper story: the drivers, unmet needs, and brand perceptions hiding behind the number. Most tools try to patch this gap with an open-ended text box. So we ran a simple test comparing two ways of collecting the answer to a single follow-up question: a typed open-ended survey response versus a voice response.

Experiment Design

The goal of the experiment was to compare voice and text as methods for collecting survey responses—specifically focusing on the depth and actionability of the feedback. Both the voice and text groups were drawn from the same research panel, with different participants but matched demographics. No follow-up probes were used. Each participant shared the make of their primary car, gave it an NPS score, and answered a single open-ended follow-up question:

Can you describe a recent experience with your car that most influenced your willingness to recommend us, including what you expected, what stood out (good or bad), and how we compare to other options you’ve considered?

Findings

We collected 50 responses for each collection method, survey and voice. Below is a small sample of the responses; feel free to check out the full list here:

Survey Explanations
It is much more comfortable just like I expected, its of good quality and it appears safe
The car is very reliable, does not require frequent services, I missed a full body and engine service session but the car did not develop any problems
This vehicle is very visually appealing. Mechanically sound, and thrilling to drive. The Bronco brand has been around as long as I've been alive so there is historical significance to owning one. Ford is a dependable brand.

Voice Explanations
I just think the driving experience is a lot better than other cars. It's a much smoother drive than other types of makes that I've used, like Jeep and Toyota. And overall, it just is a smoother drive than those, so that's why I prefer it over those two options.
Well, I first I mean, the fact that it runs on fuel cell or hydrogen is a positive for the environment. However, it is quite costly because the number of gas station or number of hydrogen stations are a little bit over 50 in The entire United States. So prices are quite high. Per kilogram or liters and that's about $35 my car max is five, so we're talking about probably over a hundred and $80 a month. And it only gets about 350 miles on a gallon. So I probably would not recommend getting a hydrogen vehicle until the prices come down. And there are more stations that are built
I the gas mileage for one, is one of the biggest things, and then also the lane assist that helps you stay within the lines and also helps you break and stuff like that. That's the biggest, reasons that I like the car. It's spacious.. Negative. I don't really have a lot of negative to say about it. More positive The cargo, it the cargo gate works great or, like, the cargo area is spacious. The cargo lift works great. I think that's about it. I mean, as compared to other SUVs that I saw, it doesn't look like a minivan. Which is what I find the biggest problem with most SUVs being. Because that's what I have is I've I have the Ford Edge. It is looks a little more sporty versus a little more mom mom car.

Response Length

The simple, if naive, way to contrast the two collection methods is to compare response length. On average, voice responses contained a little more than twice as many words as their typed counterparts (72.5 versus 32.4). This isn’t too surprising; past research has consistently shown that speaking requires less activation energy than typing.
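As a rough illustration, the length comparison can be sketched in a few lines of Python. The helper names are hypothetical, and splitting on whitespace is an assumption about how words are counted rather than a description of the tooling used in this test. The two strings are responses quoted in the Findings section.

```python
def word_count(response: str) -> int:
    """Count whitespace-separated tokens in a response."""
    return len(response.split())


def mean_word_count(responses: list[str]) -> float:
    """Average word count across a group of responses."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(word_count(r) for r in responses) / len(responses)


# One typed and one spoken response quoted in the Findings section.
survey_example = (
    "It is much more comfortable just like I expected, "
    "its of good quality and it appears safe"
)
voice_example = (
    "I just think the driving experience is a lot better than other cars. "
    "It's a much smoother drive than other types of makes that I've used, "
    "like Jeep and Toyota. And overall, it just is a smoother drive than "
    "those, so that's why I prefer it over those two options."
)

print(word_count(survey_example), word_count(voice_example))
```

Running `mean_word_count` over each full set of 50 responses would give the kind of per-method average reported in the results table.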

Response length is just one dimension of depth, but what does “depth” really mean? Voice responses often include filler phrases like “let’s see here,” so longer answers aren’t always more insightful or actionable. In fact, using Daniel Kahneman’s framework, typing out an open-ended survey response might align more with System 2 thinking (slow, deliberate, and effortful), whereas speaking is likely to tap into System 1 thinking (fast, intuitive, and automatic). To truly evaluate which method yields more meaningful insights, we needed to go beyond word count and analyze the content more closely. We assessed responses across the following dimensions:

Themes / Concepts Mentioned
Count the number of unique core themes discussed in the NPS explanation (e.g., “comfort”, “gas mileage”, “reliability”).

Specificity vs Generality
Determine whether the respondent cited a concrete example or gave an abstract explanation. (The Mom Test makes a strong case for asking about concrete experiences.)

Vocabulary (Lexical) Diversity
The variety of words used in a response, that is, how rich and diverse the vocabulary is. Since LLMs are quite good at evaluating language, we had each explanation graded on a scale of 1-10 using gpt-4.1.

Qualification and Nuance
Determine whether the response includes “however” or “on the other hand” statements, which indicate that the consumer is giving a deeper, more nuanced answer.

Time to Complete
The number of seconds it took to complete the three questions.
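Two of these dimensions lend themselves to a simple rule-based sketch. The Python below is illustrative only: the lexical diversity scores in this test came from gpt-4.1 grading, so the type-token ratio shown here is a swapped-in classic measure, not the method we used. The function names are hypothetical, and the marker list contains only the two phrases named above.

```python
import re

# The two qualifier phrases named in the dimension description.
QUALIFIERS = ("however", "on the other hand")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Unique words divided by total words (0.0 for empty text)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def has_qualification(text: str) -> bool:
    """True if any qualifier phrase appears as whole words."""
    lowered = text.lower()
    return any(
        re.search(rf"\b{re.escape(q)}\b", lowered) is not None
        for q in QUALIFIERS
    )


print(type_token_ratio("the car is the best"))
print(has_qualification("However, it is quite costly"))
```

One caveat with a type-token ratio: it tends to fall as responses get longer, because common words repeat. Any comparison between short typed answers and longer spoken ones would need to account for that, for example by scoring a fixed-length window of each response.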

Measure	Survey	Voice	Takeaway
Response Length (avg words)	32.4	72.5 ✅	Voice explanations are roughly 2x as long
Unique Themes (avg)	4.03	4.79 ✅	~20% more themes surfaced in voice
Lexical Diversity Score (1-10)	3.3 ✅	3.0	Survey yields a marginally richer vocabulary
Specificity (percent)	50%	74% ✅	Voice responses are significantly more specific
Qualification / Nuance (percent)	8%	16% ✅	Voice doubles the rate, though both methods scored low overall
Completion Time (avg)	155 seconds	93 seconds ✅	Voice input is faster to complete
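As a quick sanity check on the table, the ratios and a standard pooled two-proportion z-test can be computed with the Python standard library. This sketch assumes the percentages are shares of all 50 responses per method (so 50% is 25 responses, 74% is 37, 8% is 4, and 16% is 8). Those counts and the function name are assumptions for illustration, not figures reported in the test.

```python
import math


def two_proportion_z_test(
    hits_a: int, n_a: int, hits_b: int, n_b: int
) -> tuple[float, float]:
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Averages taken from the table.
length_ratio = 72.5 / 32.4    # voice words / survey words
theme_lift = 4.79 / 4.03 - 1  # relative increase in unique themes

# Assumed counts: 50 responses per method, so 50% -> 25 and 74% -> 37.
z_spec, p_spec = two_proportion_z_test(25, 50, 37, 50)
# Assumed counts: 8% -> 4 and 16% -> 8.
z_nuance, p_nuance = two_proportion_z_test(4, 50, 8, 50)

print(f"length ratio: {length_ratio:.2f}, theme lift: {theme_lift:.1%}")
print(f"specificity: z={z_spec:.2f}, p={p_spec:.3f}")
print(f"nuance:      z={z_nuance:.2f}, p={p_nuance:.3f}")
```

Under those assumptions, the specificity gap gives a two-sided p-value of roughly 0.01, while the qualification gap comes out around 0.2. With samples this small, the latter is better read as directional than conclusive.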

Takeaways

A single quantitative question has its appeal: it’s quick to ask, easy to track, scales across use cases and verticals, and works well for benchmarking. But when it comes to understanding customers or moving the needle, things rarely stay that simple.

To move the needle, you need to go beyond an NPS score to understand the core drivers, unmet needs, and brand perceptions. Behavioral data can help, and we highly suggest using it when available. However, in many cases behavioral data is simply not available, or it still leaves the company asking why. So you have to engage with customers to understand why they feel the way they do about your brand, product, or service. We strongly believe the collection approach matters here, and at the moment voice has many advantages over legacy survey collection approaches. Richer responses yield deeper insight, which in turn allows for action.

Give it a spin: create your first NPS+ Pulse today, or reach out to us.