Whenever I publish a headphone or source equipment review, the most common question I get is some variation of “how much better is it than X?”, where X is either something the person already owns or something they’re considering instead. For headphone reviewers, there’s no obvious place to begin with this type of question, because “better” is rarely defined by the person asking. But that’s understandable: for anyone trying to get an accurate evaluation, it must be frustrating to rely on a multitude of occasionally contradictory subjective impressions from reviewers.
Maybe certain things like detail retrieval or stage can be compared in relative terms (“headphone A is better than headphone B at detail retrieval”), but there seems to be a consistent desire to benchmark headphones on a numerical sliding scale or something similarly satisfying. Most people want an answer like “5% better” or “20% better”, so they get a sense of whether the upgrade is worth it for them. I’ve even seen questions about source equipment to the effect of “can I expect a 20%, 30%, or higher improvement over my phone?”. While I’ve tried to give numerical values in my evaluations (at least for headphones), I’m still at a bit of a loss as to how to answer these questions.
I think this notion likely comes from other areas where improvements really are demonstrable and can be represented this way. We’re used to being able to do this with PC components - CPUs, GPUs, and so on. We can say “this laptop is 30% faster than that laptop” (in whatever benchmark tests we’ve run) and have it make sense in a way that similar statements about headphones wouldn’t. And if it doesn’t come from benchmarks in the tech world, you could point to any number of other hobbies and interests (cars, for example) where those kinds of statements make sense - where the improvements are measurable and tangible in much more straightforward ways.
But that also doesn’t mean there’s nothing we can anchor headphone and source evaluations to. I’m not suggesting headphone appreciation is as close to a pure matter of taste - the traditional view of food and wine appreciation - as many of us may think it is (this is a very interesting topic, but far too deep to go into here; I think there’s a more nuanced view of taste that lines up more closely with headphone evaluation). There are tangible improvements that show up when going from an M50x to a HiFiMAN Susvara, for example, that don’t really depend on taste - both in the listening experience and in what’s happening physically with the technology.
At the very least we can talk about deviations from frequency response targets. Certain review sites like RTings have devised index scores based on how closely a headphone’s frequency response adheres to a target curve. In my opinion, this didn’t work out so well in practice at first, because too many other factors were folded in, which skewed the results into absurd counterexamples - the Stax L300 scoring as equivalent to the Bose QC35ii, for instance. Perhaps because of that problem, their process has since been improved so the score better reflects each use case, and it now yields much more useful results. But the idea in principle - creating an index score that gives you more than just a relative sense of sound quality - is a good one. It’s something a lot of prospective headphone enthusiasts, who don’t want to bother reading and comparing all the subjective reviews out there, would find useful.
So at the very least, we could reduce frequency response deviations to a numerical value or index of some kind. But this leaves out other important factors. At the moment, there’s no way to identify things like detail retrieval in measurements (Sean Olive may disagree; I’ll leave the door open to that). It may well be that technical characteristics like detail retrieval and macrodynamics - even soundstage - are captured somewhere in more fine-grained frequency response measurements, but we don’t yet know where to reliably look for them. As I’ve said in the past, we can’t hold up a frequency response graph and say “here’s the detail”. And this unfortunately means our index scores (or any comparable percentage result) would be incomplete.
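To make the frequency response idea concrete, here’s a minimal sketch of what such an index score could look like. To be clear, this is my own illustrative construction, not RTings’ actual methodology - the function name, the log-frequency resampling, and the 8 dB scaling constant are all arbitrary assumptions I’ve made for the example.

```python
import numpy as np

def fr_index_score(freqs_hz, measured_db, target_db, f_lo=20.0, f_hi=20000.0):
    """Toy index score: 100 minus a penalty proportional to the RMS
    deviation (in dB) from the target curve, averaged on a log-frequency
    grid so each octave carries roughly equal weight."""
    deviation = np.asarray(measured_db, float) - np.asarray(target_db, float)

    # Resample onto a log-spaced grid so densely measured regions
    # (often the treble) don't dominate the average. Assumes freqs_hz
    # is sorted ascending.
    grid = np.logspace(np.log10(f_lo), np.log10(f_hi), num=200)
    dev_on_grid = np.interp(grid, np.asarray(freqs_hz, float), deviation)

    rms_dev_db = np.sqrt(np.mean(dev_on_grid ** 2))

    # Map deviation to a 0-100 score; the 8 dB divisor is an arbitrary
    # illustrative choice, not a perceptually validated constant.
    return max(0.0, 100.0 * (1.0 - rms_dev_db / 8.0))
```

A headphone tracking the target perfectly would score 100, and one averaging 8 dB of deviation would score 0. Even granting all the arbitrary choices here, notice what a score like this can and can’t encode - which is exactly why it would be an incomplete measure of sound quality.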
Many reviewers, myself included, do provide numerical scores (or letter grades, the way Crinacle does) to help make comparing different equipment a bit easier. But this is still, at best, subjective benchmarking: comparing pieces of equipment, determining which sounds more ‘clear’, and picking winners along given performance dimensions. Maybe that’s all readers are looking for, but I think this type of evaluation still misses some important pieces needed to say one headphone is a given percentage better or worse than another.
I may have provided answers to that effect in the past, but I tend to think any answer that reduces sound quality improvements to a percentage or some kind of scaled value depends on two important coefficients:
- What you’re used to
- How much you care
In other words, evaluations that include the kind of subjective benchmarking reviewers like myself and others do - where we provide numerical or graded scores - carry the implicit assumption “if you care about this stuff the way I do”. And I’d like to think that reviewers in general do care a lot about this stuff - even the ones we may disagree with. Whether it’s me, Josh, Metal, Max, Crin, etc., I tend to think everyone has the same passion for getting their music represented in the best and most enjoyable way possible.
For prospective buyers who haven’t had the chance to experience the variety of equipment reviewers are fortunate enough to spend time with: if you’re used to an M50x or similar, and that’s set your expectation of audio quality in general, going up to something like the Susvara or the Focal Utopia is going to reveal massive differences. But that alone can’t determine the degree of improvement - or “whether it’s worth it” - for the listener.
If you care a lot, the difference between an M50x and a Susvara is going to be way more significant than for someone who doesn’t really care at all. I’ve met people who put on flagship headphones and go “it sounds fine”, but their reactions don’t really indicate they’re as impressed as someone who really cares a lot about how their music is represented. And this, to me, is the most important factor.
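To tie those two coefficients together, here’s a deliberately oversimplified toy model - every name and number in it is hypothetical, and I’m not claiming perception actually works this linearly. It’s just a way to show how the same objective gap can land very differently for different listeners.

```python
def perceived_improvement(quality_new, quality_current, care_factor):
    """Toy model: the 'felt' upgrade is the objective gap relative to
    what you're used to (coefficient 1), scaled by how much you care
    (coefficient 2, from 0 = indifferent to 1 = obsessive)."""
    objective_gap = (quality_new - quality_current) / quality_current
    return care_factor * objective_gap

# Made-up numbers: the same M50x -> Susvara jump reads as a 180%
# improvement to an enthusiast but only 18% to a casual listener.
print(perceived_improvement(28, 10, 1.0))   # 1.8
print(perceived_improvement(28, 10, 0.1))   # 0.18
```

The point isn’t the arithmetic; it’s that no reviewer can fill in `quality_current` or `care_factor` for you.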
Perhaps a more realistic example is the difference between the Focal Clear and the Focal Utopia. To me (and many enthusiasts), this comparison reveals a substantial difference in terms of image clarity and overall fidelity, more so than between the Clear and the Elear. But there are those - even within the audiophile community - who aren’t as taken by that difference. This means they’re able to enjoy the Clear’s tonality over the Utopia, in spite of the Utopia’s technical advantages. In this case, it doesn’t necessarily mean they don’t care about their music as much, it just means they value tonality over image clarity and detail.
But I still think the bottom line for answering the initial question - “what percentage better is this headphone than that headphone?”, or “how much improvement should I expect to hear?” - ultimately comes down to how much you care about how well your music is represented, and then identifying which dimensions of headphone performance will yield the best representation for you. Is it detail? Tonality? Soundstage? Timbre? The very real differences that exist may register as only a small incremental improvement to some, but they might mean the world to others.