A lot of it comes down to improvements in the bass and mids IMO. Like I think treble may be as good as can be done passively for a few headphones (unless luck is involved). And bass and mids are easier to make great without personalization - you're not constrained by HpTF variation in lower frequencies, particularly with open backs.
And great bass and mids are more noticeable to the listener, unless you really screw up the treble.
I’m having trouble matching the words to the numbers. You’ve got only one headphone that rates an 8 in the treble (Susvara), and only 3 that rate a 7. Shouldn’t “as good as can be done” be a 9, at least?
Also, since I brought up the numbers, there are no 9’s anywhere in your ratings of bass, mid, and treble. So, whence the optimism that something could be made that scores 9’s in multiple areas when nothing today scores a 9 in even one area?
What am I missing here?
It's easier to get bass and mids right than it is to get treble right. Yes, there are certain limitations to passively tuned headphones, but it's all doable. So if the treble is already an 8 out of 10 or 7.5 or whatever on certain headphones, that's already pretty good for the hardest part to get right. Another way to think about it: you don't necessarily need EQ to get 9.5 or 10/10 in the bass or mids, whereas with treble, because of HpTF variation and interaction with pinna effects, it would be a statistical miracle to achieve 10/10 without EQ.
Thank you ever so much
Well, we all hear things differently. Reviewer #1 may rate Headphone X a 9 while reviewer #2 rates the same headphone a 6. I don't see how that circle could be squared.
Then there are variations within the same headphone model. There are a few well regarded brands where I've experienced this first hand. Needless to say, I now stay away from those brands.
Naturally… these are our standards based on how we hear these products. We’re not claiming it’s going to be like this for everyone, just for us individually. You can’t expect to get any perfect indication of headphone performance that’s going to track identically for yourself due to how you impact the HpTF.
I understand the thinking behind this proposed reset, but how necessary is it really? If there isn’t any headphone that can get 8/10 without EQ, and just about every headphone can reach 9/10 or 10/10 with EQ, and there isn’t perfect correlation between “EQability” and whether something is 5/10 or 7/10 – which seems indisputable, actually, if bass and mids and treble are averaged without weighting – then you may as well just score things based on how they are without EQ, and, if you want, append to every review that everything can become nearly perfect (or anything else?) with EQ. Put another way, what’s the point of saying something is 5/10 instead of 5/7 or 8/10 if whatever is left off the board can be made up with EQ? I suppose the answer might be “Well, we want everyone to understand how important EQ is!” Fair enough, I suppose; and I suspect that is the actual answer, whether acknowledged or not.
I posit that no rating system for a consumer product is valid absent an explicit cost component.
That’s not quite it. Headphones can get 8/10 or 9/10 without EQ (or even 10/10, but it’s unlikely a passive product achieves a perfect score like that), and they do in certain spots for some of us.
We just said to ourselves that headphones should get held to a higher standard than they currently are, and that means taking what we consider to be the full range of possible sound quality into consideration. It has nothing to do with ranking things with or without EQ; EQ is just the tool that allows us to understand what sound could be, given the current compromises we see in headphones.
I like the direction this is going. However, I think any rating system built on a mean ideally needs more than 3 data points to be truly useful. A single dissenting opinion can drag a score down a lot, pulling everything toward the middle of the scale.
Too much clustering in the 5-6 range makes it hard to differentiate.
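To put rough numbers on that, here's a quick sketch; the scores below are made up purely for illustration, not taken from the actual board:

```python
# Illustration (hypothetical scores): how much one dissenting rating
# moves the mean at different panel sizes.

def mean(scores):
    return sum(scores) / len(scores)

# Three reviewers, one strong dissent: the outlier pulls the mean hard.
small_panel = [8.0, 8.0, 3.0]
print(round(mean(small_panel), 2))  # 6.33

# Six reviewers, same single dissent: its pull is much weaker.
large_panel = [8.0, 8.0, 3.0, 8.0, 8.0, 8.0]
print(round(mean(large_panel), 2))  # 7.17
```

With three reviewers, one low score drags an 8/10 headphone down to the mid-6s, which is exactly the clustering-in-the-middle effect described above.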
I like Listener's reviews and the balance he brings to the team, but I also find that he clearly has an HRTF further from what I hear than Goldensound's. Not that his takes aren't valid, but they are so different that I feel they skew results too much in a small data set.
But at the same time, his experience will be super relevant for others and should be included. This is by no means insinuating that Listener's thoughts are not important. In a larger data set, his review actually starts to be the more important one for giving balance.
It’s a shame DMS is no longer with the team, and I could see FC Construct’s thoughts being a good inclusion. In an ideal world, if you could get them, Josh Valour, Zeos, and some other reviewers the team has collaborated with in the past to contribute, I think that you start to compensate for both unit variation and different HRTF and make this even more useful. It will also start to reward headphones that have less unit variation and are less affected by HRTF…which is kind of what we would like in an ideal world isn’t it?
Outside of this, I find that I more closely relate to Goldensound’s HRTF and give his reviews a little more weight for my personal research into a product. Maybe being able to add a weighting system to reviewers that you find matching your findings in the table helps here?
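A reader-side weighting system like the one suggested could be as simple as a weighted mean. The reviewer names, scores, and weights below are placeholders to show the idea, not real data from the table:

```python
# Hypothetical sketch: weight each reviewer's score by how well their
# impressions seem to track your own, then take a weighted mean.

def weighted_score(scores, weights):
    """scores and weights are dicts keyed by reviewer name;
    weights need not sum to 1, they are normalized here."""
    total_weight = sum(weights[r] for r in scores)
    return sum(scores[r] * weights[r] for r in scores) / total_weight

scores = {"Resolve": 7.0, "GoldenSound": 8.0, "Listener": 5.0}
# e.g. give GoldenSound double weight if his HRTF seems closest to yours
weights = {"Resolve": 1.0, "GoldenSound": 2.0, "Listener": 1.0}
print(round(weighted_score(scores, weights), 2))  # 7.0
```

The unweighted mean of those three scores would be 6.67, so even a modest 2x weight on one reviewer shifts the result noticeably in a three-person panel.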
I hope I don’t sound argumentative in replying here, as I do think you’re being earnest, and I learn a lot from these threads; and I’m perseverating a bit only because I think this is an interesting thought problem relating to the more general notion of how we rank and order things. That said, I’m struggling a little with the claim that headphones can get to 8/10 or 9/10 when, according to the rankings, but for two headphones ranked by just one of three reviewers (one of which costs like 60 grand or something, IIRC), nothing ever has. (See below.)
Perhaps I also have a bias for the A/B/C etc. format of TierMaker lists, which just seemed neat and clear. I should acknowledge that. (Maybe I also should acknowledge I’ve long believed that Pitchfork’s album ratings, using a decimal place, are ridiculous. Nobody should believe that whatever evaluations they are doing produce any output warranting that degree of purported precision. Maybe that’s influencing me too.)
In any case, consider the following: Suppose we had 50 kids and we wanted to show everyone's height. We could use something like a bar graph. In our first go we set the y-axis from 0-100 feet tall. We've made a mistake: everyone will look like specks, and we won't see differentiation. If we set it from 0-3 feet tall, we've made a mistake in the opposite direction. It makes sense to fix either of those mistakes. Now suppose we set it from 0-8 feet and it manages to capture everyone, but someone comes along and says there could be a new record-setter in height, so we need to set it from 0-10. We'd be hard pressed to rule that out, but it'd also be fair to wonder why we needed to do it, if nothing happened to the relative positions of what we're measuring.
Just because nothing has yet doesn't mean it's not possible. A Susvara with a bass boost would be better for me than without, and that could exist. Again, that's the whole point of considering the range of possible sound quality, not merely the range of products that currently exist. The idea is to hold everything to a higher standard, regardless of price. Moreover, look at the individual scores given to various aspects of the sound quality; even I have given several 8.5s and 9s on the board. But if we're honest about the compromises, those products get brought down overall by other aspects of their sound quality. Imagine they didn't have those compromises. Then those products would be rated more highly.
Well for starters, with human beings, it’s rather difficult to EQ them to be taller to see what it would look like for there to be a taller human being. But more to the point, the analogy to sound quality isn’t exactly appropriate, because we’re making qualitative judgments by our ranking list, not a measure of absolutes.
Think about it like this: say you have a range of possible eating experiences that you could rate out of 10, and you've been to all kinds of fine dining restaurants. The food from a Burger King isn't going to reach the high bar of what your food experience could be, and you know this because you've eaten all kinds of fine dining. Now I'm not saying all headphones are the equivalent of Burger King, but currently, we're similarly not getting the best experiences available.
Whilst I am in favor of improving the evaluation process, this example is where the evaluations seem to fall apart. The Austrian Audio Composer score variation is too wide; from 7.0 down to 2.9 is outside the range of what one would expect to see. Not sure how someone new to the hobby would work out if it's a 2.9, a 7.0, or something in between.
Conveniently, we have a video explaining precisely why this can be the case. You may have seen it already. Say for the average person, this headphone has around 5dB excess upper treble. Now imagine if for you that was instead an excess of 15dB upper treble. That’d likely have a fairly negative impact on how you rate a headphone.
I think Listener is also just less tolerant of treble generally and does have a preference for warmer tunings. But you can imagine this kind of outcome if one person is literally hearing these headphones as having significant excesses in certain regions where others don’t.
I think you're overlooking how much personal variation (HRTF, HpTF, unit variation, etc.) can impact an individual's experience and thus their opinion/ranking of a transducer. You continue to make statements like this, as if experiences that do or would differ wildly from yours are simply unbelievable. Well, believe it, because as Resolve outlined above, in-situ response can vary quite a bit from person to person.
I'm not overlooking that. I'm pointing out the significant limitations associated with the subjectivity of this process. You're making an "assumption" that is simply not valid. My experience has nothing to do with the statement. The observation has to do with people who are perhaps just getting into the hobby and see what appear to be significant contradictions in the evaluations.
How is a trained listener’s experience with a transducer on their head a contradiction? What exactly is the contradiction here?
I believe objective data and subjective experiences are equally important. But in the end, we don’t hear the music by staring at a frequency response measurement of a headphone while comparing it to various targets. We listen to that headphone on our own head using our own ears, and sometimes one person’s experience (and preference) will differ dramatically from another’s.
Tl;dr: there is nothing unusual about the rankings differing so greatly between Resolve, Goldensound, and Listener, as it all comes down to their subjective experience. Thankfully, they believe data is important as well and are going to great lengths to try to explain why they each have those differences in subjective experience and preference using some of that data.
You can right click a reviewer's image header in the Overall list to remove them from the averaging. So if, for example, a list of only Cameron and Andrew's averaged scores would be interesting to you, you can just right click my image header.
Preliminary testing seems to suggest I prefer around 5-6dB bass to treble delta when HRTF is accounted for, which is what I think most would call a “lighter” tilt.
