This is a common mischaracterization of the Harman research, and in some ways it was bound to happen as a result of the piecemeal publication model - they’ve only got themselves to blame on that one, I think. I’d encourage you to read through the research, but I too at one point approached it with this same kind of “why should we care what the unwashed masses like” notion. Turns out it’s actually a lot more sophisticated than that, even though there are also places to scrutinize.
Before getting into this, it’s probably worth watching the video I did recently on the various stages of reading headphone measurements, for a bit of context as to the landscape for this discussion.
The Harman target we commonly use is based on the summing of free field and diffuse field head-related transfer functions (HRTFs) on a head and torso simulator, using the anthropometric pinna (GRAS KEMAR). This is why you see the rise up to 3 kHz, after which it comes back down towards 10 kHz. The real result is unique to every person and every ear, but with a head and torso simulator they were able to get a simulated one.
Now it’s not incorrect to say that the bass to treble tilt is based on listener preference, and this is where two of the studies were influential - the 2018 one included the ‘untrained listeners’ group, which also resulted in more bass overall (although it was only by a small amount). I don’t include this bass shelf in the target I use because I tend to think perhaps a bit highly of those interested in high end audio. But also, given that a lot of us are interested in open-back headphones, I don’t think it’s realistic to expect open-back acoustic designs to be able to achieve this.
But in any case, what’s interesting about the preference tilt is that a lot goes into the specific places where the adjustments were made. That is to say, the position of the bass shelf wasn’t arbitrary, and it isn’t simply wherever people’s preferences happened to fall. They specifically anchored it to the bottom part of the ear gain, which is where the rise at around 100 Hz shows up. Or in other words, they put the adjustment there for good reason. This is also generally where the crossover for subwoofers would ideally be.
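To make the idea of a bass shelf a bit more concrete, here’s a rough sketch of what a low-shelf adjustment looks like as a gain curve. It uses the standard low-shelf biquad formulas from the RBJ Audio EQ Cookbook. The function name and the defaults (100 Hz midpoint, +6 dB, 48 kHz sample rate, slope of 1) are just placeholders I picked for illustration; they are not the actual figures from the Harman papers.

```python
import cmath
import math


def low_shelf_gain_db(freq_hz, shelf_hz=100.0, gain_db=6.0, fs=48000.0, slope=1.0):
    """Magnitude in dB of an RBJ-cookbook low-shelf biquad at freq_hz.

    shelf_hz is the shelf midpoint (where half of gain_db is reached).
    All defaults are illustrative assumptions, not Harman's values.
    """
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * shelf_hz / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt((a + 1.0 / a) * (1.0 / slope - 1.0) + 2.0)
    k = 2.0 * math.sqrt(a) * alpha

    # Biquad coefficients for a low shelf.
    b0 = a * ((a + 1) - (a - 1) * cos_w0 + k)
    b1 = 2 * a * ((a - 1) - (a + 1) * cos_w0)
    b2 = a * ((a + 1) - (a - 1) * cos_w0 - k)
    a0 = (a + 1) + (a - 1) * cos_w0 + k
    a1 = -2 * ((a - 1) + (a + 1) * cos_w0)
    a2 = (a + 1) + (a - 1) * cos_w0 - k

    # Evaluate the transfer function on the unit circle at freq_hz.
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / fs)
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (a0 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    for f in (20, 50, 100, 200, 1000, 3000):
        print(f"{f:>5} Hz: {low_shelf_gain_db(f):+.2f} dB")
```

The point is just that the boost sits almost entirely below the shelf frequency and has faded out well before the ear gain region starts, so moving the shelf frequency changes which part of the response gets lifted, not only how much.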
Now, as far as the level of the preference adjustment goes… all of this actually agrees with the generally better-understood results for ‘good sound’ in speakers, which also have decades of research behind them. This is where any individual looking at this stuff has to recognize that this is a ‘reference’ curve for a reason. There is a cluster analysis that’s done in one of the papers:
Segmentation of Listeners Based on Their Preferred Headphone Sound Quality Profiles
This obviously means that it’s up to us to recognize that just because the majority prefer certain bass levels and bass to treble balance, that doesn’t mean we individually do.
The mistake that often gets made here is that the reference curve we end up using (the largest grouping) gets treated as prescriptive rather than descriptive. It’s not saying what you SHOULD like, merely what people DO like. It’s up to the manufacturers to decide what to do with that information.
So any analysis of a headphone’s frequency response in relation to this target should be thought of as “in relation to what most people happen to like”, not necessarily how all headphones should be tuned. Now, this is also where my own misgivings come into play. I think the other reason why that statement is correct (beyond just the question of bass and treble balance) is that the target isn’t fine-grained enough for us to know what the frequency response should be above 5 kHz, given how much the individual ear affects that region. But for a general target, it’s still quite useful.
I think there’s an argument there for manufacturers to aim for the largest group if they want to sell lots of headphones, but beyond that, this says nothing about what you individually like. It’s just a reference point.
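For anyone who wants to see what “in relation to the target” means in practice, here’s a minimal sketch of that kind of comparison. The function name, the 200 Hz to 2 kHz level-matching band and the sample numbers are all my own assumptions for illustration; the 5 kHz cutoff just reflects the point above about not leaning on the target too hard past that.

```python
import math


def deviation_from_target(freqs_hz, measured_db, target_db,
                          norm_band=(200.0, 2000.0), max_hz=5000.0):
    """Return (deviations, rms) of measured minus target, up to max_hz.

    The curves are level-matched first using their mean difference inside
    norm_band, since the absolute level of a measurement is arbitrary.
    norm_band and max_hz are illustrative choices, not a standard.
    """
    diffs = [m - t for m, t in zip(measured_db, target_db)]
    in_band = [d for f, d in zip(freqs_hz, diffs)
               if norm_band[0] <= f <= norm_band[1]]
    if not in_band:
        raise ValueError("no measurement points inside norm_band")
    offset = sum(in_band) / len(in_band)

    # Keep only the region where the target is treated as a usable reference.
    kept = [(f, d - offset) for f, d in zip(freqs_hz, diffs) if f <= max_hz]
    rms = math.sqrt(sum(d * d for _, d in kept) / len(kept))
    return kept, rms


if __name__ == "__main__":
    # Made-up numbers purely to show the call; not a real headphone or target.
    freqs = [50, 100, 500, 1000, 3000, 8000]
    target = [5.0, 3.0, 0.0, 0.0, 8.0, 2.0]
    measured = [1.0, 1.5, 0.5, -0.5, 7.0, 9.0]
    points, rms = deviation_from_target(freqs, measured, target)
    for f, d in points:
        print(f"{f:>5} Hz: {d:+.1f} dB vs target")
    print(f"RMS deviation up to 5 kHz: {rms:.2f} dB")
```

A small number here only tells you the headphone sits close to what the largest group preferred, which is a different thing from saying it’s tuned correctly for you.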