Developing a new headphone reference target

Well… I’d like to think that the exclusion of the unwashed masses in the whole ‘trained vs untrained’ thing is a bit of a misnomer, or what we might take from that should probably be scrutinized a bit.

To put it bluntly, I don’t think we should so easily put audiophiles into the ‘trained listener’ category, and if anything we’re more likely to muddy the results than contribute to a clear outcome. Moreover, the whole ‘trained listener’ thing requires quite a bit of consideration as well, as I remember that what constituted ‘trained’ was a minor sticking point when reading through the research and caused me to raise my eyebrows a bit. But, at the end of the day this is the best we have for publicly available research on this topic, so it’s what we go with. Perhaps a better parsing of listener types should be something along the lines of “those who strongly give a shit about tonality”, vs “those who only mildly give a shit or are otherwise indifferent”.

This was also the initial concept behind the Harman Combined target, in the sense that I felt it was a bit more appropriate than 2018 for listeners who are picky about the bass-to-treble delta. I could’ve just gone with 2013 but I know a lot of us find that to be too shouty at 3 kHz, myself included, and I also suspect this is one of the reasons for the treble lift in 2018, since, as you increase the treble above 3 kHz, the glare goes away with lower harmonics not dominating as much over upper ones. Also keep in mind that the region above 10 kHz is unreadable due to measurement artifacts, so the target shouldn’t dip down there to the extent that it does.

There is also another element to this, which is that Harman is smoothed to 1/2 octave. And if you look at Harman in-room and unsmoothed DF, they’re fairly close there, with the difference coming down to the Harman conditions, like the specific Revel speakers and that room. Overall ear gain level for an unsmoothed target would also be higher around 3 kHz than what we use, and I think this is a bit of an issue.
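To make the smoothing point concrete, here is a rough sketch of why a 1/2 octave smoothed curve reads lower at a peak than the unsmoothed one. Everything in it is an assumption for illustration: the bump is a synthetic Gaussian centred on 3 kHz (not Harman or DF data), and the smoothing is a plain boxcar average of dB values over a 1/2 octave window, which is simpler than what measurement software typically does.

```python
import math

def smooth_fractional_octave(freqs_hz, levels_db, fraction=2):
    """Boxcar-average dB levels over a 1/fraction-octave window centred on each point."""
    half_width = 0.5 / fraction  # half the window width, in octaves
    smoothed = []
    for f_center in freqs_hz:
        window = [
            lvl
            for f, lvl in zip(freqs_hz, levels_db)
            if abs(math.log2(f / f_center)) <= half_width
        ]
        # The window always contains the centre point, so it is never empty.
        smoothed.append(sum(window) / len(window))
    return smoothed

# Synthetic, made-up bump: 12 dB peak near 3 kHz, Gaussian in log-frequency.
freqs = [1000.0 * 2 ** (i / 12) for i in range(37)]  # 1 kHz to 8 kHz, 1/12 octave steps
raw = [12.0 * math.exp(-(math.log2(f / 3000.0) ** 2) / (2 * 0.25 ** 2)) for f in freqs]
smoothed = smooth_fractional_octave(freqs, raw, fraction=2)

print("raw peak (dB):     ", round(max(raw), 2))
print("smoothed peak (dB):", round(max(smoothed), 2))
```

The smoothed peak comes out lower than the raw one, which is the same direction as the effect described above. How much lower depends entirely on how narrow the peak is, so the numbers here say nothing about the real size of the difference.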

My take on this right now is that this is probably at the edge of what’s tolerable for folks. Like, depending on the ear, you may have a different tolerance for that region, but also we have to consider that for certain types of music and recordings, that high of ear gain just isn’t going to work. So either we bite the bullet on bad recordings and just say too bad for more aggressive music, or we aim to be a bit more conservative in that region for the benefit of versatility - again, just a thought. We don’t have that built into anything yet on the 5128, but it’s up for discussion.

The nice thing about using DF plus slope is that you get a better view of what the treble should be in relation to the highest part of the ear gain, which so far has yielded a massive improvement to treble timbre. But, we’re of course also discovering that this isn’t the same across different headphones. So, whatever happens we’ll need to build in boundaries around the target to reflect those differences - and also the fact that people have different heads/ears.


While a product manager in a totally unrelated field, I tried very hard to find new users that had little or no experience with my product. In your quest you are right in deprecating the trained ear of audiophile testers if you are developing your new target for non-audiophiles.

OTOH, if the target is audiophiles, they are, IMAO, too opinionated to present a clear and homogeneous target.

Audiophiles? Opinionated? This is the first time I’m hearing of this phenomenon.


Glad to see you guys are really putting some thought and research into this. Maybe 2 targets are in order? A “reference” sloped DF (maybe with minor tweaks) target and a “preference” modified DF target with less ear gain might do the trick. Then users can go off whichever they prefer and fewer compromises need to be made.


In my book Arya Stealth is close to perfect for orchestral and maybe live music, but when it comes to modern pop music it’s too bright and the dip around 1.5k becomes noticeable, making it too lean and putting emphasis on 3-5k energy.

Is there any consensus on the 100 Hz-1 kHz area? Harman has it dipped around 200 Hz and slightly rising up to 1k. Some prefer totally flat. I’m curious about this region. Did you notice any effect of flat vs slightly rising energy up to 1k? Bass, ear gain and treble are always the focus points, but this 100 Hz-1 kHz region is interesting too.


Yes, so with all apologies to Crin, it shouldn’t be flat below 1khz (we’ve chatted about this). Basically, with Harman the bottom part of the ear gain dips down, and that’s because it’s technically still part of the in-room result measured at the ear drum. Not only that, it’s also contoured similarly with DF. So yeah, we should include the bottom part of the ear gain, which is what that dip is. Now… with that said, when you apply a slope to DF, it effectively lifts that whole region, depending on the degree of the slope. But it still retains the same contour or ‘shape’ to it.

This is sort of a hidden benefit to DF plus slope instead of a shelf, in that it gives a bit more presence and ‘beef’ to fundamental tones that occupy that range and avoids the occasional leanness of Harman for music that doesn’t fully go down into the sub-bass, which I imagine is one of the reasons why Crin lopped it off at 900 Hz in the first place. So, bottom line is that yes, the ‘shape’ should be included, but for those who find it perceptually thin, a slope will likely be preferred over a shelf. Personally, I like both styles of tuning, as a bass shelf is fun too.
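For anyone who wants to see the slope vs shelf difference in numbers, here is a minimal sketch. All values are assumptions for illustration: the "DF" levels are made-up placeholders (not a measured curve), the tilt is -1 dB/octave pivoted at 1 kHz, and the shelf is a crude magnitude approximation with a hypothetical 6 dB gain and 105 Hz corner rather than a real filter design.

```python
import math

def tilt_db(freq_hz, slope_db_per_octave, pivot_hz=1000.0):
    """Gain in dB of a straight tilt at freq_hz; zero at the pivot frequency."""
    return slope_db_per_octave * math.log2(freq_hz / pivot_hz)

def low_shelf_db(freq_hz, gain_db, corner_hz=105.0):
    """Rough low-shelf shape: close to gain_db well below the corner, near 0 dB above it."""
    return gain_db / (1.0 + (freq_hz / corner_hz) ** 2)

# Placeholder levels in dB, NOT measurements: a small dip near 200 Hz rising toward 1 kHz.
df_placeholder_db = {100: 0.0, 200: -0.5, 500: 0.5, 1000: 1.5}

with_tilt = {f: lvl + tilt_db(f, -1.0) for f, lvl in df_placeholder_db.items()}
with_shelf = {f: lvl + low_shelf_db(f, 6.0) for f, lvl in df_placeholder_db.items()}

for f in df_placeholder_db:
    print(f, "Hz  tilt:", round(with_tilt[f], 2), " shelf:", round(with_shelf[f], 2))
```

With these placeholder numbers the tilt adds a gradually decreasing lift across the whole 100 Hz-1 kHz range, so the dip is still there but sits higher, while the shelf has mostly run out by 500 Hz. That matches the description above, but the actual amounts depend on the slope, pivot and shelf settings chosen.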


I am certainly interested in the new targets HP dot com might suggest. After all, EQ is not the devil, albeit my 2nd choice (as I am lazy). More generally, I wonder how any target might be influenced by the input, i.e., on the music being listened to. If, at an extreme, my preferred genre is bass-free, then what happens there does not matter. And if I am a BASS-head, what happens there is all that matters, treble be damned. So what song(s) exhibit(s) the breadth to allow one to assess a particular target? (I worry this might border “dumb question” territory, so please humor me.)

I can see why folks would say that, but IMO everything matters, just depending on the person more or less in certain regions. And yeah bass level is a very strong factor in overall listener preference.

In my experience, oftentimes people don’t necessarily know what ‘good’ is or what they prefer until they hear it. It’s why you get so many posts on various places indicating ‘headphone x is the best thing ever’ - but really that’s with limited experience with other stuff. Now… That’s not to say they don’t enjoy it. Merely that they may have an even stronger preference for something else if they get a chance to hear it. Those same people may be back a few months later extolling the virtues of something else they prefer. It’s kind of what makes the hobby a rabbit hole in a way.


Thanks for the response!! I have about 88 thoughts on this, none of which are answers. My mental model is (input) → (headphone/iem transfer function) → “target” → (head/ear transfer function) → (output). Where (input) is whatever the source outputs (and there is potentially a bunch of stuff in that chain!! including EQ) and (output) is what my puny little brain receives. Accuracy is achieved by minimizing ∆ = |(output) - (input)|. Happiness is achieved by minimizing ∆ = |(output) - (imagined/desired input)|. My original question was more than a bit pedantic as I think max∆ is a function of the range of inputs allowed. OK, nerd set to off. It is a wabbit-hole, and I believe that a better model of a headphone/iem transfer function (which I don’t think are really the same, but that is for another day) than Harman does exist in this model.
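A toy version of the "max∆ depends on the range of inputs" idea, purely as a sketch. The three bands, the per-band deviations and the per-track energy shares are all invented for illustration, and weighting the deviation by energy share is just one assumed way to score a track, not an established metric.

```python
def weighted_error(deviation_db, band_energy):
    """Energy-weighted mean absolute deviation (dB) across bands for one track."""
    total = sum(band_energy.values())
    return sum(abs(deviation_db[band]) * share for band, share in band_energy.items()) / total

# Hypothetical (output) - (input) per band, in dB, for some imaginary headphone.
deviation_db = {"bass": 6.0, "mids": 0.5, "treble": -2.0}

# Hypothetical share of each track's energy per band.
tracks = {
    "solo_flute": {"bass": 0.0, "mids": 0.7, "treble": 0.3},
    "edm": {"bass": 0.6, "mids": 0.3, "treble": 0.1},
}

errors = {name: weighted_error(deviation_db, energy) for name, energy in tracks.items()}
worst_track = max(errors, key=errors.get)

for name, err in errors.items():
    print(name, round(err, 2))
print("worst case over this set of inputs:", worst_track)
```

In this toy setup the bass-free track barely notices the 6 dB bass deviation, while the bass-heavy one is dominated by it, so the worst-case ∆ changes with which inputs are allowed into the set.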


As far as your model of perception goes, I think what you’ve got there is right except you can remove the “target” layer, as that’s really satisfied by the head/ear transfer function. When we talk about a target, it’s really just meant to indicate a reference point for what we think might be suitable based on that particular head/rig. So kind of like… using a head and ear to give an example of what ‘good’ might look like.

On different humans that’s also bound to vary a bit - not only will it look different, but even with person-specific normalization there will be some differences. I think overall we hear things more similarly than we do differently, but yeah there’s still some variation.

As far as happiness is concerned, that’s always a tough question but again your model is a good stab at it. I do think there’s oftentimes a difference between what people think they like and what they actually like (the Gladwellian examples come to mind), so maybe there’s more to happiness than “bass is great so I want it to have all the bass”.

When it comes to the largest audiences, there’s still an enduring trend of ‘extra bass’ marketing out there, in part because for a long time headphones kinda sucked. And in large part, smaller or lower quality transducers struggle with bass reproduction - think about the speakers in your phone for example. This is what a lot of people are coming from. So it makes sense that bass has for such a long time been a determining factor in the public consciousness of ‘sound quality’. In some ways it’s taken on an inertia of its own - to the point where even people who might not like a bass boost in practice still THINK it’s good, and overindex on that for their expectation of what constitutes ‘good sound quality’.

