Headphones are inherently maladaptive devices because they sit extremely close to the ears they're funneling sound into, which makes them different from… basically every other sound source in the world.
Consider a speaker. Because it sits at a distance, its sound radiates into the space we're in and interacts with our whole anatomy (head, shoulders, ears, torso, legs, etc.), just like every other sound in the real world.
Together, these anatomical factors and the effect they have on incoming sound make up our head-related transfer function (HRTF), and our HRTF is always shaping all the sound that reaches us. More than that, because we've been acclimating to it our whole lives, we are also always expecting the sound of our HRTF. With real-world sounds we never need to worry or think about it: they interact with all of this anatomy, so our brain's expectation is met. We don't consciously hear the sounds in the world plus our HRTF; our brain automatically subtracts the part it has gotten used to (the HRTF).
For this reason, we don't need to account for the HRTF with speakers: their sound already interacts with the whole HRTF. This is why we don't use head and torso simulators to evaluate speaker measurements. Our brains already subtract the effects our anatomy imparts to the sound, so a flat measurement microphone is all we need.
With headphones, though, things are different. Because they sit on our heads (and follow our heads when we turn them), they do not interact with our full HRTF. So we need to evaluate a headphone's response in terms of its error relative to our perceptual expectation (the HRTF), which means accounting for the HRTF subtraction our brain still applies to all incoming sound.
In other words, headphones on human heads interact only with our ears, at a fixed distance and angle, and this means two things:
- the acoustic event at the eardrum is not colored by the full HRTF
- the brain still subtracts the HRTF as it does for real-world sounds, which introduces error, because the full HRTF was never imparted to the sound in the first place
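A simplified way to write these two points down, treating every response as a magnitude in dB at frequency f and assuming the brain's "subtraction" is a plain dB difference (a deliberate simplification on my part, not an established perceptual model). Here S is the source's own response as measured with a flat mic, H is the listener's HRTF, C is the coloration a headphone imparts at the eardrum because of its fixed placement and coupling, E is the response at the eardrum, and P̂ is what ends up perceived:

$$
\begin{aligned}
\text{Speaker:}\quad E(f) &= S(f) + H(f), \qquad \hat{P}(f) = E(f) - H(f) = S(f) \\
\text{Headphone:}\quad E(f) &= S(f) + C(f), \qquad \hat{P}(f) = E(f) - H(f) = S(f) + \big[C(f) - H(f)\big]
\end{aligned}
$$

For the speaker the HRTF cancels out. For the headphone the error term C(f) − H(f) only disappears if the headphone's coloration happens to match the HRTF exactly, which, given that it only couples to the ear at a fixed position, it generally won't.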
Let's say I have a pair of well-measuring speakers EQed to a target I like. Measured with a mic, they'll probably show a fairly smooth response above 1 kHz, but measured at my ear they'll show large peaks and dips. My brain subtracts those peaks and dips, because it is used to them being there as part of my HRTF.
Now if I put an HD 650 on my head, the peaks and dips introduced by that headphone's placement relative to my ear (as well as its actual acoustics, of course) will not perfectly match my HRTF, because the headphone is not interacting with it.
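To make the speaker vs. headphone comparison concrete, here's a toy sketch in Python. It models the brain's HRTF compensation as a plain per-band dB subtraction (an assumption, not an established perceptual model), the helper name `perceived_error_db` is hypothetical, and every number in it is invented for illustration. None of it is a measurement of any real speaker, headphone (the HD 650 included), or ear.

```python
# Toy illustration only: all dB values are invented, not real measurements.

def perceived_error_db(eardrum_db, hrtf_db):
    """Return eardrum response minus the HRTF the brain 'expects', per band (dB).

    Assumes the brain's HRTF compensation is a simple dB subtraction,
    which is a simplification rather than an established perceptual model.
    """
    if len(eardrum_db) != len(hrtf_db):
        raise ValueError("responses must cover the same frequency bands")
    return [e - h for e, h in zip(eardrum_db, hrtf_db)]


bands_hz = [1000, 3000, 6000, 10000]
hrtf_db = [0.0, 12.0, 3.0, -4.0]        # hypothetical HRTF, relative to 1 kHz
speaker_flat_db = [0.0, 0.0, 0.0, 0.0]  # speaker EQed flat, measured with a mic

# Speaker in a room: its sound reaches the eardrum through the full HRTF.
speaker_eardrum = [s + h for s, h in zip(speaker_flat_db, hrtf_db)]

# Headphone: hypothetical placement-dependent coloration at the eardrum that
# only partly resembles the HRTF, since the driver sits at a fixed spot by the ear.
headphone_eardrum = [0.0, 8.0, 7.0, -1.0]

for name, eardrum in [("speaker", speaker_eardrum), ("headphone", headphone_eardrum)]:
    errors = perceived_error_db(eardrum, hrtf_db)
    print(name, dict(zip(bands_hz, errors)))
```

Run as-is, the speaker case comes out flat in every band because the full HRTF is added and then subtracted, while the headphone case is left with residual peaks and dips (−4 dB at 3 kHz, +4 dB at 6 kHz, +3 dB at 10 kHz) wherever its coloration fails to match the HRTF.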
This maladaptive character (i.e., headphones do not fully adapt to our HRTF) produces large peaks and dips relative to our HRTF expectation when headphones are measured on human heads. My conjecture is that the amount of bass boost that preference research often arrives at is a band-aid that helps balance out the colorations headphones produce on human heads.