Diffuse Field: Calculate, Characterize, Calibrate

system · February 7, 2024, 12:59am

For anyone making the choice to delve into the realm of headphone and IEM measurements, hearing about Head-Related Transfer Functions (HRTFs) is a question of “when,” not “if.”

Since Crinacle and Headphones.com added the B&K 5128 to our list of measurement fixtures, the Diffuse Field HRTF in particular is being talked about much earlier in the average enthusiast’s journey than it may have been prior.

Despite the ubiquity of the term, I’ve noticed that the wider understanding of what HRTFs are and aren’t—as well as the understanding of why we calibrate our measurements to the Diffuse Field HRTF—is rather incomplete.

Some of the most common questions or misconceptions I see online about HRTFs and our current methodology are:

“What even is an HRTF?”
“Doesn’t Diffuse Field sound bad? Why use a baseline that doesn’t sound good?”
“Why are we compensating to Diffuse Field instead of displaying a target and measurement together raw?”

In this piece I’ll answer all of these questions, along with some others that enthusiasts likely wouldn’t think to ask until they’ve already traveled farther down the headphone measurement rabbit hole.

Fc Construct just put out a very digestible article about our move from GRAS to B&K 5128 data, but in this piece I want to make it crystal clear that buying a fancy new testing rig isn’t the sum of our ambitions for improving headphone measurements.

Our goal isn’t to deprecate one measurement fixture in favor of another. Instead, we’re moving towards integrating multiple test fixtures into a new mode of headphone measurement analysis.

I’ll talk about how we at Headphones.com believe the state of headphone measurements should move forward, the psychoacoustic and interpretive benefits of the change we are pioneering, and how to better utilize headphone measurements for corrective equalization, among other things.

Strap in folks, because this is gonna be another long one.

What is an HRTF?

A “Head-Related Transfer Function” (HRTF) is the sum total effect of how the head and ears affect sound as it arrives at the eardrum from a sound source at a specific point in space. Even minor differences in relative elevation, distance, and angle can have drastic effects on the measured frequency response at the eardrum.

Most of the common HRTFs we see online these days are taken using Head and Torso Simulators (HATS), which are measurement microphones embedded in a human-like manikin to simulate the anatomical factors of human listeners.

The four major Head and Torso Simulators used by most manufacturers or audio research bodies: GRAS KEMAR, HeadAcoustics HMSii.3, and Bruel & Kjaer Types 4128-C and 5128-C

HRTFs serve as a way to characterize the contribution of whatever head or HATS is being measured in that specific circumstance. In other words, HRTFs should be thought of as the frequency response of the measurement microphone (HATS) itself in a specific condition.

A “Free Field” HRTF is a single measurement taken at an ear in an anechoic chamber, from a flat-measuring speaker placed at a specific angle of incidence (azimuth) & elevation relative to the ear.

The below picture shows a group of Free Field HRTFs—solely changing the azimuth and distance—relative to the left ear of a GRAS KEMAR manikin. Seeing as this example entirely leaves out the variable of elevation, the differences we see between source locations may only get more drastic with more measurements taken with changing elevation.

Source: Brungart, D.S., & Rabinowitz, W.M. (1999). Auditory localization of nearby sources. Head-related transfer functions. The Journal of the Acoustical Society of America, 106 3 Pt 1, 1465-79.

Even so, we can see with this limited dataset that the differences between measurements are rather extreme, often on the order of a 20dB difference between source locations.

Interestingly though, we don’t really hear them as huge shifts in tonal color.

More on that shortly, but for now what you really need to know about Free Field is that before Harman, and even before Diffuse Field, the “direct frontal” Free Field HRTF (0 degrees azimuth, 0 elevation) was used as a target for headphone measurements. People designing headphones thought that approximating the sound at the eardrum of an anechoically flat-measuring speaker directly in front of the listener was a good idea.

There were still problems with this paradigm, as was pointed out by one man who challenged this notion.

Why Free Field Isn’t Ideal

In 1986, Gunther Theile published his paper, “On the Standardization of the Frequency Response of High Quality Studio Headphones”. It is the primary source for this article, and I encourage you all to read it (more than I encourage you to read this article).

Whatever I have to say here is a vast oversimplification of the ideas that Theile penned in that original publication, and I will state rather emphatically that any reader would be better served reading the primary source than my summarization. Please go read the paper.

The Diffuse Field HRTF

To get into what Diffuse Field is and why Theile is right, we first need to talk about speakers.

Speakers are physical sound sources that create sound pressure in a room that interacts with your anatomy at a specific angle and elevation. These interactions as shown in the prior section (along with timing and level differences between your ears) produce localization cues: the characteristics of a sound that give our brain a cue that something is coming from a certain place outside of our head.

However, if we were listening to speakers in a Diffuse Field—a space where sound arrives with equal power at all frequencies from all directions—turning our head wouldn’t change frequency response at the eardrum at all, since the sound is arriving with identical power at all frequencies regardless of azimuth or elevation. This would make the sound unlocalizable, seeming to come from either everywhere, or nowhere.

Source: Brüel and Kjaer Product Datasheet for B&K HATS Types 4128-C and 4128-D

These frequency response cues are not the only things that make up localization cues—interaural time differences (ITD) and interaural level differences (ILD) also play a part in our discernment of localization in the world. However, the frequency response differences are the only part that is relevant to using headphones when listening to stereo playback.

It’s worth noting that when listening to music recorded and mixed for binaural playback, ITD and ILD are simulated, because they’re generally recorded using a binaural mic—an artificial head. However, even for this mode of playback, Diffuse Field is generally still the ideal baseline (though perhaps necessitating preference adjustments) for reproduction, since popular binaural microphones like the Neumann KU100 are internally-DF calibrated.

When using headphones and IEMs for listening to stereo playback, there are no consistent interaural time or level differences, and the angle and elevation of the sound source is constant and moves with you. If you turn your head, the frequency response at the eardrum does not change at all. This lack of change means that headphones are, by definition, devices of diffuse localization, often heard as “in-head” localization.
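To make the Diffuse Field definition above a little more concrete: one common way to approximate a DF HRTF is to power-average many Free Field HRTFs taken from directions spread around the head. The sketch below is only an illustration of that idea, not the procedure used for any measurement in this article. The function name, the optional per-direction weights, and the toy data are all assumptions.

```python
import numpy as np

def diffuse_field_db(free_field_db, weights=None):
    """Power-average free-field HRTF magnitudes (dB) into one DF curve (dB).

    free_field_db: array of shape (n_directions, n_frequencies), in dB.
    weights: optional per-direction weights (for example, the share of the
             sphere each direction represents). Equal weights if None.
    """
    free_field_db = np.asarray(free_field_db, dtype=float)
    n_directions = free_field_db.shape[0]
    if weights is None:
        weights = np.ones(n_directions)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize so weights sum to 1
    power = 10.0 ** (free_field_db / 10.0)       # dB -> linear power
    mean_power = weights @ power                 # weighted mean over directions
    return 10.0 * np.log10(mean_power)           # linear power -> dB

# Hypothetical toy data: 3 directions, 2 frequency bins (not real HRTFs)
toy_free_field = np.array([
    [0.0, 10.0],
    [0.0, 0.0],
    [0.0, -10.0],
])
df = diffuse_field_db(toy_free_field)
```

Because the average is taken on power and not on dB values, directions with more energy at a given frequency contribute more to the result than a plain dB average would suggest.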

Localization (or Lack Thereof)

Generally, we don't hear the large changes in frequency response resulting from localization cues in the real world as a tonal shift because our brain is constantly subtracting these massive frequency response changes from our perception of sounds in the world. As Theile says, “in natural hearing, spectral features caused by the directivity of the outer ear are apprehended in such a way that they do not occur as tone color defects.” (Theile 1986)

To reiterate: In the real world, our brain’s interpretation of location-based frequency response changes leaves us with the impression of only localization changing, not tone color.

A simulation (not a real measurement) of how the brain removes localization-based frequency response cues. This leaves our perception with solely the sound of the speaker (or any sound source).

Since headphones don’t change their sound as the listener moves their head, it should be rather obvious that we also don’t perceive changes in frequency response as changes in localization.

To be clear, while some may state that their psychoacoustic impressions of certain headphones have shown differences in “soundstage” or “imaging,” I’ve yet to encounter anyone saying that any headphone—even the most spacious headphone out there—makes the instruments in normal stereo music sound consistently localizable as objects in the same room as the listener.

While we don’t hear changes in frequency response in headphones as proper localization cues, we (perhaps obviously) do perceive changes in frequency response in headphones as changes in tone color. This brings us to an important point in Theile’s paper:

If we want a baseline HATS measurement of a flat speaker to characterize the anatomical contributions of the HATS in a way that makes sense for stereo playback in headphones, we should use an HRTF that has no localization cues baked into the frequency response, because any tonal coloration caused by localization cue(s) baked into an HRTF will only be heard as tonal color in headphones, not as a localization cue.

With the paper containing all of this in 1986, Theile proved that the Diffuse Field (DF) HRTF is the appropriate baseline measurement to characterize the effects that the human anatomy will bring to the frequency response of headphones, as Free Field HRTFs have localization cues baked in that will only be heard as tonal coloration.

It’s important now to delineate the difference between “a necessary microphone calibration” and “a preferential target response.” Diffuse Field is rather uncontroversially the former, and to that effect is actually the international standard. But of course, we know thanks to the work of several research bodies such as Harman that Diffuse Field simply isn’t preferred on its own.

Addressing Harman

Reasonably, when we’ve described our approach to people, we’ve often been met with the question “What about Harman? Diffuse Field isn’t preferred!”

We at Headphones.com are big fans of Harman’s research into listener preference regarding speakers and headphones, and their work has made it rather clear to us that Diffuse Field on its own isn’t preferred. However, there are significant reasons to use Diffuse Field as a baseline instead of their “In-Room Flat” measurement, and the choice largely comes down to which questions you are trying to answer.

Harman In-Room Flat

In 2013, Harman brought two Revel F208 speakers into their semi-reflective (not anechoic, not fully diffuse) IEC listening room, and equalized them flat in the room using a 3x3 array of measurement microphones. They measured the flat-equalized speakers—oriented symmetrically in a typical “stereo listening” angle of ±30 degrees—with a manikin equipped with the GRAS KB0071 pinna. The resulting baseline is henceforth referred to as their “In-Room Flat” baseline, and it is shown below as the black line:

Credit: Listener Preferences for In-Room Loudspeaker and Headphone Target Responses (Olive, et al. 2013)

Harman chose In-Room Flat instead of Diffuse Field as their chosen “flat speaker” measurement upon which to test preference filters. They opted to use this speaker and room configuration—akin to what one may find in a mixing or mastering studio—because a primary goal of Harman’s research in this regard has been to close what Floyd Toole called “The Circle of Confusion,” which describes an unfortunate loop that has befallen the state of audio device evaluation.

Credit: Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms (Toole 2008)

Those at Harman doing research into listener preference in headphones thought it prudent to make their baseline speaker measurement as similar as possible to the circumstances that music is made on—usually a 2-channel speaker setup oriented in a ±30 degree stereo arrangement. Doing so would mean that the preference target they arrived at for headphones would be backwards compatible with speakers, thus going a significant way to standardizing the target response across both devices, and potentially closing the Circle of Confusion.

While this is a goal worth pursuing, and this research helps answer some very important questions about headphone measurements, it doesn’t answer every question.

For example, the above In-Room Flat measurement is rather smoothed, so the specific features of the KB0071 ears used for the measurement are lost to the smoothing. This negatively impacts our ability to fully characterize the contributions of the ears to the measurements of headphones made using said ears. Additionally, this ±30 degree stereo setup may add what listeners perceive as tonal color, since there is a strong theoretical basis for humans perceiving frequency response changes as tonal color, not as localization cues, when listening to stereo music.

SRF vs DF

Regardless of the methodological differences between these two baselines, we can see above that Harman’s semi-reflective field (SRF) In-Room Flat measurement and the DF HRTF of the GRAS KEMAR equipped with the same pinnae are really not that different. Indeed, the fact that In-Room Flat includes significant contribution from room reflections is likely why these two baselines are closer than not.

This marginal difference is why we at Headphones.com feel content taking the spread of corrective EQ filters applied to Harman’s chosen baseline (more commonly known in the form of Harman’s 2013, 2015, and 2018 targets, as well as the 2019 segmentation paper) and using them on our chosen Diffuse Field baseline instead.

As the theoretical basis for DF being ideal for stereo playback in headphones is still compelling, we feel it prudent to use DF instead as we still see it as compatible with Harman’s research, while also having benefits in answering questions that Harman’s work simply did not answer.

Additionally, we’re exchanging a single target curve for a range of preference based on Harman’s work, as this better serves the truth that Harman rather robustly showed: listener preference varies.

The preference bounds we’ve drawn from Harman’s research

We think our methods here are in-step with the goals and conclusions found in Harman’s research, while deviating enough to give us freedom to expand upon their findings by using data from multiple measurement fixtures—and we’ll get into why this is so important shortly.

While preference is very important, I want to focus now on the idea of a “necessary microphone calibration”, as it’s an idea that hasn’t been explored in our corner of this hobby nearly enough.

Microphone Calibration: It’s Good And Normal, I Swear

One of the biggest questions I’ve encountered in the last few months has to do with what we at Headphones.com used to call “target compensation,” but are now calling “calibration.” And here I want to drive home one of the (many) important aspects about our choice to calibrate our measurements using the Diffuse Field HRTF:

The Diffuse Field HRTF is not a “target” curve. It is the most psychoacoustically relevant baseline measurement that we can use to characterize the expected contributions of the HATS to a headphone measurement.

Simply put: it’s mic calibration.

Think about measuring a speaker with a not-perfectly-flat microphone: If you just wanted to see a measurement of the speaker, you would calibrate away the frequency response of a microphone, right? This is why anyone measuring audio devices (except in recent years with headphones) subtracts the expected frequency response of the measurement microphone.

Calibrating a HATS measurement should be thought of in exactly the same way as the common practice of using a measurement mic calibration file, as a HATS is literally a measurement microphone.

Subtracting the relevant DF HRTF both removes the total sonic contributions of the head and ears expected at the eardrum of the listener and accounts for the expected psychoacoustic circumstance of diffuse localization in headphones and IEMs, imparting no extra color from localization cues.

The components of a raw headphone measurement isolated to show what components remain after calibration

After this, we are ideally left with only the response of the headphone and the result of the specific interaction between the headphone and the ears/head—often called HpTF (Headphone-to-ear Transfer Function). HpTF describes the unique impedance interaction between headphone and ear—how the headphone load changes the response at the ear, and how the ear load changes the response of the headphone. We’ll come back to this in a later section.
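In practice, the calibration described above is a subtraction in dB at each frequency. The following is a minimal sketch under a few assumptions: both curves are magnitude responses in dB, the HRTF is interpolated onto the measurement's frequency points along a log-frequency axis, and every name and number here is made up for illustration.

```python
import numpy as np

def calibrate_db(meas_freq_hz, meas_db, hrtf_freq_hz, hrtf_db):
    """Subtract a rig's DF HRTF (dB) from a raw headphone measurement (dB).

    The HRTF is interpolated onto the measurement's frequency points using
    a log-frequency axis, then subtracted point by point.
    """
    hrtf_on_grid = np.interp(
        np.log10(meas_freq_hz),    # where we want HRTF values
        np.log10(hrtf_freq_hz),    # where the HRTF is known (must be ascending)
        hrtf_db,
    )
    return np.asarray(meas_db, dtype=float) - hrtf_on_grid

# Hypothetical toy data (not a real headphone or a real HRTF)
freq = np.array([100.0, 1000.0, 3000.0, 10000.0])
raw = np.array([0.0, 2.0, 12.0, 5.0])
rig_df_hrtf = np.array([0.0, 2.0, 10.0, 4.0])

calibrated = calibrate_db(freq, raw, freq, rig_df_hrtf)
```

What remains in `calibrated` is, ideally, the headphone plus its HpTF interaction with that rig, which is the quantity the rest of this article works with.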

When we don’t calibrate our measurement microphones, we open ourselves up to a whole host of perceptual and methodological woes when it comes to interpreting measurements. No matter how smart we think we are, we are all subject to misusing and misinterpreting data when we have unspecified or uncontrolled parameters clouding our analysis.

Raw data is both more cluttered and harder to interpret without making mistakes. Additionally, not calibrating using the Diffuse Field baseline makes it impossible to do important and rather interesting things with headphone measurements that we should take advantage of as soon as possible.

Why Calibrated Data is Better

When we calibrate our measurements with a DF HRTF, they become both easier to digest and more useful for a few reasons (aside from obvious things like reducing clutter):

The Sine Illusion
Comparability to speakers
Isolation and characterization of measured variables
Comparability to headphones measured on other heads
EQ

Let’s start with the first and work our way down.

The Sine Illusion

In the above graph on the left, we have two lines trending in similar ways. Let’s say the top one is the headphone, and the bottom one is the target response, with the left side being “bass” and the right side being “treble.”

If we were to do our common practice of “target compensation,” subtracting the bottom line from the top line to see the difference between the two as an “error curve,” what do you think the resulting curve on the right would look like?

At a glance, one may see the two lines starting at an equal distance apart, trending closer as they travel farther to the right. This means the resulting “calibrated” measurement would be flat in the bass, trending downward in the treble (since the distance between the two shrinks).

The correct answer is shown below here.

And more examples are shown in the link here.

This guided interpretation is a very brief example of what’s called “The Sine Illusion.”

In short, humans are bad at discerning the difference of two curved but similarly-trending traces.

The Sine Illusion is a perceptual landmine that I have seen enthusiasts step on numerous times when looking at raw headphone measurements against target curves, often leading to incorrect interpretation of the data. We should take any opportunity we can to minimize the Sine Illusion whenever possible, as doing so minimizes the chance of data misinterpretation.
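A quick numerical sketch can show why the eye gets this wrong. The two curves below are hypothetical (they are not the traces from the graph above): one is a sine-shaped "target" and the other is the same shape shifted up by a constant 3 dB. Subtraction gives a perfectly flat error curve, yet the shortest distance between the two traces, which is roughly what the eye tends to judge, shrinks wherever the curves are steep. Equal axis scaling is assumed for the distance calculation.

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 400)   # arbitrary "frequency" axis
target = 5.0 * np.sin(x)                 # hypothetical target trace
headphone = target + 3.0                 # same shape, constant 3 dB offset

# What subtraction actually yields: the vertical difference at each x
error_curve = headphone - target

# What the eye tends to estimate: the shortest distance from each point on
# the upper trace to any point on the lower trace (in plot units)
dx = x[:, None] - x[None, :]
dy = headphone[:, None] - target[None, :]
visual_gap = np.sqrt(dx**2 + dy**2).min(axis=1)

print("error curve range:", error_curve.min(), error_curve.max())
print("visual gap range:", visual_gap.min(), visual_gap.max())
```

The error curve never moves, while the apparent gap varies a great deal along the trace. Plotting the error curve directly removes the need to estimate it by eye.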

That being said, the Sine Illusion isn’t solved perfectly by Diffuse Field calibration on its own. The ideal solution would be a target that is universally regarded as “expected neutral,” making the error curve of the measurement “deviation from expected neutral.” This would mean we only need to pay attention to one line to know how something may sound, because flat would equal good… but most of us in headphone measurements know that no such target—that works exceptionally across different heads and transducer types—exists.

Even if it’s not a perfect fix, when you DF calibrate you are often making one of the lines—a target based on preference, most commonly—much simpler in its shape by removing the main HRTF factor that targets like Harman have.

While DF calibration isn’t perfect for solving the Sine Illusion, it’s a big enough upgrade to matter. With solely DF calibration you may be a little more exposed to the Sine Illusion than when compensating directly to a preference target, but the other things you gain make the sacrifice worth it.

Comparability to Speakers

When using the best measurements of speakers we have available based on CEA2034 data output from the Klippel NFS System, the ideal way to read them for perceptual relevance is looking at the Estimated In-Room Response, which usually shows a good speaker to have a broadly downsloping response akin to a good headphone after Diffuse Field calibration.

Speaker sound is of course very different from headphone sound. Speaker sound conforms and adapts to your full HRTF, because it is actually interacting with all of it. Thus our brain does as it normally does, subtracting the effects of our specific anatomy & resulting location-based FR cues, leaving us only with “the sound of the speakers.”

This can be safely assumed, and this is why we generally don’t measure and judge speakers with HATS microphones: being able to assume the interaction between speaker and listener greatly reduces the variables we need to account for and makes well done speaker measurements among the easiest to interpret.

Headphones unfortunately still have to take the full head interaction into account, as the human ear expects the interaction of the full head. However, once we have fully-characterized the contributions of the ears and the head with the DF HRTF, subtracting those contributions from a headphone measurement gives us a result that is much more comparable to the Estimated In-Room Response of a speaker, only with the unavoidable HpTF effects also embedded in the frequency response.

I’ve personally had a lot of fun seeing how headphones roughly compare to speakers using this method, and it has been interesting to see how even “well-tuned” headphones differ above 1 kHz from the behavior of good speakers—the biggest difference being a larger overall magnitude of peaking and dipping on headphones.

While again, it’s not perfectly comparable, it’s a big enough step forward in the realm of comparability between headphones and speakers that I wouldn’t be surprised if this new method of comparison informed the design goals for headphone manufacturers going forward (or already has behind the scenes).

Isolation and Characterization of Variables

How do we know if a feature on a headphone measurement is due to the headphone, the head, or the interaction between the two? The only way to isolate these variables is to objectively characterize the parts of the system as best as you can.

To my knowledge we have never been able to successfully and consistently isolate the variables of the HpTF interaction and the headphone. I personally wouldn’t be surprised if we never characterize a consistent transfer function between the two. Headphones interact in close enough proximity with whatever measures them that their response will always be rather load-dependent. In other words, whatever fixture the headphone is placed on—and the way it’s placed—will always affect the way the headphone itself behaves, and we would expect this behavior to occur on humans similarly to how it occurs on HATS rigs.

What we can characterize and isolate is the expected anatomical contribution of the human hearing system in a controlled environment—in this case, the environment most relevant to headphone listening—by subtracting the Diffuse Field HRTF of the rig measuring the headphone. This alone can lead to measurements being much less confusing.

Do you happen to remember Resolve, Crinacle or Oratory1990 mentioning the 9 kHz dip on their GRAS 43AG/45CA measurements, like in the below measurement of the HD 650?

There were myriad explanations for this dip across our community. Some said it was simply an artifact that wasn’t present in real ears, while some said it was present but specific to the interaction between that headphone and that ear. Some even claimed it was both present and necessary for “good sound,” or even things like soundstage.

When it comes to our messaging on Headphones.com, we’ve generally been clear that this was—at least in part—an issue with evaluating a fine-grained medium (headphones) against a coarse-grained target (Harman). Since we didn’t have the DF HRTF to characterize the head, we simply couldn’t be 100% sure what this feature was or what it meant.

I certainly would never have guessed that such a precipitous dip was present in the DF HRTF of the rig itself. And yet:

This DF HRTF measurement was measured and calculated by Blaine LaCross with the help of oratory1990. It is still currently in the process of being independently verified by a third party.

The large dip in this region seen on multiple headphones measured on this fixture isn’t a feature of the headphone or its interaction with the rig—it is (pending verification and validation) a feature of the rig itself! Thus we get our example of why DF calibration helps to clarify what would otherwise be more confusing: Being able to attribute features to the rig and calibrate them away can make the data we actually want to look at easier to interpret.

DF calibration means we can whittle a headphone measurement down to as few variables as possible: the headphone, and the way the headphone interacts with the head it’s being measured on. This isolation & characterization of variables is important on its own, but also crucial for the next big benefit of this method.

Comparability Between Measurements on Different Heads

Early on in our transition towards Diffuse Field, we at Headphones.com mentioned the comparability between measurements on different heads being a feature of DF calibration that we were excited to explore, but I think our messaging could’ve been better on that—some people ended up confused about what we meant.

I think our prior use of the word “compensation” caused people to think that after DF calibration, measurements from different heads would look nearly or exactly the same.

Of course, at this point we know that we can only isolate the response of the HATS—the DF HRTF—from the measurement, and can’t consistently calibrate away the interaction between a headphone and the head itself. This is where HpTF comes back into play: there will always be differences in the way the same headphone behaves on different heads thanks to HpTF.

For example: If we measure a Sennheiser HD 650 on five of the major rigs—B&K 4128 and 5128, GRAS 43AG/45CA equipped with either the KB006x or the KB50xx ears, and the HeadAcoustics HMSii.3—and apply rig-specific DF calibration for each measurement, we get a result that may look something like this:

This may seem a worryingly scattershot result upon first glance, but worry not! What you are seeing is something rarely if ever actually shown in the discourse surrounding headphone measurements: the spread of variation in how a headphone may respond on different heads!

When you put an HD 650 on multiple people’s heads, they’re not all going to report it sounding exactly the same, are they? Of course not! We know audio is anything but simple, and we know rather well that two people can and almost certainly will hear the same headphone very differently.

To someone with GRAS KB006x-like ears, HD 650 may indeed sound very dark above 6 kHz (Sennheiser veil, anyone?). But for someone with B&K 5128-like ears, HD 650 may even sound bright or thin, causing them to be especially surprised given the narrative surrounding HD 650 is “warm-neutral reference headphone”.

Who’s really right? Both of them!

This is, to me, the most obvious benefit of Diffuse Field calibration, and why I’m so excited to help Headphones.com bring forth this new paradigm of measurement analysis. If we’re trying our best to triangulate why people’s impressions may differ, this is by my estimation the closest we have ever been to having evidence of how they may differ.

Illustrating the range of behavior that a headphone can exhibit when placed on multiple heads has made clear to me that much of the disagreement you see online between enthusiasts is rather easily explained. Even discounting the private language argument Resolve loves to talk about, headphones are objectively behaving rather differently between heads. Now that we can see these differences as plainly as any measurement, hopefully we can start to palliate a lot of confusion (and aggression) that arises when it comes to incongruity of impressions.

Another benefit of being able to see the range of performance on different heads is that we can see which headphones adapt—keeping their tonal profile consistent—relatively well between different heads… and which headphones don’t. This is an aspect of performance I’m greatly interested in testing further, as wide variability in performance across users is something I think any manufacturer or consumer would want to minimize.

In my last big article The Shape of IEMs to Come, I mentioned how the Sennheiser HD 800(S) was among the lowest “acoustic output impedance” designs out there, which plays a part in its placement variation being among the lowest in over-ear headphones. HD 800S also has some of the lowest variation between measurements on different rigs of any headphone I’ve seen. This means that the tone of the HD 800S would likely be heard very similarly between a range of people: scooped in the upper-midrange around 2 kHz, with a large peak somewhere between 5-7 kHz and maybe another peak between 10-14 kHz.

On the other hand, with something like the Hifiman HE400SE shown below, we might find people having wider variations of impressions above 1 kHz. I could absolutely see some listeners calling it too dark, while I could see others calling it too bright. In fact, I could see a listener having complaints about elevations in basically any part of the tonality above 3 kHz here. Unfortunately, it would be hard to predict exactly where any treble peaks would be without testing the headphone yourself.

In short, interpretation of a single headphone measurement on a single head has never been all that useful in predicting how a headphone will perform across a variety of users. Utilizing the ability to see how a headphone performs on various heads is, by contrast, potentially very instructive. It may bolster our ability to communicate our own experiences, and maybe even aid in our understanding of how a headphone should be designed to maximize sonic performance for as many users as possible.

EQ

Even if you’re one of the many readers not into EQing your headphones, I still encourage you to read this section, because it’s likely most people’s past attempts at EQ were, through no fault of their own, flawed at best.

The discourse and common practice surrounding EQ are partly to blame here, as the most common mode of EQing until only recently was preset-based EQ to Harman 2018, either via Jaakko Pasanen’s incredibly convenient AutoEQ tool, or through Oratory1990’s widely-available Harman 2018 EQ presets.

Unfortunately, these processes largely (though not completely) ignore the factor of variation between listeners’ heads, which we can see above is potentially rather more dramatic for some headphones than we may have previously assumed. They also don’t account for the characterization of the head most of the measurements use: as we can see above, the >6 kHz region on the KB50xx DF HRTF is much more jagged than the >6 kHz region on the Harman Target.

But what if we compensated for the variation between heads somewhat, finding an “average” of the spread of headphone + HpTF interactions and using that average for EQ?

This is precisely what I recommend, in fact, and what only DF calibration with measurements on multiple heads makes possible.

My website allows you to “Average All” measurements in the graph field, but you can also use Room EQ Wizard to produce an averaged DF calibrated measurement.

Once you have measurements of a headphone measured on multiple heads with DF calibration, you can average this data to have what is essentially an average measurement of the headphone + the average interaction it has with a head.

At the same time, you have an excellent visual representation of the range of possibilities in the “high uncertainty” region above 1 kHz that will contain features the headphone may or may not exhibit on your specific head.
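As a rough illustration of the averaging idea, here is a minimal Python sketch. It assumes every measurement is already DF calibrated, expressed in dB, and sampled at the same frequencies. The head names and all values are invented for the example, and a plain arithmetic mean of dB values is only one of several possible averaging choices.

```python
from statistics import mean

# Hypothetical DF-calibrated magnitudes (dB) for one headphone on three heads,
# all sampled at the same frequencies. Every number here is invented.
freqs_hz = [100, 1000, 3000, 6000, 10000]
heads_db = {
    "head_a": [0.0, 0.5, 1.0, 4.0, -2.0],
    "head_b": [0.5, 0.0, 2.0, 6.0, 5.0],
    "head_c": [-0.5, 1.0, 0.0, 5.0, -6.0],
}

def average_and_spread(curves):
    """Return (mean_db, spread_db) per frequency point across all heads."""
    columns = list(zip(*curves.values()))  # one tuple of dB values per frequency
    mean_db = [mean(col) for col in columns]
    spread_db = [max(col) - min(col) for col in columns]
    return mean_db, spread_db

avg_db, spread_db = average_and_spread(heads_db)
for f, a, s in zip(freqs_hz, avg_db, spread_db):
    print(f"{f:>6} Hz  average {a:+.1f} dB  spread {s:.1f} dB")
```

In this made-up data the spread stays small at low frequencies and grows large at 10 kHz, which is the kind of “high uncertainty” behavior the average alone would hide.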

For example, I’ve EQed this DF calibrated and multiple-heads-averaged HD 800 measurement above to roughly fit a straight -1 dB/octave tilt under 3 kHz (but you could very well use Harman filters if you prefer that target).
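To make the tilt concrete, a -1 dB/octave line can be written as slope × log2(f / f_ref). The sketch below is only an illustration: the 1 kHz reference point (where the target sits at 0 dB), the function names, and the measurement values are all assumptions, not data from the article.

```python
import math

def tilt_target_db(freq_hz, slope_db_per_octave=-1.0, ref_hz=1000.0):
    """Target level in dB for a straight tilt that passes through 0 dB at ref_hz."""
    return slope_db_per_octave * math.log2(freq_hz / ref_hz)

def correction_db(freqs_hz, measured_db, max_hz=3000.0):
    """Gain needed at each point at or below max_hz to reach the tilt; None above it."""
    out = []
    for f, m in zip(freqs_hz, measured_db):
        out.append(tilt_target_db(f) - m if f <= max_hz else None)
    return out

# Invented averaged measurement, in dB relative to the DF baseline.
freqs_hz = [125, 250, 500, 1000, 2000, 8000]
averaged_db = [1.0, 2.0, 1.5, 0.0, -2.0, 4.0]

corrections = correction_db(freqs_hz, averaged_db)
print(corrections)
```

Points above the 3 kHz cutoff are deliberately left as None, since that region is better adjusted by ear than fitted automatically.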

I can see that there will be an elevation around 5-7 kHz on pretty much all heads, but I also see that anything goes after that point. I’ll need to check the area above 3 kHz rather carefully by ear to see exactly what corrections sound best. After doing so, I got the below (excellent sounding, to me) result:

The KB50xx measurement, for example, shows a very large, high-Q peak right around 10-11 kHz where I hear one, and it bothers me a ton. Because I can see it happening on at least one head, I don’t feel weird at all using a ~6 to 9 dB dip to correct for a peak roughly in the same spot. Without this measurement data, I may have felt hesitant to make large corrections like that, even if it improved the sound.
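For readers curious what such a dip looks like as a filter, here is a hedged sketch of a single parametric band. It uses the widely known peaking-filter formulas from Robert Bristow-Johnson’s Audio EQ Cookbook. The 10.5 kHz center, -7.5 dB gain, Q of 5, and 48 kHz sample rate are illustrative assumptions, not the settings actually used for the result above.

```python
import cmath
import math

def peaking_biquad(f0_hz, gain_db, q, fs_hz=48000.0):
    """Coefficients (b, a) of an RBJ-cookbook peaking EQ biquad."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    return b, a

def magnitude_db(b, a, freq_hz, fs_hz=48000.0):
    """Magnitude response in dB of the biquad at a single frequency."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))

# Hypothetical high-Q cut aimed at a peak near 10-11 kHz.
b, a = peaking_biquad(f0_hz=10500.0, gain_db=-7.5, q=5.0)
for f in (1000.0, 9000.0, 10500.0, 12000.0):
    print(f"{f:>7.0f} Hz  {magnitude_db(b, a, f):+.2f} dB")
```

The filter reaches its full cut at the center frequency and relaxes toward 0 dB on either side. A higher Q narrows the dip, which is why a correction like this is best confirmed by ear.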

Unfortunately though, there are still significant speed bumps to this method:

It is best to avoid using a single headphone measurement per head; unit and placement variation still exist.
Not all DF HRTFs are created equal (we’ll talk more about this in another post).

That being said, DF calibration across measurements on multiple heads provides a more thorough look into what I’ll call “the range of possible outcomes” of a headphone’s performance. When using an averaged measurement for EQ, one can expedite corrections for the parts that likely won’t vary hugely (but still may, depending on the headphone), while getting better insight than we had before into what kind of behavior is within the realm of possibility in the “high uncertainty” area above 1 kHz.

Conclusion

My colleague Fc Construct wrote an excellent piece detailing the step forward from the GRAS to the B&K measurement systems, so I thought it fitting that I provide a companion piece to aid in our step forward from raw to calibrated data.

The interpretation of headphone measurements across our hobby has been woefully bereft of the depth of analysis that a problem as multifaceted and complex as “headphones” merits. We have been using one head (GRAS) with one target (Harman) for years now, and many enthusiasts have—reasonably—shown misgivings about drifting slightly from this paradigm.

However, to us, trading a single head and single target for multiple heads and a range of preference simply seems like the right way to account for how listeners’ experiences actually differ.

I hope by now I’ve convinced some readers that DF calibration offers tangible benefits in terms of measurement legibility, understanding the separate contributions of the head and headphones, comparability of headphones to speakers, comparability of the same headphone on multiple heads, and corrective equalization.

In our view, continuing to prioritize raw measurement data on a single head would deprive the discourse and our community of these benefits, and for that reason we’re opting to display DF calibrated measurements along with our preference bounds derived from the existing headphone and speaker preference literature. I personally feel this method both best captures the range of possible outcomes and gives us further insight into how something as subjective as “headphone sound” ought to be judged or even talked about.

Blaine’s article summing up our newest endeavor in data presentation itself is forthcoming, and with that we should finally be all caught up on information we need our readers to know in order to understand what we’re doing with measurements. This article mostly served to explain why we’re doing it.

We’ll try our best to keep improving things for everyone in our space, so measurements become both easier to interpret and use, and harder to misinterpret and misuse. Thanks so much for sticking with us as we evolve and grow. Until next time!

If you have any questions about this article, feel free to ping me in our Discord channel, which is where me and a bunch of other headphone and IEM enthusiasts hang out to talk about stuff like this. Thanks so much for reading.

This is a companion discussion topic for the original entry at https://headphones.com/blogs/features/diffuse-field

haidrojyn · February 7, 2024, 4:34pm

Excellent read, @listener. Thanks for all the effort you’ve put into writing this. Crazy to see the measurement Blaine and oratory did!

Rael67 · February 8, 2024, 6:17am

Great article. I learn very much in this forum.

AudioTool · February 8, 2024, 6:23pm

This is excellent information! I especially appreciate the “Comparability Between Measurements on Different Heads” section as I’ve often wondered about how much influence the interaction between the headphone and measurement rig has.

Sounds good. Where is your website?

testmonkey · February 8, 2024, 7:10pm

Such an interesting article! My comment is a bit in the weeds but …

I just wanted to remind everyone of something from a quote I heard and don’t know the source of, “what is measured is what gets done”. Having a common approach to a better measurement will give manufacturers a way to design out undesirable characteristics shown up by that measurement. Inconsistency increases uncertainty and leads to poorer design decisions.

taronlissimore · February 8, 2024, 8:59pm

Currently it all lives on @listener’s squiglink here: Listener’s EQ Playground. You can play around with the EQ playground, scraped IEM and headphone measurements, etc…

Rael67 · February 10, 2024, 1:01pm

First of all, sorry for my bad english. I have a question concerning the spread of variations when measuring a headphone on different rigs. Lets take for example the Senn HD 650, that was measured on the five different rigs. Now you use a different set of earpads for this headphone that will give you for example less bass response. When you now measure the headphone again on the same rigs. Will the effect of using the different set of earpads be identical for all rigs? Otherwise spoken when you compare „rig for rig“ the FR graph with the first set of earpads and the different set of earpads, will the difference be identical?

SwedishMike · February 11, 2024, 3:11pm

Thanks you for the effort and new insights it brings to me.

Mike

listener · February 19, 2024, 12:52am

All of the units in that measurement of HD 650 on five different “heads” are averages of multiple units, in an effort to minimize pad-wear variation and unit variation

AudioTool · June 15, 2024, 12:47am

@listener I just re-read this article and I got more out of it the 2nd time. This time I understood the distinction between using DF HRTF as a calibration and using it as a target better.

Also I think the difference between HRTF and HpTF is clear to me now. If I understand correctly, to answer the question of “How does my personal anatomy affect how I hear this headphone?”, we have to account for HRTF + HpTF. Neither term includes the effects of the other. If that’s right then is there a term for the combination of the two? I think most people use HRTF to mean the combination.

Also I thought this Sonarworks Blog post was relevant to this topic: Top 10 best translating headphones - Sonarworks Blog

listener · June 17, 2024, 2:10pm

If that’s right then is there a term for the combination of the two?

I guess you’d just call it the total in-situ frequency response? I don’t think there’s an acronym for it but yes, it is a mix of HpTF + HRTF that we have to be accounting for.

That Sonarworks blog is interesting. I wish they published the underlying data or talked more about their methodology for gathering data.
