@Ishcabible asked me to post here with regard to my observations on the usability of cheaper measurement systems and their issues, but given the focus of this thread, let me lead with how headphones are properly measured, and then explain the issues of the cheaper options, as it may give some context.
Warning: boring nerd stuff ahead.
For those not inclined to read that sort of thing, the TL;DR is that the EARS is more suitable than an earless flat plate for use in headphone measurements, but still noticeably flawed - but that the real systems for this sort of work cost quite a lot more than most people are inclined to pay.
@Torq has already covered the majority of the measurement chain in his first post - in the modern day, we typically start our generation process and end our analysis process in software, with a DAC and ADC (and our DUT) in the middle - and mentioned my favourite software suite for this purpose (ARTA), so I’m going to focus here on the really troublesome part of the system: the coupler that lets us measure a headphone with a microphone.
Conceptually, a headphone is a lot like a speaker - indeed, you can make a headphone from speaker drivers, or a speaker from headphone drivers (yes, those are Grado SR drivers in a line array) - and thus a lot of folks’ first inclination is to attempt to measure them like a speaker: in free field with an omnidirectional microphone. The results one gets from this will immediately show the problem: there will be essentially no low frequencies, and the higher frequency response is unlikely to match subjective perception either. The reason for this is simple: headphones, with a scant few exceptions, are designed to be used while acoustically coupled to your ears, and when that coupling is lost, their performance becomes quite nonrepresentative.
To address this, one could embed a microphone in a plate, allowing the earpads to seal and creating a pressurized air chamber, just as we see on the head. Long ago, you might have seen this sort of design as standard test equipment (although if so, it was so long ago that I haven’t seen a standard for it), and you still see designs along these lines used for assembly line QC (GRAS 45CC, for example), as well as in the DIY measurement rigs some people construct.
Unfortunately, this still doesn’t match the conditions on the human head - we don’t just have a flat expanse of flesh with an eardrum in the middle, we have ear canals and pinnae that present a unique acoustic load to the headphone. The simplest attempt to mimic this which lives on to the modern day - although it really shouldn’t - is the IEC60318-1/“318 coupler”. It has an aperture roughly the diameter of the concha cavum, and includes an acoustic resonating circuit to roughly approximate the ear’s impedance. This was a step closer to accuracy, but sadly still quite far away, and I have rarely seen 60318-1s used for headphone measurements in the modern day (and to ill effect when they were - as an example, a measurement by Axel Grell using a Brüel & Kjær 60318-1 sim, displaying the HD600’s infamous 6khz peak).
The lack of an accurate pinna substantially compromised the accuracy of the 318, and its flat metal plate, on which the headphone’s earpad is placed and held either by a clamp of sorts or by gravity, doesn’t really match how headphones fit on heads. Fortunately, the science of acoustically simulating humans marched on with time, producing the Head And Torso Simulator (HATS). Originally made for other sorts of acoustic testing - the oldest widely produced one, the KEMAR, was made for hearing aid and other audiometric purposes - they include an IEC60318-4/IEC711 coupler internally for simulation of the ear impedance, and use a standardized “average” pinna externally. The result is a reasonably accurate acoustic approximation of a human being, at least up to around 8-13khz (depending on your level of strictness), which is a pretty good testing tool for headphones intended for humans. Tyll Hertsens of Innerfidelity used a Head Acoustics HMSII.3 for his extensive headphone measurements, among many others.
Head And Torso Simulators which comply with the pertinent specs (ITU-T P.58, IEC60318-7, etc) are the standard in the industry for measuring headphones, and while they are not perfect, they are the best we have, and yield generally quite perceptually relevant results. Unfortunately, they are also very expensive - the least costly I know of is US$22,0000 new - putting them out of reach of enthusiasts, and making them relatively pricey even for laboratory use. As a result, GRAS produces an anthropomorphically accurate ear on a plate with a 60318-4 simulator, the 43AG, which retails for a comparatively moderate 5000-6000EUR to my understanding. It does not directly comply with an IEC standard for test fixtures, but with an accurate ear impedance and pinna design, it is quite suitable for the task - Sean Olive is an iconic user of the 43AG, and the Harman target was derived using one (mounted in a styrofoam head, or so I’ve heard).
There exists an additional, related category of couplers which feature a pair of ear and cheek simulators on a large, metal fixture - the oldest I am aware of is GRAS’ 45CA, but recently both Audio Precision and Larson Davis have released very similar units - which is intended for measurements of hearing protection, but may also be used for measurements of headphones. Such designs allow a more head-like mounting than standard ear-and-cheek systems, while costing less than HATS, and it is my understanding that many headphone manufacturers use them for R&D and QC purposes. Like the ear and cheek sims, they lack accurate head geometry, but this is likely not a major difference for most headphones outside of coupling variation and the occasional real oddball like the K1000.
This is not the full history of the simulation of the human ear - and the IEM folks may be peeved at my omission of the 2CC coupler - nor a complete rundown of the market for ear and head simulation (a concept too boring for even me), but I hope it provides some context on how designs have developed toward the current standard and some of the options that presently exist.
So, that’s the state of the field as stands - we make acoustical simulators of human ears and heads, and the expectation is that you’re using one if you are measuring headphones, and, particularly, one which has anthropomorphic ears and a 60318-4 coupler in it - but why do you need to care? It would seem, from a quick look around the internet, that a lot of people are doing measurements without these simulators and not having any problems, so why go through all the trouble to make these (and spend all the money to buy them)? Can’t we just take a cheaper system - like a microphone in a flat plate - and apply a compensation to have its response match the fancier sims? That would appear to be the attitude expressed in these posts - that the difference between the couplers is, for the most part, consistent between headphones, and thus that we can make relative comparisons between headphones even on an inaccurate coupler. Certainly, that’s been the position I’ve heard most commonly from people using DIY measurement systems - the premise of presenting a measurement of a “reference” headphone so that the relative deviations in frequency response of the system can be seen is quite common, for example.
Unfortunately, this is not the case - different headphones will interact differently with a given coupler, and when your coupler deviates substantially from being anthropomorphic, those deviations aren’t guaranteed - or even likely - to match the ones that’ll happen on your head. As a demonstration, I have measured three headphones which I have on-hand and which I believe are fairly widely measured (the Sennheiser HD800, Hifiman HE-560, and Denon D2000) on four systems: my standards-compliant HATS (a Brüel & Kjær 4128C), a MiniDSP EARS, a DIY flat plate rig using a Dayton EMM6 microphone, and an older HATS which omits the 60318-4 coupler (a Neutrik-Cortex MK2).
This looks like a bit of a mess (and that’s after I averaged each of the five measurement sets that were taken on each system!), so let’s turn it into something more comprehensible. This is the difference between a given headphone on my 4128C and one of the other measurement systems, with all blue traces being differences between headphones on the 4128 and the EARS, green being 4128 and flat plate, and yellow the 4128 and the older, 60318-4-less HATS. All traces are aligned at 1khz, and you can likely disregard most of the variation below 500hz-1khz¹ due to the influence of leaks (the old HATS, in particular, is a bit of a pin head, so stuff leaks a lot more on it), but what remains is interesting: while the EARS and older HATS deviate mostly in the same direction with all three headphones out to 8khz or so, the flat plate is all over the place (and the EARS has a greater degree of deviation between the different headphones).
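For anyone who wants to try this sort of comparison on their own data, here is a minimal sketch of the averaging, 1khz alignment, and differencing described above. It is not the processing used for the plots in this post: the function names, the choice to average in dB, the sign convention (other system minus reference), and the numbers in the example are all assumptions made purely for illustration.

```python
from statistics import fmean

def average_runs(runs):
    """Average repeated measurements point by point.
    Each run is a list of levels in dB on the same frequency grid.
    Averaging the dB values directly is an assumption of this sketch."""
    return [fmean(vals) for vals in zip(*runs)]

def align_at(freqs, levels_db, ref_hz=1000.0):
    """Shift a trace so the grid point nearest ref_hz sits at 0 dB."""
    idx = min(range(len(freqs)), key=lambda i: abs(freqs[i] - ref_hz))
    offset = levels_db[idx]
    return [v - offset for v in levels_db]

def system_difference(freqs, reference_runs, other_runs, ref_hz=1000.0):
    """Difference (other minus reference) between two measurement systems
    for one headphone, after averaging and aligning both at ref_hz."""
    ref = align_at(freqs, average_runs(reference_runs), ref_hz)
    other = align_at(freqs, average_runs(other_runs), ref_hz)
    return [o - r for o, r in zip(other, ref)]

# Made-up example values, NOT real measurement data.
freqs = [500, 1000, 2000, 4000, 8000]
hats_runs = [[90, 92, 95, 98, 88], [92, 94, 97, 100, 90]]
plate_runs = [[85, 86, 87, 84, 80], [87, 88, 89, 86, 82]]

diff = system_difference(freqs, hats_runs, plate_runs)
for f, d in zip(freqs, diff):
    print(f"{f:>5} Hz: {d:+.1f} dB")
```

Real measurements would need both systems exported on the same frequency grid (or interpolated onto one) before this works.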
Here’s the same data visualized as the highest and lowest difference at each frequency, with each colour representing a different system. If the two lines are close together for a band, we can (at least for these headphones) pretty safely apply a compensation for frequency response in this area and expect it to apply consistently. Where the lines are far apart, different headphones are varying by different amounts relative to our anthropomorphic reference point at that frequency, and so no correction can be expected to consistently work. Here we can see that the older HATS, which uses human average ears with microphones in them, deviates to the same degree for the most part up to 8khz, beyond which headphones vary quite a bit due to placement anyway, so in this case it would appear that the emulation of the ear impedance was not all that important (which isn’t unexpected - the 60318-4-free Neumann KU100 is a fairly popular option for measuring over-ear headphones as well), while the EARS displays some pretty large variations albeit in thin bands, and the flat plate is completely variable.
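The "how far apart are the two lines" idea can be sketched in a few lines of code. Again, this is only an illustration under assumptions: the function names are hypothetical, the 2 dB tolerance is an arbitrary threshold chosen for the example rather than a recommended value, and the traces are invented numbers, not my measurements.

```python
def envelope(diff_traces):
    """Highest and lowest difference at each frequency across headphones.
    diff_traces holds one difference trace (in dB) per headphone,
    all on the same frequency grid."""
    highs = [max(vals) for vals in zip(*diff_traces)]
    lows = [min(vals) for vals in zip(*diff_traces)]
    return highs, lows

def compensable_freqs(freqs, diff_traces, tolerance_db=2.0):
    """Frequencies where every headphone deviates by nearly the same amount,
    so a single fixed correction could plausibly apply.
    tolerance_db is an arbitrary example threshold."""
    highs, lows = envelope(diff_traces)
    return [f for f, hi, lo in zip(freqs, highs, lows) if hi - lo <= tolerance_db]

# Made-up difference traces for three headphones on one coupler.
grid = [1000, 2000, 4000, 8000]
hp_a = [0.0, -2.0, -8.0, -2.0]
hp_b = [0.0, -1.0, -3.0, 4.0]
hp_c = [0.0, -2.5, -6.0, 1.0]

highs, lows = envelope([hp_a, hp_b, hp_c])
ok = compensable_freqs(grid, [hp_a, hp_b, hp_c])
print("spread (dB):", [hi - lo for hi, lo in zip(highs, lows)])
print("consistent at:", ok)
```

With only three headphones, a narrow spread shows that a correction is consistent for those three, not that it will hold for every headphone.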
So what can we take away from this? I’d say that we can safely say:
- Earless flat plate designs are just not suitable for characterizing headphone frequency response. They vary substantially in ways which cannot be consistently predicted and compensated for, and as a result cannot be relied upon for perceptually relevant measurements of higher frequencies - which matter quite a lot.
- The MiniDSP EARS is better in this regard, although not fully trustworthy - it’s the better alternative if you can only have one of the two, for sure, however.
- A binaural microphone with anthropomorphic ears but no ear simulators might well be entirely suitable for headphone measurements for enthusiasts’ purposes - I believe that binaural mics are a bit of a growth market at the moment, so hopefully something with performance close to my old HATS or the Neumann KU100 will be released at a less painful price point, it seems like it could be quite useful.
- It’s too damn expensive to measure headphones, man.
1: These are the free and diffuse field² transfer functions of an example 4128C, which are pretty close to the population average, and you’ll notice that there’s really not much adjustment below 500hz, and most of the change up to 1khz is a gentle rise into the ear resonance - sharp deviations in this area are more likely to be a leak in the headphone’s coupling than a legitimate flaw in the measurement system’s design (well, unless you count “let stuff leak” as a flaw, I suppose), unless somebody got pretty creative with the system.
2: I’m not touching this in this post for the most part, because it’s already too damn long, but I’ll briefly note that being able to characterize a measurement system in a given sound field is an additional merit of anthropomorphic designs.