Measurements: Charts, Graphs, Software & Methods

The latter bit sort of answers the former there, in a lot of cases. There are other cases where measurements aren’t representative of on-head performance, but a lot of the time folks extrapolate a bit too much about how a measurement will sound, IMO. “Doesn’t sound like it measures”, to me, means one of two things: either (1) the measurement doesn’t match what’s going on on your head, at your eardrum, or (2) the expectation you had about how measurements correlate to sound wasn’t quite spot-on.

Just to check, do you mean the high Q/sharp notch that’s right around 3 kHz, or the general dip in that band that Resolve showed? High Q notches like this one are generally known to be pretty difficult to subjectively detect (there’s a section in Harman’s How to Listen for finding them which is pretty neat!), because the thinner the band that’s notched out is, the less likely that a significant amount of musical information is going to fall there.
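To put rough numbers on the "thinner notch" point, here is a minimal sketch that models a dip as a peaking-EQ biquad from Robert Bristow-Johnson's Audio EQ Cookbook. The 48 kHz sample rate, the -10 dB depth, and the two Q values are arbitrary assumptions for illustration, not values read off anyone's graph.

```python
import cmath
import math

FS = 48_000.0  # assumed sample rate

def peaking_biquad(f0: float, gain_db: float, q: float, fs: float = FS):
    """RBJ Audio EQ Cookbook peaking filter; returns (b, a) normalized so a[0] == 1."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * a_lin, -2 * cos_w0, 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * cos_w0, 1 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def response_db(b, a, f: float, fs: float = FS) -> float:
    """Magnitude of the biquad at frequency f, in dB."""
    z_inv = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20 * math.log10(abs(num / den))

narrow = peaking_biquad(3000, -10, q=8.0)  # sharp, high-Q notch
broad = peaking_biquad(3000, -10, q=1.0)   # broad dip with the same centre and depth

for f in (2000, 2500, 3000, 3500, 4500):
    print(f"{f:>5} Hz  narrow {response_db(*narrow, f):6.2f} dB  broad {response_db(*broad, f):6.2f} dB")
```

Both filters hit the same -10 dB at 3 kHz, but the high-Q one has almost fully recovered a few hundred hertz away, so far less of the spectrum (and of the music) is touched.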

The general shortfall below target across the ear resonance band, on the other hand, I’d expect to be comparatively much more audible, and to dominate the subjective impression of that part of the upper midrange.

Oh absolutely - measurements will most certainly lead your subjective impressions (or at least prime them). Of course, so too will the design language of a product, the circumstances you hear it under, etc - I wouldn’t say we “shouldn’t” measure before we listen, but if we do, it can definitely have an impact on what we hear. However, if we wanted to avoid impacts on what we heard, we’d need to blindfold ourselves and have volunteers carefully lower headphones into position on our ears, to avoid the other biasing effects :stuck_out_tongue: It’s absolutely a significant effect and worth considering, however, and generally I think best practice as a reviewer is starting the listen > measure > listen loop with listening.

For consumers, however, unless you have easy access to a panoply of demo headphones, measurements are among the best options for narrowing down purchase choices at the moment - in that case, I’d definitely recommend that any of us with a budget constraint look before we spend, even if that means coming in with some priming for what to hear (perhaps it’ll push us in a positive direction, even - nothing wrong with biasing you to like your headphones more).

IMO it would be better for “virtual demo” type systems to be used, either via binaural recording or equalization presets, but these have a lot of issues in implementation - they’re more intuitive to users, but they’re very, very hard to get right.

Kind of a complex one. I’ll quite strongly avow that anything which is audible in sound is measurable. That does not, however, mean that everything which we perceive when listening to a product is measurable in its sonic output. The classic example here is the visual impact of a speaker on perception - Sean has a great example with larger vs smaller speakers in this blog post - but for headphones ergonomics, styling, brand image, and probably even little details like personal history with a brand will likely have some influence on what we hear. Our subjective experience of sound is a complex interplay of psychology and acoustics, and we can’t (yet?) measure the brain side of things.

Perhaps when we have probe microphones that come with a cap of brain probes to monitor neural activity, we’ll be able to perfectly predict how a headphone is heard from measurements, but for now, I would say at least that we can predict the aspects that are based on its actual sound output. That said, I don’t think that 100% of information is contained in a single frequency response plot on an averaged ear - I talked a little bit about this with @Resolve on his podcast, although I didn’t get as far into the weeds as I’d planned to.

Absolutely agree, and I’m not sure how anyone could disagree with this position (unless you didn’t believe in biases, I suppose, but that seems a bit…eccentric).

As far as DACs and amplifiers - presuming we’re talking about equipment that’s “sufficiently linear” under the conditions it’s being used - I think you will find that experiments that limit people’s exposure to such products to the signal effects alone tend to support the idea that a lot of what’s heard there isn’t coming from the transfer function of the DAC or amp. Which, of course, is fine - if it sounds better to you, that’s good, right?

Family should always come first :slightly_smiling_face: Please don’t feel rushed to reply - I’m kind of at loose ends this last day or so, so I’m being hyperactive on forums.

For tubes, I think that’s just a subset of the question of “sounds better than measurements suggest” - and that, in turn, hinges on why we think measurements suggest that. Tube amplifiers, in many cases, have distortion that isn’t substantially higher than the devices they’re sending the signal to - and often quite a lot below the thresholds we’ve hashed out for audibility of low order distortion products. Would we expect this to sound worse? I wouldn’t, at least.

And, for the stuff that’s within the range of audibility…well, if you measure something, and people say they enjoy products that measure that way, it strikes me that this is an indication that you should take that measurement as suggesting it’ll sound good. I’m not convinced that’s the main factor in play with tube amplifiers, but in a hypothetical where we see a proven-to-be-audible amount and sort of distortion, and people prefer stuff with it, that’s not a “bad” measurement to me as such (at least with “people will like this product” as our bar for “good” - bear in mind, this is subjective!).
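One way to compare an amp's distortion with that of the transducer it drives is to put both on the same scale. This is a minimal sketch that converts harmonic levels, given in dB relative to the fundamental (dBc), into a THD percentage by summing their powers. The harmonic levels in the example are hypothetical, and the sketch deliberately says nothing about where audibility thresholds sit.

```python
import math

def thd_percent(harmonics_dbc):
    """THD (relative to the fundamental) in percent, from harmonic levels in dBc."""
    power_ratio = sum(10 ** (level / 10) for level in harmonics_dbc)
    return 100 * math.sqrt(power_ratio)

def percent_to_dbc(percent):
    """Convert a distortion percentage back to dB relative to the fundamental."""
    return 20 * math.log10(percent / 100)

# Hypothetical amplifier: 2nd harmonic at -40 dBc, 3rd harmonic at -50 dBc.
print(f"{thd_percent([-40, -50]):.3f} %")
```

Working in power before taking the square root is what makes the 3rd harmonic here add only a little to the total: it is 10 dB below the 2nd, so it contributes a tenth of the power.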

This follows from the evolved, biological hearing system. Hearing cannot be directly changed or improved without humans becoming a new species. Technology has a known target and can easily/quickly improve to meet or exceed its potential.

Synesthesia. There’s a world of research on this topic. There’s a world of market research on the union of psychology and sensory influences.

We can measure an awful lot about the neurology of hearing right now.

Psychoacoustic research has been going on in earnest for several decades. I’d say Bose and MP3s were built on it.

We do, essentially. The tools are just stuck in research labs, expensive, difficult to use, invasive, and probably not economically worth the effort. Our current solution is to have variety of products to match personal preferences and personal anatomy. That’s why Grado, Sennheiser, and Beats survive.

I’m pretty sure that this is not a symmetric property. In other words, it doesn’t follow that anything measurable is audible.

Further, while it may be measurable, that does not mean that we know what we are trying to measure. There may be some property, ratio, or relation that we have not yet developed the means or theory to measure.

An example might be if there is any time or phase shift induced by passing an audio signal through various wires, of various compositions and configurations, (diameter, length, material, dielectric, geometry, number of strands, etc.). Probably 30 or 40 years ago, we would have been hard put to measure much other than resistance. Have we got all the parameters figured out now? Are you sure?

What about the effect of random neutrinos? (That’s a joke, son, as Senator Claghorn, or Foghorn Leghorn would say.)

This is something I think about a lot actually… how many factors are missing from the equation? I mean for a long time people thought the world was flat!!! (Some still do, which is just ridiculous).

So, with that in mind, I always like to question and play devil’s advocate as much as possible… We only know what we know, after all… and we don’t know what we don’t know :wink: Profound!!! lol

I personally think that this is ultimately the issue with a lot of things: people think they have it all figured out until, well, they don’t. What matters is that we keep pushing and questioning and driving towards better and better understanding. At the end of the day it is all just a moving bar of understanding and discovery… heck, we haven’t even mapped the human brain properly or fully worked out how it works (I look forward to becoming one with the machines :wink:). Anyhow… time to put the daughter to bed…

Haha yep, in the end this is just human nature as it has always been. We instinctively fear the unknown.

Some have a natural desire to challenge the unknown, some just avoid it, but then there are those who free themselves from it by simply imagining it away by projecting whatever truth they’d be comfortable with. It’s just the instinctive fight - flight - freeze response.

Well to quote Buckaroo Banzai… ‘wherever you go, there you are!’ :joy:

I think Olive’s work is fascinating, though I have a lot more studying to do.

One thing I find interesting with a ‘preferred curve’ is that our preferences are largely shaped, I think, by what is familiar.

I’ve seen evidence that younger listeners actually prefer the ‘modern’ equalization with compressed dynamics and extra ‘loudness’.

So will the curve change as the preferences of the testees change?

I know there is a lot more to it than that and maybe I’m understanding the target curve incorrectly, but personally, I tend to prefer less bass - or maybe I just haven’t heard good bass from headphones :slight_smile:

And I’ve also become more sensitive to ‘brightness’ over the years.

A lot of people have never (or rarely) heard unamplified natural instruments - so what is their frame of reference for what sounds right?

Even acoustic instruments are usually amplified and reproduced with speakers, which have their own color.

I’m also not sure where timing and phase shifts show up in current measurements. I’ve seen evidence that timing differences are readily and statistically identifiable - but not by everyone - and that this can be the result of learning and experience - with musicians and directors better able to hear this.

Does that mean I can hear them? I’m not sure, but the speakers I grew up with, Vandersteen, are specifically designed with phase and timing elements to match the sound of masters that Richard Vandersteen recorded personally and heard live… They are often described as ‘laid back’, I think for this reason.

And I particularly enjoy accurate and precise imaging, after timbre and tone… How do I understand and interpret that from current measurements?

Personally, I subscribe to the idea that virtually everything can be measured… But do we know HOW to measure it, understand what it means, and translate it into our own listening preference and enjoyment?

Well, that’s one way to frame it, certainly. I tend to put that out there to delineate that I don’t think there’s some “x factor” that’s only present in our ears (a premise you run into sometimes, implicitly or explicitly, with some golden eared folks).

A very good point, when I said “we” I was referring more to the body of people working on measuring headphones and other sound reproduction products - I’ve yet to see an AES or ASA paper with fMRIs involved (although that’d be extremely cool!). Given the necessity of magnets in moving coil designs, perhaps fMRI-based research is a good case for e-stats :smile:

Absolutely, there are many, many things we can measure that it would be ludicrous to suggest we can hear (ultrasonic and infrasonic sounds, distortion and noise products south of -20 dB SPL, etc.). I generally consider a reasonable “objective” maxim to be “any audible component of sound is measurable; not all measurable components of sound are audible”.
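The second half of that maxim can be turned into a rough audibility screen. This is a minimal sketch, assuming Terhardt's widely used curve-fit for the absolute threshold of hearing (threshold in quiet) and treating anything outside 20 Hz to 20 kHz as inaudible. It ignores masking entirely, so a component that passes may still be inaudible under music; one that fails is very unlikely to be heard by an average listener.

```python
import math

def threshold_in_quiet_db(f_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4

def above_threshold(f_hz: float, level_db_spl: float) -> bool:
    """True if a lone component could exceed the threshold in quiet.

    Assumption: components outside 20 Hz-20 kHz are treated as inaudible.
    Masking by louder content is ignored, so this is a lenient test.
    """
    if not 20.0 <= f_hz <= 20_000.0:
        return False
    return level_db_spl > threshold_in_quiet_db(f_hz)

# Hypothetical components: (frequency in Hz, level in dB SPL).
for f, level in [(1000, 10.0), (3300, -20.0), (30_000, 100.0)]:
    print(f, level, above_threshold(f, level))
```

Even at the ear's most sensitive region (around 3 to 4 kHz, where the fitted threshold dips slightly below 0 dB SPL), a -20 dB SPL product sits well under the line.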

I’m slightly confused by this example, as in the 80s or 90s - heck, in the 50s or 60s - we could definitely look at delay and phase shift by a number of means. I’ve seen papers on audibility for those factors going back at least into the 70s, and as much as anything that probably reflects me being too young to have encountered anything older.

I’m not sure that we’ve got the best systems for interpreting some of what we presently measure. Olive’s work on the Harman target and its associated statistical models is a good example: frequency response measurements aren’t new, but his approach, which is quite current, allows us to make better inferences about how they relate to human preference. Various approaches to weighting distortion products by audibility (Temme, Geddes, etc.) would be another example. But I’m pretty sure that we can measure all the behavior of an audio transducer that impacts how we hear it.
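As an illustration of the kind of statistical model meant here, this is a sketch of the over-ear/on-ear headphone preference model as it is commonly cited from Olive et al.: predicted preference = 114.49 - 12.62·SD - 15.52·|AS|, where SD is the standard deviation of the error curve (measurement minus target) and AS is the slope of a regression of that error against log frequency. The 50 Hz to 10 kHz band and the natural-log frequency axis are assumptions here; check both, and the coefficients, against the original paper before relying on the numbers.

```python
import numpy as np

def predicted_preference(freqs_hz, measured_db, target_db, f_lo=50.0, f_hi=10_000.0):
    """Sketch of the commonly cited over-ear preference model.

    score = 114.49 - 12.62 * SD - 15.52 * |AS|
    SD: sample standard deviation of the error curve within the band.
    AS: slope of a least-squares line fitted to error vs ln(frequency).
    Band limits and log base are assumptions, not confirmed details.
    """
    f = np.asarray(freqs_hz, dtype=float)
    error = np.asarray(measured_db, dtype=float) - np.asarray(target_db, dtype=float)
    band = (f >= f_lo) & (f <= f_hi)
    x, y = np.log(f[band]), error[band]
    sd = np.std(y, ddof=1)
    slope = np.polyfit(x, y, 1)[0]
    return 114.49 - 12.62 * sd - 15.52 * abs(slope)

freqs = np.geomspace(20, 20_000, 200)
target = np.zeros_like(freqs)          # stand-in for a real target curve
tilted = 2.0 * np.log(freqs)           # hypothetical headphone with a steady tilt
print(predicted_preference(freqs, target + 3.0, target))
print(predicted_preference(freqs, tilted, target))
```

Because only the spread and the tilt of the error enter the formula, a headphone that sits a constant few dB above or below the target scores the same as a perfect match; level is assumed to be normalized away.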

Whether we’re choosing to make use of all the measurements we can do that are useful, whether we’re making ideal use of the data we get, I think that’s an area where things are a bit more ambiguous - but at a basic level, it’s just not that hard to measure sound. We’ve been doing it for an awfully long time, and as @generic notes, our ears kindly limit our requirements for audibility to reasonable levels.

Intuitively, I would tend to think that what is preferred is influenced by what we’re exposed to, but the history of research on preference seems to indicate that, if that’s true, our influences are pretty stable. From the 80s to the present, there have been mild refinements in headphone target response, but with reasonably strong agreement (some time ago I actually crashed @Resolve’s live stream to argue that Günther Theile’s target from 1984 is closer to Harman’s baseline than people give it credit for), and to my understanding the speaker world has seen an even greater continuity of agreement in what’s preferred, from Brüel & Kjær to the NRC experiments to Olive’s more recent work.

As far as the preferences of age groups go, that’s actually something of a bee in Sean’s bonnet - this blog post has some pertinent links but you can find a few more in the Harman paper family regarding how various factors (age, nationality, experience with audio, etc) correlate with preferences in headphones. Here’s a stand-out example.

Time/phase behavior is entirely measurable, but in the case of amplifiers and DACs there should be no meaningful group delay or phase shift in the audio band (in the same way that there should be no deviation in frequency response), and in the case of headphones their nature as minimum phase devices makes the phase component of their behavior directly correlated with their frequency response. So it’s “there” in the measurements, even with a frequency response curve alone…but there’s nothing to see, really. You can certainly make plots of phase shift versus frequency, or group delay, or what have you as you so prefer, but you’re not getting to new information unless something’s gone pretty severely wrong somewhere in that headphone.
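The claim that a minimum-phase device's phase is already "in" its frequency response can be checked numerically. This is a minimal sketch of the standard real-cepstrum method for recovering a minimum-phase spectrum from magnitude alone; it assumes the device really is minimum phase (the premise above) and is a textbook DSP technique, not anything taken from a particular measurement package.

```python
import numpy as np

def minimum_phase_spectrum(magnitude):
    """Minimum-phase spectrum implied by a magnitude response (real-cepstrum method).

    `magnitude` must be sampled on a full FFT grid of even length, as returned
    by np.abs(np.fft.fft(h, n)).
    """
    mag = np.asarray(magnitude, dtype=float)
    n = len(mag)
    if n % 2:
        raise ValueError("FFT length must be even")
    # Real cepstrum of the log-magnitude (the floor avoids log(0) at deep nulls).
    cepstrum = np.fft.ifft(np.log(np.maximum(mag, 1e-12))).real
    # Fold the anti-causal half of the cepstrum onto the causal half.
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    return np.exp(np.fft.fft(cepstrum * fold))

# Hypothetical check: a two-tap filter with its zero at z = 0.5 is minimum phase.
h = np.array([1.0, -0.5])
H = np.fft.fft(h, 1024)
H_min = minimum_phase_spectrum(np.abs(H))
print(np.max(np.abs(np.angle(H_min) - np.angle(H))))
```

The phase rebuilt from the magnitude matches the filter's true phase to rounding error. Swapping in h = [-0.5, 1.0] (zero moved to z = 2) gives an identical magnitude but a different phase, which is exactly the kind of behavior the minimum-phase assumption rules out.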

I end up pointing at the work of Theile a fair bit relative to my contemporaries, and this is part of why. He has always concerned himself quite a bit with timbre and with the localization of sound in space; his 2016 convention paper is quite interesting reading, including some commentary on spatial localization of sounds and the “sound image”. Unfortunately, it’s not freely available. His much older but utterly seminal paper, On The Standardization of the Frequency Response of High-Quality Studio Headphones, is free, however, and it’s a really worthwhile read if you have the time and interest. Theile’s take, as I understand it, is pretty much identical to my own: frequency response is the primary contributor to accurate timbre and imaging, given suitable recordings. Theile’s specific argument, which I tend to sympathize with, is that the recording-to-playback circle is closed properly with a diffuse field equalized headphone and binaural recordings. Outside of this case, he has argued for virtualization of speakers with HRTF modeling and head tracking for stereo recordings, and I’d tend to suggest that to people looking for accurate imaging and soundstage from headphone reproduction. Smyth Research does some fascinating work in this area as a standalone product (the “Realizer”).
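The core of speaker virtualization is just filtering: each virtual speaker reaches each ear through its own head-related impulse response (HRIR), and each ear hears the sum. This is a deliberately toy, static sketch of that routing. The HRIRs are made-up impulses, not measured data, and real systems such as the Realizer add personalized measurements, room response, and head tracking on top.

```python
import numpy as np

def virtualize_stereo(left, right, hrirs):
    """Render a stereo pair to two ears through two virtual loudspeakers.

    `hrirs` maps (speaker, ear) to an impulse response, e.g. ("L", "right") is
    the path from the left speaker to the right ear. All four responses must
    have the same length so the per-ear sums line up. Static: no head tracking.
    """
    if len(left) != len(right):
        raise ValueError("left and right must be the same length")
    if len({len(h) for h in hrirs.values()}) != 1:
        raise ValueError("all HRIRs must have the same length")

    def path(signal, speaker, ear):
        return np.convolve(signal, hrirs[(speaker, ear)])

    ear_left = path(left, "L", "left") + path(right, "R", "left")
    ear_right = path(left, "L", "right") + path(right, "R", "right")
    return ear_left, ear_right

# Toy HRIRs (hypothetical): the near ear hears the speaker first and louder.
n = 64
near = np.zeros(n)
near[0] = 1.0
far = np.zeros(n)
far[30] = 0.5  # 30 samples is about 0.6 ms at 48 kHz
toy_hrirs = {("L", "left"): near, ("R", "right"): near,
             ("L", "right"): far, ("R", "left"): far}

click = np.zeros(256)
click[10] = 1.0
ear_l, ear_r = virtualize_stereo(click, np.zeros(256), toy_hrirs)
print(np.argmax(ear_l), np.argmax(ear_r))
```

A click sent only to the left virtual speaker arrives at the right ear later and quieter, which is the interaural time and level cue that plain headphone playback of stereo lacks.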

That’s a much higher bar to set :laughing: I think that the only reasonable answer you can give there is “to some extent”. We definitely have measurements which we can reliably correlate with people’s perceptions. They don’t cover the full extent of what people perceive when listening to things at present (or, at least, won’t perfectly predict how people will rank different options), and there are areas where I think new ways of analyzing and gathering data could be useful¹, but we definitely can get a pretty useful picture from our current analyses, and our data acquisition probably isn’t the major bottleneck to improvements.

Getting from “putting weird alternating voltages into the product” to “predicting how people will feel about it more accurately” is definitely the area where audio metrology has the greatest challenges, in my perception, and that’s a major part of why work like Sean Olive’s and Floyd Toole’s is so vastly important - but I’d like to think we’ve made a fair bit of headway thus far, and that we’re continuing to improve.

¹: Specifically, I’ve got a really strong hunch that variations in in-situ frequency response for headphones impact people’s impressions of them, possibly through interactions with individual HRTFs.

THIS. It’s taken me a long time to stop being an audio purist and this is the biggest reason. Over 90% of what I listen to is electronic or rock recorded in a studio. I have no true reference as to what sounds “correct”, and even if I did I wouldn’t want the sound of a real live rock band coming out of the middle of my head. :exploding_head: So now I’m more of an “if I like it better, it is better” person, i.e., correct for me.

Accurate soundstage is my favorite part of a good speaker system. But I don’t really care about it with headphones since, in my opinion, there is no such thing unless you are listening to a binaural recording. I wish I could afford a Smyth Realizer. I’d probably change my opinion.

Interesting because I have found that differences in soundstage between headphones seem to disappear when both are equalized to the Harman curve. However I’m obviously not really paying close attention to soundstage when comparing headphones.

Thank you for this discussion. There are probably better examples. In part, I was thinking more about modest testing rigs, and not those that one might find in peer-reviewed studies. I was also thinking about relatively short runs of wire, where even a measured difference might be inaudibly small. And finally, I was thinking about tests that may be performed by a company with a bias to support some particular claim.

I used to be in scientific, technical and medical publishing, and having been on the production side of some journals, I’m well aware of my own technical shortcomings, particularly when the math gets heavy.

But still, looking at the chain as a system, it’s not just wires: it’s solder, solder composition, how an IC might be seated, and the algorithms used in the DAC and in the ADC of a recording. I think that in some cases we are simply not that far in our understanding from The Butterfly that Stamped.

I think there is room for both. I’ve been to concerts, classical performances, and rock shows of all kinds. Some of the performances did sound gawd awful, like Metallica in an enclosed indoor sports arena. Others were amazing, like Tom Petty or Phil Collins at the Red Rocks amphitheater, or a symphony, or the silly little outdoor concert I saw in the middle of nowhere, Virginia, where Little Feat and Pat Benatar performed.

But when I listen to the same songs I heard live in concert on my home system, I almost exclusively prefer the studio versions. The live recordings aren’t live. It’s not that I don’t know the difference, it’s that in knowing the difference I cannot stand recorded “live” music because THAT is what definitely doesn’t sound real anymore. Get my meaning?

I also prefer headphones and sources that are colored, and not neutral or technically correct. Even though I know what real instruments sound like.

Liking something that moves you isn’t wrong. Even if that means you’ve never heard a real instrument, or a digital recording. Even if that means others think it is incorrect. Implying that others who haven’t heard live instruments should have their opinions dismissed out of hand is just wrong.

I totally agree it is all about enjoying the music - and so many factors contribute to that - even if they are just my expectation biases.

The science of music reproduction is fascinating to me and the engineering and various approaches to equipment design is too. I DO think measurements are important and give us important insight.

But I am most definitely not a purist in the sense of trying to ‘reproduce what the artist intended’… How would I even know what that is, and while that is an interesting idea, maybe I like and enjoy their work differently than they do anyway!

The only personal reference I have to what ‘sounds right’ is comparing to unamplified acoustic instruments like piano and guitar - and even that varies from instrument to instrument.

I reason that if that sounds ‘natural’ and pleasing to me - that other music is being reproduced ‘reasonably’ accurately as well.

But at the end of the day, the most important thing to me is do I want to listen to another song?

It is about the emotional connection to the music and those ‘my God, it sounds so beautiful I want to cry’ moments… And the nights I want to relisten to all my favorite music because it is enrapturing, and I stay up too late and show up bleary-eyed for work the next day!

I think understanding what the measurements tell us can help me find gear that I will like - hopefully more than what I have. And if I find more pleasure in something that is not totally ‘neutral’ or ‘accurate’, I’m OK with that.

I’m not looking for the ‘perfect’ system but the one that is ‘perfect’ for me - which may change over time as my preferences and experience evolve - and as my ears get older! :joy:

Btw, I wasn’t trying to pick on you. I’ve seen that line used before in condescending ways and it irks me every time. I understand that wasn’t your intent.

I didn’t take it that way at all - just good conversation and exchange of ideas.

And I certainly didn’t mean to sound ‘snobbish’.

Comment about acoustic instruments just means that they are my only PERSONAL point of comparison - and whether others value that or not, only matters to them.

It was not to put anyone down, just an observation that very little music is heard without some color from amplification of some kind - the conclusion being, that experience will affect what sounds ‘right’ to us because it is what we know and what we are familiar with.

I made a comment on another site a few weeks ago that my wife enjoys music from the Amazon echo. I don’t… So much :slight_smile: but I would never want to take away her enjoyment by telling her the frequency response sucks… I’m much happier that she shares the love of music and that we find songs we like and share with each other.

One of my kids loves the original Beats - I don’t :slight_smile: but I love that he loves music! And man, that kid can sing and has amazing rhythm and can make up new lyrics on the spot about any topic to almost any song… Wish I had 1/2 the musical talent he does!

There is more than familiarity. Each person’s anatomy and physiology are slightly different and this has an impact. Furthermore, human audio processing (as shaping perceptions) does change with training. Age also plays a role, as hearing tends to decline over time.

I used to commute with a set of Beats IEMs on the subway. They were actually a good choice because background noise (i.e., trains rumbling by) tended to mask the bass with ordinary non-Beats IEMs. Other stuff could sound shrill.

Absolutely agree. SO many factors go into the experiencing of music even beyond the sound.

Saw an interesting TED talk about how owning something makes us like it better. This is deep brain stuff, as one of the studies he cited showed that even people with long term memory loss, who were given one of 5 different prints preferred that print in the future even though they couldn’t even remember owning it!

Edited: looked up the link to the TED talk for those interested: Dan Gilbert, “The surprising science of happiness” (TED Talk).

I wrote this on my blog this weekend after seeing a lot of misinterpretations and dialogue the last week or so over IEM measurements…

Full copy + paste here:

On the Topic of Measurements…

One thing that I’ve seen posted a lot lately is how measurements stack up against one another, and so on. One thing I’d like to clear up is how these measurements are taken and what limitations they have.

First off, there are many different measuring rigs used out in the hobbyist world. Some use DIY rigs, such as buying a Dayton IMM-6 or MiniDSP UMIK and using tubing to create a measurement rig for IEMs. Some use MiniDSP EARS, a budget measurement rig that has an artificial set of ears with measurement mics in the center. Some use a 3D-printed coupler called Veritas, created by a DIY IEM enthusiast. Some use industry-standard measurement rigs, either made by the big brands like GRAS and B&K, or Chinese knockoffs (like mine) that are clones of the GRAS RA0045 coupler.

Each of these systems has its limitations. The only ones that are relatively consistent across the board are the IEC 60318-4 (IEC 711) couplers, but even those can vary in certain respects. These are made to an industry standard, though.

These limitations can include resonance within the coupler itself: the GRAS RA0045 has an inherent resonance at around 13.5 kHz, which creates an artificial spike in energy at that frequency. This can actually be shifted around by how far the IEM is inserted into the coupler, as well as by which tips are used. The deeper the insertion, the higher the frequency at which the resonance appears; the shallower the insertion, the lower the frequency.

The example below shows 4 measurements of the same earphone, in this case the AKG/Samsung earphones included with Samsung Galaxy phones. Between measurements, insertion into my rig was shifted in or out by less than 1 mm each time, and the results should show you how much this affects the measurements.
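The same reseat-and-remeasure idea can be reduced to a simple per-frequency check. This is a minimal sketch: it takes several measurements of one earphone on a shared frequency grid and flags frequencies where they disagree by more than a tolerance. The 3 dB tolerance and all of the levels below are made-up assumptions, not data from the graph in this post.

```python
import numpy as np

def reseat_spread_db(curves_db):
    """Per-frequency spread (max minus min, in dB) across repeated measurements."""
    stack = np.asarray(curves_db, dtype=float)
    return stack.max(axis=0) - stack.min(axis=0)

def unreliable_bands(freqs_hz, curves_db, tolerance_db=3.0):
    """Frequencies where repeated measurements disagree by more than tolerance_db."""
    spread = reseat_spread_db(curves_db)
    return np.asarray(freqs_hz)[spread > tolerance_db]

freqs = np.array([1_000.0, 4_000.0, 8_000.0, 12_000.0])
# Made-up levels (dB) for four reseats of the same IEM.
reseats = [
    [0.0, 2.0, 8.0, 1.0],
    [0.2, 2.1, 6.5, 4.0],
    [0.1, 1.8, 9.0, -1.0],
    [0.0, 2.2, 7.0, 2.5],
]
print(unreliable_bands(freqs, reseats))
```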

Some of us in the measurement hobby have used 8 kHz as a target resonance frequency to aim for, as this works well with the stick version of the coupler. You’ll notice that in a lot of graphs there is an uptick in sound pressure level (SPL) around 8 kHz, and this is done on purpose. Unfortunately, this makes the region from 8 kHz and BEYOND less accurate.

With my newer base coupler version, the same insertion depth puts this peak at around 9.5 kHz, so there will be some variation moving forward.
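Why insertion depth moves the peak can be seen with an idealized model: the air between the IEM nozzle and the microphone behaves roughly like a tube closed at both ends, whose first standing-wave resonance is f = c / (2L). This is a back-of-the-envelope sketch only; it ignores end corrections, tip geometry, and the acoustic impedance of the ear simulator, and it is not a model of the RA0045's own 13.5 kHz resonance.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def half_wave_resonance_hz(cavity_length_m: float) -> float:
    """First resonance of an idealized tube closed at both ends: f = c / (2L)."""
    return SPEED_OF_SOUND / (2 * cavity_length_m)

def cavity_length_for(f_hz: float) -> float:
    """Inverse: the idealized cavity length that puts the first resonance at f_hz."""
    return SPEED_OF_SOUND / (2 * f_hz)

for target in (8_000, 9_500):
    print(f"{target} Hz -> {cavity_length_for(target) * 1000:.1f} mm")
```

Shortening the cavity (inserting deeper) raises the resonance, which matches the direction described above; the millimetre figures are only as good as the idealization.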

Now back to the inaccuracies: the GRAS unit is only accurate to about 10 kHz. This is even mentioned in Harman International’s papers on creating their IEM target curve: they only measure up to 10 kHz, and that’s why their curve drops off at that frequency.

So what am I getting out of this?

The frequency response above 8 kHz can vary and may not be accurate. Also, each individual’s hearing depends on their ear anatomy, and past 10 kHz or so it can vary greatly due to canal resonance and insertion depth. This is also why CIEMs may sound different from universals, and measure differently as well.

And the other thing is, measurements may differ from user to user just based on the rig, the insertion depth, and the tips used. There are a lot of variables at play.

So comparing one measurement to another involves a lot of limitations and assumptions, and one should only assume measurements are accurate up to about 7-8 kHz, no matter the system. If using EARS, this accuracy drops to about 4 kHz, and even lower for other systems like the IMM-6.
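A practical consequence is to compare curves only inside the band you trust. This is a minimal sketch of a band-limited RMS difference between two curves on a shared frequency grid. The rig labels are hypothetical keys, the limits are taken from the rules of thumb above (the upper end of 7-8 kHz for IEC 711-style couplers, about 4 kHz for EARS), and the IMM-6 is left out because no figure was given for it.

```python
import numpy as np

# Rules of thumb from the post above; hypothetical rig labels as keys.
TRUSTED_UPPER_HZ = {"iec711": 8_000.0, "ears": 4_000.0}

def band_limited_difference(freqs_hz, a_db, b_db, rig):
    """RMS difference (dB) between two curves, using only the trusted band for `rig`."""
    f = np.asarray(freqs_hz, dtype=float)
    mask = f <= TRUSTED_UPPER_HZ[rig]
    diff = np.asarray(a_db, dtype=float)[mask] - np.asarray(b_db, dtype=float)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

freqs = [100, 1_000, 5_000, 10_000]
iem_a = [1.0, 1.0, 1.0, 10.0]  # made-up levels; the 10 kHz value is outside the trusted band
iem_b = [0.0, 0.0, 0.0, 0.0]
print(band_limited_difference(freqs, iem_a, iem_b, "iec711"))
```

The large disagreement at 10 kHz is simply excluded rather than being allowed to dominate the comparison.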

There have been some improvements in measurement rigs moving forward, as GRAS has newer couplers like the RA004X that place the coupler resonance much higher up the response thanks to damping in the coupler. Whether this is more representative of real hearing is up for debate, though.

Other Good References:

https://www.soundstagesolo.com/index.php/features/181-how-to-read-our-headphone-measurements

https://www.soundstagesolo.com/index.php/features/152-is-minidsp-ears-the-death-of-headphone-measurement-or-its-savior

Awesome post. One thing I’ve noticed with the RA0402 sim is that it’s not exactly a straightforward question of readability vs accuracy. There’s definitely less energy at around 8 kHz compared to the 0045 (just comparing with Crin’s graphs, for example), I think also for the sake of not having the 13 kHz resonance, but at this point I don’t actually know which is more accurate at 8 kHz specifically.

I think there’s a risk of confirmation bias when you look at a graph - it’s quite easy to go ‘looking’ for the various features you see in the measurement. But I’ve been trying to guess how something is going to measure on both sims and find myself frequently landing more closely towards the RA0402 for 8 kHz. There have definitely been times when I’ve heard more energy than that sim shows as well though, so it’s still something that’s debatable I think. And of course, the subjective element is even more fallible given potential framing effects and listening context. But at the very least, at a certain point we need to anchor what we measure to what we hear.

Great write up… Thank you.
