Soundstage Is (Much) More Complicated Than You Think

What does that mean?

If one channel has phase-inverted content in the recording, then it’s going to be phase-inverted in one headphone side, same as with speakers.

3 Likes

Concur. Concur. Concur.

The differences between speaker and headphone soundstages include (1) the ability to place speakers in a natural forward and distant position whereby most sound hits the ear at an oblique angle (matching natural live sound), and (2) the ability to blend an array of speaker drivers with different properties (versus single-driver headphones, as multi-driver setups tend to fare poorly).

A good playback soundstage reproduces the fine, ephemeral echoes and positional delays of the recording environment – BUT BUT BUT – true live sound always comes in from multiple angles with multiple reflections and subtle delays. Reproduction quality is forever limited by the resolution of the recording mic (as a speaker in reverse) and the recording medium. If the recording medium and playback drivers are unable to reproduce a subtle 2nd, 3rd, or 4th order reflection, then the nuance of a live soundstage is lost.

Headphones also sidestep the mud a room introduces with speaker playback – much of the tone and timbre results from a room’s hardwood, carpet, thin walls, concrete, and other building materials. That mud may come across as a better or more natural (realistic) soundstage to some. The old Bose 901 tried to simulate a live experience with a bunch of rearward-facing speakers. This was a simulation rather than a reproduction of the (often simple stereo or artificially mixed) recorded source.

Many implicit assumptions here. I generally disagree. Many things about audio become crystal clear when one produces music and hears which perceptual factors carry over from live to a recording. There’s a FIERCE debate about the role of tonewood in guitars. While effectively 100% of people agree that tonewood is a major thing for acoustic guitars, there’s sharp and enduring disagreement about its relevance for electric guitars. While a player can feel different vibrations, a listener (let alone a recording) may miss all of those differences. The metal guitar strings vibrate differently based on thickness, construction, and age, then the magnetic electric guitar pickups transform string vibrations in some way, then the amplifier transforms the sounds again and may highly distort them, then the recording mic again transforms them or cuts the nuances, etc. It’s not unlike how one’s own voice sounds different when played back versus spoken. Multiplied by ten.

I had no clue until I started messing with guitars myself. So much gets lost versus live, even with electric guitars. Headphone quality assessments must follow from the quality and nuances preserved in a recording, and are very much embedded in the specific setup (@Polygonhell). Many people habituate to (i.e., hear as “normal”) a given sound profile based on their physical sensory potential and personal hardware. Then, they will quickly habituate to a second source because humans generally perform poorly at audio comparisons over time (failed blind ABX tests). This is NOT subjective experience; rather, it’s a product of ineffective perceptual testing methods that do not standardize to individual hearing potential. This can be done. It has been done. Perceptual testing is what underlies all the spatial and noise-canceling tricks of Bose, Apple, THX, etc. software.
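As an aside on how ABX results are usually scored, here is a minimal sketch (Python; the trial counts are hypothetical) of the one-sided binomial test typically applied to an ABX run. It shows why a short run that feels convincing is often not statistically meaningful, which compounds the habituation problem described above:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: probability of getting at least
    `correct` answers right out of `trials` by pure guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

# Hypothetical example runs: 7/10 feels persuasive but is weak evidence
print(abx_p_value(7, 10))   # ~0.17, not significant at the usual 0.05 level
print(abx_p_value(14, 20))  # ~0.058, still borderline
```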

5 Likes

Yes we understand a lot about tone masking and the first part of hearing (perceiving tones). Most of that research was done for compression when it still mattered.

What’s a lot less well understood is how our brains deal with the temporal side: your perception of sound isn’t based solely on instantaneous tones, but rather on some memory of them over some time frame. This is clearly true because without that you couldn’t understand language.
And it’s unclear which signals matter most in that. Your brain is going to try to make sense of the signals it’s getting over time, and I think that’s probably why we hear depth and height in the stage even when it’s not in the recording. Some subtle difference in what a signal chain is producing leads to the variation in that. But I have no idea what.
HRTFs are a part of that, and we should bear in mind that that processing is happening as a part of perception even when you circumvent the mechanism with headphones.

I think the perceptual part is also why some things just sound wrong, in a way that’s hard to quantify.

The other part of stage to me, other than size, is about how distinct individual components are: if I have a mix with two singers on top of (or really close to) each other, how well can I separate them into distinct voices? And then there is what I term Focus: does the sound emanate from a laser-tight dot/line or from an area?
I found the D8000Pro disconcerting when I first heard it, because sounds were so laser-focused in its stage that it sounded artificial and unnatural. I later got used to it, but it isn’t a problem I’ve had with any other headphone.
If you compare a lot of mid-tier speaker systems with digital frontends to good analog setups, there is a similar effect: the digital comes across as hyper-focused, and can be less convincing because of it.
I find remasters that try to clean up all the tape noise of an older recording have similar problems: elements in the mix seem to “float” and don’t feel grounded.
Perhaps having a small amount of white noise in the mix is enough to help the brain believe what it’s hearing. But I don’t really miss the noise in high-quality newer digital masters.

I guess what I’m trying to convey here is there is a lot that contributes to perception, and it’s nice to be able to quantify individual parts of that, but in the end what matters is the totality of the experience.

4 Likes

That is what I mean. While yes, headphones are judged with music (and all of the qualities in the music ultimately affect the end-user’s perception), there are qualities of the headphone that are separate from the music itself. Namely FR, as this holds constant amidst the changes of the music.

I want to be clear that I agree with all of the people saying “The qualities of music itself will affect the report of soundstage,” but I disagree with the idea that people necessarily know their impressions are susceptible to/affected by this. I’ve seen many who have claimed that soundstage is, like FR, a constant quality of the headphone itself (and furthermore one that for some reason cannot be measured).

While yes, many people here understand how to separate qualities of the recording from qualities of the headphone and can track how one affects the other, this is not—I think—how many audio enthusiasts think about soundstage. That is (partially) why I wanted to write this piece: to make it clear which things are actually a constant acoustic characteristic of the playback device itself and which belong to the recording/program material.

1 Like

So where do tricks like “Spatial Audio” fit into this discussion? In particular, how do headphones, IEMs, or, ahem… Apple AirPods Pro earbuds fit into this?

Right there! Spatial audio with head tracking actually simulates all of the information necessary for real localization, so it’s arguably the closest thing we have to properly binauralized stereo recordings.
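To give a sense of what the simplest version of those localization cues looks like in code, here is a rough sketch (Python/NumPy). The azimuth-to-ITD/ILD mapping is a textbook spherical-head approximation, not any vendor’s actual renderer; a real spatial audio system uses full HRTFs plus head tracking rather than a broadband delay and gain:

```python
import numpy as np

FS = 48_000          # sample rate (Hz)
HEAD_RADIUS = 0.0875 # metres, rough average head radius
C = 343.0            # speed of sound (m/s)

def binaural_pan(mono: np.ndarray, azimuth_deg: float) -> np.ndarray:
    """Crude binaural pan: apply an ITD (Woodworth approximation) and a
    broadband ILD for a source at `azimuth_deg` (0 = front, +90 = right)."""
    az = np.radians(azimuth_deg)
    itd = (HEAD_RADIUS / C) * (abs(az) + np.sin(abs(az)))  # seconds (Woodworth)
    delay = int(round(itd * FS))                           # samples to delay the far ear
    ild_db = 6.0 * abs(np.sin(az))                         # crude broadband level difference
    near = mono * 10 ** (ild_db / 40)                      # +half the ILD on the near ear
    far = np.concatenate([np.zeros(delay), mono])[:len(mono)] * 10 ** (-ild_db / 40)
    left, right = (far, near) if azimuth_deg >= 0 else (near, far)
    return np.stack([left, right], axis=1)

# Hypothetical usage: a 1 kHz tone placed 45 degrees to the right
t = np.arange(FS) / FS
stereo = binaural_pan(np.sin(2 * np.pi * 1000 * t), 45.0)
```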

2 Likes

This was amazing! Love your very in-depth work, listener!

1 Like

Love this article. It’s thoughtful, detailed, and informative. I appreciate how it dissects and analyzes a complicated topic.

Re: ILD, I do think headphones have a role to play. Passive headphones cannot introduce ILD, but they do have a role to play in how the ILD encoded within the recording (and decoded by the DAC and amplifier) is finally converted into the mechanical movement of the diaphragm and the resulting SPL. The differences in the mechanism and the diaphragm material have an impact on the transfer function from electrical signal to actual sound pressure levels. The shape or linearity of that transfer function will impact the fidelity of the ILD encoded within the recording by the recording sound engineer. Whether a more linear response or a more skewed one would produce a better soundstage is an interesting topic. Complicating that topic would be how the sound engineer adjusted levels and their intended target for the recording. A recording that was tuned to the playback equipment of the mid-century of the last millennium may be quite different from a more modern recording. Without evidence, I speculate that more modern recordings may do better with a more linear transfer. Unsure of what would be a good match for more vintage recordings.
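As a toy illustration of that linearity point (Python, with entirely made-up numbers): if the electrical-to-acoustic transfer compresses high levels, the level difference the engineer encoded between channels comes out smaller than intended.

```python
def output_db(input_db: float, compression: float = 0.0) -> float:
    """Toy electrical-to-acoustic transfer: perfectly linear when
    compression == 0, progressively compressive above 80 dB otherwise."""
    excess = max(input_db - 80.0, 0.0)
    return input_db - compression * excess

# Engineer encodes a 6 dB ILD: left channel at 90 dB, right at 84 dB
left_in, right_in = 90.0, 84.0
for c in (0.0, 0.1, 0.2):  # hypothetical compression factors
    ild_out = output_db(left_in, c) - output_db(right_in, c)
    print(f"compression={c:.1f}: reproduced ILD = {ild_out:.1f} dB")
# a linear transfer preserves 6.0 dB; compressive ones shrink it to 5.4 / 4.8 dB
```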

Just a thought…

3 Likes

I don’t know about headphones not having any meaningful reflections inside the cup.

Some manufacturers use the cup geometry and damping inside to tune the headphones.

There is no way to measure soundstage in speakers either, btw.
It’s an illusion.
Sure, we can take an RT60 decay response to see the reverb time across frequency, but taking a measurement that tells us whether the width or depth of the speaker soundstage is good isn’t possible.
It takes speaker setup time and some educated guesses.
Dr. Toole has also changed his view on side reflections and is now more in favor of them rather than absorbing the first reflections on the sides. He says they give more spatial information.
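For reference, the RT60 figure mentioned above is typically derived from a measured impulse response by Schroeder backward integration. A minimal sketch (Python/NumPy), assuming you already have an impulse response array, e.g. exported from REW:

```python
import numpy as np

def rt60_from_ir(ir: np.ndarray, fs: int) -> float:
    """Estimate RT60 via Schroeder backward integration: fit the decay
    between -5 and -25 dB (a T20-style estimate) and extrapolate to -60 dB."""
    energy = np.cumsum(ir[::-1] ** 2)[::-1]            # Schroeder integral
    decay_db = 10 * np.log10(energy / energy[0])       # normalized decay curve
    t = np.arange(len(ir)) / fs
    fit = (decay_db <= -5) & (decay_db >= -25)         # usable decay range
    slope, _ = np.polyfit(t[fit], decay_db[fit], 1)    # dB per second
    return -60.0 / slope

# Hypothetical usage with an IR measured at the listening position:
# print(rt60_from_ir(np.loadtxt("room_ir.txt"), fs=48_000))
```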

My speaker setup in my previous house had a soundstage that literally extended out beyond the speakers, by as much as 4 feet, depending on the recording and how hard things were panned.
In my current house I have way more depth, but not as much width.
Getting a good center image with 2 channel stereo is also trial and error. I can’t do much with REW to measure that. I use REW to measure for bass accuracy and take averages from my seated position for FR. But not soundstage.

Aren’t we humans evolved to have a very acute sense of time, sensitive to very, very small millisecond-level changes?
I think there have been tests on this, showing that we are more sensitive to time than to FR.

Most of us have seen/heard the differences ear pads make. Those must be happening because of small reflections.

Dunno. Just throwing that out there.

So while we know that with speakers we get reflections and direct sound, no one can measure the soundstage they create.

2 Likes

Maybe this explains how sensitive we are to time differences?

https://pubs.aip.org/asa/jasa/article/145/1/458/638769/Smallest-perceivable-interaural-time-differences

2 Likes

They say people could hear ITD down to 6.9 microseconds. The distance between the driver and the cup of a closed back is maybe 1 cm? So a reflection path would be 2 cm? That would take about 58 microseconds for the reflection to go from the back of the driver, bounce back, and reach the driver again.
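For anyone who wants to check that figure, the arithmetic as a quick sketch (Python; the 1 cm driver-to-cup distance is just the guess from above):

```python
C = 343.0            # speed of sound in air (m/s)
DRIVER_TO_CUP = 0.01 # metres, rough guess at closed-back internal depth

round_trip = 2 * DRIVER_TO_CUP / C   # back wave out to the cup and back again
itd_threshold = 6.9e-6               # smallest reported audible ITD (seconds)

print(f"reflection delay: {round_trip * 1e6:.0f} microseconds")        # ~58
print(f"vs. ITD threshold: {round_trip / itd_threshold:.0f}x larger")  # ~8x
```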

Not sure what that means but perhaps it’s detectable, or at least changes the perceptions of “sound stage”.

By the way here’s a fun blind test for testing timing:

2 Likes

Hmmm? Non sequitur.

The inability to measure something does not equate to it being an illusion. Old joke about bad science: “Why are you looking for your lost keys under this street light when you dropped them way down the road?” “There is light here and it’s too dark to see anything where I dropped them.”

Illusions are rather aggressively studied, and they map human perceptual realities (biology + psychology) onto physical systems. Some illusions can be fully explained while others present challenges. Even still, if a playback system involves illusions the effect is real (primary) and any deficient measurements that may be available are irrelevant.

Consider the Shepard Tone:

Etc.
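For anyone who wants to hear that illusion for themselves, here is a minimal Shepard-tone generator sketch (Python/NumPy; the parameters are arbitrary choices, not taken from any particular reference):

```python
import numpy as np

FS = 48_000
BASE = 55.0       # lowest partial (Hz)
OCTAVES = 8       # number of stacked octave-spaced partials
CYCLE_S = 10.0    # seconds for the pitch to "rise" one octave

def shepard_tone(duration_s: float) -> np.ndarray:
    """Sum octave-spaced partials whose frequencies glide upward while a
    raised-cosine envelope (over log frequency) fades the top partials out
    and the bottom ones in, so the pitch seems to rise forever."""
    t = np.arange(int(duration_s * FS)) / FS
    phase_cycle = (t / CYCLE_S) % 1.0                 # position within one octave glide
    out = np.zeros_like(t)
    for k in range(OCTAVES):
        freq = BASE * 2 ** (k + phase_cycle)          # gliding partial frequency
        pos = (k + phase_cycle) / OCTAVES             # 0..1 across the spectrum
        amp = 0.5 * (1 - np.cos(2 * np.pi * pos))     # raised-cosine spectral envelope
        phase = 2 * np.pi * np.cumsum(freq) / FS      # integrate frequency for the glide
        out += amp * np.sin(phase)
    return out / np.max(np.abs(out))

tone = shepard_tone(30.0)   # write to a WAV with e.g. soundfile.write(...) to listen
```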

1 Like

I meant that it’s an aural illusion. But that doesn’t mean it’s not real in our perception.

However we can’t quantify it and measure it.
But it’s there and it’s real.

My main idea was on reflections, since our ears are so sensitive to time.

We can’t quantify it and measure it yet?

All sorts of audio discussions get stuck on electrical measurements and the confidence of those who use them.

1 Like

We can’t measure or quantify soundstage. Yes that’s true in speakers as well.

Why do you keep repeating what I say? Lol

I didn’t. I said YET.

I was suggesting that fixating on current methods and technologies is a mistake.

1 Like

Then don’t put a question mark after Yet and put a comma before it.
Dude. Lol. Text is bad enough.

Can we get back to the point please? Pretty please?

Yes, we can’t measure even speaker soundstage, yet.
My main point once again was humans are super sensitive to the time domain in hearing.
We also know that with speakers, although we can’t measure or quantify soundstage, it happens because of reflections.
So it doesn’t seem at all impossible to me that certain geometry of headphone cups and ear pads create whatever soundstage is in headphones.
I think most can say from experience that the HD600 range headphones sound extremely flat and like a blob.
While for me, at least, most of the time headphones with a better soundstage tend to have a different geometry, or have the drivers angled to assist somehow in creating whatever soundstage we can get with such small reflections.
If indeed that is what is happening.

Time and money, and this could be better understood as well.

2 Likes

There are key differences between speakers and headphones worth considering here. When you wear headphones, it creates an acoustic system with the ear, which is different from speakers in a room where you’re a certain distance away. So I’m not sure we should be quite running with the analogy of [reflections of the room] similar to [reflections in the cup].

But even so… if there are internal reflections to worry about, you’d see that in the measurements. It’s why you often see foam or some mechanism in the cup behind the driver to break up the wave in closed back headphones. And you also see wonky FR results with headphones like the Eris that don’t - though that one technically functions like an open back headphone.

1 Like

But that is sorta my argument as well.
(Of course we are not arguing here, we are just exchanging ideas.)
In speakers, standing waves both inside the box and from the room show up on the frequency response as well.
But showing it in a time-domain graph like an RT60 decay time gives us a new perspective. It also shows the room’s reverb time.
And the mere fact that headphone designers, as you mentioned, put foam or some sort of damping in there further makes the case that reflections inside headphones indeed do matter.

My point was that we have no way of assessing soundstage even in speakers, which have longer decay times.
So reflections in both speakers and headphones exist. Decay times are different. I think we can agree on that.
We assume that in speakers the reflections from the room create the aural illusion of soundstage. However, there is still no way to measure it. We can’t deduce whether speaker A in room B, set up a certain way, will create a deep, wide soundstage. That’s all trial and error and speculation. That’s why I brought up Toole’s recent change of heart on side reflections for soundstage width.

So maybe, given our ears are insanely sensitive to time (which is scientifically proven), just maybe the geometry of the cup and the angle of the driver affect the reflections in headphones that create the different soundstages in them. Even though, yes, they are incredibly limited. The soundstage is limited compared to speakers, that is.

Oh, and also to note: it’s the highest frequencies that are the most directional, so those would be the most important.

2 Likes

But this goes back to headphones being minimum phase, meaning the time-domain behavior follows directly from the FR in headphones. If it’s not behaving in this manner, we might consider it broken.
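That claim can be made concrete: for a minimum-phase system, the phase response, and therefore the impulse response, is fully determined by the magnitude response. A rough sketch (Python/NumPy, using a synthetic magnitude curve rather than a real headphone measurement) of recovering the minimum-phase impulse response from the magnitude alone via the real cepstrum:

```python
import numpy as np

def minimum_phase_ir(magnitude: np.ndarray) -> np.ndarray:
    """Given a desired magnitude response sampled on a full FFT grid
    (length N, covering 0..fs, conjugate-symmetric), build the minimum-phase
    impulse response whose phase is the Hilbert transform of the log
    magnitude (computed here via the real cepstrum / homomorphic method)."""
    log_mag = np.log(np.maximum(magnitude, 1e-12))
    cepstrum = np.fft.ifft(log_mag).real
    n = len(cepstrum)
    # Fold the cepstrum: keep t=0, double positive quefrencies, zero negatives
    window = np.zeros(n)
    window[0] = 1.0
    window[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        window[n // 2] = 1.0
    min_phase_spectrum = np.exp(np.fft.fft(cepstrum * window))
    return np.fft.ifft(min_phase_spectrum).real

# Hypothetical example: a gentle broad dip centered at 1/8 of the FFT grid
N = 1024
f = np.arange(N)
dist = np.minimum(f, N - f)  # distance from DC, keeps the grid symmetric
mag = 1.0 - 0.5 * np.exp(-((dist - N / 8) ** 2) / (2 * (N / 64) ** 2))
ir = minimum_phase_ir(mag)   # the impulse response implied by that FR alone
```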