How to compare amps more fairly - my experience

I suppose it’s possible, but it seems logical to me that if he didn’t hear a difference in a sighted test that he wouldn’t hear a difference in a blind one. I mean, the whole reason for blind testing is to remove the influence of biases that would cause us to form an opinion based on subjective factors other than audio performance.

4 Likes

I don’t believe that’s a valid conclusion. SINAD is not a comprehensive statement of quality of a device–it’s a measurement at a single reference point (typically 1kHz at certain fixed loads). It’s a good indicator, but we’re not listening to 1kHz sine waves, and headphones don’t behave like simple fixed loads. Actual performance can deviate when you move away from the measured reference conditions.

All of the amps I own are > 100 dB SINAD, but under the right conditions I can hear repeatable differences between them. The 600 ohm Beyerdynamic DT880 is a notoriously difficult-to-drive headphone. It also features a non-removable single-ended cable, meaning that without modifications you’re stuck with an amp’s SE performance. Despite the extraordinarily high SINAD of my THX 887 amp, its power rating at 600 ohms is significantly lower than the power rating of my V280. I hear a clear difference in performance between those two amps with those headphones.

Interestingly, I do not hear such a dramatic difference between those amps with my Beyerdynamic T1.2s, which are also 600 ohm headphones. What gives? Well, take a look at the sensitivity ratings for these two headphones:

DT880 Edition, 600 Ohm

  • Nominal Impedance: 600 ohm
  • Nominal Sound Pressure Level (Sensitivity): 96 dB

T1, 2nd Gen

  • Nominal Impedance: 600 ohm
  • Nominal Sound Pressure Level (Sensitivity): 102 dB (1mW / 500 Hz)

That might not look like a big difference, but a 6 dB difference in sensitivity means that the 880s need about 4x the power to reach the same volume as the T1.2! The similar impedance means that loudness should scale similarly. But the 880s need a big head start. The math is described here:

I wasn’t aware of the difference in sensitivity between these two models until I looked into specs to see if I could explain my subjective listening experience. Sure enough, those numbers tell me a good story:
The 880s seem to need power just outside the ideal operating capabilities of a lot of mid-fi amps, whereas the increased sensitivity of T1.2 means they fall just inside the bounds of what those amps do well.
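
For anyone who wants to check the 4x figure, here is a minimal Python sketch of the arithmetic. It assumes the DT880's 96 dB rating is also specified per 1 mW (the spec quoted above only states that reference for the T1), treats both headphones as purely resistive 600 ohm loads, and uses a 110 dB target picked purely for illustration.

```python
import math

def power_mw_for_spl(target_spl_db, sensitivity_db_per_mw):
    """Power in mW needed to reach target SPL, given sensitivity in dB SPL at 1 mW."""
    return 10 ** ((target_spl_db - sensitivity_db_per_mw) / 10)

def vrms_for_power(power_mw, impedance_ohm):
    """RMS voltage needed to deliver that power into a purely resistive load."""
    return math.sqrt(power_mw / 1000 * impedance_ohm)

TARGET_SPL = 110  # dB SPL, hypothetical loud peak chosen only for illustration

headphones = {
    "DT880 600 ohm": {"sens": 96, "z": 600},   # assumption: 96 dB is per 1 mW
    "T1 2nd Gen": {"sens": 102, "z": 600},     # 102 dB at 1 mW / 500 Hz
}

for name, hp in headphones.items():
    p = power_mw_for_spl(TARGET_SPL, hp["sens"])
    v = vrms_for_power(p, hp["z"])
    print(f"{name}: {p:.1f} mW, {v:.2f} V rms")

# The ratio does not depend on the target level, only on the 6 dB sensitivity gap.
power_ratio = power_mw_for_spl(TARGET_SPL, 96) / power_mw_for_spl(TARGET_SPL, 102)
print(f"DT880 needs {power_ratio:.2f}x the power of the T1.2")
```

Because both are 600 ohm loads, 4x the power works out to 2x the voltage swing, and voltage swing is usually where an amp runs out of headroom with high-impedance headphones.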

I don’t know that ASR measured the V280, but I’ll assume that it’s probably similar to the V281 that they did measure, given that both share the same amplifier section (the 281 has XLR outputs and preamp capabilities, I believe). In any case, the V281 has a SINAD* of 111 and the 887 is shown on the same graph with a SINAD of 119, but 880s clearly sound better on my V280.

* Note the chart actually says “Best Case SINAD.” The 600 ohm 880 does not allow the 887 to operate under best case conditions.
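
To put those two SINAD figures in perspective, here is a small sketch that converts SINAD in dB into the equivalent noise-plus-distortion level as a percentage of the signal. It only restates the published best-case numbers in another unit and says nothing about behavior into a 600 ohm load; the function name is my own.

```python
def sinad_to_thdn_percent(sinad_db):
    """Convert SINAD (dB) to noise + distortion as a percentage of signal amplitude."""
    return 10 ** (-sinad_db / 20) * 100

# Best-case SINAD figures quoted above
for label, sinad in [("V281", 111), ("THX 887", 119)]:
    print(f"{label}: SINAD {sinad} dB -> THD+N {sinad_to_thdn_percent(sinad):.6f} %")
```

Both come out at a few ten-thousandths of a percent, so an 8 dB gap at that level is a difference between two very small numbers.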

Incidentally, I have spoken to a number of people who have balance-modded their 880s and all have reported dramatic improvements in quality while listening balanced on amps like THX models and A90. It’s not the “balanced” part that’s doing that, but the fact that balanced outputs tend to have more power because of the differential signals.
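
A rough sketch of why the differential output helps, under an idealized assumption: the balanced output swings twice the voltage of the single-ended output into the same load, and the amp is not current limited. The 5 V rms single-ended figure is hypothetical and is not taken from any of the amps mentioned.

```python
import math

def power_mw(vrms, load_ohm):
    """Power in mW delivered into a resistive load at a given RMS voltage."""
    return vrms ** 2 / load_ohm * 1000

SE_VRMS = 5.0      # hypothetical single-ended maximum swing
LOAD_OHM = 600     # high-impedance headphone such as the DT880

se_power = power_mw(SE_VRMS, LOAD_OHM)
bal_power = power_mw(2 * SE_VRMS, LOAD_OHM)   # idealized: differential doubles the swing
gain_db = 10 * math.log10(bal_power / se_power)

print(f"SE: {se_power:.1f} mW, balanced: {bal_power:.1f} mW, gain: {gain_db:.2f} dB")
```

Under those assumptions the balanced output offers 4x the power, about 6 dB of extra headroom, which happens to be the same size as the sensitivity gap between the 880 and the T1.2.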

I certainly wouldn’t say so. Although I have found a lot of cases where cheaper amps sound just as good as more expensive ones. There are good explanations for why they might sound the same:

  • bottlenecks in the audio chain (try YouTube music on a good amp, lol)
  • headphones are a good match for operating capabilities of the cheaper amp (see above example)
  • lack of critical listening experience or inattention to the right details
  • expectation biases, etc.

Another anecdote from my recent experience:

I mentioned my Holo May DAC above. Since I wrote that initial impression–the one where volume matching collapsed perceived differences on my loudspeakers–I have enabled the following audio chain: Both DACs → Mapletree Audio passive XLR switch → Goldpoint level control → McIntosh 8207 amp → Susvara. I have both DACs chained together in Roon, so whenever I play anything through the two DACs, I can use the Mapletree switch to immediately toggle between the two DACs. It’s not perfect, because as I discovered before, the May has a higher output voltage, and is thus about 3dB louder than my Ayre DAC (this chain doesn’t have the benefit of level trimming like my pre-amp chain does). So I have to fiddle a little bit to volume match.
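
For reference, here is what a 3 dB level offset means in voltage and power terms. This is a minimal sketch; the 3 dB figure is the approximate difference mentioned above, not a measured value.

```python
import math

def db_to_voltage_ratio(db):
    """Voltage ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Power ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)

OFFSET_DB = 3.0  # approximate offset between the two DACs, per the post above

print(f"Voltage ratio: {db_to_voltage_ratio(OFFSET_DB):.3f}")
print(f"Power ratio:   {db_to_power_ratio(OFFSET_DB):.3f}")
```

So the louder DAC is putting out roughly 1.4x the voltage, which is far more than the small fractions of a dB that level matching usually aims for.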

Nevertheless, I have noticed that the May produces a larger, more spread out, more holographic soundstage than the Ayre, which sounds smaller and more intimate. For some tracks–notably stuff busy with spatial detail like “Down to the River to Pray” by Alison Krauss from the O Brother Where Art Thou? OST–this bigger stage with better imaging is awesome. But for tracks like “Krigsgaldr” by Heilung I actually prefer the more intimate staging, which plays well with the tension of that track as well as conveying a greater sense of being a participant in the music, rather than an observer.

I have no good explanation for why May sounds more holographic, though that does agree with some other impressions I have collected from others who have heard it. And I still didn’t notice the difference out of my loudspeakers, which might simply mean that it’s not important or that my loudspeaker setup is not configured optimally to convey differences. Anyway, I know what I hear, but I cannot explain why.

It’s also worth underscoring that neither one of these presentations is objectively better. I do have a great deal of respect for the heroic engineering that went into May to make such a clean R2R DAC, but that still doesn’t change the fact that I prefer the Ayre for some music. For that matter, the ADI-2 → A90 chain on my desk has a similar more intimate presentation, and is considerably cheaper than either of the DACs in my loudspeaker setup.

I’m personally thrilled that I do hear differences between my two DACs, because it lets me use the dial on my XLR switch like a soundstage selector.

8 Likes

That was my first take too, but when I read @nhatlam96’s post again, I don’t think he’s saying that. I think he’s saying that if you listen to 2 amps that have the same sonic signature, there’s a danger that bias and volume mismatches might make us believe that they sound different.

I think those are valid points, having just finished a 4 amp comparison that I still need to post (Lyr 3, Asgard 3, Pendant vs a Burson Soloist 3 still in trial period, to see if I want to keep it). There were clear differences between them, but in some cases, they were very similar and I have to admit I did catch myself looking for a difference. Once I realized that’s what I was doing, I had to do the test again, challenging myself to see if I really was hearing the difference or it was just wishful thinking. Obviously the repeated test was still subjective and subject to bias.

5 Likes

The unreliability of audio memory is something that I find practically as important as volume matching. I’ve spoken to some (very experienced) people who recommend avoiding rapid A-B testing because they feel it can lead to poor impressions or bad conclusions. Instead, they recommend spending a good long time with a chain before changing things up.

I disagree with that advice (though not entirely) because of how it allows poor listening memory to color impressions. If you can instantaneously switch something in a chain, then you can catch any differences that one variable makes. If you don’t hear any differences, are there any? Probably not.

Still, I wouldn’t recommend either approach to the exclusion of others. There is value in getting to know gear before simply jumping into rapid testing. For example, listening to a variety of music can reveal synergies or lack thereof in a particular chain. Plus I have occasionally come across gear that requires a significant acclimation period (brain burn-in?).

A good example of both of these is the Unique Melody MEST (IEM) I recently acquired. At first listen, I actually didn’t like them. I had just come off of hearing Anole VX, and the mids on the MEST sounded kind of grainy and diffuse. The stock tips also weren’t a great fit for me, so I ended up swapping those out, twice, finally settling on some Xelastic tips. Still, the first few songs were kinda “meh” for me, but then something clicked–it’s like the imaging snapped into focus, and since then I can’t unhear the odd way that they present spatial information. With the right tracks–particularly stuff like Nine Inch Nails and EDM with exotic sounds and lots of layers–MEST imaging is just magic.

Anyway, not to digress into a MEST review, the point is that I might not have been able to fully appreciate the MEST had I not taken the time to adjust to them, learn what they do, and find recordings that play to their strengths. A simple A-B test might have blown right by what makes them special by either using “wrong” auditioning tracks or by failing to allow myself time to adjust to their unique presentation.

8 Likes

Exactly. Knowing there are potential biases helps, but as @Superfly said, we can’t just wish them away. However, since we have to live with our biases, anyway, we can at least humor the process until the effort of being “right” outweighs the benefit of being happy. :wink:

My favorite part of the NwAvGuy blog that @voja shared is the assertion that “Music is Art but Audio is Science.” I prefer flipping the order, but the point is the same: objective data and measurements can go a long way to inform how equipment behaves and what might be synergistic, but in the end music is art, beauty is in the ear of the beholder, and we all appreciate differently.

It’s worth the effort to experiment, form impressions, try to confirm them, see if we can correlate them with others’ findings, and hope that doing so can steer us to invest wisely. But the whole idea of trying to label any kind of objectively “best” stuff is folly because of personal preference.

I do appreciate collecting others’ opinions, both about gear I own and gear I do not. In the former case it’s interesting to discover when my own private findings correlate with or deviate from what others report. In the latter it can help me to identify new stuff to try. Arya is a great example of something I most likely would have passed on had the hype train not caught my attention so that the common threads in multiple reviews eventually piqued my interest.

5 Likes

My thoughts on bias below… kind of off topic… but aligns with better trying to understand and view one’s own biases…

I’ve been saying for years that we are all made up of Biases… all are learned from our environments, friends, family, experiences, the society we live in, etc… it is being aware and seeking out why we have these biases that I think more people need to look inwards and determine if they are good or bad biases… biases are kind of the new term to define fight or flight lol… they tell us how to react based on the above-learned behaviors or shared thoughts.

More importantly, I think everyone should constantly be re-evaluating their own biases to ensure that we are not allowing them to steer us wrong… there are always outliers, and biases can be extremely wrong especially the societal/learned/group think ones… so, I ask you to stop… think and dwell on what you are doing and try and determine where that bias is coming from and how best to understand your own actions based on that bias… recognizing that everything we do, decisions we make are technically a bias, is a fun mental game to play and think on… try and determine where you learned your biases :wink: it is fun and can be very eye opening.

Cheers… I enjoy these types of discussions… but, I’ll let you all get back to discussing amps, and how they do or do not sound the same… I personally think they sound different depending on implementation and technology used… though there is a lot that is, very similar if not the same in various line-ups… THX seems to be a big culprit of this in my opinion… anyhow… back to work I go!

7 Likes

Yeah I guess you could call that my SINAD bias, I will keep that in mind. Thank you @speleofool and @drifitingbunnies.

I did that in my test. The RNHP and G103-S are considered to make the Focal Clear more lifelike and organic, while the SP200 would be analytical and harsh. Also, I am not here to “prove” that all amps sound the same. I am here to promote awareness for bias and volume level mismatch, so headphone comparisons can be done more fairly.

It’s a shame that nobody clicks on that link, because it says the same thing I wrote, just a lot better lol
Also quite interesting that there was no difference between the $1600 Benchmark DAC1 Pre and the $200 O2.

Yes, I had the same feeling when I did SP200 vs G103-S. I wanted the SP200 to win, due to its new THX technology made in 2020. Unfortunately for me, it was not any better than the old G103-S made in 2013. That’s a seven year difference! That is a lot of years, considering our advancement in technology lately, but I heard absolutely no difference between G103-S and SP200, which shattered my THX-fanboyism or THX-bias.

Guys, a slightly off-topic question. I wanted to order the V281, but in Germany only the V280 is available. Would it be okay for you guys if I did SP200 vs V280 instead?

Yeah, I meant that haha English isn’t my primary language :sweat_smile:

7 Likes

Yes, it is a very lengthy and detailed article. I like that it covers several subjects and explains each point.

The link was brought to my attention by a user named Rikudou_Goku from HiFi Guides. I have it bookmarked in my “audio controversy” folder :wink:

3 Likes

The manufacturers know what amp pairs best, so why not simply ask them? For example, Grado recommends the Lyr 3.

5 Likes

I’ve observed stranger, illogical things related to blind triangle testing and placebos.

In addition to blind testing for purposes of determining if gear sounds different or grading gear, blind testing/listening is also beneficial in the context of articulating unbiased sound quality attributes.

2 Likes

You’re my man :+1: :+1:!

Sometimes things are so simple, but we (the people) are too cerebral :man_shrugging: :man_facepalming:.

3 Likes

There are reasons why someone would opt for a different amplifier: price, looks, inputs/outputs, availability, build… or you might simply not like the sound of the recommended amp. But asking the manufacturer for recommendations can also be helpful, I hadn’t thought about that. Thanks!

Synergy is definitely something that some people disregard, but it plays a very important part in selecting audio equipment. Scientifically you can prove why certain amps work better with planars or dynamics, but the character of an amp can either enhance or diminish a headphone’s strengths or weaknesses. Generally I like to stay within one brand’s line with the assumption that they test with their own gear and that there is synergy to be gained there. Obviously that’s not always the case, but manufacturers will often have a goal in mind of what they’re trying to deliver.

One example: the Feliks Audio Euforia is used for both Focal Utopia and Empyrean testing. Hopefully it shouldn’t surprise anyone that when they listen to those combos, they will leave having enjoyed their experience.

8 Likes

This has been an interesting discussion to follow, and I read @nhatlam96’s thought process as one many of us have had in the beginning: 1+1 = 2
But there are so many elements to a DAC and an amp, so just because two DACs use the same AKM AK4490 chip doesn’t mean they sound the same, regardless of whether you or your mates can hear any difference.

And as anyone interested in hi-fi quickly finds out, and as so many have stated already: ears and people are different.
When reading nhatlam96’s thoughts and approach to this I just think: dude, go out and try it out. What you are doing, trying to reason your way to the right gear and almost seeing it as an equation, is futile.
And part of this hobby is to try different stuff, not to solve a math equation :face_with_monocle:

So enjoy the hunt for your gear; enjoy the different sounds you get from said gear; don’t reason your way to it solely (as a part of the hunt it is okay, bc we do look at what chip a dac has) and don’t rush to your endgame too quickly - the journey is actually part of the fun :smiley:

Enjoy the hobby, my man, and as my fellow Dane Søren Kierkegaard once said:

13 Likes

EDIT: different wording. So, do you agree that without blind testing and volume matching, amp comparisons have little value?

The point of blind testing is to eliminate bias. The only way to properly blind test is to not know the objective of the test, its hypothesized results, etc. Thus any blind testing performed by a single individual on themselves is not blind testing, and to pretend otherwise is ignorance, willful to boot.

And of course the problem you’ll have with blind testing on somebody else is that you’re going to get their subjective opinions based on their preferences, not your own.

Blind testing to evaluate your own preferences in gear is pointless.

Volume matching is a thing though.

Just my 2 cents.

8 Likes

I am simply pointing out that you need to conduct a proper double-blind test if you want to remove expectation bias, because you’re just as likely to convince yourself that you don’t hear a difference as you are that you do.

And then that such a test requires precision level matching (which requires a meter, scope or analyzer) and can’t be done by ear.

Doing a direct A/B/X comparison between amplifiers definitely loses objective value if those factors aren’t accounted for.
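
One way to put a number on an A/B/X run is a one-sided binomial test: how likely is it to get at least this many trials right by pure guessing? The sketch below is a generic illustration, not a procedure anyone in this thread has described, and the 16-trial example is hypothetical.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` out of `trials` right by coin-flip guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical session: 16 trials
for correct in (8, 10, 12, 14):
    print(f"{correct}/16 correct: p = {abx_p_value(correct, 16):.4f}")
```

With 16 trials, 12 correct gives p of roughly 0.04, while 10 correct is still quite consistent with guessing. A handful of trials cannot tell you much either way, which is part of why a single session is only one data point.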

Even so, it’s just one data point among many. And unless you’re listening to test tones, it is not very useful. Listening to music is an emotional experience, complicated by non-repeatable (and uncontrollable) elements, both physical and mental. As a result, you will not necessarily respond to pass one through something the same way you do to a second or third or nth.

You can’t control for that in one evaluator, let alone normalize it across more than one. And, as such, while it may be the best tool for that job, it’s a pretty poor one in this case … as there simply isn’t a reliably assessable, fully deterministic outcome in the first place.

8 Likes

Thank you. For my next comparison (SP200 vs V200), I will do this:

  1. Check amp output impedance: SP200: 1.3 ohm and V200: 0.06 ohm. Both have low output impedance, so there should be no audible impact on the frequency response of the Clear.
  2. Precise volume matching via a true-RMS multimeter (1 kHz or 3 kHz sine wave from a tone generator)
  3. A/B switcher
  4. Invite my friend and we will take turns with the switcher. The listener should face away from the audio setup and sit at a distance of ~2.5 meters, because my headphone cable is only 3 meters long.

I am aware that I cannot make the perfect setup, but I think this isn’t too shabby, is it?
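
Steps 1 and 2 above can be sanity-checked with a few lines of Python. This is a sketch under stated assumptions: the 0.1 dB matching tolerance is a commonly cited rule of thumb rather than something from this thread, the multimeter readings are made up, and the 50 to 400 ohm impedance range is a placeholder, not a measured curve for the Focal Clear.

```python
import math

def level_diff_db(v_a, v_b):
    """Level difference in dB between two RMS voltage readings taken on the same test tone."""
    return 20 * math.log10(v_a / v_b)

def divider_loss_db(z_out, z_load):
    """Level drop at the headphone caused by the amp's output impedance (voltage divider)."""
    return 20 * math.log10(z_load / (z_load + z_out))

def response_swing_db(z_out, z_min, z_max):
    """Worst-case frequency response variation across the headphone's impedance range."""
    return divider_loss_db(z_out, z_max) - divider_loss_db(z_out, z_min)

TOLERANCE_DB = 0.1            # assumption: common rule of thumb for level matching
Z_MIN, Z_MAX = 50.0, 400.0    # placeholder impedance range, substitute a measured curve

# Step 1: output impedance figures quoted in the list above
for amp, z_out in [("SP200", 1.3), ("V200", 0.06)]:
    print(f"{amp}: up to {response_swing_db(z_out, Z_MIN, Z_MAX):.3f} dB response variation")

# Step 2: hypothetical multimeter readings on a 1 kHz tone
diff = level_diff_db(1.000, 0.989)
print(f"Level mismatch: {diff:.3f} dB, matched: {abs(diff) <= TOLERANCE_DB}")
```

Taking the readings with the headphones connected matters, since the voltage at the jack depends on the load.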

1 Like

It should take care of the level matching well enough. And it’s an improvement on you doing the switching yourself. It’s better than most people will try. But it is still a long way from being a proper double-blind test all the same.


Which is all to say that it might prove useful to you, but it is otherwise still well within the realms of objective or academic irrelevance.

2 Likes

I’m trying to improve this aspect. My idea is to buy a 3.5 mm extension cable and let the listener sit in a different room. I think this will improve it a bit.

1 Like