This is likely going to be part one of a blind testing series. We’ll see how it goes.
I’ve been wanting to do some blind testing for a while now, especially after reading Jason Stoddard’s excellent article on ‘blind listening’ in his Schiit Happened blog series. If anything, that article indicates that when we don’t know what we’re listening to, concerns about perfect measurements and transparency drop out entirely. This is an important conclusion for many of us who may have only seen data points and haven’t had the opportunity to evaluate the equipment in question. But in this post I also want to point out something about the limits of blind testing - something I discovered (confirmed?) partly as a result of trying it out. My takeaway at the moment is that while the process is interesting, it may not be as valuable as some think.
One of the reasons I wanted to try a version of blind listening was to test myself - to see if I could actually tell the difference and identify each source correctly. I’ve been a staunch defender of the notion that DACs and amps do make a difference to the experience beyond what measurements indicate, and that most people can recognize those differences when paying attention to specific things. This statement of course comes with the following disclaimers:
- How noticeable the differences are depends on how revealing the headphones are
- There isn’t a huge difference between DACs past a certain point
- Headphones make the biggest difference overall
I have to state that this was not a scientific, definitive, or ideal setup, and I think I could improve it over time; however, it was still effective at achieving the ‘blind’ part. It involved using two completely different source chains, volume matched (this is essential), playing the same file at the same time on repeat, while I was blindfolded so I couldn’t see anything. I used Room EQ Wizard (REW) and the EARS measurement rig for volume matching. I then had my girlfriend plug the headphones I was using into each source at random, and I had to guess which one was which. She even tried to throw me off deliberately by plugging the headphones into the same source several times in a row. For this test I used both the $2000 Audio-Technica ATH-ADX5000 and the $350 HiFiMAN Sundara (to test whether lower-end source equipment was distinguishable from high-end gear on appropriately priced headphones as well). For this first experiment I compared sources from three separate price categories:
- iFi Pro iDSD ($2500)
- Mytek Liberty DAC/amp ($995)
- iFi Hip DAC ($150)
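As an aside, the random plugging step can be scripted so the test administrator just follows a pre-generated sequence rather than improvising. A minimal sketch in Python - the function name, trial count, and seed are my own illustration, not part of the actual setup:

```python
import random

def make_trial_sequence(sources, n_trials, seed=None):
    """Pre-generate a random plug-in order for the test administrator.
    Repeats are allowed, so the same source can come up several times
    in a row - exactly the kind of 'throw-off' described above."""
    rng = random.Random(seed)
    return [rng.choice(sources) for _ in range(n_trials)]

# Example: a 10-trial sequence for the first comparison
sequence = make_trial_sequence(["Mytek Liberty", "iFi Pro iDSD"], 10, seed=1)
print(sequence)
```

Fixing the seed also gives a written record of the true sequence, which can be checked against the listener’s guesses afterwards.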
Ideally I’d want a wall between myself and the sources being used, or some other way of ensuring complete separation between the subject (me) and the test equipment, as well as a way of hooking up additional source streams. At the moment I’m limited to comparing two at a time. I’ve considered using some form of switch or splitter to get an immediate changeover, so the headphones wouldn’t need to be unplugged and re-plugged, but that might also make the sources easier to tell apart.
The other limitation is that I wasn’t able to test just the DAC portions through a common amplifier, again because I didn’t use an RCA switch. This means I was evaluating two completely distinct source chains for differences, not the individual pieces of each chain. The main reason for this is that the sources I happened to be using were DAC/amp combos, so they each had capable headphone outputs. I had tried testing them in the past by running them into a common amplifier, but found that this introduced too many additional variables, and the result depended more on the way each source functioned - for example, whether one was being run as a pre-amp and the other wasn’t. Maybe I’ll try this again with different DAC units at some point.
In any case, once I was blindfolded and got the test running I realized this was going to be harder than I thought. This is partly because I had spent all this time setting everything up and I wasn’t properly adjusted to what I should be listening for. This is probably the biggest factor that skewed my results, and I will explain why in a moment, but for each comparison here’s how well I was able to distinguish each source:
- Mytek Liberty vs iFi Pro iDSD (ATH-ADX5000): This had the biggest difference between the two, but I was only able to correctly choose which was which 7/10 times (there’s a reason for this).
- iFi Hip DAC vs Pro iDSD (Sundara): Comparing these two, there was less of a difference, but I only got it wrong once.
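As a sanity check on those scores: with only ten two-way trials, chance alone produces 7/10 or better fairly often. A quick binomial tail calculation (my own aside, and assuming the second comparison also ran ten trials, which the notes above don’t actually state) shows roughly how often a pure guesser would match each result:

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """Probability of getting k or more trials right out of n
    when every guess is pure chance (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(round(p_at_least(7, 10), 3))  # chance of 7/10 or better by guessing -> 0.172
print(round(p_at_least(9, 10), 3))  # chance of 9/10 or better by guessing -> 0.011
</```

So 7/10 sits well within what guessing could produce, while only getting it wrong once would be quite unlikely by chance alone - worth keeping in mind when reading the explanations below.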
If you’re thinking this seems counter-intuitive, that’s because it is. Why did I score better on the test that I thought showed the less significant difference, and worse on the one that I thought had the more significant difference?
There are a number of explanations, but the first confirms one of the caveats noted above: namely, that how revealing the headphone is determines how discernible the sources are. The ATH-ADX5000 is noticeably more revealing of the sources than the HiFiMAN Sundara, and this explains why I found the biggest difference in the first test. This may sound obvious; however, it doesn’t explain why I didn’t score as well on that test as I did on the one with less revealing headphones.
As mentioned earlier, I think I missed a few on the Mytek Liberty vs the Pro iDSD because I was coming at it cold. Basically, I wasn’t warmed up to either source yet, and therefore wasn’t able to properly understand what I was hearing or where to look for differences. I found that as the test went on, it became easier and easier to tell them apart. For anyone wondering, I’ve spent quite some time evaluating each of these before this test, and the biggest difference I’ve found over time is that the Liberty has a slightly more articulate and analytical representation of instruments that occupy certain parts of the treble. The Pro iDSD is also quite well defined, but on the ATH-ADX5000 I found its elevated treble response to sound just a touch more relaxed. My pick for the ATH-ADX5000 would probably be the Pro iDSD as a result.
With the second test, while these two sources are in vastly different price categories, I suspect I scored better because I had been warmed up to the Pro iDSD by this point and already had a solid imprint of what it sounded like to compare the Hip DAC to. The nice thing about this test not revealing as strong a difference is that we can take comfort in knowing that entry to mid-level headphones (even benchmark headphones like the Sundara) don’t require crazy expensive source gear to get the most - or close to the most - out of them. In fact, I’d say for most headphones the iFi Hip DAC or other entry-level sources are probably good enough, and it’s only when you get up into flagship territory that the differences start to matter.
There’s also another explanation for these results that’s perhaps even more important to keep in mind, and it’s also why I start to think that maybe this blind testing approach isn’t as valuable for differentiating between equipment chains as many of its proponents imagine. For this test I was listening to a specific song, switching several times over the course of it, and for identifying differences I think this is actually not the right approach - even if it’s a realistic way for us to use the equipment in question.
Using a full song (on repeat) means you may be thrown off by a low- or high-energy part of the song. This is especially true for well-recorded music with a lot of dynamic range. You may think you hear an extra bit of glare or definition in the treble at one part of the song, but it has more to do with the way the drummer hit the cymbal at that particular moment than anything else, and this can lead to guessing wrong if that happened to be the part you were listening to when the sources were switched. To get around this problem, we should probably be listening to specific parts of a song on repeat, or even to consistent test recordings that last 15-30 seconds at most.
Conclusion… for now
In my mind this calls into question whether we should care about blind testing/listening for anything beyond the fact that it’s an interesting experience. By that I mean it may be interesting to see how well you can tell your sources apart, but it probably shouldn’t factor into equipment recommendations. Moreover, the ability to distinguish one source from another does not mean that the subject properly knows or understands how it sounds - and this is something I think blind test proponents get wrong.
To make this point more strongly, I’m reminded of a problem in epistemology that investigates whether knowledge attributions have a two-part or three-part structure (Jonathan Schaffer’s argument in favor of ‘contrastivism’, found here). The basic premise is that discernibility is an implicit yet essential component of knowledge statements that look something like “Resolve knows that the Mytek Liberty DAC sounds like this”. Supposedly these statements include the ability to distinguish this from something else (in this case the Pro iDSD). I imagine this is the attractive part about blind testing and discernibility: the ability to distinguish one from the other should indicate that A) there is a difference and B) the test subject knows what that difference is - or conversely, if we can’t distinguish them from one another, either there isn’t a difference or we don’t know what the difference is.
But the problem with this line of thinking (and indeed one of the problems with Schaffer’s claim) is that discernibility on its own is not enough for the knowledge claim “Resolve knows X” to be true. Just because you can correctly guess which is which between two sources in an ideal test environment, using the same partial recording - even with more reliable certainty than I managed in the testing above - doesn’t mean you’ve gained comprehensive insight into the way each source sounds. In other words, the assertion that “there’s a difference” isn’t a complete description of what each piece of source equipment sounds like.
While I was able to correctly guess the sources most of the time, it would become far more difficult if more sources were included in the test. Rather than distinguishing between two sources, imagine having to distinguish among four, or six. Failing to correctly guess which is which in that situation (which is likely) doesn’t mean the subject couldn’t hear a difference; it just means that hearing a difference wasn’t enough to correctly identify them. This is where discernibility starts to show its weaknesses, and it makes me wonder if blind testing proponents place too much value on it.
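To put a number on that intuition: if a test requires matching every one of k unlabeled sources to the correct name in a single pass, a pure guesser succeeds with probability 1/k!, so the bar for a “passing” score rises very quickly. A quick sketch (my own illustration, not data from these tests):

```python
from math import factorial

def chance_of_full_match(k):
    """Probability of correctly labeling all k sources by pure guessing,
    when each name is used exactly once (one of k! orderings is right)."""
    return 1 / factorial(k)

for k in (2, 4, 6):
    print(k, round(chance_of_full_match(k), 4))  # 2 -> 0.5, 4 -> 0.0417, 6 -> 0.0014
```

With six sources a listener could hear real differences between every pair and still have almost no chance of naming them all correctly - which is exactly why a failed multi-source identification says little on its own.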
In any case, in spite of my reservations about the value of blind testing and discernibility, I want to see if I can improve the test conditions and make the setup easier to run. The main reason for this is simply to become a better listener more than anything else. The value I see in doing something like this is mostly along the lines of “oh neat” - unless there are cases where I score so poorly that it starts to show I can’t hear a difference. I think there are a number of source comparisons like that, where we start to convince ourselves that we hear things that aren’t actually there.
If anyone has done any blind testing, let me know your findings. I’ll be posting updates here from different tests, along with some additional methodological improvements. Ideally I’d like to find a way of doing this that yields more meaningful results - if that’s possible.