In my experience, this hasn’t been boiled down to a particular metric or dataset yet - and I say that as someone who genuinely loves data and measurements. To some degree I understand why SINAD rankings are visually compelling: people want the numbers to mean something, because that gives them confidence in a purchase. People can keep believing in that metric all they want; I’ve just never found it to correlate with ‘good sound’ past a certain point. This is a long-standing debate, but when it comes to amp scaling, for gear that isn’t broken or flawed in some way, I don’t think you can predict better or worse sound with the metrics or indices people are currently using.
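For anyone unfamiliar with what that number actually is: SINAD is just the ratio of a test tone’s total power to everything that isn’t the fundamental (noise plus distortion), expressed in dB. Here’s a minimal sketch of that measurement on a synthetic signal - the `sinad_db` helper, the tone levels, and the -60 dB third harmonic are all my own made-up example, not taken from any analyzer:

```python
import numpy as np

# Hypothetical helper: estimate SINAD of a captured test tone.
# SINAD = total power / (noise + distortion) power, in dB, where
# "noise + distortion" is everything except the fundamental.
def sinad_db(x, fs, f0):
    n = len(x)
    spec = np.abs(np.fft.rfft(x * np.hanning(n))) ** 2   # power spectrum
    freqs = np.fft.rfftfreq(n, 1 / fs)
    k0 = np.argmin(np.abs(freqs - f0))                   # fundamental bin
    fund = spec[max(k0 - 3, 0):k0 + 4].sum()             # fundamental +/- 3 bins
    total = spec.sum()
    return 10 * np.log10(total / (total - fund))

# Synthetic "amplifier output": a 1 kHz tone, a -60 dB third harmonic,
# and a little noise. fs and n are chosen so the tone lands exactly on
# an FFT bin (coherent sampling), keeping window leakage out of the
# noise+distortion estimate.
fs, f0, n = 48_000, 1_000, 48_000
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * f0 * t)
out = (tone
       + 1e-3 * np.sin(2 * np.pi * 3 * f0 * t)
       + 1e-4 * np.random.default_rng(0).standard_normal(n))
print(f"SINAD: {sinad_db(out, fs, f0):.1f} dB")  # ~60 dB by construction
```

Note what this single number does: it collapses every harmonic and every bit of noise into one figure, with no information about *which* nonlinearities are present - which is exactly why two amps with very different subjective character can sit right next to each other on that ranking.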
With that said, I think there are two ways to think about this. One is that we’re not capturing the right data - possible, but I suspect unlikely. The other is that we’re capturing the right data but not analyzing sound quality preferences in terms of the right things. For example, certain nonlinearities could reliably contribute to positive experiences. I think this is the more likely explanation, because at the very least we know that THD+N is not at all predictive of preferred sound quality the way frequency response (FR) is. There’s no meaningful subjective preference for ‘better measuring gear’ with sources the way there is with headphone FR. Sean Olive and co. demonstrated that nonlinear distortion did not have a correlated negative effect on preference, while linear distortion was audible at a lower threshold. You can check out that paper here. So the question would be: are there any positive correlations? I don’t have the answer, but I think maybe.
But regardless, the point about potentially positive nonlinearities is one that some objectivists will hate, because they believe an amplifier should do literally nothing but increase the volume. Personally I think that’s a bit myopic - in fact, this was confirmed for me some time ago when I got to interview amplifier manufacturers who were deliberately preserving certain nonlinearities for one reason or another. Mainly, they’re not even trying to design for anything that scores well on that ranking, because they don’t see the value in it from a sound quality preference perspective - although that may change soon enough for marketing reasons, which we’ve already started to see in certain places.
In any case, I imagine that if you did a study with a decently large panel, using the same headphones but different source equipment, you’d find that people prefer things that don’t necessarily score well on something like SINAD. There might be some agreement among listeners as well, but nowhere near the kind of consistent ordering you see on that index. Instead, I imagine you’d get results that are a bit all over the place - meaning SINAD is simply not a meaningful indicator past a certain threshold.
To your question about the DAC and how it performs, I have no idea. I used the Matrix X Sabre Pro - which coincidentally does measure well. But the bottom line for me is that if you’re looking for a metric to predict ‘scaling’, I haven’t found it yet - but it sure isn’t SINAD.

