Upscaling: what does it mean?

So I’m unfamiliar with the practice of upscaling. I’m squarely in the camp of “plug a thing in and it should sound good”… then I start researching and see what else can be done to mod/tweak/upgrade to put my personal spin on a thing.

So my question is: what is upscaling, and how do I go about getting a good understanding of it? Can it be as simple as changing the settings in Audio MIDI Setup on Macs? Or do I need a specific piece of equipment?

This whole question is being brought up due to @Torq and his new Chord toy :smile:

Edit: I’ll be researching it too, and will share what I find.

4 Likes

Well, to me upscaling is a process by which a video signal is scaled up to display on a higher-resolution display than the original signal was made for. For example, a 720p video signal must be upscaled to fit on a 1080p display. To do this, an algorithm determines what color the missing pixels need to be. If it needs to fit a new pixel in between two others, and those two pixels are both red, then the new pixel should be red. That’s easy. But what if one were blue and the other yellow? Should it fill with blue, yellow, or some blend of the two? Clearly, this is complicated, and can lead to visual artifacts if done poorly.

With audio, I think it is commonly referred to as upsampling. The same principle applies, though: you’re creating new information based on what has come before and after in an audio stream. While it can lead to a higher-resolution sound, it can also introduce audio artifacts.
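To make that “fill in the missing value” idea concrete, here’s a toy sketch in Python (my own illustration, not how any real converter works): a naive 2x upsampler that invents each new sample by averaging its neighbours, exactly like blending two adjacent pixels.

```python
# Toy 2x upsampler (illustrative only -- real resamplers use proper
# reconstruction filters, not linear interpolation). Each inserted
# sample is the average of the two samples either side of it.
def upsample_2x_linear(samples):
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        if i + 1 < len(samples):
            out.append((s + samples[i + 1]) / 2)
    return out

print(upsample_2x_linear([0.0, 1.0, 0.0]))  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

Real resamplers use sinc-type reconstruction filters instead of simple averaging; this is just to show where the “new” information comes from, and why a poor choice of interpolation can introduce artifacts.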

8 Likes

Upsampling…oops that is what I meant.

What are the positives and negatives of this, besides the obvious ones of poor implementations and adding bits that don’t belong?

For instance, in Audio MIDI Setup, if I put the setting higher than 48 kHz, what does that actually do? Is this considered upsampling?

Yes I’m asking silly questions on purpose :wink:

1 Like

It stops me from asking them. :grin: This is a great topic to discuss; thanks for putting it out there. It’s fascinating what the boffins can do now. It’s really clever. I was vaguely aware of upscaling but, like many others I suppose, I don’t have a great understanding beyond the basic principles of how it works.

For audio purposes, as @ProfFalkin has said, we usually refer to “upscaling” as “upsampling” - and sometimes as “oversampling”. These terms are often used interchangeably, although a more proper application of them would be that “oversampling” means sampling at a higher rate than Nyquist for the source signal and “upsampling” means performing a sample-rate-conversion (SRC) from one already-sampled source to a higher rate.

While both “oversampling” and “upsampling” work to solve a similar problem, specifically to make it easier to implement the necessary filters for sampling (brick-wall/anti-aliasing) and replay (reconstruction), the first is applied at capture time in the ADC and the second by the DAC (though there are some DAC architectures which also do internal “oversampling” earlier in their conversion steps).

If you sample a normal audio signal at 44.1 kHz, which is the CD standard, you need a brick-wall filter that absolutely ensures no audio information reaches the ADC with a frequency higher than 22,050 Hz (otherwise you’ll get aliases - i.e. false data - lower in the audio band). If you want a flat response from 20 Hz to 20 kHz, that means you have to attenuate the input from 0 dBFS to -96 dBFS over just 2,050 Hz. If you oversample the input at, say, 176.4 kHz, for the same audio content, your filter now simply has to go from 0 dBFS to -96 dBFS over a span of 68.2 kHz (88.2 kHz - 20 kHz). That is a much shallower curve, and easier (and cheaper) to engineer reliably.
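That arithmetic is easy to sanity-check. A numbers-only sketch (my own illustration, assuming a 20 kHz passband and the 96 dB figure that corresponds to 16-bit dynamic range):

```python
# Sketch: compare the anti-aliasing filter transition band available at
# two capture rates, assuming a 20 kHz passband edge. The filter must
# fall ~96 dB (16-bit dynamic range) within this band.
def transition_band_hz(sample_rate_hz, passband_hz=20_000):
    """Width between the passband edge and the Nyquist frequency."""
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz - passband_hz

# CD-rate capture: ~96 dB of attenuation in just 2,050 Hz.
print(transition_band_hz(44_100))    # 2050.0
# 4x-oversampled capture: the same drop spread over 68,200 Hz.
print(transition_band_hz(176_400))   # 68200.0
```

The 4x-oversampled case gives the filter roughly 33 times more bandwidth to roll off in, which is why the analog filter becomes so much gentler and cheaper.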

Remember that the input filter operates in the analog domain as it must occur prior to the signal reaching the ADC!

There’s a decent overview of it, with illustrations and examples, here. And I’m happy to get into a detailed discussion on specific aspects of it as needed/desired.


It is worth noting that many DACs, and in particular delta-sigma designs, already do their own upsampling - whether you want them to or not (though some allow you to choose if it happens, and sometimes by how much)!

Schiit’s entire multi-bit line over-samples (for Yggdrasil it is to 8x … or 8 fs - where “fs” is the base sample rate, so 44.1 kHz input gets upsampled to 352.8 kHz). Chord’s DACs do even more extreme upsampling, in two stages: with DAVE, for example, first to 16 fs and then by a further 256 fs.

These DACs also use proprietary filters (“Super Combo Burrito” for Schiit’s line and “Watts Transient-Aligned” for Chord’s, for example). A typical filter, built into a DAC chip, might use 256 “taps”. When you see references to “tap length” or “filter length”, each “tap” is a specific conversion coefficient, and the longer the filter, the more likely you are to reach conversion coefficients of zero. Higher sample rates require longer filters (more taps) to do this. There is no benefit to having a million-tap filter on raw 44.1 kHz (non-upsampled) content, as the vast majority of the taps will have a zero coefficient.
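For anyone who hasn’t met the term before, a “tap” is just one coefficient in an FIR (finite impulse response) filter: each output sample is the weighted sum of the most recent N input samples, with one weight per tap. A minimal, purely illustrative sketch (this is not any vendor’s actual filter):

```python
# Minimal FIR filter sketch: each "tap" is one coefficient, and the
# output sample is the weighted sum of the last len(taps) inputs.
def fir_filter(samples, taps):
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:          # skip samples before the stream began
                acc += coeff * samples[n - k]
        out.append(acc)
    return out

# A 3-tap moving-average filter smooths a constant input toward 1.0.
print(fir_filter([1.0, 1.0, 1.0, 1.0], [1/3, 1/3, 1/3]))
```

A million-tap filter is the same idea with a vastly longer `taps` list; in practice such filters are computed with FFT-based convolution or parallel DSP hardware, since this naive loop would be far too slow.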


Moving from the theoretical to the practical, let’s talk about actual application and software, per the questions in the original post.

Upsampling can, indeed, be done in software. In fact, on both macOS and Windows, if you set the output rate for your audio device (e.g. via the Audio MIDI Setup utility on macOS) to a higher rate than the source material being played, the OS will upsample the content on the fly.

This is generally NOT a desirable thing, as you have no control over how this upsampling is done. There are multiple approaches, filters and levels of precision that can be applied, each with different implications and potential artifacts, and the built-in OS upsampling generally isn’t as good as dedicated software.

Of note here is what happens by default on Android-based systems. Android’s standard audio stack assumes a sample rate of 48 kHz, and any source material not at a multiple of 48 kHz undergoes sample-rate conversion. For example, standard streaming content, CD content, and most compressed audio will be resampled from 44.1 kHz to 48 kHz. This is a non-integer conversion, which makes the math and precision much more involved (and critical) than a simple powers-of-two conversion (e.g. 48 kHz -> 96 kHz).
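You can see why 44.1 kHz -> 48 kHz is awkward with a couple of lines of Python (my own illustration): reduced to lowest terms, the ratio forces the resampler to interpolate by 160 and decimate by 147, where an integer conversion is a single small factor.

```python
# Sketch: the rational resampling ratios behind the conversions above.
from fractions import Fraction

# Android's default path: 44.1 kHz source into the 48 kHz audio stack.
print(Fraction(48_000, 44_100))   # 160/147 -> interpolate by 160, decimate by 147

# A simple integer conversion, by contrast:
print(Fraction(96_000, 48_000))   # 2 -> just interpolate by 2
```

The larger and more awkward that reduced fraction, the more work the polyphase filter has to do to keep precision, which is exactly the corner a power-constrained phone implementation cuts.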

More precise conversions and filters (e.g. an ideal sinc filter) are more demanding in terms of power (battery) and CPU than is ideal for a cellphone, and as a result those sample-rate-conversion implementations are optimized for power rather than quality. Thus we want to avoid that conversion in the device if we can, and this is one reason why Android-based DAPs sometimes tout a custom audio stack that bypasses this process.

Going further …

On a Mac or a PC, there are myriad ways to do upsampling in software. Many high-end music-player applications allow you to enable upsampling, and they generally implement much more sophisticated schemes than you’ll find built into the OS.

Audirvana+, for example, not only allows you to specify many of the details of how the upsampling is performed, and to what degree, but even lets you choose between two different upsampling engines, “SoX” (open source) and “iZotope”.

If you want more control, and even more sophisticated approaches, including control over things like filter type, tap length and noise-shaping (required by all 1-bit, delta-sigma and DSD conversions), then you want to look at “HQPlayer”.

Most conversions, at sane upsampling rates, can be done easily on the fly. However, extreme upsampling, and the resulting long, complex filters and noise-shapers you want to apply there, are VERY processing-power intensive. HQPlayer, for example, converting 44.1 kHz PCM to DSD512 using the highest-fidelity poly-sinc filter and high-order noise-shaping, will require a dedicated multi-core computer (or significant GPU compute capacity) to work, and even then can have significant startup latency.


Hardware up-samplers/filters originated when the required processing was more than could easily be accommodated on reasonably priced general-purpose hardware/computers. Most of that is now handled by software in the real world (either on the computer, or on a basic DSP chip in the DAC).

Extreme hardware up-sampling, and in particular the necessary filtering and noise-shaping you must apply to get the benefits of it, still requires serious processing power (as per the HQPlayer example above). This is where things like Chord’s M-Scalers come in … as they use a massively-parallel DSP approach to do both the upsampling and then the complex filtering and noise-shaping over very long tap length filters.

The Chord Hugo M-Scaler, which is to my knowledge the most advanced and extreme hardware audio upsampler/filter available, uses an FPGA that provides 740 DSP cores, and utilizes 528 of those in parallel to upsample to 4096 fs before applying a 1,015,808-tap implementation of Rob Watts’s “WTA” filter, and then reducing the final output rate to something the DAC can handle (up to 768 kHz in the case of Chord’s newer DACs). Even with such powerful hardware on tap, this incurs about a 1.4 second latency. The result is effectively an ideal implementation of a sinc filter that optimally recovers the originally sampled data for material up to 44.1 kHz and 16 bits, and gets closer than anything else I’m aware of for higher rates and bit-depths.


So, short version: you can experiment with upsampling (and filtering) in software. Doing so to a high degree requires special software and powerful hardware. Otherwise you can look at various hardware options, the highest-spec of which is, today, the M-Scaler. From there the rubber meets the road as you start to consider the audible effects of this processing vs. what it means in terms of math, theory and the demands/easements it enables in the actual hardware implementation.

11 Likes

So to summarize, the reason one might use an M-Scaler is that one either 1) has a NOS (non-oversampling) DAC or 2) has an oversampling DAC but believes that the M-Scaler can oversample better ?

2 Likes

Pretty much.

How applicable the M-Scaler’s, or software like HQPlayer’s, upsampling is to a given DAC is also influenced by how said DAC treats its input. Most oversampling DACs (which is most DACs) have a fixed maximum level of oversampling they’ll apply - beyond which higher resolution input isn’t oversampled.

For DACs that accept input at rates that defeat their internal oversampling (e.g. an original Schiit Bifrost doesn’t oversample content fed to it at 176.4 or 192 kHz), there is, theoretically, more of a benefit than to one that can oversample further (e.g. Yggdrasil accepts 192 kHz input, but oversamples internally to 384 kHz).

The filter and noise-shaping should have more of an audible effect than the upsampling itself; the ability to apply closer-to-ideal filtering simply depends on having higher sampling rates. And it is the filtering and noise-shaping that really chews up the processing time. This is something you can easily experiment with in the trial version of HQPlayer, for example - compare the coarser filters and lower-order noise-shapers on any level of upsampled content, and you’ll quickly see it.
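Since noise-shaping keeps coming up: the core idea is that the quantizer’s error isn’t thrown away but fed back, so it gets pushed up in frequency where subsequent filtering can remove it. A toy first-order, 1-bit sketch (my own illustration; real delta-sigma modulators are higher-order and far more sophisticated):

```python
# Toy first-order noise shaper: quantization error is fed back into the
# next sample, so the error spectrum is pushed toward high frequencies
# instead of being dumped flat across the audio band.
def noise_shape_1bit(samples):
    out = []
    err = 0.0
    for s in samples:
        v = s + err                       # add back the previous error
        q = 1.0 if v >= 0 else -1.0       # brutal 1-bit quantizer
        err = v - q                       # error to carry forward
        out.append(q)
    return out

bits = noise_shape_1bit([0.5] * 8)
# The running average of the +/-1 stream tracks the 0.5 input level.
print(sum(bits) / len(bits))  # 0.5
```

Even though each output sample is only +/-1, the local average tracks the input; the shaped error lives at high frequencies, which is exactly what the reconstruction filtering then removes.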

5 Likes

Thinking about the context of this some more last night …

It’s probably worth pointing out (even though I would hope it is largely self-evident) that, absent wanting to alter the performance of a high-quality true-NOS DAC (such as the Holo Audio or Metrum units), upsampling/filtering is mostly in the realm of “things you tweak once you have everything else where you want it”.

For example, high-quality EQ will have a much more pronounced (and useful) effect on one’s listening than upsampling - and one that you can always ensure is beneficial to you, as you’re able to apply it “to taste”.

Buying a better transducer is the next most prominent change you can make, followed by amps and DACs.

Indeed, if we just focus on DACs as a case in point: sitting here with the M-Scaler and three of Chord’s DACs (DAVE, Hugo 2, Qutest) that can take full advantage of it, I do hear definite differences, all of which so far I would classify as improvements when adding the M-Scaler to the chain. What I do NOT find is that the M-Scaler alters the performance ranking of those units.

In other words, the M-Scaler -> Hugo 2 or M-Scaler -> Qutest chain does not, for me, result in an across-the-board better end result than using DAVE on its own. Which would mean that if I was building my system again, I would still want to get to the point where I’d bought DAVE, and gotten the rest of my chain optimized, before I bought the M-Scaler.

At least with software-based approaches the investment is far less (<$200 for the best software way to do this sort of thing I know of). Although to take that to its ultimate capability you’ll still be putting a couple of thousand dollars into the necessary hardware to run the software at that level reliably.

Anyway, long story short … there’s upsampling as applied internally in almost all DACs, where it helps address a number of rather tricky problems, vs. extreme upsampling and special filtering beyond that. The former is high-value, low-cost; the latter is higher-cost, and its value is heavily dependent on a number of other factors.

6 Likes

The joys of this hobby are that there are sooo many options, and no one person will ever have quite the same setup as another, which allows for this kind of knowledge-sharing and discussion. =) I think upsampling is just another cool option in the chain of many other tweaks/mods you can do. I also agree that it is the pursuit of one’s own preferred perfection that drives these kinds of discussions and this hobby. I started researching this knowing that others had way more knowledge than I do, and the majority of the articles I found were too biased toward the author’s own opinion for me to take overly seriously; hence me posting it here. Thank you for taking the time to post valuable information and passing down some knowledge!

3 Likes

I was recently using a trial version of this software - https://topazlabs.com/gigapixel-ai/ - to greatly enlarge a medium-size image to large-scale hi-res. It was amazing that it actually added realistic and accurate-looking fine detail to small fuzzy areas where the information wasn’t really there, making shells on a beach, wooden details, scraped paint, pieces of equipment etc. on an old boat make sense and become recognizable as distinct objects. Kind of magic, fueled by machine learning and neural networks that learned from thousands of other images to realistically fill the gaps while enlarging. (It’s actually far more impressive in use than the examples even show.) I was wondering if this sort of machine learning could be applied to audio to create more detail and realism, and how that differs from upscaling in current approaches like the M-Scaler. Mr. Watts?

2 Likes

This is part and parcel of electric guitar / synthesizer effects in creation and production. The problem is that adding ‘details and realism’ at playback fundamentally changes the music to something different than the artist intended, and can radically shift the presentation to different genre or era. Playback calls for relatively modest changes to preserve the nuances, edges, and details of the original.

The consequences of adding details can be easily heard through chorus, reverb, flange, or other effects demos. Reverb is notable because it’s basically the echo experienced in a cave and some live venues.

Tube amps kind of sort of do this by adding harmonics versus solid state amps too.

2 Likes

But I suspect there remains a possibility for a neural network to learn from other well-recorded audio and add plausible texture. I would have said this wasn’t truly possible with imagery until I just saw it for myself. Also think of the recent 4K version of the 19th-century Lumiere brothers train film that appears to be contemporary. It may not be completely accurate to how it would look if filmed today, but it believably gives the illusion that it was. Imagine this exact application to audio! https://m.youtube.com/watch?v=3oeDsUh5msY https://m.youtube.com/watch?v=3RYNThid23g&t=3s

1 Like

This video does a better job of explaining the imagery software. What if the same AI method was applied to “say you have a tap”? https://m.youtube.com/watch?v=8Xm-GqQyToM&feature=youtu.be

@Torq I’ve been spending the last couple of months trying to dig in and understand upsampling, and your description has been very helpful. So, thank you!

One thing I’ve been pondering is whether it would be desirable or practical to upsample files I own with software, in order to replicate the benefits of something like an M-Scaler. While I appreciate that the M-Scaler is probably among the best options for “real-time” upsampling from any source (including streams), it seems I should be able to accomplish nearly the same thing if I’m willing to sacrifice processing time - or am I wrong about that? That is, if there is good software out there, shouldn’t I be able to feed it my FLAC or WAV files, give it a few minutes, and have it spit out an upsampled file to feed to my DAC? Extending that theory further, is there any benefit to the M-Scaler (vs. software) other than latency?

In a first attempt, I did some research after reading this article on the M-Scaler, and found this page by its author, which contains some freeware, including a “Filterless DAC Simulator” that takes any 44.1 kHz .wav file and upsamples it to 176.4 kHz. After performing blind listening tests (thank you to my very patient wife), listening through Hugo 2 w/ Utopia, I was not able to reliably discern a difference. Granted, I performed this test with only one track from a classical piano recording (the Rach 2 piano concerto), but I’m guessing there is more to the “filterless” story and/or the upsampling only being 4x. However, my question then would be: isn’t the rest of that stuff the DAC’s job? You mentioned that…

So, then, are the M-Scaler AND (for instance) DAVE both doing filtering and noise shaping? Isn’t that just daisy-chaining DACs together (with the first one not actually completing the analog conversion, of course)? Or, if the M-Scaler’s filtering and noise shaping are what is used, are we then essentially throwing out the upsampling and filtering functions of the DAVE and only using it for the final piece of the D-to-A conversion?

To close out with a few final thoughts, resources, and observations:

  • Here is another highly academic page with upsampling information from Stanford University

  • I performed intensive listening sessions comparing Roon’s native upsampling on vs off (on a fairly powerful PC), and found a tiny loss of detail with it turned on, so I’ve decided against using it for now. The loss of detail was most evident to me when listening to plucked notes on an upright bass, which lost a bit of their “grittiness” (on hard-plucked notes) and got artificially smoothed out.

  • I have heard DAVE with and without the M-Scaler in a show setting, and I definitely noticed a positive difference, but I am still trying to understand exactly how they pair together on a technical level to produce that change

Looking forward to everyone’s thoughts and ideas!

2 Likes

It’s certainly practical. Whether it is desirable depends on exactly what/how the upsampling is done (to what rate, using what algorithm/filter, and in what format if a conversion occurs), whether that includes noise-shaping and filtering, what the source material is and, of course, what DAC you’re feeding the results to.

With suitable software and a powerful enough machine, you can do similar things to the M-Scaler. The actual algorithms and filters will be different (if similar), which may or may not result in audible differences/preferences.

You want “HQPlayer” …

You absolutely can do that. I think the “desktop” version of HQPlayer (~$260), which is generally regarded as the best audiophile-centric tool for the job (as opposed to “studio” or “professional” tools like SoX or iZotope), only allows on-the-fly conversion/playback.

If you want to do batch conversion, and keep the upsampled output as files you need “HQ Player Pro”, which is $3,100 …

Yes.

If you do real-time/on-the-fly processing, you don’t need a big, powerful computer to run the upscaling software on - just a small, silent box. And the M-Scaler doesn’t need to be patched, maintained or rebooted, won’t need fans (any computer doing serious upsampling with HQPlayer is going to need proper cooling … we’re talking big GPUs and/or fast multi-core CPUs for the best upsampling, noise-shaping and filter options), nor does it add all the other issues that having a computer connected to one’s DAC can introduce.

If you do batch conversion and store/play the converted files, then your storage requirements go up significantly (HUGELY if you’re converting to ultra-high-rate DSD … think tens of gigabytes per album rather than megabytes), and you’re also now pushing far more data across your network.

You can directly feed the M-Scaler from essentially any digital source, where the software route requires you first create files from the source material.

Bear in mind that in full-bore mode the M-Scalers have about 1.4 seconds of latency (the HMS has a low-latency mode, at 1 fs, for video use), so they’re really for music playback rather than any other scenario. Though in comparison, depending on settings and the power of the computer involved, we can be talking minutes (or more) of latency doing this in software.

The WTA1 filter in Chord DACs is bypassed when coming from an M-Scaler via DBNC.

In DX output mode the M-Scalers only need a digital amplifier (or the pulse-array DAC elements) to do full conversion from digital to analog. So in the future, when the DX amps are released by Rob/Chord, you wouldn’t necessarily need a separate DAC unit at all.

6 Likes

@Torq Thank you so very much for this incredibly thoughtful and thorough response. I believe I have a much better grasp. I may just have to spring for HQPlayer, and see how it goes, but good points on the benefits re: storage.

One final piece I don’t still grasp, however:

If that’s the case, why is there any sonic benefit as you go from feeding a Hugo 2 vs. a Hugo TT2 vs. a DAVE? Clearly there is, but I’m just missing an understanding of what (processing-intensive) steps would be left for the DAC to perform. Is the “hard part” not done, then? What makes DAVE > Hugo 2 after the upsampling, filtering, and noise shaping?

Interestingly, I was able to chat with Rob Watts for a minute after one of his presentations at NYC CanJam and I was trying to weigh the importance of M-Scaler in the chain, and when asked, he went as far as to say he’d rather run an M-Scaler → Hugo 2 than Dave alone (though he had to think about it for a few seconds).

2 Likes

Because there’s LOTS more differentiating those products than just the first stage of the upsampling process.

They all have different power supply configurations, regulation/filtering, the output stages are different (both in terms of component quality and implementation), and they differ in the number of pulse-array elements they have (which is a major part of the analog conversion).

From memory, I believe Mojo and the Original Hugo and Hugo TT had 4 pulse-array DAC elements. Hugo 2, TT2 have 10. DAVE has 20.

Going back to the bypassed WTA1 filter. The actual (simplified) processing sequence is:

1 fs to 8 fs native input gets upsampled to 16 fs by the WTA1 filter. This results in an upsampled rate of either 705.6 or 768 kHz, depending on whether the source material is 44.1 or 48 kHz based. (This is the same as the output rate from the M-Scaler, and since it has already been done coming out of the M-Scaler, this step can be skipped by the DAC.)

Then the WTA2 filter upsamples by a further 256 fs. This yields a 2048 fs upsampled data stream, which then goes through the pulse array, running at 104 MHz, for noise shaping and output as the analog signal.
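The rates in that first stage can be sanity-checked with trivial arithmetic (fs = the base sample rate of the source family):

```python
# Sketch: the WTA1 stage's 16 fs output rate for the two base-rate families.
fs_cd = 44_100      # 44.1 kHz family (CD and most streaming content)
fs_48 = 48_000      # 48 kHz family (video-derived content)

print(fs_cd * 16)   # 705600 -> 705.6 kHz, matching the M-Scaler's output
print(fs_48 * 16)   # 768000 -> 768 kHz
```

Which is exactly why feeding a Chord DAC from the M-Scaler at 705.6/768 kHz lets it skip its own WTA1 pass: the data is already at that stage’s output rate.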

5 Likes

Got it. Incredibly helpful, again. Thank you so much!

1 Like

> any computer doing serious upsampling with HQPlayer is going to need proper cooling … we’re talking big GPUs and/or fast multi-core CPUs

What a difference a year makes: the Mac Mini M1 is a great HQP box, regularly on sale for $600; HQP 4 runs natively on Apple Silicon, with performance equivalent to or better than an Intel Core i7/i9, and it’s completely silent.

Still more “work” than just plugging in an M-Scaler, but many more options, more versatility (e.g. it can also run Roon Core), more future headroom via software updates, and more tweaking of sound in HQP. This is unlike the current Chord product line, which is getting somewhat stale, with the Blu Mk II (the DAVE version of the M-Scaler) now relegated to the ‘legacy products’ bin on the Chord web site.

Does the M1 have enough grunt to run HQPlayer at its highest levels of upsampling, filtering/modulators and noise shaping?

I’ve not tried that on an M1, since I pretty much stopped piddling about with HQPlayer when I stopped reviewing DACs - as outside applications with true NOS DACs I never found it to actually improve anything on an audible level.

To put it to bed, for me, once and for all, I recently did a bunch of true, blind, properly controlled A/B and AB/X tests* using PGGB pre-processed files, HQPlayer pre-processed files, and raw source files. That was possible since in all test cases it could be done with pure software AB/X testing (and a true blind comparison), as the only difference is which file is played.

The end result of that was me only being able to reliably tell that about 1 track in 20 had been processed at all. Put the M-Scaler in the chain and it was less frequent than that.

Blu Mk2 is indeed done, as they can’t get the drive mechanisms anymore. I use that, rather than the HMS I also had, as it is a better aesthetic match for the Chord stack I run. My understanding is that Blu Mk2 will be replaced with a new transport-less M-Scaler in the Choral chassis (i.e. matching DAVE), possibly with a 6M-tap filter and a higher level of upsampling (since if you don’t upsample further, the longer filter does you no good).

I think both HQPlayer and PGGB are things people should try for themselves - provided they are willing to deal with what they take to use (very different requirements and operation, HQPlayer is processing intensive, PGGB is storage intensive).

They are not for me, however, as detecting any differences (let alone “improvements”) required such focused, concentrated, listening as to render it way too much faffing about … especially as I don’t listen to music like that outside of auditioning gear. I personally consider the various claims of night/day differences with anything other than true NOS DACs as pure hyperbole.


*At the next meet I host, I’ll have that set up so people can try that AB/X for themselves. There will be a suitable incentive for people to pass that test - probably a set of flagship cans of their choice (conventional stuff, so no HE1, nor HiFiMan electrostatic nonsense).

4 Likes