There is a bewildering range of audio formats across the high-fidelity world. If you are acquiring some music, there’s a good chance that you’ll have to pick one or other of the formats. Which should you choose? Which ones are better than the others … or can you even tell the difference?
Well, just read on and we’ll break them all down for you so you can make the smart decision. What we’re mostly going to do is break the types of audio into two types, and keep doing that as we drill down.
Two types of high-fidelity audio format: Digital vs Analogue
I know, these days we’re mostly all about digital audio, but let’s not forget analogue. After all, virtually all our music starts its existence in analogue form, and it is all delivered to our ears in analogue form. And many still swear by keeping it in analogue form all the way through.
In what follows I’ll be leaving out many steps. For instance, it’s a rare recording which goes directly from microphone to storage and ultimate distribution without mixing, editing, EQing and other processing. Those kinds of steps are largely the same regardless of format, so we’ll leave them out.
Analogue audio
Audio – sound – is simply vibrations in the air. More specifically, it’s repetitive compressions in the air. It’s often illustrated as little waves on a pond radiating out from where a stone has been tossed in. Let’s go through, step by step, what happens to these vibrations:
- A musical instrument vibrates the air.
- These vibrations are captured by a microphone and turned into an electrical signal. This is an alternating current signal and, all going well, it will faithfully match the vibrations the microphone captured.
- This signal is recorded onto magnetic tape.
- Copies of the tape may end up being distributed (this was one form of high-fidelity music distribution back in the day), but mostly analogue records on vinyl would be pressed. The signal on the tape would control the cutting head that made the master. The groove on the disk would vibrate in a way that, all going well, faithfully matched the signal, and thus the vibrations initially captured by the microphone.
- You’d buy the recording at your local store, take it home and put it on your turntable (or other record player). The stylus of the turntable would sit in the groove and vibrate as the groove moved to and fro. All going well, that movement would faithfully match the movement of the groove.
- The cartridge of the turntable would turn that movement back into an electrical signal which would, all going well … you know the rest.
- The signal would be fed to an amplifier which would boost it to a level capable of driving loudspeakers.
- The loudspeakers would vibrate – faithfully, we hope – with the signal and produce vibrations in the air. Which is where we started.
I outline these steps in some detail because I was startled to find from one of my own children that in this digital age, none of this seemed intuitive, whereas digital audio did.
Pros of analogue audio
- In theory, the resulting sound can perfectly match the sound from the original music. Perfectly, exactly, with no resolution limits.
- Degrades relatively gracefully. If perfection isn’t achieved, the sound can be degraded a lot yet still convey useful information, including musicality. I can even occasionally enjoy the music on a very scratchy 78rpm record.
Cons of analogue audio
- In reality, each of those transformations is not performed perfectly. Traditional high-fidelity specifications are attempts to indicate how imperfectly each is performed.
- Some of the imperfections introduce truly objectionable degradations of the signal, such as surface noise from vinyl recordings, including random clicks and ticks; relatively high levels of noise on tape recordings; misalignments in the noise reduction systems designed to limit that noise; significant levels of harmonic distortion introduced at every transformation (just playing back an LP under ideal circumstances introduces about 1% THD).
- The usual inconveniences of analogue audio, such as limited playback time before having to swap disks, inconvenient track selection and so on.
Digital audio
Much of the process of recording and delivering digital audio is actually identical to that for analogue audio. Specifically, the first two steps and the last two steps above are the same. The sound starts off analogue and is captured by an analogue microphone. And it ends up analogue: it is (usually) boosted by analogue equipment and fed in that form to analogue loudspeakers, which create analogue sound in the air. The digital part is about what happens between the microphone and your playback equipment.
Let’s look at that:
- The signal captured by microphone is converted to a digital format.
- That digital version of the signal is recorded onto computer-style storage as a file.
- The data can then be converted to a CD or distributed as a digital audio file, either as a download or by streaming.
- You’ll buy that recording, whether at a CD shop, from a digital download store or through a streaming service, and your player – CD player or digital audio player – will convert the digital audio back to analogue.
Pros of digital audio
- In practice, the digital audio can be copied from initial storage to distribution medium to your playback equipment without any degradation whatsoever (in the case of lossless digital audio, as we’ll see).
- Very convenient, especially with streaming services.
Cons of digital audio
- Cannot even in theory capture perfectly the original signal captured by the microphone.
Let’s look at that “Con” in a bit more depth. Why can’t digital audio capture the original signal perfectly? Well, it turns out that digital audio is always an approximation of the original analogue signal because to define it perfectly would require an infinite amount of data.
That’s obviously impractical. The trick with high-fidelity digital audio has always been to represent the original signal closely enough that it is indistinguishable from the original by the human ear. Of course, that “closely enough” is a highly contested term.
Two types of digital audio format: PCM vs DSD
The great majority of digital audio uses the PCM format. That stands for Pulse Code Modulation. A much smaller amount uses DSD, or Direct Stream Digital. In both cases, they record the analogue waveform by representing it with numbers. But they do it in quite different ways. Let’s look at them individually.
PCM
With PCM, the level of the analogue signal is measured at regular – and extremely frequent – intervals of time. For CD-standard audio, that’s 44,100 times every second. For things like movie sound, it’s 48,000 times per second. For high resolution sound, it’s more often still: typically 88,200, 96,000, 176,400 or 192,000 times per second. Sometimes even more.
The measurements use one of two measuring sticks, as it were, one more precise than the other. For CD-standard audio, the measurements are 16 bits. In our more common decimal numbers, that works out to 65,536 possible values. That is, the number that’s recorded every 1/44100th of a second has a value somewhere between -32,768 and +32,767.
High resolution audio and virtually all professional recording use 24 bits. That works out to more than 16.7 million possible values.
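If you’d like to check the arithmetic on those two measuring sticks, here is a minimal Python sketch. The function name `pcm_range` is just my own label for illustration, and it assumes the signed whole-number values that PCM normally uses.

```python
def pcm_range(bits: int) -> tuple[int, int, int]:
    """Return (number of levels, lowest value, highest value) for signed PCM."""
    levels = 2 ** bits
    return levels, -(levels // 2), levels // 2 - 1


for bits in (16, 24):
    levels, lowest, highest = pcm_range(bits)
    print(f"{bits}-bit: {levels:,} levels, from {lowest:,} to {highest:,}")
```

Each extra bit doubles the number of available levels, which is why eight more bits take us from tens of thousands of steps to tens of millions.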
The higher the sampling rate and the higher the bit depth, the greater the precision with which the analogue audio is captured.
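To make that a little more concrete, here is a rough sketch of what the measuring process amounts to. It “measures” a pure 1kHz tone at regular intervals and rounds each measurement to the nearest whole number. The names `sample_sine` and `worst_error` are my own inventions, and a real analogue to digital converter does a good deal more than this (filtering and dithering, for a start).

```python
import math


def sample_sine(freq_hz, sample_rate, bits, n_samples):
    """Measure a full-scale sine wave n_samples times, rounding each value to an integer."""
    peak = 2 ** (bits - 1) - 1
    return [
        round(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(n_samples)
    ]


def worst_error(freq_hz, sample_rate, bits, n_samples):
    """Largest gap between the rounded samples and the true wave, as a fraction of full scale."""
    peak = 2 ** (bits - 1) - 1
    samples = sample_sine(freq_hz, sample_rate, bits, n_samples)
    return max(
        abs(s / peak - math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n, s in enumerate(samples)
    )


cd_samples = sample_sine(1000, 44_100, 16, 8)   # first eight CD-standard samples
err16 = worst_error(1000, 44_100, 16, 441)      # 10 milliseconds at 16 bits
err24 = worst_error(1000, 44_100, 24, 441)      # the same 10 milliseconds at 24 bits
print(cd_samples)
print(f"worst rounding error: 16-bit {err16:.2e}, 24-bit {err24:.2e}")
```

The rounding error at 24 bits comes out far smaller than at 16 bits, which is the “greater precision” in action.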
Once the digital audio is in your home, then a digital to analogue converter returns it to analogue sound for you to enjoy.
Pros of PCM
- Supported by all digital audio devices
- Wide range of resolutions available
- Can be digitally processed, mixed, EQed, dynamically compressed, have reverb added, and so on for all manner of studio tricks
- Can be digitally compressed losslessly, preserving original quality, in order to consume less bandwidth or storage space
- Can be digitally compressed into a lossy format, allowing enormous bandwidth/space savings for some marginal loss of quality
Cons of PCM
- Some audiophiles consider it to sound inferior to analogue, and some to DSD
DSD
DSD stands for Direct Stream Digital. We do a deep dive into this format here. But, briefly, like PCM it is a digital audio format. Like PCM, it captures and transports an analogue waveform in the form of a stream of numbers. Where it differs is how it represents the waveform. While PCM represents each point in the waveform as though the analogue waveform were sketched out on a graph (the Y-axis is the level, while the X-axis is time), DSD represents it using only binary values – 1s and 0s. The more 1s, the higher* the waveform at that point. The fewer, the lower. This way of representing digital audio is called Pulse Density Modulation. It seems like a new system, but it was actually invented back in the 1950s. It was the use of DSD on the Super Audio CD beginning in 2000 that brought it widespread notice. This stream of 1s and 0s proceeds at more than 2.8MHz; more than 2.8 million times per second.
There’s more to DSD than just that. For example, DSD also uses noise shaping to improve the signal-to-noise ratio in the audible band. And in the years after the appearance of that first form of DSD – these days it’s called DSD64 or 2.8MHz DSD – higher resolution versions of DSD have been introduced: double speed (DSD128, 5.6MHz DSD), quadruple speed (DSD256, 11.2MHz DSD) and beyond.
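For the curious, here is a toy illustration of the pulse density idea. It is a bare-bones first-order delta-sigma modulator of the textbook kind, not the far more sophisticated noise-shaped modulators used to make real DSD. It assumes the usual convention in which a greater density of 1s stands for a higher level. The function names are mine, and the rates are simply 64, 128 and 256 times the CD sampling rate of 44,100Hz.

```python
def one_bit_stream(levels):
    """First-order delta-sigma modulator: turn levels in the range -1..+1 into 1s and 0s."""
    bits = []
    integrator = 0.0
    feedback = 0.0
    for level in levels:
        # Accumulate the difference between the input and the previous output bit.
        integrator += level - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def density_of_ones(level, n=1000):
    """Fraction of 1s produced when the input sits steadily at one level."""
    return sum(one_bit_stream([level] * n)) / n


for level in (-0.5, 0.0, 0.5):
    print(f"level {level:+.1f}: {density_of_ones(level):.2f} of the bits are 1s")

# DSD rates as multiples of the CD sampling rate (an illustrative table only).
dsd_rates = {f"DSD{m}": m * 44_100 for m in (64, 128, 256)}
print(dsd_rates)
```

In this sketch a steady level halfway up gives about three 1s in every four bits, silence gives an even split, and a level halfway down gives about one 1 in every four.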
Pros of DSD
- Many audiophiles believe that it sounds better than PCM, especially the higher speed versions such as DSD128
Cons of DSD
- Many devices do not support DSD, although most audiophile DACs and streamers do support it
- Cannot be digitally processed in native format (particularly a problem for multichannel DSD, since routine functions like speaker time alignment and bass management do not work for DSD, so it must be converted to PCM, thereby obviating the presumed benefits of DSD)
- Little compression possible, leading to large file sizes
- Requires kludges to work with some equipment, such as the use of DoP (DSD over PCM) to pass DSD over USB
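To give a feel for what the DoP kludge involves, here is a small sketch based on my understanding of the open DoP standard: sixteen DSD bits at a time are tucked into the lower 16 bits of a 24-bit PCM sample, and the top 8 bits carry a marker that alternates between 0x05 and 0xFA so that the DAC can recognise the stream as DSD rather than music-shaped noise. Treat those details as assumptions to be checked against the standard itself. The function name `pack_dop` is purely illustrative.

```python
DOP_MARKERS = (0x05, 0xFA)  # assumed marker bytes, alternating on successive samples


def pack_dop(dsd_bits):
    """Pack a list of DSD bits (oldest first) into 24-bit DoP sample values for one channel."""
    if len(dsd_bits) % 16:
        raise ValueError("need a multiple of 16 DSD bits")
    samples = []
    for i in range(0, len(dsd_bits), 16):
        payload = 0
        for bit in dsd_bits[i:i + 16]:
            payload = (payload << 1) | bit   # oldest bit ends up most significant
        marker = DOP_MARKERS[(i // 16) % 2]
        samples.append((marker << 16) | payload)
    return samples


packed = pack_dop([1] * 16 + [0] * 16)
print([f"{s:06X}" for s in packed])

# Sixteen DSD bits per PCM sample sets the PCM rate needed to carry DSD64.
pcm_rate_for_dsd64 = (64 * 44_100) // 16
print(pcm_rate_for_dsd64)
```

On these assumptions, carrying DSD64 this way needs a 24-bit PCM link running at 176,400 samples per second, which is why DoP tends to demand a DAC with plenty of PCM headroom.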
Two types of PCM audio compression: Lossless vs Lossy
Digital audio files consume a lot of space. Even with modern digital audio players, which may support up to a terabyte of storage, it’s all too easy to fill them up and run out of space if you have a collection of more than a few hundred albums. And then there’s the matter of keeping data demands down when streaming audio, especially when you’re on a mobile phone.
Which is why we reduce the size of digital audio files. This is called compression (and should not be confused with dynamic compression, which may be performed on the audio by the studio for other reasons). The two broad categories of digital audio compression are lossless and lossy.
Lossless audio
Lossless compression is a system for reducing the size of digital audio files in such a way that the original PCM audio can be reconstituted perfectly.
The most common lossless compression file formats for music are FLAC (Free Lossless Audio Codec) and ALAC (Apple Lossless Audio Codec). (There are also systems widely used on Blu-ray, such as Dolby TrueHD and DTS-HD Master Audio.) These compression systems squeeze the PCM digital audio file to somewhere between 40% and 60% of its original size, saving quite a lot of space and bandwidth. We’ve explained how these systems work here.
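The defining property of lossless compression, that you get back exactly what you put in, is easy to demonstrate. The sketch below does not use FLAC or ALAC. It uses the general-purpose zlib compressor from Python’s standard library as a stand-in, and a synthetic sine tone as a stand-in for music, so the amount of squeezing it achieves says nothing about what FLAC would manage on a real recording. The helper name `make_pcm` is my own.

```python
import math
import struct
import zlib


def make_pcm(seconds=1.0, freq_hz=440, sample_rate=44_100):
    """Build raw 16-bit mono PCM bytes for a sine tone."""
    n = int(seconds * sample_rate)
    samples = [
        round(20_000 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)


original = make_pcm()
packed = zlib.compress(original, 9)   # squeeze the PCM data
restored = zlib.decompress(packed)    # and unsqueeze it again

print(f"original: {len(original):,} bytes, compressed: {len(packed):,} bytes")
print("bit-for-bit identical after decompression:", restored == original)
```

Dedicated audio codecs such as FLAC are designed around the way audio waveforms behave, which is why they are the right tool for music rather than a general-purpose compressor like this one.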
Pros of Lossless audio compression
- Reduces the amount of space or bandwidth of digital audio to, typically, approximately half its original size
- Allows the original PCM source to be perfectly reconstructed for zero degradation in audio quality
- Supports high resolution formats (up to 384kHz sample rate)
- FLAC very widely supported in quality audio gear
- ALAC universally supported on Apple gear and quite widely supported on other quality audio gear
- FLAC and ALAC files support embedded metadata tags (eg. Artist, Album, Genre, album art) to allow easier music management and for display during playback (this was not always the case for uncompressed WAV files)
Cons of Lossless audio compression
- The approximately fifty percent reduction in size is a relatively modest space saving
Lossy audio file formats
Lossy audio file formats use various tricks to achieve much higher compression ratios than lossless formats. A higher compression ratio means more music on portable audio devices and more reliable streaming.
Most achieve high compression ratios by using perceptual encoding techniques. These arise from psychoacoustic studies suggesting some elements of sound in recordings cannot be perceived by listeners. The compression process identifies and removes these. Of course, whether the changes really are inaudible is subject to dispute.
In all likelihood, there’s a gradient of effect. Most lossy compression systems allow the level of compression to be selected. A frequently-used MP3 setting compresses the audio to 128kbps (kilobits per second) and this is broadly accepted by average listeners – although not audiophiles – as at least good enough. However, the differences between the original sound and an MP3 running at 64kbps can be easily discerned by just about anyone. MP3 files made available for those who, for example, purchase recordings on vinyl are typically encoded at 320kbps, the maximum level available for the format. These are accordingly more difficult to distinguish from the original.
An uncompressed CD runs at 1411kbps, so at 128kbps, an MP3 file consumes only 9.1% of the space required for the original uncompressed audio.
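That CD figure is simply the sampling rate multiplied by the bit depth and the number of channels. Here is a quick sketch of the sums for the three MP3 bitrates mentioned above. The function name `pcm_kbps` is my own, and stereo is assumed.

```python
def pcm_kbps(sample_rate, bits, channels=2):
    """Data rate of uncompressed PCM in kilobits per second."""
    return sample_rate * bits * channels / 1000


cd_kbps = pcm_kbps(44_100, 16)
print(f"CD: {cd_kbps:.1f}kbps")

mp3_share = {kbps: 100 * kbps / cd_kbps for kbps in (64, 128, 320)}
for kbps, percent in mp3_share.items():
    print(f"{kbps}kbps MP3: {percent:.1f}% of the CD data rate")
```

Even the top MP3 setting of 320kbps comes in at under a quarter of the CD data rate.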
The transparency of the MP3 format also depends on the quality of the encoder. Technical advances in encoders improved quality even at the same bitrate. (I must agree with this claim from experience. I’ve been converting stuff to MP3 since the late 1990s and some content sounded terrible at 128kbps using the encoders back then, and now sounds quite good.)
MP3 was one of the first compression formats of this kind. Since it was developed, improved formats working on similar principles have been developed. These typically provide similar audio quality for lower bitrates, or higher audio quality for similar bitrates. The most popular is AAC, the long-standing format favoured by Apple for its music store and streaming. A version of AAC, called HE-AAC v2 – HE stands for high efficiency, allowing lower bitrates for similar sound quality – is the format used for DAB+ digital radio in Australia.
Pros of Lossy audio compression
- Tremendous data reductions for modest loss in fidelity
- Was vital for the advance of Internet-based audio distribution in the early days of low bandwidth communications and players with modest storage capacities
- Remains important in applications such as digital radio
- Virtually universal support for at least MP3 and AAC
Cons of Lossy audio compression
- Loss of fidelity, sometimes modest, sometimes quite significant
Two types of lossless audio formats: High resolution vs standard resolution
Finally, the CD and even the SACD have long since ceased to be the last word in the precise delivery of audio. These days we call CD standard audio – PCM with 16 bits and 44.1kHz sampling – standard resolution. It provides a signal to noise ratio of around 96dB and a frequency response out to slightly above 20,000 hertz.
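Both of those figures fall out of the format’s two numbers. As a rule of thumb, the ratio between the largest value and the smallest step gives roughly 6dB per bit, and the highest frequency that can be represented is half the sampling rate. Here is a small sketch, with function names of my own choosing. It gives the simple theoretical figures only, and real-world converters and filters land a little below them.

```python
import math


def dynamic_range_db(bits):
    """Ratio of full scale to one quantisation step, expressed in decibels."""
    return 20 * math.log10(2 ** bits)


def highest_frequency_hz(sample_rate):
    """Upper frequency limit: half the sampling rate."""
    return sample_rate / 2


print(f"16-bit: {dynamic_range_db(16):.1f}dB, 24-bit: {dynamic_range_db(24):.1f}dB")
print(f"44.1kHz sampling: up to {highest_frequency_hz(44_100):,.0f}Hz")
```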
SACD extends that and so can be called high-resolution audio. And so can the many higher PCM audio standards in which a lot of music is now available. Commonly these use 24 bits and sampling frequencies of 88.2kHz, 96kHz, 176.4kHz and 192kHz.
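Those bigger numbers do add up when it comes to storage. Here is a rough sketch of how many megabytes a minute of uncompressed stereo PCM takes at a few of these resolutions. The function name is mine, and it counts a megabyte as a million bytes.

```python
def megabytes_per_minute(sample_rate, bits, channels=2):
    """Size of one minute of uncompressed PCM, in millions of bytes."""
    bytes_per_second = sample_rate * (bits // 8) * channels
    return bytes_per_second * 60 / 1_000_000


sizes = {
    "16-bit/44.1kHz": megabytes_per_minute(44_100, 16),
    "24-bit/96kHz": megabytes_per_minute(96_000, 24),
    "24-bit/192kHz": megabytes_per_minute(192_000, 24),
}
for name, mb in sizes.items():
    print(f"{name}: {mb:.1f}MB per minute")
```

So, before any lossless compression, 24-bit/192kHz audio needs about six and a half times the space of the CD standard.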
Typically, these will be losslessly compressed into FLAC or ALAC format. For example, some high-resolution material is available from Apple Music in ALAC format. Also used is MQA, which is a kind of compression system for squeezing what is claimed to be high resolution audio losslessly into standard resolution file sizes. This is available on the TIDAL streaming service.
Can the difference be perceived? There aren’t really any properly conducted scientific studies conclusively demonstrating that people can detect a difference between the same program material in standard and high-resolution formats.
But a big part of high-end audio is removing all hindrances, and possible hindrances, to the pure delivery of perfect audio. So, if it’s available, and subject to practical considerations (space on your storage device, for example), why not?
Pros of high-resolution audio
- Perhaps improves fidelity compared to standard resolution audio, and definitely ensures that any limitations that standard resolutions may impose are removed
Cons of high-resolution audio
- Uses more space
- May cost more for downloads or for high resolution support on a streaming service
Gear – what do you need to handle all these file formats?
When I’m buying gear for playing music when I’m out and about, and also for the main audio system in my listening room (and for my secondary system), I look for compatibility. Within reason. I have around three hundred vinyl recordings, and of course I don’t expect any portable gear to play them. I have my turntable plugged into the main system for them.
But with digital audio, I want gear which streams my favourite Internet services (TIDAL and Spotify) and handles every main digital audio format, standard and high resolution, lossily and losslessly compressed, PCM and DSD. I can’t say that I’ve checked all possible portable audio players out there, but I’ve been consistently impressed with both Astell&Kern and FiiO portable players.
With the main audio systems, the options have been DACs and streamers. I generally look for DACs capable of handling PCM up to 384kHz and DSD256. Topping and iFi Audio DACs have proved to be first-class choices. As have Moon by Simaudio streamers, which of course also operate as top notch DACs. Also check out the offerings from Matrix Audio for exceptional streaming.
Conclusion
I’m writing this in 2021, and I have to say that regardless of audio format, we now have audio equipment that will perform better than any gear ever in the history of music reproduction. When it comes to specific formats, choose those which work with your equipment … and choose equipment which works with the widest range of formats possible. You never know if that track that you truly want is in some kind of obscure format.
* Actually, I haven’t been able to definitively establish whether it is this way around, or whether the more 1s the lower the waveform at that point.