21st Century Audio links – introducing 20S4.
Below is the article by Andrew Hills (Cleve House Audio) and Mike Law (BCD Audio), published in the September issue of Resolution Magazine.
In days of yore, before we knew the answer is 42 and when analogue audio ruled the world, moving audio around was simple: you ran a cable (or twisted pair if you had a professional setup) and listened to the other end. It worked well, so long as you didn’t mind a bit of added noise, some crosstalk from the other channels if the cables were too close together, and a bit less HF if they were long. If the source was needed in two places then you added a splitter. However, this splitter brought all sorts of compromises: if the source was a microphone then more added noise resulted, and earth loops were also a pain. As the demands of channel counts and distances between source and user grew, analogue copper made way for digital systems. The advantage of placing the mic amp and ADC close to the microphone, and multiplexing the resulting digital data onto a single link capable of carrying many channels without crosstalk, added noise or loss of high frequencies (other than those parameters defined by the ADC), was and remains obvious. AES/EBU for stereo and MADI for multichannel became the almost universal standards. With great foresight, AES/EBU and MADI both allow 24 bit audio paths. (Don’t forget these formats were defined when 16 bit converters were hard to get and expensive!) A 24 bit path offers performance better in most respects than is achievable with even today’s “24 bit” converters, most of which are only linear to about 21 bits or so.
To get the optimum performance from the ADC the analogue signal level must be carefully controlled. With analogue systems the relative distortion, up to the voltage limits of the chips, stays mostly constant; in digital systems the relative distortion increases as the level decreases. This is known as quantisation noise; it is very unpleasant and prohibits adding much gain in the digital domain. Twenty dB or so is a realistic limit before quantisation noise becomes evident. So console manufacturers created remote-controlled analogue mic amps which adjust the pre-ADC gain over a range of some 70dB or so. This covers the signal level produced by all types of mic in all environments.
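The way relative quantisation error grows as the level falls can be illustrated with a small numerical experiment. The sketch below is only an illustration: the 16 bit word length, the 997Hz test tone, the 48kHz sample rate and the function name snr_db are all assumptions chosen for the example, not figures from the article.

```python
import math

def snr_db(level_dbfs, bits=16, n=4800):
    """Signal-to-quantisation-error ratio of a sine rounded to `bits` (illustrative)."""
    full_scale = 2 ** (bits - 1) - 1
    amp = full_scale * 10 ** (level_dbfs / 20)
    sig_power = 0.0
    err_power = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * 997 * i / 48000)  # ideal analogue value
        q = round(x)                                       # value after the ADC
        sig_power += x * x
        err_power += (q - x) ** 2
    return 10 * math.log10(sig_power / err_power)

for level in (0, -20, -40, -60):
    print(f"{level:4d} dBFS: SNR about {snr_db(level):.0f} dB")
```

The error power stays roughly constant while the signal power falls, so each 20dB drop in level costs roughly 20dB of signal-to-error ratio. Digital gain added afterwards raises the signal and the error together and cannot recover it.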
But we have a problem, especially in live sound setups, where you want the same signal shared across several users. Routing the same digital signal to several users is simple in most systems and there is no compromise with the audio quality. Typically you need to create a mix for the main PA system (known as Front of House or FoH) plus, from the same mic, a different mix for the artists on the stage (commonly called Monitor Mix). In some cases a third mix for a broadcast or record feed is also required. So, very quickly, we end up with the problem of who “owns” the analogue gain control of the pre-ADC mic amp. If, for example, the FoH mix engineer needs a bit more gain on, say, the vocal mic, he adjusts the relevant control. This fixes his problem, but now the monitor mix engineer suddenly, without him changing anything, finds his balance has gone wrong, so he has to turn the mic gain down again. Chaos ensues! A number of console manufacturers have created clever systems whereby one of the desks “owns” the analogue gain. Should the master desk change the gain of a mic amp, all the “slave” consoles adjust their digital gain by the same amount in the opposite direction, thus preserving their balance. This helps the balance problem but does not help if the analogue gain is too high and the signal is clipping; it still clips! If the offsets are large you get increased quantisation noise. Further, these control systems are proprietary, so you must use the same manufacturer for all the desks.
Stagetec have, with their TruMatch mic amp, partly solved this problem. Each mic amp has a number of converters running in parallel with fixed gain offsets to cover the whole range required by both microphone and line level signals. These multiple converter outputs are “mixed” in some DSP to provide a 28 bit signal to the link and distribution systems, covering the entire gain range without the need for an adjustable analogue control. This wide range digital signal is then “normalised” with a mic gain control in the digital domain to provide a standard level to the channel processing. This allows several consoles to share a mic but with individual control of mic gain. (The use of multiple ADCs in each mic amp is unfortunately very costly and has therefore not come into widespread use.) New technology fast ADCs are beginning to offer so-called “full range mic amps” for only a small premium on existing designs. But the resulting signals of 28 bits or more are not standard PCM and cannot be transported or stored on standard 24 bit systems. The dream would be to have a signal from the mic amp’s ADC that carries the full potential dynamic range, at sufficient bit depth for any mic or line signal, and which may be carried over a standard 24 bit system. This signal could then be routed to as many consoles or recording devices at the same time as required, without the risk of unexpected overload (great for live operations) or unexpected balance changes due to one operator adjusting the input gain. Further, this new signal could be sent over any standard 24 bit digital links such as embedded video or IP based networks.
An additional benefit: if such a system became standard, this “uncontrolled” full range audio signal could be recorded on standard 24 bit recorders, allowing the post production operator to effectively control the recorded mic gain in post. (Small or non-existent ENG crews often mean nobody has the time to check mic levels before recording!)
We need a system that offers high resolution (>18 bits) over a wide dynamic range (>70dB), without added latency, within a standard AES 24 bit word format.
So how can this be achieved?
Digital companding ideas are worth a look. We are used to linear PCM signals, where every quantisation step is the same size. However, our hearing works logarithmically, so each step should ideally be a logarithmic unit (i.e. a very small dB step). Unfortunately a very time consuming algorithm, or an enormous lookup table, is required to make this work. But compromises in the algorithm can be made which are simple enough to implement and good enough for telephone use. The telephone system uses 16bit linear samples, but passes them to an encoding module that creates 8bit samples for transmission. The receiver takes in the 8bit signal, reverses the encoding, and creates 16bit linear samples. No latency or delay is introduced. Unfortunately the world could not agree what algorithm to use, and today we use two similar but different systems: µ-law and A-law.
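As a rough illustration of sample-by-sample companding, the sketch below uses the continuous µ-law curve with µ = 255. This is an assumption made for brevity: real telephone codecs use a piecewise-linear approximation of the curve, so the code is not bit-exact with any telephone standard, and the function names are invented for the example.

```python
import math

MU = 255  # µ-law parameter commonly quoted for 8 bit telephony

def mulaw_encode(sample_16bit):
    """Compress one signed 16 bit linear sample to a signed code in -127..127."""
    x = max(-1.0, min(1.0, sample_16bit / 32768.0))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round(y * 127)

def mulaw_decode(code_8bit):
    """Expand a signed code back to an approximate 16 bit linear sample."""
    y = code_8bit / 127.0
    x = math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)
    return round(x * 32768.0)

for s in (100, 1000, 10000, 30000):
    code = mulaw_encode(s)
    print(s, "->", code, "->", mulaw_decode(code))
```

Each sample is encoded and decoded on its own, which is why no buffering or latency is involved. The step size grows with level, so the relative error stays roughly constant for all but the smallest samples.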
The NICAM idea is well known: this reduces the dynamic range of the transmitted signal by buffering the signal and looking at the peak audio level over a period of, say, 32ms. The signal is scaled, and information about the peak audio level for this period is transmitted as a scaling index, along with the scaled audio signal. The receiver picks up both signals, reverses the scaling, and recreates the original signal. It is successful because the psychoacoustic masking principle can be applied: loud signals dominate very quiet signals, so there is little loss of quality.
The big problem, however, is the signal delay that has to be introduced in order to find the loudest peak in that time interval. Also, where are we going to send the scaling information? If we send it via a different path, such as User bits or channel status it could easily get lost, and it would be disastrous if the receiver ever applied the wrong scale factor to any sample.
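The block scaling idea, and the reason it adds delay, can be sketched in a few lines. This is a simplified illustration and not the NICAM specification: the word widths, the block contents and the function names are assumptions chosen for the example.

```python
def encode_block(block, out_bits=10):
    """Scale a whole block by one shared shift chosen from the block's peak."""
    peak = max(abs(s) for s in block)   # needs every sample of the block first
    limit = 1 << (out_bits - 1)         # magnitude that fits in out_bits, signed
    shift = 0
    while (peak >> shift) >= limit:
        shift += 1
    return shift, [s >> shift for s in block]

def decode_block(shift, scaled):
    """Undo the shared scaling; the bits shifted out are lost."""
    return [s << shift for s in scaled]

block = [3, -5, 8000, -120]             # one loud sample sets the scale for all
shift, scaled = encode_block(block)
print(shift, scaled, decode_block(shift, scaled))
```

The encoder cannot emit the first sample until the last sample of the block has arrived, which is the delay described above. The shift value also travels separately from the samples it applies to, so a lost or mismatched shift corrupts the whole block.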
Floating point from the computer industry should also be considered. This uses a 24bit linear part and an 8bit scale factor, resulting in 32bit samples. But we are trying to pass a signal down an established 24bit path. Unfortunately, the linear part of floating point always has to be normalised, meaning that it is not feasible to reduce the width of the scale factor, as we need to scale down small signals to a large degree. The linear part could be truncated further to 20 or 16bits, but a 16bit linear part is not good enough for the high quality uncontrolled audio we need.
What is the alternative?
It is possible to take the best features of these systems and produce a robust system that is simple to implement and has good enough quality for the purpose. Furthermore, no time delay or latency is introduced. A 32bit linear PCM signal is taken to the encoder and split into a 20bit linear part and a 4bit scale factor. The resulting 24bit signal is compatible with today’s AES3, MADI, SDI and AoIP networking pathways. The decoder takes in the 24bit signal and multiplies the 20bit linear part by the power of two derived from the 4bit scale factor, resulting in a 32bit linear PCM signal which is accurate enough for the purpose. As the scale factor is always sent with its matching linear part, each sample can be optimised, and there is no danger of the wrong scale ever being applied to a sample. Let’s call this signal 20S4.
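One possible reading of this encode and decode step is sketched below. The article does not define the bit layout, so several details here are assumptions made purely for illustration: the scale factor is placed in the low 4 bits of the 24bit word, the linear part is a two’s complement value in the upper 20 bits, the scale factor is a plain shift count of 0 to 12, and bits are dropped by truncation. The function names are hypothetical.

```python
MANT_BITS = 20    # width of the linear part
SCALE_BITS = 4    # width of the scale factor
MANT_MASK = (1 << MANT_BITS) - 1

def encode_20s4(sample):
    """Pack one signed 32bit sample into a 24bit word (assumed layout)."""
    lo, hi = -(1 << (MANT_BITS - 1)), (1 << (MANT_BITS - 1)) - 1
    shift = 0
    while not (lo <= (sample >> shift) <= hi):   # smallest shift that fits 20 bits
        shift += 1
    mant = sample >> shift                       # low bits are truncated
    return ((mant & MANT_MASK) << SCALE_BITS) | shift

def decode_20s4(word):
    """Rebuild a 32bit sample: linear part times two to the power of the scale."""
    shift = word & ((1 << SCALE_BITS) - 1)
    mant = (word >> SCALE_BITS) & MANT_MASK
    if mant & (1 << (MANT_BITS - 1)):            # restore the sign
        mant -= 1 << MANT_BITS
    return mant << shift

for s in (12345, -7, 1_000_000_000, 2**31 - 1):
    w = encode_20s4(s)
    print(s, "->", hex(w), "->", decode_20s4(w))
```

In this sketch a 32bit source never needs a shift larger than 12, so some of the 16 scale codes are unused. Samples that already fit in 20 bits are sent with a shift of zero and are recovered exactly.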
20S4, in detail.
A 20bit linear part implies a 120dB dynamic range, and if the scale factor is optimised, the distortion and noise components of a real signal will be below -120dB. The linear part therefore defines the quality of the signal.
A 4bit scale implies a 96dB dynamic range, and is logarithmic, so it matches the psychoacoustic requirements of audio. Therefore, in theory, we have a 120 + 96 = 216dB dynamic range!
For real signals and applying more rigorous analysis the range is less than this, but is good enough for the purpose.
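The headline figures can be checked with the usual rule of thumb of about 6dB per bit. The short calculation below follows the text in counting all 16 codes of the 4bit scale as one-bit (factor of two) steps, which is an assumption about how the 96dB figure was reached.

```python
import math

DB_PER_BIT = 20 * math.log10(2)          # about 6.02 dB for each factor of two

linear_db = 20 * DB_PER_BIT              # 20bit linear part
scale_db = (2 ** 4) * DB_PER_BIT         # 16 scale codes, one bit per step

print(round(linear_db, 1))               # 120.4
print(round(scale_db, 1))                # 96.3
print(round(linear_db + scale_db, 1))    # 216.7
```

Rounded down to 6dB per bit, these give the 120dB, 96dB and 216dB quoted above.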
Large signals – For large signals, the most significant 20bits of the original signal are transmitted, so there is some loss of audio quality. Compared to 24bit PCM, the 20S4 signal tends to create 0.0001% THD+Noise in this area, which is worse than 24bit linear PCM but far better than needed for all practical purposes. Remember these scale factors are only applied for the loudest samples, and the result is still better than the best analogue systems.
Medium level signals – When the samples are quieter, the 20S4 encoder scales the signal so that fewer bits of the original signal are lost. If any individual sample is below -18dBFS, we have only lost one bit of the original signal, and below -24dBFS the signal is identical in performance to standard 24bit PCM. Note that the THD+Noise of any signal will rise as the level reduces due to signal truncation. (This is the quantisation noise mentioned in the preamble.) Contrary to some comments, it is not possible to reduce the THD to zero by dither without introducing a matching amount of noise from the dither. So the THD+Noise figure is the same with or without dither.
Low level signals – When the audio is quieter than -30dBFS, the 20S4 signal is capable of feeding in more bits from our 32bit source samples, and if these are not present, a small amount of dither can be useful.
The equivalent 24bit PCM signal can only truncate our original source. Therefore 20S4 will out-perform 24bit PCM in this region, and can maintain constant THD+Noise versus the rising THD+Noise of 24bit PCM with reducing signal level.
Very low level signals – For the very quietest of signals we assert the minimum scale factor, but the 20bit section can no longer be normalised. The signal behaves like ordinary PCM, except that it is scaled down by 96dB. Above this level the scaled signal is very similar to floating point, but it departs from floating point at this stage, as it is not normalised.
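The level regions described above can be tabulated with a short calculation. The sketch assumes a 32bit source whose most significant 20 bits are kept, with one fewer bit discarded for every 6dB or so of headroom below full scale. The function name and the sign convention of its result are assumptions for the example.

```python
import math

SRC_BITS, MANT_BITS, PCM_BITS = 32, 20, 24
DB_PER_BIT = 20 * math.log10(2)

def bits_lost_vs_pcm24(level_dbfs):
    """Positive: 20S4 keeps fewer bits than 24bit PCM. Negative: it keeps more."""
    headroom_bits = int(-level_dbfs // DB_PER_BIT)     # whole bits below full scale
    shift = max(SRC_BITS - MANT_BITS - headroom_bits, 0)
    return shift - (SRC_BITS - PCM_BITS)

for level in (0, -7, -13, -19, -25, -31, -100):
    print(f"{level:5d} dBFS: {bits_lost_vs_pcm24(level):+d} bits relative to 24bit PCM")
```

Under these assumptions a sample at full scale loses four bits relative to 24bit PCM, a sample just below -18dBFS loses one, a sample just below -24dBFS loses none, and below about -30dBFS the 20S4 word starts to carry bits that 24bit PCM would have truncated.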
Deep Thought – Bringing these ideas together, we can now see a high-resolution stage box is possible using standard 24 bit capable links to the desk or record device.
Ideally, the mic amplifier used is good enough to produce optimised 32bit samples with all incoming signal levels. If this is possible, any mixer capable of decoding 20S4 only needs a local gain control, and the stage box analogue gain control is no longer required. Thus this 20S4 signal can be routed to several users in parallel, or recorded on today’s standard systems, with each user having his or her own individual gain control without loss of audio performance.
Sometimes innovation comes from taking a hard lateral look at existing concepts and applying them in a new way to give large benefits. We hope we have shown it can be worthwhile, and the answer is not 42, but 20S4!
Andrew Hills (Cleve House Audio), Mike Law (BCD Audio)
August 2015.