
Audio Compression Systems 7-11

7.1.1c

Prediction and Transform Algorithms

Most audio-compression systems are based upon one of two basic technologies [2]:

Predictive or adaptive differential PCM (ADPCM) time-domain coding

Transform or adaptive PCM (APCM) frequency-domain coding

These techniques differ in how they deal with the redundancy and irrelevancy of the PCM signal.

The time domain or prediction approach includes G.722, which has been a universal standard since the mid-70s, and was joined in 1989 by a proprietary algorithm, apt-X100. Both of these algorithms deal mainly with redundancy.

The frequency domain or transform method, adopted by a number of algorithms, deals in irrelevancy, using psychoacoustic masking techniques to identify and remove unwanted sounds. This range of algorithms includes the industry standards ISO/MPEG-1 Layers 1, 2, and 3; apt-Q; MUSICAM; Dolby AC-2 and AC-3; and others.

Subband Coding

Without exception, all of the algorithms mentioned in the previous section process the PCM signal by splitting it into a number of frequency subbands, from as few as two (G.722) to as many as 1024 (apt-Q) [1]. MPEG-1 Layer 1, with 4:1 compression, has 32 frequency subbands and is the system found in the Digital Compact Cassette (DCC). The proprietary MiniDisc ATRAC algorithm, at 5:1, takes a more flexible multisubband approach that depends on the complexity of the audio signal.
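The idea of splitting a PCM signal into subbands can be sketched with the simplest possible two-band filter pair. This is an illustration only: it assumes a Haar sum/difference pair, whereas real codecs such as G.722 use longer filters, and the names `two_band_split` and `two_band_merge` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def two_band_split(x):
    """Split an even-length signal into a low and a high band, each at half rate."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]   # sum: low-pass
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]  # difference: high-pass
    return low, high

def two_band_merge(low, high):
    """Invert two_band_split and interleave the samples back into one signal."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / SQRT2)
        out.append((l - h) / SQRT2)
    return out

def energy(band):
    return sum(v * v for v in band)

# A slowly varying test tone: almost all of its energy should land in the low band.
signal = [math.sin(2.0 * math.pi * n / 64.0) for n in range(256)]
low, high = two_band_split(signal)
reconstructed = two_band_merge(low, high)
```

For this test tone the low band carries nearly all of the energy, which is the kind of imbalance between bands that subband coders exploit.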

Subband coding enables the frequency domain redundancies within the audio signals to be exploited. This permits a reduction in the coded bit rate, compared to PCM, for a given signal fidelity. Spectral redundancies are also present as a result of the signal energies in the various frequency bands being unequal at any instant in time. By altering the bit allocation for each subband, either by dynamically adapting it according to the energy of the contained signal or by fixing it for each subband, the quantization noise can be reduced across all bands. This process compares favorably with the noise characteristics of a PCM coder performing at the same overall bit rate.
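A minimal sketch of energy-driven bit allocation follows. It assumes a simple noise model in which each extra bit lowers a band's quantization noise power by a factor of four (about 6 dB), and it uses a greedy loop that always gives the next bit to the noisiest band. The function names and the example energies are illustrative assumptions, not values from the source.

```python
def allocate_bits(band_energies, total_bits):
    """Greedily hand out bits, one at a time, to the band with the most noise."""
    bits = [0] * len(band_energies)
    noise = list(band_energies)  # with zero bits, the error is the whole band signal
    for _ in range(total_bits):
        worst = max(range(len(noise)), key=lambda k: noise[k])
        bits[worst] += 1
        noise[worst] /= 4.0  # one more bit: roughly 6 dB less noise power
    return bits

def total_noise(band_energies, bits):
    """Total noise power under the same 6-dB-per-bit model."""
    return sum(e / 4.0 ** b for e, b in zip(band_energies, bits))

energies = [1000.0, 100.0, 10.0, 1.0]    # unequal band energies (hypothetical)
dynamic = allocate_bits(energies, 16)    # adapts to the energies
uniform = [4, 4, 4, 4]                   # same 16 bits spread evenly
```

With these example energies the adaptive allocation gives more bits to the high-energy bands and yields a lower total noise than spreading the same 16 bits evenly.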

Subband Gain

On its own, subband coding, incorporating PCM in each band, is capable of providing a performance improvement or gain compared with that of full band PCM coding, both being fed with the same complex, constant level input signal [1]. The improvement is defined as subband gain and is the ratio of the variances of the quantization errors generated in each case while both are operating at the same transmission rate. The gain increases as the number of subbands increases, and with the complexity of the input signal. However, the implementation of the algorithm also becomes more difficult and complex.
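As a rough numerical illustration, subband gain is often estimated with a textbook expression: the ratio of the arithmetic mean to the geometric mean of the subband signal variances. That expression is not given in the source; it assumes an ideal filter bank, optimal bit allocation, and fine quantization, so treat the sketch below as indicative only.

```python
import math

def subband_gain_db(band_variances):
    """Estimated subband gain in dB: arithmetic mean over geometric mean of variances."""
    m = len(band_variances)
    arithmetic = sum(band_variances) / m
    geometric = math.exp(sum(math.log(v) for v in band_variances) / m)
    return 10.0 * math.log10(arithmetic / geometric)

flat_gain = subband_gain_db([1.0, 1.0, 1.0, 1.0])          # equal energy in every band
tilted_gain = subband_gain_db([1000.0, 100.0, 10.0, 1.0])  # strongly unequal energies
```

A spectrally flat input gives no gain, while the strongly tilted example gives a little over 9 dB, consistent with the statement that the gain grows with the complexity of the input signal.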

Quantization noise generated during the coding process is constrained within each subband and cannot interfere with any other band. The advantage of this approach is that the masking by each of the subband dominant signals is much more effective because of the reduction in the noise bandwidth. Figure 7.1.3 charts subband gain as a function of the number of subbands for four essentially stationary, but differing, complex audio signals.

Downloaded from Digital Engineering Library @ McGraw-Hill (www.digitalengineeringlibrary.com)

Copyright © 2004 The McGraw-Hill Companies. All rights reserved.

Any use is subject to the Terms of Use as given at the website.


In practical implementations of compression codecs, several factors tend to limit the number of subbands employed. The primary considerations include:

The level variation of normal audio signals leading to an averaging of the energy across bands
and a subsequent reduction in the coding gain

The coding or processing delay introduced by additional subbands

The overall computational complexity of the system

The two key issues in the analysis of a subband framework are:

Determining the likely improvement associated with additional subbands

Determining the relationships between subband gain, the number of subbands, and the
response of the filter bank used to create those subbands

APCM Coding

The APCM processor acts in a similar fashion to an automatic gain control system, continually making adjustments in response to the dynamics, at all frequencies, of the incoming audio signal [1]. Transform coding takes a time block of signal, analyzes it for frequency and energy, and identifies irrelevant content. Again, to exploit the spectral response of the ear, the frequency spectrum of the signal is divided into a number of subbands, and the most important criteria are coded with a bias toward the more sensitive low frequencies. At the same time, through the use of psychoacoustic masking techniques, those frequencies that are assumed to be masked by the ear are also identified and removed. The data generated, therefore, describes the frequency content and the energy level at those frequencies, with more bits being allocated to the higher-energy frequencies than to those with lower energy.

Figure 7.1.3 Variation of subband gain as a function of the number of subbands. (From [2]. Used with permission.)

The larger the time block of signal being analyzed, the better the frequency resolution and the greater the amount of irrelevancy identified. The penalty, however, is an increase in coding delay and a decrease in temporal resolution. A balance has been struck with advances in perceptual coding techniques and psychoacoustic modeling leading to increased efficiency. It is reported in [2] that, with this approach to compression, some 80 percent of the input audio can be removed with acceptable results.
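The trade-off between block length, delay, and frequency resolution can be put into numbers with a short sketch. It assumes a plain, non-overlapped block of N samples analyzed with a transform of the same length, so the bin spacing is simply the sample rate divided by N; practical coders use overlapped windows, and the 48 kHz rate and the block lengths are example values only.

```python
def block_tradeoff(block_length, sample_rate_hz):
    """Return (block duration in ms, frequency bin spacing in Hz) for one analysis block."""
    duration_ms = 1000.0 * block_length / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / block_length
    return duration_ms, bin_spacing_hz

for n in (256, 1024, 4096):
    duration_ms, bin_spacing_hz = block_tradeoff(n, 48_000)
    print(f"N={n:5d}  block={duration_ms:7.2f} ms  resolution={bin_spacing_hz:7.2f} Hz")
```

Each fourfold increase in block length makes the frequency resolution four times finer, but also makes the block, and therefore the minimum buffering delay, four times longer.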

This hybrid arrangement of working with time-domain subbands and simultaneously carrying out a spectral analysis can be achieved by using a dynamic bit allocation process for each subband. This subband APCM approach is found in the popular range of software-based MUSICAM, Dolby AC-2, and ISO/MPEG-1 Layers 1 and 2 algorithms. Layer 3, a more complex method of coding that operates at much lower bit rates, is, in essence, a combination of the best functions of MUSICAM and ASPEC, another adaptive transform algorithm. Table 7.1.1 lists the primary operational parameters for these systems.

Additionally, some of these systems exploit the significant redundancy between stereo channels by using a technique known as joint stereo coding. After the common information between left and right channels of a stereo signal has been identified, it is coded only once, thus reducing the bit-rate demands yet again.
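One common form of joint stereo coding is mid/side coding, sketched below. The source does not say which joint stereo technique each system uses, so this is an assumed example: the mid channel carries what left and right have in common, and the side channel carries only the difference.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid (common part) and side (difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Nearly identical channels: the side signal is almost all zeros.
left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]
mid, side = ms_encode(left, right)
decoded_left, decoded_right = ms_decode(mid, side)
```

When the two channels are highly correlated, the side signal is close to zero and needs very few bits, which is where the saving comes from.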

Each of the subbands has its own defined masking threshold. The output data from each of the filtered subbands is requantized with just enough bit resolution to maintain adequate headroom between the quantization noise and the masking threshold for each band. In more complex coders (e.g., ISO/MPEG-1 Layer 3), any spare bit capacity is utilized by those subbands with the greater need for increased masking threshold separation. The maintenance of these signal-to-masking threshold ratios is crucial if further compression is contemplated for any postproduction or transmission process.
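The requantization rule described above can be sketched as a per-band calculation. The sketch assumes the usual rule of thumb that each bit of a uniform quantizer buys about 6.02 dB of signal-to-noise ratio; the function name, the headroom parameter, and the example levels are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # rule-of-thumb SNR improvement per bit for a uniform quantizer

def bits_for_band(signal_db, mask_db, headroom_db=0.0):
    """Smallest bit count that keeps quantization noise below the masking threshold."""
    needed_snr_db = (signal_db - mask_db) + headroom_db
    if needed_snr_db <= 0.0:
        return 0  # the band is already masked, so nothing needs to be sent
    return math.ceil(needed_snr_db / DB_PER_BIT)

audible_band = bits_for_band(60.0, 40.0)                      # 20 dB signal-to-mask ratio
masked_band = bits_for_band(30.0, 35.0)                       # signal below the threshold
with_headroom = bits_for_band(60.0, 40.0, headroom_db=6.0)    # extra margin for later stages
```

Adding headroom costs bits, but it preserves the signal-to-masking ratio that later compression stages depend on.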

Table 7.1.1 Operational Parameters of Subband APCM Algorithms (After [2].)

Coding System   Compression Ratio   Subbands   Bit Rate, kbits/s   A to A Delay, ms¹   Audio Bandwidth, kHz
Dolby AC-2      6:1                 256        256                 45                  20
ISO Layer 1     4:1                 32         384                 19                  20
ISO Layer 2     Variable            32         192–256             >40                 20
ISO Layer 3     12:1                576        128                 >80                 20
MUSICAM         Variable            32         128–384             >35                 20

¹ The total system delay (encoder-to-decoder) of the coding system.

7.1.1d

Processing and Propagation Delay

As noted previously, the current range of popular compression algorithms operates, for all intents and purposes, in real time [1]. However, this process does of necessity introduce some measurable delay into the audio chain. All algorithms take a finite time to analyze the incoming signal, which can range from a few milliseconds to tens and even hundreds of milliseconds. The amount of processing delay will be crucial if the equipment is to be used in any interactive or two-way application. As a rule of thumb, any more than 20 ms of delay in a two-way audio exchange is problematic. Propagation delay in satellite and long terrestrial circuits is a fact of life. A two-way hookup over a 1000 km, full duplex, telecom digital link has a propagation delay of 3 ms in each direction. This is comparable to having a conversation with someone standing 1 m away. It is obvious that even over a very short distance, the use of a codec with a long processing delay characteristic will have a dramatic effect on operation.
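The delay figures above can be checked with simple arithmetic. The sketch assumes propagation at the free-space speed of light (real cable and fiber links are somewhat slower) and a speed of sound of 343 m/s in air; the 19 ms codec figure is the ISO Layer 1 delay from Table 7.1.1.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # free-space value; an optimistic assumption for a real link
SPEED_OF_SOUND_M_S = 343.0          # assumed value for air at room temperature

def delay_ms(distance_m, speed_m_s):
    """One-way travel time in milliseconds."""
    return 1000.0 * distance_m / speed_m_s

link_ms = delay_ms(1_000_000.0, SPEED_OF_LIGHT_M_S)  # 1000 km digital link
voice_ms = delay_ms(1.0, SPEED_OF_SOUND_M_S)         # talker standing 1 m away

codec_ms = 19.0                                       # ISO Layer 1, Table 7.1.1
one_way_total_ms = link_ms + codec_ms
exceeds_rule_of_thumb = one_way_total_ms > 20.0

print(f"link {link_ms:.2f} ms, voice over 1 m {voice_ms:.2f} ms, "
      f"link plus codec {one_way_total_ms:.2f} ms")
```

Under these assumptions the link and the 1 m acoustic path both come out at about 3 ms, while adding even the fastest codec in the table pushes the one-way total past the 20 ms rule of thumb.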

7.1.1e

Bit Rate and Compression Ratio

The ITU has recommended the following bit rates when incorporating data compression in an audio chain [1]:

128 kbits/s per mono channel (256 kbits/s for stereo) as the minimum bit rate for any stage if
further compression is anticipated or required.

192 kbits/s per mono channel (384 kbits/s for stereo) as the minimum bit rate for the first
stage of compression in a complex audio chain.

These markers place a 4:1 compression ratio at the “safe” end of the scale. However, more aggressive compression ratios, currently up to a nominal 20:1, are available. Keep in mind, though, that low bit rate, high-level compression can lead to problems if any further stages of compression are required or anticipated.

With successive stages of compression, either or both the noise floor and the audio bandwidth will be set by the stage operating at the lowest bit rate. It is, therefore, worth emphasizing that after these platforms have been set by a low bit rate stage, they cannot be subsequently improved by using a following stage operating at a higher bit rate.
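The recommendations and the cascading rule can be combined into a small chain check. The sketch assumes 16-bit, 48 kHz linear PCM as the reference (768 kbits/s per mono channel), which is what makes 192 kbits/s correspond to the 4:1 figure; the function names and warning texts are hypothetical.

```python
PCM_MONO_KBPS = 48_000 * 16 / 1000.0  # assumed reference: 16-bit PCM at 48 kHz

def compression_ratio(bit_rate_kbps):
    """Compression ratio of one mono channel relative to the assumed PCM reference."""
    return PCM_MONO_KBPS / bit_rate_kbps

def check_chain(stage_rates_kbps):
    """Return (limiting bit rate, warnings) for per-mono-channel rates in chain order."""
    warnings = []
    last = len(stage_rates_kbps) - 1
    for i, rate in enumerate(stage_rates_kbps):
        if i == 0 and last > 0 and rate < 192:
            warnings.append(f"stage {i}: first stage below 192 kbits/s")
        if i < last and rate < 128:
            warnings.append(f"stage {i}: below 128 kbits/s with further compression to follow")
    # The lowest-rate stage sets the noise floor and bandwidth for the whole chain.
    return min(stage_rates_kbps), warnings

ok_limit, ok_warnings = check_chain([192, 128, 64])
bad_limit, bad_warnings = check_chain([128, 96, 192])
```

In the second chain the final 192 kbits/s stage cannot restore what the 96 kbits/s stage removed, so the limiting rate stays at 96 kbits/s.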

Bit Rate Mismatch

A stage of compression may well be followed in the audio chain by another digital stage, either compressed or linear, but, more importantly, operating at a different sampling frequency [1]. If a D/A conversion is to be avoided, a sample rate converter must be used. This can be a stand-alone unit or it may already be installed as a module in existing equipment. Where a following stage of compression is operating at the same sampling frequency but a different compression ratio, the bit resolution will change by default.

If the stages have the same sampling frequencies, a direct PCM or AES/EBU digital link can be made, thus avoiding the conversion to the analog domain.

7.1.1f

Editing Compressed Data 

The linear PCM waveform associated with standard audio workstations is only useful if decoded [1]. The resolution of the compressed data may or may not be adequate to allow direct editing of the audio signal. The minimum segment of audio that can be removed from, or edited in, a transform-coded signal is determined by the size of the time block of the PCM signal being analyzed. The larger the time block, the more difficult the editing of the compressed data becomes.
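The editing constraint can be illustrated by snapping a requested edit point to the nearest block boundary. The block length of 1152 samples is an assumption used for illustration (it is the frame length of MPEG-1 Layer 2), as are the 48 kHz sample rate and the function name.

```python
def snap_edit_point(sample_index, block_length):
    """Move an edit point to the nearest block boundary (integer arithmetic only)."""
    return (sample_index + block_length // 2) // block_length * block_length

BLOCK_LENGTH = 1152     # assumed block size in samples
SAMPLE_RATE_HZ = 48_000

requested = 10_000
snapped = snap_edit_point(requested, BLOCK_LENGTH)
granularity_ms = 1000.0 * BLOCK_LENGTH / SAMPLE_RATE_HZ
error_ms = 1000.0 * abs(snapped - requested) / SAMPLE_RATE_HZ
```

With these values an edit can only land on a 24 ms grid, and the requested cut moves by several milliseconds; doubling the block length doubles the grid spacing.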

7.1.2

Common Audio Compression Techniques

Subband APCM coding has found numerous applications in the professional audio industry,
including [2]:


The digital compact cassette (DCC): uses the simplest implementation of subband APCM with the PASC/ISO/MPEG-1 Layer 1 algorithm, incorporating 32 subbands, offering 4:1 compression, and producing a bit rate of 384 kbits/s.

The MiniDisc with the proprietary ATRAC algorithm: produces 5:1 compression and a 292 kbits/s bit rate. This algorithm uses a modified discrete cosine transform (MDCT) technique, ensuring greater signal analysis by processing time blocks of the signal in nonuniform frequency divisions, with fewer divisions being allocated to the least sensitive higher frequencies.

ISO/MPEG-1 Layer 2 (MUSICAM by another name): a software-based algorithm that can be implemented to produce a range of bit rates and compression ratios commencing at 4:1.

The ATSC DTV system: uses the subband APCM algorithm in Dolby AC-3 for the audio surround system associated with the ATSC DTV standard. AC-3 delivers five audio channels plus a bass-only effects channel in less bandwidth than that required for one stereo CD channel. This configuration is referred to as 5.1 channels.

For the purposes of illustration, two commonly used audio compression systems will be examined in some detail:

apt-X100

ISO/MPEG-1 Layer 2

7.1.2a

apt-X100

apt-X100 is a four subband prediction (ADPCM) algorithm [1]. Differential coding reduces the bit rate by coding and transmitting or storing only the difference between a predicted level for a PCM audio sample and the absolute level of that sample, thus exploiting the redundancy contained in the PCM signal.
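Differential coding in its simplest form can be sketched as follows. The sketch assumes a first-order predictor that predicts each sample to be equal to the previous one and sends the differences without quantizing them; it is not the predictor used in apt-X100, and the sample values are made up.

```python
def dpcm_encode(samples):
    """Replace each sample with its difference from the predicted (previous) sample."""
    prediction = 0
    differences = []
    for sample in samples:
        differences.append(sample - prediction)
        prediction = sample  # the next prediction is the current sample
    return differences

def dpcm_decode(differences):
    """Rebuild the samples by accumulating the differences."""
    prediction = 0
    samples = []
    for difference in differences:
        prediction += difference
        samples.append(prediction)
    return samples

pcm = [1000, 1010, 1025, 1030, 1020, 1005]  # slowly varying example samples
differences = dpcm_encode(pcm)               # [1000, 10, 15, 5, -10, -15]
restored = dpcm_decode(differences)
```

After the first value, the differences span a far smaller range than the samples themselves, so they can be represented with fewer bits.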

Audio exhibits relatively slowly varying energy fluctuations with respect to time. Adaptive differential coding, which is dependent on the energy of the input signal, dynamically alters the step size for each quantizing interval to reflect these fluctuations. In apt-X100, this equates to the backwards adaptation process and involves the analysis of 122 previous samples. Being a continuous process, this provides an almost constant and optimal signal-to-quantization noise ratio across the operating range of the quantizer.
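Backward adaptation means the step size is derived only from data the decoder has already received, so no side information is needed. The sketch below is a deliberately simplified assumption: it adapts from the single most recent code word rather than from 122 previous samples, and the multipliers 1.5 and 0.9, the limits, and all names are hypothetical.

```python
import math

MIN_STEP, MAX_STEP = 0.01, 1000.0

def next_step(step, code, levels):
    """Grow the step after an outermost code (likely overload), shrink it otherwise."""
    factor = 1.5 if abs(code) == levels else 0.9
    return min(max(step * factor, MIN_STEP), MAX_STEP)

def adpcm_encode(samples, levels=3, step=1.0):
    """Return the code words and the encoder's own reconstruction of the signal."""
    prediction, codes, reconstruction = 0.0, [], []
    for sample in samples:
        code = round((sample - prediction) / step)
        code = max(-levels, min(levels, code))  # clamp to the available code range
        prediction += code * step
        codes.append(code)
        reconstruction.append(prediction)
        step = next_step(step, code, levels)    # uses only the transmitted code
    return codes, reconstruction

def adpcm_decode(codes, levels=3, step=1.0):
    """Track the same prediction and step size as the encoder, from the codes alone."""
    prediction, output = 0.0, []
    for code in codes:
        prediction += code * step
        output.append(prediction)
        step = next_step(step, code, levels)
    return output

tone = [100.0 * math.sin(2.0 * math.pi * n / 50.0) for n in range(200)]
codes, encoder_view = adpcm_encode(tone)
decoded = adpcm_decode(codes)
```

Because both sides update the step size from the same code words, the decoder output matches the encoder's internal reconstruction exactly.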

Time domain subband algorithms implicitly model the hearing process and indirectly exploit a degree of irrelevancy by accepting that the human ear is more sensitive at lower frequencies. This is achieved in the four subband derivative by allocating more bits to the lower frequency bands. This is the only application of psychoacoustics exercised in apt-X100. All the information contained in the PCM signal is processed, audible or not (i.e., no attempt is made to remove irrelevant information). It is the unique fixed allocation of bits to each of the four subbands, coupled with the filtering characteristics of each individual listener's hearing system, that achieves the satisfactory audible end result.

The user-defined output bit rates range from 56 to 384 kbits/s, achieved by using various sampling frequencies from 16 kHz to 48 kHz, which produce audio bandwidths from 7.5 kHz mono to 22 kHz stereo.
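The link between a fixed per-band bit allocation and the output bit rate can be sketched with a little arithmetic. The allocation below is a hypothetical example chosen only to favor the lower bands; it is not taken from the source, and 16-bit input PCM is likewise an assumption.

```python
BAND_BITS = [7, 4, 3, 2]  # hypothetical fixed allocation, lowest band first
INPUT_BITS = 16           # assumed PCM word length

def compression_ratio(band_bits=BAND_BITS, input_bits=INPUT_BITS):
    """Input bits over output bits for one group of len(band_bits) PCM samples."""
    return input_bits * len(band_bits) / sum(band_bits)

def output_bit_rate_kbps(sample_rate_hz, channels, band_bits=BAND_BITS):
    """Bit rate when each band emits one code word per len(band_bits) input samples."""
    bits_per_input_sample = sum(band_bits) / len(band_bits)
    return sample_rate_hz * bits_per_input_sample * channels / 1000.0

ratio = compression_ratio()
stereo_48k_kbps = output_bit_rate_kbps(48_000, channels=2)
```

Under this assumed allocation, four 16-bit input samples become 16 output bits, a 4:1 ratio, and 48 kHz stereo works out to 384 kbits/s, which coincides with the upper end of the range quoted above.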
