
Smyth, Stephen: "Digital Audio Data Compression," Broadcast Engineering, Intertec Publishing, Overland Park, Kan., February 1992.

Terry, K. B., and S. B. Lyman: "Dolby E—A New Audio Distribution Format for Digital Broadcast Applications," International Broadcasting Convention Proceedings, IBC, London, England, pp. 204–209, September 1999.

Todd, C., et al.: "AC-3: Flexible Perceptual Coding for Audio Transmission and Storage," AES 96th Convention, Preprint 3796, Audio Engineering Society, New York, February 1994.

Vernon, S., and T. Spath: "Carrying Multichannel Audio in a Stereo Production and Distribution Infrastructure," Proceedings of IBC 2000, International Broadcasting Convention, Amsterdam, September 2000.

Wylie, Fred: "Audio Compression Techniques," The Electronics Handbook, Jerry C. Whitaker (ed.), CRC Press, Boca Raton, Fla., pp. 1260–1272, 1996.

Wylie, Fred: "Audio Compression Technologies," NAB Engineering Handbook, 9th ed., Jerry C. Whitaker (ed.), National Association of Broadcasters, Washington, D.C., 1998.

Zwicker, E.: "Subdivision of the Audible Frequency Range Into Critical Bands (Frequenzgruppen)," J. Acoust. Soc. Am., vol. 33, p. 248, February 1961.

Downloaded from Digital Engineering Library @ McGraw-Hill (www.digitalengineeringlibrary.com)

Copyright © 2004 The McGraw-Hill Companies. All rights reserved.

Any use is subject to the Terms of Use as given at the website.

Compression Technologies for Audio


Chapter 7.1
Audio Compression Systems

Fred Wylie

Jerry C. Whitaker, Editor-in-Chief

7.1.1 Introduction

As with video, high on the list of priorities for the professional audio industry is to refine and
extend the range of digital equipment capable of the capture, storage, post production, exchange,
distribution, and transmission of high-quality audio, be it mono, stereo, or 5.1-channel AC-3 [1].
This demand is being driven by end-users, broadcasters, film makers, and the recording industry
alike, all of whom are moving rapidly toward a "tapeless" environment. Over the last two decades,
continuing advances in DSP technology have supported research engineers in their endeavors to
produce the necessary hardware, particularly in the field of digital audio data compression or, as
it is often referred to, bit-rate reduction. A number of real-time, or in reality near-instantaneous,
compression coding algorithms now exist. These can significantly lower the circuit bandwidth and
storage requirements for the transmission, distribution, and exchange of high-quality audio.

The introduction in 1983 of the compact disc (CD) digital audio format set a quality benchmark
that manufacturers of subsequent professional audio equipment have striven to match or improve
upon. The discerning consumer now expects the same quality from radio and television
receivers, leaving the broadcaster with an enormous challenge.

7.1.1a PCM Versus Compression

It can be an expensive and complex technical exercise to fully implement a linear pulse code
modulation (PCM) infrastructure, except over very short distances and within studio areas [1].
To demonstrate the advantages of distributing compressed digital audio over wireless or wired
systems and networks, consider again the CD format as a reference. The CD is a 16-bit linear
PCM process, but it has one major handicap: the amount of circuit bandwidth the digital signal
occupies in a transmission system. A stereo CD transfers information (data) at 1.411 Mbits/s,
which would require a circuit with a bandwidth of approximately 700 kHz to avoid distortion of
the digital signal. In practice, additional bits are added to the signal for channel coding, synchronization,
and error correction; this increases the bandwidth demands yet again. The commonly quoted
bandwidth figure for a circuit capable of carrying a CD or similarly coded linear PCM digital
stereo signal is 1.5 MHz. This can be compared with the 20 kHz needed for each of two circuits
to distribute the same stereo audio in analog form, a 75-fold increase in bandwidth requirements.
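The figures quoted above follow from simple arithmetic; a short Python sketch checks them (the 1.5 MHz and 20 kHz values are the ones given in the text):

```python
# CD stereo: 16-bit linear PCM sampled at 44.1 kHz on 2 channels.
sampling_rate_hz = 44_100
bits_per_sample = 16
channels = 2

bit_rate_bps = sampling_rate_hz * bits_per_sample * channels
print(bit_rate_bps)  # 1411200, i.e., 1.411 Mbits/s

# Bandwidth comparison from the text: ~1.5 MHz for the fully
# channel-coded digital stereo signal versus 20 kHz per analog circuit.
digital_bandwidth_hz = 1_500_000
analog_circuit_bandwidth_hz = 20_000
print(digital_bandwidth_hz / analog_circuit_bandwidth_hz)  # 75.0
```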

7.1.1b Audio Bit Rate Reduction

In general, analog audio transmission requires fixed input and output bandwidths [2]. This
condition implies that in a real-time compression system, the quality, bandwidth, and distortion/
noise level of the original and the decoded output sound should not be subjectively different,
giving the appearance of a lossless, real-time process.

In a technical sense, all practical real-time bit-rate-reduction systems can be referred to as
"lossy": the digital audio signal at the output is not identical to the input data stream. However,
some compression algorithms are, for all practical purposes, lossless, discarding as little as
2 percent of the original signal; others remove approximately 80 percent of it.

Redundancy and Irrelevancy

A complex audio signal contains a great deal of information, some of which is deemed irrelevant
because the human ear cannot hear it [2]. The same signal, depending on its complexity, also
contains information that is highly predictable and can therefore be made redundant.

Redundancy, which is measurable and quantifiable, can be removed in the coder and replaced in
the decoder; this process often is referred to as statistical compression. Irrelevancy, on the other
hand, the target of perceptual coding, once removed from the signal cannot be replaced and is
lost irretrievably. This is an entirely subjective process, with each proprietary algorithm using a
different psychoacoustic model.

Critically perceived signals, such as pure tones, are high in redundancy and low in irrelevancy.
They compress quite easily, in an almost totally statistical compression process. Conversely,
noncritically perceived signals, such as complex audio or noisy signals, are low in redundancy
and high in irrelevancy. These compress easily in the perceptual coder, but with the total loss of
all the irrelevancy content.

Human Auditory System

The sensitivity of the human ear is biased toward the lower end of the audible frequency
spectrum, around 3 kHz [2]. At 50 Hz, the bottom end of the spectrum, and at 17 kHz, the top
end, the sensitivity of the ear is down by approximately 50 dB relative to its sensitivity at 3 kHz
(Figure 7.1.1). Additionally, very few audio signals, music- or speech-based, carry fundamental
frequencies above 4 kHz. Taking advantage of these characteristics of the ear, the structure of
audible sounds, and the redundancy content of the PCM signal is the basis used by the designers
of the predictive range of compression algorithms.

Another well-known feature of the hearing process is that loud sounds mask quieter sounds at a
similar or nearby frequency. This compares with the action of an automatic gain control, turning
the gain down when subjected to loud sounds, thus making quieter sounds less likely to be heard.
For example, as illustrated in Figure 7.1.2, if we assume a 1 kHz tone at a level of 70 dBu, levels
of greater than 40 dBu at 750 Hz and 2 kHz would be required for those frequencies to be heard.
The ear also exercises a degree of temporal masking, being exceptionally tolerant of sharp
transient sounds.

It is by mimicking these additional psychoacoustic features of the human ear and identifying
the irrelevancy content of the input signal that the transform range of low bit-rate algorithms
operates, adopting the principle that if the ear is unable to hear the sound then there is no point in
transmitting it in the first place.
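The masking principle can be caricatured in a few lines of Python. This is only an illustration built from the single data point in the example above (a 70 dBu masker raising the audibility threshold of nearby components to about 40 dBu); real perceptual coders derive a frequency-dependent masking curve from a full psychoacoustic model, and the `is_masked` helper with its fixed 30 dB offset is hypothetical:

```python
def is_masked(component_level_dbu, masker_level_dbu, offset_db=30):
    """Crude masking test: a component is judged inaudible if it sits
    more than offset_db below a strong nearby masker.
    (Illustrative only; a real model varies offset_db with frequency.)"""
    return component_level_dbu < masker_level_dbu - offset_db

masker_dbu = 70  # the 1 kHz tone from the example

# A 35 dBu component near the masker falls under the masking
# threshold, so a perceptual coder need not spend bits on it.
print(is_masked(35, masker_dbu))  # True

# A 45 dBu component is above the threshold and must be kept.
print(is_masked(45, masker_dbu))  # False
```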

Quantization

Quantization is the process of converting an analog signal to its representative digital format or,
as in the case of compression, requantizing an already converted signal [2]. The process limits a
finite-level measurement of a signal sample to a specific preset integer value. This means that the
actual level of the sample may be greater or smaller than the preset reference level it is compared
with. The difference between these two levels, called the quantization error, is compounded in
the decoded signal as quantization noise.

Quantization noise, therefore, will be injected into the audio signal after each A/D and D/A
conversion, the level of that noise being governed by the bit allocation associated with the coding
process (i.e., the number of bits allocated to represent the level of each sample taken of the
analog signal). For linear PCM, the bit allocation is commonly 16. The level of each audio
sample, therefore, will be compared with one of 2^16, or 65,536, discrete levels or steps.

Compression or bit-rate reduction of the PCM signal leads to the requantizing of an already

quantized signal, which will unavoidably inject further quantization noise. It always has been
good operating practice to restrict the number of A/D and D/A conversions in an audio chain.

Figure 7.1.1 Generalized frequency response of the human ear. Note how the PCM process
captures signals that the ear cannot distinguish. (From [2]. Used with permission.)


Nothing has changed in this regard, and now the number of compression stages also should be
kept to a minimum. Additionally, the bit rates of these stages should be set as high as practical;
put another way, the compression ratio should be as low as possible.

Sooner or later—after a finite number of A/D, D/A conversions and passes of compression

coding, of whatever type—the accumulation of quantization noise and other unpredictable signal
degradations eventually will break through the noise/signal threshold, be interpreted as part of
the audio signal, be processed as such, and be heard by the listener.
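The quantization error and its half-step bound can be made concrete with a short sketch. The normalization and rounding convention used here is one common choice, not the only one, and the clipping behavior at exact full scale is ignored:

```python
def quantize(x, bits=16):
    """Round a normalized sample in [-1.0, 1.0) to the nearest of
    2**bits uniform steps and return the reconstructed value."""
    steps = 2 ** bits        # 65,536 levels for 16-bit PCM
    scale = steps // 2       # 32,768
    code = round(x * scale)  # integer code actually stored
    return code / scale      # decoded (requantized) value

x = 0.3000001
error = x - quantize(x)             # the quantization error
print(abs(error) <= 0.5 / 2 ** 15)  # True: at most half a step

# Theoretical SNR of ideal linear PCM: about 6.02 dB per bit.
def pcm_snr_db(bits):
    return 6.02 * bits + 1.76

print(round(pcm_snr_db(16), 1))  # 98.1 dB for 16-bit audio
```

Each requantization pass in a compression stage repeats this rounding, which is why the text advises keeping the number of coding stages to a minimum.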

Sampling Frequency and Bit Rate

The bit rate of a digital signal is defined by:

sampling frequency × bit resolution × number of audio channels

The rules regarding the selection of a sampling frequency are based on Nyquist's theorem [2].
This ensures that, in particular, the lower sideband of the sampling frequency does not encroach
into the baseband audio. Objectionable and audible aliasing effects would occur if the two bands
were to overlap. In practice, the sampling rate is set slightly above twice the highest audible
frequency, which makes the antialiasing filter designs less complex and less expensive.

In the case of a stereo CD, the audio signal is sampled at 44.1 kHz, which supports an audio
bandwidth of approximately 20 kHz in each channel. The resulting
audio bit rate = 44.1 kHz × 16 × 2 = 1.411 Mbits/s, as discussed previously.
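The formula generalizes directly to other formats; a minimal sketch (the 48 kHz/24-bit line is a common professional format, included only as an illustration):

```python
def audio_bit_rate(sampling_hz, bits, channels):
    """bit rate = sampling frequency x bit resolution x channel count."""
    return sampling_hz * bits * channels

print(audio_bit_rate(44_100, 16, 2))  # 1411200 bit/s: CD stereo
print(audio_bit_rate(48_000, 24, 2))  # 2304000 bit/s: 48 kHz/24-bit stereo
```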

Figure 7.1.2 Example of the masking effect of a high-level sound. (From [2]. Used with permission.)
