
7-66 Compression Technologies for Audio

Inherent to the decoding process is the unpacking (demultiplexing) of the various types of
information included in the bit stream. Among the options for distribution of this bit stream
information are:

• Selected data may be copied from the input buffer to dedicated registers.

• Data from the input buffer may be copied to specific working memory locations.

• The data may simply be located in the input buffer, with pointers to the data saved to another
  location for use when the information is required.

Decoding Components

The audio-compression system exponents are delivered in the bit stream in an encoded form [3].
To unpack and decode the exponents, two types of “side information” are required:

• The number of exponents must be known.

• The exponent “strategy” in use by each channel must be known.

The bit-allocation computation reveals how many bits are used for each mantissa. The inputs
to the bit-allocation computation are the decoded exponents and the bit-allocation side
information. The outputs of the bit-allocation computation are a set of bit-allocation pointers
(BAPs), one BAP for each coded mantissa. The BAP indicates the quantizer used for the mantissa,
and how many bits in the bit stream were used for each mantissa.

The coarsely quantized mantissas make up the bulk of the AC-3 data stream. Each mantissa is
quantized to a level of precision indicated by the corresponding BAP. To pack the mantissa data
more efficiently, some mantissas are grouped together into a single transmitted value. For
instance, two 11-level quantized values are conveyed in a single 7-bit code (3.5 bits/value) in
the bit stream.
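The grouping arithmetic can be illustrated with a minimal sketch. Two 11-level indices give 11 × 11 = 121 combinations, which fit in a 7-bit code (128 values). The packing order (first index times 11 plus second index) and the function names `group_pair` and `ungroup_pair` are assumptions made for illustration; the text above does not specify them.

```python
def group_pair(a, b, levels=11):
    """Pack two quantizer indices (each 0..levels-1) into one code.

    Assumed packing order: code = a * levels + b.
    """
    if not (0 <= a < levels and 0 <= b < levels):
        raise ValueError("quantizer index out of range")
    return a * levels + b


def ungroup_pair(code, levels=11):
    """Recover the two quantizer indices from a grouped code."""
    if not 0 <= code < levels * levels:
        raise ValueError("grouped code out of range")
    return divmod(code, levels)


largest = group_pair(10, 10)      # largest possible grouped code
fits_in_7_bits = largest < 2 ** 7
print(largest, fits_in_7_bits, ungroup_pair(largest))
```

Sending the two values separately would need 4 bits each (8 bits total), so grouping saves one bit per pair.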

The mantissa data is unpacked by peeling off groups of bits as indicated by the BAPs. Grouped
mantissas must be ungrouped. Each coded mantissa value is converted into a dequantized value.
Mantissas that are indicated as having zero bits may be reproduced either as zero or as a random
dither value (under control of a dither flag).

Other steps in the decoding process include the following:

• Decoupling. When coupling is in use, the channels that are coupled must be decoupled.
  Decoupling involves reconstructing the high-frequency section (exponents and mantissas) of
  each coupled channel, from the common coupling channel and the coupling coordinates for
  the individual channel. Within each coupling band, the coupling-channel coefficients
  (exponent and mantissa) are multiplied by the individual channel coupling coordinates.

• Rematrixing. In the 2/0 audio-coding mode, rematrixing may be employed as indicated by a
  rematrix flag. When the flag indicates that a band is rematrixed, the coefficients encoded in
  the bit stream are sum and difference values, instead of left and right values.

• Dynamic range compression. For each block of audio, a dynamic range control value may be
  included in the bit stream. The decoder, by default, will use this value to alter the magnitude
  of the coefficient (exponent and mantissa) as required to properly process the data.

• Inverse transform. The decoding steps described in this section will result in a set of
  frequency coefficients for each encoded channel. The inverse transform converts these blocks
  of frequency coefficients into blocks of time samples.

Downloaded from Digital Engineering Library @ McGraw-Hill (www.digitalengineeringlibrary.com)

Copyright © 2004 The McGraw-Hill Companies. All rights reserved.

Any use is subject to the Terms of Use as given at the website.

DTV Audio Encoding and Decoding



• Window, overlap/add. The individual blocks of time samples must be windowed, and adjacent
  blocks are overlapped and added together to reconstruct the final continuous-time-output
  PCM audio signal.

• Downmixing. If the number of channels required at the decoder output is smaller than the
  number of channels that are encoded in the bit stream, then downmixing is required.
  Downmixing in the time domain is shown in the example decoder of Figure 7.3.9. Because the
  inverse transform is a linear operation, it also is possible to downmix in the frequency domain
  prior to transformation.

• PCM output buffer. Typical decoders will provide PCM output samples at the PCM sampling
  rate. Because blocks of samples result from the decoding process, an output buffer typically is
  required.

• Output PCM. The output PCM samples are delivered in a form suitable for interconnection to
  a digital-to-analog converter (D/A), or in some other form required by the receiver.
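The rematrixing step listed above can be sketched as follows. The scaling is an assumption: the encoder is taken to send half the sum and half the difference of left and right, so the decoder recovers left as sum plus difference and right as sum minus difference. The function names are hypothetical, and the sketch handles a single rematrixed band.

```python
def rematrix_band(left, right):
    """Encoder side (assumed scaling): form half-sum and half-difference."""
    sums = [0.5 * (l + r) for l, r in zip(left, right)]
    diffs = [0.5 * (l - r) for l, r in zip(left, right)]
    return sums, diffs


def dematrix_band(sums, diffs):
    """Decoder side: recover left and right from sum and difference values."""
    left = [s + d for s, d in zip(sums, diffs)]
    right = [s - d for s, d in zip(sums, diffs)]
    return left, right


# Round trip on one band of frequency coefficients.
band_left = [0.5, -0.25]
band_right = [0.25, 0.75]
coded_sum, coded_diff = rematrix_band(band_left, band_right)
decoded_left, decoded_right = dematrix_band(coded_sum, coded_diff)
print(decoded_left, decoded_right)
```

Bands whose rematrix flag is not set would bypass `dematrix_band` and be used as left and right values directly.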

7.3.4k Algorithmic Details

The actual audio information conveyed by the AC-3 bit stream consists of the quantized
frequency coefficients [3]. The coefficients are delivered in floating-point form, each
consisting of an exponent and a mantissa. The exponents are 5-bit values that indicate the
number of leading zeros in the binary representation of a frequency coefficient. The exponent
acts as a scale factor for each mantissa, equal to $2^{-\mathrm{exp}}$. Exponent values are
allowed to range from 0 (for the largest-value coefficients with no leading zeros) to 24.
Exponents for coefficients that have more than 24 leading zeros are fixed at 24, and the
corresponding mantissas are allowed to have leading zeros. Exponents require 5 bits to represent
all allowed values.
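The exponent and mantissa relationship can be illustrated with a short sketch. It assumes each coefficient is a signed fraction with magnitude below 1 and counts leading zeros on the magnitude; the function names are hypothetical, and no mantissa quantization is applied.

```python
MAX_EXP = 24  # largest exponent value allowed by the text above


def split_coefficient(coef):
    """Return (exponent, mantissa) such that coef == mantissa * 2**-exponent.

    The exponent counts leading zeros of the binary fraction, capped at
    MAX_EXP. Beyond the cap, the mantissa keeps its own leading zeros.
    """
    if not -1.0 < coef < 1.0:
        raise ValueError("coefficient must be a fraction in (-1, 1)")
    exponent = 0
    magnitude = abs(coef)
    while exponent < MAX_EXP and magnitude < 0.5:
        magnitude *= 2.0   # shift out one leading zero
        exponent += 1
    return exponent, coef * 2.0 ** exponent


def rebuild_coefficient(exponent, mantissa):
    """Apply the scale factor 2**-exponent to the mantissa."""
    return mantissa * 2.0 ** -exponent


print(split_coefficient(0.75))      # no leading zeros
print(split_coefficient(0.1875))    # binary 0.0011, two leading zeros
print(split_coefficient(2.0 ** -30))  # exponent capped at 24
```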

AC-3 bit streams contain coded exponents for all independent channels, all coupled channels,
and for the coupling and low-frequency effects channels (when they are enabled). Because audio
information is not shared across frames, block 0 of every frame will include new exponents for
every channel. Exponent information may be shared across blocks within a frame, so blocks 1
through 5 may reuse exponents from previous blocks.

AC-3 exponent transmission employs differential coding, in which the exponents for a channel
are differentially coded across frequency. These differential exponents are combined into
groups in the audio block. This grouping is done by one of three methods, which are referred to
as exponent strategies. The number of grouped differential exponents placed in the audio block
for a particular channel depends on the exponent strategy and on the frequency bandwidth
information for that channel. The number of exponents in each group depends only on the exponent
strategy.
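A rough sketch of differential exponent decoding follows. Several details are assumptions not stated in the text above: differences are taken to lie in the range –2 to +2, three differences are packed into one 7-bit code in base 5 after adding an offset of 2, and the three strategies are modeled as each decoded exponent covering 1, 2, or 4 adjacent coefficients (`repeat`). All names are hypothetical.

```python
def ungroup_deltas(code):
    """Split one 7-bit group code into three differences in -2..+2.

    Assumed packing: code = 25 * (d1 + 2) + 5 * (d2 + 2) + (d3 + 2).
    """
    if not 0 <= code < 125:
        raise ValueError("group code out of range")
    m1, rest = divmod(code, 25)
    m2, m3 = divmod(rest, 5)
    return [m1 - 2, m2 - 2, m3 - 2]


def decode_exponents(first_exponent, group_codes, repeat=1):
    """Rebuild absolute exponents from a starting value and grouped deltas.

    repeat models the exponent strategy: each decoded exponent is shared
    by `repeat` adjacent coefficients (assumed to be 1, 2, or 4).
    """
    exponents = [first_exponent]
    current = first_exponent
    for code in group_codes:
        for delta in ungroup_deltas(code):
            current += delta           # differential coding across frequency
            exponents.extend([current] * repeat)
    return exponents


code = 25 * (1 + 2) + 5 * (0 + 2) + (-1 + 2)   # deltas +1, 0, -1
print(decode_exponents(5, [code]))
print(decode_exponents(5, [code], repeat=2))
```

With a larger `repeat`, the same number of group codes covers more coefficients, which is how a coarser strategy trades spectral resolution for fewer exponent bits.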

An AC-3 audio block contains two types of fields with exponent information. The first type
defines the exponent coding strategy for each channel, and the second type contains the actual
coded exponents for channels requiring new exponents. For independent channels, frequency
bandwidth information is included along with the exponent strategy fields. For coupled channels,
and the coupling channel, the frequency information is found in the coupling strategy fields.

7.3.4l Bit Allocation

The bit allocation routine analyzes the spectral envelope of the audio signal being coded with
respect to masking effects to determine the number of bits to assign to each transform coefficient
mantissa [3]. In the encoder, the bit allocation is performed globally on the ensemble of channels
as an entity, from a common bit pool. Because there are no preassigned exponent or mantissa
bits, the routine is allowed to flexibly allocate bits across channels, frequencies, and audio blocks
in accordance with signal demand.

The bit allocation contains a parametric model of human hearing for estimating a noise-level
threshold, expressed as a function of frequency, which separates audible from inaudible spectral
components. Various parameters of the hearing model can be adjusted by the encoder depending
upon signal characteristics. For example, a prototype masking curve is defined in terms of two
piecewise continuous line segments, each with its own slope and y-axis intercept. One of several
possible slopes and intercepts is selected by the encoder for each line segment. The encoder may
iterate on one or more such parameters until an optimal result is obtained. When all parameters
used to estimate the noise-level threshold have been selected by the encoder, the final bit
allocation is computed. The model parameters are conveyed to the decoder with other side
information. The decoder then executes the routine in a single pass.

The estimated noise-level threshold is computed over 50 bands of nonuniform bandwidth (an
approximate 1/6-octave scale). The defined banding structure is independent of sampling
frequency. The required bit allocation for each mantissa is established by performing a table
lookup based upon the difference between the input signal power spectral density (PSD),
evaluated on a fine-grain uniform frequency scale, and the estimated noise-level threshold,
evaluated on the coarse-grain (banded) frequency scale. Therefore, the bit allocation result for
a particular channel has spectral granularity corresponding to the exponent strategy employed.
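The table lookup described above can be illustrated with a deliberately simplified sketch. The step table, the use of decibel units, and all names here are hypothetical; the actual AC-3 lookup table is not given in the text. The sketch shows only the structure: a per-bin PSD is compared with a per-band threshold, and the difference selects an allocation pointer.

```python
import bisect

# Hypothetical margins (dB above the noise-level threshold) at which the
# allocation pointer steps up. Not the real AC-3 table.
HYPOTHETICAL_STEPS_DB = [0.0, 6.0, 12.0, 18.0, 24.0]


def allocation_pointer(psd_db, threshold_db):
    """Map the PSD-minus-threshold difference to a pointer via table lookup.

    A component below the threshold is treated as inaudible and gets 0.
    """
    margin = psd_db - threshold_db
    return bisect.bisect_right(HYPOTHETICAL_STEPS_DB, margin)


def allocate(psd_db_per_bin, threshold_db_per_band, band_of_bin):
    """Compute one pointer per fine-grain bin from coarse-grain thresholds."""
    return [
        allocation_pointer(psd, threshold_db_per_band[band_of_bin[i]])
        for i, psd in enumerate(psd_db_per_bin)
    ]


# Four bins spread over two bands (bins 0-1 in band 0, bins 2-3 in band 1).
pointers = allocate([50.0, 40.0, 20.0, 35.0], [30.0, 38.0], [0, 0, 1, 1])
print(pointers)
```

Because the decoder receives the model parameters as side information, it can repeat the same lookup without any per-mantissa bit counts being transmitted.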

7.3.5 Audio System Level Control

The AC-3 system provides elements that allow the encoded bit stream to satisfy listeners in many
different situations. Two principal techniques are used to control the subjective loudness of the
reproduced audio signals:

• Dialogue normalization

• Dynamic range compression

7.3.5a Dialogue Normalization

The dialogue normalization (DialNorm) element permits uniform reproduction of spoken
dialogue when decoding any AC-3 bit stream [3]. When audio from different sources is reproduced,
the apparent loudness often varies from source to source. Examples include the following:

• Audio elements from different program segments during a broadcast (for example, a movie
  vs. a commercial message)

• Different broadcast channels

• Different types of media (for example, disc vs. tape)

The AC-3 coding technology solves this problem by explicitly coding an indication of loudness
into the AC-3 bit stream.

The subjective level of normal spoken dialogue is used as a reference. The 5-bit dialogue
normalization word that is contained in the bit stream, DialNorm, is an indication of the
subjective loudness of normal spoken dialogue compared with digital 100 percent. The 5-bit value
is interpreted as an unsigned integer (most significant bit transmitted first) with a range of
possible values from 1 to 31. The unsigned integer indicates the headroom in decibels above the
subjective dialogue level. This value also may be interpreted as an indication of how many
decibels the subjective dialogue level is below digital 100 percent.

The DialNorm value is not directly used by the AC-3 decoder. Rather, the value is used by the
section of the sound reproduction system responsible for setting the reproduction volume, such
as the system volume control. The system volume control generally is set based on listener input
as to the desired loudness, or sound-pressure level (SPL). The listener adjusts a volume control
that directly adjusts the reproduction system gain. With AC-3 and the DialNorm value, the
reproduction system gain becomes a function of both the listener’s desired reproduction
sound-pressure level for dialogue, and the DialNorm value that indicates the level of dialogue in
the audio signal. In this way, the listener is able to reliably set the volume level of dialogue,
and the subjective level of dialogue will remain uniform no matter which AC-3 program is decoded.

Example Situation

An example will help to illustrate the DialNorm concept [3]. The listener adjusts the volume
control to 67 dB. (With AC-3 dialogue normalization, it is possible to calibrate a system volume
control directly in sound-pressure level, and the indication will be accurate for any AC-3
encoded audio source.) A high-quality entertainment program is being received, and the AC-3 bit
stream indicates that the dialogue level is 25 dB below the 100 percent digital level. The
reproduction system automatically sets the reproduction system gain so that full-scale digital
signals reproduce at a sound-pressure level of 92 dB. Therefore, the spoken dialogue (down 25 dB)
will reproduce at 67 dB SPL.

The broadcast program cuts to a commercial message, which has dialogue level at –15 dB
with respect to 100 percent digital level. The system level gain automatically drops, so that
digital 100 percent is now reproduced at 82 dB SPL. The dialogue of the commercial (down 15 dB)
reproduces at 67 dB SPL, as desired.
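The arithmetic of this example can be checked with a minimal sketch. The function name and parameter names are hypothetical; the only rule used is the one stated above, namely that full-scale digital level sits DialNorm decibels above the dialogue level.

```python
def full_scale_spl_db(desired_dialogue_spl_db, dialnorm):
    """SPL at which digital 100 percent must reproduce so that dialogue
    lands at the listener's chosen level."""
    if not 1 <= dialnorm <= 31:
        raise ValueError("DialNorm must be in the range 1 to 31")
    return desired_dialogue_spl_db + dialnorm


volume_setting_db = 67  # listener's chosen dialogue level

movie_full_scale = full_scale_spl_db(volume_setting_db, 25)
commercial_full_scale = full_scale_spl_db(volume_setting_db, 15)

# Dialogue sits DialNorm dB below full scale in each program.
movie_dialogue = movie_full_scale - 25
commercial_dialogue = commercial_full_scale - 15
print(movie_full_scale, commercial_full_scale)
print(movie_dialogue, commercial_dialogue)
```

The system gain drops by 10 dB at the cut to the commercial, while the dialogue level heard by the listener stays the same.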

For the dialogue normalization system to work, the DialNorm value must be communicated
from the AC-3 decoder to the system gain controller so that DialNorm can interact with the
listener-adjusted volume control. If the volume-control function for a system is performed as a
digital multiplier inside the AC-3 decoder, then the listener-selected volume setting must be
communicated into the AC-3 decoder. The listener-selected volume setting and the DialNorm
value must be combined to adjust the final reproduction system gain.

Adjustment of the system volume control is not an AC-3 function. The AC-3 bit stream
simply conveys useful information that allows the system volume control to be implemented in a
way that automatically removes undesirable level variations between program sources.

7.3.5b Dynamic Range Compression

The dynamic range compression (DynRng) element allows the program provider to implement
subjectively pleasing dynamic range reduction for most of the intended audience, while allowing
individual members of the audience the option to experience more (or all) of the original
dynamic range [3].

A consistent problem in the delivery of audio programming is that members of the audience
may prefer differing amounts of dynamic range. Original high-quality programs (such as feature
films) typically are mixed with quite a wide dynamic range. Using dialogue as a reference, loud
sounds, such as explosions, often are at least 20 dB louder; faint sounds, such as rustling leaves,
may be 50 dB quieter. In many listening situations, it is objectionable to allow the sound to
become very loud, so the loudest sounds must be compressed downward in level. Similarly, in
many listening situations, the very quiet sounds would be inaudible, and they must be brought
upward in level to be heard. Because most of the television audience will benefit from a limited
program dynamic range, motion picture soundtracks that have been mixed with a wide dynamic
range generally are compressed. The dynamic range is reduced by bringing down the level of the
loud sounds and bringing up the level of the quiet sounds. Although this satisfies the needs of
much of the audience, some audience members may prefer to experience the original sound
program in its intended form. The AC-3 audio-coding technology solves this conflict by allowing
dynamic range control values to be placed into the AC-3 bit stream.

The dynamic range control values, DynRng, indicate a gain change to be applied in the
decoder to implement dynamic range compression. Each DynRng value can indicate a gain
change of ±24 dB. The sequence of DynRng values constitutes a compression control signal. An
AC-3 encoder (or a bit stream processor) will generate the sequence of DynRng values. Each
value is used by the AC-3 decoder to alter the gain of one or more audio blocks. The DynRng
values typically indicate gain reductions during the loudest signal passages and gain increases
during the quiet passages. For the listener, it is often desirable to bring the loudest sounds down
in level, toward dialogue level, and bring the quiet sounds up in level, again toward dialogue
level. Sounds that are at the same loudness as normal spoken dialogue typically will not have
their gain changed.

The compression actually is applied to the audio in the AC-3 decoder. The encoded audio has
full dynamic range. It is permissible for the AC-3 decoder to (optionally, under listener control)
ignore the DynRng values in the bit stream. This will result in reproduction of the full dynamic
range of the audio. It also is permissible (again under listener control) for the decoder to use
some fraction of the DynRng control value, and to use different fractions for positive and
negative values. Therefore, the AC-3 decoder can reproduce sounds in one of the following ways:

• Fully compressed audio (as intended by the compression control circuit in the AC-3 encoder)

• Full dynamic range audio

• Audio with partially compressed dynamic range, with different amounts of compression for
  high-level and low-level signals
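The three reproduction options above can be sketched as one scaling function. The parameter names `cut_scale` and `boost_scale` are hypothetical, as is the choice to express each fraction as a value between 0 and 1. A scale of 1 applies the encoder's value in full and a scale of 0 ignores it.

```python
def applied_gain_db(dynrng_db, cut_scale=1.0, boost_scale=1.0):
    """Gain the decoder applies for one DynRng value, in dB.

    cut_scale scales negative values (gain reductions for loud passages);
    boost_scale scales positive values (gain increases for quiet passages).
    """
    if not (0.0 <= cut_scale <= 1.0 and 0.0 <= boost_scale <= 1.0):
        raise ValueError("scale factors must be between 0 and 1")
    scale = cut_scale if dynrng_db < 0 else boost_scale
    return dynrng_db * scale


# Fully compressed: use the values as encoded.
print(applied_gain_db(-15.0), applied_gain_db(20.0))

# Full dynamic range: ignore the values.
print(applied_gain_db(-15.0, cut_scale=0.0, boost_scale=0.0))

# Partial compression: keep the full cut on loud sounds, half the boost
# on quiet sounds.
print(applied_gain_db(-15.0, boost_scale=0.5), applied_gain_db(20.0, boost_scale=0.5))
```

Keeping the two scale factors separate lets a listener limit loud peaks without raising quiet passages, or the reverse.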

Example Situation

A feature film soundtrack is encoded into AC-3 [3]. The original program mix has dialogue level
at –25 dB. Explosions reach a full-scale peak level of 0 dB. Some quiet sounds that are intended
to be heard by all listeners are 50 dB below dialogue level (–75 dB). A compression control
signal (a sequence of DynRng values) is generated by the AC-3 encoder. During those portions of
the audio program when the audio level is higher than dialogue level, the DynRng values indicate
negative gain, or gain reduction. For full-scale 0 dB signals (the loudest explosions), a gain
reduction of 15 dB (a gain of –15 dB) is encoded into DynRng. For very quiet signals, a gain
increase of 20 dB is encoded into DynRng.

A listener wishes to reproduce this soundtrack quietly so as not to disturb anyone, but wishes
to hear all of the intended program content. The AC-3 decoder is allowed to reproduce the
