
DTV Audio Encoding and Decoding 7-61

Error-Detection Codes

Each AC-3 sync frame ends with a 16-bit CRC error-check code [6]. The decoder may use this
code to determine whether a frame of audio has been damaged or is incomplete. Additionally, the
decoder may make use of error flags provided by the transport system. In the case of detected
errors, the decoder may try to perform error concealment, or it may simply mute.
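As a rough illustration of how such a check works, the sketch below computes a 16-bit CRC over a buffer and verifies it on the receiving side. The generator polynomial 0x8005 (x^16 + x^15 + x^2 + 1), the zero initial value, and the function names are assumptions made for this example; they are not taken from the text above. A real decoder must follow the AC-3 syntax specification for the exact span of bytes covered.

```python
def crc16(data: bytes, poly: int = 0x8005) -> int:
    """Bitwise, MSB-first CRC-16 with a zero initial value (assumed parameters)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def append_crc(payload: bytes) -> bytes:
    """Return the payload followed by its 16-bit CRC (big-endian)."""
    return payload + crc16(payload).to_bytes(2, "big")


def frame_is_intact(frame: bytes) -> bool:
    """Running the CRC over payload plus CRC word leaves a zero remainder."""
    return crc16(frame) == 0


frame = append_crc(b"example audio payload")
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip a single bit
print(frame_is_intact(frame), frame_is_intact(damaged))
```

With these parameters, a frame that passes the check leaves a zero remainder, so the decoder never needs to extract the CRC word separately. A frame that fails the check would be concealed or muted, as described above.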

7.3.4g Loudness and Dynamic Range

It is important for the digital television system to provide uniform subjective loudness for all
audio programs [6]. Consumers often find it annoying when audio levels fluctuate between
broadcast channels (observed when channel hopping) or between program segments on a particular
channel (such as commercials being much louder than entertainment programs). One element
found in most audio programming is the human voice. Achieving an approximate level
match for dialogue (spoken in a normal voice, without shouting or whispering) in all audio
programming is a desirable goal. The AC-3 audio system provides syntactical elements that make
this goal achievable.

Because the digital audio-coding system can provide more than 100 dB of dynamic range,
there is no technical reason for dialogue to be encoded anywhere near 100 percent, as it commonly
is in NTSC television. However, there is no assurance that all program channels, or all
programs or program segments on a given channel, will have dialogue encoded at the same (or
even a similar) level. Without a uniform coding level for dialogue (which would imply a uniform
headroom available for all programs), there would be inevitable audio-level fluctuations between
program channels or even between program segments.

Dynamic Range Compression

It is common practice for high-quality programming to be produced with wide dynamic range
audio, suitable for the highest-quality audio reproduction environment [6]. Because they serve
audiences with a wide range of receiver capabilities, however, broadcasters typically process
audio to reduce its dynamic range. This processed audio is more suitable for most of the audience,
which does not have an audio reproduction environment that matches the original audio
production studio. In the case of NTSC, all viewers receive the same audio with the same
dynamic range; it is impossible for any viewer to enjoy the original wide dynamic range of the
audio production.

For DTV, the audio-coding system provides an embedded dynamic range control scheme that
allows a common encoded bit stream to deliver programming with a dynamic range appropriate
for each individual listener. A dynamic range control value (DynRng) is provided in each audio
block (every 5 ms). These values are used by the audio decoder to alter the level of the reproduced
sound for each audio block. Level variations of up to ±24 dB can be indicated.
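The following sketch shows one way a decoder might turn a per-block gain word into a scale factor. The 8-bit layout assumed here (a 3-bit signed exponent in steps of about 6.02 dB, followed by a 5-bit mantissa) is recalled from the AC-3 specification, not taken from the text above, and should be verified against the standard. The function names are hypothetical.

```python
import math


def dynrng_to_gain(dynrng: int) -> float:
    """Convert an 8-bit gain word to a linear gain (assumed bit layout)."""
    if not 0 <= dynrng <= 0xFF:
        raise ValueError("dynrng must fit in 8 bits")
    exponent = dynrng >> 5          # upper 3 bits
    if exponent >= 4:               # two's complement: range -4..3
        exponent -= 8
    mantissa = dynrng & 0x1F        # lower 5 bits, implied leading 1
    return 2.0 ** (exponent + 1) * (32 + mantissa) / 64.0


def gain_in_db(gain: float) -> float:
    """Express a linear gain in decibels."""
    return 20.0 * math.log10(gain)


def apply_gain(block, dynrng: int):
    """Scale every sample of one audio block by the indicated gain."""
    g = dynrng_to_gain(dynrng)
    return [g * s for s in block]


for word in (0x00, 0x7F, 0x80):
    print(hex(word), round(gain_in_db(dynrng_to_gain(word)), 2))
```

Under this assumed layout, a word of zero leaves the level unchanged and the extreme codes reach roughly +24 dB and -24 dB, consistent with the range quoted above. A listener who wants the full dynamic range could simply ignore the gain word, which is what lets one bit stream serve different listening conditions.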

7.3.4h Encoding the AC-3 Bit Stream

Because the ATSC DTV standard AC-3 audio system is specified by the syntax and decoder processing,
the encoder itself is not precisely specified [3]. The only normative requirement on the
encoder is that the output elementary bit stream follow the AC-3 syntax. Therefore, encoders of
varying levels of sophistication may be produced. More sophisticated encoders may offer superior
audio performance, and they may make operation at lower bit rates acceptable. Encoders are
expected to improve over time, and all decoders will benefit from encoder improvements. The
encoder described in this section, although basic in operation, provides good performance and
offers a starting point for future designs. A flow chart diagram of the encoding process is given
in Figure 7.3.8.

Downloaded from Digital Engineering Library @ McGraw-Hill (www.digitalengineeringlibrary.com)

Copyright © 2004 The McGraw-Hill Companies. All rights reserved.

Any use is subject to the Terms of Use as given at the website.

Input Word Length/Sample Rate

The AC-3 encoder accepts audio in the form of PCM words [3]. The internal dynamic range of
AC-3 allows input word lengths of up to 24 bits to be useful.

The input sample rate must be locked to the output bit rate so that each AC-3 sync frame contains
1536 samples of audio. If the input audio is available in a PCM format at a different sample
rate than that required, sample rate conversion must be performed to conform the sample rate.
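The locking requirement can be made concrete with a little arithmetic. The sketch below assumes a 48 kHz sample rate and a 384 kbits/s bit rate as example values; the function names are hypothetical.

```python
SAMPLES_PER_FRAME = 1536  # samples of audio per sync frame, per the text


def frame_duration_ms(sample_rate_hz: float) -> float:
    """Time span of audio carried by one sync frame."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def frame_size_bytes(bit_rate_bps: float, sample_rate_hz: float) -> float:
    """Bytes available per sync frame when bit rate and sample rate are locked."""
    bits_per_frame = bit_rate_bps * SAMPLES_PER_FRAME / sample_rate_hz
    return bits_per_frame / 8.0


print(frame_duration_ms(48000))          # duration of one frame in ms
print(frame_size_bytes(384000, 48000))   # frame size in bytes
```

If the sample clock drifted relative to the bit clock, the number of samples per frame would no longer equal 1536. That is why sample rate conversion is needed for sources running at a different rate.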

Individual input channels may be high-pass filtered. Removal of dc components of the input
signals can allow more efficient coding because the available data rate then is not used to encode
dc. However, there is the risk that signals that do not reach 100 percent PCM level before high-pass
filtering will exceed the 100 percent level after filtering, and thus be clipped. A typical
encoder would high-pass filter the input signals with a single pole filter at 3 Hz.
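A minimal sketch of such a dc-removal filter follows, together with a demonstration of the clipping risk. The first-order recursion, the coefficient formula, and the 48 kHz sample rate are assumptions chosen for illustration; the text does not specify the filter structure beyond "single pole at 3 Hz".

```python
import math


def dc_blocking_filter(samples, cutoff_hz=3.0, sample_rate_hz=48000.0):
    """First-order high-pass: y[n] = x[n] - x[n-1] + a * y[n-1]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)  # pole position
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + a * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


# One second held at +0.9 of full scale, then a jump to -0.9.
signal = [0.9] * 48000 + [-0.9] * 10
filtered = dc_blocking_filter(signal)

print(max(abs(s) for s in signal))    # input stays below full scale
print(min(filtered))                  # output swings past -1.0 after the jump
```

The input never exceeds 90 percent of full scale. Once the filter has removed the sustained offset, however, the later jump appears at nearly its full height, about 1.8, around zero. An encoder would therefore need headroom or limiting ahead of quantization.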

The LFE channel normally is low-pass-filtered at 120 Hz. A typical encoder would filter the
LFE channel with an 8th-order elliptic filter whose cutoff frequency is 120 Hz.

Transients are detected in the full-bandwidth channels to decide when to switch to short-length
audio blocks to improve pre-echo performance. High-pass filtered versions of the signals
are examined for an increase in energy from one subblock time segment to the next. Subblocks
are examined at different time scales. If a transient is detected in the second half of an audio
block in a channel, that channel switches to a short block.

The transient detector is used to determine when to switch from a long transform block
(length 512) to a short transform block (length 256). It operates on 512 samples for every audio
block. This is done in two passes, with each pass processing 256 samples. Transient detection is
broken down into four steps:

High-pass filtering

Segmentation of the block into submultiples

Peak amplitude detection within each subblock segment

Threshold comparison
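The four steps can be sketched for a single 256-sample pass as shown below. This is a simplified illustration, not the detector of any particular encoder. The first-difference high-pass filter, the threshold ratios, the silence floor, and the restriction to two segmentation levels are all assumptions chosen for the example.

```python
import math


def first_difference(samples):
    """Step 1 (stand-in): crude high-pass filter, y[n] = x[n] - x[n-1]."""
    out = []
    prev = 0.0
    for s in samples:
        out.append(s - prev)
        prev = s
    return out


def segment_peaks(samples, level):
    """Steps 2 and 3: split into 2**level equal segments, take each peak."""
    seg_len = len(samples) >> level
    return [
        max(abs(s) for s in samples[k * seg_len:(k + 1) * seg_len])
        for k in range(1 << level)
    ]


def detect_transient(samples, ratios=(0.1, 0.075), floor=100.0 / 32768.0):
    """Return True if any segment peak jumps sharply above its predecessor."""
    filtered = first_difference(samples)
    if max(abs(s) for s in filtered) < floor:      # too quiet to matter
        return False
    for level, ratio in enumerate(ratios, start=1):
        peaks = segment_peaks(filtered, level)
        for k in range(1, len(peaks)):
            # Step 4: threshold comparison between adjacent segments.
            if peaks[k] * ratio > peaks[k - 1]:
                return True
    return False


tone = [0.5 * math.sin(2 * math.pi * 1000 * n / 48000) for n in range(256)]
onset = [0.0] * 128 + tone[:128]
print(detect_transient(tone), detect_transient(onset))
```

A steady tone produces similar peaks in every segment and is left in long-block mode. A signal that starts abruptly halfway through the pass trips the comparison and would select the short block.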

7.3.4i AC-3/MPEG Bit Stream

The AC-3 elementary bit stream is included in an MPEG-2 multiplex bit stream in much the
same way an MPEG-1 audio stream would be included, with the AC-3 bit stream packetized into
PES packets [7]. An MPEG-2 multiplex bit stream containing AC-3 elementary streams must
meet all audio constraints described in the MPEG model. It is necessary to unambiguously indicate
that an AC-3 stream is, in fact, an AC-3 stream, and not an MPEG audio stream. The
MPEG-2 standard does not explicitly state codes to be used to indicate an AC-3 stream. Also, the
MPEG-2 standard does not have an audio descriptor adequate to describe the contents of the AC-3
bit stream in its internal tables. The solution to this problem is beyond the scope of this chapter;
interested readers should consult [7] for additional information on the subject.


Figure 7.3.8 Generalized flow diagram of the AC-3 encoding process. (From [7]. Used with permission.)


7.3.4j Decoding the AC-3 Bit Stream

An overview of AC-3 decoding is diagrammed in Figure 7.3.9, where the decoding process flow
is shown as a sequence of blocks down the center of the illustration, and some of the key information
flow is indicated by arrowed lines at the sides [3]. This decoder should be considered
only as an example; other methods certainly exist to implement decoders, and those other methods
may have advantages in certain areas (such as instruction count, memory requirements, number
of transforms required, and other parameters). The input bit stream typically will come from
a transmission or storage system. The interface between the source of AC-3 data and the AC-3
decoder is not specified in the ATSC DTV standard.

Continuous or Burst Input

The encoded AC-3 data may be input to the decoder as a continuous data stream at the nominal
bit rate, or chunks of data may be burst into the decoder at a high rate with a low duty cycle [3].
For burst-mode operation, either the data source or the decoder may be the master controlling the
burst timing. The AC-3 decoder input buffer may be smaller if the decoder can request bursts of
data on an as-needed basis, but the external buffer memory may need to be larger.

Most applications of the standard will convey the elementary AC-3 bit stream with byte or
(16-bit) word alignment. The sync frame is always an integral number of words in length. The
decoder may receive data as a continuous serial stream of bits without any alignment, or the data
may be input to the decoder with either byte or word alignment. Byte or word alignment of the
input data may allow some simplification of the decoder. Alignment does reduce the probability
of false detection of the sync word.
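To illustrate why alignment helps, the sketch below searches a buffer for candidate sync words at byte or word granularity. The sync word value 0x0B77 is the one defined by the AC-3 standard; it is not stated in the text above. The function name is hypothetical.

```python
SYNC_WORD = b"\x0b\x77"  # 16-bit AC-3 sync word


def find_sync_candidates(buffer: bytes, alignment: int = 1):
    """Return offsets (in bytes) where the sync word pattern appears.

    alignment=1 tests every byte offset; alignment=2 tests only
    16-bit word boundaries, so fewer positions can match by accident.
    """
    return [
        offset
        for offset in range(0, len(buffer) - 1, alignment)
        if buffer[offset:offset + 2] == SYNC_WORD
    ]


data = b"\x00\x0b\x77\x00\x0b\x77"
print(find_sync_candidates(data, alignment=1))  # every byte offset tested
print(find_sync_candidates(data, alignment=2))  # word boundaries only
```

In this example the pattern at byte offset 1 is a chance match that straddles a word boundary. The word-aligned search never tests that position, so it reports only offset 4.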

Synchronization and Error Detection

The AC-3 bit stream format allows for rapid synchronization [3]. The 16-bit sync word has a low
probability of false detection. With no input stream alignment, the probability of false detection
of the sync word is 0.0015 percent per input stream bit position. For a bit rate of 384 kbits/s, the
probability of false sync word detection is 19 percent per frame. Byte alignment of the input
stream drops this probability to 2.5 percent, and word alignment drops it to 1.2 percent.
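These figures can be approximated with a simple estimate: a random 16-bit pattern matches the sync word with probability 2^-16 (about 0.0015 percent), multiplied by the number of positions examined per frame. The sketch assumes a 48 kHz sample rate, so that a 384 kbits/s frame holds 12288 bits, and treats the stream as random data. Both are assumptions, and the function name is hypothetical.

```python
SYNC_WORD_BITS = 16


def false_sync_per_frame(frame_bits: int, alignment_bits: int = 1) -> float:
    """Approximate chance of a spurious sync word match within one frame."""
    positions_tested = frame_bits // alignment_bits
    return positions_tested / 2 ** SYNC_WORD_BITS


frame_bits = 384000 * 1536 // 48000   # 12288 bits per frame (assumed 48 kHz)

for name, step in (("bit", 1), ("byte", 8), ("word", 16)):
    pct = 100.0 * false_sync_per_frame(frame_bits, step)
    print(f"{name}-aligned: {pct:.2f} percent")
```

The estimate gives roughly 19, 2.3, and 1.2 percent, close to the rounded values quoted above.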

When a sync pattern is detected, the decoder may tentatively assume that it is in sync, and one of the
CRC words (CRC1 or CRC2) may be checked. Because CRC1 comes first and covers the first
five-eighths of the frame, the result of a CRC1 check may be available after only five-eighths of
the frame has been received. Alternatively, the entire frame can be received and CRC2 checked. If
either CRC word checks, the decoder may safely be presumed to be in sync, and decoding and
reproduction of audio may proceed. The chance of false sync in this case is the product
of the probabilities of a false sync word detection and of a CRC failing to detect an error. The
probability that a CRC check misses an error is 0.0015 percent. Multiplied by the probability of a
false sync detection in a byte-aligned input bit stream, this yields a probability of false synchronization
of 0.000035 percent (or about once in 3 million synchronization attempts).
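The combined figure can be reproduced with a short calculation. The 1536-byte frame is an assumption (384 kbits/s at a 48 kHz sample rate), and both events are treated as independent with probability 2^-16 per trial; the function name is hypothetical.

```python
def false_synchronization_probability(frame_bytes: int) -> float:
    """Chance that a spurious sync word also passes a 16-bit CRC check."""
    p_false_sync_word = frame_bytes / 2 ** 16   # byte-aligned search
    p_crc_miss = 1 / 2 ** 16                    # CRC fails to flag an error
    return p_false_sync_word * p_crc_miss


p = false_synchronization_probability(1536)
print(f"{100 * p:.7f} percent, about once in {1 / p:,.0f} attempts")
```

The result is on the order of a few times 10^-5 percent, or roughly one false synchronization in 3 million attempts, in line with the value quoted above.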

If this small probability of false sync is too large for a specific application, several methods
may be used to reduce it. The decoder may only presume correct sync in the case that both CRC
words check properly. The decoder also may require multiple sync words to be received with the
proper alignment. If the data transmission or storage system is aware that data is in error, this
information may be made known to the decoder.


Figure 7.3.9 Generalized flow diagram of the AC-3 decoding process. (From [3]. Used with permission.)
