University: T. Zhurgenov Kazakh National Academy of Arts
Category: Book
Discipline: Not specified
Added: 03.02.2019
DTV Audio Encoding and Decoding 7-71
default, which is full compression. The listener adjusts dialogue level to 60 dB SPL. The explo-
sions will go only as loud as 70 dB (they are 25 dB louder than dialogue but receive –15 dB
applied gain), and the quiet sounds will reproduce at 30 dB SPL (20 dB of gain is applied to their
original level of 50 dB below dialogue level). The reproduced dynamic range, therefore, will be
70 dB – 30 dB = 40 dB.
The listening situation changes, and the listener now wishes to raise the reproduction level of
dialogue to 70 dB SPL, but still wishes to limit the loudness of the program. Quiet sounds may
be allowed to play as quietly as before. The listener instructs the AC-3 decoder to continue to use
the DynRng values that indicate gain reduction, but to attenuate the values that indicate gain
increases by a factor of 1/2. The explosions will still reproduce 10 dB above dialogue level,
which now places them at 80 dB SPL. The quiet sounds are now increased in level by only
20 dB/2 = 10 dB, so they are reproduced 40 dB below dialogue level, at 30 dB SPL. The
reproduced dynamic range is now 80 dB – 30 dB = 50 dB.
Another listener prefers the full original dynamic range of the audio. This listener adjusts the
reproduced dialogue level to 75 dB SPL and instructs the AC-3 decoder to ignore the dynamic
range control signal. For this listener, the quiet sounds reproduce at 25 dB SPL, and the explo-
sions hit 100 dB SPL. The reproduced dynamic range is 100 dB – 25 dB = 75 dB. This reproduc-
tion is exactly as intended by the original program producer.
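The arithmetic of the three listening scenarios above can be checked with a short sketch. The function names and the idea of separate scale factors for gain cuts and gain boosts are illustrative assumptions, not part of the AC-3 syntax; the program levels (explosions 25 dB above dialogue with –15 dB of DynRng gain, quiet sounds 50 dB below dialogue with +20 dB of gain) are taken from the example in the text.

```python
# Illustrative only: names and the cut/boost scale parameters are assumptions.
# Offsets are relative to dialogue level; DynRng gains are in dB.
EXPLOSION = (25.0, -15.0)   # (offset re dialogue, DynRng gain), from the text
QUIET = (-50.0, 20.0)       # (offset re dialogue, DynRng gain), from the text


def reproduced_level(offset_db, dynrng_db, dialogue_spl,
                     cut_scale=1.0, boost_scale=1.0):
    """SPL of a sound after the listener-scaled DynRng gain is applied."""
    # Gain increases and gain reductions may be scaled independently.
    scale = boost_scale if dynrng_db > 0 else cut_scale
    return dialogue_spl + offset_db + dynrng_db * scale


def dynamic_range(dialogue_spl, cut_scale, boost_scale):
    """Return (loudest SPL, quietest SPL, reproduced dynamic range)."""
    loud = reproduced_level(*EXPLOSION, dialogue_spl, cut_scale, boost_scale)
    quiet = reproduced_level(*QUIET, dialogue_spl, cut_scale, boost_scale)
    return loud, quiet, loud - quiet


if __name__ == "__main__":
    print(dynamic_range(60.0, 1.0, 1.0))   # default: full compression
    print(dynamic_range(70.0, 1.0, 0.5))   # boosts halved, cuts kept
    print(dynamic_range(75.0, 0.0, 0.0))   # DynRng ignored
```

Scaling is applied to the gain in decibels, which matches the "20 dB/2 = 10 dB" step in the text.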
For this dynamic range control method to be effective, it must be used by all program provid-
ers. Because all broadcasters wish to supply programming in the form that is most usable by their
audiences, nearly all will apply dynamic range compression to any audio program that has a wide
dynamic range. This compression is not reversible unless it is implemented by the technique
embedded in AC-3. If broadcasters make use of the embedded AC-3 dynamic range control sys-
tem, listeners can have significant control over the reproduced dynamic range at their receivers.
Broadcasters must be confident that the compression characteristic that they introduce into AC-3
will, by default, be heard by the listeners. Therefore, the AC-3 decoder must, by default, imple-
ment the compression characteristic indicated by the DynRng values in the data stream. AC-3
decoders may optionally allow listener control over the use of the DynRng values, so that the lis-
tener may select full or partial dynamic range reproduction.
7.3.5c
Heavy Compression
The compression (COMPR) element allows the program provider (or broadcaster) to implement
a large dynamic range reduction (heavy compression) in a way that ensures that a monophonic
downmix will not exceed a certain peak level [3]. The heavily compressed audio program may be
desirable for certain listening situations, such as movie delivery to a hotel room or to an airline
seat. The peak level limitation is useful when, for example, a monophonic downmix will feed an
RF modulator, and overmodulation must be avoided.
Some products that decode the AC-3 bit stream will need to deliver the resulting audio via a
link with very restricted dynamic range. One example is the case of a television signal decoder
that must modulate the received picture and sound onto an RF channel to deliver a signal usable
by a low-cost television receiver. In this situation, it is necessary to restrict the maximum peak
output level to a known value—with respect to dialogue level—to prevent overmodulation. Most
of the time, the dynamic range control signal, DynRng, will produce adequate gain reduction so
that the absolute peak level will be constrained. However, because the dynamic range control
system is intended to implement a subjectively pleasing reduction in the range of perceived loudness,
there is no assurance that it will control instantaneous signal peaks adequately to prevent
overmodulation.
Downloaded from Digital Engineering Library @ McGraw-Hill (www.digitalengineeringlibrary.com)
Copyright © 2004 The McGraw-Hill Companies. All rights reserved.
Any use is subject to the Terms of Use as given at the website.
To allow the decoded AC-3 signal to be constrained in peak level, a second control signal,
COMPR (COMPR2 for channel 2 in 1+1 mode), may be included in the AC-3 data stream. This
control signal should be present in all bit streams that are intended to be received by, for
example, a television set-top decoder. The COMPR control signal is similar to the DynRng control
signal in that it is used by the decoder to alter the reproduced audio level. The COMPR control
signal has twice the control range of DynRng (±48 dB compared with ±24 dB) with half the
resolution (0.5 dB vs. 0.25 dB steps).
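The trade-off between range and resolution can be illustrated with a simplified sketch. It treats each control signal as a uniform decibel grid with the nominal limits and step sizes quoted above; this is an assumption made for illustration and does not reproduce the actual bit-field encoding of the control words. All function names are hypothetical.

```python
# Simplified model: uniform dB grids with the nominal figures from the text.
# This is NOT the AC-3 bit-stream encoding of DynRng or COMPR.

def quantize_gain(gain_db, limit_db, step_db):
    """Clamp a requested gain to +/-limit_db and snap it to the nearest step."""
    clamped = max(-limit_db, min(limit_db, gain_db))
    return round(clamped / step_db) * step_db


def dynrng_gain(gain_db):
    """Nominal DynRng grid: +/-24 dB range, 0.25 dB steps."""
    return quantize_gain(gain_db, 24.0, 0.25)


def compr_gain(gain_db):
    """Nominal COMPR grid: +/-48 dB range, 0.5 dB steps."""
    return quantize_gain(gain_db, 48.0, 0.5)


if __name__ == "__main__":
    # A 30 dB reduction exceeds the DynRng range but fits within COMPR.
    print(dynrng_gain(-30.0), compr_gain(-30.0))
    # A small gain change is represented more finely by DynRng.
    print(dynrng_gain(-10.3), compr_gain(-10.3))
```

In this model a peak that needs 30 dB of reduction can be handled only by COMPR, while small adjustments land closer to the requested value with DynRng.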
7.3.6
Audio System Features
The audio subsystem offers a host of services and features to meet varied applications and audi-
ences [4]. An AC-3 elementary stream contains the encoded representation of a single audio ser-
vice. Multiple audio services are provided by multiple elementary streams. Each elementary
stream is conveyed by the transport multiplex with a unique program ID (PID). A number of
audio service types may be coded (individually) into each elementary stream; each AC-3 ele-
mentary stream is tagged as to its service type. There are two types of main service and six types
of associated service. Each associated service may be tagged (in the AC-3 audio descriptor) as
being associated with one or more main audio services. Each AC-3 elementary stream also may
be tagged with a language code.
Associated services may contain complete program mixes or only a single program element.
Associated services that are complete mixes may be decoded and used “as is.” Associated ser-
vices that contain only a single program element are intended to be combined with the program
elements from a main audio service.
In general, a complete audio program (what is presented to the listener over the set of loud-
speakers) may consist of a main audio service, an associated audio service that is a complete
mix, or a main audio service combined with an associated audio service. The capability to simul-
taneously decode one main service and one associated service is required in order to form a com-
plete audio program in certain service combinations. This capability may not exist in some
receivers.
7.3.6a
Complete Main Audio Service (CM)
The CM type of main audio service contains a complete audio program (complete with dialogue,
music, and effects) [4]. This is the type of audio service normally provided. The CM service may
contain from 1 to 5.1 audio channels, and it may be further enhanced by means of the VI, HI, C,
E, or VO associated services described in the following sections. Audio in multiple languages
may be provided by supplying multiple CM services, each in a different language.
7.3.6b
Main Audio Service, Music and Effects (ME)
The ME type of main audio service contains the music and effects of an audio program, but not
the dialogue for the program [4]. The ME service may contain from 1 to 5.1 audio channels. The
primary program dialogue is missing and (if any exists) is supplied by simultaneously encoding a
D associated service. Multiple D associated services in different languages may be associated
with a single ME service.
7.3.6c
Visually Impaired (VI)
The VI associated service typically contains a narrative description of the visual program content
[4]. In this case, the VI service is a single audio channel. The simultaneous reproduction of both
the VI associated service and the CM main audio service allows the visually impaired user to
enjoy the main multichannel audio program, as well as to follow (by ear) the on-screen activity.
The dynamic range control signal in this type of service is intended to be used by the audio
decoder to modify the level of the main audio program. Thus, the level of the main audio service
will be under the control of the VI service provider, and the provider may signal the decoder (by
altering the dynamic range control words embedded in the VI audio elementary stream) to
reduce the level of the main audio service by up to 24 dB to ensure that the narrative description
is intelligible.
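The level reduction described above can be sketched as a simple gain applied to every channel of the main service. The function names, the dictionary-of-sample-lists representation, and the clamping of out-of-range requests are assumptions made for illustration; only the 24 dB limit comes from the text.

```python
# Illustrative sketch: attenuate the main service while a narration is active.
MAX_REDUCTION_DB = 24.0  # maximum reduction quoted in the text


def duck_gain(reduction_db):
    """Linear gain for a requested reduction, clamped to 0..24 dB."""
    reduction_db = min(max(reduction_db, 0.0), MAX_REDUCTION_DB)
    return 10.0 ** (-reduction_db / 20.0)


def duck_main(main_channels, reduction_db):
    """Return a copy of the main service with every channel attenuated."""
    gain = duck_gain(reduction_db)
    return {name: [sample * gain for sample in samples]
            for name, samples in main_channels.items()}


if __name__ == "__main__":
    main = {"L": [1.0, -1.0], "C": [0.5, 0.5], "R": [1.0, -1.0]}
    print(duck_main(main, 20.0))  # every sample scaled by about 0.1
```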
Besides being provided as a single narrative channel, the VI service may be provided as a
complete program mix containing music, effects, dialogue, and the narration. In this case, the
service may be coded using any number of channels (up to 5.1), and the dynamic range control
signal would apply only to this service.
7.3.6d
Hearing Impaired (HI)
The HI associated service typically contains only dialogue that is intended to be reproduced
simultaneously with the CM service [4]. In this case, the HI service is a single audio channel.
This dialogue may have been processed for improved intelligibility by hearing-impaired users.
Simultaneous reproduction of both the CM and HI services allows the hearing-impaired users to
hear a mix of the CM and HI services in order to emphasize the dialogue while still providing
some music and effects.
Besides being available as a single dialogue channel, the HI service may be provided as a
complete program mix containing music, effects, and dialogue with enhanced intelligibility. In
this case, the service may be coded using any number of channels (up to 5.1).
7.3.6e
Dialogue (D)
The D associated service contains program dialogue intended for use with an ME main audio
service [4]. The language of the D service is indicated in the AC-3 bit stream and in the audio
descriptor. A complete audio program is formed by simultaneously decoding the D service and
the ME service, then mixing the D service into the center channel of the ME main service (with
which it is associated).
If the ME main audio service contains more than two audio channels, the D service is mono-
phonic (1/0 mode). If the main audio service contains two channels, the D service may also con-
tain two channels (2/0 mode). In this case, a complete audio program is formed by
simultaneously decoding the D service and the ME service, mixing the left channel of the ME
service with the left channel of the D service, and mixing the right channel of the ME service
with the right channel of the D service. The result will be a 2-channel stereo signal containing
music, effects, and dialogue.
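The two mixing cases described above can be sketched as follows. The channel labels ("L", "C", "R", and so on), the dictionary-of-sample-lists representation, and the function name are illustrative assumptions; combinations that the text does not describe are rejected rather than guessed.

```python
# Illustrative sketch of forming a complete program from ME + D services.

def mix_dialogue(me, d):
    """Mix a D service into an ME service.

    me, d: dicts mapping a channel label to a list of samples.
    A mono D service (label "C") is mixed into the ME center channel;
    a stereo D service is mixed left-to-left and right-to-right into
    a stereo ME service.
    """
    if set(d) == {"C"} and "C" in me:
        pairs = [("C", "C")]
    elif set(d) == {"L", "R"} and set(me) == {"L", "R"}:
        pairs = [("L", "L"), ("R", "R")]
    else:
        raise ValueError("service combination not covered by this sketch")

    out = {name: list(samples) for name, samples in me.items()}
    for me_ch, d_ch in pairs:
        out[me_ch] = [m + v for m, v in zip(me[me_ch], d[d_ch])]
    return out


if __name__ == "__main__":
    me_multichannel = {"L": [1.0, 1.0], "C": [0.0, 0.0], "R": [1.0, 1.0]}
    print(mix_dialogue(me_multichannel, {"C": [0.5, 0.25]}))
    me_stereo = {"L": [1.0], "R": [2.0]}
    print(mix_dialogue(me_stereo, {"L": [0.5], "R": [0.5]}))
```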
Audio in multiple languages may be provided by supplying multiple D services (each in a dif-
ferent language) along with a single ME service. This is more efficient than providing multiple
CM services, but, in the case of more than two audio channels in the ME service, requires that
dialogue be restricted to the center channel.
Some receivers may not have the capability to simultaneously decode an ME and a D service.
7.3.6f
Commentary (C)
The commentary associated service is similar to the D service, except that instead of conveying
essential program dialogue, the C service conveys optional program commentary [4]. The C ser-
vice may be a single audio channel containing only the commentary content. In this case, simul-
taneous reproduction of a C service and a CM service will allow the listener to hear the added
program commentary.
The dynamic range control signal in the single-channel C service is intended to be used by the
audio decoder to modify the level of the main audio program. Thus, the level of the main audio
service will be under the control of the C service provider; the provider may signal the decoder
(by altering the dynamic range control words embedded in the C audio elementary stream) to
reduce the level of the main audio service by up to 24 dB to ensure that the commentary is intel-
ligible.
Besides providing the C service as a single commentary channel, the C service may be pro-
vided as a complete program mix containing music, effects, dialogue, and the commentary. In
this case, the service may be provided using any number of channels (up to 5.1).
7.3.6g
Emergency (E)
The E associated service is intended to allow the insertion of emergency or high priority
announcements [4]. The E service is always a single audio channel. An E service is given priority
in transport and in audio decoding. Whenever the E service is present, it will be delivered to the
audio decoder. Whenever the audio decoder receives an E-type associated service, it will stop
reproducing any main service being received and reproduce only the E service out of the center
channel (or left and right channels if a center loudspeaker does not exist). The E service also may
be used for nonemergency applications. It may be used whenever the broadcaster wishes to force
all decoders to quit reproducing the main audio program and reproduce a higher priority single
audio channel.
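The priority rule for the E service can be sketched as a small selection function. The function name, the use of a set of service-type codes, and the returned tuple are assumptions made for illustration; the routing to the center channel, or to left and right when no center loudspeaker exists, follows the description above.

```python
# Illustrative sketch of the E-service override in a decoder.

def choose_reproduction(services, has_center_speaker):
    """Pick what to reproduce from the set of received service types.

    Returns (service, output_channels). output_channels is None when the
    main service is reproduced over its own channel layout.
    """
    if "E" in services:
        # The E service replaces any main service being received.
        route = ("C",) if has_center_speaker else ("L", "R")
        return "E", route
    for main in ("CM", "ME"):
        if main in services:
            return main, None
    return None, None


if __name__ == "__main__":
    print(choose_reproduction({"CM", "E"}, has_center_speaker=True))
    print(choose_reproduction({"CM", "E"}, has_center_speaker=False))
    print(choose_reproduction({"CM"}, has_center_speaker=True))
```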
7.3.6h
Voice-Over (VO)
The VO associated service is a single-channel service intended to be reproduced along with the
main audio service in the receiver [4]. It allows typical voice-overs to be added to an already
encoded audio elementary stream without requiring the audio to be decoded back to baseband
and then reencoded. The VO service is always a single audio channel and has second priority;
only the E service has higher priority. It is intended to be simultaneously decoded and mixed into
the center channel of the main audio service. The dynamic range control signal in the VO service
is intended to be used by the audio decoder to modify the level of the main audio program. Thus,
the level of the main audio service may be controlled by the broadcaster, and the broadcaster may
signal the decoder (by altering the dynamic range control words embedded in the VO audio elementary
stream) to reduce the level of the main audio service by up to 24 dB during the voice-over.
Some receivers may not have the capability to simultaneously decode and reproduce a voice-
over service along with a program audio service.
7.3.6i
Multilingual Services
Each audio bit stream may be in any language [4]. Table 7.3.2 lists the language codes for the
ATSC DTV system. To provide audio services in multiple languages, a number of main audio
services may be provided, each in a different language. This is the (artistically) preferred method,
because it allows unrestricted placement of dialogue along with the dialogue reverberation. The
disadvantage of this method is that as much as 384 kbits/s is needed to provide a full 5.1-channel
service for each language. One way to reduce the required bit rate is to reduce the number of
audio channels provided for languages with a limited audience. For instance, alternate language
versions could be provided in 2-channel stereo with a bit rate of 128 kbits/s. Or, a mono version
could be supplied at a bit rate of approximately 64 to 96 kbits/s.
Another way to offer service in multiple languages is to provide a main multichannel audio
service (ME) that does not contain dialogue. Multiple single-channel dialogue associated ser-
vices (D) can then be provided, each at a bit rate in the range of 64 to 96 kbits/s. Formation of a
complete audio program requires that the appropriate language D service be simultaneously
decoded and mixed into the ME service. This method allows a large number of languages to be
efficiently provided, but at the expense of artistic limitations. The single channel of dialogue
would be mixed into the center reproduction channel, and could not be panned. Also, reverbera-
tion would be confined to the center channel, which is not optimum. Nevertheless, for some
types of programming (sports and news, for example), this method is very attractive because of
the savings in bit rate that it offers. Some receivers may not have the capability to simultaneously
decode an ME and a D service.
Stereo (2-channel) service without artistic limitation can be provided in multiple languages
with added efficiency by transmitting a stereo ME main service along with stereo D services.
The ME service and the appropriate-language D service are combined in the receiver into a complete stereo
program. Dialogue may be panned, and reverberation may be included in both channels. A stereo
ME service can be sent with high quality at 192 kbits/s, and the stereo D services (voice only)
can make use of lower bit rates, such as 128 or 96 kbits/s per language. Some receivers may not
have the capability to simultaneously decode an ME and a D service.
Note that during those times when dialogue is not present, the D services can be momentarily
removed, and the data capacity can be used for other purposes. Table 7.3.3 lists the typical bit
rates for various types of service.
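The bit-rate trade-off between the multilingual methods described above can be worked through with a short sketch. The function names and the choice of four languages are assumptions made for illustration; the per-service rates (384 kbits/s for 5.1 channels, 192 kbits/s for a stereo ME service, 64 to 128 kbits/s for dialogue services) are the example figures quoted in the text.

```python
# Illustrative bit-rate budget for multilingual audio, in kbits/s.

def total_separate_mains(n_languages, main_rate=384):
    """One complete main (CM) service per language."""
    return n_languages * main_rate


def total_me_plus_d(n_languages, me_rate=384, d_rate=96):
    """One shared ME service plus one D service per language."""
    return me_rate + n_languages * d_rate


if __name__ == "__main__":
    n = 4  # hypothetical number of languages
    print("4 x CM 5.1:            ", total_separate_mains(n))
    print("ME 5.1 + 4 mono D @ 64:", total_me_plus_d(n, 384, 64))
    print("ME 5.1 + 4 mono D @ 96:", total_me_plus_d(n, 384, 96))
    print("ME 2.0 + 4 stereo D @ 96:", total_me_plus_d(n, 192, 96))
```

With four languages, separate 5.1-channel main services need 1536 kbits/s, while a shared 5.1-channel ME service with mono D services needs 640 to 768 kbits/s.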
7.3.6j
Channel Assignments and Levels
To facilitate the reliable exchange of programs, the SMPTE produced a standard for channel
assignments and levels on multichannel audio media. The standard, SMPTE 320M, provides
specifications for the placement of a 5.1 channel audio program onto multitrack audio media [8].
As specified in ITU-R BS.775-1, the internationally recognized multichannel sound system con-
sists of left, center, right, left surround, right surround, and low-frequency effects (LFE) chan-
nels. SMPTE RP 173 specifies the locations and relative level calibration of the loudspeakers