Pan Feng is currently a research fellow in the Center for Signal
Processing, Nanyang Technological University, Singapore. His
experience includes 15 years of teaching and research in digital
image processing and video engineering. He has offered numerous
training courses for industry in these areas.
Digital television (DTV) is a new type of
broadcasting technology that will globally transform television as
we now know it. DTV refers to the complete digitization of the TV
signal from transmission to reception. By transmitting TV pictures
and sounds as data bits and compressing them, a digital broadcaster
can carry more information than is currently possible with analog
broadcast technology. This will allow for the transmission of
pictures with HDTV resolution for dramatically better picture and
sound quality than is currently available, or of several SDTV
programs concurrently. The DTV technology can also provide
high-speed data transmission, including fast Internet access.
This article is an introduction to the principles behind digital
television broadcasting, including audio/video coding and
multiplexing, data scrambling and conditional access, channel
coding, and digital modulations. The article also compares the
three major DTV standards: ATSC-T, DVB-T, and ISDB-T.
DTV System Diagram
Figure 1: Diagram of the DTTB system
The block diagram of a digital-terrestrial-television
broadcasting system (DTTB) is shown in Figure 1. The video,
audio and other service data are compressed and multiplexed to form
elementary streams. These streams may be multiplexed again with the
source data from other programs to form the MPEG-2 Transport Stream
(TS). A transport stream consists of Transport Packets that are 188
bytes in length.
The FEC encoder takes preventive measures to protect the
transport streams from errors caused by noise and interference in
the transmission channel. It includes Reed-Solomon coding, outer
interleaving, and convolutional coding. The modulator then converts
the FEC protected transport packets into digital symbols that are
suitable for transmission in the terrestrial channels. This
involves QAM and OFDM in DVB-T and ISDB-T systems, or PAM and VSB
in ATSC-T. The final stage is the up-converter, which converts
the modulated digital signal into the appropriate RF channel. The
receiver performs the same sequence of operations in reverse
order.
MPEG-2 Video Compression
Data compression technology makes digital television
broadcasting possible with a smaller frequency bandwidth than that
of an analog system. Among the many compression techniques, MPEG is
one of the most accepted for all sorts of new products and
services, from DVDs and video cameras to digital television
broadcasting. The MPEG-2 standard supports standard-definition
television (SDTV) and high-definition television (HDTV) video
formats for broadcast applications.
MPEG video compression exploits certain characteristics of video
signals, namely, redundancy of information both inside a frame
(spatial redundancy), and in-between frames (temporal redundancy).
The compression also removes psychovisual redundancy by exploiting
the characteristics of the human visual system (HVS): the HVS
is less sensitive to errors in detailed texture areas and in
fast-moving images. MPEG video compression also uses entropy coding
to increase data-packing efficiency.
Figure 2: DCT-based intraframe coding
The intraframe coding algorithm (Figure 2) begins by
calculating the DCT coefficients over small non-overlapping image
blocks (usually 8x8 in size). This block-by-block processing takes
advantage of the image's local spatial correlation properties. The
DCT process produces many 2D blocks of transform coefficients that
are quantized to discard some of the trivial coefficients that are
likely to be perceptually masked. The quantized coefficients are
then zigzag scanned so that the low-frequency coefficients come
first and the zero-valued high-frequency coefficients are grouped
into long runs. The final step in this process uses variable-length
coding to further reduce the bit rate.
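The intraframe steps described above can be sketched in a few lines of Python. This is only an illustration: the uniform quantizer step of 16 is an assumed value rather than an MPEG-2 quantization matrix, the function names are hypothetical, and the variable-length coding stage is left out.

```python
import math

N = 8  # block size used in the text

def dct_2d(block):
    """Straightforward (slow) 2-D DCT-II of an N x N block."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, step=16):
    """Uniform quantizer; `step` is an illustrative value, not an MPEG table."""
    return [[int(round(v / step)) for v in row] for row in coeffs]

def zigzag(block):
    """Read the block along anti-diagonals, alternating direction each time."""
    order = sorted(((r, c) for r in range(N) for c in range(N)),
                   key=lambda rc: (rc[0] + rc[1],
                                   rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [block[r][c] for r, c in order]

# A perfectly flat block has energy only in the DC coefficient.
flat = [[128] * N for _ in range(N)]
scanned = zigzag(quantize(dct_2d(flat)))
```

For the flat block, only the first scanned value is non-zero, which is the kind of output that run-length and variable-length coding compress well.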
Figure 3: Motion-compensated interframe coding
Interframe coding (Figure 3), on the other hand, exploits
temporal redundancy by predicting the frame to be coded from a
previous reference frame. The motion estimator searches previously
coded frames for areas similar to those in the macroblocks of the
current frame. This search results in motion vectors (represented
by x and y components in pixel lengths), which the decoder uses to
form a motion-compensated prediction of the video. The
motion-estimator circuitry is typically the most computationally
intensive element in an MPEG encoder (Figure 4).
Motion-compensated interframe coding, therefore, only needs to
convey to the decoder the motion vectors required to predict each
block, together with the (usually small) prediction error, instead
of the original macroblock data, which results in a significant
reduction in bit rate.
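As a rough illustration of what the motion estimator does, the sketch below performs an exhaustive block-matching search using the sum of absolute differences (SAD) as the similarity measure. The block size, search range, and function names are assumptions chosen for the example; real encoders use 16x16 macroblocks and much faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, size=4, search=2):
    """Return (dx, dy, cost) for the displacement with the lowest SAD within +/- search pixels."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= h
                      and 0 <= bx + dx and bx + dx + size <= w)
            if not inside:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]

# Synthetic frames: the current frame is the reference shifted right by one pixel.
ref = [[x ** 3 + 17 * y * y for x in range(12)] for y in range(12)]
cur = [[ref[y][max(x - 1, 0)] for x in range(12)] for y in range(12)]
vector = full_search(cur, ref, bx=4, by=4)
```

Here the search finds the block one pixel to the left in the reference frame with zero residual, so only the vector would need to be transmitted for that block. The nested loops also show why motion estimation dominates encoder complexity.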
Figure 4: Block diagram of an MPEG-2 video
compression system
DTV Audio Compression
Unlike video, the three current DTV
standards use three different audio coding schemes: Dolby AC-3 for
ATSC, MPEG audio and Dolby AC-3 for DVB, and MPEG-AAC for ISDB.
However, these audio standards all use a similar technique called
perceptual coding and support up to six channels (right, left,
center, right surround, left surround, and subwoofer), often
designated as 5.1 channels. A perceptual audio coder exploits a
psycho-acoustic effect known as masking (Figure 5). This
psycho-acoustic phenomenon states that when sound is broken into
its constituent frequencies, those sounds with relatively lower
energy adjacent to others with significantly higher energy are
masked by the latter and are not audible.
Figure 5: Audio perceptual masking
AC-3 is one of the most popular audio compression algorithms
used in DTV, movie theater, and home theater systems. AC-3 makes
use of the psycho-acoustic phenomenon to achieve great data
compression. In the encoding process (Figure 6), a modified
DCT algorithm transforms the audio signal into the frequency
domain, which generates a series of frequency coefficients that
represent the relative energy contributions to the signal of those
frequencies.
Figure 6: Dolby AC-3 audio coding block diagram
By analyzing the incoming signal in the frequency domain,
psycho-acoustically masked frequencies are given fewer (or zero)
bits to represent their frequency coefficients; dominant
frequencies are given more bits. Hence, besides the coefficients
themselves, the decoder must receive the information that describes
how the bits are allocated so that it may reconstruct the bit
allocation. In AC-3, all of the encoded channels draw from the same
pool of bits, so channels that need better resolution can use the
most bits.
The output coefficients generated by the time-domain to
frequency-domain transformation are typically represented in a
block floating-point format to maintain numeric fidelity. Using the
block floating-point format is one way to extend the dynamic range
in a fixed-point processor. It is done by examining a block of
(frequency) samples and determining an appropriate exponent that
can be associated with the entire block. Once the mantissas and
exponents are determined, the mantissas are represented using the
variable bit-allocation scheme described above; the exponents are
DPCM coded and represented with a fixed number of bits (Figure
6).
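The block floating-point idea can be illustrated with a short sketch. It assumes samples normalized to the range [-1, 1) and an 8-bit mantissa; both are assumptions for the example, and the DPCM coding of exponents and the variable bit allocation used by AC-3 are not modeled.

```python
def block_float_encode(samples, mantissa_bits=8, max_exponent=24):
    """Encode a block of values in [-1, 1) with one exponent shared by the whole block."""
    peak = max(abs(s) for s in samples)
    exponent = 0
    # Shift the block up until its largest magnitude lies in [0.5, 1.0).
    while 0 < peak < 0.5 and exponent < max_exponent:
        peak *= 2
        exponent += 1
    scale = (1 << exponent) * (1 << (mantissa_bits - 1))
    limit = (1 << (mantissa_bits - 1)) - 1
    # Clamp so that rounding can never overflow the signed mantissa range.
    mantissas = [max(-limit, min(limit, round(s * scale))) for s in samples]
    return exponent, mantissas

def block_float_decode(exponent, mantissas, mantissa_bits=8):
    """Rebuild approximate sample values from the shared exponent and the mantissas."""
    scale = (1 << exponent) * (1 << (mantissa_bits - 1))
    return [m / scale for m in mantissas]

samples = [0.01, -0.02, 0.005, 0.03]
exponent, mantissas = block_float_encode(samples)
restored = block_float_decode(exponent, mantissas)
```

Because the whole block is small, the shared exponent lets the 8-bit mantissas resolve differences that plain 8-bit fixed-point values would round away.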
MPEG audio uses forward adaptive bit allocation, while
AC-3 uses hybrid adaptive bit allocation, which combines
forward and backward adaptive bit allocation. The main advantage of
MPEG audio is that the psycho-acoustic model resides only in the
encoder. When the encoder is upgraded, legacy decoders continue to
decode newly coded data. However, the disadvantage is that it could
have a heavy overhead for complicated music pieces.
MPEG-2 Transport Stream and Multiplex
Audio and video encoders deliver elementary stream outputs.
These bit streams, as well as other streams carrying other private
data, are combined in an organized manner and supplemented with
additional information to allow their separation by the decoder,
synchronization of picture and sound, and selection by the user of
the particular components of interest. This is done through
packetization specified in MPEG-2 systems layer. The elementary
stream is cut into packets to form a packetized elementary stream
(PES). A PES packet starts with a header, followed by the content
of the packet (payload) and the descriptor. Packetization provides
the protection and flexibility needed to transmit multimedia
streams across different networks. In general, a PES packet
contains data from only one elementary stream.
Elementary, Packetized Elementary, and Transport Streams
In broadcasting applications, a multiplex usually contains different
data streams (audio and video) that might even come from different
programs. Therefore, it is necessary to multiplex them into a
single stream, the transport stream. Figure 7a shows the
process of multiplexing. A transport stream consists of
fixed-length transport packets, each exactly 188 bytes long. The
packet header contains important information such as the
synchronization byte and the packet identifier (PID). The PID
identifies a particular PES within the multiplex.
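A minimal sketch of reading the header fields mentioned above follows. The field layout (sync byte 0x47, 13-bit PID spread over the second and third bytes) comes from the MPEG-2 systems specification rather than from this article, and the function name is hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47  # fixed value of the synchronization byte

def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte header of a single transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("lost synchronization: first byte is not 0x47")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],   # 13-bit packet identifier
        "scrambling_control": (packet[3] >> 6) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }

# Example packet with PID 256 and continuity counter 10 (payload bytes are dummies).
example = bytes([0x47, 0x01, 0x00, 0x1A]) + bytes(184)
header = parse_ts_header(example)
```

A demultiplexer would apply this to every packet and route the payload according to the PID.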
Figure 7: (a) The process of multiplexing.
(b) The structure of a transport packet.
It is necessary to include additional program-specific
information (PSI) within each transport stream in order to identify
the relationship between the available programs and the PID of
their constituent streams. This PSI consists of four tables: the
program association table (PAT), program map table (PMT), network
information table (NIT), and conditional access table (CAT).
Within a transport stream, the reserved PID of 0 indicates a
transport packet that contains a PAT. The PAT associates a
particular PID value with each program that is currently carried in
the transport multiplex. This PID value identifies the PMT for that
particular program. The PMT contains details of the constituent
elementary streams for the program. Program 0 has a special meaning
within the PAT and identifies the PID of the transport packets that
contain the optional NIT. The contents of the NIT are private to
the broadcaster and are intended to contain network-specific
information. The CAT is identified by a PID of 1 and contains
information specific to any conditional access or scrambling
schemes that are in use.
Navigating an MPEG-2 Multiplex
MPEG-2 PSI tables only give information concerning the multiplex.
The DVB standard adds complementary tables (DVB-SI) to allow the
user to navigate the available programs and services by means of an
electronic program guide (EPG). DVB-SI has four basic tables and
three optional tables to serve this purpose. The decoder must
perform the following main steps in order to find a program or a
service in an MPEG-2 transport multiplex.
- As soon as the new channel is acquired (synchronized), the
decoder must filter the PID 0 packets to acquire the PAT sections
and construct the PAT to provide the available choice (services
currently available on the air) to the user.
- Once the user choice is made, the decoder must filter the PID
corresponding to the PMT of this program and construct the PMT from
the relevant sections. If there is more than one audio or video
stream, the user should be able to make another choice.
- The decoder must filter the PID corresponding to this
choice.
The audio/video decoding can now start. The part of this process
that is visible to users is the interactive presentation of the EPG
associated with the network, which can be built by means of the PSI
and DVB-SI tables in order to allow them to easily navigate the
available programs and services. Similar tables, Program and System
Information Protocol (PSIP) tables, are also available in the ATSC
system.
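The navigation steps listed above can be modeled with a toy example. The sketch assumes the PSI sections have already been filtered and parsed into dictionaries; every PID and program number below is made up for illustration, and the function names are hypothetical.

```python
PAT_PID = 0  # reserved PID that carries the PAT

# Hypothetical, already-parsed tables for one multiplex, keyed by PID.
tables = {
    0: {"type": "PAT", "programs": {0: 16, 1: 256, 2: 512}},  # program number -> PMT PID (program 0 -> NIT PID)
    256: {"type": "PMT", "streams": [("video", 257), ("audio", 258), ("audio", 259)]},
    512: {"type": "PMT", "streams": [("video", 513), ("audio", 514)]},
}

def list_programs(tables):
    """Step 1: read the PAT and offer the available programs to the user."""
    pat = tables[PAT_PID]
    return sorted(p for p in pat["programs"] if p != 0)  # program 0 points to the NIT

def select_pids(tables, program, audio_index=0):
    """Steps 2 and 3: read the chosen program's PMT and pick the PIDs to filter."""
    pmt_pid = tables[PAT_PID]["programs"][program]
    streams = tables[pmt_pid]["streams"]
    video = [pid for kind, pid in streams if kind == "video"]
    audio = [pid for kind, pid in streams if kind == "audio"]
    return {"pmt": pmt_pid, "video": video[0], "audio": audio[audio_index]}

choice = select_pids(tables, program=1, audio_index=1)
```

Once the PIDs are known, the decoder only has to pass packets with those PIDs to the audio and video decoders.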
Conditional Access in DTV
DTV services will either be pay-per-view or at least include
some elements that are not freely available to the public. DVB
defined a standard for a "Common Interface for Conditional Access
and other Digital Video Broadcasting Decoder Applications" to
enable an Integrated Receiver Decoder (IRD) to de-scramble programs
broadcast in parallel, using different conditional access (CA)
systems. By inserting a PCMCIA module into the common
interface, that IRD can address different CA systems in
sequence. MultiCrypt describes the simultaneous operation of
several CA systems. The MultiCrypt approach has the additional
advantage that it does not require agreements between networks, but
it is more expensive to implement. Other applications, such as
Ethernet connection or electronic commerce, may also utilize the
DVB-CI connector.
SimulCrypt is another way of providing the viewer with access to
programs. In this case, commercial negotiations between different
service providers have led to a contract that enables the viewer to
use the one specific CA system built into the IRD to watch all the
programs, irrespective of the fact that these programs were
scrambled under the control of different CA systems. At the moment,
DVB supports both MultiCrypt and SimulCrypt, while ATSC only
supports the latter.
Forward Error Correction
The transmission channels used for digital television
broadcasting are, unfortunately, rather error-prone because of
disturbances such as noise, interference, and echoes. However, a
digital TV signal, after almost all its redundancy is removed,
requires a very low bit error rate (BER) for good performance. A
BER of the order of 10^-10 corresponds to an average
interval of some 30 minutes between errors. Therefore it is
necessary to take preventive measures before modulation in order to
allow detection and, as far as possible, correction in the receiver
of most errors introduced by the physical transmission channel.
These measures are called, collectively, forward error correction
(FEC). FEC requires that redundant data be added to the original
data prior to transmission, allowing the receiver to use the
redundant data to detect and recover data corrupted by
channel disturbances.
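The relationship between BER and the time between errors is simple arithmetic, sketched below. The article does not state the bit rate behind the 30-minute figure; the value of 5.5 Mbit/s used here is an assumption chosen because it roughly reproduces that interval.

```python
def mean_seconds_between_errors(ber, bit_rate):
    """Average time between bit errors for a given bit error rate and bit rate (bit/s)."""
    return 1.0 / (ber * bit_rate)

ASSUMED_BIT_RATE = 5.5e6  # bit/s, an assumed program bit rate (not from the article)
interval_minutes = mean_seconds_between_errors(1e-10, ASSUMED_BIT_RATE) / 60
```

At higher bit rates the same BER gives proportionally shorter intervals between errors.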
Figure 8: Forward error correction coding
process
Figure 8 illustrates the successive steps of the forward
error correction encoding process used in digital television
broadcasting. Strictly speaking, energy dispersal is not part of
the error correction process. The main purpose of this step is to
avoid long strings of 0s or 1s in the transport stream, in order to
ensure the dispersal of energy in the channel. Broadcasting
standards often use the terms inner coding and outer coding. Inner
coding operates just before the transmitter modulates the signal
and just after the receiver demodulates the signal. Outer coding
applies to the extreme input and output ends of the transmission
chain. Inner coding is usually convolutional in nature, with
optimal performance under conditions of steady noise interference.
Outer coding is a Reed-Solomon code that is usually more effective
for correcting burst errors.
Reed-Solomon Coding
Outer coding is a Reed-Solomon code that is a subset of BCH cyclic
block codes. As its name implies, in block coding, a block of bits
is processed as a whole to generate the new coded block. A block
code has no memory: the coding of a data word does not
depend on the data that comes before or after it.
Reed-Solomon code, in combination with the Forney convolutional
interleaving that follows it, allows the correction of burst errors
introduced by the transmission channel. It is applied individually
to all the transport packets in Figure 7a, excluding the
synchronization bytes. R-S codes have been proved to
operate at the theoretical limit of correcting efficiency; no
more efficient code can be found. This is why they have been chosen
as the outer code for all DTV standards. An R-S code is characterized
by three parameters (n, k, t), where n
is the size of the block after coding, k is the size
of the block before coding, and t is the number of
correctable symbols. Whether the received codeword is error-free
can be checked with a division circuit corresponding to the
generator polynomial g(x). For a proper codeword, the
remainder is zero. If the remainder is non-zero, a
Euclidean algorithm is used to determine the two values needed for
error correction: the location of the error and the nature of the
error. However, if the size of the error exceeds half the amount of
redundancy added, the error cannot be corrected.
In the ATSC standard, we find the R-S(207,187,10) code. It adds
20 parity bytes and can correct up to 10 erroneous bytes per
packet. In the DVB and ISDB standards, we find the R-S(204,188,8)
code. It adds 16 parity bytes and can correct up to 8 erroneous
bytes per packet.
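The parameters quoted above follow from the rule that the number of correctable symbols is half the number of parity symbols. A small sketch makes the arithmetic explicit; the function name and the returned dictionary keys are hypothetical.

```python
def rs_summary(n, k):
    """Derive parity size, correction capability, and code rate from (n, k)."""
    parity = n - k
    return {
        "parity_bytes": parity,
        "correctable_bytes": parity // 2,  # t = (n - k) / 2
        "code_rate": k / n,
    }

atsc = rs_summary(207, 187)
dvb = rs_summary(204, 188)
```

The ATSC code spends slightly more of each block on parity than the DVB and ISDB code, in exchange for correcting two more bytes per packet.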
Interleaving
The purpose of data interleaving is to increase the efficiency of
the Reed-Solomon coding by spreading over a longer time the burst
errors introduced by the transmission channel, which could
otherwise exceed the correction capacity of the Reed-Solomon
coding. Interleaving is normally implemented by using a
two-dimensional array buffer: the data enters the buffer
in rows and is then read out in columns. As a result, a burst of
errors in the channel becomes, after deinterleaving, a few widely
spaced single-symbol errors, which are more easily correctable.
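The row-in, column-out buffer described above can be sketched as follows. This is the simple block interleaver from the paragraph, not the convolutional interleavers that the standards actually specify, and the 4 x 5 buffer size is an arbitrary choice for the example.

```python
def interleave(data, rows, cols):
    """Write the data into the buffer row by row, then read it out column by column."""
    assert len(data) == rows * cols
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(data, rows, cols):
    """Invert interleave(): write column by column, read row by row."""
    assert len(data) == rows * cols
    return [data[c * rows + r] for r in range(rows) for c in range(cols)]

data = list(range(20))
sent = interleave(data, rows=4, cols=5)

# Simulate a burst that wipes out four consecutive symbols in the channel.
received = list(sent)
for i in range(4, 8):
    received[i] = None

restored = deinterleave(received, rows=4, cols=5)
error_positions = [i for i, v in enumerate(restored) if v is None]
```

After deinterleaving, the four-symbol burst lands at positions 1, 6, 11, and 16, so each row of the buffer sees only a single error.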
The interleaver employed in the ATSC standard is a
52-data-segment (intersegment) convolutional byte interleaver.
Interleaving is provided to a depth of about 1/6 of a data field (4
ms deep). Only data bytes are interleaved. The interleaver is also
synchronized to the first data byte of the data field. Intrasegment
interleaving is also performed for the benefit of the trellis
coding process. DVB and ISDB use convolutional interleaving, and
the interleaving depth is 12.
Inner Code
The inner coding is a 2/3 trellis coding for ATSC, and
convolutional coding for DVB and ISDB. Inner coding is an efficient
complement to the Reed-Solomon coding and Forney interleaving as it
is designed to correct random errors.
ATSC Trellis Coding
The 8-VSB transmission system employs a 2/3 rate (R=2/3) trellis
code, with one unencoded bit that is precoded. In creating serial
bits from parallel bytes, the MSB is sent out first: (7, 6, 5, 4,
3, 2, 1, 0). The MSB is precoded (7, 5, 3, 1) and the LSB is
feedback convolutional encoded (6, 4, 2, 0). Standard four-state
optimal Ungerboeck codes are used for the encoding (Figure
9); also shown are the precoder and the symbol mapper.
Figure 9: 2/3 trellis coding and precoder
Trellis coding is used with multi-level signaling; in other
words, several multi-level symbols are associated into a group. The
waveform that results from a particular group of symbols is called
a trellis. If each symbol can have eight levels, then in three
symbols there can be 512 possible trellises. In trellis coding, the
data are coded such that only certain trellis waveforms represent
valid data. If only 64 of the trellises represent error-free data,
then two data bits per symbol can be sent instead of three. The
remaining bit is a form of redundancy because trellises other than
the correct 64 are due to errors. If a trellis is received in which
the level of one of the symbols is ambiguous due to noise, the
ambiguity can be resolved because the correct level is the one that
gives a valid trellis. This technique is known as
maximum-likelihood decoding. The 64 valid trellises should be made
as different from one another as possible so that the system
continues to work at a poorer signal-to-noise ratio. If the trellis
decoder makes an error, the outer code will correct it.
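The 64-out-of-512 example above can be turned into a toy maximum-likelihood decoder. The rule used here to pick the 64 valid groups (the three levels must sum to a multiple of 8) is purely hypothetical and is not the ATSC trellis code; it only serves to show how an ambiguous level is resolved by choosing the nearest valid group.

```python
from itertools import product

LEVELS = range(8)  # eight amplitude levels per symbol, as in the text

# Hypothetical validity rule: 64 of the 512 possible three-symbol groups are valid.
VALID = [g for g in product(LEVELS, repeat=3) if sum(g) % 8 == 0]

def ml_decode(received):
    """Return the valid group closest to `received` in squared Euclidean distance."""
    return min(VALID, key=lambda g: sum((a - b) ** 2 for a, b in zip(g, received)))

# The middle symbol arrives exactly halfway between levels 4 and 5.
decoded = ml_decode((3, 4.5, 1))
```

Level 5 would give the group (3, 5, 1), which is not valid under the assumed rule, so the decoder settles on (3, 4, 1).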
DVB Convolutional Coding and Puncturing
In DVB, convolutional coding is used, followed by code puncturing.
Typically, a rate-1/2 convolutional encoder consists of two FIR
filters. These two FIR filters convolve with the input bit stream,
which produces two outputs that represent different parity checks
on the input data so that bit errors can be corrected. Clearly,
there will be two output bits for every input bit; therefore the
code rate is 1/2. Any rate between 1/1 and 1/2 would still allow
the transmission of the original data, but the amount of redundancy
would vary. Deliberately omitting some of the rate-1/2 output bits
is called puncturing, and it allows any required balance between
bit rate and error-correcting capability. In DVB systems, as well
as in ISDB systems, the possible code rates are 1/2, 2/3, 3/4, 5/6,
and 7/8.
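A minimal sketch of a rate-1/2 encoder followed by puncturing is shown below. It uses the generator polynomials 171 and 133 (octal) with constraint length 7 listed in Table 1. The bit ordering inside the shift register and the puncturing patterns are assumptions made for the example and should be checked against the DVB-T specification before being relied on.

```python
G1 = 0o171  # generator polynomials from Table 1 (octal), constraint length 7
G2 = 0o133

def parity(x):
    """Modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1

def conv_encode(bits):
    """Rate-1/2 encoder: returns the two output streams X and Y."""
    reg = 0
    xs, ys = [], []
    for b in bits:
        reg = (b << 6) | (reg >> 1)  # newest bit enters at the top of a 7-bit register
        xs.append(parity(reg & G1))
        ys.append(parity(reg & G2))
    return xs, ys

# Assumed puncturing patterns (1 = transmit, 0 = drop), one list per output stream.
PATTERNS = {
    "1/2": ([1], [1]),
    "2/3": ([1, 0], [1, 1]),
    "3/4": ([1, 0, 1], [1, 1, 0]),
}

def puncture(xs, ys, rate):
    """Drop coded bits according to the pattern for the requested code rate."""
    px, py = PATTERNS[rate]
    out = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        if px[i % len(px)]:
            out.append(x)
        if py[i % len(py)]:
            out.append(y)
    return out

xs, ys = conv_encode([1, 0, 1, 1, 0, 0])
lengths = {rate: len(puncture(xs, ys, rate)) for rate in PATTERNS}
```

Six input bits produce 12, 9, and 8 transmitted bits at rates 1/2, 2/3, and 3/4 respectively, which shows how puncturing trades redundancy for bit rate.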
Digital Modulations in DTV
So far, we have not seen much difference among the three DTV
systems. The main differences lie in their modulation
schemes. This section briefly describes the principles
behind those modulation schemes.
ATSC 8-VSB System
The ATSC 8-VSB system was developed by the Advanced Television
Systems Committee in the U.S. The framing structure of the
transmitted signal is an important aspect of the ATSC standard. It
accommodates the transport stream requirements, as well as
mitigates channel inter-propagation effects such as multipath and
impulse noise.
The transport packet for ATSC consists of 188 bytes, including a
sync byte. At the transmitter, this is altered in two ways. First
the sync byte is stripped off, leaving 187 bytes to be transmitted.
Then 20 bytes are added to this for the Reed-Solomon error
correction, giving 207 bytes transmitted in each packet, which
amounts to 1656 bits. The trellis coding at rate 2/3 increases this
to 2484 bits, or 828 symbols, since eight-level coding gives three
bits per symbol. A special waveform, known as the data segment
sync, is added to the head of this packet and occupies four normal
symbol periods. The total modified transmission stream packet now
occupies 832 symbol periods, or a total time of 77.3 µs at the
symbol rate of 10.76 megasymbols per second. This resulting new
data packet is now called a data segment.
Figure 10: VSB data segments and framing
structure
Periodically, at intervals of 313 segments or 24.2 ms, a special
data segment known as a field sync is inserted. The field sync
carries training data used by the adaptive equalizer in the
receiver to estimate what echoes may be present due to multipath
interference. The form of the data segment and overall framing
structure is shown in Figure 10.
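The segment and field figures quoted in this section can be reproduced with a few lines of arithmetic. The final payload-rate line assumes that 312 of every 313 segments each carry one 188-byte transport packet, which is not spelled out in the text.

```python
SYMBOL_RATE = 10.76e6  # symbols per second, as quoted in the text

payload_bytes = 188 - 1              # sync byte stripped
rs_bytes = payload_bytes + 20        # Reed-Solomon parity added
coded_bits = rs_bytes * 8 * 3 // 2   # rate-2/3 trellis code
data_symbols = coded_bits // 3       # three bits per eight-level symbol
segment_symbols = data_symbols + 4   # data segment sync
segment_time = segment_symbols / SYMBOL_RATE
field_time = 313 * segment_time

# Assumption: 312 data segments per field, one transport packet per segment.
payload_rate = 312 * 188 * 8 / field_time
```

With these inputs the segment lasts about 77.3 microseconds, the field about 24.2 ms, and the payload rate comes out near the 19.3 Mbps listed for ATSC in Table 1.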
Figure 11: Nominal VSB channel occupancy
The eight-level symbols combined with the binary data segment
sync and data field sync signals are used to generate a
suppressed-carrier modulated signal. Before transmission, however,
most of the lower sideband is removed. The resulting spectrum is
flat, except for the band edges where a nominal square-root
raised-cosine response results in 620 kHz transition bands. The
nominal VSB transmission spectrum is shown in Figure 11. The
spectrum includes a small pilot signal at the suppressed carrier
frequency, 310 kHz from the lower band edge.
DVB-T OFDM System
A European consortium of public and private sector
organizations, the Digital Video Broadcasting
Project, developed the DVB-T OFDM system. The system uses a
large number of carriers per channel, modulated in parallel via an
FFT process, a technique referred to as orthogonal frequency
division multiplexing (OFDM). In case of multipath interference,
echoes could cause severe interference to the main signal.
Therefore, long symbol duration is necessary to suppress the echo
interference. OFDM can achieve long symbol duration within the same
bandwidth using parallel modulation. In OFDM, symbols are
demultiplexed to modulate many different carriers (a few thousand),
each of which occupies a much narrower bandwidth. Hence, the symbol
duration could be increased, though the total bandwidth remains the
same. These carriers are chosen to be orthogonal to each other so
that they are separable in the decoder. The modulated symbols are
frequency multiplexed to form the OFDM baseband signal, which is
then up-converted to RF signal for transmission.
The OFDM transmission system allows the selection of different
levels of QAM modulation. Moreover, a guard interval with
selectable width (1/4, 1/8, 1/16, or 1/32 of the symbol duration)
separates the transmitted symbols, which gives the system an
excellent capability for coping with multipath distortion. OFDM
modulation also supports a single-frequency network (SFN), in which
multiple transmitters within a single coverage area
transmit the same data on the same frequency at the same time.
The DVB-T system can operate in either a 2k mode or an 8k mode. The
2k mode uses a maximum of 1705 carriers, while the 8k mode uses
6817. The 2k mode has a short symbol duration, so it
is suitable for a small SFN with
limited distance between transmitters. The 8k mode is used in a
large SFN where the transmitters could be up to 90 km
apart.
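The principle of parallel modulation with a guard interval can be sketched with a tiny eight-carrier example. The sketch uses a plain inverse DFT instead of an FFT, QPSK values on every carrier, and a guard interval built as a cyclic prefix (a copy of the end of the symbol); the carrier count and function names are assumptions for illustration and are far smaller than the 2k and 8k modes.

```python
import cmath

def idft(freq):
    """Inverse DFT: turn one value per carrier into time-domain samples."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def dft(time):
    """Forward DFT: separate the orthogonal carriers again."""
    n = len(time)
    return [sum(time[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def ofdm_symbol(carriers, guard_fraction):
    """Build one OFDM symbol: IDFT of the carrier values preceded by a cyclic-prefix guard."""
    body = idft(carriers)
    guard_len = int(len(body) * guard_fraction)
    return body[-guard_len:] + body if guard_len else body

def ofdm_receive(samples, n_carriers):
    """Discard the guard interval and recover the carrier values with a DFT."""
    return dft(samples[-n_carriers:])

qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
tx = ofdm_symbol(qpsk, guard_fraction=1 / 4)
rx = ofdm_receive(tx, n_carriers=len(qpsk))
```

The transmitted symbol is 10 samples long (8 useful samples plus a 2-sample guard), and the receiver recovers the eight QPSK values because the carriers are orthogonal over the useful symbol period.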
ISDB-T BST-OFDM System
The Association of Radio Industries and Businesses (ARIB) in Japan
developed the ISDB-T system. It uses a modulation method referred
to as Band Segmented Transmission (BST) OFDM, which consists of a
set of common basic frequency blocks called BST-Segments. Each
segment has a bandwidth corresponding to 1/14th of the channel
bandwidth. BST-OFDM provides hierarchical transmission capabilities
by using different punctured coding rates, modulation schemes, and
guard intervals on different BST-segments. Thus different segments
can meet different service requirements. By transmitting OFDM
segment groups with different transmission parameters, you get
hierarchical transmission.
Comparisons and Conclusions
Generally speaking, each system has its own unique advantages
and disadvantages. Table 1 summarizes the main
characteristics of the three DTV systems.
| | ATSC 8-VSB | DVB-T OFDM | ISDB-T BST-OFDM |
|---|---|---|---|
| Source Coding | | | |
| Video | Main profile syntax of ISO/IEC 13818-2 (MPEG-2 video) | Main profile syntax of ISO/IEC 13818-2 (MPEG-2 video) | Main profile syntax of ISO/IEC 13818-2 (MPEG-2 video) |
| Audio | Dolby AC-3 | MPEG-2 Audio or Dolby AC-3 | MPEG-2 Audio or AAC Audio |
| Transport Stream | MPEG-2 Transport Stream | MPEG-2 Transport Stream | MPEG-2 Transport Stream |
| Channel Coding | | | |
| Outer Coding | R-S (207, 187, t=10) | R-S (204, 188, t=8) | R-S (204, 188, t=8) |
| Outer Interleaver | 52 R-S block interleaver | 12 R-S block interleaver | 12 R-S block interleaver |
| Inner Coding | 2/3 trellis code | Punctured convolutional code: 1/2, 2/3, 3/4, 5/6, 7/8, constraint length=7, polynomials 171, 133 | Punctured convolutional code: 1/2, 2/3, 3/4, 5/6, 7/8, constraint length=7, polynomials 171, 133 |
| Inner Interleaver | 12-to-1 trellis code interleaver | Bit-wise interleaving & frequency interleaving | Bit-wise interleaving & time and frequency interleaving |
| Data Randomization | 16-bit PRBS | 16-bit PRBS | 16-bit PRBS |
| Modulation | | | |
| Symbol Mapping | | QPSK, xQAM | DQPSK/QPSK, xQAM |
| Guard Interval | | 1/32, 1/16, 1/8, 1/4 | 1/32, 1/16, 1/8, 1/4 |
| Hierarchical | No | Yes | Yes |
| No. of Carriers | 1 | 2k and 8k FFT | 2k, 4k, and 8k FFT |
| Bit Rates | 19.3 Mbps | 3.7-31.7 Mbps | 4.06-21.47 Mbps |
| HDTV Capability | Yes | An increased convolutional coding rate (say, 3/4) is required; this needs an additional 1.5 dB of power. | An increased convolutional coding rate (say, 3/4) is required; this needs an additional 1.5 dB of power. |

Table 1: Main characteristics of DTV systems
The ATSC system is more robust in an additive white Gaussian noise
(AWGN) channel, has higher spectrum efficiency, lower
peak-to-average power ratio, and is more robust to impulse noise.
It also has comparable performance to DVB and ISDB systems at
low-level multipath distortion and against analog TV interference.
Therefore the ATSC 8-VSB system could be more advantageous for a
single transmitter system and for providing HDTV service within a 6
MHz channel to fixed receivers.
The DVB-T system has performance advantages with respect to
high-level (up to 0 dB), long-delay multipath distortion. DVB-T
could be advantageous for services requiring large-scale,
single-frequency networks and for mobile reception. Hierarchical
channel coding and modulation, which uses multi-resolution
constellation on OFDM carriers, is also available to provide
two-tier services within one DTV channel.
The ISDB-T system, which uses the same modulation and
channel-coding scheme as the DVB-T system, has similar performance
advantages to the DVB-T system. It was designed to operate under
large-scale SFN and, particularly, in a mobile reception
environment. The depth of the time interleaver can be selected to
improve the quality of the mobile reception and immunity against
impulse noise. The band-segmented transmission allows the use of up
to three different modulation schemes and coding rates on different
segments to meet various service requirements and interference
conditions.
In conclusion, the period of research and development in digital
television broadcasting is largely over, and actual digital
television services are now offered in many countries. DTV brings
about new services and applications, such as home shopping and home
banking, which will bring convenience to users. Interactive TV can
allow users to call up program-related information on demand, thus
enhancing viewing pleasure. This spells a technical progression
similar to, but much more profound than, the transition from
black-and-white to color television.
References