There is one major overriding concern in digital video:
COMPRESSION
Too Much Data
To play one second of uncompressed 8-bit color, 640 X 480 resolution
digital video requires approximately 9 MB of storage. One minute
would require about 0.5 GB. A CD-ROM can hold only about 600 MB, and
a single-speed player can transfer only 150 KB per second. Data storage and
transfer problems grow proportionally with 16-bit and 24-bit color
playback. Without compression, digital video would not be possible with
current storage technology.
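The arithmetic above can be checked with a quick sketch (Python is used here purely for illustration):

```python
# Back-of-the-envelope data rates for uncompressed 8-bit, 640 X 480 video.
width, height, bytes_per_pixel, fps = 640, 480, 1, 30

bytes_per_frame = width * height * bytes_per_pixel    # 307,200 bytes
bytes_per_second = bytes_per_frame * fps              # about 9 MB/s
bytes_per_minute = bytes_per_second * 60              # about 0.5 GB/min

print(f"{bytes_per_second / 2**20:.1f} MB per second")
print(f"{bytes_per_minute / 2**30:.2f} GB per minute")
```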
Not Enough Storage
The storage required for a video clip can be expressed in a simple
relation:
Video Source Data Reduction ==>> Video Compression ==>> Video Storage
The amount of required storage is determined by how much and what type of
video data is in the uncompressed signal, and by how much the data can be
compressed. In other words, the original video source and the desired
playback parameters dramatically affect the final storage needs.
GIGO (Garbage In Garbage Out)
Video or motion video arrives originally through some type of camera,
which records what it sees as a sequence of images (measured in frames per
second [fps]).
Frame rates (for capture and playback) are:
24 fps for movies
30 fps for TV
TV generally uses interlacing, so each frame is actually two fields,
each with half of the lines. A computer can combine both fields and play
the result back non-interlaced, refreshing at 60 fps for flicker-free
rendering.
The quality of the source data depends on the camera's optics and
resolution, which is determined by the number of CCD (charge-coupled
device) elements, usually 250K-400K for consumer products.
If the camera is connected to a VTR, then the quality of the
videotape recording and playback process limits the quality
the capturing system can achieve. Consumer-grade recorders
should be at least S-VHS or Hi-8 to give adequate quality
in the computer representation. Best would be digital
tape, such as D2, or high-quality analog tape, such as
Betacam SP. Alternatively, a laserdisc or broadcast source can be used,
but attention should be given to ensuring the highest quality.
Broadcast TV (NTSC) generally has about 15 bits/pixel
of color depth and 525 lines of resolution with a 4:3 aspect ratio.
Overscan in standard scanning practice leaves a smaller safe viewing region.
Manual Compression
A person recording video for digitization can drastically affect the later
compression steps. Video in which backgrounds are stable (or change
slowly) for a period of time will yield a high compression rate. Scenes
in which only a person's face from the shoulders upward is captured
against a solid background will result in excellent compression. This type
of video is often referred to as a 'talking head'.
Filtering
Filtering itself does not achieve any compression, but it is a necessary
step due to the artifacts of compression. Filtering is a preprocessing
step performed on video frame images before compression. Essentially it
smooths the sharp edges in an image where a sudden shift in color or
luminance has occurred. The smoothing is performed by averaging adjacent
groups of pixel values. Without the filtering preprocessing step,
decompressed video exhibits aliasing (jagged edges) and moiré patterns.
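A minimal sketch of the averaging idea, using a simple 3 X 3 box filter (the kernel size and clamped-edge handling are illustrative assumptions, not any specific product's filter):

```python
# 3x3 box (averaging) filter: each output pixel is the mean of its
# neighborhood, softening the sharp transitions that would otherwise
# alias after lossy compression. Pure Python, edges clamped.
def box_filter(img):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += img[ny][nx]
                        count += 1
            out[y][x] = total // count
    return out

# A hard vertical edge (0 vs. 255) becomes a gradual ramp.
edge = [[0, 0, 255, 255] for _ in range(4)]
print(box_filter(edge)[1])  # [0, 85, 170, 255]
```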
What You Can't See Won't Hurt You
The human visual system is more sensitive to changes in luminance than in
color. The second preprocessing step is a conversion from the RGB color
scheme of a computer to the YUV color scheme that television uses.
The Y value is termed the luminance, and the U and V values are the
chrominance. (In S-video two cables are used, one for Y and one for UV. In
component video each value is carried on a separate cable.) This
color-space conversion allows for separate luminance and
chrominance sampling in the digitization step.
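The conversion can be sketched with the commonly used ITU-R BT.601 coefficients (other coefficient sets exist; these are assumed here for illustration):

```python
# RGB -> YUV conversion sketch (ITU-R BT.601 coefficients).
# Y carries luminance; U and V carry the chrominance differences.
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

# Pure gray has zero chrominance: all the information is in Y.
y, u, v = rgb_to_yuv(128, 128, 128)
print(round(y), round(u), round(v))  # 128 0 0
```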
Everything's Going Digital
The color-space conversion allows different digitization
sampling rates for luminance and chrominance. Typically, studio-quality
video (the MPEG-2 scheme; other formats sample at different
rates) requires that the chrominance U and V values be sampled only
twice for every four times the Y luminance value is sampled. This discards
half of the less important color information. Two popular digital video
formats, MPEG and DVI, take only one UV sample for every 2 X 2 square of
luminance Y pixels. This achieves a 2 to 1 compression.
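The 2 X 2 scheme can be sketched as follows; averaging each 2 X 2 chroma block down to one sample is one common choice (some systems simply pick one of the four samples instead):

```python
# 4:2:0-style chroma subsampling sketch: keep every Y sample, but
# average each 2x2 block of U (and likewise V) down to a single sample.
# Per block: 4 Y + 1 U + 1 V = 6 samples, vs. 12 before -- 2 to 1.
def subsample_420(chroma):
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1] +
              chroma[y + 1][x] + chroma[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

u_plane = [[10, 12, 20, 22],
           [14, 16, 24, 26],
           [30, 32, 40, 42],
           [34, 36, 44, 46]]
print(subsample_420(u_plane))  # [[13, 23], [33, 43]]
```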
Good Things Come in Small Packages
The easiest way to save memory is to store less. The preprocessing
technique of scaling adheres to this maxim. Original digital video
standards stored only a video window of 160 X 120 pixels, a reduction to
1/16th the size of a 640 X 480 window. With faster processors, a 320 X 240
video window size is quickly becoming the standard, yielding a 4 to 1
compression. A further scaling application involves time instead of space.
In this temporal scaling the number of frames per second (fps) is reduced
from 30 to 24. If the fps is reduced below 24, the reduction becomes very
noticeable in the form of jerky movement.
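The combined savings of spatial and temporal scaling are simple arithmetic:

```python
# Combined effect of scaling the window to 320 X 240 and dropping 30 fps
# to 24 fps, relative to full-size 640 X 480 at 30 fps.
full = 640 * 480 * 30       # pixel samples per second, full size
scaled = 320 * 240 * 24     # quarter-size window at 24 fps
print(f"{full / scaled:.0f} to 1")  # 5 to 1
```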
There is Only One Constant in the World, Change
The first actual compression step is transformation. Codecs
(COmpression/DECompression algorithms) transform the two-dimensional
spatial representation of an image into another dimension space
(frequency). Since most natural images are composed of low-frequency
information, the high-frequency components can be discarded. This results
in a softer picture in terms of contrast. The frequency information is
represented as 64 coefficients because the underlying DCT (Discrete
Cosine Transform) algorithm operates on 8 X 8 pixel grids. Low-frequency
terms occur in one corner of the grid, with high-frequency terms
occurring in the opposite corner of the grid.
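A direct, unoptimized sketch of the 8 X 8 DCT-II illustrates how a pixel block becomes 64 frequency coefficients; real codecs use fast factored versions of the same transform:

```python
import math

# 8x8 DCT-II sketch: re-express a pixel block as 64 frequency
# coefficients, low frequencies in the top-left corner and high
# frequencies toward the bottom-right.
N = 8

def dct_2d(block):
    def c(k):  # orthonormal scaling factor
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[y][x] *
                    math.cos((2 * y + 1) * u * math.pi / (2 * N)) *
                    math.cos((2 * x + 1) * v * math.pi / (2 * N))
                    for y in range(N) for x in range(N))
            out[u][v] = c(u) * c(v) * s
    return out

# A flat (constant) block has all its energy in the DC (0,0) coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0]))  # 800; every other coefficient is ~0
```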
If You Don't Need It, Lose It
The lossy quantization step of digital video uses fewer bits to
represent larger quantities. The 64 frequency coefficients of the DCT
transformation are treated as real numbers. These are quantized into 16
different levels. The high-frequency components (sparse in real-world
images) are represented with only 0, 1, or 2 bits. The zero-mapped
frequencies drop out and are lost.
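The idea can be sketched with a uniform quantizer (the step size of 16 and the sample coefficients below are illustrative assumptions; real codecs use per-frequency quantization tables):

```python
# Uniform quantization sketch: divide each DCT coefficient by a step
# size and round, so small high-frequency terms map to zero and vanish.
def quantize(coeffs, step=16):
    return [round(c / step) for c in coeffs]

def dequantize(levels, step=16):
    return [q * step for q in levels]

coeffs = [812.0, -43.5, 19.2, 7.1, -3.8, 1.2, 0.6, -0.4]
levels = quantize(coeffs)
print(levels)              # [51, -3, 1, 0, 0, 0, 0, 0]
print(dequantize(levels))  # coarse reconstruction; the zeros are lost
```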
Compaction Encoding
The last step in compressing individual frames (intraframe
compression) is a sequence of three standard text-file compression
schemes: run-length encoding (RLE), Huffman coding, and arithmetic
coding. RLE replaces sequences of identical values with the number of
times the value occurs followed by the value (e.g., 11111000011111100000
==>> 5,1 4,0 6,1 5,0). Huffman coding replaces the most
frequently occurring values/strings with the smallest codes. Arithmetic
coding, similar to Huffman coding, codes the commonly occurring
values/strings using fractional bit codes.
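RLE is simple enough to sketch directly, using the example string from the text:

```python
# Run-length encoding sketch: each run of identical symbols becomes a
# [count, value] pair.
def rle_encode(symbols):
    runs = []
    for s in symbols:
        if runs and runs[-1][1] == s:
            runs[-1][0] += 1
        else:
            runs.append([1, s])
    return runs

print(rle_encode("11111000011111100000"))
# [[5, '1'], [4, '0'], [6, '1'], [5, '0']], i.e. 5,1 4,0 6,1 5,0
```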
Automatic Talking-Head Compression
At 30 fps very little changes from one frame to the next. Interframe
compression takes advantage of this fact to achieve dramatic
compression: instead of storing complete information about each frame,
only the difference information between frames is stored. MPEG stores
three types of frames. The first type, the I-frame, stores a complete
frame using only intraframe compression and no frame differencing. The
second type, the P-frame, is a predicted frame two or four frames in the
future; this is compared with the corresponding actual future frame and
the differences (the error signal) are stored. The third type, B-frames,
are bidirectional interpolative predicted frames that fill in the skipped
frames.

When storing differences, MPEG actually compares a block of pixels
(a macroblock), and if a difference is found it searches for the block in
nearby regions. This can be used to compensate for slight camera movement
and stabilize an image. It is also used to efficiently represent motion by
storing the movement information (a motion vector) for the block.
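A sketch of block-level frame differencing, the starting point for this process (the block size and threshold here are illustrative assumptions; real MPEG encoders use 16 X 16 macroblocks and a motion search):

```python
# Interframe differencing at block granularity: only blocks that changed
# between frames need to be stored or searched for a motion vector.
def changed_blocks(prev, curr, block=2, threshold=0):
    h, w = len(prev), len(prev[0])
    changed = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            diff = sum(abs(curr[y][x] - prev[y][x])
                       for y in range(by, by + block)
                       for x in range(bx, bx + block))
            if diff > threshold:
                changed.append((by, bx))
    return changed

frame1 = [[0, 0, 0, 0] for _ in range(4)]
frame2 = [[0, 0, 0, 0],
          [0, 0, 9, 9],
          [0, 0, 9, 9],
          [0, 0, 0, 0]]
print(changed_blocks(frame1, frame2))  # [(0, 2), (2, 2)]
```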
Video Capture
There are two methods for capturing video. The first method compresses the
video in real time as it is received; it requires a fast processor capable
of compressing frames at the same rate as the video source. The second
method breaks the capture and compression phases into separate steps: it
first captures the raw video data to the hard drive, which is compressed
later when time allows. This method requires a fast, large hard drive to
store the huge raw video data files. In either process, if the hardware
cannot keep pace with the video source, frames are dropped (skipped),
which causes signal degradation.
Video Compression Algorithms
Several standard video compression algorithms (codecs) are in wide use on
various platforms. The algorithms fall into one of two classifications:
symmetric or asymmetric. Symmetric codecs decompress by running the
inverse of the compression operations, so both directions take comparable
effort. Asymmetric codecs use different compression and decompression
methods: more processing time is spent during compression, to achieve low
storage requirements and allow for shorter decompression time.
Quicktime
Produced by Apple for both Macintosh and Windows, Quicktime is actually a
system for playing digital video. It allows the use of different codecs
in addition to Apple's own proprietary format.
Video For Windows
Microsoft launched Video For Windows as an alternative to Quicktime for
Windows.
Like Quicktime, it is an open-architecture system, allowing the use of
different codecs.
Motion JPEG (MJPEG)
A codec based on the Joint Photographic Experts Group (JPEG) graphics
format, MJPEG performs only intraframe compression. Due to the lack of
interframe compression, it requires special video hardware to deliver
acceptable speed and quality.
MPEG
A platform-independent symmetric codec proposed by the Moving Picture
Experts Group. Two standards currently exist. MPEG-1 was designed for
single-speed CD-ROM (150 KB/sec) playback in a 320 X 240 window at 30
fps. MPEG-2 was designed for studio-quality video in a 704 X 480 window
at 30 fps.
For more details, look at the
tutorial by Dane Dwyer and Michael Swafford, or the discussion at
Terran Interactive.
Cinepak
Produced by SuperMac Technology, Cinepak is an asymmetric codec designed
for 24-bit video in a 320 X 240 window for single-speed CD-ROM drives.
Compression typically takes 300 times longer than decompression. You
might want to see a comparison of Cinepak versus Indeo.
Indeo
Intel produced the Indeo asymmetric codec for real-time compression on
special Intel hardware. Playback can take place on an Intel 486 processor
without any hardware assistance. It produces high-quality talking-head
video but is less efficient than Cinepak at motion sequences.
DVI
The Digital Video Interactive standard has been in existence longer than
most other formats. It requires off-line supercomputer processing power
for the compression; Intel provides compression service centers for this
purpose.
Further Exploration 
Terran Interactive has a summary of the various video codecs.
They also have Web-based comparisons of the various codecs.
Also, take a look at their informative tips for
Making
Movies.
Note: Portions of this document are ©copyright by Dr. Edward A. Fox. All rights reserved.