Digital Video

There is one major overriding concern in digital video:
COMPRESSION


Too Much Data

To play one second of uncompressed 8-bit color, 640 X 480 resolution digital video requires approximately 9 MB of storage; one minute requires about 0.5 GB. A CD-ROM holds only about 600 MB, and a single-speed player can transfer only 150 KB per second. The storage and transfer problems grow proportionally with 16-bit and 24-bit color playback. Without compression, digital video would not be possible with current storage technology.
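The arithmetic above is easy to check. A minimal sketch (the function name is my own, not from any standard):

```python
def uncompressed_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Raw storage for a clip with no compression at all."""
    return width * height * bytes_per_pixel * fps * seconds

# 8-bit color = 1 byte per pixel, 30 frames per second
one_second = uncompressed_bytes(640, 480, 1, 30, 1)
one_minute = uncompressed_bytes(640, 480, 1, 30, 60)
print(one_second / 2**20)  # roughly 8.8 MB for one second
print(one_minute / 2**30)  # roughly 0.5 GB for one minute
```

Doubling or tripling bytes_per_pixel for 16-bit or 24-bit color scales the totals proportionally, as the text notes.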

Not Enough Storage

The storage required for a video clip can be expressed in a simple relation:

Video Source Data Reduction ==>> Video Compression ==>> Video Storage

The amount of required storage is determined by how much and what type of video data is in the uncompressed signal and how much the data can be compressed. In other words, the original video source and the desired playback parameters dramatically affect the final storage needs.

GIGO (Garbage In Garbage Out)

Video or motion video arrives originally through some type of camera, which records what it sees as a sequence of images (measured in frames per second [fps]).

Frame rate (for capture and playback) is

           24 fps for movies
           30 fps for TV
TV generally uses interlacing, so each frame is actually two fields, each containing half of the scan lines. A computer can play this back at 60 fps with both fields combined, for flicker-free (non-interlaced) rendering.

The quality of the source data depends on the camera's optics and resolution, which is determined by the number of CCD (charge-coupled device) elements, usually 250K-400K for consumer products.

If the camera is connected to a VTR, then the quality of the videotape recording and playback process limits the quality the capturing system can achieve. Consumer-grade recorders should be at least S-VHS or Hi-8 to give adequate quality in the computer representation. Best would be digital tape, such as D2, or high-quality analog tape, such as Betacam SP. Alternatively, a laserdisc or broadcast source can be used, but attention should be given to ensuring the highest quality.

Broadcast TV (NTSC) generally has about 15 bits/pixel of effective color depth and 525 lines of resolution with a 4:3 aspect ratio. Overscan in TV sets leaves a smaller "safe" region of the picture.

Manual Compression

A person recording video for digitization can drastically affect the later compression steps. Video in which backgrounds are stable (or change slowly) for a period of time will yield a high compression rate. Scenes in which only a person's face from the shoulders up is captured against a solid background compress extremely well; this type of video is often referred to as a 'talking head'.

Filtering

Filtering itself does not achieve any compression, but it is a necessary preprocessing step because of the artifacts compression would otherwise produce. Filtering is performed on video frame images before compression. Essentially it smooths the sharp edges in an image where a sudden shift in color or luminance has occurred; the smoothing is performed by averaging adjacent groups of pixel values. Without this preprocessing step, decompressed video exhibits aliasing (jagged edges) and moiré patterns.
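The "averaging adjacent groups of pixel values" step can be sketched as a simple 3 X 3 box filter over a grayscale image (a toy version; real filters may weight the neighborhood differently):

```python
def smooth(img):
    """3x3 box filter: replace each pixel with the average of its
    neighborhood (edges use only the neighbors that exist),
    softening sharp luminance transitions before compression."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += img[ny][nx]
                        count += 1
            out[y][x] = total // count
    return out

# A hard black/white edge becomes a gradual ramp after smoothing.
print(smooth([[0, 255], [0, 255]]))
```

Note how a sudden 0-to-255 shift is averaged into intermediate values, which is exactly what reduces aliasing artifacts after compression.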

What You Can't See Won't Hurt You

The human vision system is more sensitive to changes in luminance than in color. The second preprocessing step is a conversion from the RGB color scheme of a computer to the YUV color scheme that television uses. The Y value is termed the luminance, and the U & V values are the chrominance. (In S-video, two cables are used: one for Y and one for UV. In component video, each value is carried on a separate cable.) This color-space conversion allows for separate luminance and chrominance sampling in the digitization step.
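The RGB-to-YUV conversion is a fixed linear transform; a sketch using the standard ITU-R BT.601 coefficients (one common definition; some systems scale U and V differently):

```python
def rgb_to_yuv(r, g, b):
    """BT.601 conversion: Y carries luminance (weighted toward green,
    which the eye sees best); U and V carry the color differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

# Pure white: full luminance, zero chrominance.
print(rgb_to_yuv(255, 255, 255))
```

Because U and V are near zero for grays, they concentrate exactly the information the eye is least sensitive to, setting up the subsampling described next.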

Everythings Going Digital

The color-space conversion allows different digitization sampling rates for luminance and chrominance. Typical studio-quality video (the MPEG-2 scheme; other formats sample at different rates) requires that the chrominance U & V values be sampled only twice for every four times the luminance Y value is sampled. This discards half of the less important color information. Two popular digital video formats, MPEG and DVI, take only one UV sample for every 2 X 2 square of luminance Y pixels, achieving 2 to 1 compression.
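The 2 X 2 scheme (often called 4:2:0 sampling) can be sketched by averaging each 2 X 2 block of a chrominance plane down to one sample (function name is my own):

```python
def subsample_2x2(chroma):
    """Keep one chroma sample (the average) per 2x2 pixel block,
    as in the MPEG/DVI sampling described above. Assumes even
    width and height for simplicity."""
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1] +
              chroma[y + 1][x] + chroma[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

# Four U samples collapse into one.
print(subsample_2x2([[10, 20], [30, 40]]))
```

Per pixel this stores 1 Y sample plus 1/4 U and 1/4 V, i.e. 1.5 values instead of the 3 of RGB, which is where the 2 to 1 figure comes from.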

Good Things Come in Small Packages

The easiest way to save memory is to store less. The preprocessing technique of scaling adheres to this maxim. Original digital video standards stored only a 160 X 120 pixel video window, a reduction to 1/16th the size of a 640 X 480 window. With faster processors, a 320 X 240 video window is quickly becoming the standard, yielding 4 to 1 compression. A further scaling application involves time instead of space: in temporal scaling, the number of frames per second (fps) is reduced from 30 to 24. If the fps is reduced below 24, the reduction becomes very noticeable in the form of jerky movement.
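The spatial and temporal scaling factors multiply together; a quick check of the ratios quoted above (helper name is my own):

```python
def scaling_ratio(full_w, full_h, full_fps, small_w, small_h, small_fps):
    """Data reduction factor from spatial + temporal scaling combined."""
    return (full_w * full_h * full_fps) / (small_w * small_h * small_fps)

print(scaling_ratio(640, 480, 30, 160, 120, 30))  # 16.0 : the 1/16th window
print(scaling_ratio(640, 480, 30, 320, 240, 30))  # 4.0  : the newer standard
print(scaling_ratio(640, 480, 30, 320, 240, 24))  # 5.0  : plus 30 -> 24 fps
```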

There is Only One Constant in the World, Change

The first actual compression step is transformation. Codecs (COmpressor/DECompressor algorithms) transform the two-dimensional spatial representation of an image into the frequency domain. Since most natural images are composed of low-frequency information, the high-frequency components can be discarded; this results in a softer picture in terms of contrast. The frequency information is represented as 64 coefficients because the underlying DCT (Discrete Cosine Transform) algorithm operates on 8 X 8 pixel grids. Low-frequency terms occur in one corner of the grid, with high-frequency terms in the opposite corner.
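A direct (and deliberately naive, O(N^4)) transcription of the 2-D DCT on one 8 X 8 block; real codecs use fast factored versions, but the coefficients are the same:

```python
import math

def dct_8x8(block):
    """2D DCT-II of an 8x8 pixel block. The low-frequency terms land
    near out[0][0]; high-frequency terms toward out[7][7]."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

# A flat gray block: all energy ends up in the single DC (0,0) term,
# every other coefficient is essentially zero.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
```

This is why natural, slowly varying image regions compress so well: nearly all 64 coefficients come out near zero and can be discarded.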

If You Don't Need It, Lose It

The lossy quantization step uses fewer bits to represent larger ranges of values. The 64 frequency coefficients of the DCT transformation are treated as real numbers and quantized into 16 different levels. The high-frequency components (sparse in real-world images) are represented with only 0, 1, or 2 bits. Frequencies mapped to zero drop out and are lost.
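A minimal sketch of uniform quantization (real codecs use a per-coefficient step table, but the principle is the same: small high-frequency coefficients round to level zero and vanish):

```python
def quantize(coeffs, step=16):
    """Map real DCT coefficients onto coarse integer levels by
    dividing by a step size and rounding. Values smaller than
    about step/2 become level 0 and are lost for good."""
    return [[round(c / step) for c in row] for row in coeffs]

# A big DC term survives; small high-frequency terms drop to zero.
print(quantize([[130.0, 7.0, -3.0]]))
```

Decompression multiplies the levels back by the step, so the information rounded away here is exactly what makes the scheme lossy.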

Compaction Encoding

The last step in compressing individual frames (intraframe compression) is a sequence of three standard text-file compression schemes: run-length encoding (RLE), Huffman coding, and arithmetic coding. RLE replaces sequences of identical values with the number of times the value occurs followed by the value (e.g., 11111000011111100000 ==>> 51 40 61 50). Huffman coding replaces the most frequently occurring values/strings with the smallest codes. Arithmetic coding, similar to Huffman coding, codes the commonly occurring values/strings using fractional-bit codes.
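RLE is simple enough to show in full; this sketch reproduces the count-then-value encoding from the example above:

```python
def rle_encode(s):
    """Replace each run of identical symbols with (count, symbol)."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1          # extend the run of s[i]
        out.append((j - i, s[i]))
        i = j
    return out

print(rle_encode("11111000011111100000"))
# Matches the text's example: 5 ones, 4 zeros, 6 ones, 5 zeros.
```

RLE pays off precisely because quantization produces long runs of zero coefficients.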

Automatic Talking-Head Compression

At 30 fps very little changes from one frame to the next. Interframe compression takes advantage of this fact to achieve dramatic compression: instead of storing complete information about each frame, only the difference information between frames is stored. MPEG stores three types of frames. The first type, the I-frame, stores all of the intraframe compression information, using no frame differencing. The second type, the P-frame, is a frame predicted two or four frames in the future; it is compared with the corresponding actual future frame, and the differences (the error signal) are stored. The third type, B-frames, are bidirectional interpolative predicted frames that fill in the skipped frames.

When storing differences, MPEG actually compares a block of pixels (a macroblock), and if a difference is found it searches for the block in nearby regions. This can be used to compensate for slight camera movement and stabilize an image. It is also used to represent motion efficiently by storing the movement information (a motion vector) for the block.
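The core idea of frame differencing can be sketched at the pixel level (a toy version with made-up helper names; real MPEG works on 16 X 16 macroblocks with motion vectors, not single pixels):

```python
def frame_diff(prev, cur, threshold=0):
    """Record only pixels that changed versus the previous frame;
    unchanged positions are stored as None (i.e., "reuse prev")."""
    return [cur[i] if abs(cur[i] - prev[i]) > threshold else None
            for i in range(len(cur))]

def frame_apply(prev, diff):
    """Decoder side: rebuild the current frame from prev + diff."""
    return [d if d is not None else prev[i] for i, d in enumerate(diff)]

prev = [10, 10, 10, 10]
cur = [10, 200, 10, 10]        # only one pixel changed
diff = frame_diff(prev, cur)   # three of four entries need no data
```

For a talking head, most of the frame is None, which is exactly why such scenes compress so dramatically.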

Video Capture

There are two methods for capturing video. The first method compresses the video in real time as it is received; it requires a processor fast enough to compress frames at the same rate as the video source. The second method breaks capture and compression into separate steps: it first captures the raw video data to the hard drive, which is compressed later when time allows. This method requires a large, fast hard drive to store the huge raw video data files. In either process, if the hardware cannot keep pace with the video source, frames are dropped (skipped), which causes signal degradation.

Video Compression Algorithms

Several standard video compression algorithms (codecs) are in wide use on various platforms. The algorithms fall into one of two classes: symmetric and asymmetric. A symmetric codec takes about the same effort to decompress as to compress, simply running the inverse operations. An asymmetric codec uses different compression and decompression methods: more processing time is spent compressing, to achieve low storage requirements and allow for shorter decompression times.

QuickTime

Produced by Apple for both Macintosh and Windows, QuickTime is actually a system for playing digital video. It allows the use of different codecs in addition to Apple's own proprietary format.

Video For Windows

Microsoft launched Video For Windows as an alternative to QuickTime for Windows. Like QuickTime, it is an open-architecture system, allowing the use of different codecs.

Motion JPEG

A codec based upon the Joint Photographic Experts Group (JPEG) graphic format, MJPEG performs only intraframe compression. Due to the lack of interframe compression, it requires special video hardware to deliver acceptable speed and quality.

MPEG

A platform-independent symmetric codec proposed by the Moving Picture Experts Group. Two standards currently exist. MPEG-1 was designed for single-speed CD-ROM (150 KB/sec) playback in a 320 X 240 window at 30 fps. MPEG-2 was designed for studio-quality video in a 704 X 480 window at 30 fps. For more details, see the tutorial by Dane Dwyer and Michael Swafford, or the discussion at Terran Interactive.

Cinepak

Produced by SuperMac Technology, Cinepak is an asymmetric codec designed for 24-bit video in a 320 X 240 window for single-speed CD-ROM drives. Compression typically takes 300 times longer than decompression. You might want to see a comparison of Cinepak versus Indeo.

Indeo

Intel produced the Indeo asymmetric codec for real-time compression on special Intel hardware. Playback can take place on an Intel 486 processor without any hardware assistance. It produces high-quality talking-head video but is less efficient than Cinepak at motion sequences.

DVI

The Digital Video Interactive standard has been in existence longer than most other formats. It requires off-line supercomputer processing power for the compression. Intel provides compression service centers for this purpose.

Further Exploration

Terran Interactive has a summary of the various video codecs. They also have web-capable comparisons of the various codecs. Also, take a look at their informative tips for Making Movies.

Note: Portions of this document are copyright © Dr. Edward A. Fox. All rights reserved.

Author: N. Dwight Barnette
Curator: Computer Science Dept : VA TECH © Copyright 1994.
Last Updated: 5/25/96