Digital Audio

Sound in the natural world is analog; sound stored on a computer is digital. Natural sound is the result of continuous changes in air pressure (vibrations in the atmosphere). The process of converting natural analog sound into discrete digital sound is termed digitization. Digitization of analog sound consists of two phases: 1. sampling; 2. quantization.

Sampling

Sampling is the process of measuring the analog signal at regular intervals; the sampling rate is the number of measurements (samples) captured per second. There is a direct relationship between the sampling rate, sound quality (fidelity), and storage space: the higher the sampling rate, the higher the fidelity and the larger the storage requirements. Digital sound sampled at high fidelity rates requires massive storage. For example, a CD can hold 600 million characters of text, but only 74 minutes of uncompressed music. A tradeoff must therefore be made between sound quality and storage requirements. Knowing the type of audio to be sampled and its intended purpose allows a reasonable choice of sampling rate.

In choosing a sampling rate, one must be aware of the difference between the playback rate (the highest frequency that can be reproduced) and the capturing (sampling) rate. These two rates are not the same: the sampling rate must be at least two times the playback rate. The reason for this discrepancy is the Nyquist Effect (or Nyquist Theorem). In a worst-case scenario with only one sample per period, instead of the two samples the graph above depicts, every sample falls at the same point in the wave's cycle, and the reproduced sound might be played back as a flat, unchanging signal rather than the original tone. Although this maximum loss of fidelity is unlikely in practical applications, sampling below twice the highest frequency still produces unacceptable errors due to the Nyquist effect (known as aliasing).
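The worst case described above can be checked numerically. The following Python sketch samples an ideal, noise-free sine wave and computes the apparent frequency a pure tone takes on after sampling. The function names `sample_sine` and `alias_frequency` are hypothetical, chosen for illustration only.

```python
import math

def sample_sine(freq_hz, rate_hz, n_samples):
    """Sample sin(2*pi*f*t) at the instants t = k / rate_hz, k = 0..n_samples-1."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n_samples)]

def alias_frequency(freq_hz, rate_hz):
    """Apparent frequency of a pure tone after sampling at rate_hz.

    Tones above rate_hz / 2 "fold back" into the range 0 .. rate_hz / 2.
    """
    f = freq_hz % rate_hz
    return min(f, rate_hz - f)

if __name__ == "__main__":
    # One sample per period: every sample lands at the same phase, so the
    # result is flat (all values are zero up to floating-point rounding).
    print([round(x, 6) for x in sample_sine(1000, 1000, 5)])
    # Eight samples per period: the tone keeps its true frequency.
    print(alias_frequency(1000, 8000))   # 1000
    # A 7 kHz tone sampled at 8 kHz masquerades as a 1 kHz tone.
    print(alias_frequency(7000, 8000))   # 1000
```

A 1 kHz tone sampled 1,000 times per second yields a flat signal, and a 7 kHz tone sampled at 8 kHz cannot be told apart from a 1 kHz tone, which is why the sampling rate must exceed twice the highest frequency of interest.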

Although many sampling rates exist, only the most popular and most common will be discussed. Human speech can be effectively reproduced at a rate of 5.5 kHz (kilohertz), which requires a sampling rate of only 11 kHz. Most natural-world sounds and medium-fidelity music can be reproduced at 11 kHz (a sampling rate of 22 kHz) with acceptable losses of fidelity (approximately FM radio quality). To reproduce high-fidelity music at CD audio quality, the sampling rate must be 44.1 kHz, giving a playback rate of about 22 kHz, which is just above the limit of human hearing.
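The arithmetic linking sampling rate, highest reproducible frequency and storage can be sketched in a few lines of Python. The helper names are hypothetical, and the storage defaults assume uncompressed 16-bit stereo, as on an audio CD; the exact rates 11,025, 22,050 and 44,100 Hz are the usual values behind the rounded 11, 22 and 44.1 kHz figures.

```python
def nyquist_limit_hz(sampling_rate_hz):
    """Highest frequency that can be reproduced (the 'playback rate' in the text)."""
    return sampling_rate_hz / 2

def bytes_per_second(sampling_rate_hz, bits_per_sample=16, channels=2):
    """Uncompressed storage rate; defaults assume CD-style 16-bit stereo."""
    return sampling_rate_hz * (bits_per_sample // 8) * channels

if __name__ == "__main__":
    for rate in (11025, 22050, 44100):
        print(f"{rate} Hz sampling -> {nyquist_limit_hz(rate):g} Hz limit, "
              f"{bytes_per_second(rate) * 60:,} bytes per minute (16-bit stereo)")
    # 74 minutes of CD audio, uncompressed:
    print(f"{bytes_per_second(44100) * 60 * 74:,} bytes")   # 783,216,000 bytes
```

The 74-minute figure comes out somewhat larger than the 600 million characters quoted for text because audio CDs use the full 2,352 bytes of each sector for sound, while data CDs reserve part of each sector for additional error correction and keep 2,048 bytes of user data.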

Quantization

The process of converting each sampled value into a digital number is termed quantization. The number of distinct sound levels that can be represented is determined by the number of bits used to store each quantized value. CD audio, the most common quantization strategy, uses 2 bytes (16 bits), capable of representing 65,536 discrete levels. In simple terms, quantization can be viewed as converting real (continuous) values into integer (discrete) values. The difference between the stored discrete value and the actual continuous sound is termed quantization error, and it is heard as noise.

In audio theory the effect of this noise is measured by the signal-to-noise ratio (S/N): the ratio between the largest signal level that can be represented and the level of the superimposed noise (white noise or static), usually expressed in decibels. The higher the S/N ratio, the better the sound. CD audio has a theoretical S/N ratio of about 96 decibels (dB), with actual systems achieving S/N ratios in the low 90s. S/N ratios must be greater than 70 dB to prevent background noise from becoming audible. Decreasing the quantization to 8 bits, to save 50% of the required storage, decreases the S/N ratio to about 50 dB, approximately the same quality as AM radio.
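A minimal Python sketch of quantization follows, assuming samples normalized to the range -1.0 to 1.0 and mapped onto signed two's-complement codes (one common convention, not the only one). The names `quantize` and `theoretical_snr_db` are hypothetical. The S/N estimate uses 20 * log10(2^bits), which reproduces the 96 dB and roughly 50 dB figures in the text.

```python
import math

def quantization_levels(bits):
    """Number of distinct levels an N-bit value can represent."""
    return 2 ** bits

def quantize(x, bits=16):
    """Map a real sample in [-1.0, 1.0] to a signed integer code, clamping overflow."""
    max_code = 2 ** (bits - 1) - 1     # 32767 for 16 bits
    min_code = -(2 ** (bits - 1))      # -32768 for 16 bits
    code = round(x * max_code)
    return max(min_code, min(max_code, code))

def theoretical_snr_db(bits):
    """Ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)

if __name__ == "__main__":
    print(quantization_levels(16))               # 65536
    print(quantize(0.3), quantize(-1.0))         # 9830 -32767
    print(f"{theoretical_snr_db(16):.1f} dB")    # 96.3 dB
    print(f"{theoretical_snr_db(8):.1f} dB")     # 48.2 dB
```

Each extra bit adds about 6 dB. A frequently quoted refinement for a full-scale sine wave adds about 1.76 dB (6.02 * bits + 1.76), but the simpler form above matches the numbers used in this document.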

The two-step process just described for sampling and quantizing sound digitally is termed Pulse Code Modulation (PCM). PCM is the standard method employed in the CD audio format.
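The two steps of PCM can be combined in one short Python sketch. Here `pcm_encode` is a hypothetical helper, and the input signal is assumed to be a function of time returning values between -1.0 and 1.0; the defaults follow the CD figures of 44.1 kHz and 16 bits.

```python
import math

def pcm_encode(signal, duration_s, rate_hz=44100, bits=16):
    """Linear PCM sketch: sample signal(t) at rate_hz, then quantize each sample.

    signal is assumed to return values in [-1.0, 1.0].
    """
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    n_samples = round(duration_s * rate_hz)
    codes = []
    for k in range(n_samples):
        x = signal(k / rate_hz)                          # step 1: sampling
        code = round(x * max_code)                       # step 2: quantization
        codes.append(max(min_code, min(max_code, code)))
    return codes

def tone_440(t):
    """A 440 Hz sine at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

if __name__ == "__main__":
    codes = pcm_encode(tone_440, 0.010)   # 10 ms at CD rate and bit depth
    print(len(codes), codes[:5])
```

Ten milliseconds at 44.1 kHz gives 441 samples, each stored as a plain 16-bit integer with no compression.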

Compression

For various reasons, audio data does not yield high compression ratios when standard text compression methods are employed. The CD audio standard (PCM) is commonly termed linear PCM: its quantization levels are evenly spaced, and it performs no compression, storing each sample as a separate 16-bit value. Because consecutive samples, a fraction of a millisecond apart, usually differ little from one another, the DPCM (Differential PCM) technique is widely used. DPCM stores only the difference between each sample and the previous one. Since this difference tends to be small, fewer bits are required to store it. The Compact Disc-Interactive (CD-i) format uses ADPCM (Adaptive DPCM) to achieve better compression. ADPCM increases and decreases the step size that each difference value represents as the amplitude of the sound signal changes.
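The difference-coding idea can be sketched in Python. The DPCM part is a straightforward lossless round trip. The ADPCM part is a toy scheme invented for this illustration, not the ADPCM variant used by CD-i or any other standard: each difference is coded as a 4-bit value from -8 to 7 times a step size, and the step doubles after a large code and halves after a small one. All function names and adaptation rules are hypothetical.

```python
def dpcm_encode(samples):
    """Store each sample as its difference from the previous one (first vs. 0)."""
    diffs, prev = [], 0
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by running-summing the differences."""
    samples, prev = [], 0
    for d in diffs:
        prev += d
        samples.append(prev)
    return samples

def adapt_step(step, code):
    """Toy adaptation rule: grow the step for large codes, shrink it for small ones."""
    if abs(code) >= 6:
        return step * 2
    if abs(code) <= 1:
        return max(1, step // 2)
    return step

def adpcm_encode(samples, step=16):
    """Quantize each difference to a 4-bit code (-8..7) in units of the current step."""
    codes, predicted = [], 0
    for s in samples:
        code = max(-8, min(7, round((s - predicted) / step)))
        codes.append(code)
        predicted += code * step          # track what the decoder will reconstruct
        step = adapt_step(step, code)
    return codes

def adpcm_decode(codes, step=16):
    """Mirror the encoder: the step size is rebuilt from the codes alone."""
    samples, predicted = [], 0
    for code in codes:
        predicted += code * step
        samples.append(predicted)
        step = adapt_step(step, code)
    return samples

if __name__ == "__main__":
    pcm = [100, 102, 105, 104]
    print(dpcm_encode(pcm))                        # [100, 2, 3, -1]
    ramp = [0, 10, 20, 30, 40]
    print(adpcm_decode(adpcm_encode(ramp)))        # [0, 8, 20, 28, 40]
```

DPCM reproduces the samples exactly but only saves space when the small differences are stored in fewer bits. The toy ADPCM always uses 4 bits per sample and is lossy, trading a small reconstruction error for a fixed, lower bit rate; because the decoder recomputes the step size from the codes, no step information needs to be stored.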

Audio File Formats

As with graphics and image files, different systems use different file formats for audio. Many of these have become de facto standards through their use on multiple platforms. The format can usually be identified by its file name extension. Some of the more popular extensions are listed in the following table:

Common Digital Audio File Formats
Extension      Name          System
.au or .snd    audio/sound   Sun, Mac, NeXT, UNIX
.aiff          AIFF          Apple, SGI
.voc           vocal         Soundblaster
.wav           WAVE          Microsoft
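As a small illustration, the Python sketch below identifies a format from the extensions in the table above and writes 16-bit mono PCM samples into a WAVE container using the standard-library `wave` module (WAVE files commonly hold linear PCM). The names `FORMATS_BY_EXTENSION`, `identify_format` and `write_wav` are hypothetical; the samples are assumed to already be integers in the 16-bit range.

```python
import io
import os
import struct
import wave

# Extensions and format names from the table above.
FORMATS_BY_EXTENSION = {
    ".au": "audio/sound",
    ".snd": "audio/sound",
    ".aiff": "AIFF",
    ".voc": "vocal",
    ".wav": "WAVE",
}

def identify_format(filename):
    """Guess the audio format from the file name extension (None if unknown)."""
    ext = os.path.splitext(filename)[1].lower()
    return FORMATS_BY_EXTENSION.get(ext)

def write_wav(stream, codes, rate_hz=44100):
    """Write 16-bit mono PCM samples (ints in -32768..32767) to a binary stream."""
    with wave.open(stream, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(rate_hz)
        # WAVE stores 16-bit samples little-endian, hence "<h".
        w.writeframes(struct.pack(f"<{len(codes)}h", *codes))

if __name__ == "__main__":
    print(identify_format("song.WAV"))       # WAVE
    buf = io.BytesIO()
    write_wav(buf, [0, 1000, -1000, 32767], rate_hz=8000)
    print(buf.getvalue()[:4])                # b'RIFF'
```

An extension is only a naming convention, so a program that must be certain of a file's format would also inspect the file's header bytes, such as the `RIFF` marker written here.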

Further Exploration

Many CD audio file formats exist and information about them can be found on the net.

For students who are interested, information on analog audio can also be found on the net.

A good introduction to computer sound from Allison Zhang at the School of Library and Information Studies, Dalhousie University, Halifax, N.S., Canada

Note: Much of the information for this document was taken from the digital audio references in the course readings.


Author: N. Dwight Barnette
Curator: Computer Science Dept : VA TECH © Copyright 1994.
Last Updated: 8/20/97