CS4624 Text - Ch 4
Digital Audio Representation and Processing
This chapter has a great deal of important material. Some of it will be considered again during the unit on compression. Some, such as the material on speech, is largely beyond the scope of this course. But the rest should be read several times, since audio plays a key role in multimedia systems.
4.1 Use of Audio in Computer Applications
- Sonification is like visualization, mapping data parameters to sound characteristics
- Audio is an important channel for communication, supported by an important human sense which can perceive subtle differences in numerous aspects.
4.2 Psychoacoustics- There is no need to represent audio detail that humans cannot perceive.
- Digital audio saves space by ignoring what can't be perceived, or is not perceived much.
4.2.1 Frequency Range of Human Hearing- 20 Hz - 20 kHz when young
- Frequency is a physical measure while pitch is a percept
4.2.2 Dynamic Range of Human Hearing- Threshold of audibility is 2.83E-4 dyne/cm^2 (SPL, or sound pressure level) for a 1 kHz sine wave.
- Threshold of audibility varies with freq.: highest at very low frequencies, decreasing to a minimum around 3-5 kHz (where hearing is most sensitive), rising again above that.
- decibel: dB = 20 log10(A/B), where A and B are two amplitudes (sound pressures)
- Threshold of pain is reached at 100-120 dB; 120 dB corresponds to 1E6 times more pressure than the lowest audible sound.
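The decibel arithmetic above can be checked numerically; a minimal sketch (the function name is mine, not from the text):

```python
import math

def db(a, b):
    """Decibel difference between two amplitudes (sound pressures): 20*log10(a/b)."""
    return 20 * math.log10(a / b)

# 120 dB above the threshold of audibility corresponds to a pressure
# ratio of 10**(120/20) = 1E6, matching the note above.
ratio = 10 ** (120 / 20)
print(db(ratio, 1))  # 120.0
```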
4.2.3 Spectral Characteristics in Human Hearing- Ignore most of this.
- People perceive differences in shape of frequency spectrum as timbre, which distinguishes the sound of different sources: piercing, boomy, attention-grabbing --- but we don't understand how to predict the effects from a spectrum.
4.2.4 Time-Varying Aspects of Natural Sound- Musical tones have attack (most important for identifying sound), steady-state, decay regions.
- Spectral components of tones vary in time slightly relative to each other, in frequency and amplitude.
4.2.5 Masking- One sound may partially or totally block perception of another (usually one at a higher freq.) that follows it within milliseconds, even when the masked sound is up to 20 dB above the threshold of audibility.
- Within a critical band around a sound, other sounds are masked; that band is around 100 Hz at low freq, 200 at 2 kHz, 500 at 4 kHz, 1000 at 5 kHz, 2000 at 10 kHz
- This is used in MPEG compression.
4.2.6 Phase- Waves 180 degrees out of phase cancel, so studios must be careful about phase shifts when mixing.
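A quick numeric check of the cancellation claim, as a sketch (sample count and frequency chosen arbitrarily for illustration):

```python
import math

N = 1000
wave = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]               # 5 cycles
shifted = [math.sin(2 * math.pi * 5 * n / N + math.pi) for n in range(N)]  # 180 deg out of phase
mixed = [a + b for a, b in zip(wave, shifted)]
print(max(abs(s) for s in mixed))  # ~0: the two waves cancel completely
```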
4.2.7 Binaural Hearing and Localization- Sound location and motion relates to differences in sound perceived between the 2 ears, and is mentally coordinated with visual perception of the source.
- Reverb relates to how we perceive sound in different types of rooms.
4.3 Digital Representations of Sound
4.3.1 Time-Domain Sampled Representations- sample frequencies: 11.025, 22.05, 32, 44.1, 48 kHz
- Nyquist frequency
- quantization noise
- PCM, 6 dB/bit, so 16 bits gives a 96 dB dynamic range
- ADC and anti-alias filter to remove freq. above Nyquist freq., DAC with reconstruction filter and possible oversampling
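The 6 dB/bit rule can be demonstrated by quantizing a full-scale sine and measuring the signal-to-noise ratio; a sketch, assuming a simple uniform round-to-nearest quantizer (theory predicts about 6.02*bits + 1.76 dB for a sine):

```python
import math

def quantize(x, bits):
    """Uniform quantizer for samples in [-1, 1]."""
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

def snr_db(bits, n=20000):
    """SNR of a quantized full-scale sine, in dB."""
    sig = err = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * 37 * i / n)  # 37 cycles over the buffer
        sig += x * x
        err += (x - quantize(x, bits)) ** 2
    return 10 * math.log10(sig / err)

print(round(snr_db(8), 1))   # about 50 dB
print(round(snr_db(16), 1))  # about 98 dB, matching the ~96 dB figure above
```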
4.3.1a Other Methods of Encoding the Analog Signal- adaptive quantization and difference coding: ADPCM (used in CD-I)
- logarithmic quantization steps: A-law and mu-law (u-law) give roughly 12-bit accuracy from 8 bits; CCITT Rec. G.711
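The idea behind mu-law can be sketched with the continuous companding curve (G.711 itself specifies a piecewise-linear approximation of this, so the code below is illustrative, not the standard's exact encoding):

```python
import math

MU = 255.0  # mu value used by North American/Japanese mu-law

def mulaw_compress(x):
    """Continuous mu-law curve, mapping [-1, 1] onto [-1, 1] logarithmically."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def mulaw_8bit(x):
    """Quantize the companded value to 8 bits (sign + 7 magnitude bits)."""
    return mulaw_expand(round(mulaw_compress(x) * 127) / 127)

# Small signals keep far more accuracy than linear 8-bit quantization:
x = 0.001
print(abs(mulaw_8bit(x) - x) < abs(round(x * 127) / 127 - x))  # True
```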
4.3.1b CD-Audio, CD-ROM, CD-I- CD-I ADPCM, quality levels A, B, and C
- Compare bandwidth, bits, hours fitting on a CD.
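The bandwidth/hours comparison can be done with simple arithmetic; a sketch assuming a nominal 650 MB data capacity (real disc capacities vary):

```python
CD_BYTES = 650 * 1_000_000  # assumed nominal capacity, for illustration only

def hours_on_cd(rate_hz, bits, channels):
    """Hours of uncompressed PCM audio that fit in CD_BYTES."""
    bytes_per_second = rate_hz * bits // 8 * channels
    return CD_BYTES / bytes_per_second / 3600

print(round(hours_on_cd(44100, 16, 2), 2))  # CD-DA quality stereo: ~1 hour
print(round(hours_on_cd(22050, 8, 1), 2))   # low-grade mono: ~8 hours
```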
4.3.2 Transform Representations
4.3.2a Fourier Methods- PCM time-domain signal transformed to frequency-domain Fourier coefficients
- discrete short-time Fourier transform = phase vocoder; closely related to additive synthesis
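The time-domain-to-frequency-domain step can be sketched with a naive discrete Fourier transform (O(N^2), purely for demonstration; real systems use the FFT):

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 64
signal = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]  # 5 cycles per frame
spectrum = [abs(c) for c in dft(signal)]
peak_bin = max(range(N // 2), key=lambda k: spectrum[k])
print(peak_bin)  # 5: the energy concentrates at the sine's frequency bin
```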
4.3.2b Subband Coding and MPEG Audio- Sony MiniDisc (ATRAC scheme) and Philips Digital Compact Cassette (DCC)
- MPEG - for compression unit
4.3.3 Subtractive-Based Representations
4.3.4 Parametric Representations- FM = frequency modulation; Yamaha sound synthesis
- waveshaping and other techniques
4.4 Transmission of Digital Sound- AES/EBU format: 16-20 audio bits, 2 channels, clock and status bits --- the multichannel MADI variant carries up to 56 audio channels at 24 bits/sample
- DCC uses 32, 44.1, 48 kHz sampling
4.5 Digital Audio Signal Processing (DSP)- filters, equalizer banks, gain control (compressor), echo (reverb) --- desktop audio
- Motorola DSP 56001, TI TMS 320, Sonic Solutions NoNoise system for Mac
4.5.1 Stereophonic and Quadraphonic Signal Processing Techniques- pan for left-right motion; reverb as a source moves away and softens; Doppler shift for fast motion
- Perceived location matters in practice, e.g., for pilots' audio warnings, and in conferencing so that a speaker's audio seems to come from his or her lips on screen
4.5.2 Architecture of an Audio Signal Processing Library- IEEE FORTRAN library
- NeXT for Motorola DSP 56001
- AT&T VCOS MML (Multimedia Module Library)
4.5.3 Editing Sampled Sound- Waveform for each channel: cut, copy, paste, label, synchronize with video (and SMPTE time code)
- Transform, effects, storing in sound formats
4.6 Digital Music-Making
4.6.1 Musical Instrument Synthesizers- synthesizers have some control or I/O device
- sampler uses stored sounds
- synthesis techniques: additive synthesis, FM, waveshaping
4.6.2 MIDI Protocol- Musical Instrument Digital Interface of MMA, the MIDI Manufacturers Association
- 8 data bits, start and stop bits --- all 10 bits transmitted at 31.25 kbaud
- General MIDI spec.: 128-voice Instrument Patch Map
- note on, off, pitch change, key number, velocity (attack), pressure (aftertouch)
- real-time music performance with absolute time
- Standard MIDI File Format - delta time
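The message bytes and the Standard MIDI File delta-time encoding can be sketched as follows (the function names are mine; the byte layouts follow the MIDI 1.0 and SMF specifications):

```python
def note_on(channel, key, velocity):
    """3-byte MIDI Note On: status byte 0x90 | channel, then key and velocity (7 bits each)."""
    return bytes([0x90 | (channel & 0x0F), key & 0x7F, velocity & 0x7F])

def delta_time(ticks):
    """Standard MIDI File variable-length quantity: 7 data bits per byte,
    continuation bit (0x80) set on every byte except the last."""
    out = [ticks & 0x7F]
    ticks >>= 7
    while ticks:
        out.append((ticks & 0x7F) | 0x80)
        ticks >>= 7
    return bytes(reversed(out))

print(note_on(0, 60, 100).hex())  # 903c64 - middle C with moderate attack velocity
print(delta_time(192).hex())      # 8140   - delta times >= 128 ticks need two bytes
```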
4.7 Brief Survey of Speech Recognition and Generation- telephone speech is 300-3400 Hz
- 8 kHz sampling
4.7.1 Speech Production (IGNORE SECTION)
4.7.2 Encoding and Transmitting Speech- Ignore about modeling human speech production
- CCITT G.711 for 8 bit PCM, 64 kb/sec
- CCITT G.721, ADPCM, 32 kb/s
- CCITT G.722, subband coding, 50-7000 Hz, 64 kb/s
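The bit rates above follow from rate-times-bits arithmetic; a sketch, where the 4 bits/sample figures for G.721 and G.722 are effective averages stated here as assumptions:

```python
def kbits_per_sec(rate_hz, bits_per_sample):
    """Bit rate in kb/s for a given sampling rate and bits per sample."""
    return rate_hz * bits_per_sample / 1000

print(kbits_per_sec(8000, 8))   # 64.0 - G.711 PCM
print(kbits_per_sec(8000, 4))   # 32.0 - G.721 ADPCM
print(kbits_per_sec(16000, 4))  # 64.0 - G.722 subband coding of 16 kHz samples
```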
4.7.3 Speech Synthesis (IGNORE SECTION)
4.7.4 Speech Recognition (IGNORE SECTION)
4.8 Digital Audio and the Computer- CD-DA needs about 176 kB/s; 48 kHz sampling leads to about 192 kB/s
- Hardware includes ADC, DAC, chips/cards for DSP, FM synthesis, etc. - e.g., SoundBlaster
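The byte rates above come directly from rate x bits x channels; a minimal sketch:

```python
def audio_bytes_per_sec(rate_hz, bits, channels):
    """Uncompressed PCM data rate in bytes per second."""
    return rate_hz * bits // 8 * channels

print(audio_bytes_per_sec(44100, 16, 2))  # 176400 -> the ~176 kB/s CD-DA rate
print(audio_bytes_per_sec(48000, 16, 2))  # 192000 -> ~192 kB/s at 48 kHz
```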
4.9 Closing Remarks- to be used, developments must sound good, not just follow a theory
- may free us from typing, hand-eye-based communication
Copyright 1996 Edward A. Fox