Digital Audio, Audio for Video, and Digital Video
Presented by: Bartek Plichta, Michigan State University
Project / Software Title: Digital Audio, Audio for Video, and Digital Video

Digital Audio Basics
There are many digital audio formats in use today. A digital audio file is a collection of encoded data that represents an analog acoustic signal. An analog signal is continuous, while its digital representation consists of discrete data (sample values) encoded by means of an encoding scheme such as PCM (Pulse Code Modulation). PCM is the most widespread audio encoding scheme, and it is the basis of popular multimedia file formats such as Microsoft WAV, Apple AIFF, and many others.
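
To make the idea of PCM more concrete, here is a minimal Python sketch of linear PCM encoding: a continuous sine wave is sampled at a fixed rate, and each sample is scaled and rounded to a signed integer code. The function names (`sample_sine`, `pcm_encode`) are hypothetical and chosen for illustration; real converters and file formats add details (dither, byte order, headers) that are not shown here.

```python
import math


def sample_sine(freq_hz, sample_rate, num_samples, amplitude=1.0):
    """Take discrete samples of a continuous sine wave at a fixed sample rate."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def pcm_encode(samples, bit_depth=16):
    """Quantize floating-point samples in [-1.0, 1.0] to signed linear PCM codes."""
    max_code = 2 ** (bit_depth - 1) - 1   # e.g. 32767 for 16-bit
    min_code = -(2 ** (bit_depth - 1))    # e.g. -32768 for 16-bit
    codes = []
    for x in samples:
        code = round(x * max_code)        # scale and round to the nearest level
        codes.append(max(min_code, min(max_code, code)))  # clip to the valid range
    return codes


if __name__ == "__main__":
    # Eight samples of a half-scale 1 kHz tone at the CD sample rate.
    analog = sample_sine(1000, 44100, 8, amplitude=0.5)
    print(pcm_encode(analog, 16))
```

The rounding step is where information is lost: any amplitude between two codes is forced onto the nearest one.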

There are many ways in which audio files can be categorized. One such categorization is the distinction between lossless and lossy encoding. Lossless files, such as uncompressed PCM files, preserve all of the sample values. The quality of such files is a function of two basic parameters: sample rate and bit depth. The sample rate, expressed in Hz, is the rate at which the analog waveform is captured (digitized). For example, a sample rate of 10,000 Hz means that the analog-to-digital converter recorded the amplitude of the waveform 10,000 times per second. The bit depth is the resolution of the digital audio file, that is, the level of detail at which the sample values were recorded. For example, a bit depth of 16 means that the waveform amplitudes (voltages) were recorded as 16-bit digital words, which allows 65,536 possible values. The difference between the true analog amplitude and the nearest value that can be recorded (quantization error) is heard as noise. The higher the bit depth, the greater the dynamic range, i.e., the difference between very soft and very loud amplitudes. A 16-bit audio file has a theoretical dynamic range of 96 dB. 16-bit PCM audio at a sample rate of 44.1 kHz is the current audio CD standard. The newer DVD-Audio standard supports 96,000 Hz/24-bit audio.
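
The figures above follow from simple arithmetic, sketched below in Python. The sketch assumes linear PCM and uses the common rule of thumb that each bit adds 20·log10(2), or about 6.02 dB, of dynamic range; some texts add a further 1.76 dB for a full-scale sine wave, which is not included here. The helper names are hypothetical.

```python
import math


def quantization_levels(bit_depth):
    """Number of distinct amplitude values an N-bit word can represent."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of linear PCM: 20*log10(2**N), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bit_depth)


def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bit_depth * channels


if __name__ == "__main__":
    print(quantization_levels(16))              # distinct values in a 16-bit word
    print(round(dynamic_range_db(16), 1))       # CD-quality dynamic range in dB
    print(round(dynamic_range_db(24), 1))       # 24-bit dynamic range in dB
    print(pcm_bit_rate(44100, 16, 2))           # stereo CD audio, bits per second
```

For stereo CD audio this gives 1,411,200 bits per second, which is the baseline that lossy formats try to reduce.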

Lossy compression, on the other hand, does not preserve all of the digitized sample values. Current lossy compression schemes are based on the psychoacoustic properties of human sound perception. Human hearing is limited in both the frequency and the time domain. We can only hear a certain range of frequencies (roughly 20-20,000 Hz), and we can only resolve a limited number of changes over time. For instance, if we ask a good drummer to play a drum roll that starts slowly and speeds up, at some point we lose the ability to hear individual strokes and hear the roll as a continuous percussive sound. Psychoacoustic compression takes advantage of these limitations of human auditory perception and discards the parts of the signal that would be hard or impossible for humans to hear. Some of the most common lossy compression standards include MP3, Windows Media Audio, and AAC.
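
The core idea of "throw away what is hard to hear" can be illustrated with a deliberately crude Python toy: transform a short frame to the frequency domain, zero every component that is far weaker than the strongest one, and transform back. This is an assumption-laden stand-in, NOT the masking model used by MP3, Windows Media Audio, or AAC, which rely on filter banks, frequency-dependent hearing thresholds, and masking curves. The naive DFT and the `threshold_db` parameter are hypothetical choices for clarity.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2)); fine for short illustrative frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def discard_weak_components(frame, threshold_db=-40.0):
    """Toy 'perceptual' pruning: zero every frequency bin that lies more than
    |threshold_db| below the strongest bin, then resynthesize the frame.
    Returns (reconstructed_frame, number_of_bins_kept).
    NOTE: not a real psychoacoustic model; it only illustrates discarding
    spectral content that is assumed to be inaudible."""
    spectrum = dft(frame)
    peak = max(abs(c) for c in spectrum)
    floor = peak * 10 ** (threshold_db / 20)   # threshold as a linear magnitude
    kept = [c if abs(c) >= floor else 0j for c in spectrum]
    nonzero = sum(1 for c in kept if c != 0)
    return idft(kept), nonzero
```

For example, a loud tone mixed with a second tone 60 dB quieter keeps only the two bins of the loud tone (its positive and negative frequency), so far fewer values would need to be stored.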

Preservation, Analysis, and Delivery
We use different audio file parameters for different purposes. Digital audio files intended for long-term preservation should be sampled at a high sample rate (at least 96 kHz) with 24-bit resolution. Files to be used for acoustic (or discourse) analysis should be resampled to 10,000 or 16,000 Hz (depending on the talker's fundamental frequency, F0) with 16-bit resolution. Files to be used for delivery can be processed (e.g., normalized, equalized, compressed) and converted to a streaming audio format such as Windows Media, RealAudio, MP3, or others.
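
Resampling a 96 kHz preservation master to 16 kHz for analysis is a decimation by an integer factor of 6. The Python sketch below shows the general technique under simplifying assumptions: a windowed-sinc (Hamming) low-pass FIR filter removes everything above the new Nyquist frequency, and then every sixth sample is kept. The filter length, the cutoff margin, and the function names are hypothetical choices. It does not handle non-integer ratios such as 96,000 to 10,000 Hz (which need a rational-factor resampler), and it does not reduce bit depth from 24 to 16 bits.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass FIR filter with a Hamming window.
    cutoff is in cycles per sample (0 < cutoff < 0.5); taps are normalized
    to unity gain at DC."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (sinc), sampled around the center tap.
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def decimate(samples, factor, num_taps=101):
    """Low-pass filter below the new Nyquist frequency, then keep every
    factor-th sample. Filtering first prevents aliasing."""
    taps = lowpass_fir(num_taps, cutoff=0.45 / factor)  # slightly below 0.5/factor
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i - k
            if 0 <= j < len(samples):
                acc += h * samples[j]
        out.append(acc)
    return out
```

A 1 kHz tone passes through essentially unchanged, while a 30 kHz tone (which would otherwise alias into the 0-8 kHz band of the 16 kHz file) is strongly attenuated. Filtering before discarding samples is the essential design choice: simply dropping samples would fold high-frequency energy back into the speech band.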

Doing Audio for Video
When recording video, many field workers forget about audio. While moving images are extremely powerful, the accompanying audio track should not be neglected. Video formats contain audio and video tracks synchronized by time code (such as SMPTE time code). Current digital video software (or older analog hardware) can easily separate those tracks. The audio track is most typically captured with the camcorder's built-in microphone, which results in a very noisy recording with very low voice levels. One should always use an off-camera microphone (or several microphones) and capture the audio signal meticulously. There is a variety of microphones, field mixers, and preamplifiers that field researchers can use. Shotgun microphones, often mounted on a boom, have a highly directional pickup pattern and can acquire a crisp and loud speech signal from a few feet away. Lavalier microphones, which are pinned to the speaker's lapel, collar, or tie, are a great alternative to shotgun microphones. Many of them are omnidirectional, but some manufacturers, such as Audio-Technica, make excellent directional lavalier microphones. A good field mixer/preamplifier is a must, as are monitoring headphones.
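
Time code is what lets software line up a separately recorded audio track with the picture. The Python sketch below converts between a frame count and an HH:MM:SS:FF time code string and maps a time code to an audio sample position. It assumes an integer frame rate (for example 25 fps) and non-drop-frame counting; NTSC drop-frame time code at 29.97 fps skips certain frame numbers and needs extra logic that is not shown. The function names and the 48,000 Hz default are illustrative assumptions.

```python
def frames_to_timecode(frame_count, fps):
    """Convert a frame count to non-drop-frame HH:MM:SS:FF (integer fps only)."""
    frames = frame_count % fps
    total_seconds = frame_count // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def timecode_to_frames(timecode, fps):
    """Convert non-drop-frame HH:MM:SS:FF back to a frame count."""
    hours, minutes, seconds, frames = (int(p) for p in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def timecode_to_audio_sample(timecode, fps, sample_rate=48000):
    """Index of the first audio sample that belongs to the given video frame."""
    return timecode_to_frames(timecode, fps) * sample_rate // fps


if __name__ == "__main__":
    print(frames_to_timecode(90061, 25))                   # one hour, two seconds, 11 frames
    print(timecode_to_audio_sample("00:00:00:01", 25))     # samples per frame at 25 fps
```

At 25 fps and 48 kHz each video frame spans exactly 1,920 audio samples, which is why an external recording can be aligned frame-accurately once both devices share time code.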

Digital Video for Archival Purposes
As in the audio world, there are many competing digital video formats on the market today, and they are less stable than their audio counterparts. Avid Technology is one of the biggest and most innovative companies specializing in digital video. Avid uses the Open Media Framework (OMF) to capture and store metadata related to digital video files. Avid systems can digitize and store uncompressed Standard Definition (SD) and High Definition (HD) video, as well as 16 mm and 35 mm film transferred by means of telecine technology. Avid codecs (compressor/decompressor) are proprietary, yet they are probably the de facto industry standard in contemporary digital video. Video and film should be digitized with a high-end hardware/software solution such as the Avid Media Composer system. Born-digital material should be acquired with an HD digital camera. Consumer-level formats such as DV or DVCAM should be avoided. Having said that, a very good audio signal can be acquired even with a DV camera, since DV can record 16-bit, 48,000 Hz PCM audio.
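
Once an audio track has been separated from the video and exported as a WAV file (an assumed workflow), it is worth confirming that it actually has the expected parameters before archiving or analysis. The sketch below uses Python's standard `wave` module to read the sample rate, bit depth, and channel count; `describe_wav` and `meets_spec` are hypothetical helper names. Note that `wave` handles plain PCM WAV files, and older Python versions may reject some 24-bit files written with the extensible WAV header.

```python
import wave


def describe_wav(source):
    """Return (sample_rate_hz, bit_depth, channels) for a PCM WAV file.
    source may be a file path or a binary file-like object."""
    with wave.open(source, "rb") as w:
        return w.getframerate(), w.getsampwidth() * 8, w.getnchannels()


def meets_spec(source, min_rate_hz, min_bits):
    """True if the file's sample rate and bit depth are at least the given values."""
    rate, bits, _ = describe_wav(source)
    return rate >= min_rate_hz and bits >= min_bits
```

For example, `meets_spec("dv_audio.wav", 48000, 16)` (a hypothetical file name) would be expected to pass for DV audio recorded at 16-bit/48 kHz, while `meets_spec(path, 96000, 24)` checks a file against the preservation parameters recommended earlier.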

Digital Video Storage
Due to its sheer size and the number of competing video codecs, digital video poses a serious challenge for preservation storage. Most movie studios use Fibre Channel networks and SCSI disk storage for post-production, while magnetic tape is still the most widely used video preservation medium. Optical media such as DVDs are simply not large enough to store uncompressed video.
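
A quick calculation shows why optical media fall short. The Python sketch below assumes NTSC standard-definition video at 720 x 486 pixels, 8-bit 4:2:2 sampling (2 bytes per pixel on average), and 30000/1001 (about 29.97) frames per second, with audio ignored; 10-bit or HD material would be considerably larger. The 4.7 GB figure is the capacity of a single-layer DVD. The helper names are hypothetical.

```python
def video_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Data rate of uncompressed video in bytes per second."""
    return width * height * bytes_per_pixel * fps


def seconds_on_medium(capacity_bytes, bytes_per_second):
    """How many seconds of material fit on a medium of the given capacity."""
    return capacity_bytes / bytes_per_second


if __name__ == "__main__":
    rate = video_bytes_per_second(720, 486, 2, 30000 / 1001)
    print(f"{rate / 1e6:.1f} MB per second")
    print(f"{rate * 3600 / 1e9:.1f} GB per hour")
    print(f"{seconds_on_medium(4.7e9, rate) / 60:.1f} minutes per single-layer DVD")
```

Under these assumptions uncompressed SD video needs roughly 21 MB per second, or about 75 GB per hour, so a single-layer DVD holds less than four minutes of it.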

Many off-the-shelf video storage solutions are on the market, but they should be considered with caution because of their proprietary nature. For example, an Avid SCSI storage solution may not work with a non-Avid platform.

Digital Video Delivery
Digital video can be compressed in ways similar to digital audio. Some of the most common delivery formats include Windows Media (currently offering the best size-to-quality ratio), RealVideo, QuickTime, and MPEG. The DVD-Video standard (MPEG-2) is currently the most common consumer video format.
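
When preparing a delivery file, a practical first step is a bit-rate budget: how many bits per second the encoder may use so that the result fits the target medium. The Python sketch below computes the average video bit rate for a given capacity and running time, after reserving an assumed audio bit rate, and a simple compression ratio. Container and file-system overhead are ignored, and the 448 kbit/s audio figure in the usage note is an illustrative assumption. The function names are hypothetical.

```python
def average_video_bitrate(capacity_bytes, duration_s, audio_bps=0):
    """Average video bit rate (bits/s) that fills capacity_bytes over duration_s,
    after reserving audio_bps for the soundtrack. Ignores container overhead."""
    total_bps = capacity_bytes * 8 / duration_s
    return total_bps - audio_bps


def compression_ratio(uncompressed_bps, compressed_bps):
    """How many times smaller the compressed stream is than the original."""
    return uncompressed_bps / compressed_bps


if __name__ == "__main__":
    video_bps = average_video_bitrate(4.7e9, 2 * 3600, audio_bps=448_000)
    print(f"{video_bps / 1e6:.2f} Mbit/s for video on a 2-hour single-layer DVD")
    print(f"{compression_ratio(1_411_200, 128_000):.1f}:1 for CD audio at 128 kbit/s")
```

Two hours on a 4.7 GB disc leaves an average of about 5.2 Mbit/s in total, or roughly 4.8 Mbit/s for video once the audio is reserved. The same arithmetic shows that a 128 kbit/s audio stream is about eleven times smaller than stereo CD audio.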

For more information, please visit my web site. Should you need specific information and advice, please send me an email using the email form on that web site.

Bartek Plichta
Michigan State University
