If you want to be able to understand the basic terms and learn the concepts you need to properly use audio in ANY situation and not just according to a sheet of standard instructions, this introduction to the principles of digital audio is for you. Words in capitals are key terms you should understand after reading this text, or after doing research elsewhere.


Sound is simply fluctuations in air pressure: waves that make a given point in the air alternate between slightly higher pressure than normal and slightly lower, many times per second. A MICROPHONE (MIC) is a device that converts those sound waves into waves of electricity, an ANALOG signal of pulsing electrons. For ideal sound quality, a PREAMP of some sort (often part of a MIXING BOARD, but sometimes a stand-alone device) strengthens the signal before it goes into the LINE-IN of your computer's sound card to become a DIGITAL SIGNAL. Alternatively, some mixing boards and even MICs OUTPUT a digital signal directly, connecting to a computer through a USB or FireWire interface.

Essentially, analog equipment stores information as very subtle changes in voltage or magnetism, stored for example on the magnetic ribbon of an audio tape. Digital storage keeps every piece of information about the audio as either a one or a zero, so as long as the disc is still readable (burned CDs have been known to oxidize and fail after 5-10 years), every copy, and every copy of a copy, you make of a CD will sound just as good as the original. In contrast, copies of copies of audio cassettes quickly lose sound quality.

All sound electronics still have some analog parts, even a CD burner. Any piece of wire, particularly if it's uninsulated, can act as a radio antenna. That means that if there is enough radio or magnetic energy near a wire, it can interfere with the signal. The advantage of digital transfer of data, whether through a wire or radio waves, is error checking such as the CRC (Cyclic Redundancy Check). Every PACKET of data carries a checksum computed from its contents; the receiver recomputes the checksum, and any packet that doesn't add up as anticipated is sent again. So once again, if digital data is readable at all, it comes through perfectly.
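You can see this redundancy check in miniature with Python's standard zlib library (the packet contents below are made up purely for illustration):

```python
import zlib

# A PACKET of data (hypothetical contents, for illustration only).
packet = b"a chunk of digital audio data"

# The sender computes a CRC checksum from the packet's contents.
checksum = zlib.crc32(packet)

# The receiver recomputes it; a match means the packet arrived intact.
assert zlib.crc32(packet) == checksum

# Flip a single bit to simulate radio interference on the wire:
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]

# The checksums no longer add up, so this packet would be sent again.
assert zlib.crc32(corrupted) != checksum
```

Note that the CRC itself only detects that something went wrong; the resending is handled by the transfer protocol (TCP, for example).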

MICs all vary in sensitivity, range, and directionality. Three main pickup patterns are OMNIDIRECTIONAL (which picks up sound from all directions), CARDIOID (which primarily detects sounds in a heart-shaped zone in front of the mic), and SHOTGUN (which blocks out almost everything except just what the mic is pointed towards). ANALOG MICs connect to other hardware to share their MONO signal using 1/4 inch, 1/8 inch, or XLR plugs. XLR interfaces produce the best results, since they carry a BALANCED signal (an extra wire lets the receiving end cancel out interference) and can supply PHANTOM POWER to an ACTIVE (powered) microphone, as opposed to a PASSIVE one.

However the audio signal enters your computer, certain HARDWARE DRIVERS are involved: software, indexed in the DEVICE MANAGER, that interfaces between your Windows operating system and your sound hardware. Manufacturers of sound hardware occasionally release driver updates, which may correct design flaws or add features. Once your operating system knows how to interface with your speakers and sound INPUTs, Audacity should be able to as well, provided the appropriate input is selected in its pull-down menu and the appropriate hardware is selected (EDIT, PREFERENCES, AUDIO I/O) for playback and recording.


Sound is simply fluctuations in air pressure. You can prove this in Audacity by recording your voice and zooming in until you see the waves. The FREQUENCY, or number of times per second ("Hertz", Hz) that there is a peak in a standard sine wave (Generate, Tone, 300-440), determines its PITCH. In speech, both pitch and volume vary rapidly and uniquely, making everyone's voice distinct.

Zoom in even farther until you see the dots making up each wave. These are SAMPLES of the air pressure at a precise instant, at a given point (where the microphone was). This information is stored DIGITALLY or "IN BINARY," meaning as ones and zeros.
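As a sketch of where those dots come from, here is how one second of a pure tone could be generated as SAMPLES in Python (the 440 Hz pitch and CD sample rate are just example values):

```python
import math

SAMPLE_RATE = 44_100   # samples per second (the CD standard)
FREQUENCY = 440        # Hz; the pitch of the note A above middle C

# Each SAMPLE is the wave's amplitude at one precise instant.
samples = [math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE)
           for n in range(SAMPLE_RATE)]   # one second of audio

print(len(samples))    # 44100 dots, like the ones visible in Audacity
```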

Base-2 -> Base-10
0 . . . . . . . 0
1 . . . . . . . 1
10 . . . . . . 2
11 . . . . . . 3
100 . . . . . 4
101 . . . . . 5
110 . . . . . 6
111 . . . . . 7

And so on. If you continue like this, you'll find that 8 digits of BINARY (base-2) can distinguish 256 different standard (base-10) numbers, the largest being 255, and that 16 digits can distinguish 65,536... 32 BITS (places of ones and zeros) can distinguish nearly 4.3 billion values. Any DATA can be interpreted as text, an image, video, audio, or anything else if you know what STANDARD FORMAT to expect, but the CONTENT will only make sense and be usable by Audacity if the program knows how to interpret and write to audio files.
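A quick sketch in Python confirms how each extra BIT doubles the number of values available:

```python
# Each additional BIT doubles the number of distinct values a
# sample can take (counting starts from zero).
for bits in (8, 16, 32):
    values = 2 ** bits       # how many distinct numbers fit in this many bits
    largest = values - 1     # the largest number itself
    print(bits, values, largest)

# 8 bits  ->         256 values (largest: 255)
# 16 bits ->      65,536 values (largest: 65,535)
# 32 bits -> 4,294,967,296 values (nearly 4.3 billion)
```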

For now we're considering UNCOMPRESSED AUDIO, the primary formats for Audacity being WAV (the Microsoft-defined standard) and AU files. To avoid confusion, these standards are kept track of with unique FILE EXTENSIONs, the part of the FILE NAME stored after the period. Windows is often configured to hide these extensions to avoid confusing people, but they can be made visible (in Windows 2000 and other versions) in Explorer by selecting TOOLS, FOLDER OPTIONS, VIEW, and UN-checking "Hide extensions for known file types."

The SAMPLE FORMAT, or BIT DEPTH, is the number of ones and zeros used to plot each SAMPLE, which determines its degree of precision. Standard COMPACT DISCs use 16-bit samples at 44,100 Hz, using no COMPRESSION. This is a good standard that is usually more than enough precision for voice and quite adequate for music, but Audacity is often set (EDIT, PREFERENCES, QUALITY, DEFAULT SAMPLE FORMAT) to record and internally manipulate audio in 24 or 32-bit SAMPLE FORMATs to avoid degradation if many EFFECTS are applied to it. But the more ones and zeros you use to represent each moment of audio, the more space the project files will take up on your computer. 16 bits is generally plenty, and was until recently the standard BIT DEPTH most SOUND CARDS used for recording and playback. While the nearly 4.3 billion distinct degrees of air pressure a 32-bit sample can record 44,100 times a second is far more precise than the 65,536 possibilities a 16-bit sample can convey, only the most expert ear will ever be able to notice a difference.
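The space cost is easy to work out: uncompressed size is simply sample rate times bytes per sample times channels times duration. A minimal sketch in Python (the one-minute duration is just an example):

```python
def wav_size_mb(sample_rate, bit_depth, channels, seconds):
    """Size of uncompressed audio in megabytes."""
    return sample_rate * (bit_depth // 8) * channels * seconds / 1_000_000

# One minute of CD-quality stereo (16-bit, 44,100 Hz):
print(round(wav_size_mb(44_100, 16, 2, 60), 1))   # 10.6 (MB)

# The same minute in a 32-bit SAMPLE FORMAT takes twice the space:
print(round(wav_size_mb(44_100, 32, 2, 60), 1))   # 21.2 (MB)
```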

FREQUENCY just means the number of times per second that _something_ happens. In terms of audio, this either refers to the PITCH and WAVELENGTH, which are directly related, or the number of times per second that a SAMPLE is taken. Either is measured in Hz. Note that while 32-bit samples taken at 22,050 Hz take up the same amount of hard drive space as 16-bit samples taken at 44,100 Hz, the 16-bit audio will sound much better than the lower-frequency version. A recording can only capture pitches up to half its sample rate (the NYQUIST limit), so 22,050 Hz audio loses everything above about 11 kHz, well within the range people can hear. Using 32-bit samples at 48 kHz is probably overkill in all but the most extreme of situations. That's using a lot of ones and zeros for every second of audio, which will slow down all your editing and use of effects.
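The space claim, and the reason the lower sample rate sounds worse, can both be checked with a short sketch:

```python
# Two formats that cost the same number of ones and zeros per second...
for sample_rate, bit_depth in ((22_050, 32), (44_100, 16)):
    bits_per_second = sample_rate * bit_depth   # 705,600 in both cases
    highest_pitch = sample_rate / 2             # the Nyquist limit, in Hz
    print(bits_per_second, highest_pitch)

# ...but the 22,050 Hz version can't capture any pitch above 11,025 Hz,
# while the 44,100 Hz version reaches up to 22,050 Hz.
```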

Scientists in Europe at the Fraunhofer Society and elsewhere developed a new kind of AUDIO COMPRESSION that became a standard in the early 1990s (see Wikipedia: MP3). They studied the way people hear, how the brain interprets sound, and which details are noticed and which are ignored. Using these scientific principles, they developed the LOSSY MP3 format, which "LOSES" details of audio that are unlikely to be missed and can reproduce audio that is very difficult to distinguish from the UNCOMPRESSED version of the file, yet takes up about a tenth of the space on a hard drive. This also makes audio much easier to transmit over the INTERNET, whatever BANDWIDTH CAPACITY you have (measured in Mb/sec, or millions of bits per second). Since the internet is a limited and shared resource that costs money, it makes sense to use it wisely and efficiently.
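The "tenth of the space" figure is easy to check against CD-quality audio with a little arithmetic:

```python
# CD-quality uncompressed audio: 44,100 samples/sec, 16 bits each, 2 channels.
uncompressed_kbps = 44_100 * 16 * 2 / 1000   # 1,411.2 kilobits per second

# A typical MP3 of the same audio:
mp3_kbps = 128

print(round(uncompressed_kbps / mp3_kbps, 1))   # 11.0 -- about a tenth the size
```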

There are many different compression "standards," seemingly almost as many as there are companies designing audio electronics. Being the author of a standard that becomes dominant can bring significant royalties. The standards use a variety of strategies for representing audio well using few bits, and each has its specialties. Some are even LOSSLESS, meaning that the compressed file still exactly represents the original uncompressed audio. But mostly these various formats are just a source of confusion and software incompatibility. File-format CONVERSION software is useful to get past incompatibilities, but each time you convert audio between LOSSY compression formats, more DISTORTION results. This means that the resulting audio may start to have noticeable ARTIFACTS of the compression process that can distract a listener from your content. Many variables affect how many times audio can be converted between lossy compression formats, but the higher the bitrate used in each file, the fewer the conversions, and the fewer different compression standards involved, the better the outcome. Note that an MP3 file converted back to an uncompressed WAV file will sound worse than the original WAV file that was used to create the MP3. Unless you still have the original WAV, that degree of fidelity is lost forever.

There are two OPEN SOURCE, totally royalty-free audio formats that are quite efficient, but may require you to install CODEC software (a program telling your computer how to ENCODE [create] and DECODE [open] content): OGG (lossy) and FLAC (Free LOSSLESS Audio Codec). These should become the standards of the future, but for now, the old MP3 format remains dominant. Stereo MP3 files that use 128 kbps (KILO [thousand] BITS [ones and zeros] per second) are considered to be approximately "CD quality." OGG files using 100 kbps sound about as good. In MONO, 40 kbps MP3s sound acceptable when only recording voice.
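Those bitrates translate directly into file sizes. A minimal sketch (the one-minute durations are arbitrary examples):

```python
def size_mb(kbps, seconds):
    """File size in megabytes for a constant-bitrate compressed file."""
    return kbps * 1000 * seconds / 8 / 1_000_000   # bits -> bytes -> MB

# One minute of "CD quality" stereo MP3 at 128 kbps:
print(round(size_mb(128, 60), 2))   # 0.96 (MB)

# One minute of mono voice at 40 kbps:
print(round(size_mb(40, 60), 2))    # 0.3 (MB)
```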

Some audio formats have a variety of optional settings which can impact the quality of the resulting audio. Some of these options can increase data efficiency and sound quality, such as using VBR (Variable Bit Rate) for MP3s. You can find these options by looking at the settings (F4) in AUDIO CONVERSION SOFTWARE like CDEX. Note that using any options that differ from the most common standard (for MP3: very-high "quality," CBR [Constant Bit Rate], 16-bit, 44.1 kHz, stereo) risks exposing an incompatibility with the listener's player software or portable player hardware. Newer standard CODECs like OGG optimize most of these options automatically to prevent problems and confusion, but are less flexible. Since audio should be for the masses and not just technicians, simplicity is ideal. Generally, listeners need LOSSY compressed audio, while technicians need UNCOMPRESSED audio throughout production and LOSSLESS compression for critical archives.


Once your content is encoded in a compressed audio file, there are several ways to deliver it to the listener. The simplest way is for the listener to go to your website, download the files, and play them on their computer. They can also transfer the files to a portable player. PODCASTING is the use of software such as iTunes that automates the downloading of desired content and its transfer to a portable player as soon as new subscribed content becomes available.

As long as you remain connected to the internet while listening, the use of STREAMS can be the simplest and most efficient way to receive content. In this case, the audio file isn't downloaded as fast as your internet connection allows, but only as fast as is needed for real-time playback. Sometimes BUFFERING is included: a delay in playback that gives the download a "head start," in case your connection is interrupted or clogged momentarily. Most lossy audio formats can be opened as streams instead of being downloaded in full before playback.
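How much "head start" a buffer provides is just its size divided by the playback bitrate. A sketch (the buffer size here is an arbitrary example):

```python
def buffer_seconds(buffer_bytes, stream_kbps):
    """Seconds of playback held in a buffer of the given size."""
    return buffer_bytes * 8 / (stream_kbps * 1000)   # bytes -> bits

# A hypothetical 160,000-byte buffer on a 128 kbps stream
# gives the download a ten-second head start:
print(buffer_seconds(160_000, 128))   # 10.0
```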

Often, FILE EXTENSIONS are used to define how audio files should be opened. M3U files are simply standard TEXT files that describe the order and location of audio files to open as a STREAM. Special software may be needed to ENCODE content as a stream, or to make the computer it's on a STREAMING SERVER that distributes the stream to CLIENTS. WMA (Windows Media Audio) and RA (Real Audio) streams use bits more efficiently than MP3, but are far less compatible with various computers and player software.
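Since an M3U file is plain text, it's easy to sketch what a player does with one. The file names and URL below are made up purely for illustration:

```python
# A minimal M3U playlist: one audio file location per line, in playback
# order; lines starting with '#' are comments or extended info.
playlist = """#EXTM3U
intro.mp3
episode1.mp3
http://example.com/audio/episode2.mp3
"""

# A sketch of what a player does: keep every non-blank, non-comment line.
entries = [line.strip() for line in playlist.splitlines()
           if line.strip() and not line.startswith("#")]
print(entries)
```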


Understanding these concepts, you should be able to determine the optimal settings and formats to use when working with audio: the best compromise among your HARDWARE limitations, the CONTENT involved, all the MEDIA the content will be played through, and the needs and skill level of your LISTENERS.

BACK to the producers help page.

Written Nov 2007