LET’S TALK ABOUT AUDIO
Speed of Sound: 340 m/s at 24°C
At the forefront: Bell Telephone Laboratories
Audio Theory Basics
UNDERSTANDING AUDIO
WHAT IS AUDIO
Simply put, audio is the range of frequencies detectable by the human ear. The exact range varies from source to source, but most agree that 20 Hz to 20,000 Hz is a reasonable approximation of the human hearing range. The true range varies from person to person, depending primarily on age, lifestyle and genetics. Those of us exposed to loud environments tend to lose the ability to hear frequencies within the widely accepted range much faster than everyone else. City life, traffic jams, air conditioning noise, building exhaust fans, fluorescent lights, clubs, bars and so on all have a tremendous impact on the overall health of our ears.
WHAT IS SOUND
When any object vibrates, the air molecules surrounding it begin to vibrate sympathetically in all directions, creating a series of sound waves. These sound waves then create vibrations in the eardrum that the brain perceives as sound.
WAVEFORM EXPLAINED
A waveform is a graphical representation of sound pressure level or voltage level as it moves through a medium over time. The fundamental characteristics of a waveform can be broken down into the following:
AMPLITUDE
Directly proportional to air pressure, amplitude defines the actual volume or loudness of a waveform.
SOUND PROPAGATION & ACOUSTICS
WAVELENGTH
Defines the distance required to complete one waveform cycle.
FREQUENCY
The number of rarefactions and compressions, or ‘cycles’, completed every second is referred to as the operating frequency and is measured in hertz (Hz). Any vibrating object that completes, say, 300 cycles/s has a frequency of 300 Hz, while an object that completes 3000 cycles/s has a frequency of 3 kHz.
VELOCITY
The speed of sound, which is about 340 m/s in air at 24°C.
PHASE
When we deal with multiple frequencies, there can be constructive and destructive interference, or a combination of both.
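Wavelength, frequency and velocity are tied together, and a few lines of Python can make that concrete. This is only a sketch: it assumes the standard relation wavelength = velocity / frequency (not spelled out in the text), uses the approximate 340 m/s figure quoted above, and the function name `wavelength_m` is invented for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 340.0  # approximate figure quoted in the text


def wavelength_m(frequency_hz: float,
                 velocity_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Return the wavelength in metres of a tone at the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return velocity_m_per_s / frequency_hz


# Low frequencies have long wavelengths, high frequencies short ones.
for frequency in (20, 300, 3000, 20000):
    print(f"{frequency} Hz -> {wavelength_m(frequency):.3f} m")
```

Under these assumptions, the two ends of the hearing range come out at roughly 17 m (20 Hz) and 17 mm (20 kHz).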
HARMONIC CONTENT
Each sound has a main frequency, called the fundamental frequency. In addition, sounds have a number of harmonic frequencies, which are whole-number multiples of the fundamental frequency (more on this later).
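As a small illustration of harmonics being multiples of the fundamental, the sketch below lists the first few for a given fundamental. The helper name `harmonic_series` and the 110 Hz example value are assumptions chosen for illustration; the fundamental itself is counted as the first harmonic.

```python
def harmonic_series(fundamental_hz: float, count: int) -> list[float]:
    """Return the first `count` harmonics, starting with the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]


# A 110 Hz fundamental with its next three harmonics.
print(harmonic_series(110.0, 4))  # [110.0, 220.0, 330.0, 440.0]
```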
QUALITY OF SOUND
SAMPLE RATE
With any digital recording system, sound has to be converted from an analogue signal (i.e. the sound you hear) to a digital format that the device can work with. Any digital recording device accomplishes this by measuring the waveform of the incoming signal at specific intervals and converting these measurements into a series of numbers based on the amplitude of the waveform. Each of these numbers is known as an individual sample, and the total number of samples taken every second is called the sample rate. On this basis, the more samples taken every second, the better the overall quality of the recording. For instance, if a waveform is sampled 2000 times/s, it will produce more accurate results than if it were sampled 1000 times/s.
While this may seem basic in principle, it becomes more complex when you consider that the sampling rate must be sufficiently higher than the frequency of the waveform being recorded in order to produce accurate results. If it isn’t, then the analogue-to-digital converter (ADC) could miss anything from half to entire cycles of the waveform, resulting in a side effect known as ‘aliasing’. This is the result of a real-world audio signal not being ‘measured’ in the correct places. For instance, a high-frequency waveform could confuse the ADC into believing it’s
actually of a lower frequency, which would effectively introduce a series of spurious low-frequency spikes into the audio file. To avoid this problem, you must make sure the sampling rate is greater than twice the highest frequency in the waveform, a principle known as the Nyquist–Shannon sampling theorem. Put simply, to recreate any waveform accurately in digital form, at least two different points in each of the waveform’s cycles must be sampled. Consequently, as the upper limit of human hearing is approximately 20 kHz, the sample rate should be just over double this figure. This is the principle from which CD quality is derived. That is, the sample rate of a typical domestic audio CD is 44.1 kHz, which is derived from the calculation:
Human hearing limit: 20 000 Hz
20 000 Hz × 2 = 40 000 Hz
40 000 Hz + 4100 Hz = 44 100 Hz (the extra 4100 Hz makes the rate more than twice the highest audible frequency and allows for the anti-alias filter slope)
Although this sample rate has become the de facto standard for all domestic audio CDs, higher sampling rates are available: 48 000, 88 200, 96 000 and 192 000 Hz.
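To make aliasing more concrete, the sketch below estimates the frequency at which a pure tone would appear after sampling. It assumes an idealised sampler with no anti-alias filter in front of it; the function name `alias_frequency` is hypothetical. Tones above half the sample rate "fold back" into the range below it.

```python
def alias_frequency(signal_hz: int, sample_rate_hz: int) -> int:
    """Return the apparent frequency of a pure tone after ideal sampling."""
    nyquist_hz = sample_rate_hz / 2
    folded_hz = signal_hz % sample_rate_hz
    # Anything above half the sample rate is mirrored back below it.
    if folded_hz <= nyquist_hz:
        return folded_hz
    return sample_rate_hz - folded_hz


CD_RATE_HZ = 44100
for tone_hz in (1000, 20000, 25000, 30000):
    print(f"{tone_hz} Hz sampled at {CD_RATE_HZ} Hz appears at "
          f"{alias_frequency(tone_hz, CD_RATE_HZ)} Hz")
```

In this idealised model, a 30 000 Hz tone sampled at 44 100 Hz shows up at 14 100 Hz, well inside the audible range, which is why converters filter out such content before sampling.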
Though these sampling rates are far beyond that of CD, it is quite usual to work at these higher rates while processing and editing. Although there has been no solid evidence to support the theory, it is believed that higher sample rates provide better spatial signal resolution and reduce phase problems at the high-frequency end of the spectrum, as the anti-alias filters can be made much more transparent with gentler cut-off slopes. Thus, if the ADC supports higher rates, theoretically, there is no harm in working at the highest available rate. Many engineers argue, however, that because the signal must be reduced to 44.1 kHz when the mix is put onto CD, the downward conversion may introduce errors, so working at higher sampling rates is pointless. If you do decide to work at a higher sample rate, it may be worthwhile using 88 200 Hz (88.2 kHz), as this is exactly twice the CD rate, which simplifies the down-sampling process.
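One way to see why 88.2 kHz is considered convenient is to reduce each working rate to a ratio against the CD rate. The sketch below does only that arithmetic; it is not a resampler, and the helper name `ratio_to_cd` is an assumption made for illustration.

```python
from fractions import Fraction

CD_RATE_HZ = 44100


def ratio_to_cd(source_rate_hz: int) -> Fraction:
    """Return source rate / CD rate as a fraction in lowest terms."""
    return Fraction(source_rate_hz, CD_RATE_HZ)


for rate_hz in (48000, 88200, 96000, 192000):
    print(f"{rate_hz} Hz -> {ratio_to_cd(rate_hz)} times the CD rate")
```

Only 88 200 Hz reduces to a whole number (2), so in principle every second sample lines up with a CD sample; the other rates give awkward ratios such as 320/147.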
BIT DEPTH
In addition to the sample rate, the bit depth also determines the overall quality of a digital recording. Within a digital audio recording system, the number of bits determines the number of discrete voltage levels that are used to measure the amplitude of a waveform; more bits mean more levels and, in effect, a greater dynamic range. In technical applications, the dynamic range is the ratio between the residual noise (known as the noise floor) created by all audio equipment and the maximum allowable volume before a specific amount of distortion is introduced. According to the IEC 268 standard, the dynamic range of any professional audio equipment is measured as the difference between the total noise floor and the equivalent sound pressure level at which a certain amount of total harmonic distortion (THD) appears. In relation to music, the dynamic range is the difference between the quietest and loudest parts. For instance, classical music utilizes a huge dynamic range by suddenly moving from very low to very high volume, the cannons in the 1812 Overture being a prime example. Most dance and pop music, on the other hand, has a deliberately limited dynamic range so that it remains permanently up front and ‘in your face’. Essentially, this means that if a low bit depth is used for a recording, only a small dynamic range will be achieved. The inevitable result is that the ratio between the noise floor and the loudest part of the audio will be small, so background noise will be more evident.
When the bit depth is increased, each additional bit doubles the number of available voltage levels, which adds roughly another 6 dB to the dynamic range.
For example, if only one bit is used to record a sound, the recorder will produce a square wave at the same frequency as the original signal and at a fixed amplitude. This is because a single bit can only distinguish two levels, so every sample is either fully on or fully off. However, if an 8-bit system is used, the signal’s dynamic range will be represented by 256 different voltage levels and a more accurate representation of the waveform will result. It’s clear, then, that a 24-bit audio signal will have a higher dynamic range than a 16-bit signal (the bit depth used by CDs).
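The figures above can be checked with a short calculation. The sketch assumes the usual idealised model in which n bits give 2^n levels and the theoretical dynamic range is 20·log10(2^n) dB; real converters fall short of this, and the function names are invented for illustration.

```python
import math


def quantization_levels(bits: int) -> int:
    """Number of distinct levels an n-bit sample can represent."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of an ideal n-bit system, in decibels."""
    return 20 * math.log10(quantization_levels(bits))


for bits in (1, 8, 16, 24):
    print(f"{bits:>2}-bit: {quantization_levels(bits):>10,} levels, "
          f"about {dynamic_range_db(bits):.1f} dB")
```

Under this model each extra bit adds about 6.02 dB, giving roughly 96 dB for 16-bit and roughly 144 dB for 24-bit audio.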
Although 24-bit is the widely accepted standard resolution for the ADCs in samplers and soundcards, a good proportion of audio software uses internal 32- or 64-bit processing. The reasoning behind using bit depths this high is that whenever any form of processing is applied to a digitized audio signal, quantization noise is introduced. This is a result of each value being rounded to the nearest measurement level. The finer the measurement levels (i.e. the more bits used), the smaller the rounding error and the less noise is introduced. This is a cumulative effect: as more digital processing is applied, more quantization noise is introduced, so it needs to be kept to a minimum. In more practical terms, this means that while a CD may only accept a 16-bit recording, if a 24-bit process is used throughout digital mixing, editing and processing, the quantization noise will be less apparent when the final sound is dropped to 16-bit resolution for burning to CD.
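The rounding behind quantization noise can be illustrated with a toy quantizer. This sketch assumes samples in the range -1.0 to 1.0 and a simple symmetric grid; it is not how any particular converter or editor works, and `quantize` and `worst_rounding_error` are made-up names.

```python
import math


def quantize(sample: float, bits: int) -> float:
    """Round a sample in -1.0..1.0 to the nearest level of an n-bit grid."""
    scale = 2 ** (bits - 1) - 1  # largest positive integer code
    return round(sample * scale) / scale


def worst_rounding_error(bits: int, points: int = 1000) -> float:
    """Largest rounding error over one cycle of a full-scale sine wave."""
    worst = 0.0
    for n in range(points):
        sample = math.sin(2 * math.pi * n / points)
        worst = max(worst, abs(sample - quantize(sample, bits)))
    return worst


for bits in (8, 16, 24):
    print(f"{bits}-bit worst-case rounding error: "
          f"{worst_rounding_error(bits):.2e}")
```

The error never exceeds half a step, and the step shrinks as bits are added, which is the reason for carrying extra bits through processing and only reducing to 16 bits at the very end.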
DITHERING
‘Dropping’ bits from a recording to reduce the bit depth introduces rounding errors; ‘dithering’ is the process of adding a very small amount of noise during this reduction so that those errors are heard as a faint, even hiss rather than as distortion. Understanding how this actually works is not vital; what is important is that the best available dithering algorithms are used. Poor-quality algorithms will have a detrimental effect on the music as a whole, resulting in clicks, hiss or noise. As a reference, Apogee is well known and respected for producing excellent dithering algorithms.
It isn’t always necessary to work at such a high bit depth; some genres of electronic music benefit from using a much lower one. For instance, 12-bit samples are often used in hip-hop to obtain the typical hip-hop sound. This is because the original artists used old samplers that could only sample at 12-bit resolution; thus, to write music in this genre it’s quite usual to limit the maximum bit depth in order to reproduce these timbres. Similarly, with trip-hop and lo-fi, the sample rate is often lowered to 22 or 11 kHz, as this reproduces the gritty timbres that are characteristic of the genre.
HEALTHY WAVEFORM (RECORDING PROCESS)
Ultimately, no matter what sample rate or bit depth is used, it’s important that the converters on the soundcard or digital recorder are of a good standard and that the signal is recorded as loud as possible (but without clipping the recorder). Although all digital editors allow the gain of a recorded signal to be increased after recording, the signal should ideally be recorded as loud as possible to avoid having to increase its gain artificially using software algorithms. This is because all electrical circuitry, no matter how good or expensive, will have some residual noise associated with it due to the random movement of electrons. Of course, the better the overall design the less random movement there will be, but there will always be some
noise present so the ratio between the signal and this noise should be as high as possible. If not, when you artificially increase the gain after it has been recorded, it will increase any residual noise by the same amount. For instance, suppose a melodic riff is sampled from a record at a relatively low level, then the gain is artificially increased to make the sound audible; the noise floor will increase in direct proportion to the audio signal. This produces a loud signal with a loud background noise. If, however, the signal is recorded as loud as possible, the ratio between the noise floor and the signal is greatly increased. There is a fine line between capturing a recording at optimum gain and actually recording the signal too loud so that it clips the recorder’s inputs.
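The point about gain and noise can be put into numbers. The sketch below assumes a fixed noise floor and expresses levels as fractions of full scale; the specific values (0.05, 0.5, 0.001 and a 20 dB boost) are invented for illustration, as are the function names.

```python
import math


def snr_db(signal_level: float, noise_level: float) -> float:
    """Signal-to-noise ratio in decibels."""
    return 20 * math.log10(signal_level / noise_level)


def apply_gain(level: float, gain_db: float) -> float:
    """Scale a level by a gain given in decibels."""
    return level * 10 ** (gain_db / 20)


NOISE_FLOOR = 0.001  # assumed residual noise of the recording chain

# Case 1: record quietly, then boost by 20 dB in software.
quiet_signal = 0.05
boosted_signal = apply_gain(quiet_signal, 20)
boosted_noise = apply_gain(NOISE_FLOOR, 20)  # the noise is boosted as well
print(f"quiet, then boosted: {snr_db(boosted_signal, boosted_noise):.1f} dB")

# Case 2: record at a healthy level in the first place.
hot_signal = 0.5
print(f"recorded hot:        {snr_db(hot_signal, NOISE_FLOOR):.1f} dB")
```

With these assumed values both versions end up equally loud, but the boosted recording keeps its original ratio of about 34 dB, while the one recorded hot achieves about 54 dB.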
Recording too loud isn’t necessarily a problem with analogue recorders, as they don’t immediately distort when the input level gets a little too high (they have ‘headroom’ in case the odd transient hit pushes the level too hard), but digital recorders will ‘clip’ immediately. Unlike analogue distortion, which can often be quite pleasant, digital distortion cuts off the top of the waveform, resulting in an ear-piercing clicking sound. This type of distortion obviously should be avoided, so you need to set a recording level that is not loud enough to cause distortion, yet not low enough to introduce too much noise. All recording software or hardware, including samplers, will display a level meter informing you how loud the input signal is, to help you determine the appropriate levels. Before beginning to record, it’s vital to set this up correctly. This is accomplished by adjusting the gain of the source so that the overload (peak) lights come on only during the most powerful parts, and then backing off slightly so that those parts sit just below the clipping level. This approach is suitable only if there isn’t too much volume fluctuation throughout the entire recording, though. If a sample is taken from a record, CD or electronic
instrument there’s a good chance that the volume will be constant, but if a live source, such as vocals or bass guitar, is recorded there can be a huge difference in the dynamics. It’s doubtful that any vocalist, no matter how well trained, will perform at a constant level; in the music itself it’s quite common for the vocals to be softer in the verse sections than they are in the chorus sections. If the recording level is set so that the loudest sections don’t clip the recorder, any quieter section will have a smaller signal-to-noise ratio, while if the recording levels are set so that the quieter sections are captured at a high level, any louder parts will send the meters into the red. To prevent this, it’s necessary to use a device that will reduce the dynamic range of the performance by automatically reducing the gain on the loud parts while leaving the quieter parts unaffected, thus allowing the recording levels to be increased on the quietest parts. This dynamic restriction is accomplished by strapping a compressor between the signal source and the input of the sampler or recording device. When the compressor is set to squash any signal from the source above a certain threshold, any peaks that could cause clipping in the recorder are reduced and the recording can be made at a substantially higher volume. (This is expanded on in the ‘compressor’ chapters of this guide.)
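The effect of a compressor on recording levels can be sketched with a simple static model. It assumes a hard-knee compressor described only by a threshold and a ratio, ignoring attack and release; the -12 dB threshold, the 4:1 ratio and the example levels are arbitrary illustrative values, and `compress_db` is a made-up name.

```python
def compress_db(level_db: float, threshold_db: float = -12.0,
                ratio: float = 4.0) -> float:
    """Return the output level of a static hard-knee compressor."""
    if level_db <= threshold_db:
        return level_db  # quiet parts pass through unchanged
    # Above the threshold, the excess is divided by the ratio.
    return threshold_db + (level_db - threshold_db) / ratio


verse_db, chorus_db = -20.0, 0.0  # assumed peak levels of a vocal take

verse_out = compress_db(verse_db)
chorus_out = compress_db(chorus_db)
makeup_db = 0.0 - chorus_out      # gain available before the peaks clip

print(f"verse:  {verse_db} dB -> {verse_out} dB")
print(f"chorus: {chorus_db} dB -> {chorus_out} dB")
print(f"level can be raised by {makeup_db} dB without clipping")
```

With these settings the chorus peaks drop from 0 dB to -9 dB while the verse stays at -20 dB, so the whole recording can be raised by 9 dB and the quiet verse sits further above the noise floor.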
BITRATE (MP3)
The amount of information stored, processed or reproduced per second, measured in bits per second.
• 96 kbit/s – FM quality
• 128–160 kbit/s – Standard bitrate; difference can sometimes be obvious (e.g. bass quality)
• 192 kbit/s – DAB (Digital Audio Broadcast) quality
• 224–320 kbit/s – Near CD quality
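For a sense of scale, the bitrates listed above can be compared with uncompressed CD audio. The sketch assumes two-channel (stereo) CD audio at the 44.1 kHz and 16-bit figures mentioned earlier, plus an arbitrary three-minute track; the function names are invented for illustration.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bitrate of uncompressed audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


def file_size_bytes(bitrate_bps: int, duration_s: int) -> float:
    """Approximate size of an audio stream, ignoring headers and tags."""
    return bitrate_bps * duration_s / 8


TRACK_SECONDS = 180  # assumed three-minute track

cd_bps = pcm_bitrate_bps(44100, 16, 2)
print(f"CD audio: {cd_bps / 1000:.1f} kbit/s, "
      f"{file_size_bytes(cd_bps, TRACK_SECONDS) / 1_000_000:.1f} MB")

for kbps in (96, 128, 192, 320):
    size_mb = file_size_bytes(kbps * 1000, TRACK_SECONDS) / 1_000_000
    print(f"MP3 at {kbps} kbit/s: {size_mb:.1f} MB")
```

On these assumptions uncompressed CD audio runs at about 1411 kbit/s, so even a 320 kbit/s MP3 carries less than a quarter of the data.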