Sound must be digitised for storage and processing by computers. This process involves converting continuous sound waves into discrete digital signals using sampling techniques.
What is digital sound?
Sound in the real world is an analogue signal. It varies smoothly and continuously over time, forming waveforms that represent vibrations in air or other media. These waves have attributes like frequency, amplitude, and phase, which collectively define the quality and characteristics of a sound.
Computers, however, cannot store or process analogue signals directly. All information stored and processed by a computer must be in digital form, meaning it is represented as binary data made up entirely of 0s and 1s.
This transformation from continuous analogue sound to discrete digital values is known as digitisation. Once digitised, sound can be stored in files, processed by software, transmitted over networks, and played back using digital hardware. This process forms the foundation of how digital audio systems—from mobile phones to music streaming services—work.
Sampling: capturing snapshots of sound
Sampling rate
Sampling is the process of taking measurements of an analogue signal at regular time intervals. Each measurement is called a sample, and the number of samples taken per second is the sampling rate.
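Sampling rate is measured in hertz (Hz), where 1 Hz means one sample per second. A sampling rate of 44,100 Hz therefore means 44,100 measurements are taken every second.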
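The idea of measuring a signal at regular intervals can be sketched in a few lines of Python. This is only an illustration: the function name `sample_sine`, the 440 Hz tone, and the 8,000 Hz rate are assumptions chosen for the example, and a real recording would come from a microphone and converter rather than a formula.

```python
import math

def sample_sine(frequency_hz, sampling_rate_hz, duration_s):
    """Measure a sine wave once every 1 / sampling_rate_hz seconds."""
    n_samples = round(sampling_rate_hz * duration_s)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sampling_rate_hz)
        for n in range(n_samples)
    ]

# 0.01 seconds of a 440 Hz tone at 8,000 samples per second
samples = sample_sine(frequency_hz=440, sampling_rate_hz=8000, duration_s=0.01)
print(len(samples))  # 80
```

Doubling the sampling rate doubles the number of samples for the same duration, which is why higher rates capture more detail but need more storage.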
FAQ
The choice of 44,100 Hz as the standard sampling rate for CDs stems from both technical and practical reasons. According to the Nyquist theorem, accurately capturing audio up to 20,000 Hz, the upper limit of human hearing, requires a sampling rate of more than twice that frequency, so at least 40,000 Hz. However, real-world systems must apply anti-aliasing filters before sampling to remove frequencies above the Nyquist limit (half the sampling rate), and these filters need a transition band to operate effectively. Choosing a sampling rate slightly above the minimum leaves room between 20,000 Hz and the 22,050 Hz Nyquist limit for the filter to attenuate unwanted frequencies gradually. The value 44,100 Hz was also compatible with the video tape recorders used to store early digital audio, which packed a fixed number of audio samples into each video line, so the sampling rate had to fit the line and field rates of the video standards of the time. Thus, 44,100 Hz became a practical and technically sound choice that met Nyquist requirements, ensured compatibility with hardware, and became widely adopted for consumer digital audio formats.
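To see why frequencies above the Nyquist limit must be filtered out before sampling, the sketch below compares a 30,000 Hz tone with a 14,100 Hz tone, both sampled at 44,100 Hz. The helper names `alias_frequency` and `sample_cosine` are made up for this illustration, and cosine waves are used so the two sets of samples line up without a sign change.

```python
import math

def alias_frequency(tone_hz, sampling_rate_hz):
    """Return the frequency a tone appears to have once it has been sampled."""
    folded = tone_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)

def sample_cosine(tone_hz, sampling_rate_hz, count):
    """Take `count` samples of a cosine wave at the given sampling rate."""
    return [
        math.cos(2 * math.pi * tone_hz * n / sampling_rate_hz)
        for n in range(count)
    ]

RATE = 44_100
print(alias_frequency(20_000, RATE))  # 20000: below the limit, captured as is
print(alias_frequency(30_000, RATE))  # 14100: above the limit, folds down

high = sample_cosine(30_000, RATE, 50)
low = sample_cosine(14_100, RATE, 50)
# The two tones produce the same samples, so they cannot be told apart.
print(all(abs(a - b) < 1e-9 for a, b in zip(high, low)))  # True
```

Once sampled, the 30,000 Hz tone is indistinguishable from an audible 14,100 Hz tone, which is why it has to be removed while the signal is still analogue.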
Quantisation noise occurs when the continuous amplitude of an analogue signal is rounded to the nearest available value during digitisation. This rounding introduces a small error between the original analogue value and its digital representation. These errors appear as low-level background noise in the final digital recording, especially in quiet or subtle sections of audio. The effect becomes more pronounced when using a low bit depth, such as 8-bit, which provides fewer amplitude levels and increases the size of rounding errors. To minimise quantisation noise, higher sample resolutions such as 16-bit or 24-bit are used, allowing more precise amplitude representation and reducing error. Additionally, techniques like dithering can be applied. Dithering involves adding a small amount of noise to the signal before quantisation to randomise rounding errors, making the noise less perceptible and preventing distortion caused by correlated quantisation errors. These methods are especially important in professional audio and mastering environments.
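The effect of bit depth on rounding error, and the way dithering randomises that error, can be sketched as follows. This is a simplified model, not a mastering-grade implementation: the functions `quantise` and `quantise_with_dither` are hypothetical, samples are assumed to lie in the range -1.0 to 1.0, and the dither is assumed to be triangular noise of up to one quantisation step.

```python
import random

def quantise(value, bit_depth):
    """Round a value in [-1.0, 1.0] to the nearest of 2**bit_depth levels."""
    steps = 2 ** bit_depth - 1
    index = round((value + 1.0) / 2.0 * steps)
    return index / steps * 2.0 - 1.0

def quantise_with_dither(value, bit_depth, rng):
    """Add triangular noise of up to one step before rounding (assumed scheme)."""
    step = 2.0 / (2 ** bit_depth - 1)
    noise = (rng.random() - rng.random()) * step
    clipped = min(1.0, max(-1.0, value + noise))
    return quantise(clipped, bit_depth)

value = 0.1
for bits in (3, 8, 16):
    error = abs(quantise(value, bits) - value)
    print(bits, error)  # the rounding error shrinks as bit depth grows

# Without dither a constant input always rounds the same way. With dither
# the individual results vary, but their average approaches the true value.
rng = random.Random(0)
dithered = [quantise_with_dither(value, 3, rng) for _ in range(20_000)]
print(sum(dithered) / len(dithered))
```

The dithered output is noisier sample by sample, but the error is no longer tied to the signal, which is what makes it less objectionable to the ear.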
Pulse-code modulation (PCM) and delta modulation are two methods of encoding analogue signals into digital form, but they work in different ways and serve different purposes. PCM is the most widely used method in digital audio. It involves sampling the amplitude of the analogue signal at regular intervals and encoding each sample as a binary number with a fixed bit depth. The bit depth determines the resolution or precision of each sample. PCM is used in formats like WAV and AIFF and provides high audio quality at the cost of larger file sizes.
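As a rough illustration of PCM, the sketch below turns sample values into 16-bit binary numbers using Python's standard `struct` module. The function names `pcm16_encode` and `pcm16_decode` are hypothetical, and the choice of signed little-endian 16-bit samples scaled by 32,767 is an assumption made for the example.

```python
import struct

def pcm16_encode(samples):
    """Encode floats in [-1.0, 1.0] as signed 16-bit little-endian PCM bytes."""
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)

def pcm16_decode(data):
    """Decode signed 16-bit little-endian PCM bytes back to floats."""
    count = len(data) // 2
    return [i / 32767 for i in struct.unpack(f"<{count}h", data)]

encoded = pcm16_encode([0.0, 0.5, -1.0])
print(len(encoded))           # 6: three samples at two bytes each
print(pcm16_decode(encoded))  # close to the input, within one 16-bit step
```

Every sample takes the same number of bits regardless of its value, which is what makes PCM simple to process but relatively large to store.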
Delta modulation, on the other hand, simplifies encoding by recording only the direction of change rather than absolute sample values. It uses a single bit per sample to indicate whether the signal is above or below the encoder's running estimate, which then moves up or down by a fixed step. While this keeps each sample to one bit and simplifies hardware, it suffers from slope overload distortion when the signal changes faster than the fixed step can follow, and from granular noise when the signal is nearly constant and the estimate oscillates around it. Delta modulation is more suitable for low-bandwidth or embedded systems where simplicity and efficiency are prioritised over fidelity.
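A minimal sketch of delta modulation is shown below, assuming a fixed step size and a running estimate that starts at zero. The function names and the step of 0.25 are illustrative choices, not part of any particular standard.

```python
def delta_modulate(samples, step):
    """Emit 1 if the input is above the running estimate, otherwise 0."""
    bits, estimate = [], 0.0
    for sample in samples:
        bit = 1 if sample > estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def delta_demodulate(bits, step):
    """Rebuild the signal by stepping up for each 1 and down for each 0."""
    output, estimate = [], 0.0
    for bit in bits:
        estimate += step if bit else -step
        output.append(estimate)
    return output

STEP = 0.25

# A gentle ramp that rises by exactly one step per sample is tracked well.
ramp_bits = delta_modulate([0.25, 0.5, 0.75, 1.0], STEP)
print(ramp_bits, delta_demodulate(ramp_bits, STEP))

# A sudden jump to 1.0 can only be approached one step at a time.
jump_bits = delta_modulate([1.0, 1.0, 1.0, 1.0], STEP)
print(jump_bits, delta_demodulate(jump_bits, STEP))

# A constant signal makes the estimate hunt up and down around it.
flat_bits = delta_modulate([0.0, 0.0, 0.0, 0.0], STEP)
print(flat_bits, delta_demodulate(flat_bits, STEP))
```

The second case shows slope overload (the output lags behind the jump) and the third shows granular noise (the output alternates around a steady input).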
Stereo recording captures audio using two separate channels—left and right—to simulate directional sound perception, similar to how human ears hear in real life. In a digital sound file, this is achieved by storing two distinct streams of samples: one for the left speaker and one for the right. Each sample point in time includes two values—one per channel—representing the amplitude for that channel at that instant. These pairs of values are interleaved in the file, meaning the data alternates between left and right channel samples throughout the audio stream.
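Interleaving can be pictured with a short sketch like the one below. The function names are hypothetical and plain integers stand in for real sample values.

```python
def interleave(left, right):
    """Merge two channels into one stream: L0, R0, L1, R1, ..."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    frames = []
    for left_sample, right_sample in zip(left, right):
        frames.extend((left_sample, right_sample))
    return frames

def deinterleave(frames):
    """Split an interleaved stream back into left and right channels."""
    return frames[0::2], frames[1::2]

stream = interleave([1, 2, 3], [10, 20, 30])
print(stream)                # [1, 10, 2, 20, 3, 30]
print(deinterleave(stream))  # ([1, 2, 3], [10, 20, 30])
```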
When calculating the file size of an uncompressed stereo recording, the sampling rate, bit depth, number of channels, and duration in seconds are multiplied together, so a stereo file is twice the size of a mono recording with otherwise identical parameters. Stereo provides a richer listening experience and is essential in music production and entertainment media. It allows producers to pan instruments and sounds across the left and right field, creating the perception of depth and spatial positioning.
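The file size calculation for uncompressed audio can be worked through as a short example. The function name `pcm_size_bytes` is made up for illustration, the duration is assumed to be a whole number of seconds, and any file header or metadata is ignored.

```python
def pcm_size_bytes(sampling_rate_hz, bit_depth, channels, duration_s):
    """Size of raw sample data: rate x bit depth x channels x seconds, in bytes."""
    total_bits = sampling_rate_hz * bit_depth * channels * duration_s
    return total_bits // 8

# One minute of CD-quality audio (44,100 Hz, 16-bit)
print(pcm_size_bytes(44_100, 16, 1, 60))  # 5292000 bytes for mono
print(pcm_size_bytes(44_100, 16, 2, 60))  # 10584000 bytes for stereo
```

One minute of stereo at these settings comes to roughly 10.6 million bytes, exactly double the mono figure.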
Audio compression in digital sound refers to reducing the file size of audio data by removing redundancy or less noticeable information. It is fundamentally different from sampling rate and resolution, which determine how accurately the analogue signal is captured during digitisation. While sampling and bit depth directly affect audio quality and file size at the recording stage, compression is applied after digitisation to reduce the size for storage and transmission.
There are two main types of audio compression: lossless and lossy. Lossless compression (e.g. FLAC, ALAC) retains all original data and can perfectly reconstruct the original file, though with moderate file size reduction. Lossy compression (e.g. MP3, AAC) permanently removes parts of the audio that are considered less important to human hearing, such as certain frequencies masked by louder ones. This can significantly reduce file sizes but introduces irreversible quality loss. Compression is especially important for streaming platforms, mobile devices, and broadcasting where bandwidth and storage are limited. Unlike resolution and sampling, compression techniques do not affect how the audio is initially recorded but greatly influence how it is stored and transmitted.
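The difference between reversible and irreversible size reduction can be illustrated with a rough sketch. It does not show how FLAC, MP3, or AAC actually work: Python's general-purpose `zlib` module stands in for a lossless codec, and simply discarding the low 8 bits of each sample stands in for a lossy one. The helper names and the 440 Hz test tone are assumptions made for the example.

```python
import math
import struct
import zlib

def make_tone(seconds=1, rate=44_100, tone_hz=440):
    """Generate 16-bit integer samples of a sine tone (test data only)."""
    return [
        round(20_000 * math.sin(2 * math.pi * tone_hz * n / rate))
        for n in range(rate * seconds)
    ]

def pack16(samples):
    """Pack integer samples as signed 16-bit little-endian bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)

def drop_low_bits(samples, bits=8):
    """Crude lossy stand-in: zero the lowest `bits` bits of every sample."""
    return [(s >> bits) << bits for s in samples]

samples = make_tone()
pcm = pack16(samples)

# Lossless stand-in: the decompressed bytes match the original exactly.
compressed = zlib.compress(pcm)
print(len(compressed) < len(pcm))         # True
print(zlib.decompress(compressed) == pcm)  # True

# Lossy stand-in: the discarded detail cannot be recovered afterwards.
reduced = drop_low_bits(samples)
print(reduced == samples)                  # False
```

Real lossy codecs choose what to discard using models of human hearing rather than dropping bits uniformly, but the key property is the same: the original samples cannot be rebuilt from the reduced data.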
