Digital sound is stored and processed in binary by sampling real-world audio at set intervals and representing each sample using a fixed number of bits.
What is digital sound?
Sound in the real world is an analogue signal, meaning it is a continuous wave that varies smoothly over time. However, computers cannot process analogue signals directly. In order to store, transmit, and manipulate sound on a digital system, the sound must be converted into a digital form. This process involves two main steps:
Sampling – measuring the amplitude (volume level) of the analogue signal at regular time intervals.
Quantisation – converting each measured amplitude into a binary value with a fixed number of bits.
The result is a sequence of binary numbers that approximate the original sound wave. The more frequently you sample and the more bits you use per sample, the closer the digital representation will be to the real sound.
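The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sample_and_quantise`, the 1 kHz test tone, the 8,000 Hz sampling rate and the 4-bit depth are all assumptions chosen to keep the output short, not values taken from any standard.

```python
import math

def sample_and_quantise(signal, duration_s, sample_rate_hz, bit_depth):
    """Sample signal(t), whose values lie in [-1.0, 1.0], and return integer codes."""
    levels = 2 ** bit_depth
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz          # sampling: evenly spaced points in time
        amplitude = signal(t)           # measured amplitude at that instant
        # quantisation: map [-1.0, 1.0] onto the integer levels 0 .. levels - 1
        code = round((amplitude + 1.0) / 2.0 * (levels - 1))
        codes.append(code)
    return codes

def tone(t):
    """Hypothetical test signal: a 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000 * t)

# One millisecond of sound, 8,000 samples per second, 4 bits per sample.
codes = sample_and_quantise(tone, duration_s=0.001, sample_rate_hz=8000, bit_depth=4)
print([format(c, "04b") for c in codes])  # each sample as a 4-bit binary string
```

Raising `sample_rate_hz` produces more codes per second, and raising `bit_depth` gives each code more possible values, which is why both bring the stored numbers closer to the original wave.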
Sampling rate
Understanding sampling
Sampling is the process of measuring the amplitude of a sound wave at evenly spaced points in time. Each individual measurement is called a sample.
FAQ
The standard sampling rate for audio CDs is 44,100 Hz, a value chosen for both technical and practical reasons. The rate had to be high enough to capture the full range of human hearing, which extends up to approximately 20,000 Hz. According to the Nyquist theorem, the sampling rate must be at least twice the highest frequency to be captured, so a minimum of 40,000 Hz is required. A slightly higher rate was selected to provide a safety margin and to allow for imperfections in filtering and analogue-to-digital conversion. The figure of 44,100 Hz also came from compatibility with early digital recording equipment, which stored audio on video tape running at 60 fields per second. In that format, 735 samples per field gave exactly 44,100 samples per second. This made the rate both technically sound and cost-effective, and the industry adopted it as a standard.
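The arithmetic behind these figures is easy to check. The short sketch below uses only the numbers quoted in the answer above; the variable names are illustrative.

```python
HIGHEST_AUDIBLE_HZ = 20_000   # approximate upper limit of human hearing
FIELDS_PER_SECOND = 60        # video tape field rate quoted above
SAMPLES_PER_FIELD = 735       # audio samples stored in each field

nyquist_minimum_hz = 2 * HIGHEST_AUDIBLE_HZ          # twice the highest frequency
cd_rate_hz = FIELDS_PER_SECOND * SAMPLES_PER_FIELD   # samples per second on tape
margin_hz = cd_rate_hz - nyquist_minimum_hz          # headroom above the minimum
highest_captured_hz = cd_rate_hz // 2                # half the sampling rate

print(nyquist_minimum_hz)    # 40000
print(cd_rate_hz)            # 44100
print(margin_hz)             # 4100
print(highest_captured_hz)   # 22050
```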
Quantisation noise is a type of distortion that arises when an analogue sound signal is converted into digital form through quantisation. When each amplitude sample is assigned a binary value, it is rounded to the nearest available level defined by the sample resolution. Because the real-world analogue signal usually falls between two levels, there is a small difference between the actual signal and the quantised value. This difference is the quantisation error. Quantisation noise is the audible consequence of these errors, and it becomes more noticeable when the bit depth (sample resolution) is low. For example, an 8-bit system has only 256 levels to represent amplitude, so the rounding differences are larger and the resulting distortion is more significant. In contrast, 16-bit or 24-bit systems have far more levels, which greatly reduces quantisation noise. This type of noise typically sounds like a faint hiss and is most noticeable during quiet passages in audio recordings.
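The effect of bit depth on the rounding error can be measured directly. The following is a minimal sketch, assuming amplitudes are scaled to the range -1.0 to 1.0 and that levels are evenly spaced; `quantise` and `max_error` are hypothetical helper names.

```python
def quantise(x, bit_depth):
    """Round x in [-1.0, 1.0] to the nearest of 2**bit_depth evenly spaced levels."""
    steps = 2 ** bit_depth - 1                 # number of gaps between levels
    code = round((x + 1.0) / 2.0 * steps)      # integer level, 0 .. steps
    return code / steps * 2.0 - 1.0            # amplitude that the level stands for

def max_error(bit_depth, n_points=10_001):
    """Largest quantisation error found over a grid of test amplitudes."""
    xs = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    return max(abs(x - quantise(x, bit_depth)) for x in xs)

for bits in (8, 16):
    print(bits, "bits: largest error found =", max_error(bits))
```

The error can never exceed half the spacing between two levels, so each extra bit roughly halves the worst-case error.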
Bit depth, or sample resolution, determines the number of possible amplitude values a digital audio sample can represent. Dynamic range refers to the ratio between the loudest and quietest sounds that can be accurately recorded or reproduced. The higher the bit depth, the more amplitude levels are available, and therefore the greater the dynamic range. Each additional bit doubles the number of possible values and increases the dynamic range by roughly 6 decibels (dB). For example, a 16-bit audio system can represent 65,536 amplitude levels and has a dynamic range of about 96 dB, which is more than sufficient for consumer audio and CD-quality recordings. A 24-bit system has a dynamic range of approximately 144 dB, which exceeds the range of human hearing and is used in professional studios to capture very quiet and very loud sounds with high accuracy. Lower bit depths, such as 8-bit, offer only about 48 dB of dynamic range, which is inadequate for high-quality music but may suffice for speech or simple audio effects.
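The "6 dB per bit" rule follows from expressing the ratio between the largest and smallest representable amplitude in decibels. As a rough sketch, assuming the common idealised formula 20 × log10(2^n) for an n-bit system (the function names are illustrative):

```python
import math

def amplitude_levels(bit_depth):
    """Number of distinct amplitude values for a given bit depth."""
    return 2 ** bit_depth

def dynamic_range_db(bit_depth):
    """Idealised dynamic range in decibels: 20 * log10(number of levels)."""
    return 20 * math.log10(amplitude_levels(bit_depth))

for bits in (8, 16, 24):
    print(f"{bits}-bit: {amplitude_levels(bits):,} levels, "
          f"{dynamic_range_db(bits):.1f} dB")
# 8-bit: 256 levels, 48.2 dB
# 16-bit: 65,536 levels, 96.3 dB
# 24-bit: 16,777,216 levels, 144.5 dB
```

Each bit contributes 20 × log10(2), which is about 6.02 dB, so the rounded figures of 48, 96 and 144 dB quoted above are slight simplifications.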
Dithering is a technique used in digital audio processing to reduce the audibility of quantisation errors when decreasing the bit depth of a recording, such as when converting from 24-bit to 16-bit audio. When reducing bit depth, rounding each sample to a lower number of bits introduces quantisation noise. Dithering works by adding a very small amount of random noise to the audio signal before this rounding occurs. This noise helps to mask the distortion caused by quantisation by making it sound more like a uniform background hiss rather than distinct tonal artefacts. Although it may seem counterintuitive to add noise to improve quality, the result is that the final audio sounds more natural and less distorted, especially in quiet sections. Dithering is most commonly used in mastering and exporting final versions of digital audio to ensure smoother sound quality when bit depth is reduced for distribution formats like CDs or MP3s.
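A small experiment shows why adding noise before rounding can help. The sketch below is one possible approach, not a description of any particular mastering tool: it assumes signed integer samples, uses triangular (TPDF) dither made from two uniform random values, and the name `reduce_to_16_bit` is hypothetical. The test signal is a constant level too quiet for 16-bit audio to represent on its own.

```python
import random

INT16_MIN, INT16_MAX = -32768, 32767

def reduce_to_16_bit(samples_24bit, dither=True, seed=0):
    """Convert signed 24-bit integer samples to signed 16-bit integer samples."""
    rng = random.Random(seed)          # fixed seed so the demo is repeatable
    reduced = []
    for sample in samples_24bit:
        scaled = sample / 256.0        # 8 fewer bits: one 16-bit step = 256 24-bit steps
        if dither:
            # Assumed dither shape: two uniform values summed (triangular, +/- 1 step)
            scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        value = round(scaled)          # the rounding that causes quantisation error
        reduced.append(max(INT16_MIN, min(INT16_MAX, value)))  # keep within range
    return reduced

# A very quiet signal: about 0.3 of one 16-bit step (77 / 256).
quiet = [77] * 20_000
plain = reduce_to_16_bit(quiet, dither=False)
dithered = reduce_to_16_bit(quiet, dither=True)

print("average without dither:", sum(plain) / len(plain))
print("average with dither:   ", sum(dithered) / len(dithered))
```

Without dither every sample rounds to zero and the quiet signal disappears completely. With dither the individual samples are noisy, but their average stays close to the original level, so the signal survives underneath a steady hiss.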
While 44,100 Hz is sufficient to accurately capture the range of human hearing (up to about 20,000 Hz), professional audio environments often use much higher sampling rates such as 96,000 Hz or 192,000 Hz. There are several reasons for this. First, higher sampling rates provide a greater margin of safety against aliasing, especially when applying digital effects or processing. Audio editing, mixing, and manipulation often involve changes in pitch, time-stretching, filtering, and other transformations that can introduce or amplify artefacts if the original sample rate is too low. Working at higher sample rates leaves more headroom above the audible band and gives cleaner results after these processes. Second, higher sampling rates reduce the phase distortion introduced by the anti-aliasing filters used during analogue-to-digital conversion. Because the cut-off can sit well above the audible range, these filters can have a gentler slope. Lastly, some professionals believe that higher sampling rates yield a more "open" or "natural" sound, although this is subjective and debated in the audio community.
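The aliasing margin mentioned above can be illustrated with a short calculation. This is a simplified sketch that assumes a pure tone and no anti-aliasing filter; `alias_frequency` is a hypothetical helper that returns the frequency a tone appears at after sampling.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling (no filtering assumed)."""
    folded = tone_hz % sample_rate_hz
    # Anything above half the sampling rate is mirrored back below it.
    return min(folded, sample_rate_hz - folded)

# A 30,000 Hz component, for example one created by pitch-shifting or an effect.
print(alias_frequency(30_000, 44_100))   # 14100: folds down into the audible range
print(alias_frequency(30_000, 96_000))   # 30000: stays where it is, above hearing
```

At 44,100 Hz the unwanted component lands at 14,100 Hz, where it can be heard. At 96,000 Hz it remains above the audible range and can be filtered out later.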
