OCR GCSE Computer Science Notes

2.4.5 Sound Data Representation

Digital sound is created by sampling analog signals, converting them into binary, and storing them using properties like sample rate, duration, and bit depth.

What Is Sound and How Is It Digitized?

Sound is a continuous analog wave created by vibrations in the air. Computers, however, can only understand digital data—specifically binary values made up of 0s and 1s. To store and manipulate sound on a computer, the analog signal must be converted into a digital format through a process known as sampling.

Analog to Digital Conversion

The conversion process uses a device called an Analog-to-Digital Converter (ADC). This device samples the sound wave at regular intervals and converts each sample into a binary number. The resulting stream of binary numbers can then be stored, edited, and played back using digital systems.

Key Factors in Sound Sampling

Several important factors affect how a sound is digitally stored. These factors not only determine the quality of the audio but also influence the file size of the stored sound data.

Sample Rate

Sample rate is the number of samples taken per second and is measured in Hertz (Hz).

  • A higher sample rate means more samples are taken, resulting in higher sound quality.

  • A lower sample rate reduces file size but also decreases sound fidelity.

  • Common sample rates:

    • 8,000 Hz: Telephone-quality audio

    • 44,100 Hz: CD-quality audio

    • 48,000 Hz: DVD audio or professional recordings

Each sample captures the amplitude of the sound wave at a specific moment. The more frequently you sample, the more accurate the digital representation of the sound.

Example: A sample rate of 44,100 Hz means the sound is sampled 44,100 times every second.

Bit Depth

Bit depth refers to the number of bits used to store each sample. It determines the range of values available to represent the sound’s amplitude.

  • Higher bit depth = more precise amplitude representation

  • Common bit depths:

    • 8-bit: Basic or legacy systems (low quality)

    • 16-bit: CD-quality audio

    • 24-bit: Studio-quality recordings

How bit depth affects sound:

  • A higher bit depth allows for a larger range of amplitude values, reducing distortion and background noise.

  • A lower bit depth may introduce noise and limit dynamic range.

Key Point: Each extra bit doubles the number of amplitude levels. For example:

  • 8-bit = 2⁸ = 256 levels

  • 16-bit = 2¹⁶ = 65,536 levels
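The doubling rule above can be checked with a short Python snippet (the function name is our own, for illustration):

```python
# Number of amplitude levels available at a given bit depth:
# each extra bit doubles the count, since levels = 2 ** bit_depth.
def amplitude_levels(bit_depth):
    return 2 ** bit_depth

print(amplitude_levels(8))   # 256 levels (8-bit audio)
print(amplitude_levels(16))  # 65,536 levels (CD-quality)
print(amplitude_levels(24))  # 16,777,216 levels (studio-quality)
```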

Duration

Duration is the length of time the sound file plays, measured in seconds. It directly affects the size of the file: the longer the sound, the more samples it contains.

Formula to calculate the number of samples:

Number of samples = Sample rate × Duration

This number, when combined with the bit depth and number of channels, determines the overall file size.
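The formula translates directly into code. A minimal Python sketch (function name and values are our own):

```python
# Total samples = sample rate (samples per second) × duration (seconds).
def total_samples(sample_rate_hz, duration_s):
    return sample_rate_hz * duration_s

# A 3-second clip at CD quality (44,100 Hz):
print(total_samples(44_100, 3))  # 132300 samples
```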

File Size and Sound Quality

How to Calculate File Size

To calculate the size of a digital sound file, use the following formula:

File size (in bits) = Sample rate × Bit depth × Number of channels × Duration

  • To convert bits to bytes, divide by 8

  • For kilobytes (KB), divide by 1,024

  • For megabytes (MB), divide again by 1,024

Example:
A 10-second stereo recording (2 channels), at 44,100 Hz, 16-bit depth:

File size = 44,100 × 16 × 2 × 10 = 14,112,000 bits
= 1,764,000 bytes ≈ 1.68 MB
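The same worked example can be reproduced in Python (a minimal sketch; the function name is ours):

```python
# File size formula from the notes:
# bits = sample rate × bit depth × channels × duration
def file_size_bits(sample_rate_hz, bit_depth, channels, duration_s):
    return sample_rate_hz * bit_depth * channels * duration_s

bits = file_size_bits(44_100, 16, 2, 10)  # the 10-second stereo example
bytes_ = bits / 8                         # 8 bits per byte
megabytes = bytes_ / 1024 / 1024          # bytes → KB → MB

print(bits)                 # 14112000 bits
print(int(bytes_))          # 1764000 bytes
print(round(megabytes, 2))  # 1.68 MB
```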

Effect of Sample Rate, Bit Depth, and Duration

Each factor increases both quality and file size:

  • Higher sample rate = More samples per second = Better sound detail, larger file

  • Higher bit depth = More detailed sample values = Better fidelity, larger file

  • Longer duration = More total samples = Longer playback time, larger file

A balance must be found between quality and storage efficiency, especially in applications like streaming or mobile use.

Binary Representation of Sound

Each sample taken from the analog wave is converted into a binary number. This number represents the amplitude of the wave at a specific time.

  • Bit depth determines the maximum value the amplitude can be

  • The entire sound file is a sequence of binary numbers stored in memory or on disk

Example:
A 16-bit sample with an amplitude value of 32,700 will be stored in binary as:

0111111110111100

These binary values are interpreted by a Digital-to-Analog Converter (DAC) during playback, which reconstructs the original sound wave.
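You can verify this conversion yourself in Python, which formats an integer as a fixed-width binary string and decodes it back:

```python
# Convert a 16-bit sample amplitude to binary (unsigned, as in the
# example above) and decode it again, as a DAC-style playback step would.
amplitude = 32700
binary = format(amplitude, "016b")  # pad to 16 bits with leading zeros
print(binary)                       # 0111111110111100
print(int(binary, 2))               # 32700
```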

Importance of Binary in Sound Storage

Computers and digital devices require all data to be in binary. Storing sound digitally means each analog sample must be converted and stored as binary values.

This necessity arises because:

  • Binary is the fundamental language of computers

  • It allows consistent, efficient storage, processing, and transmission

  • It enables lossless digital reproduction of sound under ideal conditions

Without converting analog sound into binary:

  • Computers couldn’t process or understand the audio data

  • Sound couldn’t be easily stored, copied, edited, or transmitted

Mono vs Stereo

The number of channels in a sound file affects both file size and quality:

  • Mono (1 channel): Records a single audio stream

  • Stereo (2 channels): Records two separate streams, often for left and right speakers

File size is directly proportional to the number of channels:

Stereo files are about twice the size of mono files of the same quality and duration.
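This doubling follows directly from the file-size formula, since the channel count is a straight multiplier. A quick check in Python (function name and figures are illustrative):

```python
# Stereo (2 channels) doubles the data of a mono (1 channel) file
# with the same sample rate, bit depth, and duration.
def size_bits(rate_hz, bit_depth, channels, seconds):
    return rate_hz * bit_depth * channels * seconds

mono = size_bits(44_100, 16, 1, 60)    # one minute, mono
stereo = size_bits(44_100, 16, 2, 60)  # one minute, stereo
print(stereo / mono)  # 2.0 — exactly twice the size
```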

Compression (Lossless and Lossy)

Though not the main focus of this subtopic, it is helpful to understand briefly how compression affects sound files:

  • Lossless compression (e.g., FLAC): Reduces size without sacrificing quality

  • Lossy compression (e.g., MP3): Discards some audio data to reduce size, lowering quality

Note: Compression techniques are usually considered separately from the raw sampling process but are relevant in real-world applications.

Summary of Key Terms

  • Sample Rate: Number of times per second the sound is sampled (Hz)

  • Bit Depth: Number of bits per sample; determines amplitude precision

  • Duration: Length of audio, in seconds

  • Binary: The digital format in which all sound data is stored and processed

  • ADC: Converts analog signals to digital

  • DAC: Converts digital signals back to analog for playback

Real-World Applications

Understanding sound data representation is crucial in many areas:

  • Music production: Studio recordings use high sample rates and bit depths

  • Streaming services: Use compression to reduce bandwidth usage

  • Telecommunications: Balance between quality and efficiency

  • Game development and multimedia: Need efficient sound storage without sacrificing quality

FAQ

Why do audio CDs use a sample rate of 44,100 Hz?

The 44,100 Hz sample rate is widely used for audio CDs because it effectively captures the full range of human hearing while maintaining compatibility with digital systems. Human hearing generally ranges from 20 Hz to 20,000 Hz, and according to the Nyquist Theorem, the sample rate must be at least twice the highest frequency to accurately reproduce a signal without aliasing. Therefore, 40,000 Hz would be the minimum required. Audio CDs use 44,100 Hz to add a buffer and ensure the highest fidelity, even at the upper range of human hearing. Additionally, 44,100 Hz was chosen based on compatibility with video equipment used during the development of the CD format. Specifically, it allowed easy synchronization with the standard video frame rates of 30 frames per second in NTSC and 25 in PAL. This rate became a practical and technical standard, balancing sound quality, storage efficiency, and the technological limitations of the time.

What is quantization and how does it affect sound quality?

Quantization is the process of rounding the continuous amplitude values of an analog signal to the nearest fixed level that can be represented using a given bit depth. Once a sample is taken during analog-to-digital conversion, its amplitude must be stored digitally, which requires selecting the nearest available binary value within the range defined by the bit depth. For example, with 8-bit audio, there are 256 possible amplitude levels. If the exact value of a sample falls between two levels, the closest one is selected. This rounding introduces a small error known as quantization error. The lower the bit depth, the fewer levels available, and the greater the potential quantization error. This can lead to distortion, especially in quiet sections of audio, known as quantization noise. Increasing the bit depth significantly reduces this error, improving sound fidelity. In professional audio, 24-bit or even 32-bit floating-point formats are used to minimize the effects of quantization.
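The rounding step described here can be sketched in Python. This is our own illustration, not a production algorithm: it snaps an amplitude in the range [-1.0, 1.0] to the nearest of the 2^bit-depth available levels.

```python
# Quantization sketch: round a continuous amplitude to the nearest
# representable level for a given bit depth.
def quantize(value, bit_depth):
    levels = 2 ** bit_depth
    step = 2.0 / (levels - 1)          # spacing between adjacent levels
    return round(value / step) * step  # snap to the nearest level

sample = 0.3333
for depth in (4, 8, 16):
    q = quantize(sample, depth)
    # Quantization error shrinks as bit depth grows.
    print(depth, q, abs(sample - q))
```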

What is the difference between sampling and synthesizing sound?

Sampling and synthesizing sound are two entirely different approaches to generating digital audio. Sampling involves capturing real-world analog audio—like voices or instruments—by measuring the amplitude of the sound wave at regular intervals (the sample rate) and converting each measurement into a binary value. This process captures the unique, organic qualities of real-world sounds and is used in recordings, podcasts, or sound design where realism is essential. Synthesizing sound, on the other hand, involves generating sound using algorithms and mathematical models rather than recording actual sounds. Digital synthesizers use parameters such as waveform shape, frequency, and modulation to create sound entirely from scratch. This method is commonly used in electronic music, games, and software instruments. While sampling offers realism, synthesis provides flexibility and control. Synthesized sound can be precisely modified or automated, while sampled audio is limited to what was recorded unless edited with effects. Both methods may be used together in multimedia projects depending on the desired outcome.

What is the difference between linear and non-linear sampling?

Linear sampling involves taking sound samples at equally spaced intervals over time, which is the standard method used in most digital audio systems. Each sample is taken after a fixed amount of time—determined by the sample rate—providing a consistent, predictable representation of the analog waveform. This approach simplifies the process of converting, storing, and reconstructing audio and is highly compatible with digital systems such as CD players, DAWs, and streaming services. Non-linear sampling, by contrast, involves variable sampling intervals, where more samples might be taken during complex parts of a sound wave (such as a sharp attack or sudden volume change), and fewer during simpler, sustained tones. This method can be more efficient and may reduce file size without sacrificing quality, but it requires more complex processing for both encoding and playback. Non-linear sampling is less common in consumer formats and is typically found in specialized compression algorithms or advanced audio research settings.
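Linear (uniform) sampling can be illustrated with a few lines of Python: each sample instant is a fixed interval of 1/sample-rate seconds after the previous one. The frequency and rate below are our own illustrative values.

```python
import math

# Linear sampling sketch: sample a 440 Hz sine wave at equally spaced
# intervals fixed by the sample rate (sample n occurs at n / sample_rate
# seconds).
sample_rate = 8_000  # samples per second
samples = [
    math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(5)  # the first five sample instants
]
print(samples)
```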

Why is stereo audio preferred over mono in music and video production?

Stereo audio is preferred over mono in music and video production because it creates a wider and more immersive sound experience. Stereo uses two separate channels—usually for the left and right ears or speakers—which allows sound to be positioned spatially across a horizontal soundstage. This means different instruments or voices can appear to come from different directions, mimicking how we naturally hear sound in the real world. For example, in a stereo mix, vocals might be centered, drums slightly to the left, and a guitar to the right, giving each element a distinct space in the mix. Mono, on the other hand, uses only one audio channel, so all sounds are blended together and played through both speakers or headphones identically. While mono ensures uniform sound in environments with only one speaker or where precise spatial positioning isn't needed (e.g., telephone calls, voice memos), stereo is essential for high-quality music, film, gaming, and virtual reality experiences.

Practice Questions

Explain how sample rate and bit depth affect both the quality and size of a digital audio file.

Sample rate determines how many times per second the sound is sampled. A higher sample rate captures more detail from the original analog sound, resulting in better audio quality. Bit depth refers to the number of bits used for each sample. A higher bit depth allows more precise representation of sound amplitude, improving dynamic range and reducing distortion. However, increasing sample rate and bit depth also increases the amount of data stored per second, resulting in a larger file size. Therefore, there is always a trade-off between sound quality and file size in digital audio.

Describe the process of converting analog sound into digital form and explain why this process is necessary.

Analog sound is continuous and must be converted into digital form so a computer can process and store it. This is done using an analog-to-digital converter (ADC), which samples the sound wave at regular intervals. Each sample is measured and assigned a binary value based on the amplitude, using the chosen bit depth. These binary values are then stored as a sequence in a digital file. This process is essential because computers can only work with binary data, and without converting analog sound, it would be impossible for computers to record, store, edit, or play back sound accurately.
