Data compression reduces the size of digital files, making storage, transmission, and processing more efficient, especially for images, audio, and other media.
What is data compression?
Data compression is the process of encoding information in such a way that it takes up less space than the original format. It is a crucial part of computer systems, enabling more efficient use of storage and communication channels. In essence, compression removes redundancy and optimises how data is represented so it can be stored, transmitted, and processed more quickly and cost-effectively.
When a file is compressed, it becomes smaller, which saves disk space and speeds up file transfers, such as sending images via email or streaming music online. The original data can then be reconstructed from the compressed version, either exactly (lossless compression) or approximately (lossy compression), depending on the type of compression used.
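As a minimal sketch of the exact-reconstruction case, the example below uses Python's standard zlib module on a made-up, highly repetitive byte string. The sample text is an assumption chosen for illustration; real files will compress by different amounts.

```python
import zlib

# Hypothetical sample data: one sentence repeated many times, so it is highly redundant.
original = b"Compression removes redundancy from data. " * 200

compressed = zlib.compress(original)    # encode into a smaller representation
restored = zlib.decompress(compressed)  # reconstruct the original bytes

print(len(original), "bytes before compression")
print(len(compressed), "bytes after compression")
print("exact reconstruction:", restored == original)
```

Because zlib is lossless, the restored bytes match the original exactly; a lossy format would only return an approximation.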
Compression is used in a wide variety of contexts, including:
Reducing the size of images and photos
Making audio and video files smaller for streaming
Shrinking documents and folders for archiving
Sending files over the internet quickly
Allowing more data to be stored in limited space
Importance of compression
FAQ
Lossy compression achieves much smaller file sizes because it permanently removes parts of the original data, targeting elements that are less noticeable to human perception. In images, this includes fine colour variations or subtle textures; in audio, it includes frequencies outside the typical range of human hearing and quiet sounds that are masked by louder ones. Algorithms like JPEG for images and MP3 for audio use perceptual models that estimate what can be discarded with minimal impact on perceived quality. This selective removal of data allows for aggressive reduction in file size. In contrast, lossless compression only reduces redundancy without removing any information, which limits how much the file size can shrink. For example, a high-resolution photograph compressed using JPEG can often be reduced to around a tenth of its original size, depending on the quality setting, while a lossless format like PNG might only reduce it by half. This makes lossy compression well suited to storage and streaming, where space and speed take priority over perfect fidelity.
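The gap between the two approaches can be illustrated with a rough sketch. The code below does not implement JPEG or MP3 and has no perceptual model; it swaps in plain uniform quantisation as the lossy step, followed by zlib. The signal, the STEP value, and the quantise helper are all assumptions made for illustration.

```python
import math
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical 8-bit "signal": a slow sine wave plus a little noise, values 0..255.
samples = bytes(
    min(255, max(0, round(128 + 100 * math.sin(i / 50) + rng.uniform(-4, 4))))
    for i in range(20_000)
)

STEP = 16  # quantisation step: larger means smaller output but more error


def quantise(data: bytes, step: int) -> bytes:
    """Lossy step: snap each sample to the middle of its bucket, discarding fine detail."""
    return bytes((b // step) * step + step // 2 for b in data)


lossless = zlib.compress(samples, 9)
lossy = zlib.compress(quantise(samples, STEP), 9)

# The lossless path restores every sample; the lossy path only gets close.
restored_lossy = zlib.decompress(lossy)
max_error = max(abs(a - b) for a, b in zip(samples, restored_lossy))

print("raw:", len(samples), "lossless:", len(lossless), "lossy:", len(lossy))
print("largest per-sample error after lossy round trip:", max_error)
```

Discarding the low-order detail removes most of the noise that the lossless pass has to preserve, which is why the quantised version compresses further, at the cost of a bounded error in every sample.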
Compression plays a critical role in the performance of real-time systems like video streaming and online gaming, because smaller data lowers bandwidth requirements and transfer delays. When media files are compressed, less data needs to be transmitted over the network, allowing for smoother playback and faster load times. For example, streaming platforms use codecs like H.264 or H.265, which apply advanced lossy compression to video content so that high-definition streams can be delivered over limited internet connections. In gaming, audio and texture assets are compressed to allow faster loading and smoother performance during play. However, if compression is too aggressive, it can introduce noticeable artefacts such as pixelation in video or distortion in audio, which degrade the user experience. Moreover, compressing and decompressing data both use CPU or GPU resources, so the balance between compression efficiency and processing power is crucial. Effective compression techniques aim to keep quality loss minimal while maintaining real-time responsiveness and low-latency interaction.
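The trade-off between output size and processing cost can be observed with a small experiment. This sketch uses zlib compression levels as a stand-in; it is lossless general-purpose compression, not a video codec such as H.264, and the payload is invented for illustration. Timings depend on the machine, so no particular numbers should be expected.

```python
import time
import zlib

# Hypothetical payload standing in for a game asset or a chunk of stream data.
payload = b"".join(
    f"frame={i};x={i % 640};y={i % 480};state=idle\n".encode() for i in range(50_000)
)


def measure(level: int) -> tuple[int, float]:
    """Return (compressed size in bytes, seconds taken) for one zlib level."""
    start = time.perf_counter()
    size = len(zlib.compress(payload, level))
    return size, time.perf_counter() - start


# Level 1 favours speed, level 9 favours size, level 6 is zlib's usual default.
results = {level: measure(level) for level in (1, 6, 9)}
for level, (size, seconds) in results.items():
    print(f"level {level}: {size} bytes in {seconds * 1000:.1f} ms")
```

Higher levels typically spend more time searching for matches in exchange for a somewhat smaller result, which is the same kind of balance a real-time system has to strike.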
Entropy in data compression is a measure of the unpredictability or randomness in a data set. The concept comes from information theory, where it indicates how much information a message contains and sets a lower bound on how far the message can be compressed without loss. In practical terms, data with high entropy has little repetition and is therefore difficult to compress because there is less redundancy to exploit. For example, an encrypted file or a file containing random binary values has high entropy. In contrast, data with low entropy, such as a document with repeated words or an image with large areas of the same colour, is highly compressible. Lossless algorithms perform better when entropy is low, as they can substitute repeated or predictable patterns with shorter representations. Most general-purpose algorithms do not measure entropy as a separate step; they adapt to the statistics of the data as they encode it. Understanding entropy helps in choosing a compression technique: low-entropy data compresses efficiently with lossless methods, while detailed media such as photographs and audio, which lossless methods shrink only modestly, are usually handled with lossy methods. Encrypted or random data gains little from lossless compression, and lossy compression would corrupt it.
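A simple way to estimate entropy is to apply Shannon's formula, H = -Σ p(x) log2 p(x), to the frequency of each byte value. The sketch below does this and compares the result with how well zlib compresses the same data. The shannon_entropy helper and both sample inputs are assumptions for illustration, and a byte-frequency estimate ignores longer repeated patterns, so it is only a rough guide.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from the byte-frequency distribution."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


rng = random.Random(0)  # fixed seed so the run is repeatable
random_bytes = bytes(rng.randrange(256) for _ in range(10_000))  # high entropy
repetitive = b"the same words again and again " * 300  # low entropy

for name, data in (("random", random_bytes), ("repetitive", repetitive)):
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{name}: {shannon_entropy(data):.2f} bits/byte, "
          f"compressed to {ratio:.0%} of original")
```

The random input sits close to the maximum of 8 bits per byte and barely shrinks, while the repetitive input has much lower entropy and compresses to a small fraction of its size.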
Compressing a file multiple times using the same compression method generally does not lead to further size reduction and may even increase the file size. Compression algorithms remove most of the redundancy in the data during the first pass, so the compressed file contains little predictable structure for the algorithm to exploit again. For example, if a text file is compressed into a ZIP archive and that archive is then compressed again, the second pass finds almost nothing to remove but still adds its own headers and structural overhead, which can make the file larger. With lossy formats the problem is worse: re-encoding an already lossy file, such as saving a JPEG as a JPEG again, discards more information each time and degrades quality. Compression works best on original, uncompressed data. If additional size reduction is needed, a different algorithm or format might be more effective, such as switching from ZIP to 7z or using a lossy method if quality loss is acceptable. Repeated compression in the same format is rarely useful.
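This behaviour is easy to check. The sketch below uses zlib, which implements DEFLATE, the algorithm most commonly used inside ZIP archives, on invented text built from a small hypothetical vocabulary. Exact sizes will vary with the input.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
words = ["data", "file", "size", "zip", "archive", "redundancy", "pattern", "byte"]

# Hypothetical text: 20,000 words drawn at random from a small vocabulary.
text = " ".join(rng.choice(words) for _ in range(20_000)).encode()

once = zlib.compress(text, 9)   # first pass removes most of the redundancy
twice = zlib.compress(once, 9)  # second pass has little left to work with

print("original:", len(text))
print("compressed once:", len(once))
print("compressed twice:", len(twice))
```

The first pass shrinks the text substantially, while the second pass typically leaves the size about the same or slightly larger because of the extra header and block overhead.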
Modern web technologies rely heavily on compression to optimise user experience by reducing loading times and conserving bandwidth. Web servers often use HTTP compression, such as gzip or Brotli, to compress HTML, CSS, and JavaScript files before they are sent to the user's browser. This allows pages to load faster, particularly on slower connections. Images are typically served in compressed formats like JPEG or WebP, usually in lossy mode, to minimise download size without significantly affecting visual quality. Video content is delivered using adaptive bitrate streaming, in which the player switches between versions of the video encoded at different bitrates according to the user's connection speed. Additionally, many content delivery networks (CDNs) automatically serve compressed versions of assets based on the user's device and browser. Because less data is transferred, compression also reduces bandwidth costs for large-scale web services. Combined with caching and minification, compression forms a core part of modern web performance strategies, enabling smoother, faster, and more responsive websites, even on low-bandwidth or mobile networks.
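The HTTP compression step can be sketched in a few lines. The example below is a simplified illustration rather than how any particular web server implements it: encode_response is a hypothetical helper, only gzip is handled because Brotli is not in Python's standard library, and quality values in the Accept-Encoding header (such as "gzip;q=0") are ignored.

```python
import gzip


def encode_response(body: bytes, accept_encoding: str) -> tuple[dict[str, str], bytes]:
    """Gzip the body if the client's Accept-Encoding header lists gzip.

    Simplification: quality values such as "gzip;q=0" are ignored.
    """
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    headers = {"Content-Type": "text/html; charset=utf-8", "Vary": "Accept-Encoding"}
    if "gzip" in offered:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"  # tells the browser how to decode the body
    headers["Content-Length"] = str(len(body))
    return headers, body


# Hypothetical page: repetitive markup, which is typical of HTML and compresses well.
html = ("<ul>" + "<li class='item'>Hello</li>" * 500 + "</ul>").encode()
headers, payload = encode_response(html, "gzip, deflate, br")
print(headers["Content-Encoding"], len(html), "->", len(payload))
```

The browser advertises what it can decode in Accept-Encoding, and the server labels the chosen method in Content-Encoding so the browser can decompress the body transparently.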
