TutorChase logo
Login
AQA A-Level Computer Science

2.1.4 File handling: text and binary files

File handling is a crucial part of computer science, enabling programs to persistently store, retrieve, and manipulate data in both human-readable and binary formats.

What are files?

A file is a structured collection of data stored on a non-volatile storage device such as a hard drive, SSD, or cloud-based storage. Files allow programs to save data permanently, meaning data can be accessed and reused across different sessions of the program. Unlike variables, which store data temporarily in RAM, files provide a persistent solution that is essential for real-world applications such as saving documents, storing user preferences, or maintaining databases.

Files can broadly be divided into text files and binary files, each with distinct features and use cases.

Text files

Characteristics of text files

Text files store information in a human-readable format using a standard character encoding scheme such as ASCII or UTF-8. Each character in a text file corresponds to one or more bytes depending on the encoding used. These files are commonly used for data that may need to be viewed or edited using a standard text editor (e.g. Notepad, VS Code, or nano).

Features:

  • Characters are stored as sequences of bytes.

Take your grades to the next level!

UPGRADING TO PREMIUM UNLOCKS
AI Tutor
AI-powered study assistant
instant feedback and guidance
Predicted Papers
Examiner-style predicted papers
based on recent exam trends
Practice Questions
All exam practice questions
by topic for each subject
Study Notes
All detailed revision notes
written by expert teachers
Cheat Sheets
Quick revision summaries
perfect for last-minute review
Past Papers
Complete collection
of practice and past exam papers
Email
Password
Confirm Password
Already have an account?

Practice Questions

FAQ

Buffering is used in file handling to optimise the interaction between a program and the underlying file system. When a file is accessed, reading or writing each byte individually can be extremely inefficient due to the overhead of system calls for every byte. Instead, buffering allows larger blocks of data to be temporarily stored in memory before being written to or read from the disk. For example, when writing, data is first accumulated in a buffer and only flushed to the disk when the buffer is full or the file is closed. This significantly reduces the number of disk accesses, improving performance. Buffered file access is especially useful for large files or repetitive operations like line-by-line writing. It also helps minimise wear on storage devices. Most programming languages, including Python and Java, handle buffering automatically, but they often provide ways to manually flush or disable buffers when needed, especially for real-time applications or debugging.

File content refers to the actual data stored within the file, such as the characters in a text document or the bytes in an image. File metadata, on the other hand, consists of information about the file itself, including its name, size, creation and modification dates, permissions, and file type. Metadata is typically managed by the operating system and is essential for file management tasks. In file handling, metadata affects whether a file can be accessed or modified. For example, if a file has read-only permissions, any attempt to write to it will trigger a PermissionError. Metadata also determines how files are opened and interpreted. A file’s extension (e.g. .txt or .jpg) helps the operating system and applications decide which program to use to open it. File handling functions may also rely on metadata when checking for file existence, modification times for backup scripts, or when sorting files by size or date.

Encoding in text files maps characters to sequences of bytes so they can be stored and interpreted correctly. Common encodings include ASCII, which handles basic English characters, and UTF-8, which supports a wide range of international characters and symbols. When writing to a file, the program must use an encoding to convert characters into bytes. When reading, the same encoding must be used to decode the bytes back into characters. If there is a mismatch between the encoding used for writing and the one used for reading, it can lead to decoding errors, corrupted text, or unreadable symbols. For instance, reading a UTF-8 file as ASCII may result in UnicodeDecodeError if the file contains non-ASCII characters. To avoid such issues, always specify the encoding explicitly when opening a file for reading or writing, especially when dealing with multilingual content or data exchanged between different systems. In Python, this can be done with open("file.txt", "r", encoding="utf-8").

File locking is a mechanism that prevents conflicts when multiple programs or processes attempt to access the same file at the same time. Without locking, if two programs try to write to the same file simultaneously, it can result in race conditions, data corruption, or partial writes. File locks ensure that only one process can modify a file at any given time or that reads and writes do not interfere with each other. There are two main types of locks: shared locks, which allow multiple readers but no writers, and exclusive locks, which allow only one writer and no other access. File locking can be implemented at the application level using libraries or by the operating system through file descriptors. In high-concurrency environments like servers or databases, locking is critical to maintain data consistency and integrity. However, improper handling of locks can cause deadlocks or performance bottlenecks if processes wait indefinitely for access.

Handling large files efficiently requires streaming or incremental reading, where data is processed in smaller, manageable chunks instead of reading the entire file into memory. This is crucial when working with files that are several gigabytes in size or more, as loading such files fully can lead to memory exhaustion or system crashes. In most programming languages, this is achieved using buffered or iterative reading methods. For example, in Python, using a for loop to read a file line by line ensures that only one line is held in memory at a time. For binary files, reading with fixed-size buffers (e.g. file.read(1024)) allows the file to be processed piece by piece. This approach is commonly used in applications such as log parsing, video streaming, and data migration. Efficient file handling also involves promptly releasing resources, avoiding unnecessary copying of data, and using generators or iterators where appropriate to reduce memory overhead.

Hire a tutor

Please fill out the form and we'll find a tutor for you.

1/2
Your details
Alternatively contact us via
WhatsApp, Phone Call, or Email