TutorChase logo
Login
AQA A-Level Computer Science

11.6.1 Concept of a hash table

Hash tables are fast, flexible data structures that store key-value pairs using a hash function, allowing rapid searching, insertion, and deletion in many computing applications.

What is a hash table?

A hash table is a data structure that stores data in the form of key-value pairs. Instead of storing items in a sequence like an array or a list, a hash table uses a hash function to compute an index, which determines where a value is stored. The key idea is that you can use the key to directly access the location where its value is stored.

For instance, if you want to store a student's information under a unique ID such as "stu457", the hash function would convert this ID into a numerical index, like 13. The value, which might include their name, grades, or attendance record, would then be stored at position 13 in an array or similar structure. The next time you want to retrieve the student’s data, you hash "stu457" again and instantly know where to find it.

Key features

  • A key uniquely identifies a value.

  • A hash function transforms the key into a numerical index.

  • The value is stored at the index returned by the hash function.

Take your grades to the next level!

UPGRADING TO PREMIUM UNLOCKS
AI Tutor
AI-powered study assistant
instant feedback and guidance
Predicted Papers
Examiner-style predicted papers
based on recent exam trends
Practice Questions
All exam practice questions
by topic for each subject
Study Notes
All detailed revision notes
written by expert teachers
Cheat Sheets
Quick revision summaries
perfect for last-minute review
Past Papers
Complete collection
of practice and past exam papers
Email
Password
Confirm Password
Already have an account?

Practice Questions

FAQ

Yes, hash tables can store multiple data types as values, such as integers, strings, lists, or even custom objects, depending on the programming language being used. This flexibility makes hash tables highly adaptable to a wide range of applications. The main requirement is that the key must be hashable, but the value can be any data type, even a complex structure like a nested dictionary or object. In terms of implementation, this means the underlying data structure must be able to accommodate heterogeneous data types, which is commonly supported in high-level languages like Python or JavaScript. However, in strongly typed languages like Java or C++, type declarations must match the expected value type, so generic or templated data structures are used. Although storing complex data types increases memory usage slightly, it does not affect the average-case time complexity for lookup or insertion. The hash function only needs to operate on the key, so the nature of the value does not impact indexing performance.

This situation is known as a collision, which occurs when two different keys generate the same hash value, resulting in the same index. Since a hash table uses a single array or list to store data, only one value can be stored at a particular index unless a collision resolution strategy is employed. While the collision resolution mechanisms themselves are covered in another section of the syllabus, it’s important to understand that collisions are unavoidable due to the finite size of the table and the potentially infinite number of possible keys. When a collision occurs, the hash table must decide whether to probe for the next available index (open addressing) or store multiple entries at the same index using a linked list or similar structure (chaining). Without proper collision handling, data would be overwritten, leading to loss of information and unpredictable behaviour. A well-designed hash function aims to minimise collisions, but a robust resolution method is essential to ensure data integrity.

The load factor of a hash table is the ratio of the number of elements stored to the total number of available slots (or buckets). It is calculated using the formula:
Load factor = number of elements / table size.
This metric directly influences the likelihood of collisions—higher load factors lead to more collisions, which in turn increase the time required to resolve them, degrading performance. A low load factor means the table has plenty of empty space, reducing collisions but increasing memory usage. In practice, most implementations aim for a load factor between 0.5 and 0.75, striking a balance between memory efficiency and access speed. When the load factor exceeds a threshold (commonly 0.75), the hash table is typically resized—usually by doubling its capacity—and all entries are rehashed. This process, while temporarily expensive, maintains long-term efficiency. Monitoring and controlling the load factor is critical to maintaining the average-case constant time performance of hash tables.

A good hash function should distribute keys evenly and uniformly across the entire hash table to prevent clustering, which occurs when many keys are hashed to the same or nearby indices. Uniform distribution ensures that each index has an approximately equal chance of being used, which reduces the likelihood of collisions and preserves the efficiency of constant-time access. If the hash function creates poor distribution—for example, if it always hashes certain keys to the same few indices—then those indices become overcrowded. This leads to frequent collisions and longer search times during lookup or insertion. In extreme cases, the hash table can behave similarly to a linked list, with performance degrading to O(n) in the worst case. Additionally, poor distribution increases the need for more frequent collision resolution, wasting CPU cycles and increasing memory overhead. Therefore, choosing or designing a high-quality hash function with strong uniformity properties is critical to achieving the expected performance of a hash table.

Traditional hash tables do not maintain any order of key-value pairs. The location of each entry is determined entirely by the hash function, which scatters keys across the table based on their hash values. This lack of order means that when you iterate over the contents of a hash table, the sequence may appear random or differ between executions. In many applications, such as dictionaries or caches, this doesn’t matter because the focus is on fast access, not order. However, in applications where data order is significant—such as maintaining the order of operations or insertion order for user interfaces—this behaviour is a limitation. To address this, some programming languages offer ordered hash table variants. For example, Python 3.7+ dictionaries preserve insertion order, and Java offers LinkedHashMap, which maintains insertion order while still using hashing for fast access. If order matters in your application, it’s important to choose a data structure that explicitly supports this behaviour rather than relying on standard hash tables.

Hire a tutor

Please fill out the form and we'll find a tutor for you.

1/2
Your details
Alternatively contact us via
WhatsApp, Phone Call, or Email