Dictionaries are powerful data structures used to store data as key-value pairs, offering efficient data access based on unique keys rather than positional indexing.
What is a dictionary?
A dictionary is a data structure that stores a collection of key-value pairs, allowing efficient retrieval of a value when given its corresponding key. Unlike arrays or lists, which are accessed by numerical index, dictionaries are accessed by unique keys that can be of various data types. Each key maps to a specific value, making the dictionary an example of an associative data structure.
Dictionaries are commonly used in real-world programming to model objects and store related pieces of data under identifiable names. This makes them more intuitive and useful when handling structured or labelled data.
A dictionary is often thought of as being similar to a real-life dictionary or glossary. Just as you look up a definition based on a word (key), in a programming dictionary you look up a value based on a unique key.
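The lookup-by-key idea can be sketched in a few lines. This is a minimal illustration, assuming Python (the language the examples in these notes appear to use); the `student` record and its field names are made up for the example.

```python
# A dictionary modelling one object as labelled fields (names are illustrative).
student = {"name": "Ada", "age": 17, "subjects": ["Maths", "Physics"]}

# Look up a value by its key rather than by position.
name = student["name"]

# Insert a new pair, update an existing one, then delete one.
student["year"] = 12
student["age"] = 18
del student["subjects"]

# get() returns a default instead of raising KeyError for a missing key.
house = student.get("house", "unknown")

print(name, student, house)
```

Indexing with square brackets raises `KeyError` when the key is absent, so `get()` is the safer choice when a key may be missing.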
Key and value
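Key: An identifier used to locate the value in the dictionary. Keys must be unique within a dictionary and must be of an immutable data type such as a string, a number, or a tuple of immutable elements.
Value: The data associated with a key. Values can be of any data type and do not need to be unique.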
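Key uniqueness is easiest to see by writing to the same key twice. The sketch below assumes Python; the `scores` and `grid` dictionaries are hypothetical examples, not part of the original notes.

```python
# Assigning to an existing key replaces its value; no second entry is created.
scores = {"alice": 70, "bob": 65}
scores["alice"] = 90

# Strings, numbers and tuples of immutable elements can all serve as keys.
grid = {(0, 0): "origin", 3.5: "a float key", "label": "a string key"}

print(len(scores), scores["alice"], grid[(0, 0)])
```

Because each key appears at most once, the length of `scores` stays at 2 after the second assignment.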
FAQ
Dictionary keys must be immutable because dictionaries rely on hash functions to map keys to indices in a hash table. A key must keep the same value for as long as it is stored, so that hashing it always produces the same hash value. If a key could be changed (as with a mutable object like a list), its location in the hash table would become unreliable, making it impossible to retrieve the associated value accurately. In Python, attempting to use a mutable object such as a list or a dictionary as a key results in a TypeError because these objects cannot be hashed. For example, my_dict = {[1, 2]: "value"} raises an error because the list [1, 2] is mutable and therefore unhashable. To use composite data as a key safely, it must be immutable, such as a tuple containing only immutable elements. This ensures the integrity and consistency of the key-value mapping within the dictionary.
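The behaviour described above can be checked directly. This is a small sketch assuming Python; `is_hashable` is a hypothetical helper written for this example, not a built-in function.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dictionary key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# A list key is rejected when the dictionary is built.
raised = False
try:
    my_dict = {[1, 2]: "value"}
except TypeError:
    raised = True

# The same data as a tuple is accepted.
my_dict = {(1, 2): "value"}

print(raised, my_dict[(1, 2)])
print(is_hashable((1, 2)), is_hashable([1, 2]), is_hashable((1, [2])))
```

Note the last check: a tuple is only hashable if everything inside it is hashable, so `(1, [2])` is rejected just as a bare list would be.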
When a dictionary is said to have average-case O(1) time complexity, it means that dictionary operations such as lookup, insertion, update, and deletion typically take constant time; that is, the time taken does not increase with the number of elements. This efficiency arises because a dictionary uses a hash table, allowing it to compute the index of a key directly via a hash function. However, performance can degrade to O(n) in the worst case, particularly when there are many collisions, which occur when multiple keys hash to the same index. Resolving a collision requires additional steps, such as traversing a linked list or probing for the next available slot, and these steps increase the time taken. Poorly designed hash functions, excessive key clustering, or inserting too many items without resizing the hash table can all lead to performance degradation. Efficient hash functions and dynamic resizing help minimise these issues and maintain average-case O(1) efficiency in practice.
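To make the effect of collisions concrete, here is a toy hash table that resolves collisions by chaining and counts key comparisons during a lookup. It is a teaching sketch under stated assumptions, not how any real dictionary is implemented: `ChainedTable`, `get_with_cost`, and the two hash functions are hypothetical names chosen for this example.

```python
class ChainedTable:
    """Toy hash table using separate chaining (a list per bucket)."""

    def __init__(self, bucket_count, hash_function):
        self.buckets = [[] for _ in range(bucket_count)]
        self.hash_function = hash_function

    def put(self, key, value):
        bucket = self.buckets[self.hash_function(key) % len(self.buckets)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: update it
                return
        bucket.append((key, value))

    def get_with_cost(self, key):
        """Return (value, number of key comparisons made)."""
        bucket = self.buckets[self.hash_function(key) % len(self.buckets)]
        for comparisons, (existing_key, value) in enumerate(bucket, start=1):
            if existing_key == key:
                return value, comparisons
        raise KeyError(key)


spread = ChainedTable(64, hash_function=lambda k: k)     # keys spread out
clustered = ChainedTable(64, hash_function=lambda k: 0)  # every key collides

for n in range(50):
    spread.put(n, n * n)
    clustered.put(n, n * n)

print(spread.get_with_cost(49))     # (2401, 1): one comparison
print(clustered.get_with_cost(49))  # (2401, 50): scans the whole chain
```

With a well-spread hash, each bucket holds one pair and a lookup costs a single comparison. With the degenerate hash, all 50 pairs share one bucket, so the lookup becomes a linear scan.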
Whether a dictionary preserves insertion order depends on the programming language and version being used. In Python, starting from version 3.7, dictionaries maintain the order in which key-value pairs are inserted. Iterating over a dictionary therefore returns keys in the order they were added, and this behaviour is part of the official language specification. Versions of Python before 3.7 do not guarantee insertion order, although CPython 3.6 already preserved it as an implementation detail. Other programming languages handle this differently. For example, Java’s HashMap does not preserve insertion order, while LinkedHashMap does. In JavaScript, objects preserve insertion order for string keys, but integer-like keys are iterated first, in ascending numeric order. It is therefore important not to rely on insertion order unless you are using a language and data structure that explicitly supports it. If maintaining order is critical, consider using an ordered dictionary or equivalent structure that guarantees this behaviour.
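The ordering rules for Python can be demonstrated with a short sketch. It assumes Python 3.7 or later; the `events` dictionary and its keys are illustrative only.

```python
events = {}
events["start"] = 1
events["load"] = 2
events["finish"] = 3

# Updating an existing key keeps its original position.
events["start"] = 10

# Deleting a key and inserting it again moves it to the end.
del events["load"]
events["load"] = 4

order = list(events)
print(order)  # ['start', 'finish', 'load']
```

Order reflects when a key was first inserted, not when its value last changed, which is worth remembering when a dictionary is used as a simple ordered record.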
Dictionary resizing is the process of increasing the capacity of the underlying hash table when the dictionary becomes too full. Most implementations trigger resizing when a certain load factor is exceeded. The load factor is the ratio of the number of stored key-value pairs to the total number of buckets in the hash table. A high load factor increases the likelihood of collisions, which can degrade performance. When resizing occurs, a new, larger array is created (typically double the size), and all existing key-value pairs are rehashed and inserted into the new array. This process is known as rehashing. Although a single resize is costly in time and memory, because every stored pair must be moved, it happens infrequently, so its cost is spread across many insertions and dictionary operations remain efficient overall. Without resizing, the number of collisions would increase rapidly as more items are added, resulting in slower performance. Resizing is therefore essential to sustaining the average-case O(1) time complexity.
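The resizing steps can be sketched as a toy table that doubles its bucket array once the load factor passes a threshold. This is a simplified illustration, not the strategy of any particular language runtime: the class `ResizingTable`, the starting capacity of 8, and the 0.75 threshold are all assumptions made for the example.

```python
class ResizingTable:
    """Toy chained hash table that doubles when it gets too full."""

    MAX_LOAD = 0.75  # assumed threshold; real implementations vary

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0
        self.resize_count = 0

    def load_factor(self):
        # Stored pairs divided by number of buckets.
        return self.size / len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        if self.load_factor() > self.MAX_LOAD:
            self._resize()

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        raise KeyError(key)

    def _resize(self):
        old_pairs = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for key, value in old_pairs:
            # Rehash: the index depends on the new bucket count.
            self.buckets[hash(key) % len(self.buckets)].append((key, value))
        self.resize_count += 1


table = ResizingTable()
for n in range(100):
    table.put(n, str(n))

print(len(table.buckets), table.resize_count, table.load_factor())
```

Inserting 100 items triggers only a handful of resizes because each one doubles the capacity, which is why the occasional expensive rehash does not dominate the overall cost.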
The hash function plays a central role in dictionary implementation by converting keys into numeric hash codes, which are then used to compute the index at which the key-value pair is stored in the hash table. The quality of the hash function directly affects the efficiency of dictionary operations. A good hash function must be deterministic, meaning the same input always yields the same output. It should distribute hash values uniformly across the table so that different keys rarely share an index; collisions cannot be eliminated entirely, because there are usually far more possible keys than table slots. It should be fast to compute, even for complex keys, and should handle a wide variety of inputs gracefully. Ideally, keys that differ only slightly produce very different hash values (high dispersion). In cryptographic or security-sensitive contexts, hash functions must also be resistant to collision attacks, although this is less critical for general-purpose dictionaries.
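As an illustration of these properties, the sketch below implements the 32-bit FNV-1a hash, a simple, well-known non-cryptographic hash, and uses it to pick a bucket index. It is chosen here only because it is short; it is not the hash that Python's own dictionaries use, and `bucket_index` is a hypothetical helper for this example.

```python
FNV_OFFSET_BASIS = 2166136261  # 32-bit FNV offset basis
FNV_PRIME = 16777619           # 32-bit FNV prime


def fnv1a_32(text):
    """Return the 32-bit FNV-1a hash of a string's UTF-8 bytes."""
    value = FNV_OFFSET_BASIS
    for byte in text.encode("utf-8"):
        value ^= byte
        value = (value * FNV_PRIME) % 2**32  # keep the result in 32 bits
    return value


def bucket_index(key, bucket_count):
    """Map a string key to a slot in a table with bucket_count slots."""
    return fnv1a_32(key) % bucket_count


# Deterministic: hashing the same key twice gives the same code.
print(fnv1a_32("key1") == fnv1a_32("key1"))

# Keys that differ by one character get different hash codes.
print(hex(fnv1a_32("key1")), hex(fnv1a_32("key2")))

# The hash code is reduced to a valid index with the modulo operator.
print(bucket_index("key1", 16))
```

The modulo step is what ties the hash code to the table size, which is also why every pair must be rehashed when the table is resized.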
