Dictionaries are versatile data structures used to store and retrieve data efficiently. This page explores practical ways dictionaries are used in real-world computing tasks.
Information retrieval
One of the most common and powerful uses of dictionaries is in the field of information retrieval. This refers to the process of extracting relevant information from a large dataset, often involving text documents. Dictionaries are particularly well-suited for this task because they allow for rapid key-based access to data, making them ideal for storing word frequencies, indexes, and search results.
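As a rough illustration of the indexes mentioned here, the sketch below builds a simple inverted index in Python. It is a dictionary that maps each word to the set of document IDs containing that word. The function names build_index and search and the sample documents are hypothetical. Words are split on whitespace only, with no punctuation handling.

```python
# A minimal inverted index sketch: each word maps to the set of document IDs
# that contain it. build_index, search and the sample docs are hypothetical.
def build_index(documents):
    """documents: dict mapping a document ID to its text."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            # Create an empty set the first time a word is seen,
            # then record that this document contains it.
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, word):
    """Return the IDs of documents containing word (empty set if none)."""
    return index.get(word.lower(), set())


docs = {
    1: "dictionaries store key value pairs",
    2: "hash tables store keys",
    3: "search engines use an index",
}
index = build_index(docs)
print(sorted(search(index, "store")))    # [1, 2]
print(sorted(search(index, "missing")))  # []
```

Each lookup is a single dictionary access, so the cost of a search does not depend on how many documents have been indexed.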
Word frequency counting
A classic example of information retrieval is counting how many times each word appears in a document. This is known as word frequency analysis, and it plays a key role in various areas of computer science, including natural language processing (NLP), machine learning, and search engine technology.
Dictionaries allow us to use each word as a key, and the number of times it appears as the corresponding value. This means we can quickly determine whether a word has already been seen, update its count if it has, or add it if it hasn’t.
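The following is a minimal Python sketch of the check, update and add logic just described. The function name count_words is hypothetical. The sketch assumes the input has already been split into a list of words.

```python
def count_words(words):
    """Count occurrences of each word using a plain dictionary."""
    counts = {}
    for word in words:
        if word in counts:
            # Seen before: update the existing count.
            counts[word] += 1
        else:
            # First occurrence: add the word with a count of 1.
            counts[word] = 1
    return counts


print(count_words(["to", "be", "or", "not", "to", "be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```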
Example document
Let’s consider the document:
“The green, green grass grows”
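A Python sketch can apply this approach to the example document by normalising the text first and then counting. The tokenisation rule is an assumption. The text is lower-cased and only runs of the letters a to z are kept. As a result, "The" and "the" count as the same word and the comma is dropped.

```python
import re

document = "The green, green grass grows"

# Normalise: lower-case the text and keep only runs of letters,
# so "The" and "the" match and the comma is discarded.
words = re.findall(r"[a-z]+", document.lower())

frequencies = {}
for word in words:
    # dict.get returns 0 for a word that has not been seen yet.
    frequencies[word] = frequencies.get(word, 0) + 1

print(frequencies)
# {'the': 1, 'green': 2, 'grass': 1, 'grows': 1}
```

Python's collections.Counter(words) would produce the same counts. The explicit loop is shown here because it mirrors the check-and-update logic described earlier.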
FAQ
Yes, dictionary keys can be complex data types, with one strict limitation: they must be hashable, which in practice means immutable. In Python, for example, valid dictionary keys include integers, strings, and tuples, provided the tuple contains only other immutable objects. This means you could use a tuple like (x, y) as a key to represent a coordinate in a grid or a pair of related values. Lists and dictionaries, however, cannot be keys. They are mutable, so their contents, and therefore their hash, could change after insertion, which would break key lookup. The immutability requirement ensures that the dictionary’s internal hash mechanism works correctly and consistently. Within that limit, keys remain flexible and structured, for example when mapping coordinates to values in a matrix or storing compound identifiers like ("John", "Doe"). Attempting to use a mutable object as a key raises a TypeError at runtime, so careful key design is essential in such use cases.
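The sketch below shows tuple keys for grid coordinates and for a compound name. It also uses a hypothetical helper, is_valid_key, to show which objects Python accepts as keys. The helper simply tries hash() on the object. The data values are illustrative.

```python
def is_valid_key(obj):
    """Return True if obj can be used as a dictionary key (i.e. is hashable)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Tuple keys: map (row, column) grid coordinates to values.
grid = {(0, 0): "start", (2, 3): "treasure"}
print(grid[(2, 3)])  # treasure

# A compound identifier as a key (illustrative data).
emails = {("John", "Doe"): "john.doe@example.com"}
print(emails[("John", "Doe")])  # john.doe@example.com

print(is_valid_key((2, 3)))    # True
print(is_valid_key([2, 3]))    # False: lists are mutable
print(is_valid_key((2, [3])))  # False: the tuple contains a list

try:
    grid[[2, 3]] = "treasure"  # a list key is rejected
except TypeError as err:
    print("TypeError:", err)
```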
Dictionaries and JSON objects look similar, since both store data as key-value pairs, but they are not identical. A dictionary is a data type within a programming language such as Python, whereas JSON (JavaScript Object Notation) is a text-based format designed for exchanging data between systems. Dictionaries can hold a wide range of Python-specific data types, including functions, classes, and tuples. JSON is limited to more universal types: strings, numbers, booleans, null (Python’s None), arrays (Python lists), and objects (Python dicts). In JSON, keys must always be strings, whereas Python dictionary keys can be of any hashable type. Python’s standard-library json module provides json.dumps() and json.loads() to convert dictionaries to JSON strings and back. These functions make it possible to use dictionaries for API communication or configuration files. Despite their similarities, developers must be aware of format-specific rules when switching between the two.
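The sketch below makes a round trip through Python’s json module to show these type conversions. The tuple becomes a JSON array, the integer key 1 becomes the string "1", and None becomes null. The record contents are illustrative.

```python
import json

record = {"name": "Ada", "languages": ("Python", "C"), 1: True, "manager": None}

text = json.dumps(record)
print(text)
# {"name": "Ada", "languages": ["Python", "C"], "1": true, "manager": null}

restored = json.loads(text)
print(restored)
# {'name': 'Ada', 'languages': ['Python', 'C'], '1': True, 'manager': None}
print(restored == record)  # False: the tuple became a list and 1 became "1"

# Tuple keys have no JSON equivalent, so json.dumps rejects them.
try:
    json.dumps({(0, 0): "origin"})
except TypeError:
    print("tuple keys cannot be converted to JSON")
```

The round trip is lossy, which is why the format-specific rules matter when data passes between the two.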
Traditionally, dictionaries in many programming languages did not maintain insertion order, and in Python this was not guaranteed before version 3.7. Dictionaries are built on hash tables, which are optimised for fast key-based lookups and updates rather than for keeping items in sequence. For that reason, ordering was treated as unnecessary overhead. In practical applications, however, order can matter. Settings in a configuration file may need to be applied in a specific order, and consistent ordering improves readability when data is presented. In CPython 3.6, dictionaries began preserving insertion order as an implementation detail, and Python 3.7 made this a language guarantee. Other languages may behave differently depending on their implementation. If ordering is important, make it explicit: use an ordered structure such as collections.OrderedDict in Python, or sort the items before processing. Always check the language documentation to understand how its dictionaries behave regarding ordering.
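The following sketch assumes Python 3.7 or later. It shows insertion order being preserved, explicit sorting, and move_to_end(), one operation that OrderedDict provides beyond a plain dict. The settings keys are illustrative.

```python
from collections import OrderedDict

settings = {}
settings["theme"] = "dark"
settings["font_size"] = 12
settings["language"] = "en"

# Python 3.7+ guarantees iteration in insertion order.
print(list(settings))  # ['theme', 'font_size', 'language']

# Sorting gives an explicit order that does not depend on insertion history.
for key, value in sorted(settings.items()):
    print(key, value)
# font_size 12
# language en
# theme dark

# OrderedDict adds order-specific operations such as move_to_end().
ordered = OrderedDict(settings)
ordered.move_to_end("theme")
print(list(ordered))  # ['font_size', 'language', 'theme']
```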
When a value is assigned to a key that already exists in a dictionary, the new value overwrites the existing one. This is by design: dictionary keys must be unique, so assigning to an existing key updates the stored value rather than adding a second entry. For instance, if d = {'a': 1} and you then run d['a'] = 5, the dictionary becomes {'a': 5}. This behaviour is useful when updating settings, recalculating values, or correcting data, but it can also cause accidental overwrites if not handled carefully. To prevent this, you can check whether a key exists with if key not in d: before assigning. Alternatively, d.setdefault(key, value) assigns the value only if the key is absent. Logging or raising a warning when a key already exists can also help track unexpected updates. Where multiple values per key are needed, use a dictionary of lists or a collections.defaultdict(list).
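Here are the behaviours described above, sketched in Python with illustrative keys and values. The example covers overwriting, a guarded assignment, setdefault(), and a defaultdict(list) for multiple values per key.

```python
from collections import defaultdict

d = {"a": 1}
d["a"] = 5  # overwrites: no second "a" entry is created
print(d)    # {'a': 5}

# Guarded assignment: only add the key if it is not already present.
if "a" not in d:
    d["a"] = 10
print(d["a"])  # 5

# setdefault assigns only when the key is absent, and returns the stored value.
print(d.setdefault("a", 99))  # 5
print(d.setdefault("b", 2))   # 2

# Multiple values per key: defaultdict(list) creates an empty list on first use.
tags = defaultdict(list)
for name, tag in [("post1", "python"), ("post1", "dicts"), ("post2", "json")]:
    tags[name].append(tag)
print(dict(tags))  # {'post1': ['python', 'dicts'], 'post2': ['json']}
```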
Dictionaries and databases both store and retrieve data by key, but they serve very different purposes. Dictionaries are held in memory and provide fast, temporary access to relatively small amounts of structured data within a single application. They are ideal for managing user sessions, configuration settings, or local records. Databases, on the other hand, are persistent. They are designed to store large datasets across sessions and to support multi-user access, querying, indexing, and data integrity. Dictionary lookups are extremely fast (O(1) on average). However, dictionaries lack features found in relational (SQL) and NoSQL database systems, such as schema enforcement, relational integrity, and transaction management. Moreover, dictionaries are not suitable for concurrent access without extra synchronisation. Even so, dictionaries are often used as intermediary structures for data fetched from or sent to databases, which makes them an important tool in application development. They provide a lightweight and flexible model for in-memory data manipulation.
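The example below sketches dictionaries acting as an intermediary. It reads rows from a database into dictionaries keyed by column name, looks one up in memory, and writes a change back. It uses Python’s built-in sqlite3 module with an in-memory database as a stand-in for a real database. The users table and its columns are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a real database here;
# the "users" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users (name, role) VALUES (?, ?)",
    [("Alice", "admin"), ("Bob", "viewer")],
)

cursor = conn.execute("SELECT id, name, role FROM users ORDER BY id")
columns = [col[0] for col in cursor.description]

# Convert each row tuple into a dictionary keyed by column name.
users = [dict(zip(columns, row)) for row in cursor.fetchall()]
print(users)
# [{'id': 1, 'name': 'Alice', 'role': 'admin'}, {'id': 2, 'name': 'Bob', 'role': 'viewer'}]

# Fast in-memory lookup by name, then write the change back to the database.
by_name = {user["name"]: user for user in users}
by_name["Bob"]["role"] = "editor"
conn.execute("UPDATE users SET role = ? WHERE id = ?", ("editor", by_name["Bob"]["id"]))
conn.commit()
conn.close()
```

The database remains the persistent source of truth. The dictionaries are only a convenient in-memory view of the rows while the application works with them.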
