Graphs are mathematical structures that model relationships between entities, making them invaluable for representing networks, connections, and dependencies in computing and real-life scenarios.
What is a graph?
A graph is a data structure used to represent a set of objects (called nodes or vertices) and the connections between them (called edges or arcs). Graphs are highly effective at modelling systems where pairs of entities interact or are linked in some way.
Graphs are widely applicable across fields like computer science, mathematics, biology, and transport. They allow complex relationships to be represented visually and computationally.
A graph G is formally defined as:
G = (V, E)
Where:
V is the set of vertices (nodes)
E is the set of edges (connections between nodes)
Each edge in a graph is either an unordered pair (if the graph is undirected) or an ordered pair (if the graph is directed) of vertices.
Graphs can vary in complexity:
They can be simple with a small number of vertices and edges, or large-scale with millions of connections.
They can be sparse (few edges compared to vertices) or dense (many edges).
Understanding the basic structure of graphs lays the foundation for working with many powerful algorithms and data representations in computing.
Key components of a graph
Although more terminology is covered in a later section, here are some fundamental terms to get familiar with:
Practice Questions
FAQ
Graphs and trees are both non-linear data structures, but they differ in structure, rules, and use cases. A tree is a special type of graph that is acyclic and has a hierarchical structure with one root node and only one path between any two nodes. Trees cannot contain cycles and each child node has exactly one parent. In contrast, graphs are more general — they can be cyclic or acyclic, directed or undirected, and may allow multiple connections between nodes. Graphs are suitable when relationships between data are complex and cannot be represented in a strict parent-child hierarchy. For instance, use a tree when building a file system or binary search tree where hierarchy is key. Use a graph when modelling systems like road networks, social media connections, or dependencies where entities may relate in multiple directions or have no central root. Graphs offer greater flexibility for interconnected structures.
Yes, graphs can represent asymmetric relationships using directed graphs, also known as digraphs. In a directed graph, each edge has a direction, meaning the connection goes from one vertex to another but not necessarily the other way around. This is structurally achieved by representing edges as ordered pairs (A, B), indicating a one-way connection from vertex A to vertex B. Such representations are essential when modelling relationships where direction matters — for example, in web page links (where page A links to page B, but not vice versa), or in Twitter follows (where user A follows user B, but B may not follow A). The direction of the edge ensures the asymmetry is preserved. Additionally, directed graphs may be used to model workflows or processes, where each task must follow a specific order and cannot loop back arbitrarily. In adjacency matrices or lists, direction is encoded by marking only the corresponding source-to-destination relationship.
In graph structures, a self-loop is an edge that connects a vertex to itself, while multiple edges (also called parallel edges) are two or more edges that connect the same pair of vertices. Whether these are allowed depends on the type of graph being used. A simple graph does not allow self-loops or multiple edges; each edge connects two different vertices with only one connection allowed between any pair. In contrast, a multigraph permits multiple edges between the same pair of vertices, and may allow self-loops. These features are important in modelling certain real-world systems. For example, in a transport network, multiple routes (edges) might connect the same two cities (vertices), or a station may loop back on itself. Self-loops can represent tasks that depend on themselves or recurring states in state machines. When designing algorithms or data structures for such graphs, special care must be taken to handle these cases correctly.
Working with large-scale graphs introduces several significant challenges. Firstly, memory usage becomes critical. Representing millions of vertices and billions of edges requires efficient data structures, especially if the graph is sparse. Using an adjacency matrix for a sparse graph is impractical due to wasted space. Secondly, processing time for algorithms like shortest path or traversal increases significantly, demanding optimised or parallel approaches. Scalability is another concern — algorithms must be able to handle dynamic changes, like adding or removing nodes, without rebuilding the entire structure. Additionally, data consistency and redundancy become issues when graphs are distributed across multiple systems or servers, as seen in web or social media graphs. Visualising such large graphs is also difficult; it’s hard to interpret meaningful patterns without sophisticated graph analytics or layout tools. Lastly, privacy and security issues can arise, especially in user-based graphs, where connections may reveal sensitive or private data when analysed improperly.
Real-world graphs that change over time are referred to as dynamic graphs. Examples include social networks, where users join, leave, or form new connections; road networks, where traffic conditions or road closures affect connectivity; and financial transaction networks, where new transactions constantly update the structure. Programmatically, handling dynamic graphs requires data structures that support efficient insertions, deletions, and updates. Adjacency lists are commonly used for this purpose due to their flexibility. Additionally, incremental algorithms are often implemented — these can update results like shortest paths or centrality scores without recomputing from scratch. Real-time systems might use event-driven processing, where changes in the graph trigger updates to dependent components. Versioned graph databases or temporal graphs can be used to record how the graph evolves over time. This is especially important for audit trails or historical analysis. Dynamic graph libraries and frameworks such as NetworkX, GraphX, or Neo4j often provide tools for managing such changes efficiently.
