TutorChase logo
Login
AQA A-Level Computer Science

13.3.5 Expressiveness of BNF vs regular expressions

Backus-Naur Form (BNF) is more powerful than regular expressions because it can represent nested and recursive patterns that regular expressions cannot.

The limits of regular expressions

What are regular expressions?

Regular expressions (often abbreviated as regex) are symbolic notations used to describe patterns within strings. They are common in programming and are used for searching, matching, and validating text data. Regular expressions describe regular languages, which are the simplest types of formal languages in computational theory. These languages can be recognised and processed by finite state machines (FSMs).

Regular expressions are excellent at identifying sequences and repeated elements in a linear format. They rely entirely on pattern matching without memory, which makes them efficient for simple syntax rules but inherently limited when it comes to more complex, structured text.

Examples of typical patterns regular expressions can match:

  • A sequence of one or more digits: [0-9]+

  • A pattern for a basic email address: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+

  • Words starting with capital letters: [A-Z][a-z]*

These examples show that regular expressions are good at matching flat, linear structures. However, when the structure requires nesting, recursion, or a memory of previous symbols, regular expressions fail.

How finite state machines relate to regex

Take your grades to the next level!

UPGRADING TO PREMIUM UNLOCKS
AI Tutor
AI-powered study assistant
instant feedback and guidance
Predicted Papers
Examiner-style predicted papers
based on recent exam trends
Practice Questions
All exam practice questions
by topic for each subject
Study Notes
All detailed revision notes
written by expert teachers
Cheat Sheets
Quick revision summaries
perfect for last-minute review
Past Papers
Complete collection
of practice and past exam papers
Email
Password
Confirm Password
Already have an account?

Practice Questions

FAQ

No, even with advanced features or tricks, regular expressions cannot match the full range of patterns that BNF can. The limitations of regular expressions are based on theoretical constraints, not practical implementation details. Regular expressions define regular languages, which are recognised by finite state machines (FSMs) that lack memory or stack structures. This makes it fundamentally impossible for them to handle recursive or nested constructs, regardless of how complex the expression is. Some programming languages, such as Perl or Python, implement enhanced regex engines that can simulate limited recursive behaviour using features like lookaheads or backreferences. However, these are not part of the formal definition of regular expressions in automata theory. They can be useful for specific practical tasks, but they still do not provide the expressive power of a context-free grammar. In contrast, BNF describes languages that require a stack (pushdown automata), and recursion is intrinsic to its structure, making it suitable for defining programming language syntax and nested constructs.

Finite state machines cannot parse programming languages because they cannot handle constructs that require tracking depth or relationships between different parts of the input. Programming languages often include nested elements, such as braces {}, brackets [], parentheses (), and recursive structures like nested functions or expressions. FSMs do not have memory or a stack, which means they cannot count or remember opening symbols to ensure that they are correctly matched with closing symbols. As a result, they fail to validate structures like “for every opening brace, there must be a closing brace in the correct order.” FSMs transition between a finite number of states and make decisions based only on the current input and current state. They cannot look ahead or keep track of previous inputs. This makes them unsuitable for parsing source code, which requires context-awareness and hierarchical understanding. For this reason, compilers use pushdown automata and context-free grammars, such as those written in BNF, which support recursion and stack-based parsing.

Recursion is essential in complex grammars because it allows rules to refer back to themselves, enabling the definition of structures of arbitrary depth and self-similar patterns. In BNF, recursion is implemented by allowing a non-terminal symbol to appear in its own definition. For example, the rule <expr> ::= <expr> + <term> | <term> allows the expression rule to be expanded into itself, enabling a chain of additions such as a + b + c + d. This approach is powerful because it allows the language to scale naturally to any depth or length without explicitly writing out every possible combination. It mirrors the way many programming constructs are built, such as nested statements, blocks, or repeated operations. Without recursion, a grammar would have to list every possible case separately, which is impractical or impossible for infinite or large languages. BNF uses recursion not only for arithmetic expressions but also for syntactic constructs like function calls, conditional nesting, and balanced delimiters, all of which are essential in real programming languages.

Yes, many real-world programming languages and tools rely on BNF or closely related notations (such as EBNF or ABNF) to formally define their syntax. For example, the syntax of Pascal, C, Java, and even scripting languages like Python has been described using variations of BNF. These formal grammar definitions are essential for creating compilers, interpreters, and documentation. Additionally, markup languages like XML and HTML are often described using BNF-style grammars. Tools like ANTLR (Another Tool for Language Recognition) use grammars written in BNF-like formats to automatically generate parsers for programming languages. Moreover, BNF is commonly found in language specifications published by standards organisations such as ISO or W3C. These formal specifications ensure that multiple implementations of the same language are consistent and that developers can rely on a standardised syntax. BNF also forms the foundation of syntax analysis tools in integrated development environments (IDEs), enabling features like code highlighting, syntax checking, and autocomplete.

Syntax errors are identified during compilation by matching the source code against the language’s context-free grammar, which is often defined using BNF. The compiler's parser attempts to construct a derivation tree (or syntax tree) from the source code by applying the BNF rules. Each line of code must correspond to a sequence of valid productions in the grammar. If the parser encounters a sequence of tokens that do not fit any rule — for instance, a missing parenthesis, an unexpected symbol, or an improperly formed expression — it raises a syntax error. Because BNF rules are hierarchical and recursive, they allow the parser to expect specific structures at specific points. If a rule expects a number and finds a keyword instead, or if an opening brace is not followed by a corresponding closing brace, the grammar fails to match and the compiler reports the location and type of syntax error. This early detection of errors ensures that invalid code does not proceed to later stages like semantic analysis or code generation.

Hire a tutor

Please fill out the form and we'll find a tutor for you.

1/2
Your details
Alternatively contact us via
WhatsApp, Phone Call, or Email