Data types and data structures form the backbone of programming. They define how data is stored, manipulated, and interpreted by the computer. A strong understanding of these concepts is crucial for writing effective, efficient, and error-free code.
What is a data type?
A data type is a classification that determines what kind of data a variable can hold, how much memory should be allocated for it, and what operations are permitted on it. Every value in a program—whether it's a number, a letter, or a block of text—is associated with a specific data type.
Data types ensure that the computer correctly interprets the nature of the data and avoids performing invalid operations. For instance, attempting to add a string to an integer would typically result in an error unless the data types are converted or handled correctly.
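As a minimal sketch of this kind of type error, the Python snippet below tries to add a string to an integer and then fixes the problem with an explicit conversion. Python is used purely for illustration, and the variable names are hypothetical.

```python
# Adding an integer and a string is rejected when the line runs.
count = 5
label = "10"

try:
    total = count + label          # int + str is not a defined operation
except TypeError as err:
    error_message = str(err)       # keep the message for inspection

# Converting explicitly states which meaning is intended.
total_as_number = count + int(label)    # numeric addition
total_as_text = str(count) + label      # text concatenation
```

The two conversions give different results (a sum versus joined text), which is why the language refuses to guess.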
Purpose of data types
Memory efficiency: Data types allow the program to allocate just enough memory for each piece of data. For example, an integer can be stored in 4 bytes rather than in an unnecessarily large block of memory.
Type safety: By specifying types, the language can catch errors where operations are not compatible with the data. This helps avoid logical and runtime errors, such as trying to divide text.
Code reliability: Well-defined data types make programs more predictable and less prone to bugs. Knowing that a variable will only ever be a Boolean or integer helps the programmer reason about its behaviour.
Validation: Input from users can be checked against expected types to prevent crashes or misinterpretations.
Typing systems
Statically typed languages (e.g. C++, Java): The type of every variable is known and checked at compile time, usually through explicit declarations.
Dynamically typed languages (e.g. Python, JavaScript): Types belong to values rather than variables and are checked at runtime, allowing more flexibility but potentially more runtime errors.
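The dynamic side of this distinction can be sketched in Python. The function name halve is hypothetical; the point is only that nothing is checked until the code actually runs.

```python
# A name can be rebound to values of different types.
value = 42
first_type = type(value).__name__     # type of the current value

value = "forty-two"
second_type = type(value).__name__    # the same name now holds text

def halve(x):
    # No types are declared; an unsuitable argument is only
    # detected when this division is executed.
    return x / 2

halved = halve(10)

try:
    halve("ten")
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True
```

In a statically typed language, the equivalent of halve("ten") would normally be rejected by the compiler before the program could start.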
Built-in data types
Most programming languages come with a standard set of primitive data types that provide the basic building blocks for data storage and manipulation.
Integer
An integer is a whole number, either positive, negative, or zero, with no fractional or decimal part.
Examples: -15, 0, 42, 999
Use cases:
Loop counters: for i = 0 to 10
Counting objects or people
Array indexing: array[2]
Representing discrete values like age or scores
Many languages specify a limit on integer size. For example, a 32-bit signed integer ranges from -2147483648 to 2147483647.
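The 32-bit limits quoted above follow directly from the number of bits. The sketch below derives them and uses Python's struct module to show that a value one past the maximum no longer fits in 4 bytes. Python's own integers are not limited in this way, so the helper fits_in_int32 is a hypothetical check written for illustration.

```python
import struct

BITS = 32
INT32_MIN = -(2 ** (BITS - 1))     # smallest 32-bit signed value
INT32_MAX = 2 ** (BITS - 1) - 1    # largest 32-bit signed value

def fits_in_int32(n: int) -> bool:
    """Return True if n can be stored in a 32-bit signed integer."""
    return INT32_MIN <= n <= INT32_MAX

# "<i" packs a little-endian 4-byte signed integer.
packed = struct.pack("<i", INT32_MAX)

try:
    struct.pack("<i", INT32_MAX + 1)   # one past the limit
    overflowed = False
except struct.error:
    overflowed = True
```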
Real / Float
A real or float data type represents numbers with a fractional part. These are written using a decimal point.
Examples: 3.14, -0.001, 100.0
Use cases:
Prices: £19.99 (where exact arithmetic matters, money is often stored as an integer number of pence or in a decimal type instead)
Scientific values: gravitational constant, measurements
Representing temperatures, weights, or distances
Floats are usually implemented using the IEEE 754 binary floating-point standard. Many decimal fractions, such as 0.1, have no exact binary representation, so results can carry small rounding errors. For example, comparing 0.1 + 0.2 with 0.3 for equality typically yields false.
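A short Python sketch of the comparison problem, using the standard library's math.isclose as one possible way to compare floats approximately:

```python
import math

total = 0.1 + 0.2

exactly_equal = (total == 0.3)             # strict equality on floats
close_enough = math.isclose(total, 0.3)    # comparison within a small relative tolerance
```

Here exactly_equal is False while close_enough is True, because the stored value of 0.1 + 0.2 differs from the stored value of 0.3 only in the last few binary digits.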
Boolean
A Boolean value represents a logical state—either true or false.
Use cases:
Controlling program flow: if isPassed == true
Loop conditions: while isRunning
Flags to indicate status: isAuthenticated, isGameOver
Boolean logic is fundamental to programming and underpins decision-making, branching, and condition checking.
Character
A character stores a single symbol such as a letter, digit, or punctuation.
Examples: 'A', '5', '?', '#'
Characters are stored using encoding schemes like ASCII (7-bit codes for common characters) or Unicode (which includes characters from most written languages).
Use cases:
Detecting keystrokes
Processing strings one character at a time
Creating formatted outputs (e.g. newline \n, tab \t)
Each character is internally stored as a numeric code. For instance, 'A' is represented as 65 in ASCII.
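In Python, the built-in functions ord and chr convert between a character and its numeric code, which makes the mapping easy to inspect. A minimal sketch:

```python
code = ord("A")                      # character -> numeric code
letter = chr(65)                     # numeric code -> character
next_letter = chr(ord("A") + 1)      # arithmetic on codes moves through the alphabet
digit_value = ord("5") - ord("0")    # the character '5' converted to the number 5
```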
String
A string is a sequence of characters treated as a single piece of text.
Examples: "hello", "123 Main Street", "true"
Use cases:
Storing names, addresses, passwords
Displaying messages
Reading files and input
String manipulation includes:
Concatenation: "Hello, " + "World!"
Slicing: text[0:5]
Searching: indexOf("word")
Replacing: replace("old", "new")
Depending on the language, strings are either mutable (changeable in place) or immutable (cannot be altered once created).
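The four operations listed above might look like this in Python, where find plays the role of indexOf. This is an illustrative sketch rather than the only way to write them.

```python
greeting = "Hello, " + "World!"                 # concatenation
first_five = greeting[0:5]                      # slicing: characters 0 to 4
position = greeting.find("World")               # searching: index of the first match
replaced = greeting.replace("World", "there")   # replacing: returns a new string
```

Because Python strings are immutable, replace leaves greeting unchanged and returns a separate string.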
Date/Time
The date/time data type handles calendar and clock data.
Examples: "21/06/2025", "14:30", "2025-06-21T14:55:00"
Use cases:
Scheduling events
Logging timestamps
Measuring durations
Most languages use built-in libraries (e.g. datetime in Python) to represent and manipulate dates and times. These support:
Parsing strings to dates
Formatting for output
Calculating differences (e.g. days between two dates)
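The three capabilities above can be sketched with Python's datetime module. The specific dates and variable names are hypothetical examples.

```python
from datetime import date, datetime

# Parsing: turn text into a datetime value using a format pattern.
exam = datetime.strptime("2025-06-21T14:55:00", "%Y-%m-%dT%H:%M:%S")

# Formatting: produce text in a different layout for output.
formatted = exam.strftime("%d/%m/%Y %H:%M")

# Differences: subtracting two dates gives a duration.
days_between = (date(2025, 6, 21) - date(2025, 6, 1)).days
minutes_late = (exam - datetime(2025, 6, 21, 14, 30)).total_seconds() / 60
```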
Composite data structures
Composite data structures group multiple values together, allowing more complex data to be represented.
Arrays
An array is an ordered list of elements, all of the same data type. The elements are stored in contiguous memory locations and accessed by indices.
One-dimensional arrays
A 1D array stores a linear list.
Example:
grades = [55, 65, 75, 85, 95]
grades[0] = 55
grades[4] = 95
Use cases:
Lists of scores
Storing fixed-size datasets
Loop processing: for i = 0 to length-1
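As a rough Python equivalent of the grades example, the sketch below uses the standard array module, which (unlike a plain Python list) enforces a single element type. The averaging loop is added for illustration.

```python
from array import array

# "i" means every element must be a signed integer.
grades = array("i", [55, 65, 75, 85, 95])

first = grades[0]
last = grades[len(grades) - 1]

# Index-based loop from 0 to length-1.
total = 0
for i in range(len(grades)):
    total += grades[i]
average = total / len(grades)

# Elements of another type are rejected.
try:
    grades.append("seventy")
    accepted_text = True
except TypeError:
    accepted_text = False
```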
Multidimensional arrays
These arrays are accessed using two or more indices, for example a row and a column.
Example:
grid = [[1, 2],
        [3, 4]]
grid[1][0] = 3
Use cases:
Representing spreadsheets
Game boards (e.g. chess)
Matrices in mathematical applications
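A two-dimensional array is often modelled in Python as a list of rows. The sketch below reuses the grid from the example and shows row and column indexing with nested loops; the totals are only there to give the loops something to do.

```python
grid = [[1, 2],
        [3, 4]]

value = grid[1][0]      # row 1, column 0

# Visit every cell with a row index and a column index.
row_totals = []
for r in range(len(grid)):
    row_sum = 0
    for c in range(len(grid[r])):
        row_sum += grid[r][c]
    row_totals.append(row_sum)

# Swapping the two indices gives the transpose (columns become rows).
transposed = [[grid[r][c] for r in range(2)] for c in range(2)]
```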
Advantages:
Fast access using indices
Easy to process with loops
Memory layout is efficient
Limitations:
Fixed size at declaration
Stores only one data type
Can become complex with many dimensions
Records
A record is a structured collection of fields, possibly of different data types, grouped under a single name.
Example:
record Student
    string name
    integer age
    float grade
Usage:
Student1.name = "Liam"
Student1.age = 17
Student1.grade = 92.5
Use cases:
Representing real-world objects (students, books)
Structuring data for databases
File-based structured data (e.g. CSV rows)
Advantages:
Groups related values logically
Can store mixed data types
Simplifies management of complex entities
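One way to express the Student record in Python is with a dataclass, which groups named fields of different types. This is a sketch of the idea; other languages would use a struct or a class instead.

```python
from dataclasses import dataclass

@dataclass
class Student:
    name: str       # text field
    age: int        # whole-number field
    grade: float    # fractional field

student1 = Student(name="Liam", age=17, grade=92.5)

# Fields are accessed by name rather than by index.
student1.age += 1
summary = f"{student1.name} ({student1.age}): {student1.grade}"
```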
User-defined types
User-defined types allow programmers to create their own data types by combining or naming built-in types.
Enumeration (enum)
An enumeration is a user-defined type consisting of a set of named constants.
Example:
enum Direction {North, South, East, West}
Usage:
Direction current = North
Use cases:
Representing finite options: days of the week, states in a program
Improving readability and safety (avoiding strings like "left" or integers like 2)
Advantages:
Code becomes clearer and more maintainable
Prevents invalid values outside the defined set
Reduces runtime bugs due to typos or invalid states
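The Direction enumeration could be written in Python with the standard enum module, as sketched below. The helper is_vertical is a hypothetical function added to show an enum used in a condition.

```python
from enum import Enum

class Direction(Enum):
    NORTH = 1
    SOUTH = 2
    EAST = 3
    WEST = 4

current = Direction.NORTH

def is_vertical(direction: Direction) -> bool:
    """Return True for NORTH or SOUTH."""
    return direction in (Direction.NORTH, Direction.SOUTH)

# Values outside the defined set are rejected.
try:
    Direction("left")
    rejected = False
except ValueError:
    rejected = True
```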
Type aliases
A type alias gives a new, meaningful name to an existing type.
Example:
type StudentID = integer
This helps give context to variables and distinguish between values that otherwise share the same base type.
Use cases:
Labelling different uses of integers or strings
Clarifying intent in function signatures
Making types more domain-specific
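In Python, a simple alias is just an assignment of one type name to another. The sketch below uses the StudentID and Distance names from this section; describe_trip is a hypothetical function showing how aliases clarify a signature.

```python
StudentID = int      # alias: a StudentID is an ordinary integer
Distance = float     # alias: a Distance is an ordinary float

def describe_trip(student: StudentID, travelled: Distance) -> str:
    # The parameter types document what each number means.
    return f"Student {student} travelled {travelled} km"

message = describe_trip(1042, 12.5)

# An alias introduces a new name, not a new type.
alias_is_same_type = StudentID is int
```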
Composite user-defined types
User-defined types often combine records, arrays, and other types.
Example:
record Book
    string title
    string author
    integer pages
    array[string] keywords
This structure can store rich information about a book, including a list of relevant keywords.
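A possible Python version of the Book record combines a dataclass with a list field. The book details below are made-up sample data, and the snippet assumes Python 3.9 or later for the list[str] annotation.

```python
from dataclasses import dataclass, field

@dataclass
class Book:
    title: str
    author: str
    pages: int
    # Each Book gets its own empty list unless keywords are supplied.
    keywords: list[str] = field(default_factory=list)

book = Book("Example Title", "A. Author", 320, ["programming", "data types"])
book.keywords.append("revision")

keyword_count = len(book.keywords)
has_topic = "data types" in book.keywords
```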
Practical application and selection of data types
Choosing the correct data type is critical for program performance, readability, and correctness.
Based on nature of data
Whole numbers → integer
Decimal numbers → float
Logical flags → Boolean
Text data → string
Single characters → character
Dates and times → date/time
Based on storage efficiency
Prefer smallest suitable types to conserve memory
Avoid using float when integer will do
Use arrays instead of multiple standalone variables
Based on functionality
Use arrays for lists of similar items
Use records to group diverse fields
Use enums for predefined options (e.g. modes, statuses)
Based on clarity and maintainability
Use descriptive type names (StudentID, Distance)
Group related fields using record
Model complex data using structured, user-defined types
Code examples (pseudocode)
// Built-in types
integer score = 100
float average = 85.75
boolean isPassed = true
character grade = 'A'
string studentName = "Chloe"
date examDate = "2025-06-21"
// 1D array
array[3] of integer ages = [16, 17, 18]
// Record definition and usage
record Student
    string name
    integer age
    float score
Student s1
s1.name = "Noah"
s1.age = 18
s1.score = 91.4
// Enum
enum Mood {Happy, Sad, Angry, Neutral}
Mood todayMood = Happy
// Type alias
type Distance = float
Distance d1 = 12.5
FAQ
Floating-point numbers are represented in binary using the IEEE 754 standard, which often cannot exactly represent decimal values due to the limitations of binary fractions. This leads to rounding errors and precision issues, especially with very small or repeating decimal values. For example, in many programming languages, 0.1 + 0.2 does not exactly equal 0.3, but rather results in a value like 0.30000000000000004. These small errors can accumulate in calculations and affect results when using equality checks. As a result, direct comparisons using == are unreliable. Instead, programmers should compare floats using a tolerance level or epsilon value, checking if the difference between numbers is smaller than a defined threshold. This approach accounts for small imprecisions. These inaccuracies are inherent in how computers handle real numbers in binary and are not errors in the code itself but a consequence of digital representation limits.
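The accumulation effect and the tolerance-based comparison described here can be sketched in Python as follows. The function nearly_equal and its default epsilon are assumptions chosen for illustration; a suitable threshold depends on the scale of the numbers involved.

```python
def nearly_equal(a: float, b: float, epsilon: float = 1e-9) -> bool:
    """Return True if a and b differ by less than epsilon."""
    return abs(a - b) < epsilon

# Add 0.1 ten times; each addition carries a tiny rounding error.
running_total = 0.0
for _ in range(10):
    running_total += 0.1

exact_match = (running_total == 1.0)                  # strict equality
within_tolerance = nearly_equal(running_total, 1.0)   # tolerance check
```

The strict comparison fails even though the mathematical answer is exactly 1, while the tolerance check succeeds.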
Mutable strings can be changed after they are created, while immutable strings cannot. In languages like Python and Java, strings are immutable, meaning any operation that appears to modify a string (e.g. concatenation, replacement) creates a new string object rather than changing the original one. This immutability enhances security and reliability, as the string's content cannot be altered unexpectedly once assigned. However, it can also lead to inefficiencies, especially in loops where repeated modifications create many intermediate objects. Mutable alternatives, such as Java's StringBuilder and StringBuffer classes, allow in-place modification without creating a new object each time, which is more efficient for heavy string manipulation. Understanding mutability is important for managing memory usage, performance optimisation, and avoiding bugs when passing strings between functions or modules. Immutable strings can also be shared safely across threads in concurrent applications, as there's no risk of one thread altering the string unexpectedly.
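Python's immutable strings make this behaviour easy to observe. The sketch below shows that in-place modification is refused, and uses a list of pieces joined once at the end as one common way to avoid creating many intermediate strings in a loop.

```python
text = "cat"

# Attempting to change a character in place is not allowed.
try:
    text[0] = "b"
    modified_in_place = True
except TypeError:
    modified_in_place = False

# "Modifying" a string really builds a new one; the original is untouched.
new_text = "b" + text[1:]

# Collect pieces in a mutable list, then join them in a single step.
parts = []
for n in range(5):
    parts.append(str(n))
joined = "".join(parts)
```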
Arrays are stored in contiguous memory locations, meaning each element is placed directly after the previous one in memory. This structure allows constant-time access (O(1)) to elements using their index, as the position of any item can be calculated directly using the formula: starting memory address + (index × size of element). This makes arrays highly efficient for accessing and iterating through data. However, because their size is fixed at declaration, resizing an array requires allocating new memory and copying existing elements, which is computationally expensive. Additionally, inserting or deleting elements in the middle of an array can be slow, as it requires shifting other elements to maintain the order. The predictable memory layout also allows arrays to benefit from CPU caching, which improves access speed. However, because an array needs a single contiguous block, a large allocation can fail or become costly when free memory is fragmented, especially in low-level or memory-constrained systems.
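The address formula can be written out as a small calculation. The base address of 1000 and element size of 4 bytes below are made-up values for illustration, and element_address is a hypothetical helper, not a real memory operation.

```python
def element_address(base_address: int, index: int, element_size: int) -> int:
    """Apply: starting address + (index * size of element)."""
    return base_address + index * element_size

BASE = 1000    # assumed starting address of the array
SIZE = 4       # assumed size of each element in bytes

fourth_element = element_address(BASE, 3, SIZE)

# Addresses of a five-element array are evenly spaced.
addresses = [element_address(BASE, i, SIZE) for i in range(5)]
```

Because the calculation is a single multiplication and addition, it takes the same time for index 3 as for index 3,000,000, which is what constant-time access means.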
Character encoding determines how individual characters are stored as binary data in memory. The most common encoding schemes include ASCII and Unicode. ASCII uses 7 bits to represent 128 characters, sufficient for basic English text; 8-bit extensions such as ISO 8859-1 (Latin-1) add a further 128. Unicode, whose encodings include UTF-8 and UTF-16, defines well over 100,000 characters from various writing systems, including emojis, accented letters, and non-Latin scripts. In UTF-8, characters may use 1 to 4 bytes, which allows backward compatibility with ASCII while supporting a vast range of symbols. The choice of encoding affects the memory required to store strings and how text is processed. For example, calculating string length or slicing might yield different results depending on encoding, as one visible character could consist of multiple bytes. Misinterpreting the encoding can lead to garbled text or errors, particularly when reading from or writing to files or handling user input across different languages. Encoding is crucial for internationalisation and software compatibility.
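The difference between characters and bytes can be seen directly in Python, where str.encode produces the bytes for a chosen encoding. The sample words below are arbitrary; escape sequences are used so the exact code points are unambiguous.

```python
word = "caf\u00e9"           # "café" with a single accented character
emoji = "\U0001F600"         # one emoji character

char_count = len(word)                    # counted in characters
utf8_bytes = len(word.encode("utf-8"))    # the accented letter needs 2 bytes
emoji_bytes = len(emoji.encode("utf-8"))  # the emoji needs 4 bytes

# Plain ASCII text is byte-for-byte identical in UTF-8.
same_as_ascii = "A".encode("utf-8") == "A".encode("ascii")

# Decoding UTF-8 bytes with the wrong encoding garbles the text.
garbled = word.encode("utf-8").decode("latin-1")
```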
While type aliases certainly improve code readability by providing meaningful names to otherwise generic types, they also enhance maintainability and abstraction, and in some languages type safety. By creating a type alias such as type Distance = float, any future change to how distances are represented (e.g. switching to a custom unit or structure) only needs to be made in one place. This abstraction allows the implementation to evolve without affecting the rest of the codebase. Type aliases also serve to clarify programmer intent, reducing misuse of variables: Distance and Speed might both be floats, but using separate aliases helps distinguish their roles in calculations or function parameters. In many languages a plain alias is interchangeable with its underlying type, so the compiler will not stop a Distance being passed where a Speed is expected. Some languages also offer distinct named types (for example, newtype in Haskell or defined types in Go) that do enforce this separation, preventing one from being used where another is expected. This reduces bugs and clarifies data flow throughout the program. Overall, type aliases contribute to clearer, safer, and more adaptable code design in complex systems.
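As one illustration of stronger checking between similarly shaped types, Python's typing.NewType creates names that a static type checker (such as mypy) treats as distinct, even though they behave as ordinary floats at runtime. The function travel_time is a hypothetical example.

```python
from typing import NewType

# Distinct names for a static type checker; plain floats at runtime.
Distance = NewType("Distance", float)
Speed = NewType("Speed", float)

def travel_time(distance: Distance, speed: Speed) -> float:
    """Return the time taken to cover distance at the given speed."""
    return distance / speed

hours = travel_time(Distance(120.0), Speed(60.0))

# No wrapper object is created: the value is still an ordinary float.
runtime_type_is_float = type(Distance(120.0)) is float
```

A type checker would flag travel_time(Speed(60.0), Distance(120.0)) as an error, although Python itself would still run it, so the protection here comes from the checking tool rather than the interpreter.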
Practice Questions
Explain the difference between a record and an array, and give one suitable use case for each.
A record is a composite data structure that groups together fields of different data types under a single name, while an array stores multiple values of the same data type in indexed order. Records are useful when modelling real-world objects with various attributes, such as a Student with name, age, and grade. Arrays are better suited to storing ordered lists of uniform data, such as exam scores or temperatures over time. Records allow clearer organisation of mixed data, whereas arrays allow fast, indexed access and iteration over large datasets of a single type.
Describe what an enumeration (enum) is and explain one benefit of using it in a program.
An enumeration (enum) is a user-defined data type that consists of a fixed set of named values, which are treated as constants. It is used to represent a finite list of options, such as days of the week or system states like START, PAUSE, and END. A major benefit of using enums is improved code readability and safety. Since enums restrict possible values to a defined list, they prevent invalid inputs and reduce logical errors. Using enums makes the programmer's intent clearer and ensures that only valid states can be assigned, enhancing code clarity and maintainability.