1.1.5 String-handling operations

String-handling operations are essential in programming for manipulating textual data and for solving real-world problems such as input validation, data parsing, and output formatting.

What is a string?

A string is a data type used to represent a sequence of characters. These characters can include letters, digits, spaces, punctuation marks, and special symbols. String literals are enclosed in quotation marks: Python accepts either single quotes ('Hello') or double quotes ("Hello"), whereas Java and C# use double quotes for strings and reserve single quotes for a single character value such as 'H'. Strings are used wherever text needs to be displayed, stored, or processed, such as user input, file contents, messages, and system logs.

String operations

High-level programming languages like Python, Java, and C# offer built-in support for various string-handling operations. These operations help perform specific tasks on strings such as measuring their length, accessing specific characters, extracting substrings, and combining multiple strings.

Length

The length of a string is the total number of characters it contains, including spaces and punctuation. This is useful when validating input, such as passwords or form fields.

In Python, the built-in len() function is used:
len("Computer Science") returns 16
In Java, use the .length() method:
"Computer Science".length() returns 16
In C#, use the .Length property:
"Computer Science".Length returns 16

Use cases:

Checking that user inputs meet minimum or maximum length requirements
Looping through all characters in a string using a for-loop
Validating that a string is not empty before processing
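The use cases above can be sketched in Python as follows. The function names and the 3-to-20 character limits are hypothetical, chosen only for illustration rather than taken from any particular system.

```python
def validate_username(username: str, min_len: int = 3, max_len: int = 20) -> bool:
    """Return True if username is non-empty and its length is within the limits."""
    length = len(username)  # counts every character, including spaces
    if length == 0:
        return False  # reject empty input before doing anything else
    return min_len <= length <= max_len


def count_spaces(text: str) -> int:
    """Count spaces by looping over every index from 0 to len(text) - 1."""
    count = 0
    for i in range(len(text)):
        if text[i] == " ":
            count += 1
    return count


print(len("Computer Science"))           # 16
print(validate_username("alice"))        # True
print(validate_username("al"))           # False: shorter than 3 characters
print(count_spaces("Computer Science"))  # 1
```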

Position (Indexing)

Indexing allows access to individual characters within a string by their position. Most languages use zero-based indexing, meaning the first character is at position 0.

Example in Python:

s = "Computer"
print(s[0])  # Output: 'C'
print(s[7])  # Output: 'r'

In Java:

String s = "Computer";
System.out.println(s.charAt(0));  // Output: 'C'

Java and C# do not accept negative indices (C# 8.0 and later offers the ^ "index from end" operator instead, e.g. s[^1]). Python supports negative indexing, which counts backwards from the end of the string:

print(s[-1])  # Output: 'r'

Use cases:

Accessing initials in a name (e.g., first letter of each word)
Getting the file extension from a filename (e.g., .txt from "file.txt")
Verifying that a character at a specific position matches a required value
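As a sketch of the last use case, the function below checks the characters at fixed positions in a code. The format it enforces (an uppercase letter first and a digit last) and the name is_valid_code are hypothetical, chosen only to illustrate indexing.

```python
def is_valid_code(code: str) -> bool:
    """Check a hypothetical product code: uppercase letter first, digit last."""
    if len(code) == 0:
        return False  # code[0] would raise IndexError on an empty string
    first = code[0]   # zero-based: index 0 is the first character
    last = code[-1]   # negative index: -1 is the last character
    return first.isalpha() and first.isupper() and last.isdigit()


print(is_valid_code("A1234"))  # True
print(is_valid_code("a1234"))  # False: first character is lowercase
```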

Substring and slicing

A substring is a portion of a string. Slicing refers to extracting this portion using start and end indices.

In Python, slicing is done using the format string[start:end], where start is inclusive and end is exclusive:

text = "ComputerScience"
print(text[0:8])   # Output: 'Computer'
print(text[8:])    # Output: 'Science'

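In Java, use substring(begin, end), where the end index is exclusive, as in Python:

String s = "ComputerScience";
System.out.println(s.substring(0, 8));  // Output: 'Computer'

In C#, use Substring(start, length). Note that the second argument is the number of characters to take, not an end index:

string s = "ComputerScience";
string sub = s.Substring(0, 8);  // sub is "Computer"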

Use cases:

Extracting first name and surname from a full name
Breaking up a date in YYYY-MM-DD format
Parsing values from structured inputs such as postcodes or product codes
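For example, a date in YYYY-MM-DD format can be broken up with slicing, because each part always sits at the same positions. This sketch assumes the input is already in exactly that format; split_date is a hypothetical helper name.

```python
def split_date(date_text: str) -> tuple:
    """Split 'YYYY-MM-DD' into (year, month, day) using fixed slice positions."""
    year = date_text[0:4]    # characters 0-3
    month = date_text[5:7]   # characters 5-6 (index 4 is the first '-')
    day = date_text[8:10]    # characters 8-9 (index 7 is the second '-')
    return int(year), int(month), int(day)


print(split_date("2025-06-21"))  # (2025, 6, 21)
```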

Concatenation

Concatenation is the process of joining two or more strings together to form a single string. This is done using the + operator or specific string methods depending on the language.

Python:

first = "Hello"
second = "World"
message = first + " " + second  # Output: 'Hello World'

Java:

String message = "Hello" + " " + "World";  // Output: 'Hello World'

C#:

string message = "Hello" + " " + "World";  // Output: 'Hello World'

Use cases:

Generating user-friendly messages by joining variables and strings
Dynamically creating file names or paths
Formatting data for output to the screen or logs
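One detail matters when joining variables and strings: in Java and C#, "Score: " + 42 converts the number automatically, but Python's + only joins a string with another string, so numbers must be converted with str() first. The file-name pattern (report_<n>.txt) and the name make_filename below are hypothetical examples.

```python
def make_filename(report_number: int) -> str:
    """Build a file name such as 'report_7.txt' by concatenation."""
    # str() is required: "report_" + 7 would raise a TypeError
    return "report_" + str(report_number) + ".txt"


try:
    message = "Score: " + 42  # mixing str and int is not allowed in Python
except TypeError:
    message = "Score: " + str(42)  # convert the number first

print(make_filename(7))  # report_7.txt
print(message)           # Score: 42
```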

Character and character code conversions

Sometimes you need to convert a character to its numeric code or vice versa. These codes come from the ASCII or Unicode character sets; Unicode uses the same values as ASCII for the first 128 characters, so 'A' is 65 in both.

In Python:

ord('A')    # Output: 65
chr(65)     # Output: 'A'

In Java:

int code = (int) 'A';  // 65
char ch = (char) 65;   // 'A'

In C#:

int code = Convert.ToInt32('A');   // 65
char ch = Convert.ToChar(65);      // 'A'

Use cases:

Encrypting and decrypting text using algorithms like Caesar cipher
Sorting strings based on ASCII values
Validating whether a character is a digit or a letter by checking its code
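The first use case, a Caesar cipher, can be sketched with ord() and chr(). This minimal version assumes that only the uppercase letters A-Z are shifted and leaves every other character unchanged; caesar_shift is a hypothetical name.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Shift each uppercase letter A-Z by `shift` places, wrapping around."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            # Map to 0-25, apply the shift, wrap with modulo 26, map back to a letter
            offset = (ord(ch) - ord("A") + shift) % 26
            result.append(chr(ord("A") + offset))
        else:
            result.append(ch)  # spaces, digits, lowercase etc. are left as-is
    return "".join(result)


secret = caesar_shift("HELLO WORLD", 3)
print(secret)                    # KHOOR ZRUOG
print(caesar_shift(secret, -3))  # HELLO WORLD
```

Decryption is simply a shift in the opposite direction; Python's % operator always returns a non-negative result here, so negative shifts wrap correctly.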

Type conversion operations

String to integer/float

Strings that contain numeric values can be converted into integers or floats for arithmetic operations.

Python:

int("42")       # Output: 42
float("3.14")   # Output: 3.14

Java:

Integer.parseInt("42");      // 42
Double.parseDouble("3.14");  // 3.14

C#:

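using System.Globalization;  // for CultureInfo

int.Parse("42");                                     // 42
float.Parse("3.14", CultureInfo.InvariantCulture);   // 3.14 (invariant culture always uses '.' as the decimal point)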

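If the string is not a valid number (e.g., "abc"), the conversion fails with an exception: ValueError in Python, NumberFormatException in Java, and FormatException in C#. Always validate input before conversion, or handle the exception.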
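In Python, one common way to handle an invalid conversion is to catch the ValueError rather than let the program crash. The helper name parse_int and the choice to return None on failure are illustrative assumptions.

```python
from typing import Optional


def parse_int(text: str) -> Optional[int]:
    """Return text converted to an int, or None if it is not a valid integer."""
    try:
        return int(text)  # accepts e.g. "42", "-5" and " 7 " (surrounding whitespace is ignored)
    except ValueError:
        return None       # e.g. "abc" or "3.14" cannot be converted with int()


print(parse_int("42"))   # 42
print(parse_int("abc"))  # None
```

C# offers a similar exception-free pattern through int.TryParse, which returns false instead of throwing.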

Integer/float to string

You can convert numbers to strings using built-in functions or methods.

Python:

str(123)      # Output: '123'

Java:

Integer.toString(123);  // '123'

C#:

123.ToString();  // '123'

Use cases:

Displaying calculated values in a user interface
Saving numeric data to text files
Joining numbers with other strings in logs or messages

Date/time to string and vice versa

Converting date/time to string

Python:

from datetime import datetime
now = datetime.now()
now.strftime("%Y-%m-%d")  # e.g. '2025-06-21' (depends on the current date)

Java:

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

LocalDate date = LocalDate.now();
String formatted = date.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));  // e.g. "2025-06-21"

C#:

DateTime now = DateTime.Now;
string formatted = now.ToString("yyyy-MM-dd");

Converting string to date/time

Python:

datetime.strptime("2025-06-21", "%Y-%m-%d")

Java:

LocalDate.parse("2025-06-21", DateTimeFormatter.ofPattern("yyyy-MM-dd"));

C#:

using System.Globalization;  // for CultureInfo

DateTime.ParseExact("2025-06-21", "yyyy-MM-dd", CultureInfo.InvariantCulture);

Use cases:

Validating date input in forms
Storing and retrieving timestamps from databases
Calculating intervals such as time until deadline
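As an example of the last use case, two date strings can be parsed and subtracted to find the interval between them. The function name days_between is hypothetical, and both inputs are assumed to use the YYYY-MM-DD format.

```python
from datetime import datetime


def days_between(start_text: str, end_text: str) -> int:
    """Return the number of days from start_text to end_text (both 'YYYY-MM-DD')."""
    start = datetime.strptime(start_text, "%Y-%m-%d")  # string -> datetime
    end = datetime.strptime(end_text, "%Y-%m-%d")
    return (end - start).days  # subtracting two datetimes gives a timedelta


print(days_between("2025-06-21", "2025-07-01"))  # 10
```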

Solving common problems

Input sanitisation

This is the process of cleaning user input to prevent unexpected behaviour or security vulnerabilities. It includes:

Trimming whitespace:

name = "  Alice  ".strip()  # Output: 'Alice'

Standardising case:

email = "John.DOE@example.com".lower()  # 'john.doe@example.com'

Removing unwanted characters:

''.join(c for c in text if c.isalnum())  # Keeps only letters and numbers

Importance:

Reduces the risk of injection attacks (alongside defences such as parameterised database queries)
Reduces inconsistencies in data storage
Improves user experience by avoiding rejections on valid inputs
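The three sanitisation steps above can be chained in one function. This is a sketch only: the name sanitise_username and the rule of keeping only letters and digits are assumptions, and real systems should choose rules that suit their data (an email address, for instance, needs its '@' and '.').

```python
def sanitise_username(raw: str) -> str:
    """Trim, lower-case and remove non-alphanumeric characters from raw input."""
    trimmed = raw.strip()      # remove leading/trailing whitespace
    lowered = trimmed.lower()  # standardise case
    # Keep only letters and digits (isalnum() also accepts non-ASCII letters such as 'é')
    return "".join(c for c in lowered if c.isalnum())


print(sanitise_username("  John_Doe!! "))  # johndoe
```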

Parsing strings

Parsing involves breaking a string down into parts or converting it into a different data structure.

Examples:

Splitting a string:

"John Smith".split()  # ['John', 'Smith']

Parsing CSV values:

"23,45,67".split(",")  # ['23', '45', '67']

Checking prefixes/suffixes:

filename.endswith(".txt")  # True if filename ends with ".txt"

Use cases:

Extracting values from form fields
Processing lines in configuration files
Analysing structured logs
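Splitting and prefix checks are often combined when processing simple configuration files. The format assumed here (one key=value pair per line, with lines starting with # treated as comments) is a hypothetical example rather than a standard file format, and parse_config is an illustrative name.

```python
def parse_config(lines: list) -> dict:
    """Parse hypothetical 'key=value' lines into a dictionary of strings."""
    settings = {}
    for line in lines:
        line = line.strip()
        if line == "" or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumes every remaining line contains '='; split at the first one only
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


config = parse_config(["# display settings", "width = 800", "title=My App"])
print(config)  # {'width': '800', 'title': 'My App'}
```

The values stay as strings, so a setting such as width would still need int() before it is used in arithmetic.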

Best practices

Use meaningful variable names

Descriptive variable names like email_address or first_name improve code clarity and make programs easier to understand and maintain.

Consider case sensitivity

String comparisons are case-sensitive in most languages. To compare accurately, convert both strings to a common case:

if a.lower() == b.lower():
    ...
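In Python, str.casefold() is a more thorough alternative to lower() intended for caseless comparison; for example, it maps the German 'ß' to 'ss', which lower() leaves unchanged. A small sketch of the comparison above using it (same_ignoring_case is a hypothetical name):

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings without regard to case."""
    return a.casefold() == b.casefold()


print("Straße".lower() == "STRASSE".lower())    # False: lower() keeps 'ß'
print(same_ignoring_case("Straße", "STRASSE"))  # True: casefold() maps 'ß' to 'ss'
```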

Use built-in methods

Languages provide efficient and safe built-in string methods. Avoid reinventing the wheel.

Examples:

.replace()
.find()
.split()
.join()
.strip()
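A quick demonstration of the five methods listed above in Python (the sample string is arbitrary):

```python
text = "  apples, bananas, cherries  "

stripped = text.strip()                         # 'apples, bananas, cherries'
position = stripped.find("bananas")             # index of the first match
missing = stripped.find("grapes")               # -1 means "not found"
replaced = stripped.replace("apples", "pears")  # 'pears, bananas, cherries'
parts = stripped.split(", ")                    # ['apples', 'bananas', 'cherries']
rejoined = " | ".join(parts)                    # 'apples | bananas | cherries'

print(position, missing)  # 8 -1
print(rejoined)           # apples | bananas | cherries
```

Note that replace() returns a new string; text and stripped are not modified.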

Handle large string operations efficiently

In loops, repeated use of the + operator to concatenate strings can lead to performance issues, because each concatenation creates a new string and copies the existing characters into it. Instead, collect the pieces in a list and join them at the end:

lines = []
for i in range(10):
    lines.append("Line " + str(i))
result = "\n".join(lines)

This is faster and uses less memory than building the string directly with +.

Common mistakes

Index errors

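Accessing an index beyond the end of a string causes an error: IndexError in Python, StringIndexOutOfBoundsException in Java, and IndexOutOfRangeException in C#. Always check that the index is within bounds:

if 0 <= i < len(s):  # valid indices run from 0 to len(s) - 1
    char = s[i]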

Invalid conversions

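Converting a non-numeric string to an integer raises a runtime error (ValueError in Python). Validate before converting; note that isdigit() only returns True for unsigned whole numbers, so input such as "-5" is rejected even though int() accepts it:
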
if s.isdigit():
    val = int(s)
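Because isdigit() and int() do not accept exactly the same strings, catching ValueError is often the more reliable check in Python. The sketch below (compare_checks is a hypothetical name, and the samples are arbitrary) shows the differences: int() accepts a sign and surrounding whitespace, while isdigit() accepts some Unicode digit characters, such as superscript two, that int() rejects.

```python
def compare_checks(s: str) -> tuple:
    """Return (s.isdigit(), int(s) or None if int() raises ValueError)."""
    try:
        converted = int(s)
    except ValueError:
        converted = None  # int() could not convert this string
    return s.isdigit(), converted


for sample in ["42", "-5", " 7", "\u00b2", "abc"]:  # "\u00b2" is superscript two
    print(repr(sample), compare_checks(sample))
```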

Modifying immutable strings

Strings are immutable in Python, Java, and C#, so a string cannot be changed in place; every "change" creates a new string:

s = "hello"
s = "H" + s[1:]  # Correct way to change first letter
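Trying to assign to an index shows the immutability directly: Python raises a TypeError, while methods such as replace() return a new string and leave the original unchanged. A short sketch:

```python
s = "hello"

try:
    s[0] = "H"  # strings do not support item assignment
except TypeError as error:
    print("TypeError:", error)

t = s.replace("h", "H")  # replace() builds and returns a new string
print(s)  # hello (original unchanged)
print(t)  # Hello
```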

FAQ

Both split() and partition() are used to divide strings, but they behave differently and are suited to different purposes. The split() method divides a string into a list of substrings based on a specified delimiter (e.g., a space or comma). It can split the string into multiple parts wherever the delimiter appears. For example, "a,b,c".split(",") returns ['a', 'b', 'c']. If no delimiter is specified, it defaults to splitting on whitespace. It’s ideal for separating words or data entries.

On the other hand, partition() splits the string into exactly three parts: the portion before the first occurrence of the separator, the separator itself, and the portion after it. For example, "a,b,c".partition(",") returns ('a', ',', 'b,c'). If the separator is not found, it returns the whole string followed by two empty strings. This method is useful when you only want to separate the first part of a string from the rest while keeping the separator for context. Because it always returns exactly three parts, it is more predictable than split() for fixed-format parsing.
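The difference is easiest to see side by side. This sketch uses arbitrary sample strings; it also shows that split() accepts an optional maximum number of splits (here 1), and how partition() behaves when the separator is missing.

```python
line = "a,b,c"

print(line.split(","))      # ['a', 'b', 'c']
print(line.split(",", 1))   # ['a', 'b,c'] (split at the first comma only)
print(line.partition(","))  # ('a', ',', 'b,c')

# partition() always returns a 3-tuple, even when the separator is absent
print("abc".partition(","))  # ('abc', '', '')

# A typical fixed-format use: separate a username from an email domain
user, _, domain = "alice@example.com".partition("@")
print(user, domain)  # alice example.com
```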

String slicing can reverse a string using the syntax string[::-1], which means: start from the end of the string and move backwards one character at a time. This works because the optional third slice parameter is the step size, and a step of -1 reads the string from right to left. For example, "hello"[::-1] returns "olleh". The technique is concise and efficient; because Python strings are immutable, it produces a new string and leaves the original unchanged.

Reversing strings is useful in various programming scenarios, such as checking whether a string is a palindrome (i.e., it reads the same backwards as forwards), formatting output in reverse order, or building simple cipher transformations. Using slicing for this purpose is not only elegant but also avoids the need for manual loops, making the code shorter, clearer, and less error-prone.
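A palindrome check is a natural application of [::-1]. This sketch first normalises the text (lower case, letters and digits only) so that phrases with spaces and punctuation are handled; that normalisation rule and the name is_palindrome are assumptions for illustration.

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same backwards, ignoring case and punctuation."""
    cleaned = "".join(c.lower() for c in text if c.isalnum())
    return cleaned == cleaned[::-1]  # compare the string with its reverse


print("hello"[::-1])                       # olleh
print(is_palindrome("racecar"))            # True
print(is_palindrome("Never odd or even"))  # True
print(is_palindrome("hello"))              # False
```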

The join() method is used to combine a list of strings into a single string, with a specified separator between each element. For example, ", ".join(['apple', 'banana', 'cherry']) produces 'apple, banana, cherry'. This is more efficient than using the + operator in a loop because it avoids the creation of multiple intermediate string objects, which can degrade performance, especially when dealing with large datasets or in situations that require repeated operations.

While + can concatenate two or a few strings easily, using it repeatedly in a loop, such as building a string from a list of items, is inefficient in many languages due to string immutability. Each operation creates a new string in memory, leading to higher memory usage and slower execution. join() is specifically optimised for joining many strings and is more readable for such tasks. It clearly separates the logic of joining with a delimiter from the content being combined, making code easier to understand and maintain.

String identity refers to whether two variables refer to the same object in memory, while string equality checks whether the contents of the strings are the same. In Python, for example, the is operator checks identity, whereas == checks equality. Two separate string objects can have identical contents but reside at different memory locations, meaning a == b could be True while a is b is False. Java has the same distinction: == compares references, whereas .equals() compares contents.

This distinction is crucial in scenarios involving caching, memory optimisation, or object tracking. Mistaking identity for equality can lead to logic errors, such as prematurely assuming two values refer to the same underlying data when they don't. For instance, if two inputs are being compared with is, the comparison may fail even though they are textually the same. This becomes especially problematic with user input, dynamically created strings, or when data is loaded from files or databases. Understanding this nuance ensures comparisons are performed correctly and prevents subtle bugs.
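A short Python illustration. Whether two equal strings share one object is a CPython implementation detail (short literals are often "interned", i.e. reused), which is exactly why is should not be used to compare text; the string built with join() below is a separate object in CPython.

```python
a = "hello"
b = "".join(["hel", "lo"])  # built at runtime, so CPython creates a new object

print(a == b)  # True: same characters
print(a is b)  # False in CPython: different objects in memory

# Compare text content with ==, never with is
if a == b:
    print("Strings match")
```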

Regular expressions (regex) are powerful tools for pattern matching and complex text manipulation within strings. They allow programmers to define patterns for search, match, or substitution using a concise syntax. For example, the pattern r"\d{4}-\d{2}-\d{2}" matches dates in the format YYYY-MM-DD. Functions like re.search(), re.match(), and re.sub() in Python can scan, verify, or modify strings using regex rules.
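A sketch of the date pattern above using re.fullmatch(), which requires the whole string to match (re.match() only anchors at the start). Because the pattern checks the format but not whether the date exists, the sketch also tries datetime.strptime(); the function name is_valid_date is hypothetical.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_date(text: str) -> bool:
    """Return True if text is a real date written as YYYY-MM-DD."""
    if DATE_PATTERN.fullmatch(text) is None:
        return False  # wrong shape, e.g. '2025-6-21' or '21/06/2025'
    try:
        datetime.strptime(text, "%Y-%m-%d")  # rejects impossible dates such as month 13
    except ValueError:
        return False
    return True


print(is_valid_date("2025-06-21"))  # True
print(is_valid_date("2025-13-45"))  # False: matches the pattern but is not a real date
print(is_valid_date("2025-6-21"))   # False: month must have two digits
```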

However, regular expressions can be difficult to read and maintain, especially for beginners or in team projects. For simple tasks like splitting strings, trimming whitespace, or checking prefixes, built-in string methods are easier to understand and debug. Regex also introduces a performance overhead and may require more testing to ensure patterns behave as intended. Therefore, regex is most useful when dealing with highly variable or unstructured data, while standard string operations are usually preferred for well-defined, straightforward manipulations. Balancing simplicity with power is key in deciding which approach to use.

Practice Questions

A program receives a user’s full name as input in the format "First Last". Describe how string-handling operations can be used to extract and output the user's initials in uppercase. Include examples of functions that could be used.

To extract and display a user’s initials in uppercase, the program first uses a string splitting operation such as .split() to separate the full name into two parts: first and last name. It then accesses the first character of each part using indexing, e.g., name[0], and converts both to uppercase using .upper(). These two characters are then concatenated using + to form the initials. For example, "Alice Smith" becomes "A" and "S", resulting in "AS". This use of string operations ensures reliable extraction and formatting of initials.
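The steps in this answer translate almost directly into Python. The function name get_initials is illustrative, and the sketch assumes the input contains exactly two words, as the question states; split() with no argument also copes with extra spaces around or between the names.

```python
def get_initials(full_name: str) -> str:
    """Return the uppercase initials of a name in 'First Last' format."""
    first, last = full_name.split()  # assumes exactly two words
    # Index the first character of each part, convert to uppercase, then concatenate
    return first[0].upper() + last[0].upper()


print(get_initials("Alice Smith"))    # AS
print(get_initials("  bob  jones "))  # BJ
```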

Explain how and why type conversion is used when validating a user’s numeric input taken as a string. Include possible issues that can arise and how they are handled.

When numeric input is taken as a string, it must be converted using functions like int() or float() to perform arithmetic or validation. This conversion ensures the program can compare and calculate values correctly. However, if the user enters non-numeric data (e.g., "abc"), a runtime error occurs. To handle this, the program can check with methods like .isdigit() or use exception handling (e.g., try-except) to manage invalid input. Type conversion is essential to ensure data integrity, user input reliability, and program stability when processing user-entered numbers.

Written by:

Alfie

Cambridge University - BA Maths

A Cambridge alumnus, Alfie is a qualified teacher who specialises in creating Computer Science educational materials for high school students.

AQA A-Level Computer Science
