Structured Query Language (SQL) is used to interact with databases, and DQL (Data Query Language) helps us retrieve data using SELECT statements.
Basic column retrieval
SELECT column_name FROM table_name;
The most fundamental operation in SQL DQL is retrieving data from a table using the SELECT statement. This command specifies exactly which columns of data you want to extract from a table. Every table in a database is made up of rows (records) and columns (fields). The SELECT statement focuses on pulling out specific columns from all the rows.
Example:
sq
SELECT first_name, last_name FROM students;
This query returns the first_name and last_name columns for every record in the students table.
To retrieve all columns from a table, an asterisk * can be used:
sql
SELECT * FROM students;
This returns every column in the students table for all rows.
It is generally better practice to avoid using * in large applications, as it can lead to unnecessary data being retrieved, reducing performance.
Filtering data with WHERE
SELECT ... FROM ... WHERE condition;
The WHERE clause is used to extract only those records that satisfy a specified condition. It allows precise control over what data is returned.
Common comparison operators include:
= (equal to)
!= or <> (not equal to)
Practice Questions
FAQ
The key difference between WHERE and HAVING lies in when they are applied in the execution of a query. WHERE is used to filter individual rows before any grouping or aggregation takes place. It deals with raw data coming directly from the table. On the other hand, HAVING is used to filter groups of rows that have already been aggregated with a GROUP BY clause. It is evaluated after the grouping and aggregation have occurred. This distinction is important because you cannot use aggregate functions like AVG() or COUNT() in a WHERE clause, but you can use them in a HAVING clause. For example, WHERE score > 60 filters each row with a score greater than 60, but HAVING AVG(score) > 60 filters entire groups where the average score is greater than 60. Using both properly ensures accurate data selection at both the row and group level.
Using GROUP BY without an aggregation function can still produce valid results, but it has a very specific behaviour and is generally discouraged unless fully understood. When you include a GROUP BY clause in a SELECT query, any column in the SELECT list that is not part of an aggregation function must appear in the GROUP BY clause. If this rule is not followed, most SQL engines will return an error, or in some lenient engines like MySQL (in certain configurations), it will return unpredictable or incorrect results. The purpose of GROUP BY is to collapse rows with common values into single rows, allowing for aggregate calculations like SUM, AVG, or COUNT. If no aggregation is used, then there's no meaningful operation happening on the grouped data. This can confuse the intent of the query and lead to logical errors. Always use aggregation when grouping, unless you're deliberately collapsing duplicates.
Yes, ORDER BY can absolutely be used with aggregate functions or grouped data, and it's a powerful way to organise your results. When you perform a GROUP BY, you often want to see the results sorted by an aggregated value like COUNT(*) or AVG(score). This allows for clear interpretation of which group has the highest or lowest value. For instance, after grouping students by grade and counting how many are in each grade, you could use ORDER BY COUNT(*) DESC to display the grades starting from the one with the most students. You can also order by a column not included in the grouping but computed as part of the aggregate. SQL engines will evaluate the aggregation and grouping first, and then the ordering is applied to that final grouped result. Ordering by these aggregates helps in making meaningful reports or dashboards where trends need to be easily seen.
COUNT(*) and COUNT(column_name) both return the number of rows, but there is an important distinction. COUNT(*) counts all rows, regardless of whether they contain NULL values or not. It essentially counts every row that exists in the result set. On the other hand, COUNT(column_name) only counts rows where the specified column is not NULL. This means if you have a table where a particular column has some missing values (NULL), then COUNT(column_name) will return a smaller number than COUNT(*). For example, if a table has 100 rows but the email column is NULL in 10 of them, then COUNT(*) will return 100, but COUNT(email) will return 90. This difference becomes particularly important when dealing with incomplete data or when you're trying to measure how many entries have valid data in a specific field. Always choose the version that best fits your analytical needs.
Table aliases are temporary names given to tables in a query, typically using the AS keyword, although it can often be omitted. They are particularly useful in JOIN queries where multiple tables may have columns with the same name, such as id, name, or date. Aliases make your query shorter, more readable, and help avoid ambiguity. For instance, in a query joining students and classes, using aliases like s for students and c for classes allows you to refer to s.first_name and c.class_name instead of writing the full table names repeatedly. This is especially helpful in large queries with multiple joins, where writing out full table names each time would clutter the query. Aliases also prevent confusion when the same column name exists in more than one table and makes it easier to read and debug complex SQL queries, especially when dealing with multiple relationships.
