Table of Contents
- Module Introduction
- Basics of Filtering with SQL
- Advanced Filtering:
IN
,OR
, andNOT
- Using Wildcards in SQL
- Sorting with
ORDER BY
- Math operations
- Aggregate Functions
- Grouping Data with SQL
- Putting it all together
This module introduces a number of operators and clauses to query data:
WHERE
BETWEEN
IN
OR
NOT
LIKE
ORDER BY
GROUP BY
It also introduces wildcards, and math operators which can be used to aggregate data:
AVERAGE
COUNT
MAX
MIN
- allows one to be specific about what they want
- reduces the number of records you receive
- increases query performance
- reduces strain on the client application
The WHERE
clause filters queries by criteria you define.
SELECT [fields]
FROM [table]
WHERE [criteria]
These criteria are determined through using a number of operators:
=
- equal<>
- not equal>
- greater than<
- less than>=
- greater than or equal<=
- less than or equalBETWEEN
- between an inclusive rangeIS NULL
- is a null value
To use BETWEEN
we need to use it in conjunction with AND
:
SELECT *
FROM my_table
WHERE size BETWEEN 5 AND 10;
The IN
operator specifies a range of conditions, delimited by commas, and
within parentheses.
-- select only suppliers with the provided ids
SELECT *
FROM suppliers
WHERE supplierId IN (9, 10, 15);
To assert against string values, surround the strings in single quotes.
As with the disjunction operator in many programming languages, SQL will ignore
the second condition in an OR
statement if the first condition is true.
SELECT *
FROM pets
WHERE name = 'sammy' OR 'hammy';
IN
works similarly to OR
, but there are benefits to using IN
:
- one can provide a long list of options
IN
executes faster thanOR
- the condition used in
IN
are not subject to a specific order IN
clauses can container anotherSELECT
statement to make subqueries
NOT
is used to exclude rows based on conditions:
SELECT *
FROM employees
WHERE NOT city = 'London' AND NOT city = 'Seattle';
A wildcard is a special character used to match parts of a value, such as when you know only a portion of a value.
Search patterns are made from literal text, a wildcard character, or a combination of both.
Wildcard queries use the LIKE
operator. LIKE
is actually a predicate, but is
often referred to as an operator.
Wildcards can only be used for string data.
%text
- matches anything ending withtext
text%
- matches anything beginning withtext
%text%
- matches anything containingtext
a%b
- matches anything that is preceded bya
and ends withb
Wildcards will not match NULL values.
SELECT DISTINCT(area)
FROM cities
WHERE name LIKE '%ville';
Many RDMSs support _
as a wildcard. It functions in the same way that %
has
been outlined here.
- wildcard queries take longer to run
- better to use other operators if possible
- statements with wildcards at the end of patterns take longer to run than patterns with wildcards at the beginning
- placement of wildcads is imporant
Using SELECT
will return data in the same order that it was captured in the
database.
Furthermore, the order of data can change when it is updated or deleted.
Without specifying the order of data, its order cannot be assumed.
ORDER BY
:
- takes the name of one or more columns
- columns are delimited by commas
- data can be sorted by columns that are not retrieved in the
SELECT
statement - must always be the last statement in a
SELECT
statement
SELECT *
FROM my_table
WHERE [criteria]
ORDER BY col1, col2;
One can specify the direction to sort data:
DESC
- descendingASC
- ascending
The direction keywords apply only to the column name they precede.
...
ORDER BY DESC name;
Though not readable, and prone to error, one can sort by column number:
...
ORDER BY 2, 3;
Columns numbers are 1-indexed.
Available operators:
+
-
*
/
SELECT
id
,unit_price
,unit_sold
,unit_price * units_sold AS total_sales
FROM my_table;
SELECT
id
,unit_price
,units_sold
,unit_weight
,(unit_price * units_sold)/unit_weight AS weight_ratio
FROM my_table;
Aggregate functios allow one to summarise data.
The aggregate functions available are:
AVG
COUNT
MIN
MAX
SUM
DISTINCT
may be used with some of the aggregate functions to include only
values that are distinct.
Aggregate functions are useful for:
- summarising data
- finding the highest and lowest values
- finding the total number of rows
- finding averages
Aggregate functions are syntactic sugar for the existing math operators.
Rows containing NULL values are ignore by the AVG
function.
SELECT
AVG(price) AS avg_price
FROM products;
COUNT
can be used to count rows, including those containing NULL values, or
individual columns, excluding those containing NULL values.
# count all rows, including those with NULL values
SELECT COUNT(*) AS customers
FROM customers;
# count customers that have emails
SELECT COUNT(email) AS customers_with_email
FROM customers;
NULL values are ignored by both MAX
and MIN
.
SELECT MAX(price) AS max_price
FROM products;
SELECT MIN(size) AS smallest_size
FROM shoes;
# get a range of shoe sizes
SELECT
MAX(size) AS shoe_size_upper
, MIN(size) as show_size_lower
FROM shoes;
# simple query
SELECT SUM(price) as total_price
FROM products;
# less simple query
SELECT SUM(price * units_sold) total_sales
FROM products;
If DISTINCT
is not defined, ALL
is assumed.
DISTINCT
can't be used in conjunction with COUNT(*)
.
DISTINCt
doesn't make sense to use with MIN
or MAX
since they only return
a single value.
SELECT COUNT(DISTINCT price) AS distinct_prices
FROM products;
Grouping data allows one to summarise subsets of data.
Data is grouped using two statements:
GROUP BY
HAVING
Using these statements we can aggregate data by a particular value, e.g. grouping shoes by shoe size.
When simply aggregating total number of customers in the aggregating module, we didn't need to do anything specific to get a result.
If, however, another column is specified in the SELECT
statement, we'd need to
indicate how we'd like the number of customers to be counted. This is where
GROUP BY
is useful:
# this will error - we need to specify _how_ to count customer ids when queried
with region
SELECT
region
,COUNT(customer_id) AS total_customers
FROM customers;
# so we indicate that we want to count customers grouped by region
SELECT
region
,COUNT(customer_id) AS total_customers
FROM customers
GROUP BY region;
- can contain multiple columns, delimited by commas
- every column in the
SELECT
statement must be present in aGROUP BY
clause, with the exception of the aggregated values - NULLs can be grouped together if your
GROUP BY
column contains NULLs
WHERE
filters on rows, not groupsWHERE
then needs to be defined beforeGROUP BY
- rows eliminated by a
WHERE
clause will not be included in a group HAVING
is the equivalent toWHERE
, but for groupsHAVING
must come afterORDER BY
# count the number of orders, by customer, where there are 2 or more orders for
# each customer
SELECT
customer_id
.COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 2;
As with other queries, GROUP BY
will not sort results. It's good practice to
use ORDER BY
with GROUP BY
to organise the results of queries.
SELECT
supplier_id
,COUNT(*) AS num_products
FROM products
WHERE price >= 4
GROUP BY supplier_id
HAVING COUNT(*) >= 2
ORDER BY name ASC;
[video](Putting it All Together)
Filtering is useful because:
- you're narrowing down your results
- it increases query and application performance
- makes understanding data easier by:
- finding specific values
- finding a range of values
- finding blank values
Order of key SQL clauses:
# always required
SELECT [columns]
# required if selecting table from a table
FROM [table]
WHERE [criteria]
# required if calculating aggregates by a group
GROUP BY [columns]
HAVING [criteria]
ORDER BY [columns [ASC|DESC]];