Window Functions in SQL

Joshua SmartJoshuaSmart

DURATION

7min

Tags

SQL

Database

COMPETITIVE PROGRAMMING AT TOPCODER

Window functions in SQL are an easy approach to carry out complex calculations in SQL across a specified group of rows. They can help users to easily analyze or query aggregated and unaggregated data side by side without the GROUP BY clause. They were first introduced in 2003.

Let’s look at some examples. In a company, a data analyst may be asked to find how much revenue a particular product has made the company. Let’s say this product has a product ID of 710. Using window functions, this is what the query and its result would look like.

1
2
3
4
5
6
7
8
9
10
SELECT
[ProductID],

[LineTotal] AS 'Revenue',

  SUM([LineTotal]) OVER() AS 'Revenue Total'

FROM[Sales].[SalesOrderDetail]

WHERE[ProductID] = 710

In the images above, instead of using the GROUP BY clause, using the WHERE clause to specify the ProductID we want, we have the revenue of ProductID 710 logged at each instance the product incurred revenue. The last two columns in the second image are a classic example of aggregated and unaggregated data grouped together WITHOUT the GROUP BY clause. If you sum the values in all the rows (44) of the revenue column, they add up to 513.

The breakdown of the syntax for the window function on the third line of the query is as follows:

SUM: this is the same aggregate function used in SQL to ADD integers together. This serves the function of adding up the LineTotal, which is a synonym for revenue.

OVER(): the OVER clause is what is used to initiate a window function. It enables the user to divide the data into “windows”, or sections. The OVER clause, in conjunction with the SUM clause, performs an action across all the rows that have a ProductID of 710 and adds up all the revenue for this particular ProductID. Because it is a function that acts on a specified set of rows (the rows specified in this query have been done so using the WHERE clause to specify [ProductID] = 710), all rows that fall under the column Revenue Total will have the result of that window function (513) repeated in every row of that column. That is why the number 513 is repeated again and again within the column Revenue Total. Also, there is only one window because nothing has been specified within the parenthesis of the OVER clause.

Within the OVER clause, there are two provisions or options to specify further what you want to see in your query. They are the PARTITION BY option and the ORDER BY option. Let’s look at another example to understand them further.

In the next example, we are looking at the standard cost of a product identified by its product ID.

1
2
3
4
5
6
7
SELECT TOP 15[ProductID],

  SUM([StandardCost]) OVER(PARTITION BY[ProductID]) AS 'Sum of Standard Cost'

FROM[Production].[Product]
ORDER BY 'Sum of Standard Cost'
DESC

In the query above, the standard cost, which is the cost it takes to make the product, is first summed up according to its respective product ID. The PARTITION BY clause is used to group or organize the resulting query by the product ID. The PARTITION BY clause also helps to create the “windows” of data.

In the image below there are fifteen rows of data in the results of the query but rows one to five, six to nine, and ten to fifteen all have unique results within their ranges under the column “Sum of Standard Cost”. These are the windows of data that have been created AFTER the total sum of standard cost was divided or “partitioned” by the product ID.

The ORDER BY clause within the window function looks like this:

1
2
3
4
5
6
7
8
SELECT TOP 15[ProductID],

  SUM([StandardCost]) OVER(PARTITION BY[ProductID] ORDER BY[ProductID]) AS 'Sum of Standard Cost'

FROM[Production].[Product]

ORDER BY 'Sum of Standard Cost'
DESC

We can have the ORDER BY clause after the SELECT statement because in logical query processing, the step-by-step approach used in querying, the SELECT statement comes first, then the ORDER BY clause comes after. Hence, the ORDER BY clause within the window function. This ORDER BY clause is not influenced by the ORDER BY clause at the end of our query.

An example of using the MAX function along with window functions would be one where a manager wants a data analyst to query the database and find the largest order quantity per sales order ID and show that side-by-side with the largest order quantities for each sales order ID.

1
2
3
4
5
6
7
8
SELECT DISTINCT[SalesOrderID],

  MAX([OrderQty]) OVER() AS Maximum_Order_Quantity,

  MAX([OrderQty]) OVER(PARTITION BY[SalesOrderID]) Order_Quantity_Per_Sales_OrderID

FROM[Sales].[SalesOrderDetail]
ORDER BY Order_Quantity_Per_Sales_OrderID DESC

From the image above, the DISTINCT function is used to ensure only one record per sales order ID is returned by the query. This is to avoid sales order ID duplicates. The second line of code uses the MAX function to return the largest order quantity on record while using the OVER() clause to activate the window function on that line of code. This is basically using the window function to return the largest order quantity. Because window functions act on all rows of a column or record specified, the largest value will be returned through all rows of that column. The third line of code uses the MAX function to get the largest value but uses the PARTITION BY clause within the window function to divide or arrange our result by the largest order quantity value for each sales order ID. The ORDER BY clause orders the results in this column in descending order so the largest order quantity values appear first. The image below is what the result of that query would look like. For the MIN function, just switch out the MAX function for MIN function.

For the AVG function, we will use a similar example. Instead of looking for the largest order quantity, imagine the requirement is to get the overall average revenue along with the average revenue of each sales order ID. That query would look like this:

1
2
3
4
5
6
7
8
SELECT DISTINCT[SalesOrderID],

  ROUND(AVG([LineTotal]) OVER(), 0) as Overall_Avg_Revenue,

  ROUND(AVG([LineTotal]) OVER(PARTITION BY[SalesOrderID]), 0) as Avg_Revenue_Per_Sales_OrderID

FROM[Sales].[SalesOrderDetail]
ORDER BY Avg_Revenue_Per_Sales_OrderID DESC

The ROUND() function is used in this query to round up the values in the Overall_Avg_Revenue column and the Avg_Revenue_Per_Sales_OrderID column. The AVG function is used with the OVER() clause to get the overall average revenue while in the third column the average revenue per each sales order ID is derived. The result of the query above, ordered in descending order, would look like this:

For the final example of window functions, we will use the COUNT function. Imagine a scenario where the line manager wants to know how many employees are at a company and how many employees there are per job description or job role?

The SQL query for that would be this:

1
2
3
4
5
6
7
8
9
SELECT DISTINCT[JobTitle],

  COUNT([NationalIDNumber]) OVER() AS No_Of_Employees,

  COUNT([NationalIDNumber]) OVER(PARTITION BY[JobTitle]) AS Emloyees_Per_Job_Title

FROM[HumanResources].[Employee]

ORDER BY Emloyees_Per_Job_Title DESC

The overall count of employees is taken using their unique national ID numbers as identifiers. The total count of that is partitioned by the job titles at the organization and the query is ordered in descending order by the number of employees per job description so the job descriptions with the highest number of employees come first.

In summary, window functions can make querying in SQL easier, especially when one needs to present aggregated and non-aggregated data together. These are just a few examples of how to use window functions with other aggregate functions.

REFERENCES

Chat on Discord

September 13, 2021

Window Functions in SQL

DURATION

categories

Tags

share

COMPETITIVE PROGRAMMING AT TOPCODER

REFERENCES

Recommended for you

A Crash Course in SQL

A Guide to Building a Career in Data Science

Order By vs Group By with Microsoft SQL Server