a.
SQL Window Functions:
Welcome to Part 5 of ‘SQL for data analytics’ series, Here we are going to understand window function in SQL. We're thrilled to have you here, joining us on this exciting journey of exploration, discovery and knowledge sharing.
A
window function, also known as an analytic function, is a type of function in
SQL that operates on a set of rows called a "window" or
"frame." Unlike traditional aggregate functions, which operate on the
entire result set, window functions perform calculations on a specified subset
of rows within the result set.
There are several types of window functions, Let’s crack them one by one. To understand all operations in this blog we are going to use table mentioned below which has Venue wise data.
CREATE TABLE STATS (
Match_No INT PRIMARY KEY,
Opposition VARCHAR(50) Not null,
RunsScored INT DEFAULT 0,
BallFaced INT,
StrikeRate FLOAT,
MatchDate DATE,
Venue VARCHAR(100))
INSERT INTO STATS (Match_No,Opposition,RunsScored,BallFaced,StrikeRate,MatchDate,Venue)
VALUES
(1,'South Africa',18,34,52.94 ,'2019-06-05','Southampton'),
(2,'Australia' ,82,77,106.49,'2019-06-09','The Oval'),
(3,'Pakistan' ,77,65,118.46,'2019-06-16','Manchester'),
(4,'Afghanistan' ,67,63,106.34,'2019-06-22','Southampton'),
(5,'West Indies' ,72,82,87.80 ,'2019-06-27','Manchester'),
(6,'England' ,66,76,86.84 ,'2019-06-30','Birmingham'),
(7,'Bangladesh' ,26,27,96.29 ,'2019-07-02','Birmingham'),
(8,'Sri Lanka' ,34,41,82.93 ,'2019-07-06','Leeds'),
(9,'New Zealand'
,01,06,16.16 ,'2019-07-09','Manchester')
Types of Window functions in SQL:
1. Ranking Functions:
ROW_NUMBER():
Assigns a unique sequential number to each row within a partition.
Let’s say we want to assign date wise incremental number to each row in table we can use below query:
SELECT Opposition,RunsScored,MatchDate,Venue,
ROW_NUMBER() OVER (ORDER BY MatchDate) AS Sequence
FROM #STATS;
O/P:
Here in window function we can add partition clause while assigning numbering to specified table. This is optional part but it is very much useful while handling real life scenarios.
Consider below query where we have added partition clause which leads to recreate sequence on basis of partition column.
SELECT Opposition,RunsScored,MatchDate,Venue,
ROW_NUMBER() OVER (PARTITION BY Venue ORDER BY MatchDate) AS Sequence
FROM STATS
ORDER BY Venue;
Let's break down the different parts of the query:
ROW_NUMBER(): This is the name of the function.
OVER(): This clause defines the window or frame over which the function operates.
PARTITION BY: This optional clause partitions the result set into groups based on one or more columns. The ROW_NUMBER() function will reset to 1 for each distinct combination of partitioning columns.
ORDER BY: This clause specifies the order in which the rows within each partition are assigned the row numbers. We can specify the sort order which by default is ascending.
RANK():
Assigns a unique rank to each distinct value in the partition, with gaps in case of ties.
Let’s try to implement rank() function on our table. Syntax is similar to row_number() function.
SELECT Opposition,RunsScored,MatchDate,Venue,
RANK() OVER (ORDER BY Venue DESC) AS Rank
FROM STATS
ORDER BY Rank;
O/P:
Here you can notice rank() function has skipped the subsequent rank after having same column values.
DENSE_RANK():
Assigns a unique rank to each distinct value in the partition, without gaps in case of ties.
Let’s try to implement dense_rank() function on our table.
SELECT Opposition,RunsScored,MatchDate,Venue,
DENSE_RANK() OVER (ORDER BY Venue) AS DRank
FROM STATS
ORDER BY DRank
O/P:
Here you can notice Dense_rank() function has assigned the subsequent rank after having same column values. That’s the only difference between Rank and Dense_rank.
NTILE():
Divides the partition into specified equal-sized buckets and assigns a bucket number to each row.
Let’s try NTILE() function on our table, We will try to create 2 virtual groups on basis of RunsScored.
SELECT Opposition,RunsScored,MatchDate,Venue,
NTILE (3) OVER (ORDER BY Venue) as Bucket_Number
FROM STATS;
O/P:
Here we have created 3 virtual groups on basis of Venue in ascending order using NTILE() function.
Function NTILE() accepts one numeric argument.
If number of rows aren’t divisible by the argument passed, the NTILE() function returns the groups of two sizes with a difference by one.
2. Analytical Functions:
LAG():
This window function Retrieves the value from a previous row within the window.
Let’s see how it pans out with our table.
SELECT Opposition,MatchDate,
LAG(Opposition) OVER(ORDER BY MatchDate) AS PrevoiusOpposition
FROM STATS;
O/P:
We have used ‘Opposition’ in LAG() function In above result with Matchdate as our order by clause, as a result we can clearly see we are getting previous ‘Opposition’ value in newly formed column.
LEAD():
Retrieves the value from a subsequent row within the window.
This function returns exactly opposite result compared to LAG() function.
Let’s try previous query this time with lead() function.
SELECT Opposition,MatchDate,
LEAD(Opposition) OVER(ORDER BY MatchDate) AS PrevoiusOpposition
FROM STATS;
O/P:
FIRST_VALUE():
The FIRST_VALUE() function returns the value of the specified expression from the first row of the window frame within a partition. It allows you to retrieve the first value in a sorted window partition.
Let’s try out this function on our table.
SELECT Opposition,Venue,
FIRST_VALUE(MatchDate) OVER (ORDER BY MatchDate) AS FirstMatchDate,
FIRST_VALUE(RunsScored) OVER (PARTITION BY Venue ORDER BY RunsScored) AS
VenueWiseLowestRuns
O/P:
LAST_VALUE():
Retrieve last row value within a window.
Works exactly opposite to first_value() function.
SELECT Opposition,Venue,
LAST_VALUE(MatchDate) OVER (ORDER BY MatchDate) AS LastMatchDate,
LAST_VALUE(RunsScored) OVER (PARTITION BY Venue ORDER BY RunsScored ) AS
VenueWiseHighestRuns FROM STATS ORDER BY Venue;
O/P:
3. Aggregate Functions inside window function:
Window functions can be used in combination with aggregate functions to perform more complex analytical tasks in SQL. By using both together, you can calculate aggregate values over specific partitions and define custom window frames for advanced data analysis. Some common aggregate functions used with window functions include 'SUM', 'AVG', 'MIN', 'MAX', and 'COUNT'.
Let's take a look at some examples of using aggregate functions with window functions:
Calculating Running Sum:
To calculate the running sum of runs scored using a window function, you can use the `SUM()` aggregate function in combination with the `OVER` clause. The `OVER` clause allows you to define a specific window frame for the `SUM()` function, which, in turn, computes the running sum.
Here's the SQL query to get the running sum of runs scored
using a window function:
SELECT Opposition,RunsScored,MatchDate,Venue,
SUM(RunsScored) OVER (ORDER BY MatchDate) AS RunningSum
FROM STATS
Order by RunningSum;
O/P:
In this query, we use the 'SUM()' function as a window function with the 'OVER' clause. We order the window frame by 'MatchDate' to ensure the running sum is calculated in chronological order. The result will show the running sum of runs scored for each row in the original table.
By adding Partition by clause to this query changes the context of the result.
SELECT Opposition,RunsScored,MatchDate,Venue,
SUM(RunsScored) OVER (PARTITION BY Venue) AS TotalRunsPerVenue
FROM STATS
ORDER BY Venue;
O/P:
Here against each venue we get total runs.
Now let’s try both partition and order by clause together.
SELECT Opposition,RunsScored,MatchDate,Venue,
SUM(RunsScored) OVER (PARTITION BY Venue ORDER BY MatchDate) AS RunningSum
FROM #STATS
ORDER BY Venue,MatchDate;
O/P:
Now we have running sum within a partition.
That was fun. You can try similar thing to get running average or running count etc.
Hoping you have enjoyed this blog. Please let us know your feedback or queries in comment section.
Happy learning.
0 Comments