Introduction to Aggregate Functions
Consider Below table named ‘STATS’ as reference for all examples in this blog.
Here's table creation script.
CREATE TABLE STATS (
Match_No INT PRIMARY KEY,
Opposition VARCHAR(50) Not null,
RunsScored INT DEFAULT 0,
BallFaced INT,
StrikeRate FLOAT,
MatchDate DATE,
Venue VARCHAR(100))
INSERT INTO STATS (Match_No,Opposition,RunsScored,BallFaced,StrikeRate,MatchDate,Venue)
VALUES
(1,'South Africa',18,34,52.94 ,'2019-06-05','Southampton'),
(2,'Australia' ,82,77,106.49,'2019-06-09','The Oval'),
(3,'Pakistan' ,77,65,118.46,'2019-06-16','Manchester'),
(4,'Afghanistan' ,67,63,106.34,'2019-06-22','Southampton'),
(5,'West Indies' ,72,82,87.80 ,'2019-06-27','Manchester'),
(6,'England' ,66,76,86.84 ,'2019-06-30','Birmingham'),
(7,'Bangladesh' ,26,27,96.29 ,'2019-07-02','Birmingham'),
(8,'Sri Lanka' ,34,41,82.93 ,'2019-07-06','Leeds'),
(9,'New Zealand'
,01,06,16.16 ,'2019-07-09','Manchester')
Lets explore all aggregate functions one by one,
1. COUNT in SQL
The COUNT function in SQL Server is a used for counting the number of rows or non-null values in a column or a result set. Here's a comprehensive guide on the usage and application of the COUNT function in SQL :
Consider a case where we want to find number of matches present in our table 'STATS'.
SELECT COUNT(Match_No) AS Match_Count FROM STATS;
O/P:
Above query will return count of Non null 'Match_No' column from 'STATS' table
Replace ` Match_No ` with the name of the column you want to count, and ` STATS ` with the name of the table you're querying.
Counting All Rows:
To count all rows in a table, you can use the COUNT function without specifying a column name:
SELECT COUNT(*) AS Row_count FROM STATS;
O/P:
This will return the total number of rows including null values in the specified table.
- Counting Non-Null Values:
By default, the COUNT function includes NULL values in the count. However, you can exclude NULL values by applying the function to a specific column:
SELECT COUNT(*) AS Not_Null_RowCount FROM STATS WHERE Venue IS NOT NULL;
O/P:
This query will count only the non-null values in the specified column.
Alias in SQL:
You can assign a custom name to the result of the COUNT function using the AS keyword:
SELECT COUNT(Match_No) AS MatchesPlayed FROM STATS;
The result will be displayed under the aliased name (`MatchesPlayed ` in this example).
- Distinct Count in SQL:
If you want to count the number of distinct not null values in a column, you can use the DISTINCT keyword inside the COUNT function.
Consider you want to find number of unique venues present in table, you can use below query.
SELECT COUNT(DISTINCT Venue) AS DistinctCount FROM STATS;
O/P:
As our table has 5 unique values in 'Venue' column, SQL has returned '5' in result.
- Filtering with WHERE Clause:
You can further refine your count by adding a WHERE clause to specify conditions for counting specific rows:
In case you want to find how namy matches were played at Manchester, you can use below query
SELECT COUNT(1) AS VenueCount FROM #STATS WHERE Venue='Manchester';
O/P:
Replace where clause part with the desired filtering condition based on your data requirements.
2. SUM in SQL
The SUM function in SQL Server is used to calculate the sum of numeric values in a specified column or expression. It is particularly useful for performing mathematical calculations and aggregating numeric data. Here's a guide on how to use the SUM function in SQL Server:
Lets find total runs scored by using below query
SUM(RunsScored) AS TotalRuns FROM STATS;
O/P:
Replace 'RunsScored' with the name of the column containing the values you want to sum, and 'STATS' with the name of the table you're querying.
- Filtering with WHERE Clause:
You can refine your sum calculation by adding a WHERE clause to specify conditions for summing specific rows:
Lets filter our selection. Here below query finds total runs scored on a single ground ‘Manchester’.
SELECT SUM(RunsScored) AS TotalRuns FROM STATS WHERE Venue='Manchester';
O/P:
Replace where clause with the desired filtering condition based on your data requirements.
- Summing Computed Values:
The SUM function can also be used to sum the results of arithmetic calculations or expressions. For example:
SELECT SUM(RunsScored + BallFaced) AS DerivedColumn FROM STATS;
O/P:
This query calculates the sum of the values obtained by adding ` RunsScored ` and ` BallFaced ` for each row.
3. AVG in SQL:
The AVG function in SQL Server is used to calculate the average (mean) value of a column or expression. It is particularly useful for obtaining the average of numeric data. Here's a guide on how to use the AVG function in SQL Server:
SELECT AVG(RunsScored) AS AverageScore FROM STATS;
O/P:
Above query returns average runs scored. Replace 'RunsScored' with the name of the column containing the values you want to average, and `S ` with the name of the table you're querying.
SELECT AVG(RunsScored) AS Average FROM STATS;
The result will be displayed under the aliased name ('Average' in this example).
- Filtering with WHERE Clause:
You can refine your average calculation by adding a WHERE clause to specify conditions for averaging specific rows:
SELECT AVG(RunsScored) Average FROM #STATS WHERE Venue='Manchester';
O/P:
Above query returns
average runs scored by VK on Manchester ground. Replace where
clause with the desired filtering condition based on your data requirements.
4. MAX in SQL:
The MAX function in SQL Server is used to retrieve the maximum value from a column or expression. It is commonly used to find the highest value within a dataset. Here's a guide on how to use the MAX function in SQL Server:
SELECT MAX(RunsScored) MaxRuns FROM #STATS;
O/P:
Above query returns the highest score from table STATS.
Replace 'RunsScored' with the name of the column from which you want to find the maximum value, and 'STATS' with the name of the table you're querying.
5. MIN in SQL:
The MIN function in SQL Server is used to retrieve the minimum value from a column or expression. It is commonly used to find the lowest value within a dataset. Here's a guide on how to use the MIN function in SQL Server:
SELECT MIN(RunsScored) AS MinimumScore FROM #STATS;
O/P:
Above query returns the lowest score from table 'STATS'.
Replace 'RunsScored' with the name of the column from which you want to find the minimum value, and 'VK_Stats' with the name of the table you're querying.
GROUP BY Clause in SQL:
The GROUP BY clause in SQL Server is used to group rows based on one or more columns and perform aggregate functions on each group. It allows you to summarize data and perform calculations on subsets of your dataset. Here's a guide on how to use the GROUP BY clause in SQL Server:
SELECT Venue,SUM(RunsScored) As Runs
FROM STATS
GROUP BY Venue;
O/P:
Above query returns Venue wise summation of RunsScored. You can replace sum() function by any other aggregate function as per requirement.
Let’s add a little complexity here.
Now we will modify above query in order to give Venue wise
sum of RunsScored but I want to consider only those records where RunsScored is
greater than 50.
SELECT Venue, SUM(RunsScored) AS Runs
FROM STATS
Where RunsScored>50
GROUP BY Venue;
O/P:
HAVING Clause:
The HAVING clause allows you to filter the grouped data based on conditions involving aggregate functions or other grouped columns. It is useful when you want to apply conditions to groups of rows rather than individual rows.
Lets jump on previous example of group by clause.
SELECT Venue, SUM(RunsScored) as Runs
FROM STATS
GROUP BY Venue;
Now we want to restrict output on basis of condition aggregate function. i.e. SUM(RunsScored) in this case. For that we will use having clause.
SELECT Venue, SUM(RunsScored) AS Runs
FROM STATS
GROUP BY Venue
HAVING SUM(RunsScored)=150;
O/P:
Difference between Where and Having Clause:
Both the WHERE and HAVING clauses allow the use of logical operators to combine multiple conditions. However, the WHERE clause is optional and can be used without the GROUP BY clause, while the HAVING clause is mandatory and requires the GROUP BY clause.
Comparison Factor |
WHERE Clause |
HAVING Clause |
Usage |
Filters individual
rows before grouping and aggregation. |
Filters groups after
grouping and aggregation. |
Position |
Appears before the
GROUP BY clause. |
Appears after the
GROUP BY clause. |
Applied On |
Individual rows. |
Grouped data. |
Column Usage |
Can be used with any
column in the table, including non-aggregated columns. |
Typically used with
aggregated columns or aggregate functions. |
Aggregation |
Cannot use aggregate
functions directly in the WHERE clause. |
Can use aggregate
functions directly in the HAVING clause. |
Filtering |
Applies conditions
to individual rows. |
Applies conditions
to groups of rows. |
Use with GROUP BY |
Optional. Can be
used without the GROUP BY clause. |
Mandatory. Requires
the GROUP BY clause. |
Examples |
WHERE price > 100 |
HAVING SUM(quantity)
> 100 |
Examples and Scenarios
We can actually use all aggregate functions in a single query.
Let’s have a look
SELECT
COUNT(*) AS MatchesPlayed,
SUM(RunsScored ) AS Total_Runs,
AVG(BallFaced) AS Average_BallsFaced,
MIN(RunsScored) AS Lowest_Score,
MAX(RunsScored) AS Highest_Score,
COUNT(DISTINCT Venue) AS Venue_Count
FROM STATS;
O/P:
Conclusion:
Throughout our exploration, we have familiarized ourselves with the core aggregate functions available in SQL Server, such as COUNT, SUM, AVG, MIN, and MAX. These functions serve as indispensable tools for counting records, summing values, calculating averages, and identifying minimum and maximum values in your data.
Moreover, we have discovered the power of grouping and aggregating data using the GROUP BY clause. This feature allows you to analyse your data at various levels, enabling you to derive meaningful insights by grouping your data based on different criteria. By applying aggregate functions within the GROUP BY context, you can derive personalized metrics and summaries that align with your specific analysis requirements.
Happy exploring and aggregating!
0 Comments