Introduction to Aggregate Functions

Aggregate Functions in SQL


    Hello and welcome to part 4 of 'SQL for data analytics' series. Here we will learn one of the important topics of SQL called 'Aggregate Functions'. Aggregate functions are essential tools that enable the calculation of valuable summaries from database tables. By performing calculations on groups of rows, aggregate functions provide a single result, making them perfect for data summarization.

Consider Below table named ‘STATS’ as reference for all examples in this blog.


Here's table creation script.

CREATE TABLE STATS    (  

    Match_No        INT                 PRIMARY KEY,

    Opposition      VARCHAR(50)         Not null,

    RunsScored      INT                 DEFAULT 0,

    BallFaced       INT,

    StrikeRate      FLOAT,

    MatchDate       DATE,

    Venue           VARCHAR(100))

 

INSERT INTO STATS (Match_No,Opposition,RunsScored,BallFaced,StrikeRate,MatchDate,Venue)

    VALUES

    (1,'South Africa',18,34,52.94 ,'2019-06-05','Southampton'),

    (2,'Australia'   ,82,77,106.49,'2019-06-09','The Oval'),

    (3,'Pakistan'    ,77,65,118.46,'2019-06-16','Manchester'),

    (4,'Afghanistan' ,67,63,106.34,'2019-06-22','Southampton'),

    (5,'West Indies' ,72,82,87.80 ,'2019-06-27','Manchester'),

    (6,'England'     ,66,76,86.84 ,'2019-06-30','Birmingham'),

    (7,'Bangladesh'  ,26,27,96.29 ,'2019-07-02','Birmingham'),

    (8,'Sri Lanka'   ,34,41,82.93 ,'2019-07-06','Leeds'),

    (9,'New Zealand' ,01,06,16.16 ,'2019-07-09','Manchester')


 Lets explore all aggregate functions one by one,

1. COUNT in SQL

   The COUNT function in SQL Server is a used for counting the number of rows or non-null values in a column or a result set. Here's a comprehensive guide on the usage and application of the COUNT function in SQL : 

Consider a case where we want to find number of matches present in our table 'STATS'.

        SELECT COUNT(Match_No) AS Match_Count FROM STATS;

O/P: 

 
      

Above query will return count of Non null 'Match_No' column from 'STATS' table  

Replace ` Match_No ` with the name of the column you want to count, and ` STATS ` with the name of the table you're querying.


  •  Counting All Rows: 

To count all rows in a table, you can use the COUNT function without specifying a column name:

    SELECT COUNT(*) AS Row_count FROM STATS;

O/P:

     

This will return the total number of rows including null values in the specified table. 

  • Counting Non-Null Values: 

By default, the COUNT function includes NULL values in the count. However, you can exclude NULL values by applying the function to a specific column:

    SELECT COUNT(*) AS Not_Null_RowCount FROM STATS WHERE Venue IS NOT NULL;

O/P:        


This query will count only the non-null values in the specified column.

  •  Alias in SQL: 

You can assign a custom name to the result of the COUNT function using the AS keyword:

        SELECT COUNT(Match_No) AS MatchesPlayed FROM STATS;

The result will be displayed under the aliased name (`MatchesPlayed ` in this example).

 O/P;

  • Distinct Count in SQL:

If you want to count the number of distinct not null values in a column, you can use the DISTINCT keyword inside the COUNT function.

Consider you want to find number of unique venues present in table, you can use below query.

     SELECT COUNT(DISTINCT Venue) AS DistinctCount FROM STATS;

O/P:  

  

As our table has 5 unique values in 'Venue' column, SQL has returned '5' in result.

 

  • Filtering with WHERE Clause:

You can further refine your count by adding a WHERE clause to specify conditions for counting specific rows:

In case you want to find how namy matches were played at Manchester, you can use below query

    SELECT COUNT(1) AS VenueCount FROM #STATS WHERE Venue='Manchester';

O/P:   

   

Replace where clause part with the desired filtering condition based on your data requirements. 


 2. SUM in SQL

   The SUM function in SQL Server is used to calculate the sum of numeric values in a specified column or expression. It is particularly useful for performing mathematical calculations and aggregating numeric data. Here's a guide on how to use the SUM function in SQL Server:

Lets find total runs scored  by using below query

    SUM(RunsScored) AS TotalRuns FROM STATS;

O/P:   

   

Replace 'RunsScored' with the name of the column containing the values you want to sum, and                'STATS' with the name of the table you're querying.

 

  •  Filtering with WHERE Clause: 

You can refine your sum calculation by adding a WHERE clause to specify conditions for summing specific rows:

Lets filter our selection. Here below query finds total runs scored on a single ground ‘Manchester’.

    SELECT SUM(RunsScored) AS TotalRuns FROM STATS WHERE Venue='Manchester';

O/P:    

      

Replace where clause with the desired filtering condition based on your data requirements.

 

  • Summing Computed Values: 

The SUM function can also be used to sum the results of arithmetic calculations or expressions. For example:  

    SELECT SUM(RunsScored + BallFaced) AS DerivedColumn FROM STATS;

O/P: 

    

This query calculates the sum of the values obtained by adding ` RunsScored ` and ` BallFaced ` for each row.

 

3. AVG in SQL:

   The AVG function in SQL Server is used to calculate the average (mean) value of a column or expression. It is particularly useful for obtaining the average of numeric data. Here's a guide on how to use the AVG function in SQL Server:

     SELECT AVG(RunsScored) AS AverageScore FROM STATS;

   O/P: 

Above query returns average runs scored.  Replace 'RunsScored' with the name of the column containing the values you want to average, and `S ` with the name of the table you're querying.

Aliasing the Result: You can assign a custom name to the result of the AVG function using the AS keyword:  

    SELECT AVG(RunsScored) AS Average FROM STATS;

The result will be displayed under the aliased name ('Average' in this example). 


  • Filtering with WHERE Clause: 

You can refine your average calculation by adding a WHERE clause to specify conditions for averaging specific rows:

    SELECT AVG(RunsScored) Average FROM #STATS WHERE Venue='Manchester';

O/P:

 

Above query returns average runs scored by VK on Manchester ground. Replace where clause with the desired filtering condition based on your data requirements.  

4. MAX in SQL:

   The MAX function in SQL Server is used to retrieve the maximum value from a column or expression. It is commonly used to find the highest value within a dataset. Here's a guide on how to use the MAX function in SQL Server:                           

    SELECT MAX(RunsScored) MaxRuns FROM #STATS;

O/P:  

Above query returns the highest score  from table STATS.

 Replace 'RunsScored' with the name of the column from which you want to find the maximum value, and 'STATS' with the name of the table you're querying. 

 

5. MIN in SQL:

   The MIN function in SQL Server is used to retrieve the minimum value from a column or expression. It is commonly used to find the lowest value within a dataset. Here's a guide on how to use the MIN function in SQL Server:                           

   SELECT MIN(RunsScored) AS MinimumScore FROM #STATS;

O/P:  

Above query returns the lowest score  from table 'STATS'.

Replace 'RunsScored' with the name of the column from which you want to find the minimum value, and 'VK_Stats' with the name of the table you're querying.

 

GROUP BY Clause in SQL:

   The GROUP BY clause in SQL Server is used to group rows based on one or more columns and perform aggregate functions on each group. It allows you to summarize data and perform calculations on subsets of your dataset. Here's a guide on how to use the GROUP BY clause in SQL Server:      

SELECT Venue,SUM(RunsScored) As Runs

FROM STATS

GROUP BY Venue;

O/P:


Above query returns Venue wise summation of RunsScored. You can replace sum() function by any other aggregate function as per requirement.

Let’s add a little complexity here.

Now we will modify above query in order to give Venue wise sum of RunsScored but I want to consider only those records where RunsScored is greater than 50.  

       SELECT Venue, SUM(RunsScored) AS Runs

     FROM STATS

     Where RunsScored>50

     GROUP BY Venue;    

O/P:


We can use rest of the aggregate functions like avg(),min(),max(),count() instead of sum().


HAVING Clause:

The HAVING clause allows you to filter the grouped data based on conditions involving aggregate functions or other grouped columns. It is useful when you want to apply conditions to groups of rows rather than individual rows.

Lets jump on previous example of group by clause.

    SELECT Venue, SUM(RunsScored) as Runs

    FROM STATS

    GROUP BY Venue;

Now we want to restrict output on basis of condition aggregate function. i.e. SUM(RunsScored) in this case. For that we will use having clause.

     SELECT Venue, SUM(RunsScored) AS Runs

     FROM STATS

     GROUP BY Venue

     HAVING SUM(RunsScored)=150;

O/P:

 

Difference between Where and Having Clause:

     The WHERE clause filters individual rows before any grouping or aggregation is performed. It operates on individual rows and can be used with any column in the table. In contrast, the HAVING clause filters groups after grouping and aggregation have occurred. It is used with the GROUP BY clause and can include conditions involving aggregated columns or aggregate functions.

Both the WHERE and HAVING clauses allow the use of logical operators to combine multiple conditions. However, the WHERE clause is optional and can be used without the GROUP BY clause, while the HAVING clause is mandatory and requires the GROUP BY clause.

Comparison Factor

WHERE Clause

HAVING Clause

Usage

Filters individual rows before grouping and aggregation.

Filters groups after grouping and aggregation.

Position

Appears before the GROUP BY clause.

Appears after the GROUP BY clause.

Applied On

Individual rows.

Grouped data.

Column Usage

Can be used with any column in the table, including non-aggregated columns.

Typically used with aggregated columns or aggregate functions.

Aggregation

Cannot use aggregate functions directly in the WHERE clause.

Can use aggregate functions directly in the HAVING clause.

Filtering

Applies conditions to individual rows.

Applies conditions to groups of rows.

Use with GROUP BY

Optional. Can be used without the GROUP BY clause.

Mandatory. Requires the GROUP BY clause.

Examples

WHERE price > 100

HAVING SUM(quantity) > 100

 

Examples and Scenarios

We can actually use all aggregate functions in a single query.

Let’s have a look

 SELECT

        COUNT(*) AS MatchesPlayed,

        SUM(RunsScored ) AS Total_Runs,

        AVG(BallFaced) AS Average_BallsFaced,

        MIN(RunsScored) AS Lowest_Score,

        MAX(RunsScored) AS Highest_Score,

        COUNT(DISTINCT Venue) AS Venue_Count

        FROM STATS;        

O/P:

 

Conclusion:

Throughout our exploration, we have familiarized ourselves with the core aggregate functions available in SQL Server, such as COUNT, SUM, AVG, MIN, and MAX. These functions serve as indispensable tools for counting records, summing values, calculating averages, and identifying minimum and maximum values in your data.

Moreover, we have discovered the power of grouping and aggregating data using the GROUP BY clause. This feature allows you to analyse your data at various levels, enabling you to derive meaningful insights by grouping your data based on different criteria. By applying aggregate functions within the GROUP BY context, you can derive personalized metrics and summaries that align with your specific analysis requirements.

Happy exploring and aggregating!