OVER Clause

Parameters

ParameterDetails
PARTITION BYThe field(s) that follows PARTITION BY is the one that the 'grouping' will be based on

Remarks

The OVER clause determines a windows or a subset of row within a query result set. A window function can be applied to set and compute a value for each row in the set. The OVER clause can be used with:

  • Ranking functions
  • Aggregate functions

so someone can compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

In a very abstract way we can say that OVER behaves like GROUP BY. However OVER is applied per field / column and not to the query as whole as GROUP BY does.

Note#1: In SQL Server 2008 (R2) ORDER BY Clause cannot be used with aggregate window functions (link).

Cumulative Sum

Using the Item Sales Table, we will try to find out how the sales of our items are increasing through dates. To do so we will calculate the Cumulative Sum of total sales per Item order by the sale date.

SELECT item_id, sale_Date 
       SUM(quantity * price) OVER(PARTITION BY item_id ORDER BY sale_Date ROWS BETWEEN UNBOUNDED PRECEDING) AS SalesTotal
  FROM SalesTable

Dividing Data into equally-partitioned buckets using NTILE

Let's say that you have exam scores for several exams and you want to divide them into quartiles per exam.

-- Setup data:
declare @values table(Id int identity(1,1) primary key, [Value] float, ExamId int)
insert into @values ([Value], ExamId) values
(65, 1), (40, 1), (99, 1), (100, 1), (90, 1), -- Exam 1 Scores
(91, 2), (88, 2), (83, 2), (91, 2), (78, 2), (67, 2), (77, 2) -- Exam 2 Scores

-- Separate into four buckets per exam:
select ExamId, 
       ntile(4) over (partition by ExamId order by [Value] desc) as Quartile, 
       Value, Id 
from @values 
order by ExamId, Quartile

Our exam data divided into quartiles per exam

ntile works great when you really need a set number of buckets and each filled to approximately the same level. Notice that it would be trivial to separate these scores into percentiles by simply using ntile(100).

Using Aggregation functions with OVER

Using the Cars Table, we will calculate the total, max, min and average amount of money each costumer spent and haw many times (COUNT) she brought a car for repairing.

Id CustomerId MechanicId Model Status Total Cost

SELECT CustomerId,  
       SUM(TotalCost) OVER(PARTITION BY CustomerId) AS Total,
       AVG(TotalCost) OVER(PARTITION BY CustomerId) AS Avg,
       COUNT(TotalCost) OVER(PARTITION BY CustomerId) AS Count,
       MIN(TotalCost) OVER(PARTITION BY CustomerId) AS Min,
       MAX(TotalCost) OVER(PARTITION BY CustomerId) AS Max
  FROM CarsTable
 WHERE Status = 'READY'

Beware that using OVER in this fashion will not aggregate the rows returned. The above query will return the following:

CustomerIdTotalAvgCountMinMax
14302152200230
14302152200230

The duplicated row(s) may not be that useful for reporting purposes.

If you wish to simply aggregate data, you will be better off using the GROUP BY clause along with the appropriate aggregate functions Eg:

SELECT CustomerId,  
       SUM(TotalCost) AS Total,
       AVG(TotalCost) AS Avg,
       COUNT(TotalCost) AS Count,
       MIN(TotalCost) AS Min,
       MAX(TotalCost)  AS Max
  FROM CarsTable
 WHERE Status = 'READY'
GROUP BY CustomerId

Using Aggregation funtions to find the most recent records

Using the Library Database, we try to find the last book added to the database for each author. For this simple example we assume an always incrementing Id for each record added.

SELECT MostRecentBook.Name, MostRecentBook.Title
FROM ( SELECT Authors.Name,
              Books.Title,
              RANK() OVER (PARTITION BY Authors.Id ORDER BY Books.Id DESC) AS NewestRank
       FROM Authors
       JOIN Books ON Books.AuthorId = Authors.Id
     ) MostRecentBook
WHERE MostRecentBook.NewestRank = 1

Instead of RANK, two other functions can be used to order. In the previous example the result will be the same, but they give different results when the ordering gives multiple rows for each rank.

  • RANK(): duplicates get the same rank, the next rank takes the number of duplicates in the previous rank into account
  • DENSE_RANK(): duplicates get the same rank, the next rank is always one higher than the previous
  • ROW_NUMBER(): will give each row a unique 'rank', 'ranking' the duplicates randomly

For example, if the table had a non-unique column CreationDate and the ordering was done based on that, the following query:

SELECT Authors.Name,
       Books.Title,
       Books.CreationDate,
       RANK() OVER (PARTITION BY Authors.Id ORDER BY Books.CreationDate DESC) AS RANK,
       DENSE_RANK() OVER (PARTITION BY Authors.Id ORDER BY Books.CreationDate DESC) AS DENSE_RANK,
       ROW_NUMBER() OVER (PARTITION BY Authors.Id ORDER BY Books.CreationDate DESC) AS ROW_NUMBER,
FROM Authors
JOIN Books ON Books.AuthorId = Authors.Id

Could result in:

AuthorTitleCreationDateRANKDENSE_RANKROW_NUMBER
Author 1Book 122/07/2016111
Author 1Book 222/07/2016112
Author 1Book 321/07/2016323
Author 1Book 421/07/2016324
Author 1Book 521/07/2016325
Author 1Book 604/07/2016636
Author 2Book 704/07/2016111


2016-04-05
2017-01-07
Microsoft SQL Server Pedia
Icon