MS SQL 2012 Window functions introduction

Among lot of new features introduced in SQL Server 2012 also a new Windowing functions were introduced. The new functionality allow us to use the ORDER BY clause in the OVER clause with aggregate functions and also new ROWS and RANGE clauses were introduced to limit rows. The ORDER BY allow us define the order of rows processing and the ROWS/RANGE clauses put limits on the rows being processed in partition. All the details related to the OVER clause you can find on MSDN: OVER Clause (Transact-SQL).

ROWS/RANGE clause

The ROWS clause limits the rows in a parittion by specifying a fixed number of rows preceding or folowing the current rows. The rows preceeding and following are determined by the order specified in the ORDER BY clause.

The limit can be specified by serveral methods:

  • <unsigned integer> PRECEDING -fixed number of preceding rows
  • CURRENT ROW – representing current row being processed
  • UNBOUNDED PRECEDING – all previous records
  • <unsigned integer> FOLLOWING – fixed number of following rows
  • UNBOUNDED FOLLOWING – all rows following current row

So we can specify the limits like

ROWS BETWEEN 3 PRECEEDING AND 1 FOLLOWING
ROWS BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW
ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING

RANGE BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
RENGE CURRENT ROW

The RANGE clause can be only used with the UNBOUNDED limit and CURRENT ROW. The difference between ROWS and RANGE clause is, that ROWS works with physical rows and RANGE works with range of rows based on the current row value in the terms of ORDER BY clause. This means that for ROWS clause the CURRENT ROW represents the only current row being processed. For RANGE the CURRENT ROW represents all the rows with the same value in the fields specified in the ORDER BY clause within current partition as the current row being processed. So if we use RANGE and multiple rows have the same rank in the terms of order within the partition, then all those rows will represent current row.

When there is no ROWS/RANGE clause specified after the ORDER BY clause, then the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used by SQL Server.

Samples how to use the window functions

Let’s take a look on a few samples, how we can use the window functions and what results they will provide.

Test data preparation

To be able to test the new functionality

--======================
-- Create test database
--======================
CREATE DATABASE WindowFunctionsTest
GO
USE WindowFunctionsTest
GO 

--Create Testing Tables
CREATE TABLE [dbo].[Accounts](
	[TransactionID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
	[TransactionDate] [datetime] NULL,
	[Balance] [float] NULL
)
GO
CREATE TABLE [dbo].[MultiAccounts](
	[TransactionID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
	[AccountID] [int] NOT NULL,
	[TransactionDate] [datetime] NULL,
	[Balance] [float] NULL
)
GO
--Fill test tables with data
INSERT INTO [dbo].[Accounts](
	[TransactionDate],
	[Balance]
)
SELECT '2000-1-1', 100 UNION ALL
SELECT '2000-1-1', -50 UNION ALL
SELECT '2000-1-2', 200 UNION ALL
SELECT '2000-1-3', 500 UNION ALL
SELECT '2000-1-4', -200 UNION ALL
SELECT '2000-1-5', 1000 UNION ALL
SELECT '2000-1-5', -300 UNION ALL
SELECT '2000-1-6', -300 UNION ALL
SELECT '2000-1-7', -200 UNION ALL
SELECT '2000-1-8', 2000 UNION ALL
SELECT '2000-1-9', 100 UNION ALL
SELECT '2000-1-10', -50 UNION ALL
SELECT '2000-1-10', 500 UNION ALL
SELECT '2000-1-11', 200 UNION ALL
SELECT '2000-1-12', 200 UNION ALL
SELECT '2000-1-13', 1000 UNION ALL
SELECT '2000-1-14', 1000 UNION ALL
SELECT '2000-1-15', -500 UNION ALL
SELECT '2000-1-15', -300 UNION ALL
SELECT '2000-1-16', 1000 UNION ALL
SELECT '2000-1-17', 1000 UNION ALL
SELECT '2000-1-18', -800 UNION ALL
SELECT '2000-1-19', 2000 UNION ALL
SELECT '2000-1-20', -1000
GO

INSERT [dbo].[MultiAccounts] (
	[AccountID],
	[TransactionDate],
	[Balance]
)
SELECT 1, '2000-1-1', 100 UNION ALL
SELECT 1, '2000-1-1', -50 UNION ALL
SELECT 1, '2000-1-2', 200 UNION ALL
SELECT 1, '2000-1-3', 500 UNION ALL
SELECT 1, '2000-1-4', -200 UNION ALL
SELECT 1, '2000-1-5', 1000 UNION ALL
SELECT 1, '2000-1-5', -300 UNION ALL
SELECT 1, '2000-1-6', -300 UNION ALL
SELECT 1, '2000-1-7', -200 UNION ALL
SELECT 2, '2000-1-1', 2000 UNION ALL
SELECT 2, '2000-1-2', 100 UNION ALL
SELECT 2, '2000-1-3', -50 UNION ALL
SELECT 2, '2000-1-4', 500 UNION ALL
SELECT 2, '2000-1-5', 200 UNION ALL
SELECT 2, '2000-1-6', 200 UNION ALL
SELECT 2, '2000-1-7', 1000 UNION ALL
SELECT 2, '2000-1-7', 1000 UNION ALL
SELECT 3, '2000-1-1', 800 UNION ALL
SELECT 3, '2000-1-2', -300 UNION ALL
SELECT 3, '2000-1-3', 1000 UNION ALL
SELECT 3, '2000-1-4', 1000 UNION ALL
SELECT 3, '2000-1-5', -800 UNION ALL
SELECT 3, '2000-1-6', 2000 UNION ALL
SELECT 3, '2000-1-7', -1000
GO

Window functions samples

If we try any of below queries they will provide the same results

--Using the ROWS clause
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--The same as abowe the ROWS UNBOUNDED PRECEDING will be completed by SQL Server to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--If we specify only the left boundary, SQL Servers automaticaly fills the right BOUNDARY
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS UNBOUNDED PRECEDING ) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--Using the RANGE Clause
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--The same as above as RANGE UNBOUNDED PRECEDING AND CURRENT ROW will be complete by SQL Server as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--If we specify only the left boundary, SQL Servers automaticaly fills the right BOUNDARY
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID RANGE UNBOUNDED PRECEDING ) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--NO ROWS/RANGE Clause (SQL Server will use the DEFAULT RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO

Results are below and we can see, a correct cumulative balance is calculated.

TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    100
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1550
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100

ROWS clause with not unique order

SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate
GO

Results will be the same as in previous example. They are the same because there is no parallelism and we have CLUSTERED INDEX on the TransactionID (otherwise the final order could be different because the order of rows with the same TransactionDate is not guaranteed here.

TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    100
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1550
7             2000-01-05 00:00:00.000 -300                   1250
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100

RANGE Clause with not unique order

SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO

Here we see, that the results are quite different. The final sum is the same, but the intermediate are not.

TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    50
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1250
7             2000-01-05 00:00:00.000 -300                   1250
8             2000-01-06 00:00:00.000 -300                   950
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
17            2000-01-14 00:00:00.000 1000                   5700
18            2000-01-15 00:00:00.000 -500                   4900
19            2000-01-15 00:00:00.000 -300                   4900
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100

Here we can see, that the RANGE works as described above. All rows with the same value in the ORDER BY clause are considered as current row. Therefore for the dates ‘2000/01/01’ ,  ‘2000/01/05’ and ‘2000/01/15’ the values for each date are the same.

Working with FOLLOWING Rows

All the examples above worked with current row and all previous rows. Except this we can even work with rows following current row in particular order.

Here are a few other examples incorporating also FOLLOWING rows.

--Sum of current row and all following rows
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

--SUM of 1 preceding, current and one following row
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

--SUM of all rows in each row
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER () AS FinalBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

Example with Partitioning results

In previous examples we have worked with single partition. The OVER clause also allows partitioning the results. So let see some a few examples with partitioning.

--Sum of current row and all following rows partitioned, by AccountID
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate, TransactionID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of 1 preceding, current and one following row, partitioned by AccountID
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate, TransactionID ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of all rows in each row
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID]) AS FinalBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of all preceeding and current row, partitioned by AccountID order is based only on TransactionDate - using RANGE
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate

When to use ROWS and when RANGE

Now we can ask, when we should use the ROWS clause and when RANGE clause to limit the rows. The answer comes fro the definition how the ROWS and RANGE clauses works. As described, ROWS works with each unique rows and RANGE handles multiple rows with the same order position as current row.

So in case the combination of fields specified in the ORDER BY clause does not uniquely specify the order of rows (as in case of examples above when only TransactionDate was used), then you should use RANGE, as the processing order of rows with the same order position is not guaranteed. In case the rows are uniquely identified, then ROWS should be used as there are no rows with the same order in the partition.

Conclusion

The new windowing functions brings new possibilities in writing T-SQL queries can simplify a lot of tasks which were problematic to write without these constructs. It allow us to bypass the recursive CTE, other solutions for calculation of running totals or averages without knocking down the server and also allow us to bypass quirky updates, CLR solutions which have some pitfalls when are used.

In my next post I will take a closer look on the Running Totals problem when using this new windowing functionality. Also I will take a closer look on the query plans produced by those constructs and give some advices for using them.

Advertisement

Calculating running total for last X rows – UPDATED

This is another blog post about running totals and again I will use CLR and demonstrate the power of CLR in such situations. Again this post is inspired by an article on SQL Server Central – Calculate the Running Total for the last five Transactions by Divya Agrawal.

This is a kind of running totals when we want to sum only a specific number of last rows.

The CLR function will be nothing than a slightly modification of the running totals function from my previous blog post: SQL Sever and fastest running totals using CLR.

I’ve updated this blog post based on the update of mentioned previous post, after communication with Paul White and added the security check into the CLR function to detect processing of rows out of expected order. Please read the previous post for more details.

Test data preparation

I will use sample data from the article, so it is possible to compare the solutions and we will be able to see the difference.

CREATE TABLE Accounts
(
    ID int IDENTITY(1,1),
    TransactionDate datetime,
    Balance float
)
GO

INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/1/2000',100)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/2/2000',101)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/3/2000',102)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/4/2000',103)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/5/2000',104)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/6/2000',105)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/7/2000',106)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/8/2000',107)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/9/2000',108)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/10/2000',109)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/11/2000',200)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/12/2000',201)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/13/2000',202)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/14/2000',203)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/15/2000',204)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/16/2000',205)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/17/2000',206)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/18/2000',207)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/19/2000',208)
INSERT INTO Accounts(TransactionDate,Balance) VALUES ('1/20/2000',209)
GO

CLR Solution

As I mentioned above the CLR solution is only a modification of function from previous blog post. We only add a queue of last calculated running totals.

Calculation of Running Total in Excel

The image above displays situation when we want to make running total for last 3 rows. Our solution will calculate the running total continuously as it is done in column B, but when the requested count of 3 rows is met, it will subtract the running total 3 rows back as can be seen on the image in column B. For this we need in our function a queue of the last X running total values (In this example 3). This class also implements the security check introduced in my previously updated article related to Running Totals. The security check consist in providing the correct row numbers in the sequence in which the records should be processed and in case the records are processed out of expected order, an exception is fired.

using System;
using Microsoft.SqlServer.Server;
using System.Runtime.Remoting.Messaging;
using System.Data.SqlTypes;
using System.Collections.Generic;

/// <summary>
/// Class contains CLR scalar functions for calculation of running totals
/// </summary>
public class RunningTotalsQueue
{
    /// <summary>
    /// Storage Structure for holding actual Total and row number for security check.
    /// </summary>
    /// <typeparam name="T">Totals Data Type</typeparam>
    private class RtStorage<T> where T : struct
    {
        public T Total;
        public int RowNo;
        public Queue<T> Queue;
        public RtStorage(int queueLength)
        {
            Queue = new Queue<T>(queueLength);
        }
    }

    /// <summary>
    /// Calculates a running totals on Int (Int32) data type
    /// </summary>
    /// <param name="val">Value of current row</param>
    /// <param name="id">ID of the function in single query</param>
    /// <param name="queueLength">Length of the queue. 0 for classical running totals</param>
    /// <param name="rowNo">Specifies expected rowNo. It is for security check to ensure correctness of running totals</param>
    /// <param name="nullValue">Value to be used for NULL values</param>
    /// <param name="nullForLessRows">Specifies whether return NULL if less values than queue are summed</param>
    /// <returns>SqlInt64 representing running total</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlInt64 RunningTotalBigIntQueue(SqlInt64 val, SqlByte id, int queueLength, int rowNo, SqlInt64 nullValue, bool nullForLessRows)
    {
        string dataName = string.Format("MultiSqlRtQueue_{0}", id.IsNull ? 0 : id.Value);

        object lastSum = CallContext.GetData(dataName);

        RtStorage<SqlInt64> storage;
        if (lastSum != null)
            storage = (RtStorage<SqlInt64>)lastSum;
        else
            CallContext.SetData(dataName, storage = new RtStorage<SqlInt64>(queueLength));

        storage.RowNo++;

        if (storage.RowNo != rowNo)
            throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));

        if (!val.IsNull)
            storage.Total = storage.Total.IsNull ? val : storage.Total + val;
        else
            storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);

        var currentTotal = storage.Total;
        if (queueLength > 0 && storage.Queue.Count == queueLength)
        {
            var lastQueue = storage.Queue.Dequeue();
            currentTotal -= lastQueue.IsNull ? 0 : lastQueue;
        }
        else if (storage.Queue.Count < queueLength && nullForLessRows)
            currentTotal = SqlInt64.Null;

        if (queueLength > 0)
            storage.Queue.Enqueue(storage.Total);

        return currentTotal;
    }

}

On first call the function allocates a queue of requested queue size inside a private storage class and stores it using the CallContext for use by future calls. The function also allows returning NULL when the count of rows processed is lower than the number of rows requested to be calculated in the running total and also allows calculation of classical running totals (not queued) in case the queue length is equal to 0.

CREATE FUNCTION [dbo].[fn_RunningTotalBigIntQueue](
	@val [bigint],                  --value to be added to running total (a fiedl in the query)
	@id [tinyint],                  --id of the running total within a single query
	@queueLength [int],             --lenght of the queue
	@rowNo [int],                   --RowNumber of processed records. This is compared to espected rowNo and in case out of synchronization an exceiption is fired
	@nullValue [bigint] = NULL,     --representation of the NULL value when adding to running totals
	@nullForLesRows [bit] = 0       --specifies whether return NULL if less records than queueLenght were processed
)
RETURNS [bigint]
WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [SqlClrTotals].[RunningTotalsQueue].[RunningTotalBigIntQueue]
GO

Tests

Once we have compiled the sources, created assembly and function in DB, we can make a tests. First we will test the T-SQL solution from the article mentioned at the beginning of this blog post and then the presented CLR solution.

T-SQL solution

SET STATISTICS IO ON
GO
SELECT
    Acc.ID,CONVERT(varchar(10),TransactionDate,101) AS TransactionDate ,
    Balance,
    isnull(RunningTotal,'') AS RunningTotal
FROM Accounts Acc
LEFT OUTER JOIN (SELECT ID,sum(Balance) AS RunningTotal
        FROM (SELECT A.ID AS ID,B.ID AS BID, B.Balance
            FROM Accounts A
                cross JOIN Accounts B
            WHERE B.ID BETWEEN A.ID-4
            AND A.ID AND A.ID>4
            ) T
        GROUP BY ID ) Bal
    ON Acc.ID=Bal.ID
GO
SET STATISTICS IO OFF
GO

Results:

ID          TransactionDate Balance                RunningTotal
----------- --------------- ---------------------- ----------------------
1           01/01/2000      100                    0
2           01/02/2000      101                    0
3           01/03/2000      102                    0
4           01/04/2000      103                    0
5           01/05/2000      104                    510
6           01/06/2000      105                    515
7           01/07/2000      106                    520
8           01/08/2000      107                    525
9           01/09/2000      108                    530
10          01/10/2000      109                    535
11          01/11/2000      200                    630
12          01/12/2000      201                    725
13          01/13/2000      202                    820
14          01/14/2000      203                    915
15          01/15/2000      204                    1010
16          01/16/2000      205                    1015
17          01/17/2000      206                    1020
18          01/18/2000      207                    1025
19          01/19/2000      208                    1030
20          01/20/2000      209                    1035 

(20 row(s) affected) 

Table 'Accounts'. Scan count 37, logical reads 37, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

We cab see, that for only a few records and the T-SQL solution produced 37 logical reads and the query plan is quite complex. And there is a lot of table scans.

If we add a clustered primary key on the ID

ALTER TABLE dbo.Accounts ADD CONSTRAINT PK_Accounts PRIMARY KEY CLUSTERED(ID)

Then the plan will be better as we will replace the three table scans in the query plan by one clustered index scan and two clustered index seeks. But even now the plan is quite complex.

CLR Solution test

SET STATISTICS IO ON
GO
SELECT
    Acc.ID
    ,CONVERT(varchar(10),TransactionDate,101) AS TransactionDate
    ,Balance
    ,ISNULL(dbo.fn_RunningTotalBigIntQueue(Balance, 0, 5, ROW_NUMBER() OVER(ORDER BY ID), NULL, 1), 0) AS RunningTotal
 FROM Accounts Acc
 ORDER BY ID
 OPTION(MAXDOP 1)
GO
SET STATISTICS IO OFF
GO

Here we have results:

ID          TransactionDate Balance                RunningTotal
----------- --------------- ---------------------- ----------------------
1           01/01/2000      100                    0
2           01/02/2000      101                    0
3           01/03/2000      102                    0
4           01/04/2000      103                    0
5           01/05/2000      104                    510
6           01/06/2000      105                    515
7           01/07/2000      106                    520
8           01/08/2000      107                    525
9           01/09/2000      108                    530
10          01/10/2000      109                    535
11          01/11/2000      200                    630
12          01/12/2000      201                    725
13          01/13/2000      202                    820
14          01/14/2000      203                    915
15          01/15/2000      204                    1010
16          01/16/2000      205                    1015
17          01/17/2000      206                    1020
18          01/18/2000      207                    1025
19          01/19/2000      208                    1030
20          01/20/2000      209                    1035 

(20 row(s) affected) 

Table 'Accounts'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Here we can see, that the CLR solution produced exactly the same output as the T-SQL solution. What more, we have only 1 logical read as the table is small and we have also received a very simple execution plan with single Table Scan and Sort. The segment and Sequence Project(Compute Scalar) operators are the ROW_NUMBER() function for security check.  The compute Scalar is our function computing the running totals.

In case of the Clustered Primary Key the plan will contain one Clustered Index scan without sort as the Clustered Key is already in our expected order.

Conclusion

As we can see from the simple test here, the CLR solution is unbeatable for such purposes. Even if we use other T-SQL solution like “quirky updates”, even then the CLR will have less reads and should be quicker.

When increasing the number of records and increasing the last X number, then with higher number count, the T-SQL solution will kill even high end servers. On the other side you will see nearly no difference in speed when using the CLR solution. As it only needs to allocate one queue of last X elements. Even if we want to count last 1000 000 rows, then it only needs to allocate such queue and in case of bigint it will 8 MB + some overhead of the queue.

Of course the CLR solution has the issues with processing rows in right order, but for that purposes there is the security check implemented so in case rows are processed in other than expected order an exception is thrown. But in most cases the scenario is very simple and you need to calculate the total from a simple table and in such cases this solution work without bigger problems. In case of complex queries with JOINS or other windowing functions, records can be processed into a temp table and then the running totals calculated on the temporary table.

SQL Server and fastest running totals using CLR – Updated

Calculating running totals is a common task when reporting warehouse or financial balance. The theory behind running totals is quite easy as it is simply sum of values on all previous rows.

RowID ProductID Date Operation Quantity Total Quantity
1 1 2010/09/01 Buy 1000 1000
2 1 2010/09/02 Buy 500 1500
3 1 2010/09/03 Sell 200 1300
4 1 2010/09/04 Buy 500 1800
5 1 2010/09/05 Sell 1000 800

Some reporting tools provide native support for running total, but some tools not. The real problems comes when we need to calculate running totals inside query. The main problems are with performance and sometimes complexity of such solution.

Jeff Moden has a great article about calculating running totals using T-SQL on SQL Server Central: Solving the “Running Total” & “Ordinal Rank” Problems.

For relatively small record sets the T-SQL solution is quite usable, but for larger record sets the real performance problems comes (a lot of reads and processing).

Although there are many articles related to running totals, nearly non of them describes the fastest possible method for calculating running totals – CLR. On the other side the CLR solution brings some issues which we have to take in mind.

I have updated the article after some comments and further email communication with Paul White MVP (blog). Paul has mentioned issues with rows ordering for calculation of the running totals so this updated version reflect this and also I’ve added a security check to the functions so those functions throws exceptions when and issue with row ordering is detected.

CLR solution for running totals

As I mentioned above, the CLR solution is the simplest and quickest solution for the running totals. Before we start with the CLR solution we will prepare a testing database and test data.

--======================
-- Create test database
--======================
CREATE DATABASE RunningTotalsTest
GO
USE RunningTotalsTest
GO 

--===========================
-- Create small test table
--===========================
CREATE TABLE SmallTable (
    RowID int NOT NULL IDENTITY(1,1),
    Value int, RandomValue int
)
GO 

--Add Clustered primary key to the table
ALTER TABLE SmallTable
    ADD CONSTRAINT PK_SmallTable
    PRIMARY KEY CLUSTERED (RowID)
    WITH FILLFACTOR = 100
GO 

--===========================
-- Create big test table
--===========================

--The big table will be heap, becase we will want to test paralel query plans on it
CREATE TABLE BigTable (
    RowID int NOT NULL IDENTITY(1,1),
    Value int,
    RandomValue int,
    BigData binary(3000)
)
GO 

--==============================
-- Fill tables with test data
--============================== 

--fill small table with data 1000 records
WITH TallyBase AS (
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N
),
Tally AS (
    SELECT
        ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) As Value
    FROM TallyBase t1, TallyBase t2, TallyBase t3
)
INSERT INTO SmallTable (
    Value,
    RandomValue
)
SELECT
    Value,
    ABS(CHECKSUM(NEWID())) % 100000
FROM Tally
GO 

--fill BigTable with 100000 records
WITH TallyBase AS (
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N UNION ALL
    SELECT 1 AS N
),
Tally AS (
    SELECT
        ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) As Value
    FROM TallyBase t1, TallyBase t2, TallyBase t3, TallyBase t4, TallyBase t5
)
INSERT INTO BigTable (
    Value,
    RandomValue
)
SELECT
    Value,
    ABS(CHECKSUM(NEWID())) % 100000
FROM Tally
GO

Simple scalar CLR running totals function

In our implementation we will use a simple scalar CLR function. This function will store current value for next row and in that row it will retrieve the value from previous row and calculate total. For this purpose we will use CallContext class (See MSDN for details).

Here is testing class and function for BigInt data type.

public class RunningTotals
{
    /// <summary>
    /// Storage Structure for holding actual Total and row number for security check.
    /// </summary>
    /// <typeparam name="T">Totals Data Type</typeparam>
    private struct RtStorage<T> where T : struct
    {
        public T Total;
        public int RowNo;
    }

    /// <summary>
    /// Calculates a running totals on BigInt (Int64) data type
    /// </summary>
    /// <param name="val">Value of current row</param>
    /// <param name="id">ID of the function in single query</param>
    /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param>
    /// <param name="nullValue">Value to be used for NULL values</param>
    /// <returns>SqlInt64 representing running total</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlInt64 RunningTotalBigInt(SqlInt64 val, SqlByte id, int rowNo, SqlInt64 nullValue)
    {
        string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value);

        object lastSum = CallContext.GetData(dataName);

        var storage = lastSum != null ? (RtStorage<SqlInt64>)lastSum : new RtStorage<SqlInt64>();
        storage.RowNo++;

        if (storage.RowNo != rowNo)
            throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));

        if (!val.IsNull)
            storage.Total = storage.Total.IsNull ? val : storage.Total + val;
        else
            storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);

        CallContext.SetData(dataName, storage);

        return storage.Total;
    }
}

The function has several input parameters:

  • val parameter represents the value to be accumulated into the running total.
  • id parameter is identification of the function in a select statement if there are more functions used in single select (eg. accumulating costs and incomes separately) – it is necessary to have separate storage for each call of the function.
  • rowNo is parameter for the security check to ensure, that rows are accumulated in expected order and correct results are provided. In case an issue with accumulation order is detected, an exception is thrown so user immediately knows the calculation is wrong.  The security check is based on communication with Paul White (MVP) as mentioned above and the value for this parameter will be provided by ROW_NUMBER() windowing function with expected order.
  • nullValue allows specify value for null handling (eg. you may want to accumulate some particular value in case the val is null.

Once we compile the assembly with the class, we can register it in our test database. Because we use the CallContext class, the assembly will require PERMISSION_SET = UNSAFE and because of this we have to set also the database TRUSTWORTHY to ON;

--=================================================================
-- Set Testing database TRUSTWORTHY ON so we can use UNSAFE code
--=================================================================
ALTER DATABASE [RunningTotalsTest]
SET TRUSTWORTHY ON WITH ROLLBACK IMMEDIATE
GO 

--==========================================
-- Create assembly
--==========================================
CREATE ASSEMBLY [SqlClrTotals]
AUTHORIZATION [dbo]
FROM 'C:SQLCLRSqlClrTotals.dll'
WITH PERMISSION_SET = UNSAFE
GO 

--==========================================
-- Create testing function
--==========================================
CREATE FUNCTION [dbo].[fn_RunningTotalBigInt](
	@val [bigint],					--value to be added to running total (a fiedl in the query)
	@id [tinyint],					--id of the running total within a single query
	@rowNo [int],					--RowNumber of processed records. This is compared to espected rowNo and in case out of synchronization an exceiption is fired
	@nullValue [bigint] = NULL		--representation of the NULL value when adding to running totals
)
RETURNS [bigint]
WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [SqlClrTotals].[RunningTotals].[RunningTotalBigInt]
GO

Once the assembly and function is created, we can start testing the function on our test tables.

Testing running total function on queries with serial plan

Fist test will be done on simple queries which will have a simple serial plan.

Simple query with one running total

SET STATISTICS IO ON
SET STATISTICS TIME ON
GO 

SELECT
    RowID,
    Value,
    dbo.fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal
FROM dbo.SmallTable
ORDER BY RowID

SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO

Please note the ROW_NUMBER() OVER(ORDER BY RowID). This is the security check by which we provide RowNumber in the order of RowID. In case of some inconsistencies (rows arrive in different order as required. This can happen because the query engine doesn’t guarantee the rows order until final result set is received) an exception will be thrown.

As we can see, the syntax for this query is very simple and it also give as a very good and correct results:

RowID       Value       RunningTotal
----------- ----------- --------------------
1           1           1
2           2           3
3           3           6
4           4           10
5           5           15
6           6           21
.           .           .
.           .           .
.           .           .
998         998         498501
999         999         499500
1000        1000        500500 

(1000 row(s) affected) 

SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. 

(1000 row(s) affected)
Table 'SmallTable'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

(1 row(s) affected)
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 3 ms.

And also a very simple query plan:

Non of the T-SQL methods will produce such simple query plan. The Sequence Project is calculation of the ROW_NUMBER() for security check.

Multiple running totals in single query

Now let test the situation if we use the function two times in the same query.

SELECT
    RowID,
    Value,
    [dbo].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal,
    RandomValue,
    [dbo].fn_RunningTotalBigInt(RandomValue, 1, ROW_NUMBER() OVER(ORDER BY RowID), null) as RandomRunningTotal
FROM dbo.SmallTable
ORDER BY RowID

Result:

RowID       Value       RunningTotal         RandomValue RandomRunningTotal
----------- ----------- -------------------- ----------- --------------------
1           1           1                    15137       15137
2           2           3                    97303       112440
3           3           6                    14688       127128
4           4           10                   64512       191640
5           5           15                   20463       212103
.           .           .                    .           .
.           .           .                    .           .
996         996         496506               64071       50040800
997         997         497503               58488       50099288
998         998         498501               65331       50164619
999         999         499500               97196       50261815
1000        1000        500500               65433       50327248

As we can see, we have again correct results. Note, that each call to the fn_RunningTotalInt has different ID.

In case we use same ID the execution will fail with detection of rows being processed in unexpected order.

SELECT
    RowID,
    Value,
    [dbo].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal,
    RandomValue,
    [dbo].fn_RunningTotalBigInt(RandomValue, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RandomRunningTotal
FROM dbo.SmallTable
ORDER BY RowID
Msg 6522, Level 16, State 2, Line 1
A .NET Framework error occurred during execution of user-defined routine or aggregate "fn_RunningTotalInt":
System.InvalidOperationException: Rows were processed out of expected order. Expected RowNo: 2, received RowNo: 1
System.InvalidOperationException:
 in RunningTotals.RunningTotalInt(SqlInt32 val, SqlByte id, Int32 rowNo, SqlInt32 nullValue)

This is caused by the fact that in case of using identical id, the internal counters are used by both functions instead of single counter for each function call.

Detection of processing row order changes

As I  have mentioned at the beginning fo the article, this update was done upon Paul White comments, so now the function detect such issues if they occurs. Let’s simulate such issue.

SELECT
    RowID,
    Value,
    [ClrUnsafe].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal,
    ROW_NUMBER() OVER(ORDER BY RowID DESC) DescendingRowOrder
FROM dbo.SmallTable
ORDER BY RowID
Msg 6522, Level 16, State 2, Line 1
A .NET Framework error occurred during execution of user-defined routine or aggregate "fn_RunningTotalInt":
System.InvalidOperationException: Rows were processed out of expected order. Expected RowNo: 1, received RowNo: 1000
System.InvalidOperationException:
in RunningTotals.RunningTotalInt(SqlInt32 val, SqlByte id, Int32 rowNo, SqlInt32 nullValue)

This query produces below plan:

The issue is, that the the scalar operator is after a sort in descending order which is being done for the DescendingRowOrder and prior the final sort. However if such issue occurs, it is correctly detected and an exception raised. This issue can occur even the scalar operator was the last operator as the order of rows is not guaranteed by the query processor until final result is received, but in normal cases in single threaded processing it generally works and in case of some issues, we have the check implemented.

Null values handling

Now we will focus on the NULL values handling. Let’s run another simple query with some NULL values.

WITH Data (RowID, Value) AS (
    SELECT 1 AS RowId, NULL As Value
    UNION ALL SELECT 2 AS RowId, 1 As Value
    UNION ALL SELECT 3 AS RowId, NULL As Value
    UNION ALL SELECT 4 AS RowId, 1 As Value UNION ALL
    SELECT 5 AS RowId, 1 As Value
)
SELECT
    RowId
    ,Value
    ,[dbo].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as TotalNull
    ,[dbo].fn_RunningTotalBigInt(Value, 1, ROW_NUMBER() OVER(ORDER BY RowID), 0) as TotalZero
    ,[dbo].fn_RunningTotalBigInt(Value, 2, ROW_NUMBER() OVER(ORDER BY RowID), 10) as TotalTen
FROM Data
ORDER BY RowID

And of course result of such solution:

RowId       Value       TotalNull            TotalZero            TotalTen
----------- ----------- -------------------- -------------------- --------------------
1           NULL        NULL                 0                    10
2           1           1                    1                    11
3           NULL        1                    1                    21
4           1           2                    2                    22
5           1           3                    3                    23

Here we can see, that if we pass NULL, then the NULL values are ignored and in the other two cases the NULL values are handled as 0 respectively 10.

Parallel plans and not ordered result set testing

Till now we had tested only simple queries, which produce serial plan. But what if we will use the function in much more complicated queries, or queries processing high amount of data and which produce parallel query plan on multi processor (core) systems.

Let’s simulate such scenario on our big table, which should produce a parallel plan on multi core system and default settings of SQL Server.

SELECT
    RowID
    ,Value
    ,[dbo].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal
FROM dbo.BigTable
ORDER BY RowID

Here is query plan for above query:

And here results:

RowID       Value       RunningTotal
----------- ----------- --------------------
1           1           1
2           2           3
3           3           6
4           4           10
99994       99994       4999450015
99995       99995       4999550010
99996       99996       4999650006
99997       99997       4999750003
99998       99998       4999850001
99999       99999       4999950000
100000      100000      5000050000

(100000 row(s) affected)

In this case the results are correct as Compute Scalar was performed after the parallel streams were garthered.
In case the computation of scalar running total value occured prior the parallel streams are garthered, then again an issue that rows are not processed in corect order may occur.
To avoid this kind of problems we can force serial query plan by setting max degree of parallelism for the query using the OPTION(MAXDOP 1)

SELECT
    RowID
    ,Value
    ,[dbo].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal
FROM dbo.BigTable
ORDER BY RowID
OPTION(MAXDOP 1)

This query will always produce a serial plan. In this case following:

Final form of Running Totals

From the above we can construct a final form of the Running Totals functions for each numeric data type. The final form can look like following class;

using System;
using Microsoft.SqlServer.Server;
using System.Runtime.Remoting.Messaging;
using System.Data.SqlTypes;
using System.Collections.Generic;

/// <summary>
/// Class contains CLR scalar functions for calculation of running totals
/// </summary>
public class RunningTotals
{
    /// <summary>
    /// Storage Structure for holding actual Total and row number for security check.
    /// </summary>
    /// <typeparam name="T">Totals Data Type</typeparam>
    private struct RtStorage<T> where T : struct
    {
        public T Total;
        public int RowNo;
    }

    /// <summary>
    /// Calculates a running totals on TinyInt (byte) data type
    /// </summary>
    /// <param name="val">Value of current row</param>
    /// <param name="id">ID of the function in single query</param>
    /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param>
    /// <param name="nullValue">Value to be used for NULL values</param>
    /// <returns>SqlByte representing running total</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlByte RunningTotalTinyInt(SqlByte val, SqlByte id, int rowNo, SqlByte nullValue)
    {
        string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value);

        object lastSum = CallContext.GetData(dataName);

        var storage = lastSum != null ? (RtStorage<SqlByte>)lastSum : new RtStorage<SqlByte>();
        storage.RowNo++;

        if (storage.RowNo != rowNo)
            throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));

        if (!val.IsNull)
            storage.Total = storage.Total.IsNull ? val : storage.Total + val;
        else
            storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);

        CallContext.SetData(dataName, storage);

        return storage.Total;
    }

    /// <summary>
    /// Calculates a running totals on SmallInt (Int) data type
    /// </summary>
    /// <param name="val">Value of current row</param>
    /// <param name="id">ID of the function in single query</param>
    /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param>
    /// <param name="nullValue">Value to be used for NULL values</param>
    /// <returns>SqlInt16 representing running total</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlInt16 RunningTotalSmallInt(SqlInt16 val, SqlByte id, int rowNo, SqlInt16 nullValue)
    {
        string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value);

        object lastSum = CallContext.GetData(dataName);

        var storage = lastSum != null ? (RtStorage<SqlInt16>)lastSum : new RtStorage<SqlInt16>();
        storage.RowNo++;

        if (storage.RowNo != rowNo)
            throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));

        if (!val.IsNull)
            storage.Total = storage.Total.IsNull ? val : storage.Total + val;
        else
            storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);

        CallContext.SetData(dataName, storage);

        return storage.Total;
    }

    /// <summary>
    /// Calculates a running totals on Int (Int32) data type
    /// </summary>
    /// <param name="val">Value of current row</param>
    /// <param name="id">ID of the function in single query</param>
    /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param>
    /// <param name="nullValue">Value to be used for NULL values</param>
    /// <returns>SqlInt32 representing running total</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlInt32 RunningTotalInt(SqlInt32 val, SqlByte id, int rowNo, SqlInt32 nullValue)
    {
        string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value);

        object lastSum = CallContext.GetData(dataName);

        var storage = lastSum != null ? (RtStorage<SqlInt32>)lastSum : new RtStorage<SqlInt32>();
        storage.RowNo++;

        if (storage.RowNo != rowNo)
            throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));

        if (!val.IsNull)
            storage.Total = storage.Total.IsNull ? val : storage.Total + val;
        else
            storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);

        CallContext.SetData(dataName, storage);

        return storage.Total;
    }

    /// <summary>
    /// Calculates a running totals on BigInt (Int64) data type
    /// </summary>
    /// <param name="val">Value of current row</param>
    /// <param name="id">ID of the function in single query</param>
    /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param>
    /// <param name="nullValue">Value to be used for NULL values</param>
    /// <returns>SqlInt64 representing running total</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlInt64 RunningTotalBigInt(SqlInt64 val, SqlByte id, int rowNo, SqlInt64 nullValue)
    {
        string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value);

        object lastSum = CallContext.GetData(dataName);

        var storage = lastSum != null ? (RtStorage<SqlInt64>)lastSum : new RtStorage<SqlInt64>();
        storage.RowNo++;

        if (storage.RowNo != rowNo)
            throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));

        if (!val.IsNull)
            storage.Total = storage.Total.IsNull ? val : storage.Total + val;
        else
            storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);

        CallContext.SetData(dataName, storage);

        return storage.Total;
    }

    /// <summary>
    /// Calculates a running totals on Float (Double) data type
    /// </summary>
    /// <param name="val">Value of current row</param>
    /// <param name="id">ID of the function in single query</param>
    /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param>
    /// <param name="nullValue">Value to be used for NULL values</param>
    /// <returns>SqlDouble representing running total</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlDouble RunningTotalFloat(SqlDouble val, SqlByte id, int rowNo, SqlDouble nullValue)
    {
        string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value);

        object lastSum = CallContext.GetData(dataName);

        var storage = lastSum != null ? (RtStorage<SqlDouble>)lastSum : new RtStorage<SqlDouble>();
        storage.RowNo++;

        if (storage.RowNo != rowNo)
            throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));

        if (!val.IsNull)
            storage.Total = storage.Total.IsNull ? val : storage.Total + val;
        else
            storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);

        CallContext.SetData(dataName, storage);

        return storage.Total;
    }

    /// <summary>
    /// Calculates a running totals on Real (Single) data type
    /// </summary>
    /// <param name="val">Value of current row</param>
    /// <param name="id">ID of the function in single query</param>
    /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param>
    /// <param name="nullValue">Value to be used for NULL values</param>
    /// <returns>SqlSingle representing running total</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlSingle RunningTotalReal(SqlSingle val, SqlByte id, int rowNo, SqlSingle nullValue)
    {
        string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value);

        object lastSum = CallContext.GetData(dataName);

        var storage = lastSum != null ? (RtStorage<SqlSingle>)lastSum : new RtStorage<SqlSingle>();
        storage.RowNo++;

        if (storage.RowNo != rowNo)
            throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));

        if (!val.IsNull)
            storage.Total = storage.Total.IsNull ? val : storage.Total + val;
        else
            storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);

        CallContext.SetData(dataName, storage);

        return storage.Total;
    }

    /// <summary>
    /// Calculates a running totals on Money data type
    /// </summary>
    /// <param name="val">Value of current row</param>
    /// <param name="id">ID of the function in single query</param>
    /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param>
    /// <param name="nullValue">Value to be used for NULL values</param>
    /// <returns>SqlMoney representing running total</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlMoney RunningTotalMoney(SqlMoney val, SqlByte id, int rowNo, SqlMoney nullValue)
    {
        string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value);

        object lastSum = CallContext.GetData(dataName);

        var storage = lastSum != null ? (RtStorage<SqlMoney>)lastSum : new RtStorage<SqlMoney>();
        storage.RowNo++;

        if (storage.RowNo != rowNo)
            throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));

        if (!val.IsNull)
            storage.Total = storage.Total.IsNull ? val : storage.Total + val;
        else
            storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);

        CallContext.SetData(dataName, storage);

        return storage.Total;
    }

    /// <summary>
    /// Calculates a running totals on Decimal data type
    /// </summary>
    /// <param name="val">Value of current row</param>
    /// <param name="id">ID of the function in single query</param>
    /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param>
    /// <param name="nullValue">Value to be used for NULL values</param>
    /// <returns>SqlDecimal representing running total</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlDecimal RunningTotalDecimal(SqlDecimal val, SqlByte id, int rowNo, SqlDecimal nullValue)
    {
        string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value);

        object lastSum = CallContext.GetData(dataName);

        var storage = lastSum != null ? (RtStorage<SqlDecimal>)lastSum : new RtStorage<SqlDecimal>();
        storage.RowNo++;

        if (storage.RowNo != rowNo)
            throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));

        if (!val.IsNull)
            storage.Total = storage.Total.IsNull ? val : storage.Total + val;
        else
            storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);

        CallContext.SetData(dataName, storage);

        return storage.Total;
    }
}

Conclusion

As we have tested and seen, the CLR version of calculation running totals produces very simple query plans and is one of the fastest methods to calculate running totals.

The CLR functions are really simple and fast.

On one side the CLR solution for running totals is one of the fastest, but on the other side we have to remember that this solution requires, that records are processed in order in which the running totals should be calculated. However the Query Optimizer and Query Processor doesn’t guarantee, that the records will be processed in particular order. It only guarantees, that the final output will be sorced according the ORDER BY clause.

To ensure, that those function to not produce incorrect results a security check in the form of rowNo generated by ROW_NUMBER() function needs to be part of the solution an in case the solution could produce incorrect results, it raises an exception.

So I would like to thank gain to Paul White MVP (blog) for his notes.

Remarks

You can download a complete Visual C# project + SQL Scripts (script for generating sample database, script for creating CLR objects + script with test queries) from my SkyDrive here: SqlClrTotals_Updated.zip