Calculating running totals is a common task when reporting warehouse or financial balance. The theory behind running totals is quite easy as it is simply sum of values on all previous rows.
RowID | ProductID | Date | Operation | Quantity | Total Quantity |
1 | 1 | 2010/09/01 | Buy | 1000 | 1000 |
2 | 1 | 2010/09/02 | Buy | 500 | 1500 |
3 | 1 | 2010/09/03 | Sell | 200 | 1300 |
4 | 1 | 2010/09/04 | Buy | 500 | 1800 |
5 | 1 | 2010/09/05 | Sell | 1000 | 800 |
Some reporting tools provide native support for running total, but some tools not. The real problems comes when we need to calculate running totals inside query. The main problems are with performance and sometimes complexity of such solution.
Jeff Moden has a great article about calculating running totals using T-SQL on SQL Server Central: Solving the “Running Total” & “Ordinal Rank” Problems.
For relatively small record sets the T-SQL solution is quite usable, but for larger record sets the real performance problems comes (a lot of reads and processing).
Although there are many articles related to running totals, nearly non of them describes the fastest possible method for calculating running totals – CLR. On the other side the CLR solution brings some issues which we have to take in mind.
I have updated the article after some comments and further email communication with Paul White MVP (blog). Paul has mentioned issues with rows ordering for calculation of the running totals so this updated version reflect this and also I’ve added a security check to the functions so those functions throws exceptions when and issue with row ordering is detected.
CLR solution for running totals
As I mentioned above, the CLR solution is the simplest and quickest solution for the running totals. Before we start with the CLR solution we will prepare a testing database and test data.
--====================== -- Create test database --====================== CREATE DATABASE RunningTotalsTest GO USE RunningTotalsTest GO --=========================== -- Create small test table --=========================== CREATE TABLE SmallTable ( RowID int NOT NULL IDENTITY(1,1), Value int, RandomValue int ) GO --Add Clustered primary key to the table ALTER TABLE SmallTable ADD CONSTRAINT PK_SmallTable PRIMARY KEY CLUSTERED (RowID) WITH FILLFACTOR = 100 GO --=========================== -- Create big test table --=========================== --The big table will be heap, becase we will want to test paralel query plans on it CREATE TABLE BigTable ( RowID int NOT NULL IDENTITY(1,1), Value int, RandomValue int, BigData binary(3000) ) GO --============================== -- Fill tables with test data --============================== --fill small table with data 1000 records WITH TallyBase AS ( SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N ), Tally AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) As Value FROM TallyBase t1, TallyBase t2, TallyBase t3 ) INSERT INTO SmallTable ( Value, RandomValue ) SELECT Value, ABS(CHECKSUM(NEWID())) % 100000 FROM Tally GO --fill BigTable with 100000 records WITH TallyBase AS ( SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 1 AS N ), Tally AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) As Value FROM TallyBase t1, TallyBase t2, TallyBase t3, TallyBase t4, TallyBase t5 ) INSERT INTO BigTable ( Value, RandomValue ) SELECT Value, ABS(CHECKSUM(NEWID())) % 100000 FROM Tally GO
Simple scalar CLR running totals function
In our implementation we will use a simple scalar CLR function. This function will store current value for next row and in that row it will retrieve the value from previous row and calculate total. For this purpose we will use CallContext
class (See MSDN for details).
Here is testing class and function for BigInt data type.
public class RunningTotals { /// <summary> /// Storage Structure for holding actual Total and row number for security check. /// </summary> /// <typeparam name="T">Totals Data Type</typeparam> private struct RtStorage<T> where T : struct { public T Total; public int RowNo; } /// <summary> /// Calculates a running totals on BigInt (Int64) data type /// </summary> /// <param name="val">Value of current row</param> /// <param name="id">ID of the function in single query</param> /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param> /// <param name="nullValue">Value to be used for NULL values</param> /// <returns>SqlInt64 representing running total</returns> [SqlFunction(IsDeterministic = true)] public static SqlInt64 RunningTotalBigInt(SqlInt64 val, SqlByte id, int rowNo, SqlInt64 nullValue) { string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value); object lastSum = CallContext.GetData(dataName); var storage = lastSum != null ? (RtStorage<SqlInt64>)lastSum : new RtStorage<SqlInt64>(); storage.RowNo++; if (storage.RowNo != rowNo) throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo)); if (!val.IsNull) storage.Total = storage.Total.IsNull ? val : storage.Total + val; else storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue); CallContext.SetData(dataName, storage); return storage.Total; } }
The function has several input parameters:
val
parameter represents the value to be accumulated into the running total.id
parameter is identification of the function in a select statement if there are more functions used in single select (eg. accumulating costs and incomes separately) – it is necessary to have separate storage for each call of the function.rowNo
is parameter for the security check to ensure, that rows are accumulated in expected order and correct results are provided. In case an issue with accumulation order is detected, an exception is thrown so user immediately knows the calculation is wrong. The security check is based on communication with Paul White (MVP) as mentioned above and the value for this parameter will be provided byROW_NUMBER()
windowing function with expected order.nullValue
allows specify value for null handling (eg. you may want to accumulate some particular value in case the val is null.
Once we compile the assembly with the class, we can register it in our test database. Because we use the CallContext
class, the assembly will require PERMISSION_SET = UNSAFE
and because of this we have to set also the database TRUSTWORTHY
to ON
;
--================================================================= -- Set Testing database TRUSTWORTHY ON so we can use UNSAFE code --================================================================= ALTER DATABASE [RunningTotalsTest] SET TRUSTWORTHY ON WITH ROLLBACK IMMEDIATE GO --========================================== -- Create assembly --========================================== CREATE ASSEMBLY [SqlClrTotals] AUTHORIZATION [dbo] FROM 'C:SQLCLRSqlClrTotals.dll' WITH PERMISSION_SET = UNSAFE GO --========================================== -- Create testing function --========================================== CREATE FUNCTION [dbo].[fn_RunningTotalBigInt]( @val [bigint], --value to be added to running total (a fiedl in the query) @id [tinyint], --id of the running total within a single query @rowNo [int], --RowNumber of processed records. This is compared to espected rowNo and in case out of synchronization an exceiption is fired @nullValue [bigint] = NULL --representation of the NULL value when adding to running totals ) RETURNS [bigint] WITH EXECUTE AS CALLER AS EXTERNAL NAME [SqlClrTotals].[RunningTotals].[RunningTotalBigInt] GO
Once the assembly and function is created, we can start testing the function on our test tables.
Testing running total function on queries with serial plan
Fist test will be done on simple queries which will have a simple serial plan.
Simple query with one running total
SET STATISTICS IO ON SET STATISTICS TIME ON GO SELECT RowID, Value, dbo.fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal FROM dbo.SmallTable ORDER BY RowID SET STATISTICS TIME OFF SET STATISTICS IO OFF GO
Please note the ROW_NUMBER() OVER(ORDER BY RowID)
. This is the security check by which we provide RowNumber in the order of RowID
. In case of some inconsistencies (rows arrive in different order as required. This can happen because the query engine doesn’t guarantee the rows order until final result set is received) an exception will be thrown.
As we can see, the syntax for this query is very simple and it also give as a very good and correct results:
RowID Value RunningTotal ----------- ----------- -------------------- 1 1 1 2 2 3 3 3 6 4 4 10 5 5 15 6 6 21 . . . . . . . . . 998 998 498501 999 999 499500 1000 1000 500500 (1000 row(s) affected) SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. (1000 row(s) affected) Table 'SmallTable'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (1 row(s) affected) SQL Server Execution Times: CPU time = 0 ms, elapsed time = 3 ms.
And also a very simple query plan:
Non of the T-SQL methods will produce such simple query plan. The Sequence Project is calculation of the ROW_NUMBER()
for security check.
Multiple running totals in single query
Now let test the situation if we use the function two times in the same query.
SELECT RowID, Value, [dbo].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal, RandomValue, [dbo].fn_RunningTotalBigInt(RandomValue, 1, ROW_NUMBER() OVER(ORDER BY RowID), null) as RandomRunningTotal FROM dbo.SmallTable ORDER BY RowID
Result:
RowID Value RunningTotal RandomValue RandomRunningTotal ----------- ----------- -------------------- ----------- -------------------- 1 1 1 15137 15137 2 2 3 97303 112440 3 3 6 14688 127128 4 4 10 64512 191640 5 5 15 20463 212103 . . . . . . . . . . 996 996 496506 64071 50040800 997 997 497503 58488 50099288 998 998 498501 65331 50164619 999 999 499500 97196 50261815 1000 1000 500500 65433 50327248
As we can see, we have again correct results. Note, that each call to the fn_RunningTotalInt
has different ID.
In case we use same ID the execution will fail with detection of rows being processed in unexpected order.
SELECT RowID, Value, [dbo].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal, RandomValue, [dbo].fn_RunningTotalBigInt(RandomValue, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RandomRunningTotal FROM dbo.SmallTable ORDER BY RowID
Msg 6522, Level 16, State 2, Line 1 A .NET Framework error occurred during execution of user-defined routine or aggregate "fn_RunningTotalInt": System.InvalidOperationException: Rows were processed out of expected order. Expected RowNo: 2, received RowNo: 1 System.InvalidOperationException: in RunningTotals.RunningTotalInt(SqlInt32 val, SqlByte id, Int32 rowNo, SqlInt32 nullValue)
This is caused by the fact that in case of using identical id, the internal counters are used by both functions instead of single counter for each function call.
Detection of processing row order changes
As I have mentioned at the beginning fo the article, this update was done upon Paul White comments, so now the function detect such issues if they occurs. Let’s simulate such issue.
SELECT RowID, Value, [ClrUnsafe].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal, ROW_NUMBER() OVER(ORDER BY RowID DESC) DescendingRowOrder FROM dbo.SmallTable ORDER BY RowID
Msg 6522, Level 16, State 2, Line 1 A .NET Framework error occurred during execution of user-defined routine or aggregate "fn_RunningTotalInt": System.InvalidOperationException: Rows were processed out of expected order. Expected RowNo: 1, received RowNo: 1000 System.InvalidOperationException: in RunningTotals.RunningTotalInt(SqlInt32 val, SqlByte id, Int32 rowNo, SqlInt32 nullValue)
This query produces below plan:
DescendingRowOrder
and prior the final sort. However if such issue occurs, it is correctly detected and an exception raised. This issue can occur even the scalar operator was the last operator as the order of rows is not guaranteed by the query processor until final result is received, but in normal cases in single threaded processing it generally works and in case of some issues, we have the check implemented.Null values handling
Now we will focus on the NULL values handling. Let’s run another simple query with some NULL values.
WITH Data (RowID, Value) AS ( SELECT 1 AS RowId, NULL As Value UNION ALL SELECT 2 AS RowId, 1 As Value UNION ALL SELECT 3 AS RowId, NULL As Value UNION ALL SELECT 4 AS RowId, 1 As Value UNION ALL SELECT 5 AS RowId, 1 As Value ) SELECT RowId ,Value ,[dbo].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as TotalNull ,[dbo].fn_RunningTotalBigInt(Value, 1, ROW_NUMBER() OVER(ORDER BY RowID), 0) as TotalZero ,[dbo].fn_RunningTotalBigInt(Value, 2, ROW_NUMBER() OVER(ORDER BY RowID), 10) as TotalTen FROM Data ORDER BY RowID
And of course result of such solution:
RowId Value TotalNull TotalZero TotalTen ----------- ----------- -------------------- -------------------- -------------------- 1 NULL NULL 0 10 2 1 1 1 11 3 NULL 1 1 21 4 1 2 2 22 5 1 3 3 23
Here we can see, that if we pass NULL, then the NULL values are ignored and in the other two cases the NULL values are handled as 0 respectively 10.
Parallel plans and not ordered result set testing
Till now we had tested only simple queries, which produce serial plan. But what if we will use the function in much more complicated queries, or queries processing high amount of data and which produce parallel query plan on multi processor (core) systems.
Let’s simulate such scenario on our big table, which should produce a parallel plan on multi core system and default settings of SQL Server.
SELECT RowID ,Value ,[dbo].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal FROM dbo.BigTable ORDER BY RowID
Here is query plan for above query:
And here results:
RowID Value RunningTotal ----------- ----------- -------------------- 1 1 1 2 2 3 3 3 6 4 4 10 99994 99994 4999450015 99995 99995 4999550010 99996 99996 4999650006 99997 99997 4999750003 99998 99998 4999850001 99999 99999 4999950000 100000 100000 5000050000 (100000 row(s) affected)
In this case the results are correct as Compute Scalar was performed after the parallel streams were garthered.
In case the computation of scalar running total value occured prior the parallel streams are garthered, then again an issue that rows are not processed in corect order may occur.
To avoid this kind of problems we can force serial query plan by setting max degree of parallelism for the query using the OPTION(MAXDOP 1)
SELECT RowID ,Value ,[dbo].fn_RunningTotalBigInt(Value, 0, ROW_NUMBER() OVER(ORDER BY RowID), null) as RunningTotal FROM dbo.BigTable ORDER BY RowID OPTION(MAXDOP 1)
This query will always produce a serial plan. In this case following:
Final form of Running Totals
From the above we can construct a final form of the Running Totals functions for each numeric data type. The final form can look like following class;
using System; using Microsoft.SqlServer.Server; using System.Runtime.Remoting.Messaging; using System.Data.SqlTypes; using System.Collections.Generic; /// <summary> /// Class contains CLR scalar functions for calculation of running totals /// </summary> public class RunningTotals { /// <summary> /// Storage Structure for holding actual Total and row number for security check. /// </summary> /// <typeparam name="T">Totals Data Type</typeparam> private struct RtStorage<T> where T : struct { public T Total; public int RowNo; } /// <summary> /// Calculates a running totals on TinyInt (byte) data type /// </summary> /// <param name="val">Value of current row</param> /// <param name="id">ID of the function in single query</param> /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param> /// <param name="nullValue">Value to be used for NULL values</param> /// <returns>SqlByte representing running total</returns> [SqlFunction(IsDeterministic = true)] public static SqlByte RunningTotalTinyInt(SqlByte val, SqlByte id, int rowNo, SqlByte nullValue) { string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value); object lastSum = CallContext.GetData(dataName); var storage = lastSum != null ? (RtStorage<SqlByte>)lastSum : new RtStorage<SqlByte>(); storage.RowNo++; if (storage.RowNo != rowNo) throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo)); if (!val.IsNull) storage.Total = storage.Total.IsNull ? val : storage.Total + val; else storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue); CallContext.SetData(dataName, storage); return storage.Total; } /// <summary> /// Calculates a running totals on SmallInt (Int) data type /// </summary> /// <param name="val">Value of current row</param> /// <param name="id">ID of the function in single query</param> /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param> /// <param name="nullValue">Value to be used for NULL values</param> /// <returns>SqlInt16 representing running total</returns> [SqlFunction(IsDeterministic = true)] public static SqlInt16 RunningTotalSmallInt(SqlInt16 val, SqlByte id, int rowNo, SqlInt16 nullValue) { string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value); object lastSum = CallContext.GetData(dataName); var storage = lastSum != null ? (RtStorage<SqlInt16>)lastSum : new RtStorage<SqlInt16>(); storage.RowNo++; if (storage.RowNo != rowNo) throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo)); if (!val.IsNull) storage.Total = storage.Total.IsNull ? val : storage.Total + val; else storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue); CallContext.SetData(dataName, storage); return storage.Total; } /// <summary> /// Calculates a running totals on Int (Int32) data type /// </summary> /// <param name="val">Value of current row</param> /// <param name="id">ID of the function in single query</param> /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param> /// <param name="nullValue">Value to be used for NULL values</param> /// <returns>SqlInt32 representing running total</returns> [SqlFunction(IsDeterministic = true)] public static SqlInt32 RunningTotalInt(SqlInt32 val, SqlByte id, int rowNo, SqlInt32 nullValue) { string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value); object lastSum = CallContext.GetData(dataName); var storage = lastSum != null ? (RtStorage<SqlInt32>)lastSum : new RtStorage<SqlInt32>(); storage.RowNo++; if (storage.RowNo != rowNo) throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo)); if (!val.IsNull) storage.Total = storage.Total.IsNull ? val : storage.Total + val; else storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue); CallContext.SetData(dataName, storage); return storage.Total; } /// <summary> /// Calculates a running totals on BigInt (Int64) data type /// </summary> /// <param name="val">Value of current row</param> /// <param name="id">ID of the function in single query</param> /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param> /// <param name="nullValue">Value to be used for NULL values</param> /// <returns>SqlInt64 representing running total</returns> [SqlFunction(IsDeterministic = true)] public static SqlInt64 RunningTotalBigInt(SqlInt64 val, SqlByte id, int rowNo, SqlInt64 nullValue) { string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value); object lastSum = CallContext.GetData(dataName); var storage = lastSum != null ? (RtStorage<SqlInt64>)lastSum : new RtStorage<SqlInt64>(); storage.RowNo++; if (storage.RowNo != rowNo) throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo)); if (!val.IsNull) storage.Total = storage.Total.IsNull ? val : storage.Total + val; else storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue); CallContext.SetData(dataName, storage); return storage.Total; } /// <summary> /// Calculates a running totals on Float (Double) data type /// </summary> /// <param name="val">Value of current row</param> /// <param name="id">ID of the function in single query</param> /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param> /// <param name="nullValue">Value to be used for NULL values</param> /// <returns>SqlDouble representing running total</returns> [SqlFunction(IsDeterministic = true)] public static SqlDouble RunningTotalFloat(SqlDouble val, SqlByte id, int rowNo, SqlDouble nullValue) { string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value); object lastSum = CallContext.GetData(dataName); var storage = lastSum != null ? (RtStorage<SqlDouble>)lastSum : new RtStorage<SqlDouble>(); storage.RowNo++; if (storage.RowNo != rowNo) throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo)); if (!val.IsNull) storage.Total = storage.Total.IsNull ? val : storage.Total + val; else storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue); CallContext.SetData(dataName, storage); return storage.Total; } /// <summary> /// Calculates a running totals on Real (Single) data type /// </summary> /// <param name="val">Value of current row</param> /// <param name="id">ID of the function in single query</param> /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param> /// <param name="nullValue">Value to be used for NULL values</param> /// <returns>SqlSingle representing running total</returns> [SqlFunction(IsDeterministic = true)] public static SqlSingle RunningTotalReal(SqlSingle val, SqlByte id, int rowNo, SqlSingle nullValue) { string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value); object lastSum = CallContext.GetData(dataName); var storage = lastSum != null ? (RtStorage<SqlSingle>)lastSum : new RtStorage<SqlSingle>(); storage.RowNo++; if (storage.RowNo != rowNo) throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo)); if (!val.IsNull) storage.Total = storage.Total.IsNull ? val : storage.Total + val; else storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue); CallContext.SetData(dataName, storage); return storage.Total; } /// <summary> /// Calculates a running totals on Money data type /// </summary> /// <param name="val">Value of current row</param> /// <param name="id">ID of the function in single query</param> /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param> /// <param name="nullValue">Value to be used for NULL values</param> /// <returns>SqlMoney representing running total</returns> [SqlFunction(IsDeterministic = true)] public static SqlMoney RunningTotalMoney(SqlMoney val, SqlByte id, int rowNo, SqlMoney nullValue) { string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value); object lastSum = CallContext.GetData(dataName); var storage = lastSum != null ? (RtStorage<SqlMoney>)lastSum : new RtStorage<SqlMoney>(); storage.RowNo++; if (storage.RowNo != rowNo) throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo)); if (!val.IsNull) storage.Total = storage.Total.IsNull ? val : storage.Total + val; else storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue); CallContext.SetData(dataName, storage); return storage.Total; } /// <summary> /// Calculates a running totals on Decimal data type /// </summary> /// <param name="val">Value of current row</param> /// <param name="id">ID of the function in single query</param> /// <param name="rowNo">Specifies expecter rowNo. It is for security check to ensure correctness of running totals</param> /// <param name="nullValue">Value to be used for NULL values</param> /// <returns>SqlDecimal representing running total</returns> [SqlFunction(IsDeterministic = true)] public static SqlDecimal RunningTotalDecimal(SqlDecimal val, SqlByte id, int rowNo, SqlDecimal nullValue) { string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value); object lastSum = CallContext.GetData(dataName); var storage = lastSum != null ? (RtStorage<SqlDecimal>)lastSum : new RtStorage<SqlDecimal>(); storage.RowNo++; if (storage.RowNo != rowNo) throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo)); if (!val.IsNull) storage.Total = storage.Total.IsNull ? val : storage.Total + val; else storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue); CallContext.SetData(dataName, storage); return storage.Total; } }
Conclusion
As we have tested and seen, the CLR version of calculation running totals produces very simple query plans and is one of the fastest methods to calculate running totals.
The CLR functions are really simple and fast.
On one side the CLR solution for running totals is one of the fastest, but on the other side we have to remember that this solution requires, that records are processed in order in which the running totals should be calculated. However the Query Optimizer and Query Processor doesn’t guarantee, that the records will be processed in particular order. It only guarantees, that the final output will be sorced according the ORDER BY clause.
To ensure, that those function to not produce incorrect results a security check in the form of rowNo generated by ROW_NUMBER() function needs to be part of the solution an in case the solution could produce incorrect results, it raises an exception.
So I would like to thank gain to Paul White MVP (blog) for his notes.
Remarks
You can download a complete Visual C# project + SQL Scripts (script for generating sample database, script for creating CLR objects + script with test queries) from my SkyDrive here: SqlClrTotals_Updated.zip
Excellent article and code, thanks for the tips!
Hi there. A nice article, indeed. I have implemented your functions on my server. One question, though. The function that deals with decimals… it only works for decimal(18,0), that is, anything that has precision > 0 gets truncated. How do I allow for decimal(18,2)? I’d be more than happy if I could get an answer quickly… please?
Ok, I have found the solution, and that was not easy at all. To get decimal(18,2) as a parameter into the function you have to decorate it like this:
[Microsoft.SqlServer.Server.SqlFacet(Precision = 18, Scale = 2)]SqlDecimal val
and to be able to obtain this type from the function you have to decorate the function like this:
[SqlFunction(IsDeterministic = true, Name = “ufn_RunningTotalForDecimal”)]
[return: Microsoft.SqlServer.Server.SqlFacet(Precision = 18, Scale = 2)]
public static SqlDecimal RunningTotalsDecimal(…
It was not easy to find, phew… Hope it’ll help ppl.
Best,
Darek
Hi,
you do not need to define this in the CLR code. Since the parameters of the CLR function in the C# code is decimal, you can create several SQL Server CLR functions to point into a single CLR method in assembly.
So you have a single CLR method in assembly
and several SQL Server CLR functions pointing to it as any SQL decimal declaration is represented by a C# decimal and/or SqlDecimal data type
Thanks for not being unsatisfied/curious enough to investigate improving running totals calculations for all of us pre-2012 admins!!
For anyone interested, I was able to quickly modify your assembly/function id parameter in order to allow for a partitioned running totals calculation. For example, SqlInt32 ([int]) instead of SqlByte ([tinyint]) to accommodate a potentially large number of partitions. After that, just modify the call to the function in the query in a way such as this:
SELECT
RowID,
PartitionID,
Value,
dbo.fn_RunningTotalBigInt(
Value,
PartitionID,
ROW_NUMBER() OVER(PARTITION BY PartitionID ORDER BY RowID),
null) as RunningTotal
FROM dbo.SmallTable
ORDER BY PartitionID, RowID
This is amazing work Pavel, thank you for sharing it.
I have a specific use for these functions in an accounting application I am writing and I have been working with a much simpler version I believe you wrote some time ago. I will update my code with your new functions and will feel more secure now.
But I have a problem (maybe more than one). I am using Entity Framework and .NET 4.0 and once I registered the function with the data model, I was able to call it with simple ESQL SELECT statements and it worked just fine.
The main problem is that while I can for example get the running total for an entire bank account, if I try to only select a subset of the account like all records between a date range (ex BETWEEN ‘2/1/2011’ AND ‘3/1/2011’), obviously it will give me a subtotal only for that subset and not include the records prior to 2/1/2011 to give me the running total accurate for that window of time.
I tried things like putting your function inside of a CTE so that the CTE was essentially ‘SELECT Col1, col2, RunningTotal(Amount) FROM Mytable and then query the outer CTE with the date range, but the result was the same. It started the total from 2/1/2011 and accumulated from there. I have tried all sorts of different things and nothing I am doing is working. I do not have nearly your familiarity or skill with SQL Server as you do and most solutions I am trying are just groping from Google searches,
In essence, people will be auditing bank accounts with thousands of entries from the opening balance of the account to the present day. To find a problem entry, the system they have been using allowed them to enter a date range and it would return the results within that date range with the running total accurately represented. However, the previous developers of the existing system achieved this by querying the table into a temporary table and iterating over it with a cursor to create the running total. Now you understand why I am rewriting it: it is incredible slow and resource intensive because there are hundreds or perhaps thousands of accounts in the DB,
Any clue how I might make this work much faster and give the user the ‘slice’ of the account data with correct running balance total on those records?
The other problem is that, as I mentioned, I am using Entity Framework. So while I can make nice clean ESQL DML calls, I don’t know how I will execute the equivalent of OPTION(MAXDOP 1, FORCE ORDER) since I don’t think OPTION() is a valid ESQL function.
I am desperate to avoid cursors and temp tables and would appreciate any guidance.
Thanks,
Dan Martini
Hi, to achieve what you want, you can simply add the accumulated amount value from the period prior your BETWEEN, as in the sample here:
Related to the OPTION(MAXDOP 1, FORCE ORDER) and Entity Framework. I didn’t worked much with the EntityFramework, and don’t now whether it is possible to pass this option using the Entity Framework, but as you could see, this option are essential for correct work.</p
I can only suggest to encapsulate the T-SQL code in the a stored procedure with parameters for the interval, and call this stored proc. Then the OPTION will be inside the StoredProc and everything should execute correctly.
While I am thinking of it, and this is probably an ignorant question: Using your final version of the CLR code, I am assuming that this will work safely in an environment where many users (up to a few hundred concurrent) will be continuously hitting the bank account register, paging back and forth, performing a lot of data entry etc. I ask because I am not not completely comfortable with CLR SQL integration and don’t really know all that WITH PERMISSION_SET = UNSAFE implies.
Being a financial application, you can understand that accuracy is paramount and speed is a close second. Thanks again for your help and the amazing code. No other solution I have tried in the past 6+ months comes even close to the stunning performance of your code.
Hi,
related to the WITH PERMISSION_SET=UNSAFE you can reed on MSDN – CREATE ASSEMBLY. If the assembly is only your code and you know what is inside, you do not have to bother about the UNSAFE. The UNSAFE is necessary because of use of the CallContext class.
Related to many users usage, this depends on your severs, but the CallContext used to store intermediate values will not cause problems and each user request is processed in separate thread and the CallContext doesn’t interfere with other threads. You can read more again on MSDN – CallContext Class.
Hi Pavel,
Thank you for both answers and the guidance. I will try working with the code that way and see if I can achieve what I need, and your example looks very promising. I just need to set up a realistic size set of data and also compile and install this latest code of yours. I will also look further into EF material and see if I can find any way to call with OPTION(MAXDOP 1, FORCE ORDER), but it is good to have the stored procedure idea as a backup if I can’t do it in EF.
I don’t know how, but the following code worked even on a table with a half million rows when i tried it a couple of months ago as a test. It was a simple SELECT against a table with a UNIQUEIDENTIFIER key (no date filtering like my last example). It may have broken down somewhere along the way and I didn’t notice it, but it appeared correct.
In any case, I completely trust what you are saying that it requires OPTION(MAXDOP 1, FORCE ORDER) to be completely safe and I will be using the new assembly of functions you wrote and try to work through the problems. Thanks again.
I only suggest to use
SqlDecimal currentValue
instead ofdecimal currentValue
You can check the
currentValue.IsNull
as in my article above to process accordingly eventualNULL
values.The
OPTION(MAXDOP 1, FORCE ORDER)
is essential for correct working. As if the the query is processed by paralled plan, you will receive wrong results, as the query will be processed by parallel threads. Therefore the MAXDOP (maximum degree of parallelism) must be set to 1 to force serial plan. The same is for the FORCE ORDER to preserve the correct order in query plan. Anyway you should inspect your query plan, that the compute scalar for the running total occurs after the sort operation.Wow. Seriously, wow. I used your new code and the examples, particularly the filtered ones and while I have only run a couple of moderately complex queries against a large dataset, I have to say this is a beautiful thing. It works better than I could have hoped. I am very grateful Pavel.
I haven’t tried working with EF yet, and I don’t have high hopes for that, but that doesn’t bother me because I can get something working with either stored procedures or dynamic SQL and that is what counts.
Thank you very much.
Thanks very much for such a wonderful code.
I’ve been trying to implement it in my database but seem to be having a teething problem:
The code below:
gives me sql error message “CREATE FUNCTION failed because parameter counts do not match.”
I’m using SQL 2008. Can you please help me with what I am missing wrong?
Hi, if you use the final form of the CLR function, then i has 3 parameters.
1. Value to be added to the running total
2. ID of the function within a statement to allow multiple running totals in single select. If only one running total is calculated, you can put whatever value for that param
3. Value representing NULL. If NULL is passed, then the NULL values are ignored and counted as 0. Othervice if value is provided, then NULL values will be counted as the value provided
So you have modify the CREATE FUNCTION statement to be corect:
Thanks a million. It works like a miracle from heaven. I used it to create a view and it worked perfectly.
Hi, when using it in a view, remember on the MAXDOP and the ORDER of the operations in final query plan. As when you use the view subsequently in joins or something, the plan can change drastically.
Thanks. I am producing a statement of account for different accounts in a transaction table. When I use the statement to select transactions for a specific account, I get the desired results. I’m also able to calculate previous transactions and add it as brought forward, as you advised Dan Martini. But I need to print statements for multiple accounts at a time, using crystal report to group the report by account code.
When I execute the code below in a query, the runningTotal continues through out and does not restart when the account code changes:
Result is:
How can I achieve this? Is it also possible to add up previous balances to the running totals as the Account code change in the listing, as I’m able to do for a single account?
You can add a parameter AccountCode parameter to the CLR function for calculating running total.
Then using the
CallContext.SetData
andCallContext.GetData
you can store the Account Code value between calls to the Running Total function. Inside the function you will detect the change of the AccountCode and once the account code changes, you simply reset the running total to 0 prior adding the current amount value to the Running Total.But generally if you want the calculation of running total on report, most reporting systems support direct running totals calculation including restarting on beginning/end of groups
I’ve been trying to do something similar, but for some reason, my code doesn’t seem to store the field changes using the
CallContext
class. I know this post is old, but I can’t figure out what I’m doing wrong and I don’t have a functioning SQL Server debugger, nor do I have an effective way to debug this.My code is below, any insight into what I’m doing wrong is appreciated:
///
/// Calculates a running totals on Decimal data type based on transistion between a string and numeric field.
///
/// Value of current row
/// ID of the function in single query
/// Specifies expecter rowNo. It is for security check to ensure correctness of running totals
/// Value to be used for NULL values
/// String field
/// Numeric field
/// SqlDecimal representing running total
[SqlFunction(IsDeterministic = true)]
public static SqlDecimal RunningTotalDecimalResetStringNumeric(SqlDecimal val, SqlByte id, int rowNo, SqlDecimal nullValue, SqlString field1, SqlDecimal field2)
{
string dataName = string.Format("MultiSqlRt_{0}", id.IsNull ? 0 : id.Value);
string field1Name = string.Format("MultiSqlField1_{0}", id.IsNull ? 0 : id.Value);
string field2Name = string.Format("MultiSqlField2_{0}", id.IsNull ? 0 : id.Value);
object lastSum = CallContext.GetData(dataName);
object field1Value = CallContext.GetData(field1Name);
object field2Value = CallContext.GetData(field2Name);
var storage = lastSum != null ? (RtStorage)lastSum : new RtStorage();
storage.RowNo++;
var stringFieldStorage = field1Value != null ? (StringFieldStorage)field1Value : new StringFieldStorage();
var numericFieldStorage = field2Value != null ? (NumericFieldStorage)field2Value : new NumericFieldStorage();
if (storage.RowNo != rowNo)
throw new System.InvalidOperationException(string.Format("Rows were processed out of expected order. Expected RowNo: {0}, received RowNo: {1}", storage.RowNo, rowNo));
if (stringFieldStorage.stringField != field1 || (stringFieldStorage.stringField == field1 && numericFieldStorage.numericField != field2))
{
storage.Total = new SqlDecimal(0);
stringFieldStorage.stringField = field1;
numericFieldStorage.numericField = field2;
}
if (!val.IsNull)
storage.Total = storage.Total.IsNull ? val : storage.Total + val;
else
storage.Total = storage.Total.IsNull ? nullValue : (nullValue.IsNull ? storage.Total : storage.Total + nullValue);
CallContext.SetData(dataName, storage);
CallContext.SetData(field1Name, stringFieldStorage);
CallContext.SetData(field2Name, numericFieldStorage);
return storage.Total;
}
Hi don’t know what exactly is wrong with your code as I do not have a complete code available. By first look it should be working.
But if you simply take my solution which is working and add your own field and store the values in the existing
RTStorage
struct as additional string fields and then doing the comparison against those, it should work.For whatever reason, it’s not working. Here’s how I have the
RTStorage
struct.private struct RtStorage where T : struct
{
public T Total;
public int RowNo;
public SqlString strField;
public SqlDecimal decField;
}
The following changes were made to the
RunningTotalDecimal
function:1) Copied your
RunningTotalDecimal
function and added the following parametersSqlString field1, SqlDecimal field2
2) Added this code after the exception throwing code for out of order row processing and before adding
SqlDecimal val
to the totalif (storage.strField != field1 || (storage.strField == field1 && storage.decField != field2))
{
storage.Total = 0;
storage.strField = field1;
storage.decField = field2;
}
I then reinstalled the assembly and added the following SQL scalar function
CREATE FUNCTION [dbo].[fn_RunningTotalDecimal_15_2_ResetStringNumeric](
@val [decimal](15, 2),
@id [tinyint],
@rowNo [int],
@nullValue [decimal](15, 2) = NULL,
@field1 [nvarchar](15),
@field2 [numeric](18,0)
)
RETURNS [decimal](15, 2)
WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [SqlClrRunningTotals].[RDSAPI.SQLClrRunningTotals.RunningTotals].[RunningTotalDecimalResetStringNumeric]
GO
I then call it in my select statement like this
[dbo].fn_RunningTotalDecimal_15_2_ResetStringNumeric([USER_TB_TIME_TICKETS].[HRS_WORKED], 0, ROW_NUMBER() OVER(ORDER BY [USER_TB_TIME_TICKETS].[CUST_NO], [USER_TB_TIME_TICKETS].[JOB_NO], [USER_TB_TIME_TICKETS].[ACT_DAT], [USER_TB_TIME_TICKETS].[REF_NO]), 0, [USER_TB_TIME_TICKETS].[CUST_NO], [USER_TB_TIME_TICKETS].[JOB_NO])
Please let me know if there’s anything else I can give or if you can point me to a resource that can help me solve this. I’ve posted it on stackoverflow and spiceworks with no luck.
The running total works exactly like its described in your post, I just don’t know why the use of
CallContext
can keep track of the previous sum, but not a string or decimal value?I was able to get it to partition by an additional field of type SqlString. Were you able to get your’s working, Jacob?
Dear Pavel,
Thanks a lot for this indepth analysis. I needed a sequence number generator and you function inspired me to write this:
http://stackoverflow.com/questions/7094897/a-sql-server-function-to-generate-of-sequential-numbers/
Thank you again for saving me so much time
Regards,
Hi,
I was looking on your post on StackOverflow and I think, you can have a solution for your problem even without the CLR using the
ROW_NUMBER()
function.When using the ROW_NUMBER() function, you have to specify at least the ORDER BY in the OVER clause of that function. What causes, that the table is being sorted, which can be an expensive operation.
However there is an easy workaround by using a SELECT in the ORDER BY of the OVER clause. You cannot use expressions in the ORDER BY, however a SELECT statement with constant can be part of the ORDER BY
In this case no Sort operation will be performed on the table and if you are selecting records from the heap, then the records will come in the sequence as they were read from the table.
[…] an addendum, I did test the CLR function offered by Pavel Pawlowski, and the performance characteristics were nearly identical to the SQL Server 2012 solution using […]
This is almost perfect for pre-2012 servers. One huge problem with this approach is that filtering rows causes the rows out of order exception because SQL server simply doesn’t hand the filtered-out rows to the CLR function.
Is there any way to force SQL server to run the CLR function on all rows from a table before filtering so that the results are the same as sum() over (order by) on 2012?
This is the pitfall of this approach. It is not always possible to guarantee the operators order in the query plan.
A workaround would be to do all the necessary filtering and store the requested rows into a temp table. Than as final step calculate the running totals by doing the select against the temp table.
[…] I’m trying to create a SQL CLR function that performs a running total similar to what’s explained here: SQL Server and fastest running totals using CLR – Updated […]
So elegant.