Cleanup of Retention Window in Partioned Tables – sp_tblCleanupRetentionWindow

In my GitHub collection of SQL Scripts is now available a new stored procedure sp_tblCleanupRetentionWindow under the TablesManagement/Partitioning folder. Help for the function is available in the repository Wiki.

Scenario Overview

You have huge tables in the MS SQL Database and to be able to efectively manage the amount of data, you have partitioned the table.

As the amount of the data grows over the time, you start thinking about regular maintenance of such huge table and drop old not needed data.

You need to clean the oldest partitions and MERGE the partitions in the partitoin fuction to keep the number of partitions reasonable (especially if new partitions are created as new data are arriving).

To cleanup partitions prior SQL Server 2016 you have to perform partition switching to cleanup the partitions and then you could MERGE the partitions. On SQL Server 2016 and above the situation is simplified as you can use the TUNRCATE TABLE WITH PARTITIONS. But again to merge partition, you must put exact partition boundary values etc.

To automate regular maintenance you need to write sripts to handle all the needed tasks.

sp_tblCleanupRetentionWindow

As mentioned in the scenario above, the sp_tblCleanupRetentionWindow is the procedure which takes care about all the operations needed to cleanup the table prior specified retention window.

Parameters

ParameterNameData TypeDefaultDescription
@pfNamenvarchar(128)NULLName of the partition function. All associated partition schemes and tables will be cleaned according specified @retentionWindow
@retentionWindowsql_variantNULL
  • Specifies retention window
  • @retentionWindow must be of the dame exact data type as the partition fuction
  • All partitions prior the partition containging @retentionWindow value wil be cleared.If partitions contain data, partitions are first truncated.
  • All partitions prior partition containing the @retentionWindow will be merged into the first (leftmost) partition.
  • If @retentionWindow is part of the first (leftmost) partition, no cleanup happens.
@infoOnlybit1When 1 prints only information about affected tables and partitions and does not perform any cleanup. When 0 does the actual cleanup

How It Operates

Procedure takes @pfName parameter, which identifies partition function used by partitioned table(s). Then based on the @retentionWindow parameter it idetnfifies partition containing the @retentionWindow value.

The @retentionWindow parameter identifies the lowest value which must be kept in the table. All records prior that one can be cleared.

If the @retentionWindow value is part of the first (leftmost) partition, the procedure ends and does not perform any cleanup as there is nothing to cleanup. It cannot clean the partition containing the @retentionWindow value.

If the @retentionWindow is part of second and higher partitions, it scan for all Partitions Shemes using the @pfName partition function.

After identification of the partition schemes it identifies all the tables using the identified partitions shemes.

Once the tables are identified, it loops through all the tables and TRUNCATES all the partitions prior the partition containing the @retentionWindow value.

After the truncation the procedure ALTERS the parition function and merges all the partitions prior the partition containing the @retentionWindow value to the first (leftmost) partition.

It keeps always the first leftmost partition empty.

By the scrip the procedure is deployed into the [master] database and marked as system stored procedure. This allows simple call of the procedure from within all databases on the SQL Server instance and also thanks to that the procedure is running in the context of the database from within it is being called.

Sample

The sample below does cleanup fo all tables associated with the pf_partDate partition function. Cleanup and merge all partitions prior partition containign value of @retentionWindow = '2020-12-15'. pf_partDate is using the date data type.

If we take a closer look on the pf_partDate partition function by using the sp_HelpPartitionFunction we will find, it has multiple partitions and is used by multipel partition schemes and tables.

sp_HelpPartitionFunction 'pf_partDate', 1

Info Only

If we execute the procedure with @infoOnly=1 then only information about affected partitions and tables will be provided.

DECLARE @retentionWindow date = '2020-12-05'

EXEC sp_tblCleanupRetentionWindow
    @pfName             = 'pf_partDate'
    ,@retentionWindow   = @retentionWindow
    ,@infoOnly          = 1

Info Only Output

Below is sample output of the procedure which informs about actions which will be performed if the @infoOnly would be set to 0.

sp_tblCleanupRetentionWindow v0.10 (2021-05-07) (C) 2017-2021 Pavel Pawlowski
=============================================================================
Cleans retention window for all tables associated with partition function

Feedback mail to: pavel.pawlowski@hotmail.cz
Repository:       https://github.com/PavelPawlowski/SQL-Scripts
-----------------------------------------------------------------------------
Cleaning retention window for partition function [pf_partDate]
 
Partitions to Cleanup:
----------------------
[          ]    [x] <  [2020-11-01]
[2020-11-01] <= [x] <  [2020-12-01]
 
Affected Partition Schemes And Tables:
--------------------------------------
[ps_date1]
  - [dbo].[tblDate1]
  - [dbo].[tblDate11]
[ps_date2]
  - [dbo].[tblDate2]

From the output we can see, that there are two partitions which would be affected. First partition containing data prior 2020-10-01 and second partition containing data between the 2020-10-01 and 2020-11-01.

Also it informs, that there are two partitions schemes. Partition scheme [ps_date1] which is used by tables [dbo].[tblDate1] and [dbo].[tblDate11] and partition scheme [ps_date2] which is used by table [dbo].[tblDate2].

Cleanup

If we execute the procedure with @infoOnly=0 then information about affected partitions and tables will be provided as well as actual cleanup will be done.

DECLARE @retentionWindow date = '2020-12-05'

EXEC sp_tblCleanupRetentionWindow
    @pfName             = 'pf_partDate'
    ,@retentionWindow   = @retentionWindow
    ,@infoOnly          = 0

Cleanup output

sp_tblCleanupRetentionWindow v0.10 (2021-05-07) (C) 2017-2021 Pavel Pawlowski
=============================================================================
Cleans retention window for all tables associated with partition function

Feedback mail to: pavel.pawlowski@hotmail.cz
Repository:       https://github.com/PavelPawlowski/SQL-Scripts
-----------------------------------------------------------------------------
Cleaning retention window for partition function [pf_partDate]
 
Partitions to Cleanup:
----------------------
[          ]    [x] <  [2020-11-01]
[2020-11-01] <= [x] <  [2020-12-01]
 
Affected Partition Schemes And Tables:
--------------------------------------
[ps_date1]
  - [dbo].[tblDate1]
  - [dbo].[tblDate11]
[ps_date2]
  - [dbo].[tblDate2]
 
CLEANUP PROCESS
---------------------------------------------------------------
2021-05-18 23:54:11.4600180 +02:00 - Starting CLEANUP Process
2021-05-18 23:54:11.4610182 +02:00 - [dbo].[tblDate2] PARTITION 1 not empty. Starting TRUNCATE
2021-05-18 23:54:11.4620186 +02:00 - [dbo].[tblDate2] PARTITION 1 TRUNCATE completed
2021-05-18 23:54:11.4710215 +02:00 - [dbo].[tblDate11] PARTITION for range [2020-11-01] <= [x] <  [2020-12-01] not empty. Starting TRUNCATE
2021-05-18 23:54:11.4760227 +02:00 - [dbo].[tblDate11] PARTITION for range [2020-11-01] <= [x] <  [2020-12-01] TRUNCATE completed
2021-05-18 23:54:11.4770230 +02:00 - [pf_partDate] range [2020-11-01] <= [x] <  [2020-12-01] : Start MERGE into PARTITION 1
2021-05-18 23:54:11.4790237 +02:00 - [pf_partDate] range [2020-11-01] <= [x] <  [2020-12-01] : MERGE into PARTITION 1 completed. New range of PARTITION 1: [x] <  [2020-12-01]
2021-05-18 23:54:11.4790237 +02:00 - CLEANUP Process COMPLETED

Compared to the @infoOnly=1 output we see additional section “CLEANUP PROCESS’, which informs about the actual cleanup steps performed.

Procedure identified, that partition 1 of the [dbo].[tblDate2] was not empty and therefore performed TRUNCATE of the partition 1. Then it identified that the table [dbo].[tblDate11] has data in partition covering range between dates 2020-11-01 and 2020-12-01, therefore performed also truncate of that partitions. As there is no information about any truncate of the table [dbo].[tblDate1] it means the table had both affected partitions empty. The same is related to the other afected partitions of the [dbo].[tblDate2] and [dbo].[tblDate11].

After the partition cleanup it started the multi-step partition merge process. It merged each affected partition into the left most partition and informs about the new range of the leftmost partition 1.

If we again utilize the sp_HelpPartitionFunction we can see, that the partitions were properly cleared.

Azure support

The procedure supports also execution in Azure SQL Databases as well as Instances or Azure Synapse Analytics.

When deploying to Azure SQL Database or Synapse Analytics, comment-out the unsuported USE statement or ignore the eventual error as stated in the header of the procedure source file.

/* *****************************************************************************************
                                      AZURE SQL DB Notice

   Comment-out the unsupported USE [master] when running in Azure SQL DB/Synapse Analytics
   or ignore error caused by unsupported USE statement
******************************************************************************************** */

USE [master]
GO

SQL Server Version Limitation

Because the procedure internally is using TRUNCATE TABLE WITH PARTITION , procedure is limited only to the SQL Server 2016 and above or Azure SQL Database, Azure Managed Instance and Azure Synapse Analytics.

Summary

The sp_tblCleanupRetentionWindow heavily simplifies the management of huge partitioned table and moves the complete maintenance of retention window into a single call of a stored procedure. Hopefully you will find the procedure usefull.

Generating Partition Function – sp_tblCreatePartitionFunction available don GitHub

If you are working with large databases you have come to situation when you need to partition the large tables.

To be able to partition a table, you need to create a Partition Function. Creation of Partition Function is quite easy, however if you need to create a larger number of partitions at initial phase, it can be pretty annoying as you have to define all the ranges for each partition. Also it is easy to make a type in the list of boundary values and as a result create a partition function with wrong partition alignment.

To simplify that process I’m presenting here a stored procedure which takes care about that and generates the partition function automatically based on input parameters

sp_tblCreatePartitionFunction

As mentioned above, the sp_tblCreatePartitionFunction takes care about generation of the partition function based on input parameters.

You can find complete source code in my SQL-Scripts project on GitHub under the TablesManagement\Partitioning folder.

The stored procedure provides support for generation of ranges based on smallint, int, bigint, date, datetime and datetime2 data types.

Stored procedure generates a partition function with boundary values between @rangeStart and @rangeEnd parameters.

The stored procedure is marked as system stored procedure in the script so it means that it can operate in the context of the current user database.

Parameters

The input parameters define the ranges and the way how the partition function is generated

The available input parameters for the function are @pfName, @rangeStart, @rangeEnd, @boundaryType, @incrementValue, @incrementUnit, @useIntegerDates, @integerFormatType and @printScriptOnly

@pfName

Defines the partition function Name. the parameter data type is nvarchar(128) which corresponds to sysname.

@rangeStart

Defines the starting range for generation of the partition function. The parameter is sql_variant. You have to pass parameter of supported data type for the range. As mentioned above the supported data types are smallint, int, bigint, date, datetime and datetime2.

To simplify specification of the range, you can pass the parameter as string. If the value passed as string represents a DateTime it is automatically converted to datetime data type.

You can also pass a value with a data type specifier. The data type specifier must be first character of the string and the value must follow. The supported data type specifiers are 'D' - date, 'T' - datetime, 'B' - bigint, 'I' - int and 'S' - smallint.

Below are some examples of supported strings passed as @rangeStart :

Sample value Result
'2016-01-01' Converted to datetime 2016-01-01
20160101 Converted to datetime 2016-01-01
D2016-01-01 Converted to date 2016-01-01
T20160101 Converted to datetime 2016-01-01
B1010 Converted to bigint 100
I1010 Converted to int 100
S1010 Converted to smallint 100

The @rangeStart representing a date can be altered during the partition function generation based on the @incrementUnit parameter. The @rangeStart is shifted to the beginning of corresponding unit.. See the @incrementUnit parameter for details.

The @rangeStart is included in the generated ranges.

@rangeEnd

Represents then end of range which should be generated for the partition function. The supported data types are the same as for the @rangeStart. See the @rangeStart for details.

The @rangeEnd is inclusive in the range, but may not be included in the final generated partition function in dependency on the @incrementValue and @incrementUnit parameters. See those parameters for details.

@boundaryType

Specifies the boundary type of the boundary values for the partition function. It can be either LEFT or RIGHT. It defines whether the boundary value is included in the left or right partition respective to the boundary value. Default value is RIGHT.

@incrementValue

Defines increment of the boundary values. This means that the @rangeStart value is incremented by the @incrementValue until @rangeEnd is reached.

@incrementUnit

Specifies the unit of the increment. It is being used only for range values representing date. The allowed units are YEAR, MONTH, WEEK, ISO_WEEK, DAY.

In each iteration step the @rangeStart value is incremented by the @incrementValue number of @incrementUnit.

Also the @incrementUnit may have impact on the @rangeStart value. If the @rangeStart value does not point to the first day of respective unit, it is being automatically shifted to the beginning of corresponding unit based on the @rangeStart value.

Samples of @rangeStart shifting:

@rangeStart @incrementUnit @rangeStart shifted to
'2016-03-17' YEAR '2016-01-01'
'2016-03-17' MONTH '2016-03-01'
'2016-03-17' WEEK '2016-03-14'
'2016-01-01' ISO_WEEK '2015-12-28'

@useIntegerDates

Specifies whether boundary values representing dates in the partition function should be represented as int date types or corresponding date, datetime or datetime2 data type. In case of  @useIntegerDates = 0 the boundary value is always first day of corresponding @incrementUnit.

In case of @useIntegerDates = 1 the boundary value is expressed as integer number and the value representing the range depends on the @integerFormatType parameter. See the @integerFormatType for details

It is being used only for date ranges. Default value is 1.

@integerFormatType

Specifies integer format of date range value. When the partition function is being generated and date ranges are being used, the range value is internally always represented by a first day of corresponding rage. if the @useIntegerDate = 1 the final boundary value is generated based on the format type.

Supported format types are:

FormatType Boundary value formatting
1 yyyyMMdd For example 20160101
2 yyyyxx(x) where xxx(x) correspond to appropriate month, week or day within particular year. For example 2016053 represents a day 53 of year 2016. 201643 represents week or iso_week 43 in year 2016

Default value is 2.

@printScriptOnly

Specifies whether only the partition function CREATE script is being printed or the partition function is being automatically crated.

The default value is 1, this means that only script is printed.

Samples

Below are some sample usages of the stored procedure.

Sample 1

sp_tblCreatePartitionFunction
    @pfName = 'myPf'
   ,@rangeStart = 1
   ,@rangeEnd = 1001
   ,@boundaryType = 'RIGHT'
   ,@incrementvalue = 100
CREATE PARTITION FUNCTION [myPf](int) AS RANGE RIGHT FOR VALUES (
    1
   ,101
   ,201
   ,301
   ,401
   ,501
   ,601
   ,701
   ,801
   ,901
   ,1001
)

Sample 2

sp_tblCreatePartitionFunction
    @pfName = 'myPf'
   ,@rangeStart = '2016-01-01'
   ,@rangeEnd = '2020-12-31'
   ,@boundaryType = 'RIGHT'
   ,@incrementvalue = 1
   ,@incrementUnit = 'YEAR'
   ,@useIntegerDates = 1
   ,@integerFormatType = 2

CREATE PARTITION FUNCTION [myPf](int) AS RANGE RIGHT FOR VALUES (
    2016
   ,2017
   ,2018
   ,2019
   ,2020
)

Sample 3

sp_tblCreatePartitionFunction
    @pfName = 'myPf'
   ,@rangeStart = '2016-01-01'
   ,@rangeEnd = '2016-12-31'
   ,@boundaryType = 'LEFT'
   ,@incrementvalue = 1
   ,@incrementUnit = 'MONTH'
   ,@useIntegerDates = 1
   ,@integerFormatType = 2
CREATE PARTITION FUNCTION [myPf](int) AS RANGE LEFT FOR VALUES (
    20160101
   ,20160201
   ,20160301
   ,20160401
   ,20160501
   ,20160601
   ,20160701
   ,20160801
   ,20160901
   ,20161001
   ,20161101
   ,20161201
)

Sample 4

sp_tblCreatePartitionFunction
    @pfName = 'myPf'
   ,@rangeStart = '2016-01-01'
   ,@rangeEnd = '2016-12-31'
   ,@boundaryType = 'LEFT'
   ,@incrementvalue = 1
   ,@incrementUnit = 'MONTH'
   ,@useIntegerDates = 0
CREATE PARTITION FUNCTION [myPf](datetime) AS RANGE LEFT FOR VALUES (
    '20160101'
   ,'20160201'
   ,'20160301'
   ,'20160401'
   ,'20160501'
   ,'20160601'
   ,'20160701'
   ,'20160801'
   ,'20160901'
   ,'20161001'
   ,'20161101'
   ,'20161201'
)