From Unpartitioned Rowstore to Partitioned Columnstore

2024-01-07

I initially wanted to name this post "How To Partition A Heap", but that's not really what I was trying to do prior to finding the inspiration to write this. I had an existing SQL Server table that was not partitioned. It had a traditional rowstore clustered index (in the form of a primary key). I wanted to convert the table to a partitioned table with a clustered columnstore index. I ran into some obstacles and oddities along the way. Feel free to skip ahead to the summary. Otherwise, lets begin by taking a look at a partition function and partition scheme I created.

--Partition function
CREATE PARTITION FUNCTION PF_DateKey_Years(INT) 
AS RANGE RIGHT 
FOR VALUES (20100101, 20110101, 20120101, 20130101, 20140101, 20150101, 
	20160101, 20170101, 20180101, 20190101, 20200101, 20210101, 20220101, 
	20230101, 20240101, 20250101, 20260101, 20270101, 20280101, 20290101);

--Partition scheme
CREATE PARTITION SCHEME [PS_DateKey_Years] AS PARTITION [PF_DateKey_Years] 
ALL TO ([PRIMARY]);
GO

I'll use the table definition for table dbo.FactInternetSales from the sample database AdventureWorksDW. But I'll use a different database to avoid some extra steps that would otherwise be needed to deal with referential constraints.

USE CS
GO

DROP TABLE IF EXISTS dbo.FactInternetSales;
CREATE TABLE [dbo].[FactInternetSales](
	[ProductKey] [int] NOT NULL,
	[OrderDateKey] [int] NOT NULL,
	[DueDateKey] [int] NOT NULL,
	[ShipDateKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[SalesTerritoryKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [tinyint] NOT NULL,
	[RevisionNumber] [tinyint] NOT NULL,
	[OrderQuantity] [smallint] NOT NULL,
	[UnitPrice] [money] NOT NULL,
	[ExtendedAmount] [money] NOT NULL,
	[UnitPriceDiscountPct] [float] NOT NULL,
	[DiscountAmount] [float] NOT NULL,
	[ProductStandardCost] [money] NOT NULL,
	[TotalProductCost] [money] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[TaxAmt] [money] NOT NULL,
	[Freight] [money] NOT NULL,
	[CarrierTrackingNumber] [nvarchar](25) NULL,
	[CustomerPONumber] [nvarchar](25) NULL,
	[OrderDate] [datetime] NULL,
	[DueDate] [datetime] NULL,
	[ShipDate] [datetime] NULL,
 CONSTRAINT [PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber] PRIMARY KEY CLUSTERED 
(
	[SalesOrderNumber] ASC,
	[SalesOrderLineNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

A table can be partitioned by creating a clustered index on a partition scheme. But this table already has a clustered index. A table can only have one clustered index. There's no way this should work, right? As expected, it does not. (Note that I'm using the ordered columnstore index feature that is new for SQL 2022.)

CREATE CLUSTERED COLUMNSTORE INDEX Idx_cci_FactInternetSales
ON dbo.FactInternetSales ORDER (OrderDateKey)
ON PS_DateKey_Years(OrderDateKey);

Msg 35372, Level 16, State 3, Line 68
You cannot create more than one clustered index on table 'dbo.FactInternetSales'. Consider creating a new clustered index using 'with (drop_existing = on)' option.

Dropping the primary key and recreating it as nonclustered should get things moving in the right direction. Here I'll create the new PK on the partition scheme so the index will be partition aligned.

ALTER TABLE dbo.FactInternetSales DROP CONSTRAINT PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber;

ALTER TABLE dbo.FactInternetSales
ADD CONSTRAINT PK_FactInternetSales PRIMARY KEY NONCLUSTERED (SalesOrderNumber, SalesOrderLineNumber, OrderDateKey)
ON PS_DateKey_Years(OrderDateKey);

If we run the following query, we see the table is now a heap as would be expected. But it is not partitioned (Partitions = 1), whereas the nonclustered primary key is partitioned (Partitions = 21). I suppose it would not be accurate to say the primary key index is partition-aligned. But it will be...eventually.

SELECT 
	sc.name AS [Schema], t.name AS TableName, 
	si.name AS IndexName, si.index_id, si.type_desc AS IndexType,
	psc.name AS PartitionScheme,
	COUNT(*) AS Partitions
FROM sys.indexes AS si
JOIN sys.tables AS t ON si.object_id=t.object_id
JOIN sys.schemas AS sc ON t.schema_id=sc.schema_id
LEFT JOIN sys.partition_schemes AS psc ON si.data_space_id=psc.data_space_id
LEFT JOIN sys.partitions AS p ON p.object_id = si.object_id AND p.index_id=si.index_id
WHERE t.name = 'FactInternetSales'
GROUP BY sc.name, t.name, si.name, si.index_id, si.type_desc, psc.name
ORDER BY si.index_id

A second attempt to create a clustered columnstore index on the partition scheme (which would partition the table) still fails, though.

CREATE CLUSTERED COLUMNSTORE INDEX Idx_cci_FactInternetSales
ON dbo.FactInternetSales ORDER (OrderDateKey)
ON PS_DateKey_Years(OrderDateKey);

Msg 35316, Level 16, State 1, Line 79
The statement failed because a columnstore index must be partition-aligned with the base table. Create the columnstore index using the same partition function and same (or equivalent) partition scheme as the base table. If the base table is not partitioned, create a nonpartitioned columnstore index.

Could I create the clustered columnstore index *not* on the partition scheme, and then afterwards rebuild it on the partition scheme to finally get the table partitioned? There was a significant amount of data--enough that the thought of this option sort of annoyed me. But it didn't matter anyway. This attempt to create a clustered columnstore index also failed.

CREATE CLUSTERED COLUMNSTORE INDEX Idx_cci_FactInternetSales
ON dbo.FactInternetSales ORDER (OrderDateKey)

Msg 35322, Level 16, State 1, Line 79
The statement failed because a nonclustered index and the underlying clustered columnstore index must be partition-aligned. Consider creating the nonclustered index using the same partition function and same (or equivalent) partition scheme as the clustered columnstore index.

At this point I was thinking to myself "How do you partition a heap? Is that even possible?" My understanding is that a partitioned table must have a clustered index. Feeling desperate, I resorted to using the SSMS GUI and having it script out the TSQL operations. The solution? Create a clustered rowstore index for the table on the partition scheme, immediately drop it, then create the clustered columnstore index on the partition scheme. This was not intuitive to me at all. It still isn't. Let's start with the clustered rowstore index. After creating it and dropping it, when we run the previous query showing meta data for the table/indexes/partitions, we see we have a "partitioned heap" with 21 partitions. I put that in double quotes because I don't know if that's a misnomer or not.

CREATE CLUSTERED INDEX x ON dbo.FactInternetSales (OrderDateKey)
ON PS_DateKey_Years(OrderDateKey);
DROP INDEX x ON dbo.FactInternetSales;

My confusion notwithstanding, the clustered columnstore index can now be created on a partition scheme. Our table/indexes/partitions query verifies this.

CREATE CLUSTERED COLUMNSTORE INDEX Idx_cci_FactInternetSales
ON dbo.FactInternetSales ORDER (OrderDateKey)
ON PS_DateKey_Years(OrderDateKey);

Commands completed successfully.

Completion time: 2024-01-06T21:51:50.3390424-05:00

Summary

These are the steps for converting an unpartitioned clustered (via primary key) rowstore table to a partitioned clustered columnstore table:

Drop the primary key constraint.
Recreate the primary key constraint as nonclustered on a partition scheme.
Create a clustered rowstore index on the table on a partition scheme.
Immediately drop the clustered rowstore index.
Create the clustered columnstore index on a partition scheme.

Summary

Comments