Saturday, January 12, 2008

SQL 2005 - Some quirky findings

This week I was working on a performance related issue for one of the clients. They had about 3 million records in their order table and about 7 million records in a child table. They are going to provide the tracking/ search options online so we have created a tool for them to do so. However, there is a complex security criteria as well and the search time was horrible!! Most of the searches took 100+ seconds meaning the site was useless. While trying lots of alternatives I found out couple of things:

New SQL 2005 ROWNUMBER() function isn't that useful. In most of the sites, we need to show the total record count and then return the results for the requested page. So our query will be something:

DECLARE @PageNum AS INT; 
DECLARE @PageSize AS INT;
SET @PageNum = 2;
SET @PageSize = 10;

SELECT COUNT(*) FROM [Order] WHERE [OrderDate] BETWEEN '1/1/2000' AND '1/1/2005'
WITH OrdersRN AS
(
SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, OrderID) AS RowNum
,OrderID
,OrderDate
,CustomerID
,EmployeeID
FROM dbo.Orders
WHERE [OrderDate] BETWEEN '1/1/2000' AND '1/1/2005'
)

SELECT *
FROM OrdersRN
WHERE RowNum BETWEEN (@PageNum - 1) * @PageSize + 1
AND @PageNum * @PageSize
ORDER BY OrderDate
,OrderID;


The key thing you'll notice above is - we are executing the WHERE clause twice. Once when trying to find out the total records and then the records for the page requested. If the queries go complex, this means, we are spending twice the time we should have! In the old SQL 2000 way, the way we used to write the same query as:



DECLARE @PageNum AS INT; 
DECLARE @PageSize AS INT;

DECLARE @RecordCount AS INT;
SET @PageNum = 2;
SET @PageSize = 10;

DECLARE @PagingTable TABLE ([Row#] int identity, [Id#] int)

INSERT INTO @PagingTable [Id#] (SELECT [OrderId] FROM [Order] WHERE [OrderDate] BETWEEN '1/1/2000' AND '1/1/2005' ORDER BY [Orderdate], [OrderId]

SET @RecordCount = @@ROWCOUNT

SELECT [Order].*

FROM [Order] INNER JOIN @PagingTable [PagingTable#] ON [Order].[OrderId] = [PagingTable#].[Id#]
WHERE [Row#] BETWEEN (@PageNum - 1) * @PageSize + 1
AND @PageNum * @PageSize
ORDER BY [Row#]


In this second query, if you notice, the query is run only once. This reduces the overall time it takes to return the results.



The second finding was even more intriguing. In my where clause I had something like:



WHERE [CustomerId] IN (SELECT [CustomerId] FROM [Customer] WHERE [CustomerName] LIKE 'A%'



. This was returning the results in about 100 seconds and was quite irritating. Then I tried a simple change. First I inserted the results in a temp table variable like:

DECLARE @Customers TABLE ([CustomerId] int)
INSERT INTO @Customers [CustomerId] SELECT [CustomerId] FROM [Customer] WHERE [CustomerName] LIKE 'A%'

And now my WHERE condition looks like

WHERE [CustomerId] IN (SELECT [CustomerId] FROM @Customers)

Now, the query returns the result in around ~250ms! Isn't it surprising. Anyway, this confirmed one thing - never get bogged down by what you already know! Every project, every situation is different and experimentation beyond the known always helps.

No comments: