Decipher_SQL_SERVER_2008.doc
SQL Server 2008 – All Blog Posts till 04/30/2008
Page 1 of 45
© Copyright Decipher Information Systems, 2008. All rights reserved.
http://decipherinfosys.wordpress.com
Last revised: Jan. 2008
Table of Contents

  • Server Consolidation - things to look for in SQL Server 2008
  • GROUPING SETS in SQL Server 2008
  • SQL Server 2008 - Table Valued Parameters
  • New Data types in SQL Server 2008 - I
  • Row Value Constructor Support in SQL Server 2008
  • New Features in SQL Server 2008 - I
  • MSFT learning portal for SQL Server 2008
  • Intellisense in SQL Server 2008
  • SQL Server 2008: INSERT over a DML statement
  • MERGE command in SQL Server 2008
  • Change Data Capture Feature in SQL Server 2008
  • Data Collector in SQL Server 2008
  • Compound Assignment Operators in SQL Server 2008
  • Back Up Compression Option in SQL Server 2008
  • FORCESEEK Table Hint in SQL Server 2008
  • HierarchyID Data Type in SQL Server 2008
  • FILESTREAM Data in SQL Server 2008
  • XML DML Enhancement in SQL Server 2008
  • Lock Escalation Changes in SQL Server 2008
  • SQL Server 2008 Deployment Guides from Dell
  • Filtered Indexes and Statistics in SQL Server 2008
  • Virtual Labs
Server Consolidation - things to look for in SQL Server 2008

We had talked about virtualization in the database world in a previous post. SQL Server 2008 has a lot more features available for doing server consolidation - here is a link to the MSFT whitepaper that breaks this down into the sections of flexibility, manageability, and scalability & performance:

http://download.microsoft.com/download/a/c/d/acd8e043-d69b-4f09-bc9e-4168b65aaa71/SQL2008SrvConsol.doc

If you have downloaded CTP 4, we would recommend that you get a head start with these features by playing with them in one of your VM environments. If you have not upgraded to SQL Server 2005, chances are that you will upgrade directly to SQL Server 2008 by the end of next year, once the product is released and has been tested in real-world implementations.

GROUP BY and the CUBE(), ROLLUP(), GROUPING() and GROUPING_ID() functions

You must already be aware of the GROUP BY clause. It is used for grouping rows by a selected set of attributes, typically to get aggregated information for that group of columns.

/* Creation of a TEST table and population of some dummy data */
CREATE TABLE TEST_TBL (COL1 INT, COL2 VARCHAR(10), COL3 INT)
GO
INSERT INTO TEST_TBL VALUES (1, 'A', 10)
INSERT INTO TEST_TBL VALUES (1, 'A', 20)
INSERT INTO TEST_TBL VALUES (1, 'A', 30)
INSERT INTO TEST_TBL VALUES (1, 'B', 90)
INSERT INTO TEST_TBL VALUES (2, 'A', 30)
INSERT INTO TEST_TBL VALUES (2, 'A', 100)
INSERT INTO TEST_TBL VALUES (3, 'C', 110)
INSERT INTO TEST_TBL VALUES (3, 'C', 120)
GO

/* Before starting, let's take a look at the data that exists in the table */
SELECT * FROM TEST_TBL
GO

COL1        COL2       COL3
----------- ---------- -----------
1           A          10
1           A          20
1           A          30
1           B          90
2           A          30
2           A          100
3           C          110
3           C          120

/* Simple example of a GROUP BY clause. The grouping is done over COL1 and COL2,
   and the aggregate functions display the count, the total, the average, and the
   max and min values. */
SELECT COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL,
       AVG(COL3) AVG_VAL, MAX(COL3) MAX_VAL, MIN(COL3) MIN_VAL
FROM TEST_TBL
GROUP BY COL1, COL2
GO

COL1        COL2       CNT_RECORD  TOTAL_VAL   AVG_VAL     MAX_VAL     MIN_VAL
----------- ---------- ----------- ----------- ----------- ----------- -----------
1           A          3           60          20          30          10
2           A          2           130         65          100         30
1           B          1           90          90          90          90
3           C          2           230         115         120         110

/* Now, let us take the same SQL and use the ROLLUP() function in addition
   to the GROUP BY clause: */
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO

Usage of the ROLLUP() function generates the GROUP BY aggregate rows PLUS super-aggregate (cumulative) rows and a final grand-total row as well. As you can see below, one sub-total row is generated for each unique combination of values of (COL1, COL2) and of (COL1). Note that the order of the columns in the ROLLUP() function can change the output as well as the number of rows in the final result set.

COL1        COL2       TOTAL_VAL
----------- ---------- -----------
1           A          60          ==> same as in the GROUP BY output above
1           B          90          ==> same as in the GROUP BY output above
1           NULL       150         ==> aggregation of the rows above
2           A          130         ==> same as in the GROUP BY output above
2           NULL       130         ==> aggregation of the row above
3           C          230         ==> same as in the GROUP BY output above
3           NULL       230         ==> aggregation of the row above
NULL        NULL       510         ==> the grand total

The above SQL can also be written as:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY COL1, COL2 WITH ROLLUP

/* Now, let us take the same SQL and use the CUBE() function in addition
   to the GROUP BY clause: */
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY CUBE (COL1, COL2)
GO

CUBE(), as the name suggests, generates data for every possible grouping of the listed expressions, i.e. since we are using the CUBE() function on 2 columns in our example, we get 2^2 = 4 grouping sets:

(COL1, COL2)
(COL1)
(COL2)
()          - the empty grouping set, i.e. the grand total

Thus, one row is produced for each unique combination of values within each grouping set, and an aggregated grand-total row is produced with NULL values in the other columns. You can see the output below:

COL1        COL2       TOTAL_VAL
----------- ---------- -----------
1           A          60
2           A          130
NULL        A          190
1           B          90
NULL        B          90
3           C          230
NULL        C          230
NULL        NULL       510
1           NULL       150
2           NULL       130
3           NULL       230
And as is obvious from the output, the order of the columns has no bearing on the rows returned in the case of the CUBE() function. The above SQL can also be written as:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY COL1, COL2 WITH CUBE

In order to distinguish the NULL values that are returned because of the usage of the ROLLUP() and/or CUBE() functions from actual NULL values in the table, one can make use of the GROUPING() function. Example:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL, GROUPING(COL2) AS GRP_VALUE
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO

COL1        COL2       TOTAL_VAL   GRP_VALUE
----------- ---------- ----------- ---------
1           A          60          0
1           B          90          0
1           NULL       150         1
2           A          130         0
2           NULL       130         1
3           C          230         0
3           NULL       230         1
NULL        NULL       510         1

As seen above, wherever GRP_VALUE is 1, those are the rows that were generated by the ROLLUP() function. Since the GROUPING() function takes only one argument, we passed it COL2: every super-aggregate row produced by ROLLUP(COL1, COL2) has COL2 rolled up, so GROUPING(COL2) flags all of the generated rows. In the case of the CUBE() function, you can use the GROUPING() function once per column to pick out those rows.

Another function to be aware of is the GROUPING_ID() function, which can be used to compute the level of grouping. With two columns, as in our example, GROUPING_ID(COL1, COL2) is the integer formed by concatenating the GROUPING() bits of its arguments, i.e. GROUPING(COL1) * 2 + GROUPING(COL2):

Column(s) rolled up     GROUPING bits (COL1, COL2)   GROUPING_ID(COL1, COL2)
COL2                    01                           1
COL1                    10                           2
COL1 and COL2           11                           3

Now, let us see this with the actual SQL execution:
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL,
       GROUPING(COL1) AS C1, GROUPING(COL2) AS C2,
       GROUPING_ID(COL1, COL2) AS GRP_ID_VALUE
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO

COL1        COL2       TOTAL_VAL   C1   C2   GRP_ID_VALUE
----------- ---------- ----------- ---- ---- ------------
1           A          60          0    0    0
1           B          90          0    0    0
1           NULL       150         0    1    1
2           A          130         0    0    0
2           NULL       130         0    1    1
3           C          230         0    0    0
3           NULL       230         0    1    1
NULL        NULL       510         1    1    3

In the next post, we will talk about the new GROUPING SETS feature that has been introduced in SQL Server 2008. Using GROUPING SETS, one can aggregate only the specified groups instead of the full set of aggregations generated by the CUBE() or ROLLUP() functions.

GROUPING SETS in SQL Server 2008

Yesterday, we covered the GROUP BY clause and the usage of the ROLLUP(), CUBE(), GROUPING() and GROUPING_ID() functions within the GROUP BY clause. In today's post, we will look at one of the enhancements in this area in SQL Server 2008: GROUPING SETS. This clause lets you specify multiple grouping sets in the same query. You can achieve the same functionality in prior versions by writing multiple queries and clubbing the result sets together with UNION ALL. With GROUPING SETS, though, you can accomplish the same thing with much less code, and the query is more efficient as well. This can be very useful for reporting purposes when you want to visualize the data distribution based on different sets of groups.

Considering the same example as in yesterday's post (see that post for the DDL): if we want to see the totals for these groups:

(COL1, COL2)
(COL2)
(COL1)
()          - no grouping

then, in prior versions, we can write the SQL as:

SELECT COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL
FROM TEST_TBL
GROUP BY COL1, COL2
UNION ALL
SELECT NULL AS COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL
FROM TEST_TBL
GROUP BY COL2
UNION ALL
SELECT COL1, NULL AS COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL
FROM TEST_TBL
GROUP BY COL1
UNION ALL
SELECT NULL AS COL1, NULL AS COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL
FROM TEST_TBL
GO

This will yield:

COL1        COL2       CNT_RECORD  TOTAL_VAL
----------- ---------- ----------- -----------
1           A          3           60
2           A          2           130
1           B          1           90
3           C          2           230
NULL        A          5           190
NULL        B          1           90
NULL        C          2           230
1           NULL       4           150
2           NULL       2           130
3           NULL       2           230
NULL        NULL       8           510

In SQL Server 2008, you can achieve this very easily by using the GROUPING SETS functionality:

SELECT COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL
FROM TEST_TBL
GROUP BY GROUPING SETS
(
  (COL1, COL2),
  (COL2),
  (COL1),
  ()
)
GO
COL1        COL2       CNT_RECORD  TOTAL_VAL
----------- ---------- ----------- -----------
1           A          3           60
2           A          2           130
NULL        A          5           190
1           B          1           90
NULL        B          1           90
3           C          2           230
NULL        C          2           230
NULL        NULL       8           510
1           NULL       4           150
2           NULL       2           130
3           NULL       2           230

It is the same data, just ordered differently because of the grouping. The different grouping sets are separated by commas, and the elements within a grouping set are also separated by commas. In addition, one can have multiple GROUPING SETS in the same GROUP BY clause.

SQL Server 2008 - Table Valued Parameters

We have blogged on the upcoming features in SQL Server 2008 (code named Katmai) in some of our previous posts; you can search for "SQL Server 2008" on the blog site to see the other posts. We will keep posting new articles as and when we play with the new features (likewise for Oracle 11g). In this post, we will talk about another good T-SQL feature in SQL Server 2008: table-valued parameters.

In order to pass an array into a stored procedure (see previous post - here), one has to use XML in SQL Server 2005 and then de-construct it using XQuery features or OPENXML. This works, but it is not simple. SQL Server 2008 now introduces table-valued parameters, which can be passed into stored procedures. One no longer needs to de-construct and parse through the XML - one can just pass a table full of data, i.e. an array, into a stored procedure, and the code can simply use it in its processing logic. Let's see this feature with an example to make it a bit more clear:

CREATE TYPE TestType AS TABLE
(
  EMP_NAME  VARCHAR(30) NULL,
  HIRE_DATE DATETIME NULL,
  MGR_NO    INT NULL
)
GO

And then, I can simply pass this into my procedure:

create proc my_test_proc
(
  @tbl TESTTYPE READONLY
)
as
begin
  -- your logic will go over here.
  -- you can use the TVP (Table Valued Parameter)
  -- as a regular table and do the READ-ONLY operations on it
end
go

All these types are visible under the sys.table_types system catalog. Please remember that this data set is read-only, hence the READONLY clause in the procedure definition. In addition, these table variables cannot be used as OUTPUT parameters - they can only be used as input parameters.

In order to use this stored procedure, I can declare a variable of that type, populate it with data and then pass it in:

DECLARE @TestTbl AS TestType

Now, I can insert the data into this table variable:

insert into @TestTbl
select emp_name, hire_date, mgr_no
from emp_master
where emp_id < 1000

And once I am done, I can then pass this into the stored procedure:

exec my_test_proc @tbl = @TestTbl;

When table variables are passed in as parameters, the table resides in tempdb and not in memory, which means that we can use larger result sets with relative ease. We will benchmark this in our test environments and post the results in a later blog post. This type of functionality has existed in Oracle for quite some time now, so Oracle professionals who also work in SQL Server will find this to be a welcome addition.
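To make the procedure body above a little more concrete, here is a sketch of what the logic inside it might look like. This is purely illustrative - dept_roster is a hypothetical target table that is not part of the original example, while TestType, my_test_proc and the emp_master columns are carried over from above:

```sql
-- Illustrative body for my_test_proc: the TVP behaves like a regular
-- (read-only) table - it can be SELECTed from, JOINed, filtered, etc.
-- dept_roster is a made-up table for this sketch.
create proc my_test_proc
(
  @tbl TESTTYPE READONLY
)
as
begin
  insert into dept_roster (emp_name, hire_date, mgr_no)
  select t.EMP_NAME, t.HIRE_DATE, t.MGR_NO
  from @tbl t
  where t.HIRE_DATE is not null;   -- any read-only predicate is fine
end
go
```

Note that because the parameter is READONLY, an UPDATE or DELETE against @tbl inside the procedure would fail at compile time.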
New Data types in SQL Server 2008 - I

SQL Server 2008 has quite a few new data types. We will cover them in two parts as we start playing with them in our test environments.

1) DATE and TIME data types: Currently, SQL Server supports datetime and smalldatetime, but there is no data type that holds just the DATE part or just the TIME part. Some data modelers consider this de-normalized, since the same field is being used to represent both the date and the time. Moreover, in certain situations one needs to record different times related to a given date - for example, trucks can take off from the warehouse at 8:00 A.M., 10:00 A.M., 12:00 P.M. and so on for a given day - and one had to store the date repeatedly just to capture the time-specific information. That is no longer the case with the introduction of these two new data types in SQL Server 2008. In prior versions, one also needed to apply the CONVERT() function in order to retrieve only the part one needed; that won't be necessary anymore. The TIME data type has an accuracy of 100 nanoseconds.

2) DATETIME2 data type: The name is funny - MSFT could have come up with a better one (it does remind me of the Oracle VARCHAR and VARCHAR2 data types :-)). This data type extends the granularity of the time portion. As you might already know, the granularity of the time in the datetime data type is 1/300th of a second, i.e. you can store only .000, .003 or .007 seconds. This new data type extends that to 100 nanoseconds, just like the TIME data type. The fractional part can be specified by declaring the variable or column as datetime2(n), where n is the precision, with valid values from 0 to 7.

3) DATETIMEOFFSET(n): In order to support time zones in a better way, MSFT has introduced the DATETIMEOFFSET(n) data type. (Oracle provides this support using its TIMEZONE family of data types.)
In SQL Server 2008, one can use it to provide better support for applications that require time-zone awareness. Example: a warehouse in the Eastern time zone whose data center is in the Central time zone, with people entering data from the Pacific time zone, where the data is time-zone sensitive because of shipping dates and times; likewise, a voting system with cut-off timelines for the voters.

There are more new data types: better user-defined type support, HIERARCHYID (for better support for hierarchical data), the spatial data types (GEOMETRY and GEOGRAPHY), support for table TYPES (this feature has been available in Oracle for ages and will help make array processing and nesting of procedures a lot easier), and FILESTREAM (for better binary data storage). There have also been changes to the NTEXT, TEXT and IMAGE data types, which we will cover in the next part.

Row Value Constructor Support in SQL Server 2008

One of the new additions being made to SQL Server 2008 is support for row value constructors. I have not tested it completely for all the scenarios, but here are some examples:
insert into dbo.decipher_test (col1, col2)
values (1, 10),
       (2, 20),
       (3, 30)

Likewise, in an update statement:

update decipher_test
set (col1, col2) = (select col3, col4 from dbo.test where col3 = decipher_test.col1)
where colx > 10

Similarly, in a select statement:

select * from dbo.decipher_test
where (col1, col2) in (select col3, col4 from dbo.test)

UPDATE (SET and WHERE clause) and SELECT statement (WHERE clause) support for row value constructors has existed in other RDBMSs like Oracle; DB2 LUW, MySQL and PostgreSQL support row value constructors for the insert statement as well. It is good to see that SQL Server is moving in that direction too - this is in fact an ANSI SQL standard. There are other good T-SQL features also coming our way in the next release (like the much-awaited MERGE command and better support for the analytic functions - e.g. better support for the OVER clause for ranking functions and aggregates) that we will be covering in our future blog posts.

New Features in SQL Server 2008 - I

We have blogged before about the new upcoming release of SQL Server; you can search for "Katmai" or "SQL Server 2008" on this blog for more information. There are a lot of new features being introduced in this release, as well as improvements to existing features like the SSIS pipeline improvements and SSIS persistent look-ups. In our blog posts, we will start covering some of those new features as we start playing with the CTPs.

One of the new features that is going to be introduced is the Resource Governor. The Resource Governor allows the DBA to define resource limits and priorities for different workloads, which helps concurrent workloads provide consistent performance to the applications. This feature has been available in Oracle for quite some time now and has been available in IIS 6.0 as well.
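Based on the early CTP documentation, a sketch of how such policies might be defined is shown below. All of the object names here are illustrative, and the syntax may change before release - the general shape is: create a resource pool with limits, create a workload group in that pool, and register a classifier function that routes sessions into groups:

```sql
-- Sketch only: names (reportPool, reportGroup, report_user) are made up.
CREATE RESOURCE POOL reportPool WITH (MAX_CPU_PERCENT = 25);

CREATE WORKLOAD GROUP reportGroup
WITH (IMPORTANCE = LOW)
USING reportPool;
GO

-- Classifier function (created in master): maps each new session
-- to a workload group name.
CREATE FUNCTION dbo.rg_classifier() RETURNS SYSNAME
WITH SCHEMABINDING
AS
BEGIN
  DECLARE @grp SYSNAME = 'default';
  IF SUSER_SNAME() = 'report_user'
    SET @grp = 'reportGroup';
  RETURN @grp;
END
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```

With something like this in place, sessions from the reporting login would be throttled to the pool's CPU limit while other workloads keep their resources.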
This is a very nice feature to have, since you will now be able to allocate and limit the usage of SQL Server resources based on priorities and the workload needs of different applications. Applications that allow ad-hoc query formation against the databases (decision-support systems or ad-hoc end-user reporting, for example) will benefit from it, and if the instance is being used for several applications, one can throttle usage by creating policies. You should be able to do this for an application or a user. More details from MSFT are still to come through, but if this feature is going to be anything like Oracle's Database Resource Manager, then one should be able to get these benefits (this is just a wish list):

1) Should be able to set policies based on different criteria, like the machine the connections are coming from, the type of application, the user name, a group of users, etc.
2) After policy configuration, one should be able to guarantee specific applications and users a minimum amount of processing resources, regardless of the load on the system and the number of users on the system.

3) One should be able to set up policies to distribute available processing resources by allocating percentages of CPU time to different applications and/or different users. For example, in a data-warehousing application, a higher percentage of the processing power can be given to the ROLAP (Relational OLAP) application as compared to the batch jobs that run only during the night.

4) One should be able to decrease the degree of parallelism in the execution plans generated for any operation performed by a user or members of a group of users.

5) Dynamic management of changes to the allocation of resources, without requiring a shutdown or restart.

6) Throttle and log the execution of operations that violate the policy settings. Proper error reporting and logging is necessary for troubleshooting the issues.

7) Should allow for automatic switching of users from one group to a fallback group based on the criteria defined in the policies. Say a higher priority is assigned to a particular application or user, and a session executed by that application runs for more than the specified threshold; that session should automatically switch to another group with different resource requirements. This ensures that the higher-priority tasks run to completion even after they have reached their threshold.

8) Should allow for creation of thresholds on connections per application. Any additional connections should queue, with a timeout limit that should be configurable.

MSFT learning portal for SQL Server 2008

SQL Server 2008 is slated to be released early next year.
If you search on SQL Server 2008 on our blog, you will get a good list of blog posts, and we keep adding new ones as we play around with the feature sets. The latest CTP for that version is CTP5. MSFT has also put together a learning portal for that release of SQL Server, and you can access it here:

http://www.microsoft.com/learning/sql/2008/default.mspx

Intellisense in SQL Server 2008

With the CTP5 release comes Intellisense in SSMS (SQL Server Management Studio). Here is an image depicting the basic functionality. (The screenshot is not reproduced in this transcript.)
It is a really neat feature and can also help in learning the new objects in the system. A colleague of mine, though, absolutely hates it and wanted to turn it off. So he went to Tools/Options and, under Text Editor, set those values to off. Well, when he came in the next day and restarted SSMS, there it was again - all his changes had been reverted. He then started toying with the registry and went directly to the registry setting that is toggled by those GUI options:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Tools\Shell\Languages\Language Services\SQL

and set the CodeSense value to 0. If this setting is 1, Intellisense is turned on by default.
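If you prefer not to edit the registry by hand, the same change can be captured in a .reg file and merged (a sketch based on the path above; as with any registry change, back up first):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Tools\Shell\Languages\Language Services\SQL]
"CodeSense"=dword:00000000
```

Setting the dword back to 00000001 re-enables Intellisense.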
Again, it is a personal-preference thing. I believe it is a great feature that will help everyone a great deal - I always loved this feature when doing development in Visual Studio or doing PL/SQL development in PL/SQL Developer.

SQL Server 2008: INSERT over a DML statement

In this post, we are going to cover another T-SQL enhancement in SQL Server 2008. We will be using the same set of tables as in one of our previous blog posts, in which we talked about the MERGE command in SQL Server 2008. Here is the setup:

CREATE TABLE PRODUCT_CATALOG
(PRODUCT_NAME NVARCHAR(100) NOT NULL,
 QTY INT NOT NULL CHECK (QTY > 0));

CREATE TABLE PRODUCT_SALE
(PRODUCT_NAME NVARCHAR(100) NOT NULL,
 SALE_QTY INT NOT NULL);

INSERT INTO PRODUCT_CATALOG (PRODUCT_NAME, QTY) VALUES ('TOY1', 100);
INSERT INTO PRODUCT_CATALOG (PRODUCT_NAME, QTY) VALUES ('TOY2', 50);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY1', 100);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY2', 10);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY3', 500);
GO

If you read that post, you will see that we used the MERGE command to do the DML (delete, insert, update) logic on the PRODUCT_CATALOG table. This was the first example of the MERGE command that we had used in that post:

MERGE PRODUCT_CATALOG PC /* target table */
USING PRODUCT_SALE PS /* source table */
ON PC.PRODUCT_NAME = PS.PRODUCT_NAME
WHEN MATCHED AND (QTY - SALE_QTY = 0) THEN
  DELETE
WHEN MATCHED THEN /* update stock if you still hold some stock */
  UPDATE SET QTY = QTY - SALE_QTY
WHEN NOT MATCHED THEN /* insert a row if the stock is newly acquired */
  INSERT VALUES (PRODUCT_NAME, SALE_QTY)
/* output details of INSERT/UPDATE/DELETE operations made on the target table */
OUTPUT $action, inserted.PRODUCT_NAME, inserted.QTY, deleted.PRODUCT_NAME, deleted.QTY;
Now, suppose that you want the output of this to be put into a separate audit or processing table, but only when an update is done or when a certain other condition is met. It is now possible to do a SELECT atop the MERGE command and insert its results. We will first need to create a table for storing that data:

CREATE TABLE TEST_INSERT_OVER_DML
(ACTION_CODE NVARCHAR(50),
 I_PRODUCT_NAME NVARCHAR(100),
 I_QTY INT,
 D_PRODUCT_NAME NVARCHAR(100),
 D_QTY INT)
GO

/* And now the INSERT over the DML statement */
INSERT INTO TEST_INSERT_OVER_DML
(ACTION_CODE, I_PRODUCT_NAME, I_QTY, D_PRODUCT_NAME, D_QTY)
select Action_Code, Inserted_Product_Name, Inserted_QTY, Deleted_Product_Name, Deleted_Qty
from
(
  MERGE PRODUCT_CATALOG PC /* target table */
  USING PRODUCT_SALE PS /* source table */
  ON PC.PRODUCT_NAME = PS.PRODUCT_NAME
  WHEN MATCHED AND (QTY - SALE_QTY = 0) THEN
    DELETE
  WHEN MATCHED THEN /* update QTY if you still hold some QTY for that product */
    UPDATE SET QTY = QTY - SALE_QTY
  WHEN NOT MATCHED THEN /* insert a row if the product is newly acquired */
    INSERT VALUES (PRODUCT_NAME, SALE_QTY)
  /* output details of INSERT/UPDATE/DELETE operations made on the target table */
  OUTPUT $action, inserted.PRODUCT_NAME, inserted.QTY, deleted.PRODUCT_NAME, deleted.QTY
) Change (Action_Code, Inserted_Product_Name, Inserted_QTY, Deleted_Product_Name, Deleted_Qty);

And after executing it, let's take a look at the data in the PRODUCT_CATALOG and the TEST_INSERT_OVER_DML tables:

/* PRODUCT_CATALOG */
PRODUCT_NAME QTY
------------ -----------
TOY2         40
TOY3         500
/* TEST_INSERT_OVER_DML */
ACTION_CODE  I_PRODUCT_NAME  I_QTY       D_PRODUCT_NAME  D_QTY
-----------  --------------  ----------- --------------- -----------
INSERT       TOY3            500         NULL            NULL
DELETE       NULL            NULL        TOY1            100
UPDATE       TOY2            40          TOY2            50

As you can see from above, the action history log has been captured: the DELETE, INSERT and UPDATE actions took place on the PRODUCT_CATALOG table, since the product TOY1 got deleted, the QTY for TOY2 got decremented by 10, and the TOY3 record got added. This new feature will help a lot in data-warehousing ETL operations and also for preparing audit logs for DML operations, though CDC (Change Data Capture) would be a better option for auditing.

MERGE command in SQL Server 2008

The MERGE command (also known as UPSERT in Oracle circles) is nothing new to Oracle - it has existed for quite some time, and we have blogged about it on our site, including the enhancements made to that command in Oracle 10g. It is being introduced in SQL Server 2008 as well, which is a welcome addition for the data-warehousing/data-mining community as well as those writing complicated feeds into their OLTP systems. This command is far more powerful than just an UPDATE/DELETE/INSERT combined into one single statement. Let's set up two tables and then start going through the features of this command to see how one can utilize it.
/* We are setting up two tables - one for the product catalog
   and the other for the sales of the products */
SET NOCOUNT ON
GO
CREATE TABLE PRODUCT_CATALOG
(PRODUCT_NAME NVARCHAR(100) NOT NULL,
 QTY INT NOT NULL CHECK (QTY > 0));

CREATE TABLE PRODUCT_SALE
(PRODUCT_NAME NVARCHAR(100) NOT NULL,
 SALE_QTY INT NOT NULL);

INSERT INTO PRODUCT_CATALOG (PRODUCT_NAME, QTY) VALUES ('TOY1', 100);
INSERT INTO PRODUCT_CATALOG (PRODUCT_NAME, QTY) VALUES ('TOY2', 50);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY1', 100);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY2', 10);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY3',
500);
GO

/* Apply changes to the PRODUCT_CATALOG table based on daily sales that are
   tracked in the incoming PRODUCT_SALE table from different stores.
   Delete the record from the PRODUCT_CATALOG table if all the items for that
   PRODUCT_NAME have been sold; update and decrement the quantity if the
   quantity after the sale is not 0; and insert a new record if a new item
   was introduced at one of the regional stores. */
MERGE PRODUCT_CATALOG PC /* target table */
USING PRODUCT_SALE PS /* source table */
ON PC.PRODUCT_NAME = PS.PRODUCT_NAME
WHEN MATCHED AND (QTY - SALE_QTY = 0) THEN
  DELETE
WHEN MATCHED THEN /* update QTY if you still hold some QTY for that product */
  UPDATE SET QTY = QTY - SALE_QTY
WHEN NOT MATCHED THEN /* insert a row if the product is newly acquired */
  INSERT VALUES (PRODUCT_NAME, SALE_QTY)
/* output details of INSERT/UPDATE/DELETE operations made on the target table */
OUTPUT $action, inserted.PRODUCT_NAME, inserted.QTY, deleted.PRODUCT_NAME, deleted.QTY;

$action    PRODUCT_NAME  QTY     PRODUCT_NAME  QTY
---------- ------------- ------  ------------- ------
INSERT     TOY3          500     NULL          NULL
DELETE     NULL          NULL    TOY1          100
UPDATE     TOY2          40      TOY2          50

SELECT * FROM PRODUCT_CATALOG;
GO

PRODUCT_NAME QTY
------------ -----------
TOY2         40
TOY3         500

This is a very simple example to illustrate the functionality of the MERGE command; the comments given above are self-explanatory. Let's move on to other advanced features of this wonderful enhancement. In the example above, you saw that we are joining two tables. There is no reason why we cannot join the PRODUCT_CATALOG table with a sub-query instead:
Suppose we had another table called SALE_ORDER in which we kept track of the sales of all the products per store and the quantity that was sold. In that scenario, we will get the total sale quantity by summing it up over the PRODUCT_NAME for a given date.

CREATE TABLE SALE_ORDER
(STORE_NBR INT NOT NULL,
 PRODUCT_NAME NVARCHAR(100) NOT NULL,
 SALE_QTY INT NOT NULL,
 ORDER_DATE DATETIME NOT NULL DEFAULT GETDATE());

INSERT INTO SALE_ORDER (STORE_NBR, PRODUCT_NAME, SALE_QTY) VALUES (1, 'TOY1', 10);
INSERT INTO SALE_ORDER (STORE_NBR, PRODUCT_NAME, SALE_QTY) VALUES (2, 'TOY1', 20);
INSERT INTO SALE_ORDER (STORE_NBR, PRODUCT_NAME, SALE_QTY) VALUES (3, 'TOY1', 50);
GO

And let us drop and re-create the two tables from above. Please note that in the query below, we are using an aggregated sub-query (the derived table aliased as IV) as the source of the MERGE:

MERGE PRODUCT_CATALOG PC
USING (SELECT PS.PRODUCT_NAME, SUM(SO.SALE_QTY) AS SALE_QTY
         FROM PRODUCT_SALE PS
        INNER JOIN SALE_ORDER SO
           ON SO.PRODUCT_NAME = PS.PRODUCT_NAME
        WHERE SO.ORDER_DATE >= CONVERT(VARCHAR(10), GETDATE(), 101)
          AND SO.ORDER_DATE < CONVERT(VARCHAR(10), GETDATE() + 1, 101)
        GROUP BY PS.PRODUCT_NAME) AS IV (PRODUCT_NAME, SALE_QTY)
ON PC.PRODUCT_NAME = IV.PRODUCT_NAME
WHEN MATCHED AND (QTY - SALE_QTY = 0) THEN
  DELETE
WHEN MATCHED THEN /* update QTY if you still hold some QTY for that product */
  UPDATE SET QTY = QTY - SALE_QTY
WHEN NOT MATCHED THEN /* insert a row if the product is newly acquired */
  INSERT VALUES (PRODUCT_NAME, SALE_QTY)
/* output details of the INSERT/UPDATE/DELETE operations made on the target table */
OUTPUT $action, inserted.PRODUCT_NAME, inserted.QTY, deleted.PRODUCT_NAME, deleted.QTY;

$action    PRODUCT_NAME  QTY         PRODUCT_NAME  QTY
---------- ------------- ----------- ------------- -----
UPDATE     TOY1          20          TOY1          100

SELECT * FROM PRODUCT_CATALOG;
GO
PRODUCT_NAME QTY
------------ -----------
TOY1         20
TOY2         50

In the next post, we will go over another T-SQL enhancement in SQL Server 2008 - the enhancement to the INSERT statement: one can insert over a DML statement, and one can also have minimally logged insert operations (same as direct path inserts in the case of Oracle).

Change Data Capture Feature in SQL Server 2008

We have been blogging for quite some time now on the new feature sets of the upcoming release of SQL Server - SQL Server 2008. You can search for SQL Server 2008 on this blog and get to those posts. In this post, we will cover another new feature called "Change Data Capture". Using this feature, one can capture the DML activities (insert, update and delete) that take place against tables. Two steps need to happen in order for this to work: a login that has sysadmin rights first needs to enable the database for change data capture, and then a member of the db_owner database role needs to enable the tables for which the DML activities need to be captured. Let's first enable the database and then a table to demonstrate this, and then we will go over how the capture process works internally.

On our instance, currently none of the databases is configured for CDC (Change Data Capture). You can see that from the IS_CDC_ENABLED column of SYS.DATABASES:

USE MASTER
GO
SELECT IS_CDC_ENABLED, NAME FROM SYS.DATABASES
GO

IS_CDC_ENABLED NAME
-------------- -------------------
0              master
0              tempdb
0              model
0              msdb
0              MDW
0              ReportServer
0              ReportServerTempDB
0              AdventureWorks
0              DECIPHER_TEST

Now, in order to enable it for the DECIPHER_TEST database, we have to run the stored procedure "sys.sp_cdc_enable_db_change_data_capture" in the context of the DECIPHER_TEST database:
USE DECIPHER_TEST
GO
EXEC sys.sp_cdc_enable_db_change_data_capture
GO

Now, if we execute the first query again, we will see that DECIPHER_TEST has been enabled for CDC:

IS_CDC_ENABLED NAME
-------------- -------------------
0              master
0              tempdb
0              model
0              msdb
0              MDW
0              ReportServer
0              ReportServerTempDB
0              AdventureWorks
1              DECIPHER_TEST

So, what exactly did SQL Server do when we enabled the CDC option? It went ahead and created the cdc schema, the cdc user, the meta-data tables and the other system objects that will be used for this process. You can look at the image below to see some of the objects that were created.
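If you prefer a query over the object explorer, the objects created in the cdc schema can also be listed from the catalog views - a minimal sketch using the standard sys.objects view and the SCHEMA_ID function:

```sql
-- List the objects that enabling CDC created in the cdc schema
SELECT name, type_desc
  FROM sys.objects
 WHERE schema_id = SCHEMA_ID('cdc')
 ORDER BY type_desc, name;
```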
Now that we have enabled the database, we will enable the TEST_TBL table in the dbo schema for the CDC process. In order to do so, a member of the db_owner fixed database role needs to create a capture instance for the TEST_TBL table. We need to first ensure that the SQL Server Agent is running, and then use the "sys.sp_cdc_enable_table_change_data_capture" procedure:

USE DECIPHER_TEST
GO
EXECUTE sys.sp_cdc_enable_table_change_data_capture
    @source_schema = N'dbo'
  , @source_name = N'TEST_TBL'
  , @role_name = N'cdc_Admin'
GO

This execution creates two jobs, which you will see in the informational messages that are part of this execution:

Job 'cdc.DECIPHER_TEST_capture' started successfully.
Job 'cdc.DECIPHER_TEST_cleanup' started successfully.
In addition, if you want to track which tables have been marked for the CDC process, you can use this SQL (the ones marked with 1 are the ones that are tracked):

select is_tracked_by_cdc, name, type from sys.tables where type = 'U'
go

is_tracked_by_cdc name              type
----------------- ----------------- ----
0                 ddl_history       U
0                 lsn_time_mapping  U
0                 captured_columns  U
0                 index_columns     U
0                 dbo_TEST_TBL_CT   U
1                 TEST_TBL          U
0                 systranschemas    U
0                 change_tables     U

By creating a capture instance, a change table is also created by this process. There are a few options that can be utilized when specifying the capture process for a table:

a) You can specify which columns you want to capture,
b) You can specify an index to help uniquely identify the rows in the change table,
c) You can specify the name of a filegroup in which the change table should be created, and
d) There is a concept of net change tracking with CDC. By default, only one UDF is created to access the data in the change table: cdc.fn_cdc_get_all_changes_<name of the capture instance>. In our example from above, the name of the UDF is cdc.fn_cdc_get_all_changes_dbo_TEST_TBL. You can also use the parameter @supports_net_changes and assign it a value of 1 to create another UDF which will return only one change for each row that was changed in a given interval.

So, essentially, when DML operations are made against the source tables, they are recorded in the transaction log of the database. The CDC capture process that we just set up then reads the log records and inserts the modifications made to the captured columns into the corresponding change table(s) in the cdc schema. Now, let us follow it up with an example of DML operations against the TEST_TBL table. If you recall from previous posts in which we had covered some
other T-SQL features of SQL Server 2008 (here and here), this table has three columns - COL1, COL2 and COL3 - and this set of data:

SELECT * FROM TEST_TBL
GO

COL1        COL2       COL3
----------- ---------- -----------
1           A          10
1           A          20
1           A          30
1           B          90
2           A          30
2           A          100
3           C          110
3           C          120

Now, we will execute these DML statements against the table:

delete from TEST_TBL;
insert into TEST_TBL values (1, 'A', 1000);
update dbo.TEST_TBL set COL2 = 'X' where col1 = 1;

This should have created the log entries in the change table in the cdc schema. Let's query that using the UDF that was created for us when we set up the CDC process:

DECLARE @start_time datetime,
        @end_time datetime,
        @from_lsn binary(10),
        @to_lsn binary(10);
SET @start_time = GETDATE()-1
SET @end_time = GETDATE();
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @start_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

/****************************************************************
Show the changes that happened during that query window
*****************************************************************/
SELECT __$start_lsn, __$seqval,
       case __$operation
         when 1 then 'Delete'
         when 2 then 'Insert'
         when 3 then 'Update (Before)'
         when 4 then 'Update (After)'
       end as Operation,
       Col1, Col2, Col3
  FROM cdc.fn_cdc_get_all_changes_dbo_test_tbl (@from_lsn, @to_lsn, 'all update old')
 ORDER BY 1, 2;
GO

__$start_lsn           __$seqval              Operation       Col1        Col2       Col3
---------------------- ---------------------- --------------- ----------- ---------- -----------
0x00000019000001CF000B 0x00000019000001CF0002 Delete          1           A          10
0x00000019000001CF000B 0x00000019000001CF0003 Delete          1           A          20
0x00000019000001CF000B 0x00000019000001CF0004 Delete          1           A          30
0x00000019000001CF000B 0x00000019000001CF0005 Delete          1           B          90
0x00000019000001CF000B 0x00000019000001CF0006 Delete          2           A          30
0x00000019000001CF000B 0x00000019000001CF0007 Delete          2           A          100
0x00000019000001CF000B 0x00000019000001CF0008 Delete          3           C          110
0x00000019000001CF000B 0x00000019000001CF0009 Delete          3           C          120
0x00000019000001D20004 0x00000019000001D20002 Insert          1           A          1000
0x00000019000001D30006 0x00000019000001D30002 Update (Before) 1           A          1000
0x00000019000001D30006 0x00000019000001D30002 Update (After)  1           X          1000

You can imagine the benefits of the CDC feature - fine-grained auditing, and enhanced functionality for an ETL application that needs to incrementally populate a data mart. There are, of course, a lot of other things that need to be discussed pertaining to CDC - what happens when the source table gets modified by a DDL command (change of data type, addition/deletion of a column etc.), and the overhead of having a CDC process - is it any different than using a DML trigger? (The answer is yes - we will cover it in a future blog post.)

Data Collector in SQL Server 2008

This is a wonderful new feature in SQL Server 2008. Data Collector is a component that is installed in a SQL Server instance and that can either be configured to run on a schedule you define or run all the time, collecting different sets of data pertaining to performance diagnostics, historical data for baseline comparisons, policy-based data etc. It then stores this data in a relational database that is called the management data warehouse.
So, how is it different from just using SQL Trace and perfmon? It provides a central point for data collection across servers and across applications; it allows you to control what gets collected, to report over it, and to build a data warehouse of this data for baseline and trend analysis; and it allows you to extend it further using an API.
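Once the collector is configured, the collection sets and their status can also be inspected from T-SQL through the collector catalog views in msdb - a sketch (column names per the syscollector catalog views):

```sql
-- List the system and user-defined collection sets and whether they are running
SELECT collection_set_id, name, collection_mode, is_running
  FROM msdb.dbo.syscollector_collection_sets;
```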
Let's first see where it appears in SSMS and then how we configure it. After that, in subsequent posts, we will cover the different terminologies and usage bits for this wonderful feature. In the image below, you can see that once you connect to the instance, you will see the "Data Collection" icon under the Management tree:

Since this is the first time we are doing it, we will need to configure the management data warehouse. So, right-click and select "Configure Management Data Warehouse". This brings up the wizard, and in the second image below you will see that you need to specify the instance, the new database (we are calling it CMDB) and a cache directory location.
The third image below shows the screen where you need to map an existing login (you can also create a new one from this screen) to the roles within the database. You can see the descriptions of those roles in the image below.
Once that is done, the wizard will do the configuration and also provide you with the output of the work done. In subsequent posts, we will start looking at the data collection sets, reporting and other features of Data Collector.

Compound Assignment Operators in SQL Server 2008

Continuing the series on the new features in SQL Server 2008, here is another small tidbit. SQL Server 2008 (tested in CTP5) supports compound assignment operators. Here is a working example of those operators:

declare @i int
set @i = 100

/**************************
Addition and assignment
***************************/
set @i += 1
select @i
-----------
101

/**************************
Subtraction and assignment
***************************/
set @i -= 1
select @i

-----------
100

/**************************
Multiplication and assignment
***************************/
set @i *= 2
select @i

-----------
200

/**************************
Division and assignment
***************************/
set @i /= 2
select @i

-----------
100

/**************************
Modulo and assignment
***************************/
set @i %= 3
select @i

-----------
1

/**************************
Bitwise XOR and assignment
***************************/
set @i ^= 2
select @i
-----------
3

/**************************
Bitwise AND and assignment
***************************/
set @i &= 2
select @i

-----------
2

/**************************
Bitwise OR and assignment
***************************/
set @i |= 2
select @i

-----------
2

Back Up Compression Option in SQL Server 2008

We have been covering the new features of the upcoming release of the next version of SQL Server. You can search on "SQL Server 2008" on this blog and get to those posts. In this post, we will briefly cover the backup compression option that has been introduced in SQL Server 2008. All this code is valid as of CTP5. First, let's check the size of the databases:

/******************************
Get the size of the databases
*******************************/
sp_helpdb
go

name             db_size
---------------- -------------
AdventureWorks   174.63 MB
DEC_TEST_FS      46.44 MB
DECIPHER_TEST    678.31 MB

We will pick up the DECIPHER_TEST database for this post. We will run the backup command first without specifying any compression, and then with compression.
/******************************
Do the back-up - Uncompressed
*******************************/
BACKUP DATABASE DECIPHER_TEST TO DISK = 'C:\Back_UP\DT.bak' WITH INIT

/******************************
Do the back-up - Compressed
*******************************/
BACKUP DATABASE DECIPHER_TEST TO DISK = 'C:\Back_UP\DT_COMP.bak' WITH INIT, COMPRESSION

For the uncompressed backup, we got:
BACKUP DATABASE successfully processed 65985 pages in 118.665 seconds (4.555 MB/sec).

For the compressed backup, we got:
BACKUP DATABASE successfully processed 65985 pages in 129.773 seconds (4.165 MB/sec).

The difference in size was 528MB for the uncompressed vs. 104MB for the compressed backup. All this was done on a very small VM, but the difference in timings and in the size of the backups should give you an idea of the difference between uncompressed and compressed backups. The compression ratio is of course dependent upon the kind of data we have in the database - for example, string data compresses better than other data.

Now that we have backed up the database, how can we tell whether a particular backup was done using the compression option or not? For that, we can use the RESTORE HEADERONLY command and check the value under the COMPRESSED column. If that value is 1, it is a compressed backup, and if it is 0, it is an uncompressed backup. Example:

restore headeronly from disk = 'C:\Back_UP\DT_COMP.bak'
go

/* Abbreviated: showing only the column we are interested in */
Compressed
----------
1

restore headeronly from disk = 'C:\Back_UP\DT.bak'
go
Compressed
----------
0

Also, compressed and uncompressed backups cannot be mixed on the same file. For example, trying to append a compressed backup to a file that already has a non-compressed backup gives us this error:

Msg 3098, Level 16, State 2, Line 1
The backup cannot be performed because 'COMPRESSION' was requested after the media was formatted with an incompatible structure. To append to this media set, either omit 'COMPRESSION' or specify 'NO_COMPRESSION'. Alternatively, you can create a new media set by using WITH FORMAT in your BACKUP statement. If you use WITH FORMAT on an existing media set, all its backup sets will be overwritten.

Another thing to note is that you can change the default behavior so that backups are always compressed. You can do so by changing the instance-level setting using the sp_configure command:

EXEC sp_configure 'backup compression default', 1
RECONFIGURE
GO

You can choose to do that, or specify compression only at the individual command level. We will be doing some testing with the RTM version (whenever it is released) on some real-world databases to see the CPU overhead and the backup time, and will post the results on the blog. While restoring these backup sets, nothing needs to be specified to indicate that we are restoring a compressed backup set. Also, while doing a restore, we saw that the restore from the compressed backup took less time than the restore from the uncompressed backup. One would have thought it would be the reverse, since some cost is associated with the decompression. This is something we will research and cover in a future blog post - it looks like the smaller size of the backup set in the case of a compressed backup offsets some of that time.
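As an aside, the achieved compression ratio can also be inspected after the fact from the backup history in msdb - a sketch (the compressed_backup_size column was added to msdb.dbo.backupset in SQL Server 2008):

```sql
-- Compute the compression ratio for recent backups of DECIPHER_TEST
SELECT database_name,
       backup_finish_date,
       backup_size,
       compressed_backup_size,
       CAST(backup_size * 1.0 / compressed_backup_size AS DECIMAL(9, 2)) AS compression_ratio
  FROM msdb.dbo.backupset
 WHERE database_name = 'DECIPHER_TEST'
 ORDER BY backup_finish_date DESC;
```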
FORCESEEK Table Hint in SQL Server 2008

In prior versions of SQL Server, there was a way to force the optimizer to consider an index in the execution plan - however, there was no way to force it to always do a seek operation. That has now been added in the next release (tested against CTP5). A word of caution here - don't be trigger-happy and start sprinkling hints everywhere. In a majority of scenarios, the optimizer makes the right decision for the execution plan - you just need to make sure that it has statistics available to form the histograms correctly for the data distribution, and it will do its job. However, there are always corner cases, and that is why SQL Server (as well as Oracle and DB2 LUW) has table-level, query-level and join-related hints available. The location of the hints and their syntax and usage are of course different in the three RDBMS. We have discussed some of these hints before, like the ORDERED hint in Oracle (same as OPTION(FORCE ORDER) in SQL Server).
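For reference, the FORCE ORDER query hint mentioned above is specified at the end of the statement; a minimal sketch (the table and column names here are hypothetical):

```sql
-- Force the optimizer to join the tables in the order in which they are written
SELECT A.COL1, B.COL2
  FROM TABLE_A A
 INNER JOIN TABLE_B B
    ON B.COL1 = A.COL1
OPTION (FORCE ORDER);
```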
The FORCESEEK hint is a table-level hint. It can also be applied to views, and it applies to both clustered and non-clustered indexes. It can also be applied in conjunction with the index hint, i.e. you can force the optimizer to use a specific index as well as force it to use a seek operation on it. Let's create an artificial situation to demonstrate the usage. We will create a table with a million records in it, with a column that has only 3 distinct values, and the data heavily skewed for this column. Yes - we are trying to re-create the bind-variable-peeking (Oracle) or parameter-sniffing (SQL Server) scenario to help demonstrate the usage of this new hint. We have covered that in some of our posts before - you can read them here and here. Here goes:

SET NOCOUNT ON
GO
CREATE TABLE BIG_TABLE
(
COL1 INT IDENTITY NOT NULL,
COL2 TINYINT NOT NULL
)
GO

DECLARE @I INT
SET @I = 1
WHILE (@I <= 1000000)
BEGIN
  IF (@I <= 10)
    INSERT INTO BIG_TABLE (COL2) VALUES (0)
  IF (@I > 10 AND @I <= 100000)
    INSERT INTO BIG_TABLE (COL2) VALUES (10)
  IF (@I > 100000 AND @I <= 1000000)
    INSERT INTO BIG_TABLE (COL2) VALUES (90)
  SET @I = @I + 1
END
GO

Here is the data distribution that was created by this small script:

SELECT COL2, COUNT(*) AS REC_CNT
  FROM BIG_TABLE
 GROUP BY COL2
 ORDER BY COL2;
COL2 REC_CNT
---- -----------
0    10
10   99990
90   900000

Now, let us create a non-clustered index on it:

CREATE NONCLUSTERED INDEX BIG_TABLE_IND_1 ON BIG_TABLE (COL2) /*FILEGROUP CLAUSE*/
GO

And now, let's fire off a parameterized query against this table, using the data value of 90, which we know from above qualifies for 900,000 records, i.e. 90% of the records in the table.

SET SHOWPLAN_TEXT ON
GO
DECLARE @I INT
SET @I = 90
SELECT * FROM BIG_TABLE WHERE COL2 = @I
GO

Here is the execution plan for this statement:

StmtText
----------------------------------------------------------------------------------------------
|--Table Scan(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE]), WHERE:([DEC_TEST_FS].[dbo].[BIG_TABLE].[COL2]=[@I]))

And now, let us change the value that is being passed in to 0. As we can see from the data distribution above, the value of 0 qualifies for only 10 records in the table, which gives very good selectivity (0.001%). The optimizer should jump right at the index and use it.

DECLARE @I INT
SET @I = 0
SELECT * FROM BIG_TABLE WHERE COL2 = @I
GO

StmtText
----------------------------------------------------------------------------------------------
|--Table Scan(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE]), WHERE:([DEC_TEST_FS].[dbo].[BIG_TABLE].[COL2]=[@I]))

As we know from previous posts on the topic, since the execution plan is already in the cache (based on the prior execution), the same execution plan gets used again this time, even though the parameter has very good selectivity. There are ways to circumvent this issue even in prior versions of SQL Server, and we have discussed those on our blog site. Another way to solve this
would be to always re-compile this query regardless of the variable, or to have a constant in this case instead of a parameter. Usage of plan guides is another way of fixing this. Yet another way would be to use the FORCESEEK hint in places where we know the code will be fired only for the value of 0. Let's take a look at the execution using this hint:

DECLARE @I INT
SET @I = 0
SELECT * FROM BIG_TABLE WITH (FORCESEEK) WHERE COL2 = @I
GO

StmtText
----------------------------------------------------------------------------------------------
|--Parallelism(Gather Streams)
   |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000], [Expr1005]) WITH UNORDERED PREFETCH)
      |--Index Seek(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE].[BIG_TABLE_IND_1]), SEEK:([DEC_TEST_FS].[dbo].[BIG_TABLE].[COL2]=[@I]) ORDERED FORWARD)
      |--RID Lookup(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)

HierarchyID Data Type in SQL Server 2008

This is another blog entry in the series of SQL Server 2008 posts. In this post, we will cover the basics of one more data type: hierarchyid. This data type, as the name suggests, can be used to store hierarchical data structures like organizational trees, file-system structures or any other type of hierarchical tree structure. In SQL Server 2008, you can store that data and very easily query it as well. In this post, we will just look at a basic example of putting in the hierarchy data and querying it. There is a whole lot more to it, including updating and re-arranging the hierarchy, re-ordering of the data, creation of indexes to help optimize the searches on hierarchical data etc. Let's look at this briefly using an example: we will create an EMP_MASTER table which will house only the employee data, and then two table variables - one to act as the source of the data for the hierarchies and the second in which we recursively populate the data in a hierarchical fashion.
/*******************************************************************
Creation of the Employee Master table
********************************************************************/
CREATE TABLE EMP_MASTER
(
EMP_NBR INT NOT NULL PRIMARY KEY,
EMP_NAME NVARCHAR(50),
MGR_NBR INT NULL
)

/*******************************************************************
Populating the table with some sample data.
The MGR_NBR column reflects the immediate manager of the employee.
********************************************************************/
INSERT INTO EMP_MASTER VALUES (1, 'DON', 5);
INSERT INTO EMP_MASTER VALUES (2, 'HARI', 5);
INSERT INTO EMP_MASTER VALUES (3, 'RAMESH', 2);
INSERT INTO EMP_MASTER VALUES (4, 'JOE', 5);
INSERT INTO EMP_MASTER VALUES (5, 'DENNIS', NULL);
INSERT INTO EMP_MASTER VALUES (6, 'NIMISH', 3);
INSERT INTO EMP_MASTER VALUES (7, 'JESSIE', 3);
INSERT INTO EMP_MASTER VALUES (8, 'KEN', 5);
INSERT INTO EMP_MASTER VALUES (9, 'AMBER', 4);
INSERT INTO EMP_MASTER VALUES (10, 'JIM', 6);

/*******************************************************************
Declaration of the two table variables:
@ORG_HIERARCHY will hold the actual hierarchical data after the
recursion query populates it.
@HIERARCHY_TREE will hold the EMP_MASTER data with actual numbering
of the data partitioned by the manager's number.
********************************************************************/
DECLARE @ORG_HIERARCHY TABLE
(HIERARCHY_NODE HIERARCHYID, EMP_NBR INT, MGR_NBR INT, EMP_NAME NVARCHAR(50));

DECLARE @HIERARCHY_TREE TABLE
(EMP_NBR INT, MGR_NBR INT, NBR_REC_IN_NODE INT)

INSERT INTO @HIERARCHY_TREE
SELECT EMP_NBR, MGR_NBR, ROW_NUMBER() OVER (PARTITION BY MGR_NBR ORDER BY MGR_NBR)
  FROM EMP_MASTER;

/*********************************************************************
Recursive query to populate the @ORG_HIERARCHY table variable. This
stores the node and the path information, which we will then query
using the hierarchyid methods. First we get the root data (using the
GetRoot() method) and then we recurse through the tree and populate
the HIERARCHY_NODE column. Note that you need to cast to a string and
use the ToString() method to build up the hierarchyid path.
**********************************************************************/
WITH ORG_PATH (path, EMP_NBR) AS
(
SELECT hierarchyid::GetRoot() AS Node, EMP_NBR
  FROM @HIERARCHY_TREE AS A
 WHERE MGR_NBR IS NULL
UNION ALL
SELECT CAST(P.path.ToString() + CAST(NBR_REC_IN_NODE AS nvarchar(50)) + '/' AS hierarchyid),
       A.EMP_NBR
  FROM @HIERARCHY_TREE AS A
  JOIN ORG_PATH AS P
    ON A.MGR_NBR = P.EMP_NBR
)
INSERT INTO @ORG_HIERARCHY
SELECT PATH, A.EMP_NBR, A.MGR_NBR, A.EMP_NAME
  FROM EMP_MASTER AS A
 INNER JOIN ORG_PATH AS B
    ON A.EMP_NBR = B.EMP_NBR;

/*********************************************************************
Now, let's query the data using the ToString() and GetLevel() methods
and then order by the Level field. The output is shown below.
**********************************************************************/
SELECT Hierarchy_Node.ToString() AS Node, Hierarchy_Node.GetLevel() AS Level, *
  FROM @ORG_HIERARCHY
 ORDER BY Level;

Node      Level  HIERARCHY_NODE  EMP_NBR     MGR_NBR     EMP_NAME
--------- ------ --------------- ----------- ----------- --------
/         0      0x              5           NULL        DENNIS
/1/       1      0x58            8           5           KEN
/2/       1      0x68            4           5           JOE
/3/       1      0x78            1           5           DON
/4/       1      0x84            2           5           HARI
/4/1/     2      0x8560          3           2           RAMESH
/2/1/     2      0x6AC0          9           4           AMBER
/4/1/1/   3      0x856B          6           3           NIMISH
/4/1/2/   3      0x856D          7           3           JESSIE
/4/1/1/1/ 4      0x856B58        10          6           JIM

As you can see from above, the different levels are defined and the nodes are defined as well. So, what exactly does this replace from previous versions, i.e. what are the alternate ways of modeling such hierarchical structures? One can use the normalized structures of a parent/child relationship, and one can even make use of the XML data type to store un-structured data. We will do some benchmarks and will post the performance results of the different methods on our blog. In future blog posts, we will cover the indexing schemes and the updating and re-arranging of the data in hierarchyid columns.
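As a preview of those indexing schemes, the two documented strategies for hierarchyid are a depth-first index (directly on the hierarchyid column, so rows of a subtree sort together) and a breadth-first index (on the level plus the hierarchyid, so rows of the same level sort together). A sketch against a regular table - the EMP_ORG table and the index names here are hypothetical:

```sql
-- A table storing the hierarchy in a hierarchyid column
CREATE TABLE EMP_ORG
(
HIERARCHY_NODE HIERARCHYID NOT NULL,
NODE_LEVEL AS HIERARCHY_NODE.GetLevel() PERSISTED, -- computed level for breadth-first ordering
EMP_NBR INT NOT NULL
);

-- Depth-first: rows in a subtree are stored near each other
CREATE CLUSTERED INDEX EMP_ORG_DEPTH_FIRST ON EMP_ORG (HIERARCHY_NODE);

-- Breadth-first: rows at the same level are stored near each other
CREATE NONCLUSTERED INDEX EMP_ORG_BREADTH_FIRST ON EMP_ORG (NODE_LEVEL, HIERARCHY_NODE);
```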
FILESTREAM Data in SQL Server 2008

We have been covering the new features in SQL Server 2008 in some of our blog posts, and in some of those we have also discussed the new data types in this release. You can search for SQL Server 2008 on this blog site to get to all those posts. In today's blog post, we will discuss how to design and implement FILESTREAM storage in SQL Server 2008.

Prior to SQL Server 2008, storage of items like documents and images was typically done by storing a link in a table column and storing the documents and images themselves in a mapped-out location on a SAN or other storage device. For images, some people also used the image data type, but that was not always the most performant solution. In SQL Server 2008, FILESTREAM enables the application to store such un-structured data on the file system. The application can thus take advantage of the streaming APIs and the performance of the file system, and in addition one can now easily co-relate the un-structured and structured data. The Win32 file-system interfaces provide the streaming access to the data, and a good thing about FILESTREAM is that it does not use the SQL Server buffer cache - it uses the Windows system cache for caching the file data.

In order to make use of FILESTREAM, one needs to enable FILESTREAM for an instance of SQL Server and then create a database that supports FILESTREAM. Once that is done, we can use the FILESTREAM clause next to the VARBINARY(MAX) data type to take advantage of this new feature. Let's follow this up with an example.

1) Let us first enable the instance for FILESTREAM:

USE MASTER
GO
EXEC sp_filestream_configure @enable_level = 3

This can also be changed from the GUI, from the Advanced-level properties for the instance. Below is the image for it - you can see from the drop-down the different levels that are available for FILESTREAM.
We have chosen the highest level - Transact-SQL and filesystem (local and remote client access).
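To verify the configured level from T-SQL rather than the GUI, SERVERPROPERTY exposes a few FILESTREAM-related properties - a sketch (property names per the SQL Server 2008 documentation):

```sql
-- Check the effective and configured FILESTREAM access levels and the share name
SELECT SERVERPROPERTY('FilestreamEffectiveLevel')  AS effective_level,
       SERVERPROPERTY('FilestreamConfiguredLevel') AS configured_level,
       SERVERPROPERTY('FilestreamShareName')       AS share_name;
```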
2) Now that we have enabled the FILESTREAM option at the instance level, let us go ahead and create a new database. We need to create a filegroup with the CONTAINS FILESTREAM clause so that we can designate that filegroup (and its files) to contain the FILESTREAM data.

USE MASTER
GO
CREATE DATABASE DEC_TEST_FS
ON PRIMARY
( NAME = DEC_TEST_FS_data,
  FILENAME = N'C:\DEC_TEST_FS_data.mdf'),
FILEGROUP DEC_TEST_FS_FG_1
( NAME = DEC_TEST_FS_REGULAR,
  FILENAME = N'C:\DEC_TEST_FS_data_1.ndf'),
FILEGROUP DEC_TEST_FS_FG_2 CONTAINS FILESTREAM
( NAME = DEC_TEST_FS_FILESTREAM,
  FILENAME = N'C:\DEC_TEST_FS')
LOG ON
( NAME = DEC_TEST_FS_LOG,
  FILENAME = N'C:\DEC_TEST_FS_log.ldf');
GO

Please note that if you try to create this database by specifying a path for the FILESTREAM files that is not on NTFS, you will get the error message: "The path specified by 'd:\DEC_TEST_FS' cannot be used for FILESTREAM files because it is not on NTFS." For the FILESTREAM filegroup, the FILENAME refers to a path, not to an actual file name. The statement creates that folder - from the example above, it created the C:\DEC_TEST_FS folder on the file system, and that folder now contains a filestream.hdr file and a $FSLOG folder. If you already have a database, you can add a FILESTREAM filegroup to it using the ALTER DATABASE command.

3) Now that we have the instance and the database taken care of, let us create a table to take advantage of this new feature:

USE DEC_TEST_FS
GO
CREATE TABLE DEC_FS_TEST
(
COL1 INT IDENTITY,
COL2 UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
COL3 VARBINARY(MAX) FILESTREAM
);

The table definition needs a ROWGUIDCOL column - this is required by FILESTREAM. The actual data is stored in the column COL3. Now, let's insert data into this table:

INSERT INTO DEC_FS_TEST (COL2, COL3) VALUES (NEWID(), NULL);
INSERT INTO DEC_FS_TEST (COL2, COL3) VALUES (NEWID(), CAST('MY DUMMY TEST' AS VARBINARY(MAX)));

And now we can select from this table and see how the data is represented:

COL1        COL2                                 COL3
----------- ------------------------------------ -----------------------------
1           78909DBF-7B26-4CA9-A840-4D45930F7523 NULL
2           0B0F5833-1997-4C9C-A9A7-F2536D68CFED 0x4D592044554D4D592054455354

If you look at the file system, you will see that additional folders have been created under the DEC_TEST_FS folder - that is shown in the image below. One can also use T-SQL to update the FILESTREAM data if one needs to. Deletes work as well - a delete also marks the underlying file-system files for deletion. We will be doing some benchmarks on this new feature and will post our findings on the blog.

XML DML Enhancement in SQL Server 2008

This is another post to mention one more enhancement made in the next version of SQL Server. In SQL Server 2005 itself, the XML data type supports several methods such as value, exist, modify, query and nodes. These methods allow querying as well as modifying an XML instance. In SQL Server 2008, the modify method now also allows an XML variable to be used in the XML DML statements (via sql:variable). Here is an example:

/********************************************************************
Do the declarations and the assignments
*********************************************************************/
declare @a xml = '<root><sku ID="10"/><sku ID="20"/></root>';
declare @b xml = '<sku ID="1"/>';

/********************************************************************
Do the set and the modify
*********************************************************************/
SET @a.modify ('insert sql:variable("@b") as first into (/root)[1]');
select @a;

<root>
  <sku ID="1" />
  <sku ID="10" />
  <sku ID="20" />
</root>

Lock Escalation Changes in SQL Server 2008

We have talked about lock escalation in some of our previous posts - you can read more on it here and here. In SQL Server 2008, there is an enhancement to the ALTER TABLE statement: a new LOCK_ESCALATION option that controls the escalation behavior of a table. One can thus disable table-level lock escalation. If you have implemented partitioning in your system, then this option can be applied at the partition level as well.

This option has three possible values: TABLE, DISABLE and AUTO. TABLE is the default value and the behavior remains the same as in SQL Server 2005, i.e. lock escalation (when it happens) is done at the table level regardless of whether the table is partitioned or not. DISABLE prevents table-level lock escalation, and AUTO lets the SQL Server engine decide the escalation granularity: if the table is partitioned, escalation is allowed at the partition level, and if it is not partitioned, escalation is done at the table level.

SQL Server 2008 Deployment Guides from Dell

I was having a discussion with one of our client's IT directors and he pointed out some deployment guides from Dell for SQL Server 2008 deployments. This client is one of the early adopters of SQL Server 2008 and is evaluating going live on it by the end of the year.
These guides have a wealth of information in them and I would encourage everyone to read through them: SQL Server 2008 CTP5, SQL Server 2008 on Windows 2003 and SQL Server 2008 on Windows 2008.

Filtered Indexes and Statistics in SQL Server 2008
Filtered indexes have been introduced in SQL Server 2008 as a means of optimizing non-clustered indexes. Suppose that you have a column with status values ranging from 0 to 99, but the application only queries the data that lies between 20 and 30. A filtered index is well suited to this scenario since it contains only a well-defined subset of the data.

So, what is the benefit of having an index on a subset of the data? Storage is the first thing that comes to mind: the storage needs of the index are reduced, which goes hand in hand with lower maintenance overhead, since rebuilds and statistics updates take less time. In addition, it is good for performance as well, since the index is smaller than a regular non-clustered index.

And there are other creative ways of using it as well. If you have read our previous blog post on ways to make a unique index in SQL Server and DB2 LUW allow more than one NULL value, what we had provided there was a work-around. One can make use of a filtered index to create such a constraint directly. Let's use the same example that we had used in that post:

CREATE TABLE TEST_UQ
(
COL1 INT IDENTITY(1,1) PRIMARY KEY,
COL2 NVARCHAR(10) NULL
)
GO

insert into test_uq (col2) values ('abc');
insert into test_uq (col2) values ('xyz');
insert into test_uq (col2) values (Null);
insert into test_uq (col2) values (Null);

Now, let's create the unique index as a filtered index:

create unique index TEST_UQ_IND_1 on test_uq (col2) where col2 is not null;

By making use of the filtered index, we have excluded the NULL values while still enforcing uniqueness over the non-NULL values. Another thing to keep in mind is that filtered statistics have also been introduced in SQL Server 2008 (to see the difference between indexes and statistics, see our previous blog post).
Filtered statistics are essentially created on a sub-set of the data to help improve the performance of queries that select from a defined sub-set (say only specific status codes, only open orders, only specific billing codes, etc.). You can read up more in the online BOL version.

Virtual Labs

MSFT virtual labs are a good way to try out new technology through guided hands-on labs in their virtual online environment. They have virtual labs available for Business Intelligence, SQL Server (including SQL Server 2008), Windows (including the latest 2008 OS), Exchange and many others. This is a very good way to get your hands dirty and gain some knowledge without investing in the software, hardware or training. Check out this link on TechNet for the virtual labs:
http://technet.microsoft.com/en-us/bb467605.aspx?wt.svl=leftnavvirtuallabs

Also, here is the link for the Windows Server 2008 virtual labs:

http://www.microsoft.com/events/series/windowsserver2008.aspx?tab=virtuallabs