Sql Server 2008 New Programmability Features - Presentation Transcript
New Programmability Features
Valinor SQL Server Professional Services Database Projects end-to-end High Availability & Disaster Recovery Upgrades Performance Analysis & Tuning Security Training contact@valinor.co.il
Valinor SQL Server Complementary Tools
SQL diagnostic manager
SQLcompliance manager
SQLsafe backup
SQLchange manager
SQLdefrag manager
SQL comparison toolset
SQLadmin toolset
Valinor http://www.sqlserver.co.il
Agenda T-SQL enhancements MERGE Statement Table Valued Parameters Grouping Sets Compound Assignments Table value constructors New data types Filestream HierarchyID Temporal data types Spatial data types SSMS Enhancements Intellisense Multi-instances query Registered Servers properties SSRS Enhancements More new features Data compression Resource Governor
MERGE Statement What is it? MERGE allows you to compare two tables and apply changes to one of them according to matching and non-matching rows Allows multiple set operations in a single SQL statement Operations can be INSERT, UPDATE, DELETE ANSI SQL 2006 compliant - with extensions
What can I do with it? ETL processes “Update if exists, insert otherwise” stored procedures Data comparison, verification and modification MERGE Statement
MERGE Syntax MERGE [INTO] target_table AS target USING source_table AS source (table, view, derived table) ON <merge_search_conditions> WHEN MATCHED [AND <other predicates>] UPDATE SET target.col2 = source.col2 (or delete) WHEN NOT MATCHED [BY TARGET] [AND <other predicates>] INSERT [(col_list)] VALUES (col_list) WHEN NOT MATCHED BY SOURCE [AND <other predicates>] DELETE (or update) OUTPUT $action, inserted.col, deleted.col, source.col; Get used to semicolons!
MERGE Statement $action function in OUTPUT clause Multiple WHEN clauses possible For MATCHED and NOT MATCHED BY SOURCE Only one WHEN clause for NOT MATCHED Rows affected includes total rows affected by all clauses
MERGE Performance MERGE statement is transactional No explicit transaction required One pass through tables At most a full outer join Matching rows (inner join) = when matched Left-outer join rows = when not matched Right-outer join rows = when not matched by source When optimizing, optimize joins Index columns used in ON clause (if possible, unique index on source table)
MERGE and Determinism UPDATE using a JOIN is non-deterministic If more than one row in source matches ON clause, either/any row can be used for the UPDATE. MERGE is deterministic If more than one row in source matches ON clause, an exception is thrown.
MERGE & Triggers When there are triggers on the target table Trigger is raised per DML (insert/update/delete) “Inserted” and “deleted” tables are populated In much the same way, MERGE is treated by replication as a series of insert, update and delete commands.
Where is MERGE useful? Replace UPDATE … JOIN and DELETE … JOIN statements ETL Processes Set comparison Update-if-exists, insert-otherwise procedures IF EXISTS (SELECT … FROM tbl) UPDATE tbl … ELSE INSERT …
MERGE Statement
MERGE usage
Table-Valued Parameters Before SQL Server 2008 In order to pass a set of data to SQL Server: Comma-delimited strings XML BULK INSERT Why would I want to do that? Less database round-trip “Array” variable Example: Pass an order + its line details Table-valued parameters solve this problem
Table Types SQL Server has table variables DECLARE @t TABLE (id int); SQL Server 2008 adds strongly typed table variables CREATE TYPE mytab AS TABLE (id int); DECLARE @t mytab; Parameters must use strongly typed table variables
Table Variables are Input Only Declare and initialize TABLE variable DECLARE @t mytab; INSERT @t VALUES (1), (2), (3); EXEC myproc @t; Parameter must be declared READONLY CREATE PROCEDURE usetable ( @t mytabREADONLY ...) AS INSERT INTO lineitems SELECT * FROM @t; UPDATE @t SET... -- no!
TVP Implementation and Performance Table Variables materialized in TEMPDB Faster than parameter arrays, BCP APIs still fastest Duration (ms) Number of rows passed
Table-Valued Parameters
Table-Valued Parameters end-to-end
Grouping Sets GROUP BY Used to group rows by values in specified columns Aggregate data sum(), count(*), avg(), min(), max()…) Grouping Sets Why limit the query to a single grouping?
Grouping Sets GROUP BY Extensions GROUPING SETS Specifies multiple groupings of data in one query CUBE * All permutations in column set ROLLUP* Sub totals, super aggregates and grand total. * don’t confuse with old nonstandard WITH CUBE and WITH ROLLUP options
Grouping_ID() New function that computes the level of grouping Takes a set of columns as parameters and returns the grouping level Helps identify the grouping set that each result row belongs to. GROUPING(col) function returns 1 bit: 1 if the result is grouped by the column, 0 otherwise.
Grouping Sets
How to use Grouping Sets
Variable Initialization & Compound Assignment T-SQL# Small programming enhancements targeting more convenient and efficient development Variable Initialization DECLARE @i AS INT = 0 , @d AS DATETIME = CURRENT_TIMESTAMP , @j AS INT = (SELECT COUNT(*) FROM sysobjects); select @i,@d, @j;
Table Value Constructors Use the VALUES clause to construct a set of rows to insert multiple rows or as a derived tables INSERT INTO dbo.Customers(custid, companyname, phone, address) VALUES (1, 'cust 1', '(111) 111-1111', 'address 1'), (2, 'cust 2', '(222) 222-2222', 'address 2'), (3, 'cust 3', '(333) 333-3333', 'address 3'); SELECT * FROM ( VALUES (CAST('20090730' AS DATE),'TishaB''Av') ,(CAST('20090918' AS DATE),'Erev Rosh HaShana') ,(CAST('20090919' AS DATE),'Rosh HaShana') ) AS holidays (HolidayDate,description)
FILESTREAM Storage To Blob Or Not To Blob? Cons LOBS take memory buffers Updating LOBS causes fragmentation Poor streaming capabalities Pros Transactional consistency Point-in-time backup & restore Single storage and query vehicle ?
Dedicated BLOB Store Store BLOBs in Database Use File Servers Application Application Application BLOBs BLOBs BLOBs DB DB DB
Streaming Performance
Integrated management
Data-level consistency
Enterprise-scales only
Scalability & Expandability
Advantages
Complex application development & deployment
Separate data management
Integration with structured data
Poor data streaming support
File size limitations
Affects performance of structured data querying
Complex application development & deployment
Separate data management
Enterprise-scales only
Challenges
Windows File Servers
NetAppNetFiler
EMC Centera
Fujitsu Nearline
SQL Server VARBINARY(MAX)
Example Blob Storage Options
FILESTREAM combines the best of 2 worlds
Integrates DB engine with NTFS
Stores BLOB data as files
FILESTREAM Storage Application BLOBs FEATURES:
Uses NT cache for caching file data.
SQL buffer pool is not used and is available to query processing
Windows file system interface provides streaming access to data
Compressed volumes are supported
File access is part of DB transaction.
DB
It’s not only about storing but also about working with BLOBS:
Image analysis
Voice interpretation
Mixing satellite feeds & Spatial Data type for weather reports
and more…
FILESTREAM Storage
FILESTREAM Programming Dual Programming Model TSQL (Same as SQL BLOB) Win32 Streaming File IO APIs Begin a SQL Server Transaction Obtain a symbolic PATH NAME & TRANSACTION CONTEXT Open a handle using sqlncli10.dll - OpenSqlFilestream Use Handle Within System.IO Classes Commit Transaction
// 1. Start up a database transaction – SqlTransactiontxn = cxn.BeginTransaction(); // 2. Insert a row to create a handle for streaming. newSqlCommand("INSERT <Table> VALUES ( @mediaId, @fileName, @contentType);",cxn, txn); // 3. Get a filestreamPathName & transaction context. newSqlCommand("SELECT PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() FROM <Table>", cxn, txn); // 4. Get a Win32 file handle using SQL Native Client call. SafeFileHandle handle = SqlNativeClient.OpenSqlFilestream(...); // 5. Open up a new stream to write the file to the blob. FileStreamdestBlob= newFileStream(handle, FileAccess.Write); // 6. Loop through source file and write to FileStream handle while ((bytesRead = sourceFile.Read(buffer, 0, buffer.Length)) > 0) {destBlob.Write(buffer, 0, bytesRead);} // 7. Commit transaction, cleanup connection. – txn.Commit(); FILESTREAM Programming
FILESTREAM Implementation Server & Instance level Enable filestream (in setup or configuration manager) Make sure port 445 (SMB) is open if remote access is used Exec sp_configure'filestream_access_level', [0/1/2] At Database Level Create a filestreamfilegroup & map to directory At Table Level Define VARBINARY(MAX) FILESTREAM column(s) Must have UNIQUEIDENTIFIER column (and file extension column if FTS is used)
FILESTREAM Implementation Integrated security ACLs (NT permissions) granted only to SQL Server service account. Permissions to access files implied by granting read/write permissions on FILESTREAM column in SQL Server. Naturally, only Windows authentication is supported. Integrated management BACKUP and RESTORE (for both database and log) also backup and restore FILESTREAM data. Note that in the FULL RECOVERY MODEL, “deleted” files are not deleted until log is backed up.
FILESTREAM Limitations Not supported Remote FILESTREAM storage Database snapshot and Mirroring Supported: Replication (with SQL Server 2008 subscribers) Log shipping (with SQL Server 2008 secondaries) SQL Server Express Edition (4gb size limit does not apply to FILESTREAM data container) Features not integrated SQL Encryption Table Value Parameters
Available to CLR clients as the Sqlhierarchyid data type
Each node holds its parent’s name/ID Pros:
Understandable
Managable
2005 – CTEs can beused for recursive queries
Cons:
Queries are more complex
Bad performance with large trees.
2005 Alternatives - Adjacency Model
2005 Alternatives – Path Enumeration Each node holds a path to the root, as a string concatenation Pros:
Logical representation
Easy tree traversal
Cons:
Difficult to maintain
Searches are done with string functions (LIKE, string split)
"Left“ and "Right“ columns represent edges Pros:
Easy to query with >, <, BETWEEN
Easy to index
Cons:
Difficult to maintain
(try to add another underboss…) 2005 Alternatives – Nested Sets
HierarchyID New data type in SQL Server 2008 Uses path enumeration, but in a much more efficient binary representation. Exposes methods to query the tree and set relations between nodes. Remember that it’s just data It’s up to the application to assign correct hierarchyID values to the nodes to represent the relations (using HierarchyID methods) It’s up to the developer/DBA to create a unique constraint on the hierarchyID column
HierarchyID Nodes are ordered Root: / First child: /1/ Second Child: /2/ First grandchild: /1/1/ Second: /1/2/ If you want to insert another grandchild between them - /1/1.1/
HierarchyID Methods hierarchyid::GetRoot() – get root of hierarchy tree Hierarchyid::Parse() – same as cast(@str as hierarchyid) node.ToString() - Logical string representation of node’s hID parent.GetDescendant(c1,c2)- returns a child node of parent between c1 and c2 node.GetAncestor(n) - hierarchyid of the nth ancestor of node node.IsDescendantOf(pID) - true if node is a descendant of pID node.GetLevel() – integer representing depth of node node.GetReparentedValue(oldRoot,newRoot) – get path to newRoot for node who is descendant of oldRoot (use to move subt-trees within tree)
Depth-first Index Breadth-first Index Order by level, then path Useful for querying immediate children Ordered by path Useful for querying sub-trees e.g. all files in a subfolder HierarchyID Indexes
Insert Root Insert 1st Subordinate Insert rest of tree Query Hierarchical Data Demo Structure
HierarchyID
HierarchyID Basics
Trees and Hierarchies
Temporal Data Types Prior to SQL Server 2008 DATETIME Range: 01-01-1753 to 31-12-9999 Accuracy: Rounded increments of 3.333ms Storage: 8 bytes SMALLDATETIME Range: 01-01-1900 to 06-06-2079 Accuracy: 1 minute Storage: 4 bytes
Date/Time Types SQL Server 2008 Save date and time in separate columns. Query all logins that occurred 18:00-7:00 during weekdays Higher precision Scientific data Timezone offset Global applications
DATE and TIME DATE Data Type - Date Only 01-01-0001 to 31-12-9999 Gregorian Calendar Takes only 3 bytes TIME Data Type - Time Only Variable Precision - 0 to 7 decimal places for seconds Up to 100 nanoseconds Takes 3-5 bytes, depending on precision
DATETIME2 and DATETIMEOFFSET DATETIME2 Data Type 01-01-0001 to 31-12-9999 Gregorian Calendar Variable Precision - to 100 nanoseconds Takes 6-8 bytes (same or less than DATETIME!) DATETIMEOFFSET 01-01-0001 to 31-12-9999 Gregorian Calendar Variable Precision - to 100 nanoseconds Time Zone Offset (From UTCTime) Preserved No Daylight Saving Time Support Takes 8-10 bytes
Date/Time Types Compatibility New Data Types Use Same T-SQL Functions DATENAME (datepart, date) DATEPART (datepart,date) DATEDIFF (datepart, startdate, enddate) DATEADD (datepart, number, date) Datepart can also be microsecond, nanosecond, TZoffset MONTH DAY YEAR CONVERT
Date Time Library Extensions Current date/time in higher precision SYSDATETIME SYSUTCDATETIME SYSDATETIMEOFFSET Original date/time uses GETDATE, GETUTCDATE, CURRENT_TIMESTAMP ISDATE(datetime/smalldatetime) Special functions for DATETIMEOFFSET SWITCHOFFSET(datetimeoffset, timezone) TODATETIMEOFFSET(datetime, timezone)
Temporal Data Types
New date and time features
Spatial Data Type Spatial data provides answers to location-based queries Which roads intersect the Microsoft campus? Does my land claim overlap yours? List all of the Italian restaurants within 5 kilometers Spatial data is part of almost every database If your database includes an address
Spatial Data Type Represent geometric data: Point Lines Polygons Multi-point/line/polygon
Spatial Data Type SQL Server supports two spatial data types GEOMETRY - flat earth model GEOGRAPHY - round earth model Both types support all of the instanciable OGC types InstanceOf method can distinguish between them
Spatial Data Type - Input Spatial data is stored in a proprietary binary format Instance of the type can be NULL Can be input as Well Known binary - ST[Type]FromWKB Well Known text - ST[Type]FromText Geography Markup Language (GML) - GeomFromGml Can also use SQLCLR functions Parse Point - extension function Input from SQLCLR Type - SqlGeometry, SqlGeography
Spatial Data Type - Output Spatial Data Can Be Output As Well Known binary - STAsBinary Well Known text - STAsText GML - AsGml Text with Z and M values - AsTextZM SQLCLR standard method ToString - returns Well Known text As SQLCLR object - SqlGeometry, SqlGeography Other useful formats are GeoRSS, KML Not Directly Supported
Useful Spatial Methods & Properties
Spatial Data Types
New spatial features
Management Studio Enhancements Intellisense Finally built in! Only available when querying 2008 instances Word Completion Quick Info Syntax errors Doesn’t work in SQLCMD mode! Can be turned off Code Collapse&Expand
Management Studio Enhancements Multi Instance Queries Quickly run SQL statements on multiple instances Downside if you wish to insert the dataset to a single table, you better find some other tool. Useful for ad-hoc queries, not more than that… Custom status bar color (per server) Quick trace New Activity Monitor
Reporting Services 2008 Enhancements
Architecture
Reporting engine no longer requires IIS Better memory management
Reacts better to memory pressure
Better Processing
On-demand processing redesigned for scalability
Page to page response time is constant
Processing ImprovementsBenefits
Tablix Control
What is it?
Best of both table data-region and matrix data-region Allows fixed and dynamic columns and rows Enables Arbitrary nesting on each axis Enables multiple parallel row/column members at each level Introduces optional omission of row/column headers
Parallel Dynamic Groups 2005 2008
Mixed Dynamic & Static columns 2005 2008
Enriched Visualizations – Dundas Charts
Enriched Visualizations – Dundas Gauges
Export to word
Report Builder 2
Reporting Services 2008 Usability features
Why?
Cost of storage rises with database size
High-end disks are expensive
Multiple copies required – test, HA, backups
Cost of managing storage rises with database size
Time taken for backups and maintenance operations (IO bound)
Time taken to restore backups in a disaster
Migration from other platforms (Oracle or DB2) that support compression is not possible
Compressing data leads to better memory utilization
SQL Server needs data compression!!
(But only in Enterprise edition…) Data Compression
Data Compression Before SQL Server 2008
(n)varchar, varbinary – no trailing spaces or zeroes saved (unlike char, binary)
SQL Server 2005 SP2 introduces vardecimal
Same solution – variable length column, empty bytes are not stored
But on a smaller scale:
(n)char/binary’s maximum capacity is 8000 bytes.
Decimal’s maximum capacity is 17 bytes (percision 29-38)
(but if you have millions of decimal cells in a fact table, you can benefit from compression)
Data Compression New in SQL Server 2008
Row Compression
Compresses fixed-length data types by turning them into variable-length (a step up from vardecimal).
Row meta data (row header, null bitmap) is also saved in a new variable-length format.
Page Compression
Row compression
Prefix compression
Dictionary compression
Row Compression Row compression CREATE TABLE Countries (ID INT, Name CHAR(50))
Row Compression Row compression CREATE TABLE Countries (ID INT, Name CHAR(50)) 4 bytes 50 bytes (4b + 50b) * 3 rows = 162 bytes (not including row overhead)
Row Compression Row compression CREATE TABLE Countries (ID INT, Name CHAR(50)) 50 bytes 4 bytes Row Compression 1 byte 7 bytes 1 byte 11 bytes 1 byte 6 bytes
Row Compression Row compression CREATE TABLE Countries (ID INT, Name CHAR(50)) 1 byte 7 bytes 1 byte 11 bytes 1 byte 6 bytes 162 bytes reduced to: 1+7+1+11+1+6 = 27 bytes (not including row overhead and 4 bits per column for offsets)
The whole page is scanned looking for common values, which are stored in the CI area on the page.
The in-row values are replaced with pointers to the CI area.
Dictionary compression
Estimating space savings EXEC sp_estimate_data_compression_savings
Returns current size and estimated compressed size for a table, an index or a partition of them, according to selected compression type (row or page).
It creates a sampled subset of the data in tempdb and compresses it using the requested compression mechanism to get the estimate.
Enabling and Disabling Data Compression
Data compression can be set on heaps, clustered indexes, non-clustered indexes and their partitions (including indexes on views).
Setting data compression on a table (CREATE/ALTER TABLE ) only affects the heap or the clustered index.
Data compression has to be set for each non-clustered index individually.
Enabling and Disabling Data Compression
Syntax:
CREATE/ALTER TABLE … REBUILD WITH ([PARTITION = ALL/partition_number,] DATA_COMPRESSION = NONE/ROW/PAGE) CREATE/ALTER INDEX … REBUILD WITH (DATA_COMPRESSION = NONE/ROW/PAGE [ON PARTITIONS (partition_number/range])
When is data compressed? With ROW compression, compression occurs row by row, whenever a row is inserted or updated. With PAGE compression, it gets complicated. In heaps, PAGE compression only occurs in the following ways: On table REBUILD When data is inserted with BULK INSERT When data is inserted with INSERT INTO … WITH(TABLOCK)
When is data compressed? In indexes, pages are only ROW compressed until they fill up. When the next insert occurs, it triggers PAGE compression. If there’s space left for the new row after PAGE compression, the row is compressed and inserted into the newly compressed page. If there’s no space, the page is not compressed and the row will be inserted on a new page
Data Compression: Limitations Data compression is Enterprise Edition only Data compression cannot be used in conjunction with sparse columns Data compression doesn’t increase the capacity per row. You can’t insert more data per row with data compression. This ensures that disabling compression will always succeed
Data Compression: Limitations Non-leaf level pages in indexes are only compressed using ROW compression Heap pages are not PAGE compressed during regular DML LOB values out-of-row are not compressed Data exported using BCP is always uncompressed. Data imported using BCP will be compressed causing increased CPU usage.
Estimate Space Savings
Compress Index and heap
Compare Performance
Query DMVs
Data Compression
Data Compression Performance Generalization No noticeable impact on INDEX SEEKs Noticeable impact on large data modification operations. Lower IO, higher CPU on index/table scans. Duration depends on overall system configuration. Data compression should be tested thoroughly before setting up in production
So when should I use it? Read-intensive databases with low CPU usage Systems with small Buffer cache (relative to data size) Databases queried for few rows at a time (i.e. index seeks). There’s always a trade-off. Remember the equation: Data compression = Less IO = Smaller memory footprint But also Data compression = Higher CPU usage
Data Compression - Numbers
Typical results: your mileage may vary
Business cost reduction: SQL Server 2005 Resource management Backup OLTP Activity Admin Tasks Executive Reports Ad-hoc Reports Workloads SQL Server
Single resource pool
No workload differentiation
Memory, CPU, Threads, … Resources
Executive Reports Backup OLTP Activity Admin Tasks Ad-hoc Reports High OLTP Workload Admin Workload Report Workload Min Memory 10% Max Memory 20% Max CPU 20% Max CPU 90% Application Pool Admin Pool Business cost reduction: SQL Server 2008 Resource management SQL Server
Helps differentiate workloads (e.g. by application name/user name)
What else is new? SQL Server Change Tracking Synchronized Programming Model SQL Server Conflict Detection FILESTREAM data type Integrated Full Text Search Sparse Columns Large User Defined Types Date/Time Data Type SPATIAL data type Virtual Earth Integration Partitioned Table Parallelism Query Optimizations Persistent Lookups Change Data Capture Policy Based Management Backup Compression MERGE SQL Statement Data Profiling Star Join Optimization Enterprise Reporting Engine Internet Report Deployment Block Computations Scale out Analysis BI Platform Management Export to Word and Excel Author reports in Word and Excel Report Builder Enhancements TABLIX Rich Formatted Data Personalized Perspectives Filtered Indexes Filtered Statistics … and many more Transparent Data Encryption External Key Management Data Auditing Pluggable CPU Transparent Failover for Database Mirroring Declarative Management Framework Server Group Management Streamlined Installation Enterprise System Management Performance Data Collection System Analysis Data Compression Query Optimization Modes Resource Governor Entity Data Model LINQ Visual Entity Designer Entity Aware Adapters
Thank you! Any questions?
References SQL Server 2008 Developer Training Kit Introduction to New T-SQL Programmability Features in SQL Server 2008 / Itzik Ben Gan Using The Resource Governor / Aaron Bertrand Reporting Services in SQL Server 2008 / Ann Weber Data Compression: Strategy, Capacity Planning and Best Practices / Sanjay Mishra
0 comments
Post a comment