This document discusses new capabilities in SQL Server 2012 for managing both structured and unstructured data. It notes challenges with building applications that use different data formats. SQL Server 2012 aims to reduce costs and simplify development by providing common application models, constructs and services for all types of data. It allows for storage and querying of various data formats natively and consistently. The document outlines new programmability options and rich services for search, spatial data, XML and more. It also shows how SQL Server 2012 provides efficient storage, high throughput access and integrated administration for all data.
3. Building and Maintaining Applications with relational and non-relational data is hard
Pain Points:
- Complex integration
- Duplicated functionality
- Compensation for unavailable services
Goals:
- Reduce the cost of managing all data
- Simplify the development of applications over all data
- Provide management and programming services for all data
4. Tables, XML, Spatial, Documents, Digital Media, Scientific
Records, Factoids…
Data formats and content natively understood for rich application and
user experience
Consistent Application Model and Data Constructs to ease application
development, migration and long-term retention
Provide rich services, e.g.,
5. Programmability
[Diagram: layered stack. T-SQL; Query; Structured Data; B-trees; Manageability and Availability; Files]
6. Programmability
[Diagram: layered stack. T-SQL; Query and Search; Structured Data and Unstructured Data; B-trees; Manageability and Availability; Files]
7. Programmability
[Diagram: layered stack. T-SQL/Data Types (Spatial, XML, HierarchyID) and Win 32; Query and Type Operations, XQuery, Spatial ops, Search; Structured Data, Semi-structured Data/XML, Unstructured Data; B-trees, XML/FTS/Spatial Indices, Filestream; Manageability and Availability; Files]
8. Rich Data Programmability
[Diagram: layered stack]
Programming Capabilities: T-SQL/Data Types (Spatial, XML, HierarchyID), Win 32
Rich Query and Search Services over all Data: Query and Type Operations, XQuery, Spatial ops, Search, Semantic Platform
Efficient Storage for BR Data: Structured Data, Semi-structured Data/XML, Unstructured Data; B-trees, XML/FTS/Spatial Indices, Filestream
Manageability & Availability
Files
11. [Diagrams: FILESTREAM, FileTable and Remote BLOB Storage; labels recovered from the slides]
Access paths: Database Applications (Transactional Access, Blobs); Windows Apps (Streaming Win32 Access, SMB Share, Files/Folders); SQL Apps (FileStream API)
Rich Services: Fulltext Search, Semantic Similarity Search
Scale-up: Multiple Containers for FileStreams and FileTable (Disk 1, Disk 2, Disk 3)
Integrated Administration: Backup/Replication/AlwaysOn
Remote BLOB Storage: Customer Application; SQL RBS API; Azure lib, Centera lib, SQL FILESTREAM lib; Azure, Centera, SQL DB
22. Queries over a 350M-document database with random DML running in the background.
Beats SQL Server 2005 by a scale factor of more than 2x, with on average 60x better throughput
23. 2005/8 vs 2012
[Chart: query avgExecTime (ms) under various numbers of connections (50 ~ 2000 users) for customer playback benchmark; series: 2005/8 and 2012]
27. [Diagram: objects A to E pass through the Primary Filter (Index lookup), then the Secondary Filter (Original predicate)]
In general, split predicates in two
Primary filter finds all candidates, possibly
with false positives (but never false negatives)
Secondary filter removes false positives
The index provides our primary filter
Original predicate is our secondary filter
Some tweaks to this scheme
Sometimes possible to skip secondary filter
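The two-stage scheme above can be sketched outside the database. The following is a minimal Python illustration, not SQL Server's actual index structure: it assumes a hypothetical uniform grid (`CELL`), circles as the indexed objects, and a rectangular query window. All names here are made up for the example. The grid lookup plays the role of the primary filter (it may return false positives, never false negatives), and the exact circle/rectangle test plays the role of the original predicate.

```python
from collections import defaultdict

CELL = 10.0  # hypothetical grid cell size, chosen only for illustration


def cells_for_box(xmin, ymin, xmax, ymax):
    """Yield every grid cell (ix, iy) touched by an axis-aligned box."""
    for ix in range(int(xmin // CELL), int(xmax // CELL) + 1):
        for iy in range(int(ymin // CELL), int(ymax // CELL) + 1):
            yield (ix, iy)


def build_index(circles):
    """Map each grid cell to the ids of circles whose bounding box touches it."""
    index = defaultdict(set)
    for cid, (cx, cy, r) in circles.items():
        for cell in cells_for_box(cx - r, cy - r, cx + r, cy + r):
            index[cell].add(cid)
    return index


def primary_filter(index, window):
    """Index lookup: every object sharing a cell with the window is a candidate."""
    candidates = set()
    for cell in cells_for_box(*window):
        candidates |= index.get(cell, set())
    return candidates


def intersects(circle, window):
    """Original predicate: exact circle/rectangle intersection test."""
    cx, cy, r = circle
    xmin, ymin, xmax, ymax = window
    nx = min(max(cx, xmin), xmax)  # point of the window closest to the center
    ny = min(max(cy, ymin), ymax)
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r ** 2


def query(circles, index, window):
    """Primary filter first, then the secondary filter removes false positives."""
    return {cid for cid in primary_filter(index, window)
            if intersects(circles[cid], window)}


if __name__ == "__main__":
    circles = {"A": (3.0, 3.0, 2.0), "D": (8.0, 8.0, 1.0), "C": (50.0, 50.0, 3.0)}
    index = build_index(circles)
    window = (0.0, 0.0, 4.0, 4.0)
    print(sorted(primary_filter(index, window)))  # ['A', 'D']  (D is a false positive)
    print(sorted(query(circles, index, window)))  # ['A']
```

In this sketch the secondary filter could be skipped for any object whose cells all lie fully inside the window, which is one way to read the "sometimes possible to skip" tweak; that shortcut is not implemented here.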
32. Optimal value (theoretical) is somewhere between two extremes
[Chart label: Time needed to process false positives]
Default values:
512 - Geometry AUTO grid
768 - Geography AUTO grid
1024 - MANUAL grids
SELECT * FROM table t WITH
(SPATIAL_WINDOW_MAX_CELLS=256)
WHERE t.geom.STIntersects(@window)=1;
35. CREATE SPATIAL INDEX idxGeog
ON table(geography column)
USING GEOGRAPHY_GRID
WITH (
DATA_COMPRESSION = page | row
);
On the basis of internal tests, with compression:
- Index is 40%-50% smaller
- Queries range from 20% faster to 15% slower
- Per-partition compression setting is not supported
37. Give me the closest 5 Italian restaurants
SQL Server 2008/2008 R2: table scan
SQL Server 2012: uses spatial index
SELECT TOP(5) *
FROM Restaurants r
WHERE r.type = 'Italian'
AND r.pos.STDistance(@me) IS NOT NULL
ORDER BY r.pos.STDistance(@me)
39. Find the closest 50 business points to a specific location (out of 22 million in total)
Let’s take a look at a BR application. What services does it provide? What about having these services supported in the database instead of each application building its own?
Examples:
- Managing an application that keeps images in the file system and additional information in the database
- Building a spatial database application before SQL Server 2008
Example services: Backup/restore, search over relational and non-relational data
Pure relational database system.
SQL Server 7.0: Added FT Search over unstructured data
SQL 2000: Starting to add XML support
SQL 2005: XML datatype, XQuery, XML Indices
SQL 2008: Spatial datatype and ops, Spatial Indexing, Filestream with Win 32 (but requires special library to open/create), integrated FTS. Filestream requires NTFS.
As of SQL Server 2012:
- Exposing Win 32 natively through FileTable
- Addition of Semantic Platform to enable Semantic search (and eventually, post Denali, query)
Efficient Storage: Building on existing relational storage and indexing infrastructure and backup/restore/HA. Brings SQL Server’s superior TCO to BR data and assures efficient and safe storage of customers’ high-value data.
Rich Capabilities: Necessary (but not sufficient) programmability experience to move customers to entrust their high-value data to SQL with minimal migration pains and access it via their favorite programming model/API.
Rich Services: Provide high-value services to unlock information in all data in a highly scalable way. Entices customers to move their high-value data into SQL to discover information fast. Provides platform stickiness and differentiation.
Focus in SQL Server 2012 in priority order:
- Capabilities and rich services for unstructured data
- Spatial platform
- Sustain existing BR support
- Tooling
- Performance & Scale
- Orthogonality
- Large new Features
SQL 2008 provides Filestreams as a way to add large blobs/unstructured data streams into SQL and still be able to open a Win32 handle (using SQL API) and provide high streaming performance for the data.
Win32 Namespace support in SQL Server 2012 has the following goals:
- Reduce the barrier to entry for customers who have data in file servers and have Win32 applications that currently work on it. By enabling the Win32 namespace, SQL will generate a Windows share that can be exposed to existing Win32 applications like any file server share. This allows Win32 applications/mid-tier servers (like IIS) to work with this data without having to understand the database/transaction semantics.
- Single integrated set of admin tools: SQL backup/restore, Replication, HA solutions etc.
- Scale up: Add multiple disks on a machine for storing Filestream data.
- Use SQL services like Full-text search for both FileStream and relational metadata, and the Property Promotion Infrastructure for extracting interesting properties from SQL blobs/filestream to surface as relational columns for query.
Reading bigger buffers gives better performance.
FS volume:
- Dedicated volumes means volumes not used for tempdb (non-OS, paging, SQL data & log volumes).
- If stored files are large, as we generally recommend, format with 64K clusters.
- Do compress filestream volumes or filestream containers, but ONLY if the data to be stored is compressible. Note that in this case NTFS cluster size must be 4K.
- 1 vol per container enables space management at volume level.
- AV should be configured not to delete infected files but to quarantine them. Otherwise corruption will be reported.
SMB:
- With 60KB: A read can happen in one single IO, ideally coming back in one single TCP/IP packet. It is not 64K because 64KB of data can't fit in one single TCP/IP buffer.
Partitioning:
- FILESTREAM columns require the presence of the ROWGUID unique index for aligned partitioning, or in case this is not possible, explicitly specifying the data placement option for the unique or primary key constraint on the ROWGUID column.
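The 60KB read size from the SMB note above can be applied on the client side by reading a file in fixed-size chunks. This is a minimal Python sketch, assuming the file is reachable as an ordinary path (for example through a share); the function name and the `BUFFER_SIZE` constant are made up for the example, and whether 60KB is the best value for a given network is something to measure, not assume.

```python
BUFFER_SIZE = 60 * 1024  # 60KB per read, following the note above (assumed, tune by measurement)


def read_in_chunks(path, buffer_size=BUFFER_SIZE):
    """Yield the file's contents in chunks of at most buffer_size bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buffer_size)
            if not chunk:  # empty bytes object means end of file
                break
            yield chunk
```

A caller would iterate over `read_in_chunks(path)` and hand each chunk to whatever consumes the stream, so that no read request exceeds the chosen size.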
Customer lab testing with 220 MB video files. FS win32 reads video streaming performance. FILESTREAM best practices.
Stats on inserts followed by reads. 8.3 etc…
Optimized hot paths, removed unnecessary serialization, expensive FileSystem operations etc
Experimentation: For instance, consider this dataset: US Highways. In this dataset some of the LineStrings are quite long (over 2000 miles) and others are quite short (400 meters or less). For optimal performance, the following two indexes were roughly equivalent:
Geography Index: MEDIUM, MEDIUM, MEDIUM, MEDIUM 1024
Geometry Index: LOW, LOW, LOW, LOW 1024