SQL Server Workshop for Developers - Visual Studio Live! NY 2012
1. SQL Server Workshop for Developers
Andrew Brust: CEO and Founder, Blue Badge Insights
Leonard Lobel: CTO, Sleek Technologies; Principal Consultant, Tallan
Level: Intermediate
May 14, 2012
2. Meet Andrew
• CEO and Founder, Blue Badge Insights
• CTO, Tallan, Inc.
• Member, Microsoft BI Partner Advisory Council
• Microsoft Regional Director, MVP
• Co-chair VSLive! and over 15 years as a speaker
• Founder, Microsoft BI User Group of NYC
– http://www.msbinyc.com
• Co-moderator, NYC .NET Developers Group
– http://www.nycdotnetdev.com
• “Redmond Review” columnist for
Visual Studio Magazine and Redmond Developer News
• brustblog.com, Twitter: @andrewbrust
6. Agenda
• Part I
– Overview
• Part II
– T-SQL Enhancements
• Part III
– Business Intelligence
• Part IV
– Beyond Relational
• Demos, demos, demos!
– http://sdrv.ms/VSLiveNY2012SQL
9. SQL Server Versions
• SQL Server 2012 (Version 11)
– Newest version of SQL Server
– Released March 7, 2012
• SQL Server 2008 R2 (Version 10.5)
– Adds powerful BI features to SQL Server 2008
– Released April 21, 2010
• SQL Server 2008 (Version 10)
– Released August 6, 2008
• SQL Server 2005
– Revolutionary upgrade from SQL Server 2000
– Integration of multiple services (SSRS, SSAS, SSIS) with the RDBMS
• Before SQL Server 2005…
– The relational database engine was the product
– Added value features (Reporting, OLAP, DTS) through a patchwork of
optional add-ons
11. Introducing SQL Server 2012
• The latest major release of SQL Server
• Mission Critical Platform
– HADR (High-Availability Disaster Recovery)
• IT and Developer Productivity
– SQL Server Data Tools (SSDT)
– T-SQL enhancements
– Beyond Relational enhancements (FileTable, FTS improvements)
– Geospatial improvements (circular data, full globe, performance)
• Pervasive Insight
– Columnstore Indexing (xVelocity)
– BI Semantic Model (PowerPivot technology) comes to Analysis
Services
– Real ad hoc reporting and data visualization in Power View
– Data lineage and data quality (DQS)
12. BI Foundation
• Stack Review
– The MS BI Stack
– The SQL Server 2008 R2 “sub-stack”
– New SQL Server 2012 Components
• Analysis Services and OLAP
– Dimensional Concepts
– Analysis Services cube design
– Overview of ADO MD.NET and other APIs
13. BI Delivery
• Presentation Layer
– Excel BI
– PowerPivot and Excel Services
Including new PowerPivot features in SQL Server 2012
– Analysis Services tabular databases (SQL Server 2012)
– Power View (SQL Server 2012)
– Reporting Services and Report Builder
– PerformancePoint Services (brief)
• Overview: Other New R2/2012 Components
– Master Data Services, Data Quality Services
– StreamInsight
14. Introducing SQL Server Data Tools
• SSDT – Code-named “Juneau”
– Next generation IDE for SQL Server development
– Declarative, model-based development in connected, offline, and
cloud scenarios
– New project type: SQL Server Database Project
– Visual Studio shell, solutions, projects + (partial) SSMS/Azure
functionality + full BIDS
• Not intended to replace SSMS
– SSMS is still the primary DBA tool for managing SQL Server
• Intended to replace VS DbPro (aka “data dude”)
– But not ready to, as it still lacks total feature parity
– Missing data generation, data compare, unit testing
• Separate Web Platform Installer Download
– Updates to ship out-of-band with VS and SQL Server releases
15. Declarative Model-Based Development
[Diagram: SQL Server Data Tools (SSDT) and its deployment targets]
• SQL Server Data Tools (SSDT)
– Database Model
– SQL Server Database Project
• Offline Dev/Test
– Local Database Runtime (LocalDB)
• On-Premise DataCenter
– SQL Server 2005, 2008, 2008 R2, 2012
• Cloud
– SQL Azure
• Database Snapshot File (.dacpac)
– Version History
16. SSDT Projects
[Diagram: SSDT project types and relational tooling]
• SSDT project types
– Relational Engine
– Analysis Services
– Reporting Services
– Integration Services
• Relational engine tooling
– SQL Server Object Explorer
– Static Analysis and Validation
– Database Publish
– T-SQL Language Services
– Power Buffer Editing
– Table Designer
– Schema Compare
– Local Database Runtime
– SQL CLR
– T-SQL Debugging
19. Table-Valued Parameters
• Process a set of rows as a single entity
– Similar to temp tables, table variables and CTEs
– Example: INSERT an entire order (header & details) with only two
parameters and one stored procedure call
• Populate a TVP, and then pass it around
– It’s a single parameter
– Pass from procedure to procedure on the server
– Pass from client to server across the network
• Based on User-Defined Table Types
– Defines the schema, just like an ordinary table
– Simply declare a variable as the type to get a TVP
• Stored in tempdb
– Created and destroyed automatically behind the scenes
– Can be indexed
20. Creating a User-Defined Table Type
CREATE TYPE CustomerUdt AS TABLE
(Id int,
CustomerName nvarchar(50),
PostalCode nvarchar(50))
DECLARE @BestCustomers AS CustomerUdt
22. TVP Limitations
• TVPs are read-only, once populated and passed
– You must apply the READONLY keyword when declaring
TVPs in stored procedure signatures
– OUTPUT keyword cannot be used
– You cannot update, insert or delete
• No ALTER TABLE…AS TYPE statement
– To change the schema, it must be dropped and re-created
– All dependent objects must also be dropped and re-created
• Statistics are not maintained on TVPs
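The TVP pattern can be sketched as follows. This is a minimal illustration, assuming the CustomerUdt type from slide 20 already exists; the Customer table and the uspInsertCustomers procedure are hypothetical names, not part of the original demos.

```sql
-- Assumes CustomerUdt (slide 20) has been created.
-- Customer and uspInsertCustomers are hypothetical names.
CREATE TABLE Customer
 (Id int PRIMARY KEY,
  CustomerName nvarchar(50),
  PostalCode nvarchar(50))
GO

-- The TVP must be declared READONLY in the procedure signature
CREATE PROCEDURE uspInsertCustomers
 (@Customers CustomerUdt READONLY)
AS
  INSERT INTO Customer (Id, CustomerName, PostalCode)
    SELECT Id, CustomerName, PostalCode FROM @Customers
GO

-- Populate a variable of the table type, then pass it as a single parameter
DECLARE @BestCustomers AS CustomerUdt
INSERT INTO @BestCustomers VALUES
 (1, N'Contoso', N'10001'),
 (2, N'Fabrikam', N'98052')
EXEC uspInsertCustomers @BestCustomers
```

Inside the procedure the parameter can be queried and joined like a table, but any attempt to modify it fails because of the READONLY requirement.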
24. MERGE
• Four statements in one
– SELECT
– INSERT
– UPDATE
– DELETE
• And even more…
– OUTPUT clause
– INSERT OVER DML
• Operates on a join
– Between source and target
– Type of join based on merge clause
• Start using it now
– 100% compatible with existing business logic
– Existing triggers continue to work
25. MERGE Syntax
MERGE target
USING source
ON join
WHEN MATCHED THEN
UPDATE | DELETE
WHEN NOT MATCHED [BY TARGET] THEN
INSERT
WHEN NOT MATCHED BY SOURCE THEN
UPDATE | DELETE
;
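As a concrete instance of the syntax skeleton, here is a small sketch that applies a batch of trades to a table of stock positions. The Stock and Trade tables and their data are hypothetical, chosen only to exercise each merge clause.

```sql
-- Hypothetical tables: Stock is the target, Trade is the source
CREATE TABLE Stock (Symbol varchar(10) PRIMARY KEY, Qty int)
CREATE TABLE Trade (Symbol varchar(10) PRIMARY KEY, Delta int)

INSERT INTO Stock VALUES ('ADVW', 10), ('BYA', 5)
INSERT INTO Trade VALUES ('ADVW', 5), ('BYA', -5), ('NWT', 3)

MERGE Stock AS s
 USING Trade AS t
 ON s.Symbol = t.Symbol
 WHEN MATCHED AND (s.Qty + t.Delta = 0) THEN
   DELETE                                -- position closed out
 WHEN MATCHED THEN
   UPDATE SET s.Qty = s.Qty + t.Delta    -- adjust an existing position
 WHEN NOT MATCHED BY TARGET THEN
   INSERT VALUES (t.Symbol, t.Delta)     -- open a new position
 OUTPUT $action, inserted.Symbol, deleted.Symbol;
```

The statement must end with a semicolon, and the OUTPUT clause reports which action was taken for each affected row.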
28. DML Output
• INSERT, UPDATE, DELETE, and MERGE all
support the OUTPUT clause
– Captures before-and-after snapshots of modified data via
INSERTED and DELETED pseudo-tables (just like triggers)
– MERGE adds $action virtual column (returning INSERT,
UPDATE or DELETE)
• OUTPUT INTO can capture the change data to a
table or table variable
– Suffers from one limitation – no filtering
– Solution – use INSERT OVER DML
29. INSERT OVER DML Syntax
INSERT INTO target(columns)
SELECT columns FROM
(DML statement with OUTPUT)
CHANGES(columns)
WHERE filter
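A minimal sketch of INSERT OVER DML, assuming hypothetical Product and PriceAudit tables. The inner UPDATE changes every row, while the outer INSERT keeps only the changes that pass the WHERE filter, which plain OUTPUT INTO cannot do.

```sql
-- Hypothetical tables for illustration
CREATE TABLE Product (ProductId int PRIMARY KEY, Price money)
CREATE TABLE PriceAudit (ProductId int, OldPrice money, NewPrice money)
INSERT INTO Product VALUES (1, 10.00), (2, 200.00), (3, 45.00)

-- Raise all prices by 10%, but audit only the rows whose new price exceeds 100
INSERT INTO PriceAudit (ProductId, OldPrice, NewPrice)
 SELECT ProductId, OldPrice, NewPrice
 FROM (UPDATE Product
        SET Price = Price * 1.10
        OUTPUT inserted.ProductId, deleted.Price, inserted.Price)
      AS CHANGES (ProductId, OldPrice, NewPrice)
 WHERE NewPrice > 100;
```

The target of the outer INSERT is subject to restrictions (for example, it cannot have triggers or participate in foreign key relationships), so an audit table like this one is the typical use.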
31. Improved Date and Time Support
• Start using these four data types
– date
– time
– datetime2
– datetimeoffset
• Stop using these two data types
– datetime
– smalldatetime
• Enhanced storage, portability and functionality
– Greater range and precision
– More efficient compacted storage
– Time zone awareness
– All traditional functions support the new types
– ODBC, OLE-DB, ADO.NET, SSIS, SSAS, SSRS, Replication
32. Separate Date and Time Types
• Only use what you need
– Eliminate extraneous storage when only date or time is needed
– Better performance for date-only manipulations and
calculations
• For example,
DECLARE @DOB AS date
DECLARE @MedsAt AS time(0)
33. More Portable Dates and Times
• Value ranges align with .NET and Windows
• Date Values
– From 1/1/0001 (DateTime.MinValue)
– Through 12/31/9999 (DateTime.MaxValue)
– Legacy datetime type limited to 1/1/1753 through 12/31/9999
• Time Values
– Up to 100ns (10-millionth of a second) precision
– datetime accurate only to roughly 3.33 ms (1/300 of a second),
smalldatetime to the minute
34. Time Zone Awareness
• datetimeoffset type
– Same range and precision as datetime2
• Includes the time zone
– Stores an offset ranging from -14:00 to +14:00
– Not DST-aware: stores only a fixed offset, not daylight saving time rules
• Store local date/time in different regions of the world
– Values appear to go in and out as local dates/times
• Internally stored in UTC
– Automatically converted and treated as UTC for comparisons,
sorting and indexing
• You append the time zone…
– …and SQL Server handles the conversions to and from UTC under
the covers automatically
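The UTC-based comparison behavior can be illustrated with two values that show different local times but represent the same instant. The variable names and sample values are illustrative assumptions.

```sql
-- 09:00 at -04:00 and 14:00 at +01:00 are both 13:00 UTC
DECLARE @NewYork datetimeoffset(0) = '2012-05-14 09:00:00 -04:00'
DECLARE @London  datetimeoffset(0) = '2012-05-14 14:00:00 +01:00'

-- Comparisons are performed on the UTC instant, not the local clock time
SELECT CASE WHEN @NewYork = @London
            THEN 'Same instant' ELSE 'Different instants' END AS Comparison

-- SWITCHOFFSET re-expresses a value at another offset without changing the instant
SELECT SWITCHOFFSET(@NewYork, '+01:00') AS NewYorkAsLondonLocal
```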
35. Date/Time Accuracy and Storage
• Date values compacted into 3 bytes
– One byte less than 4-byte date portion of datetime
– Greater range in less space!
• Time values consume 5 bytes at most
– Supports 100-ns accuracy
• Pay less for less
– Reduced storage for times that don’t require high accuracy
– Specify a scale parameter (0-7) on time, datetime2 or datetimeoffset
types
– 0 = No decimal precision in 3 bytes
– 7 = Greatest decimal precision (100-ns) in 5 bytes
– Example: DECLARE @FeedingTime time(0)
– Differing scales compatible for comparison
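A quick way to check the storage figures is to inspect values with DATALENGTH. This is a sketch for experimentation; the column aliases and sample values are arbitrary.

```sql
-- Report the number of bytes each value occupies
SELECT
  DATALENGTH(CAST('2012-05-14' AS date))           AS DateBytes,
  DATALENGTH(CAST('08:30:00' AS time(0)))          AS Time0Bytes,
  DATALENGTH(CAST('08:30:00.1234567' AS time(7)))  AS Time7Bytes,
  DATALENGTH(CAST('2012-05-14 08:30' AS datetime)) AS DateTimeBytes

-- Values declared with differing scales remain comparable
DECLARE @Coarse time(0) = '08:30:00'
DECLARE @Fine   time(7) = '08:30:00.0000000'
SELECT CASE WHEN @Coarse = @Fine
            THEN 'Equal' ELSE 'Not equal' END AS ScaleComparison
```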
37. T-SQL Enhancements
(SQL Server 2012)
• Windowing (OVER Clause) Enhancements
• New T-SQL Functions in SQL Server 2012
• The THROW Statement
• Server-Side Paging
• The SEQUENCE Object
• Metadata Discovery
• Contained Databases
38. Introducing OVER
• OVER Clause
– Exposes a window over the result set for each row
– Added in SQL Server 2005, along with the ranking
functions ROW_NUMBER, RANK, DENSE_RANK,
NTILE
• Can also be used with aggregate functions
– SUM, COUNT, MIN, MAX, AVG
– Doesn’t require GROUP BY
42. Window Partitioning and Ordering
• OVER with PARTITION BY
– Optionally groups the result set into multiple windows
• OVER with ORDER BY
– Specifies the row sequence in each window
– Required for the ranking functions
– Not previously supported for the aggregate functions
• Now in SQL Server 2012
– OVER with ORDER BY now supported with aggregate
functions
– Window “framing” with ROWS and RANGE
– Eight new analytic windowing functions
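The SQL Server 2012 additions can be sketched with a running total and a sliding average, both of which need ORDER BY and a window frame on an aggregate. The table variable and its rows are hypothetical sample data.

```sql
-- Hypothetical sample data
DECLARE @Txn TABLE (AcctId int, TxnDate date, Amount decimal(9,2))
INSERT INTO @Txn VALUES
 (1, '2012-01-01', 500), (1, '2012-01-15', 50), (1, '2012-02-01', -75),
 (2, '2012-01-03', 100), (2, '2012-01-20', 200)

SELECT AcctId, TxnDate, Amount,
  -- Running total per account: every row from the start of the window to the current row
  SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal,
  -- Sliding average: the current row and the one before it
  AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate
                    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS SlidingAvg
FROM @Txn
ORDER BY AcctId, TxnDate
```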
44. New T-SQL Analytic Functions
• FIRST_VALUE
– Returns a column value from the first row of the window
• LAST_VALUE
– Returns a column value from the last row of the window
• LAG
– Returns a column value from a previous row of the window
• LEAD
– Returns a column value from a subsequent row of the window
• PERCENT_RANK
– Calculate percentile as (RANK – 1) / (N – 1)
• CUME_DIST
– Calculate percentile as RANK / N
• PERCENTILE_DISC
– Returns a discrete column value at the specified percentile
• PERCENTILE_CONT
– Returns a value based on the scale of column values at the
specified percentile
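All eight functions can be tried in one query. This is a sketch over hypothetical monthly sales data; note that LAST_VALUE is given an explicit frame, because the default frame ends at the current row.

```sql
-- Hypothetical sample data
DECLARE @Sales TABLE (SalesMonth date, Amount int)
INSERT INTO @Sales VALUES
 ('2012-01-01', 100), ('2012-02-01', 150), ('2012-03-01', 120), ('2012-04-01', 200)

SELECT SalesMonth, Amount,
  LAG(Amount)  OVER (ORDER BY SalesMonth) AS PrevAmount,   -- NULL on the first row
  LEAD(Amount) OVER (ORDER BY SalesMonth) AS NextAmount,   -- NULL on the last row
  FIRST_VALUE(Amount) OVER (ORDER BY SalesMonth) AS FirstAmount,
  LAST_VALUE(Amount)  OVER (ORDER BY SalesMonth
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastAmount,
  PERCENT_RANK() OVER (ORDER BY Amount) AS PctRank,
  CUME_DIST()    OVER (ORDER BY Amount) AS CumeDist,
  PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Amount) OVER () AS MedianDisc,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Amount) OVER () AS MedianCont
FROM @Sales
ORDER BY SalesMonth
```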
47. The THROW Statement
• SQL Server 2005 added TRY/CATCH
– Borrowed from .NET’s try/catch model
– Vast improvement over repeatedly testing @@ERROR
– Still used RAISERROR for generating errors
• SQL Server 2012 adds THROW
– Recommended alternative way to generate your own errors
– Does not entirely replace RAISERROR
• Two usages for THROW
– With error code, description, and state parameters (like
RAISERROR)
– Inside a CATCH block with no parameters (re-throw)
48. THROW vs. RAISERROR
• THROW can only generate user exceptions (unless re-throwing in a CATCH block); RAISERROR can generate user (>= 50000) and system (< 50000) exceptions
• THROW supplies ad-hoc text and doesn’t utilize sys.messages; RAISERROR requires user messages defined in sys.messages (except for code 50000)
• THROW doesn’t support token substitutions; RAISERROR supports token substitutions
• THROW always uses severity level 16 (unless re-throwing in a CATCH block); RAISERROR can set any severity level
• THROW can re-throw the original exception caught in the TRY block; RAISERROR always generates a new exception, and the original exception is lost to the client
• THROW error messages are buffered, and don’t appear in real time; RAISERROR supports WITH NOWAIT to immediately flush buffered output on error
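Both usages of THROW can be sketched in one TRY/CATCH block. The error number and message text are arbitrary illustrative values.

```sql
BEGIN TRY
  PRINT 'Processing order...';
  -- Usage 1: raise a user-defined error (number >= 50000, message, state)
  THROW 50001, 'An error occurred while processing the order.', 1;
END TRY
BEGIN CATCH
  PRINT 'Caught error ' + CAST(ERROR_NUMBER() AS varchar(10)) + '; logging it.';
  -- Usage 2: re-throw the original exception to the client, unchanged
  THROW;
END CATCH
```

The statement that precedes THROW must be terminated with a semicolon, which is why the PRINT statements above end with one.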
50. Server-Side Paging
• New result paging keywords
• Append to ORDER BY clause
• Limits the number of rows returned
• OFFSET @start ROWS
– The first result row to return (zero-based)
• FETCH NEXT @count ROWS ONLY
– The number of rows to return
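A minimal paging sketch, assuming a hypothetical table variable filled with the numbers 1 through 50. The ONLY keyword is required at the end of the FETCH clause.

```sql
-- Hypothetical data: the integers 1 through 50
DECLARE @Numbers TABLE (N int PRIMARY KEY)
INSERT INTO @Numbers
  SELECT TOP (50) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  FROM sys.all_objects

DECLARE @start int = 20, @count int = 10

SELECT N
FROM @Numbers
ORDER BY N
OFFSET @start ROWS             -- skip the first 20 rows
FETCH NEXT @count ROWS ONLY    -- then return the next 10
```

Because OFFSET and FETCH are part of ORDER BY, the page boundaries are only meaningful when the sort order is deterministic.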
52. Sequences
• Sequential Number Generator
– As found in Oracle and DB2
– Alternative to using IDENTITY for assigning new primary keys
• Advantages over IDENTITY
– SET IDENTITY_INSERT ON/OFF not needed for inserts
– Can obtain next sequence value without performing an insert
• Create a Sequence Object
CREATE SEQUENCE MySequence
START WITH 1
INCREMENT BY 1
MINVALUE 1
NO MAXVALUE
• Retrieve Sequence Values
INSERT INTO MyTable(Id, ...)
VALUES(NEXT VALUE FOR MySequence, ...)
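The second advantage over IDENTITY, obtaining a value before any insert happens, can be sketched like this. OrderSeq and OrderHeader are hypothetical names.

```sql
-- OrderSeq and OrderHeader are hypothetical names
CREATE SEQUENCE OrderSeq AS int
  START WITH 1
  INCREMENT BY 1
  MINVALUE 1
  NO MAXVALUE
GO

-- Reserve the next value without performing an insert
DECLARE @NextId int = NEXT VALUE FOR OrderSeq
SELECT @NextId AS ReservedId

-- A sequence can also supply a column default
CREATE TABLE OrderHeader
 (Id int PRIMARY KEY DEFAULT (NEXT VALUE FOR OrderSeq),
  CustomerName nvarchar(50))
INSERT INTO OrderHeader (CustomerName) VALUES (N'Contoso')
```

Unlike IDENTITY, one sequence can feed several tables, since it is an independent schema object.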
54. Metadata Discovery
• New system stored procedures and dynamic
management views
• sys.sp_describe_first_result_set
– Analyzes a T-SQL statement and returns information describing the
schema of the statement’s result set
• sys.dm_exec_describe_first_result_set
– Similar, but implemented as a table-valued function to support
filtering
• sys.dm_exec_describe_first_result_set_for_object
– Similar, but accepts an OBJECT_ID of a T-SQL object in the
database to be analyzed, rather than a T-SQL statement
• sys.sp_describe_undeclared_parameters
– Analyzes a T-SQL statement and returns information describing the
parameter(s) required by the statement
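The discovery objects can be tried against the system catalog, so no user tables are needed. The statements being analyzed are arbitrary examples.

```sql
-- Describe the result set of a statement (stored procedure form)
EXEC sys.sp_describe_first_result_set
  @tsql = N'SELECT name, object_id, create_date FROM sys.objects'

-- The function form can be filtered and projected like any table
SELECT name, system_type_name, is_nullable
FROM sys.dm_exec_describe_first_result_set(
  N'SELECT name, object_id, create_date FROM sys.objects', NULL, 0)
WHERE is_nullable = 0

-- Discover the parameters a statement requires
EXEC sys.sp_describe_undeclared_parameters
  @tsql = N'SELECT name FROM sys.objects WHERE object_id = @id AND name LIKE @pattern'
```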
56. Contained Databases
• Databases are not entirely self-contained
– They have external instance-level dependencies
– Logins, collations, tempdb, linked servers, endpoints,
etc.
• SQL Server 2012 provides partial
containment
– Makes databases more portable
– Enables (but does not enforce) containment
• Can store logins (with passwords) in the
database
– Users authenticate directly at the database level
• sys.dm_db_uncontained_entities
– Discovers threats to containment
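Partial containment can be sketched end to end as follows. SalesDemo and AppUser are hypothetical names, and the password is a placeholder; the view that reports threats to containment is sys.dm_db_uncontained_entities.

```sql
-- Instance-level switch that permits contained database authentication
EXEC sp_configure 'contained database authentication', 1
RECONFIGURE
GO

-- SalesDemo and AppUser are hypothetical names
CREATE DATABASE SalesDemo CONTAINMENT = PARTIAL
GO
USE SalesDemo
GO

-- A contained user with a password; no server-level login is required
CREATE USER AppUser WITH PASSWORD = N'Str0ng!Passw0rd#2012'
GO

-- List entities that cross the containment boundary
SELECT class_desc, major_id, feature_name, feature_type_name
FROM sys.dm_db_uncontained_entities
```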
60. Microsoft Business Intelligence
[Diagram: the Microsoft BI stack, in three layers]
• Business User Experience (Familiar User Experience)
– Self-Service access & insight
– Data exploration & analysis
– Predictive analysis
– Data visualization
– Contextual visualization
• Business Collaboration Platform
– Dashboards & Scorecards
– Excel Services
– Web based forms & workflow
– Collaboration
– Search
– Content Management
– LOB data integration
• Data Infrastructure and BI Platform (Information Platform)
– Analysis Services
– Reporting Services
– Integration Services
– Master Data Services
– Data Mining
– Data Warehousing
62. But Wait, There’s More!
• R2: PowerPivot
• R2: Report Parts in SSRS
• 2012: Analysis Services Tabular mode
– And corresponding improvements in PowerPivot
• 2012: Power View
• 2012: Data Quality Services
• How to get through it all? Here’s the
menu…
63. The Appetizer
• Learn Data Warehousing/BI terms and
concepts.
64. The Main Course
• Build a multidimensional cube
– Query in Excel
• Build a PowerPivot model
– Query that from Excel too.
– Publish to SharePoint
• Upsize the PowerPivot model to SSAS
tabular model
– Add advanced features
– Query from Excel
• Analyze tabular model from Power View
65. Dessert
• Reporting Services
– Report Builder and Report Parts
• PerformancePoint Services
• Overviews of
– Data Quality Services
– Master Data Services
– StreamInsight
70. Star Schemas
• Physical data model
• Central fact table
• Multiple dimension tables
– Used to constrain fact table queries
[Diagram: Total Sales fact table surrounded by Country, Shipper, Year, Product, and Sales Person dimensions]
71. Example Data Request
• Get Total Sales By State, By Month for a
Calendar Year For Country = USA and
Calendar Year = 1996
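Against a relational star schema, a request like this becomes a join from the fact table to its dimensions with a GROUP BY. This is a sketch only; every table and column name here is a hypothetical stand-in for whatever the actual warehouse uses.

```sql
-- Hypothetical star schema: one fact table and two dimension tables
CREATE TABLE DimGeography
 (GeographyKey int PRIMARY KEY, CountryName nvarchar(50), StateName nvarchar(50))
CREATE TABLE DimDate
 (DateKey int PRIMARY KEY, CalendarYear int, CalendarMonth int)
CREATE TABLE FactSales
 (GeographyKey int REFERENCES DimGeography,
  DateKey int REFERENCES DimDate,
  TotalSales money)

-- Total sales by state, by month, for USA in calendar year 1996
SELECT g.StateName, d.CalendarMonth, SUM(f.TotalSales) AS TotalSales
FROM FactSales AS f
  INNER JOIN DimGeography AS g ON g.GeographyKey = f.GeographyKey
  INNER JOIN DimDate AS d ON d.DateKey = f.DateKey
WHERE g.CountryName = N'USA'
  AND d.CalendarYear = 1996
GROUP BY g.StateName, d.CalendarMonth
ORDER BY g.StateName, d.CalendarMonth
```

The dimensions constrain the fact table (the WHERE clause) and supply the grouping levels, which is the same role they play in a cube.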
73. Data Migration
[Diagram: data flows from a Transaction Database to a Data Warehouse to an OLAP Database]
• Transaction Database: transactions, process, relationships
• Data Warehouse: data
• OLAP Database: multi-dimensional, hierarchical, analysis
74. SQL Server Analysis Services
• Built for analysis
• It is free with SQL Server
• And you can use the Microsoft stack that you
know and love
75. From Data Warehouse to OLAP
• Measure
• Dimension
– Can have Hierarchies
• Aggregations
• Cube
[Diagram: Dimensions and Measures derived from the Fact Table]
76. Building OLAP Cube With BIDS
• Business Intelligence Development Studio
– AKA Visual Studio
• Business Intelligence Projects
– Analysis Services Project Type
Add Data Source
Add Data Source View
Add Cube
Add Dimensions
Add Measures
Deploy the Cube
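80. SSAS Interfaces
[Diagram: client applications and the APIs they use to reach Analysis Services]
• Clients and APIs
– C++ App: OLEDB for OLAP/DM
– VB App: ADO/DSO
– .NET App: ADOMD.NET, AMO
– Any App: Any Platform, Any Device (across the WAN)
• Transport
– XMLA over TCP/IP
– XMLA over HTTP
• Analysis Server (msmdsrv.exe)
– OLAP
– Data Mining
– Server ADOMD.NET
– .NET Stored Procedures
– DM Interfaces: Microsoft Algorithms, Third Party Algorithms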
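83. Presenting Your Cube
[Diagram: data sources feed the cube, which feeds the presentation tools]
• Presentation tools: Excel Services, PerformancePoint Services, Reporting Services, Excel
• Data sources: SQL Server, Oracle, DB2, Teradata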
84. The SSAS/Excel/SharePoint Loop
[Diagram: a four-step cycle]
• Build models with SSAS Multidim’l, Tabular or PowerPivot
• Query from Excel PivotTables and Charts
• Publish to SharePoint (via Excel Services) and query in the browser
• Visualize + Analyze with SSRS/Excel Services/PerformancePoint Services OR Power View
86. PivotStuff
• PivotTable, and linked charts (sometimes
referred to as PivotCharts) work extremely well
with OLAP cubes
• How to create a PivotTable:
– Ribbon’s Data tab (From Other Sources button/From Analysis
Services option or Existing Connections button)
– Insert tab (PivotTable “split button”)
• How to insert a chart
– PivotChart button on PivotTable Tools/Options tab
– Several others
88. Formula Language CUBE
Functions
• CUBEMEMBER and CUBEVALUE
– Also CUBEKPIMEMBER, CUBEMEMBERPROPERTY,
CUBERANKEDMEMBER, CUBESET and CUBESETCOUNT
• IntelliSense style support
– In a cell, type “=CU” and all CUBE formulas will display
– Select one with arrow keys and hit Tab
– When prompted for connection, type " and view pop-up list
– Other pop-ups activate on " or "."
89. At Your Service
• “Range Drag” and relative formula support
on CUBEVALUE
• CUBEVALUE and Data Bars go great
together
• Ability to convert PivotTables to formulas
92. Self-Service BI with PowerPivot
• Excel + Analysis Services + SharePoint
• Enables working in Excel but mitigates
the “spreadmart” pitfalls:
– Use Analysis Services (AS) as a hidden engine
Instead of no engine
– Share via SharePoint, accessible by all AS clients
Instead of “deploying” via email
– Formal data refresh on server
So data doesn’t get stale, and users don’t have to
make effort at updating
– Allow IT to monitor
So it’s not all rogue
– Provide path to more rigorous implementations
Can be upsized to Analysis Services
93. Column-Oriented Stores
• Imagine, instead of:
Employee ID Age Income
1 43 90000
2 38 100000
3 35 100000
• You have:
Employee ID 1 2 3
Age 43 38 35
Income 90000 100000 100000
• Perf: values you wish to aggregate are adjacent
• Efficiency: great compression from identical or nearly-
identical values in proximity
• Fast aggregation and high compression means huge volumes
of data can be stored and processed, in RAM
94. Data Import
• Relational databases
– SQL Server (including SQL Azure!), Access
– Oracle, DB2, Sybase, Informix
– Teradata
– “Others” (OLE DB, including OLE DB provider for ODBC)
• OData feeds, incl. R2/2012 Reporting Services,
Azure DataMarket, WCF Data Services (Astoria),
SharePoint 2010 lists, Visual Studio LightSwitch
• Excel via clipboard, linked tables
• Filter, preview, friendly names for
tables/columns
95. Calculated Columns and DAX
• Formula-based columns may be created
• Formula syntax is called DAX (Data Analysis
eXpressions).
– Not to be confused with MDX or DMX. Or DACs.
• DAX expressions are similar to Excel formulas
– Work with tables and columns; similar to, but distinct from,
worksheets and their columns (and rows)
• =FUNC('table name'[column name])
• =FUNCX('table name', <filter expression>)
• FILTER(Resellers,[ProductLine] = "Mountain")
• RELATED(Products[EnglishProductName])
• DAX expressions can be heavily nested
96. PowerPivot Guidebook
[Screenshot: PowerPivot window, with callouts]
• Import data from almost anywhere
• View data in Excel
• Calculated column entry
• Sort and filter
• DAX formula bar
• Relationship indicator
• Table tabs
97. What’s New?
[Screenshot: PowerPivot window, with callouts]
• Data and Diagram views
• KPIs
• Measures
• Measure grid
• Sort one column by another
• Measure formula
98. Diagram View / Advanced Mode
[Screenshots, with callouts]
• Perspectives
• Default Aggregations
• Special Reporting properties
• Hierarchies
• Hide specific columns and tables
• Create relationships visually
• Measures
• KPIs
100. Excel Services
• A component of SharePoint Server 2007/2010;
requires Enterprise CAL
• Allows export of workbook, worksheet, or
individual items to SharePoint report library
– Works great for PivotTables and Charts!
– Also for sheets with CUBExxx formulas or conditional
formatting-driven “scorecards”
• Content can be viewed in browser
– Excel client not required
– Drilldown interactivity maintained
– Rendered in pure HTML and JavaScript
– Parameterization supported
101. PowerPivot Server
• Publish to Excel Services
• Viewing and interacting
• Data Refresh
• Treating as SSAS cube
– 2008 R2 version: URL to .xlsx as server name
– 2012 version: use POWERPIVOT named instance and treat
just like SSAS
Db name is GUID-based; best to discover it
– Use Excel, Reporting Services as clients
And now Power View too…more later
102. The IT Dashboard
• Increase IT efficiency: Familiar Technologies for Authoring, Sharing, Security, and Compliance
• Customizable IT Dashboard
• Visualize usage with animated charts
• Simplify management of SSBI content using IT Operations Dashboard for SharePoint
104. Analysis Services Tabular Mode
• SSAS Tabular Mode is the
enterprise/server implementation of
PowerPivot
• You must have a dedicated tabular mode
SSAS instance
• Tabular SSAS projects: BI Developer
Studio (BIDS) gone PowerPivot
– Implements equivalent tooling to PowerPivot Window
– Can create an SSAS tabular database project by
importing an Excel workbook with PowerPivot model
• SSAS tabular models support partitions
and roles
105. SSAS Tabular Project in BIDS
[Screenshot: SSAS tabular project in BIDS, with callouts]
• SSAS tabular project menus and toolbar
• Measure grid and formula bar
• Reporting properties in Properties window
106. DirectQuery Mode
• In DQ mode, model defines schema, but is not used for data
• Queries issued directly against source
• Similar to ROLAP storage for conventional cubes
• Combine with xVelocity ColumnStore indexes for fast, real-time querying
109. What is Power View?
• Ad hoc reporting. Really!
• Analysis, data exploration
• Data Visualization
• In Silverlight, in the browser, in SharePoint
• Feels a little like Excel BI
• Is actually based on SSRS
– Power View makes a special RDL file
110. Power View Data Sources
• Power View works only against
PowerPivot/SSAS tabular models
– DirectQuery mode supported, however
• For PowerPivot, click “Create Power View
Report” button or option on workbook in
SharePoint PowerPivot Gallery
• For SSAS tabular model, create BISM data
source, then click its “Create Power View
Report” button or option
– BISM data sources can point to PowerPivot
workbooks too, if you want.
111. Power View!
[Screenshot: Power View report, with callouts]
• In the browser, in Silverlight
• Ribbon, like Excel
• Variety of visualizations and data formats
• Field list, like Excel
• Data regions pane, like Excel
112. View Modes
• Maximize one chart, fit report to window, put whole report in Reading Mode or Full Screen
• Create multiple pages (views)
114. Constraining Your Data
In Power View
• Tiles
– A filtering mechanism within a visualization
• Highlighting
– Selection in one visualization affects the others
• Slicers
– Similar to Excel against PowerPivot
• True Filters
– Checked drop-down list; very Excel-like
– Right-hand filter pane, similar to SSRS and Excel
Services
116. Scatter/Bubble Charts
• Allow for several measures
• Features a “play” axis which can be
manipulated through a slider or animated
• Excellent way to visualize trends over time
117. Multipliers
• Multiple charts within a chart, in columns,
rows, or a matrix
– Horizontal and vertical multipliers
• Allows for visualizing 1 or 2 additional
dimensions
118. Advanced Properties
• Setting the representative column and
image tells Power View how to summarize
your data, and show stored images
• Other properties tell it about key
attributes, default aggregations and more
• Reminder: “DirectQuery” mode tells
Power View to get data from relational
data source instead of columnar cache
– Use with columnstore indexes to have the best of both
worlds
– Columnstore indexes require Enterprise Edition;
not available in BI Edition
120. Vocabulary
• MOLAP: Multidimensional OLAP
• UDM: Unified Dimensional Model
• Cube: unit of schema in a dimensional
database
• xVelocity Columnstore Technology:
PowerPivot/SSAS’ column store engine
• VertiPaq: Old name for xVelocity
• BISM: BI Semantic Model
• IMBI: In-Memory BI engine
• Tabular: a column store-based model
– Because it uses tables, not cubes
121. xVelocity Columnstore Indexes
• Implementation of xVelocity columnar technology
engine for SQL Server relational databases
– Code name was: “Apollo”
• Use it by creating a columnstore index
– CREATE COLUMNSTORE INDEX index ON table (col1,
col2, …)
• Can ignore it in a SELECT, too:
– OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
• Can significantly increase performance of star join
queries (i.e. aggregating queries with dimension
lookups).
• Must enable “batch” mode as well – look @ query
plan to confirm!
• Not as good as SSAS, but better than plain old
GROUP BY
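Creating and bypassing a columnstore index can be sketched as follows. FactSales and csiFactSales are hypothetical names; note that in SQL Server 2012 a table becomes read-only once it has a columnstore index, so the index is normally created after the data is loaded.

```sql
-- FactSales and csiFactSales are hypothetical names
CREATE TABLE FactSales (DateKey int, ProductKey int, SalesAmount money)

-- Load data first: in SQL Server 2012 the table is read-only once the index exists
CREATE NONCLUSTERED COLUMNSTORE INDEX csiFactSales
  ON FactSales (DateKey, ProductKey, SalesAmount)

-- An aggregating query that is eligible to use the columnstore index
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM FactSales
GROUP BY ProductKey

-- The same query, telling the optimizer to ignore the columnstore index
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM FactSales
GROUP BY ProductKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
```

Comparing the two execution plans is a practical way to confirm whether batch mode is actually being used.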
126. SSRS + SSAS = YES(S)
• Reporting Services can query Analysis
Services databases, including
multidimensional cubes and tabular
models, directly. PowerPivot too.
• Uses MSOLAP OLE DB provider and
issues MDX queries
– Has its own query designer for MDX
• Beware: SSRS essentially “flattens” SSAS
data
– Conforming multidimensional data to relational
structures
127. Self-Service Reporting?
• Fact: Reporting Services is powerful
• Fact: the data visualizations in SSRS are some of
the best in the BI stack
• Fact: None of that makes SSRS report design
end-user-friendly
• Building SSRS reports, and especially charts,
gauges, etc. from scratch is tedious
• Until now, best end-user option has been to copy
an existing report and tweak it. Yech!
• There must be a better way…
128. Report Parts to the Rescue
• Skilled SSRS designers can publish report parts
– From Report Builder 3.0 or VS report projects
• End users can pick them from a gallery
– A task pane, in Report Builder 3.0, with search capability
– Cannot select from VS report designer
• What can be published?:
– Tablixes (i.e. tables, matrices)
– Rectangles
– Images, Charts, Gauges, Maps
– Parameters and Lists
• All aided by new ability to share Datasets and
original ability to share Data Sources
132. PerformancePoint Services (PPS)
• Scorecards
– Cascading scorecards with interactive charts and data from multiple sources
• Analytics
– Multi-dimensional slice and dice for advanced analytics including decomposition tree, performance map, and perspective view
136. MDS: Microsoft’s Master Data
Management (MDM) tool
• Examples:
– Sales states, countries, currencies, customer types
– Customers, products
– Think of “lookup tables” or just think of dimensions!
– Slowly changing non-transactional entities in your data
• What gets stored:
– Schemas
– Any hierarchies
– The data!
• Other features:
– Collections, business rules, security, workflows
– Versioning
137. Other Facts
• Result of acquisition of Stratature
• v1 is an ASP.NET application; UI is “different”
• New in v2 (SQL Server 2012):
– Now Silverlight-based; UI is still “different”
– Excel add-in for data entry; creation of entities and attributes
– Perform matching with DQS before loading
• Includes .NET and Web Services APIs for
reading/writing data and creating/editing models
• Does not integrate with Analysis Services tools even
though many of its features and concepts mirror
those of dimension designer
• Catalog kept in SQL Server database
• Deployment packages can be created, shared and
deployed
138. Objects in MDS
• Models
– Entities (like tables or SSAS dimensions)
Attributes (like columns/fields or SSAS attributes)
Common attributes are Name and Code
Attribute Groups
Used to taxonomize attributes within tabs in UI
Members (like rows/records or SSAS members)
Hierarchies (like SSAS hierarchies)
Derived or Explicit
Collections (like SSAS named sets)
– Versions
– Business rules
– Workflows
139. Data Quality Services
• Data Cleansing Tool
• New to SQL Server 2012
• Result of Zoomix acquisition
• Uses Artificial Intelligence algorithms to
detect invalid data and perform matching
(for de-duplication)
• Allows manual intervention, too
• Can integrate with MDS and SSIS
• Cleaner data = better adoption of your BI
project
140. DQS Concepts
• Knowledge Bases
– Domains
“semantic representation[s] of a type of data in a data
field…[contain] a list of trusted values, invalid values,
and erroneous data.”
– Mapping
• Data Quality Projects
– Cleansing (i.e. correcting)
Validate Using Reference Data Services and Azure
DataMarket (or 3rd party providers)
– Matching (i.e. de-duping)
– Confidence
– Profiling, Monitoring
141. StreamInsight
• Microsoft’s Complex Event Processing (CEP)
Product
• Processes data streams that are fast and high-
volume
• Highly parallel C++ code assures low latency,
high throughput
• Not based on SQL Server, though that is its “ship
vehicle”
• Interesting collaborative potential with BizTalk
and SSIS
142. StreamInsight Concepts
• No UI. All interaction is programmatic.
• Based on adapter architecture
– Input and output adapters
– Buy or build
– Sensors, RFID readers, Web logs, market data streams are
possible event sources
• StreamInsight applications
– Streams and events can be queried via LINQ from .NET
– Server can run in-process, or shared
144. StreamInsight v1.2:
New in SQL Server 2012
• Resiliency: can take snapshots and
restore to that saved state after an outage
• Dev experience: LINQ enhancements
– Multiple FROM clauses
– Nested types
• Extensibility: user-defined stream
operators now supported
• Admin features: server-wide and query-
specific perf counters, Windows event
logging
• Included with Web and Standard Editions
145. Hadoop on Windows
• Hadoop is the open source implementation of
Google’s MapReduce distributed processing
engine
• MS working with Hortonworks to implement it
on Windows
– A full “distro”
– Windows Server, Windows Azure DIY, Windows Azure self-
serve cluster provisioning
• Also: ODBC driver for Hive
– Works with SSRS, SSAS Tabular, PowerPivot and plain
Excel.
• Also: JavaScript console in the browser
• Come to my session on Thursday!
148. The Need To Stream
• Data explosion accelerating the creation and
management of unstructured binary large
objects (BLOBs)
– Photos
– Audio
– Video
– Email messages
– Spreadsheets
– Documents
– Etc.
151. BLOBs And The Database
Two choices
• Store them in the database
• Store them outside the database, either in the file system
or in a dedicated BLOB store
BLOBs in the database
• varbinary(max) column
• Integrated management
• Transactional
• Simplified programming
• Bloats the structured file groups
BLOBs outside the database
• Path references to file system
• Separate from the database
• Not transactional
• Complex programming
• Doesn’t interfere with performance
153. BLOBs Using FILESTREAM
• Transparently store varbinary(max) data in the file system
– Declare column as “varbinary(max) FILESTREAM”
• Integrated management
– BLOBs are logically part of the database (backup, restore, etc.), but stored
physically separate as a file group mapped to the file system
• Simplified programming
– SQL Server transparently links rows in relational tables to BLOBs in the file
system
– No developer effort required to maintain references between structured and
unstructured data
• Transactional
– SQL Server integrates with the NTFS file system
– Database transactions wrap NTFS transactions
• Performant
– File system is optimized for streaming
155. Enabling the Service for FILESTREAM
• Security concern of the Windows administrator
• Set to one of four levels
– Disabled
– T-SQL only
– T-SQL + file system (local only)
– T-SQL + file system (remote)
• Enable it either:
– During setup
– With SQL Server Configuration Manager
– No way to script with T-SQL, but a VBScript file available on CodePlex provides a command-line alternative
156. Enabling the Instance for FILESTREAM
• Security concern of the SQL Server administrator
– Windows and SQL admins must agree!
• Set to one of three levels
– Disabled, T-SQL only, T-SQL + file system
• Enable it either:
– In SSMS Server Properties dialog
– In T-SQL, with:
EXEC sp_configure filestream_access_level, n
RECONFIGURE
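As a minimal sketch of the T-SQL route, the following sets the instance to level 2 and then checks the result. The choice of level 2 (T-SQL plus file system streaming access) is an assumption for illustration; 0 disables FILESTREAM and 1 allows T-SQL access only.

```sql
-- Assumed level: 2 = T-SQL plus file system streaming access
-- (0 = disabled, 1 = T-SQL only)
EXEC sp_configure filestream_access_level, 2;
RECONFIGURE;

-- Compare the configured value with the value currently in use
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'filestream access level';
```

The instance-level setting cannot exceed what was enabled for the service, so both layers need to agree before streaming access works.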
159. Creating FILESTREAM Columns
• Table requires ROWGUIDCOL column
– Attribute applied to a uniqueidentifier (GUID) column
– Must be primary key or have unique constraint
– Only one ROWGUIDCOL column is permitted per table
• Define BLOB columns as “varbinary(max)
FILESTREAM”
– Any number of BLOB columns are permitted per table
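A sketch of what these rules look like in practice. The database name, table name, column names, and file paths below are hypothetical; the C:\Demo folder is assumed to exist already, and the FILESTREAM container folder is assumed not to exist yet.

```sql
-- Hypothetical database with one FILESTREAM filegroup; paths are placeholders
CREATE DATABASE PhotoLibrary
ON PRIMARY
  (NAME = PhotoLibrary_data, FILENAME = 'C:\Demo\PhotoLibrary_data.mdf'),
FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs, FILENAME = 'C:\Demo\PhotoLibrary_blobs')
LOG ON
  (NAME = PhotoLibrary_log, FILENAME = 'C:\Demo\PhotoLibrary_log.ldf');
GO

USE PhotoLibrary;
GO

CREATE TABLE PhotoAlbum
(
  PhotoId     int PRIMARY KEY,
  -- The required ROWGUIDCOL column, made unique with a constraint
  RowId       uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE
              DEFAULT NEWSEQUENTIALID(),
  Description varchar(max),
  -- The BLOB column; defaulting to 0x gives every row a file to open later
  Photo       varbinary(max) FILESTREAM DEFAULT (0x)
);
```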
161. Introducing SqlFileStream
• It is not easy, practical, or efficient to
manipulate BLOBs in T-SQL
• Build a streaming client with SqlFileStream
– System.Data.SqlTypes.SqlFileStream
– Part of System.Data.dll in .NET 3.5 SP1 and higher
– Inherits from Stream
– Constructor takes logical path and transaction context
– Wraps OpenSqlFilestream SQL Server native client API
– Consumes no SQL Server memory during processing
162. The SqlFileStream Recipe
• Begin Transaction
• INSERT/SELECT row
• Retrieve BLOB PathName()
• Retrieve
GET_FILESTREAM_TRANSACTION_CONTEXT()
• Create and use SqlFileStream
• Commit Transaction
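The server-side half of the recipe can be sketched in T-SQL as follows. It assumes a hypothetical PhotoAlbum table with an int key PhotoId and a varbinary(max) FILESTREAM column named Photo that defaults to 0x; the client-side SqlFileStream code is not shown.

```sql
BEGIN TRANSACTION;

-- Photo is not supplied, so it takes its assumed default of 0x;
-- a NULL BLOB would make PathName() return NULL
INSERT INTO PhotoAlbum (PhotoId, Description)
VALUES (1, 'Sample photo');

-- The two values the client needs to open the BLOB as a stream
SELECT Photo.PathName() AS BlobPath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
FROM PhotoAlbum
WHERE PhotoId = 1;

-- The client passes BlobPath and TxContext to the SqlFileStream constructor,
-- reads or writes the stream, closes it, and only then commits
COMMIT TRANSACTION;
```

In a real client the COMMIT is issued from the same connection after the stream is closed, not in the same batch as the SELECT.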
165. FILESTREAM Limitations & Considerations
• Mirroring/HADR
– Not supported with mirroring
– Supported with HADR (SQL Server 2012 “AlwaysOn”)
• Transparent Data Encryption (TDE)
– Does not encrypt files
• Replication
– Supported with some restrictions, see BOL
• Log Shipping
– Fully supported
– Primary and secondary servers require SQL Server 2008 or higher
• Full-Text Search (FTS)
– Fully supported
166. FILESTREAM Limitations & Considerations (cont.)
• Database Snapshots
– Not supported for FILESTREAM filegroups
• Snapshot Isolation Level
– Wasn’t supported in SQL Server 2008, supported in 2008 R2 and 2012
• Local NTFS File System
– Requires local NTFS file system
– RBS (Remote BLOB Store) API makes SQL Server act as a dedicated
BLOB store
• Security
– Streaming access requires integrated (Windows) security; SQL Server logins cannot open FILESTREAM data through the file system
• SQL Server Express Edition
– Fully supported
– Database size limit (4GB in SQL Server 2008, 10GB in 2008 R2 and
2012) does not include FILESTREAM data
168. Hierarchical Data Is Not Relational
• Today’s most common form of hierarchical data
is XML
• XML support added in SQL Server 2005 is great,
if:
– You want to store and retrieve an entire hierarchy at one time
– The data is consumed in XML by client applications
• Parent-child relationships define rigid
hierarchies
– Can’t support unlimited breadth and depth
169. Hierarchical Storage Scenarios
• Forum and mailing list
threads
• Business organization
charts
• Content management
• Product categories
• File/folder management
• FileTable in SQL Server 2012
• Many more…
– All typically iterated
recursively
170. Traditional Self-Join Approach
• One table
– Each row is linked to its parent
• Works, but has limitations
– CTEs help with recursive queries
– Still your job to manage updates
– Manually maintain structure
– Complex to reparent entire sub-trees
– Difficult to query
– Difficult to control precise ordering of siblings
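For comparison, a small sketch of the self-join approach with a recursive CTE. The Employee table and its rows are hypothetical sample data.

```sql
-- Hypothetical self-joining table: each row points at its parent
CREATE TABLE Employee
(
  EmployeeId int PRIMARY KEY,
  ManagerId  int NULL REFERENCES Employee(EmployeeId),
  Name       varchar(50) NOT NULL
);

INSERT INTO Employee VALUES
  (1, NULL, 'Dave'), (2, 1, 'Amy'), (3, 1, 'John'), (4, 2, 'Jill');

-- Recursive CTE: start at the root, then repeatedly join children to parents
WITH OrgChart AS
(
  SELECT EmployeeId, ManagerId, Name, 0 AS Depth
  FROM Employee
  WHERE ManagerId IS NULL
  UNION ALL
  SELECT e.EmployeeId, e.ManagerId, e.Name, o.Depth + 1
  FROM Employee AS e
  JOIN OrgChart AS o ON e.ManagerId = o.EmployeeId
)
SELECT EmployeeId, Name, Depth
FROM OrgChart
ORDER BY Depth, EmployeeId;
```

The CTE handles the query side, but nothing here records sibling order, and moving a sub-tree still means working out which ManagerId values to change.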
171. Introducing hierarchyid
• System CLR data type
– Extremely compact variable-length format
– Does not require SQL CLR to be enabled on the server
• Enables a robust hierarchical structure over a
self-joining table
– Each row is a node with a unique hierarchyid value
– Contains the path in the hierarchy to the node… down to the
sibling ordinal position
• Invoke methods in T-SQL
– Efficiently query the hierarchy
– Arbitrarily insert, modify, and delete nodes
– Reparent entire sub-trees with a single update
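A minimal sketch of these methods in use. The OrgNode table and the names in it are hypothetical.

```sql
CREATE TABLE OrgNode
(
  NodeId hierarchyid PRIMARY KEY,
  Name   varchar(50) NOT NULL
);

DECLARE @root hierarchyid = hierarchyid::GetRoot();            -- /
DECLARE @amy  hierarchyid = @root.GetDescendant(NULL, NULL);   -- /1/
DECLARE @john hierarchyid = @root.GetDescendant(@amy, NULL);   -- /2/ (sibling after Amy)
DECLARE @jill hierarchyid = @amy.GetDescendant(NULL, NULL);    -- /1/1/

INSERT INTO OrgNode VALUES
  (@root, 'Dave'), (@amy, 'Amy'), (@john, 'John'), (@jill, 'Jill');

-- Everyone in Amy's sub-tree (IsDescendantOf also matches the node itself)
SELECT NodeId.ToString() AS Path, NodeId.GetLevel() AS Level, Name
FROM OrgNode
WHERE NodeId.IsDescendantOf(@amy) = 1;

-- Reparent Amy's whole sub-tree under John with a single UPDATE
DECLARE @newAmy hierarchyid = @john.GetDescendant(NULL, NULL); -- /2/1/
UPDATE OrgNode
SET NodeId = NodeId.GetReparentedValue(@amy, @newAmy)
WHERE NodeId.IsDescendantOf(@amy) = 1;
```

GetDescendant takes the two existing siblings that the new node should sit between, which is how the type keeps precise sibling ordering.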
173. Indexing hierarchyid Columns
• Two types of indexes
– Use one, the other, or both as your needs dictate
• Depth-First
– Create a primary key or unique index
• Breadth-First
– Create a composite index that includes a level column
[Figures: Depth-First Index; Breadth-First Index]
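Both index types can be sketched on one table. The Folder table, its column names, and the index names are hypothetical; the level column is a computed column over GetLevel().

```sql
CREATE TABLE Folder
(
  NodeId    hierarchyid NOT NULL,
  NodeLevel AS NodeId.GetLevel(),      -- computed level column
  Name      nvarchar(255) NOT NULL,
  -- Depth-first: the key orders rows by hierarchyid alone,
  -- so each sub-tree is stored together
  CONSTRAINT PK_Folder PRIMARY KEY CLUSTERED (NodeId)
);

-- Breadth-first: level comes first, so nodes at the same depth are stored together
CREATE UNIQUE INDEX IX_Folder_BreadthFirst
ON Folder (NodeLevel, NodeId);
```

Depth-first tends to suit sub-tree queries, while breadth-first tends to suit queries for the immediate children of a node.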
177. Introducing FileTable
• Builds on FILESTREAM and hierarchyid
• A “semi”-ordinary table that houses a logical file system
– Fixed column schema
– Each row represents a “folder” or “file”
Column Name    Data Type                     Description
stream_id      uniqueidentifier ROWGUIDCOL   Unique row identifier
file_stream    varbinary(max) FILESTREAM     BLOB content (NULL if directory)
name           nvarchar(255)                 Name of file or directory
path_locator   hierarchyid                   Location of file or directory within the file system hierarchy
– Plus 10 storage attribute columns (e.g., is directory, created, modified, archive bit)
• Windows file/directory management API support
– A Windows file share exposes the FileTable
– Bi-directional – changes to the table are reflected in the file share and vice versa
178. BLOBs Using FileTable
[Diagram: path components labeled Server Machine Name, Instance FILESTREAM Share Name, Database Name, FileTable Name, Rows; access labeled T-SQL, T-SQL, SqlFileStream]
• hierarchyid column defines logical file and folder paths
• varbinary(max) FILESTREAM column holds each file’s contents
stream_id           name            path_locator    file_stream
27D8D4AD-D100-39…   'Financials'    0xFF271A3562…   NULL
78F603CC-0460-73…   'ReadMe.docx'   0xFF59345688…   0x3B0E956636AE3B2F020B…
207D4A96-E854-01…   'Budget.xlsx'   0xFD0011039A…   0xF3F359000EEF293039A2…
179. FileTable Prerequisites
• Prerequisites at the instance level
– FILESTREAM must be enabled for the instance
• Prerequisites at the database level
– Enable non-transactional FILESTREAM access for the
database (is still transactional internally)
– Set a root directory name for all FileTables in the database
(this will become a child in the Windows file share)
– Use this T-SQL statement:
CREATE DATABASE … WITH
– or –
ALTER DATABASE … SET
– …followed by…
FILESTREAM(
NON_TRANSACTED_ACCESS=FULL|READ_ONLY,
DIRECTORY_NAME='DatabaseRootDirectory')
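A sketch of the ALTER DATABASE form, assuming a hypothetical existing database named PhotoLibrary that already has a FILESTREAM filegroup. The directory name is likewise a placeholder.

```sql
-- Hypothetical database and directory names
ALTER DATABASE PhotoLibrary
SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL,
                DIRECTORY_NAME = N'PhotoLibrary');

-- Check the database-level FILESTREAM options
SELECT DB_NAME(database_id) AS DatabaseName,
       non_transacted_access_desc,
       directory_name
FROM sys.database_filestream_options;
```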
180. Creating a FileTable
• Create a FileTable in T-SQL
– CREATE TABLE TableName AS FileTable
• FileTable has a fixed schema
– You don’t (can’t) supply a column list
• Creates logical directory
– Logical root directory for the FileTable
– Created beneath the root directory for the database
– Named after the table, can override by specifying:
WITH(FileTable_Directory='TableRootDirectory')
– Exposes a functional Win32 file system
– Does not support memory-mapped files (does not
affect remote clients)
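A minimal sketch of creating and using a FileTable, assuming the instance and database prerequisites are already in place. The table name, directory name, and file content are hypothetical.

```sql
-- No column list: the schema is fixed
CREATE TABLE DocumentStore AS FileTable
WITH (FileTable_Directory = 'Documents');

-- Add a file with plain T-SQL; the remaining columns take their defaults,
-- which places the file in the FileTable's root directory
INSERT INTO DocumentStore (name, file_stream)
VALUES (N'ReadMe.txt', CAST('Hello, FileTable' AS varbinary(max)));

-- List each item with its full UNC path in the file share
SELECT name,
       is_directory,
       file_stream.GetFileNamespacePath(1) AS FullPath
FROM DocumentStore;
```

The same file should then be visible through the Windows file share, and files dropped into the share appear as new rows.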
182. SQL Server Spaces Out
• Integrate location awareness into any application
– Long been the domain of sophisticated GIS applications
• GIS
– A system for capturing, storing, analyzing, and managing data and
associated attributes which are spatially referenced to the earth
• Allow a user to interact with information that is
relevant to locations that they care about:
– Home, work, school, or vacation destinations
• Two geospatial models
– Planar
– Geodetic
183. Spatial Data Types
• Two spatial models = Two system CLR types
• geometry
– Planar (flat) model
– Flat 2-dimensional Cartesian Coordinate system
– X and Y coordinates with uniform units of measure
– Use for mapping small areas
• geography
– Geodetic (round-earth) model
– Latitude and longitude
– Use for larger mapping where land mass is too big to fit on
one planar projection
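The difference between the two types shows up in a simple distance calculation. This sketch uses made-up planar points and approximate coordinates for New York and London; SRID 4326 (WGS 84) is an assumption chosen because it is the common GPS reference system.

```sql
-- Planar: distance is in whatever units the coordinates use
DECLARE @a geometry = geometry::Point(0, 0, 0);
DECLARE @b geometry = geometry::Point(3, 4, 0);
SELECT @a.STDistance(@b) AS PlanarDistance;      -- 5 (a 3-4-5 triangle)

-- Geodetic: Point takes latitude, then longitude; SRID 4326 measures in meters
DECLARE @newYork geography = geography::Point(40.7128, -74.0060, 4326);
DECLARE @london  geography = geography::Point(51.5074, -0.1278, 4326);
SELECT @newYork.STDistance(@london) / 1000 AS DistanceKm;
```

Note the argument order: geometry::Point takes X then Y, while geography::Point takes latitude then longitude.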
184. Planar Spatial Model
• Two-Dimensional Surface
– X and Y coordinates on an arbitrary plane
• Flat Earth Projection
– To work with geospatial data on a 2D surface, a projection is
created to flatten the geographical objects on the spheroid
– Example: Planar Model based on Mercator Projection
[Map: Mercator projection showing Greenland, North America, Africa, and Antarctica]
Square KM:
- Antarctica = 13 million
- Greenland = 2 million
- N. America = 24 million
- Africa = 30 million
185. Geodetic Spatial Model
• Accurate geographic measurements
– Locations on planet surface described by latitude and
longitude angles
• Ellipsoidal sphere
– Latitude = angle N/S of the equator
– Longitude = angle E/W of the Prime Meridian
186. Spatial Data Standards
• Open Geospatial Consortium (OGC)
– International standards body
• Microsoft belongs to the OGC
– SQL Server 2008 uses the OGC’s Simple Feature Access
standards
• OpenGIS Simple Feature Interface Standards (SFS)
– A well-defined way for applications to store and access spatial data
in relational databases
– Described using vector elements such as points, lines, and polygons
• Three ways to import geospatial data
– Well-Known Text (WKT)
– Well-Known Binary (WKB)
– Geography Markup Language (GML)
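A sketch of the three formats with the geometry type. The shapes are made-up sample data with SRID 0.

```sql
-- Import from Well-Known Text
DECLARE @line geometry =
  geometry::STGeomFromText('LINESTRING (0 0, 10 10)', 0);
DECLARE @poly geometry =
  geometry::STGeomFromText('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))', 0);

SELECT @poly.STArea()            AS Area,        -- 100 (a 10 x 10 square)
       @poly.STIntersects(@line) AS Intersects,  -- 1 (the line crosses the square)
       @poly.STAsText()          AS Wkt,
       @poly.STAsBinary()        AS Wkb,
       @poly.AsGml()             AS Gml;

-- Round trip through Well-Known Binary
DECLARE @copy geometry = geometry::STGeomFromWKB(@poly.STAsBinary(), 0);
SELECT @copy.STEquals(@poly) AS SameShape;
```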
192. Spatial Improvements In SQL Server 2012
• Circular Arcs
– CircularString
– CompoundCurve
– CurvePolygon
– All existing methods work on circular objects
• New spatial methods
– BufferWithCurves
– STNumCurves, STCurveN
– STCurveToLine
– CurveToLineWithTolerance
– IsValidDetailed
– HasZ, HasM, AsBinaryZM
– ShortestLineTo
– UnionAggregate, EnvelopeAggregate, CollectionAggregate, ConvexHullAggregate
– MinDbCompatibilityLevel
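A short sketch of a few of these additions, which needs SQL Server 2012 or later. The arc and point are made-up sample data.

```sql
-- A circular arc through three points: start, a point on the arc, end
DECLARE @arc geometry =
  geometry::STGeomFromText('CIRCULARSTRING (0 0, 1 1, 2 0)', 0);

SELECT @arc.STNumCurves()                   AS Curves,
       @arc.STLength()                      AS ArcLength,
       @arc.STCurveToLine().STNumPoints()   AS PointsAfterLinearizing;

-- Buffer a point with true curves instead of a many-sided polygon
DECLARE @p geometry = geometry::Point(0, 0, 0);
SELECT @p.BufferWithCurves(5).STAsText() AS CurvedBuffer;
```

STCurveToLine approximates the arc with straight segments, which is useful when a consumer of the data does not understand circular arcs.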
193. Spatial Improvements In SQL Server 2012
• Improved Precision
– Constructions and relations use 48 bits of precision (previously 27 bits)
• geography Enhancements
– Support for objects larger than a logical hemisphere (“FullGlobe”)
– Support for new and previous “geometry-only” methods
• New SRID
– Spatial reference ID 104001 (sphere of radius 1)
• Performance Improvements
– Better tuning and hints
– Auto Grid indexing with 8 levels (previously 4 levels)
• Other Improvements
– New histogram stored procedures
– Support for persisted computed columns
194. Resources
• Workshop slides and code
– http://sdrv.ms/VSLiveNY2012SQL
• Aaron Bertrand’s Blog
– http://sqlblog.com/blogs/aaron_bertrand
• Bob Beauchemin’s Blog
– http://sqlskills.com/BLOGS/BOBB
• Itzik Ben-Gan’s Web Site
– http://www.sql.co.il
• James Serra’s Blog
– http://www.jamesserra.com
• simple-talk – Learn SQL Server
– http://www.simple-talk.com/sql/learn-sql-server
195. Thank You!
• Contact us
– andrew.brust@bluebadgeinsights.com
– lenni.lobel@sleektech.com
• Visit our blogs
– brustblog.com
– lennilobel.wordpress.com
• Follow us on Twitter
– @andrewbrust
– @lennilobel
• Thanks for coming!