Data Integration through
Data Virtualization
Cathrine Wilhelmsen, Inmeta
@cathrinew | cathrinew.net
February 21st 2019
Data Integration through Data Virtualization - PolyBase and new SQL Server 2019 Features (Presented at SQL Server Konferenz 2019 on February 21st, 2019)

  1. Data Integration through Data Virtualization. Cathrine Wilhelmsen, Inmeta (@cathrinew | cathrinew.net), February 21st 2019
  2. Abstract: Data virtualization is an alternative to Extract, Transform and Load (ETL) processes. It handles the complexity of integrating different data sources and formats without requiring you to replicate or move the data itself. Save time, minimize effort, and eliminate duplicate data by creating a virtual data layer using PolyBase in SQL Server. In this session, we will first go through fundamental PolyBase concepts such as external data sources and external tables. Then, we will look at the PolyBase improvements in SQL Server 2019. Finally, we will create a virtual data layer that accesses and integrates both structured and unstructured data from different sources. Along the way, we will cover lessons learned, best practices, and known limitations.
  3. @cathrinew | cathrinew.net
  4. …the next 60 minutes…: PolyBase, Virtual Data Layer, Data Virtualization, Data Integration
  5. Data Integration
  6. Combine Data in Different Formats from Separate Sources into Useful and Valuable Information
  7. Combine Data in Different Formats from Separate Sources into Useful and Valuable Information
  8. Combine Data: Extract Transform Load; Extract Load Transform; Data Ingestion; Data Preparation; Data Wrangling
  9. Combine Data in Different Formats from Separate Sources into Useful and Valuable Information
  10. Different Formats: SQL, TXT, CSV, XLS, XML, JSON, ORC, Parquet
  11. Combine Data in Different Formats from Separate Sources into Useful and Valuable Information
  12. Separate Sources: SQL Server, Oracle, Teradata, MongoDB, Hadoop, Azure Blob Storage, Azure Data Lake, Local File System
  13. Combine Data in Different Formats from Separate Sources into Useful and Valuable Information
  14. Useful Information: Accurate, Complete, Consistent, Timely, Unique, Valid
  15. Combine Data in Different Formats from Separate Sources into Useful and Valuable Information
  16. Valuable Information: What you need; Answer questions; Solve problems; Timesaving; Reduce effort; Improve efficiency
  17. Combine Data in Different Formats from Separate Sources into Useful and Valuable Information
  18. ETL – Extract Transform Load; ELT – Extract Load Transform
  19. ETL – Extract Transform Load; ELT – Extract Load Transform = data movement
  20. Data Movement: Costs. Duplicated storage costs. Need resources to build and maintain.
  21. Data Movement: Speed. Takes time to build and maintain. Delays before data can be used.
  22. Data Movement: Security. Increased attack surface area. Inconsistent security models.
  23. Data Movement: Data Quality. More storage layers and pipelines. Higher complexity.
  24. "Data movement is a barrier to faster insights" - Microsoft
  25. Data Virtualization
  26. Data Virtualization: Logical Layers and Abstractions; (Near) Real-Time View of Data; Store in separate locations, view in one location
  27. Data Virtualization: Costs. Lower storage costs. Fewer resources to build and maintain.
  28. Data Virtualization: Speed. No data latency. Rapid iterations and prototypes.
  29. Data Virtualization: Security. Smaller attack surface area. Consistent security models.
  30. Data Virtualization: Data Quality. Fewer storage layers and pipelines. Less complexity.
  31. "Data virtualization creates solutions" - Microsoft
  32. Data Movement = Bad? Data Virtualization = Good?
  33. Data Movement = Bad? Data Virtualization = Good? No, just different use cases!
  34. PolyBase
  35. PolyBase: Feature in SQL Server 2016 and later; Query tables and files using T-SQL; Used to query, import, and export data
  36. PolyBase Performance: Push-Down Computations; Scale-Out Groups
  37. PolyBase in SQL Server 2016 / 2017: Hadoop, Azure Blob Storage, Azure Data Lake
  38. PolyBase in SQL Server 2019: Hadoop, Azure Blob Storage, Azure Data Lake, SQL Server, Oracle, Teradata, MongoDB
  39. Diagram labels: ODBC, NoSQL, Relational Databases, Big Data, PolyBase
  40. How to use PolyBase? (1) Install PolyBase; (2) Configure PolyBase Connectivity; (3) Create Database Master Key; (4) Create Database Scoped Credential; (5) ...
  41. How to use PolyBase? (4) ...; (5) Create External Data Sources; (6) Create External File Formats; (7) Create External Tables; (8) Create Statistics
  42. Install PolyBase
  43. (1) Install Prerequisites: Microsoft .NET Framework 4.5, Oracle Java SE Runtime Environment (JRE) 7 or 8; (2) Install PolyBase: Single Node or Scale-Out Group; (3) Enable PolyBase
  44. Install Prerequisites: Microsoft .NET Framework 4.5: https://www.microsoft.com/nl-nl/download/details.aspx?id=30653; Oracle Java SE Runtime Environment (JRE) 7 or 8: https://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
  45. Install PolyBase. Note: PolyBase can be installed on only one SQL Server instance per machine. Note: After you install PolyBase, either standalone or in a scale-out group, you have to uninstall and reinstall to change it... Ask me how I know :)
  46. Enable PolyBase: EXEC sp_configure 'polybase enabled', 1; RECONFIGURE;
  47. Verify Installation: SELECT SERVERPROPERTY('IsPolyBaseInstalled');
  48. Configure PolyBase Connectivity
  49. (1) Configure PolyBase Connectivity; (2) Restart Services: SQL Server, SQL Server PolyBase Engine, SQL Server PolyBase Data Movement
  50. Configure PolyBase Connectivity: EXEC sp_configure 'hadoop connectivity', 7; RECONFIGURE;
  51. Configure PolyBase Connectivity. Hadoop Connectivity: Specify type of data source; Values: 0-7; 1, 4, 7: Multiple Data Sources
  52. Configure PolyBase Connectivity
  53. Configure PolyBase Connectivity
  54. Restart Services
  55. Restart Services
  56. Create Database Master Key
  57. Create Database Master Key: CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';
  58. Create Database Scoped Credential
  59. Create Database Scoped Credential: CREATE DATABASE SCOPED CREDENTIAL <CredentialName> WITH IDENTITY = '<identity>', SECRET = '<secret>';
  60. Create External Data Sources
  61. Create External Data Source
  62. Create External Data Source
  63. Create External Data Source
  64. Create External Data Source (Hadoop): CREATE EXTERNAL DATA SOURCE <HadoopName> WITH ( TYPE = HADOOP, LOCATION = '<hdfs://...>', CREDENTIAL = <CredentialName>, RESOURCE_MANAGER_LOCATION = '<ip>' );
  65. Create External Data Source (Azure Blob Storage): CREATE EXTERNAL DATA SOURCE <AzureBlobName> WITH ( TYPE = HADOOP, LOCATION = '<wasbs://...>', CREDENTIAL = <CredentialName> );
  66. Create External Data Source (Oracle): CREATE EXTERNAL DATA SOURCE <OracleName> WITH ( LOCATION = '<oracle://...>', CREDENTIAL = <CredentialName> );
  67. Create External File Formats
  68. Create External File Format
  69. Create External File Format
  70. Create External File Format
  71. Create External File Format: CREATE EXTERNAL FILE FORMAT <FileFormatName> WITH ( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS ( FIELD_TERMINATOR = ';', USE_TYPE_DEFAULT = TRUE ) );
  72. Create External Tables
  73. Create External Table
  74. Create External Table
  75. Create External Table
  76. Create External Table: CREATE EXTERNAL TABLE [SchemaName].[TableName] ( [ColumnName] INT NOT NULL ) WITH ( LOCATION = '<FileName>', DATA_SOURCE = <DataSourceName>, FILE_FORMAT = <FileFormatName> );
  77. Create Statistics
  78. Create Statistics. Note: To create statistics, SQL Server first imports the external data into a temp table. Remember to choose sampling or full scan. Note: Updating statistics is not supported. Drop and re-create instead.
  79. Create Statistics: CREATE STATISTICS <StatName> ON <TableName>(<ColumnName>); CREATE STATISTICS <StatName> ON <TableName>(<ColumnName>) WITH FULLSCAN;
  80. All Done
  81. Verify using Catalog Views: SELECT * FROM sys.external_data_sources; SELECT * FROM sys.external_file_formats; SELECT * FROM sys.external_tables;
  82. T-SQL All The Things :)
  83. …or…?
  84. PolyBase can be grumpy :(
  85. Unexpected error encountered filling record reader buffer: HadoopExecutionException: Not enough columns in this line.
  86. Unexpected error encountered filling record reader buffer: HadoopExecutionException: Too many columns in the line.
  87. Unexpected error encountered filling record reader buffer: HadoopExecutionException: Could not find a delimiter after string delimiter.
  88. Unexpected error encountered filling record reader buffer: HadoopExecutionException: Error converting data type NVARCHAR to INT.
  89. Unexpected error encountered filling record reader buffer: HadoopExecutionException: Conversion failed when converting the NVARCHAR value '"0"' to data type BIT.
  90. Unexpected error encountered filling record reader buffer: HadoopExecutionException: Too long string in column [-1]: Actual len = [4242]. MaxLEN=[4000]
  91. Msg 46518, Level 16, State 12, Line 1: The type 'nvarchar(max)' is not supported with external tables.
  92. Msg 2717, Level 16, State 2, Line 1: The size (10000) given to the parameter exceeds the maximum allowed (4000).
  93. Msg 131, Level 15, State 2, Line 1: The size (10000) given to the column exceeds the maximum allowed for any data type (8000).
  94. = Know your data :)
  95. SQL Server 2019 Big Data Clusters
  96. SQL Server 2019 Big Data Clusters: SQL Server, Spark, and HDFS; Scalable clusters of containers; Runs on Kubernetes
  97. Diagram: four Kubernetes pods. One holds the SQL Server Master Instance; each of the others holds SQL Server, an HDFS Data Node, and Spark.
  98. Build Virtual Data Layer
  99. Build Virtual Data Layer. Scenarios: (1) Text Files in Azure Blob Storage; (2) Tables in Oracle Database
  100. Text Files in Azure Blob Storage
  101. Tables in Oracle Database
  102. DEMO: Build Virtual Data Layer in SSMS
  103. It's as easy as 1, 2, 3!
  104. …4, 5, 6, 7, 8, 9, 10…
  105. Is there an easier way?
  106. Biml 💚 PolyBase. Ben Weissman: Using Biml to automagically keep your external polybase tables in sync! https://www.solisyon.de/biml-polybase-external-tables/
  107. Azure Data Studio: (1) Install Azure Data Studio: docs.microsoft.com/en-us/sql/azure-data-studio/download; (2) Install Extension: SQL Server 2019 (Preview): docs.microsoft.com/en-us/sql/azure-data-studio/sql-server-2019-extension
  108. Extension: SQL Server 2019 (Preview)
  109. Extension: SQL Server 2019 (Preview). Double-clicking the .vsix file doesn't work…
  110. Extension: SQL Server 2019 (Preview). …install preview extensions from Azure Data Studio
  111. Azure Data Studio Wizard: CSV Files
  112. Azure Data Studio Wizard: Oracle
  113. DEMO: Build Virtual Data Layer in ADS
  114. Next Steps
  115. Where can I learn more? Microsoft SQL Docs: docs.microsoft.com/sql
  116. Where can I learn more? Kevin Feasel: 36chambers.wordpress.com/polybase
  117. How can I try PolyBase? Microsoft Hands-on Labs: microsoft.com/handsonlabs
  118. How can I try SQL Server 2019? For Windows, Linux, and containers: aka.ms/trysqlserver2019
  119. How can I try Big Data Clusters? SQL Server 2019 Early Adoption Program: aka.ms/eapsignup
  120. @cathrinew | cathrinew.net | hi@cathrinew.net. Thank you!
  121. Thank you very much for your attention.
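
Slides 46 to 50 enable PolyBase, verify the installation, and set the connectivity level. The script below is a minimal sketch of those steps in one place. It assumes SQL Server 2019 (where the `'polybase enabled'` option exists) and reuses the connectivity value 7 from slide 50, which may not be the right value for your Hadoop distribution. The final query against `sys.configurations` is an addition for checking the result and is not taken from the slides.

```sql
-- Returns 1 if the PolyBase feature is installed on this instance, 0 if not.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Enable PolyBase (assumption: SQL Server 2019, where this option exists).
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

-- Choose the type of external data source. 7 is the value shown on slide 50;
-- pick the value that matches your own Hadoop distribution or Azure Blob Storage.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- Added check: compare the configured value with the value currently in use.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN ('polybase enabled', 'hadoop connectivity');
```

After changing the connectivity value, the services listed on slide 49 (SQL Server, SQL Server PolyBase Engine, SQL Server PolyBase Data Movement) still have to be restarted.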
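
Slides 57 to 76 create the master key, credential, data source, file format, and external table one object at a time. As a rough end-to-end sketch for the first demo scenario (text files in Azure Blob Storage), the statements could be combined as follows. Every object name, the container, the storage account, the folder path, and the column list are hypothetical placeholders; `STRING_DELIMITER` and the `REJECT_*` options are additions that do not appear on the slides.

```sql
-- All names, paths, and secrets below are placeholders, not values from the talk.

-- Step 3: the master key protects the credential secret stored in this database.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';

-- Step 4: for a storage account key, IDENTITY can be any string;
-- SECRET holds the storage account access key.
CREATE DATABASE SCOPED CREDENTIAL AzureBlobCredential
WITH IDENTITY = 'user',
     SECRET = '<storage account access key>';

-- Step 5: external data source pointing at a blob container.
CREATE EXTERNAL DATA SOURCE AzureBlobSales
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = AzureBlobCredential
);

-- Step 6: semicolon-delimited text, as on slide 71.
CREATE EXTERNAL FILE FORMAT SemicolonDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ';',
        STRING_DELIMITER = '"',     -- addition: strip quotes around values
        USE_TYPE_DEFAULT = TRUE
    )
);

-- Step 7: the external table only stores metadata; the rows stay in the files.
CREATE EXTERNAL TABLE dbo.SalesOrders_External (
    OrderId      INT            NOT NULL,
    CustomerId   INT            NULL,
    CustomerName NVARCHAR(200)  NULL,
    OrderDate    DATE           NULL,
    Amount       DECIMAL(18, 2) NULL
)
WITH (
    LOCATION = '/sales/orders/',    -- hypothetical folder inside the container
    DATA_SOURCE = AzureBlobSales,
    FILE_FORMAT = SemicolonDelimitedText,
    REJECT_TYPE = VALUE,            -- addition: fail the query on the first bad row
    REJECT_VALUE = 0
);

-- Query the files with plain T-SQL.
SELECT TOP (10) OrderId, CustomerName, OrderDate, Amount
FROM dbo.SalesOrders_External;
```

With `REJECT_VALUE = 0`, a single malformed row makes the query fail, which surfaces errors like those on slides 85 to 90 early instead of silently dropping rows.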
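
The error messages on slides 85 to 93 come down to "know your data": column counts, delimiters, data types, and string lengths all have to match the files. One possible way to investigate such errors, which is not described in the talk, is to land every column as a bounded string and convert in a view. The sketch assumes that an external data source `AzureBlobSales` and a file format `SemicolonDelimitedText` already exist; those names, the table, the columns, and the path are all hypothetical.

```sql
-- Assumption: AzureBlobSales and SemicolonDelimitedText already exist in this database.
-- Land every column as a string so that type conversion cannot fail during the read.
CREATE EXTERNAL TABLE dbo.SalesOrders_Raw (
    OrderId NVARCHAR(50)   NULL,
    IsPaid  NVARCHAR(10)   NULL,
    Comment NVARCHAR(4000) NULL   -- NVARCHAR(MAX) is not supported (Msg 46518)
)
WITH (
    LOCATION = '/sales/orders/',
    DATA_SOURCE = AzureBlobSales,
    FILE_FORMAT = SemicolonDelimitedText
);
GO  -- batch separator for SSMS, Azure Data Studio, or sqlcmd; CREATE VIEW must start a batch

-- Convert in a view. TRY_CONVERT returns NULL instead of raising an error.
CREATE VIEW dbo.SalesOrders_Typed
AS
SELECT
    TRY_CONVERT(INT, OrderId)                  AS OrderId,
    TRY_CONVERT(BIT, REPLACE(IsPaid, '"', '')) AS IsPaid,   -- handles values like "0"
    Comment
FROM dbo.SalesOrders_Raw;
GO

-- List the raw values that would have caused a conversion error.
SELECT OrderId
FROM dbo.SalesOrders_Raw
WHERE OrderId IS NOT NULL
  AND TRY_CONVERT(INT, OrderId) IS NULL;
```

This does not help with rows that have too few or too many columns, or with strings longer than 4000 characters; those still have to be fixed in the source files or in the file format definition.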
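
Slides 78 to 81 cover statistics and the catalog views. A short sketch of both follows, assuming an external table named `dbo.SalesOrders_External` with `OrderId` and `OrderDate` columns (hypothetical names). The join across the three catalog views is an addition to the three `SELECT *` statements on slide 81.

```sql
-- Assumption: dbo.SalesOrders_External exists with OrderId and OrderDate columns.

-- Sampled statistics (default) and full-scan statistics.
-- Both import the external data into a temp table first, so they can be slow.
CREATE STATISTICS Stat_SalesOrders_OrderDate
ON dbo.SalesOrders_External (OrderDate);

CREATE STATISTICS Stat_SalesOrders_OrderId
ON dbo.SalesOrders_External (OrderId) WITH FULLSCAN;

-- Updating statistics on external tables is not supported (slide 78):
-- drop and re-create them instead.
DROP STATISTICS dbo.SalesOrders_External.Stat_SalesOrders_OrderDate;

CREATE STATISTICS Stat_SalesOrders_OrderDate
ON dbo.SalesOrders_External (OrderDate);

-- Added check: one row per external table with its data source and file format.
SELECT
    t.name      AS ExternalTable,
    t.location  AS TableLocation,
    ds.name     AS DataSource,
    ds.location AS DataSourceLocation,
    ff.name     AS FileFormat        -- NULL for sources without a file format
FROM sys.external_tables AS t
JOIN sys.external_data_sources AS ds
    ON ds.data_source_id = t.data_source_id
LEFT JOIN sys.external_file_formats AS ff
    ON ff.file_format_id = t.file_format_id;
```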
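
The second demo scenario on slide 99 (tables in an Oracle database) uses the SQL Server 2019 data source from slide 66, which needs no file format. The sketch below shows how such a table might be exposed and then combined with another external table in a view, which is one way to read "virtual data layer". It assumes a database master key already exists and that `dbo.SalesOrders_External` (with `OrderId`, `CustomerId`, and `Amount`) is an existing external table over text files. The host, port, service name, schema, and all object names are hypothetical.

```sql
-- Assumptions: a database master key exists, and dbo.SalesOrders_External
-- is an existing external table. All names and locations are placeholders.

CREATE DATABASE SCOPED CREDENTIAL OracleCredential
WITH IDENTITY = '<oracle user>',
     SECRET = '<oracle password>';

-- SQL Server 2019: no TYPE and no file format for relational sources.
CREATE EXTERNAL DATA SOURCE OracleSales
WITH (
    LOCATION = 'oracle://<host>:1521',
    CREDENTIAL = OracleCredential
);

-- Column names and types have to match the Oracle table.
-- Character columns may also need an explicit COLLATE clause.
CREATE EXTERNAL TABLE dbo.Customers_External (
    CUSTOMER_ID   DECIMAL(38)   NOT NULL,
    CUSTOMER_NAME NVARCHAR(200) NULL
)
WITH (
    LOCATION = '<service name>.<schema>.CUSTOMERS',
    DATA_SOURCE = OracleSales
);
GO  -- batch separator; CREATE VIEW must be the first statement in its batch

-- Virtual data layer: one view over Oracle rows and blob storage files.
CREATE VIEW dbo.CustomerOrders
AS
SELECT
    c.CUSTOMER_NAME,
    o.OrderId,
    o.Amount
FROM dbo.Customers_External AS c
JOIN dbo.SalesOrders_External AS o
    ON o.CustomerId = c.CUSTOMER_ID;
GO

SELECT TOP (10) CUSTOMER_NAME, OrderId, Amount
FROM dbo.CustomerOrders;
```

Consumers query `dbo.CustomerOrders` like any other view, while the data stays in Oracle and in Azure Blob Storage.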