SlideShare a Scribd company logo
Introduction to PolyBase
James Serra
Big Data Evangelist
Microsoft
JamesSerra3@gmail.com
About Me
 Microsoft, Big Data Evangelist
 In IT for 30 years, worked on many BI and DW projects
 Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM
architect, PDW/APS developer
 Been perm employee, contractor, consultant, business owner
 Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
 Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure
Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data
Platform Solutions
 Blog at JamesSerra.com
 Former SQL Server MVP
 Author of book “Reporting with Microsoft SQL Server 2012”
Provides a scalable, T-SQL compatible query processing
framework for combining data from both universes
2012 2013 ……… 2016…2014
PolyBase in SQL Server 16
(CTP3)
PolyBase in SQL DW
PolyBase in SQL Server
2016
2015
PolyBase
Query relational and non-relational data with T-SQL
Disaster recovery: We have several customers that use a pattern of APS > Blob Storage > SQL DW (all
via PolyBase) as a pattern for DR (using the cloud service)
SELECT TOP 10 *
FROM SQLServer S
JOIN Hadoop H
S.Key = H.Key
SELECT TOP 10 *
FROM SQLServer S
JOIN Hadoop H
S.Key = H.Key
SELECT TOP 10 *
FROM SQLServer S
JOIN Hadoop H
S.Key = H.Key
SELECT TOP 10 *
FROM SQLServer S
JOIN Blob B
S.Key = B.Key
SELECT TOP 10 *
FROM SQLServer S
JOIN Blob B
S.Key = B.Key
SELECT TOP 10 *
FROM SQLServer S
JOIN Hadoop H
S.Key = H.Key
JOIN Blob B
and S.Key = B.Key
https://msdn.microsoft.com/en-us/library/mt143174.aspx
Polybase (works
with)
Azure Blob
Store Push Down HDInsight
Push
Down Cloudera
Push
Down HortonWorks
Push
Down
Azure Data Lake
Store Push Down
SQL 2016 (Now) Yes N/A Yes No Yes Yes Yes Yes No N/A
SQL 2016 (Near
future) Yes N/A Yes No Yes Yes Yes Yes No N/A
Azure SQL DW
(Now) Yes N/A Yes No No No No No Yes! N/A
Azure SQL DW (Near
future) Yes N/A Yes No No No No No Yes N/A
APS (Now) Yes N/A Yes
Yes (int).
No (ext) Yes Yes Yes Yes No N/A
APS (Near future) Yes N/A Yes Yes/No Yes Yes Yes Yes No N/A
https://msdn.microsoft.com/en-us/library/mt607030.aspx
Allows you to create a cluster of
SQL Server instances to process
large data sets from external data
sources in a scale-out fashion for
better query performance
CREATE DATABASE SCOPED CREDENTIAL HadoopCredential
WITH IDENTITY = 'hadoopUserName', Secret = 'hadoopPassword';
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = Hadoop, LOCATION = 'hdfs://10.193.26.177:8020',
RESOURCE_MANAGER_LOCATION = '10.193.26.178:8050',
HadoopCredential);
CREATE EXTERNAL FILE FORMAT TextFile
WITH ( FORMAT_TYPE = DELIMITEDTEXT,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec',
FORMAT_OPTIONS (FIELD_TERMINATOR ='|', USE_TYPE_DEFAULT = TRUE));
CREATE EXTERNAL TABLE [dbo].[Customer] (
[SensorKey] int NOT NULL,
int NOT NULL,
[Speed] float NOT NULL
)
WITH (LOCATION='//Sensor_Data//May2014/',
DATA_SOURCE = HadoopCluster,
FILE_FORMAT = TextFile
);
Once per Hadoop User
HDFS File Path
Once per File Format
Once per Hadoop Cluster
per user
Resources
 PolyBase guide: https://msdn.microsoft.com/en-us/library/mt143171.aspx
 Azure SQL Data Warehouse loading patterns and strategies: http://bit.ly/1XskZL2
Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted via the
“Presentations” link on the top menu)

More Related Content

What's hot

Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
BI Brainz
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Flink Forward
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
Mark Kromer
 
Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure Databricks
Sascha Dittmann
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
Databricks
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
Sergio Zenatti Filho
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model Serving
Databricks
 
Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1
Databricks
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You Think
Ike Ellis
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
Databricks
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Databricks
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
Michael Kogan
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
LibbySchulze
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
Databricks
 

What's hot (20)

Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
 
Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure Databricks
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model Serving
 
Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You Think
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 

Viewers also liked

Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
James Serra
 
Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017
Mark Tabladillo
 
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open ShiftMicrosoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
Travis Wright
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
James Serra
 
SQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionSQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux Introduction
Travis Wright
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 
SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017
Travis Wright
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
Mark Tabladillo
 
Bootcamp 2017 - SQL Server on Linux
Bootcamp 2017 - SQL Server on LinuxBootcamp 2017 - SQL Server on Linux
Bootcamp 2017 - SQL Server on Linux
Maximiliano Accotto
 

Viewers also liked (9)

Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017
 
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open ShiftMicrosoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
SQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionSQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux Introduction
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
 
Bootcamp 2017 - SQL Server on Linux
Bootcamp 2017 - SQL Server on LinuxBootcamp 2017 - SQL Server on Linux
Bootcamp 2017 - SQL Server on Linux
 

Similar to Introduction to PolyBase

Mine craft:
Mine craft: Mine craft:
Mine craft:
Mark Tabladillo
 
Moving from SQL Server to MongoDB
Moving from SQL Server to MongoDBMoving from SQL Server to MongoDB
Moving from SQL Server to MongoDB
Nick Court
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
James Serra
 
Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608
Mark Tabladillo
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
James Serra
 
DAC4B 2015 - Polybase
DAC4B 2015 - PolybaseDAC4B 2015 - Polybase
DAC4B 2015 - Polybase
Łukasz Grala
 
Sql Azure Pass
Sql Azure PassSql Azure Pass
Sql Azure Pass
sqlserver.co.il
 
Sql Azure Pass
Sql Azure PassSql Azure Pass
Sql Azure Pass
sqlserver.co.il
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
FedoRam1
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Lviv Startup Club
 
Power BI with Essbase in the Oracle Cloud
Power BI with Essbase in the Oracle CloudPower BI with Essbase in the Oracle Cloud
Power BI with Essbase in the Oracle Cloud
Kellyn Pot'Vin-Gorman
 
Cepta The Future of Data with Power BI
Cepta The Future of Data with Power BICepta The Future of Data with Power BI
Cepta The Future of Data with Power BI
Kellyn Pot'Vin-Gorman
 
Uotm workshop
Uotm workshopUotm workshop
Uotm workshop
Ravi Patel
 
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
Tobias Koprowski
 
Modern Business Intelligence and Advanced Analytics
Modern Business Intelligence and Advanced AnalyticsModern Business Intelligence and Advanced Analytics
Modern Business Intelligence and Advanced Analytics
Collective Intelligence Inc.
 
SQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesSQL Server 2008 Integration Services
SQL Server 2008 Integration Services
Eduardo Castro
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)
Daniel Toomey
 
Microsoft SQL Server - SQL Server + Visual Studio Presentation
Microsoft SQL Server - SQL Server + Visual Studio PresentationMicrosoft SQL Server - SQL Server + Visual Studio Presentation
Microsoft SQL Server - SQL Server + Visual Studio Presentation
Microsoft Private Cloud
 

Similar to Introduction to PolyBase (20)

Mine craft:
Mine craft: Mine craft:
Mine craft:
 
Moving from SQL Server to MongoDB
Moving from SQL Server to MongoDBMoving from SQL Server to MongoDB
Moving from SQL Server to MongoDB
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
DAC4B 2015 - Polybase
DAC4B 2015 - PolybaseDAC4B 2015 - Polybase
DAC4B 2015 - Polybase
 
Sql Azure Pass
Sql Azure PassSql Azure Pass
Sql Azure Pass
 
Sql Azure Pass
Sql Azure PassSql Azure Pass
Sql Azure Pass
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
 
Power BI with Essbase in the Oracle Cloud
Power BI with Essbase in the Oracle CloudPower BI with Essbase in the Oracle Cloud
Power BI with Essbase in the Oracle Cloud
 
Cepta The Future of Data with Power BI
Cepta The Future of Data with Power BICepta The Future of Data with Power BI
Cepta The Future of Data with Power BI
 
Uotm workshop
Uotm workshopUotm workshop
Uotm workshop
 
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
 
Modern Business Intelligence and Advanced Analytics
Modern Business Intelligence and Advanced AnalyticsModern Business Intelligence and Advanced Analytics
Modern Business Intelligence and Advanced Analytics
 
SQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesSQL Server 2008 Integration Services
SQL Server 2008 Integration Services
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)
 
Microsoft SQL Server - SQL Server + Visual Studio Presentation
Microsoft SQL Server - SQL Server + Visual Studio PresentationMicrosoft SQL Server - SQL Server + Visual Studio Presentation
Microsoft SQL Server - SQL Server + Visual Studio Presentation
 

More from James Serra

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
James Serra
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
James Serra
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
James Serra
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
James Serra
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
James Serra
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
James Serra
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
James Serra
 
How to build your career
How to build your careerHow to build your career
How to build your career
James Serra
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
James Serra
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
James Serra
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
James Serra
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
James Serra
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
James Serra
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
James Serra
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 

More from James Serra (20)

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
 
How to build your career
How to build your careerHow to build your career
How to build your career
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 

Recently uploaded

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 

Recently uploaded (20)

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 

Introduction to PolyBase

  • 1. Introduction to PolyBase James Serra Big Data Evangelist Microsoft JamesSerra3@gmail.com
  • 2. About Me  Microsoft, Big Data Evangelist  In IT for 30 years, worked on many BI and DW projects  Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer  Been perm employee, contractor, consultant, business owner  Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference  Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions  Blog at JamesSerra.com  Former SQL Server MVP  Author of book “Reporting with Microsoft SQL Server 2012”
  • 3.
  • 4.
  • 5. Provides a scalable, T-SQL compatible query processing framework for combining data from both universes
  • 6. 2012 2013 ……… 2016…2014 PolyBase in SQL Server 16 (CTP3) PolyBase in SQL DW PolyBase in SQL Server 2016 2015
  • 7. PolyBase Query relational and non-relational data with T-SQL
  • 8. Disaster recovery: We have several customers that use a pattern of APS > Blob Storage > SQL DW (all via PolyBase) as a pattern for DR (using the cloud service)
  • 9. SELECT TOP 10 * FROM SQLServer S JOIN Hadoop H S.Key = H.Key
  • 10. SELECT TOP 10 * FROM SQLServer S JOIN Hadoop H S.Key = H.Key
  • 11. SELECT TOP 10 * FROM SQLServer S JOIN Hadoop H S.Key = H.Key
  • 12. SELECT TOP 10 * FROM SQLServer S JOIN Blob B S.Key = B.Key
  • 13. SELECT TOP 10 * FROM SQLServer S JOIN Blob B S.Key = B.Key
  • 14. SELECT TOP 10 * FROM SQLServer S JOIN Hadoop H S.Key = H.Key JOIN Blob B and S.Key = B.Key
  • 16. Polybase (works with) Azure Blob Store Push Down HDInsight Push Down Cloudera Push Down HortonWorks Push Down Azure Data Lake Store Push Down SQL 2016 (Now) Yes N/A Yes No Yes Yes Yes Yes No N/A SQL 2016 (Near future) Yes N/A Yes No Yes Yes Yes Yes No N/A Azure SQL DW (Now) Yes N/A Yes No No No No No Yes! N/A Azure SQL DW (Near future) Yes N/A Yes No No No No No Yes N/A APS (Now) Yes N/A Yes Yes (int). No (ext) Yes Yes Yes Yes No N/A APS (Near future) Yes N/A Yes Yes/No Yes Yes Yes Yes No N/A
  • 17.
  • 18. https://msdn.microsoft.com/en-us/library/mt607030.aspx Allows you to create a cluster of SQL Server instances to process large data sets from external data sources in a scale-out fashion for better query performance
  • 19.
  • 20.
  • 21. CREATE DATABASE SCOPED CREDENTIAL HadoopCredential WITH IDENTITY = 'hadoopUserName', Secret = 'hadoopPassword'; CREATE EXTERNAL DATA SOURCE HadoopCluster WITH (TYPE = Hadoop, LOCATION = 'hdfs://10.193.26.177:8020', RESOURCE_MANAGER_LOCATION = '10.193.26.178:8050', HadoopCredential); CREATE EXTERNAL FILE FORMAT TextFile WITH ( FORMAT_TYPE = DELIMITEDTEXT, DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec', FORMAT_OPTIONS (FIELD_TERMINATOR ='|', USE_TYPE_DEFAULT = TRUE)); CREATE EXTERNAL TABLE [dbo].[Customer] ( [SensorKey] int NOT NULL, int NOT NULL, [Speed] float NOT NULL ) WITH (LOCATION='//Sensor_Data//May2014/', DATA_SOURCE = HadoopCluster, FILE_FORMAT = TextFile ); Once per Hadoop User HDFS File Path Once per File Format Once per Hadoop Cluster per user
  • 22.
  • 23. Resources  PolyBase guide: https://msdn.microsoft.com/en-us/library/mt143171.aspx  Azure SQL Data Warehouse loading patterns and strategies: http://bit.ly/1XskZL2
  • 24. Q & A ? James Serra, Big Data Evangelist Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/JamesSerra Visit my blog at: JamesSerra.com (where this slide deck is posted via the “Presentations” link on the top menu)

Editor's Notes

  1. First introduced with the Analytics Platform System (APS), PolyBase simplifies management and querying of both relational and non-relational data using T-SQL. It is now available in both Azure SQL Data Warehouse and SQL Server 2016. The major features of PolyBase include the ability to do ad-hoc queries on Hadoop data and the ability to import data from Hadoop and Azure blob storage to SQL Server for persistent storage. A major part of the presentation will be a demo on querying and creating data on HDFS (using Azure Blobs). Come see why PolyBase is the “glue” to creating federated data warehouse solutions where you can query data as it sits instead of having to move it all to one data platform.
  2. Fluff, but point is I bring real work experience to the session
  3. We are planning to release a preview of this functionality early next year as part of SQL Server V.Next CTPs, exact release dates are still in flux. By preview early next year PolyBase will support Teradata, Oracle, SQL Server, MongoDB, Hadoop and Azure blob storage (not MySQL!). We will continue to add more sources until GA. http://demo.sqlmag.com/scaling-success-sql-server-2016/integrating-big-data-and-sql-server-2016 When it comes to key BI investments we are making it much easier to manage relational and non-relational data with Polybase technology that allows you to query Hadoop data and SQL Server relational data through single T-SQL query. One of the challenges we see with Hadoop is there are not enough people out there with Hadoop and Map Reduce skillset and this technology simplifies the skillset needed to manage Hadoop data. This can also work across your on-premises environment or SQL Server running in Azure.
  4. https://msdn.microsoft.com/en-us/library/mt163689.aspx SQL Server 2016 does not support HDInsight yet (but APS does). SQL DW only supports Azure Blob Storage. At the time we executed the PoC, Polybase didn’t support ADL Storage directly so we needed to go through BLOB Storage anyway. This forced us to deal with data type conversions and file encoding again. The PG is working to ADL Store – Polybase integration indeed. what is the timeline for Polybase to work on Azure Data Lake Storage? prior to end of CY’16 I know that work is currently underway on this. I do not have an exact date for ALDS support, but the work is underway and the aim is to have it done by the end of CY16. Currently there are no plans for spark integration with polybase. Yes, we are working on a strategy for integration with metanautix to add support for additional data sources with polybase, but don’t have any details to share at this time.