SlideShare a Scribd company logo
1 of 35
Deep Dive On SQL Server and
Big Data
Travis Wright, Mihaela Blendea, Umachandar Jayachandran
Program Managers, SQL Server
Build intelligent apps and
AI with all your data
Analyzing all data
Easily and securely manage
data big and small
Managing all data
Simplified management and analysis through a unified deployment, governance, and tooling
SQL Server enables
intelligence over all your data
Unified access to all your data with
unparalleled performance
Integrating all data
Managing
all data
Easily deploy and manage a
SQL Server + Big Data cluster
Easily deploy and manage a Big Data cluster using Microsoft’s
Kubernetes-based Big Data solution built-in to SQL Server
Hadoop Distributed File System (HDFS) storage, SQL Server
relational engine, and Spark analytics are deployed as containers
on Kubernetes in one easy-to manage package
Benefits of containers and Kubernetes:
Fast to deploy
Self-contained – no installation required
Upgrades are easy because - just upload a new image
Scalable, multi-tenant, designed for elasticity
SQL Server Big Data Cluster Layout
Compute pool
SQL Compute
Node
SQL Compute
Node
SQL Compute
Node
…
Compute pool
SQL Compute
Node
IoT data
Directly
read from
HDFS
Persistent storage
…
Storage pool
SQL
Server
Spark
HDFS Data Node
SQL
Server
Spark
HDFS Data Node
SQL
Server
Spark
HDFS Data Node
Kubernetes pod
Analytics
Custom
apps BI
SQL Server
master instance
Node Node Node Node Node Node Node
SQL
Data pool
SQL Data
Node
SQL Data
Node
Compute pool
SQL Compute
Node
Storage Storage
Controller
Base node configuration
Applies to nodes across all planes
Services
kubelet – K8s local agent
kube-proxy – network config and forwarding
supervisord – process monitor and control
fluentd – node logging
flanneld – Software defined network
collectd – OS and application data collection
SQL Big Data watchdog– config sync, watchdog, data
collector (DMV, etc) Kubernetes node
watchdog
kubelet
supervisord
fluentd
kube-proxy
flanneld
collectd
Control plane
External Endpoints
Kubernetes (REST)
Aris Control Service (REST)
Knox Gateway (REST gateway for Hadoop APIs)
SQL Server Master (TDS gateway for data marts and SQL
Master Service)
Services
etcd
Kubernetes Master Services
Controller
SQL Master instance
SQL Big Data Admin Portal
Knox Gateway
HDFS Name Service
YARN Master
Hive Metastore
InfluxDB (metrics store)
Livy (REST interface for Spark)
Spark Driver
Kubernetes node
Base node services + etcd
Controller
SQL Master
Proxy
HDFS Name Node
Kubernetes node
Base node services + etcd
Kubernetes Master Services
SQL Big Data Admin Portal
Spark Driver
InfluxDB
Kubernetes node
Base node services + etcd
Livy
Elastic Search
Knox
Hive Metastore
Grafana
Kibana
Yarn Master
Controller
External REST/HTTPS Endpoint
Bootstrap and Build out
Manage Capacity
Configure High Availability and recover from failure (AGs)
Security (authN, authZ, certificate rotation)
Lifecycle (upgrade/downgrade/rollback)
Configuration management
Monitoring - capacity, health, metrics, logs
Troubleshooting – performance, failures
Cluster Admin Portal
Controller Service
Buildout
Upgrade/Rollback
HADR
Add/Remove Capacity
Central AuthZ/AuthN
Cluster Admin Portal
Troubleshooting
SQL Master instance
TDS endpoint into the cluster
High value data
OLTP server
Data connectors
Machine learning & extensibility
Scalable query engine with readable secondary replicas
Built-in high availability with Always On Availability Groups
(coming soon)
Master Instance Availability Group
Compute plane
Hosts one or more SQL Compute Pools
Compute pool is a group of instances that forms a data,
security, and resource boundary.
Compute pool processes complex distributed queries against
the data plane.
Local storage is used for shuffling data if necessary.
Compute pool node
Base node services
SQL Engine
Compute pool node
Base node services
SQL Engine
Compute pool node
Base node services
SQL Engine
Compute pool node
Base node services
SQL Engine
Data plane
Storage pool
Data ingestion through Spark (batch and streaming)
Data storage in HDFS
Data access through HDFS and SQL endpoints
SQL engine reads files in HDFS directly
Data pool
Partitioned, in-memory cache for external data or HDFS
Scale-out data storage for append only data sets
Data ingestion through Spark
Storage pool node
Base node services
SQL Engine
Data pool node
Base node services
SQL Engine
HDFS
Spark
Storage pool node
Base node services
SQL Engine
HDFS
Spark
Integrating
all data










SQL Server
T-SQLAnalytics Apps
ODBC NoSQL Relational databases Big Data
PolyBase external tables
SQL Server is the hub for integrating data
Easily combine across relational and non-relational data stores
Scale-out data pools combine and cache data from many
sources for fast querying
Scenario
 A global car manufacturing company wants to join data
from across multiple sources including HDFS, SQL Server,
and Cosmos DB
Solution
• Query data in relational and non-relational data stores with
new PolyBase connectors
• Create a scale-out data pool cache of combined data
• Expose the datasets as a shared data source, without
writing code to move and integrate data
SQL Server
Scale-out data pool
HDFS Cosmos DB SQL Server
Polybase
connectors
Shard 1 Shard nShard 2IoT data
SQL Server can now read directly from HDFS files
Elastically scale compute and storage using HDFS-based
storage pools with SQL Server and Spark built in
Mount and manage remote stores through HDFS
Mount various on-prem and cloud data stores
Accelerate computation by caching data locally
Disaster recovery/Data backup
Storage pool
SQL Server Master instance/Spark
SQL
Server
HDFS Data Node
Spark
SQL
Server
HDFS Data Node
Spark
SQL
Server
HDFS Data Node
Spark
Other HDFS store Remote cloud
store








Analyzing
all data
Data scientists can use familiar tools to analyze
structured and unstructured data
1. Use SQL Ops Studio notebooks run a Spark job
over structured and unstructured data
2. Spark SQL jobs can access data in SQL Server
3. Queries can be pushed down to other data
sources like Oracle Database and Mongo DB
4. The Spark job returns the data to the notebook
SQL Server master instance
External data
sources
Storage pool
Spark Spark Spark
Azure Data
Studio
Model & serve
Business/custom apps
(Structured)
Logs, files and media
(unstructured)
Sensors and IoT
(unstructured)
Predictive
apps
BI tools
Store
HDFS
SQL Server data
pools
Ingest
Spark streaming
Prep & train
Spark
Spark ML
SQL Server
ML Services
SQL Server
master instance
Simplified management and analysis through a unified deployment, governance, and tooling
Integrate structured and unstructured data
SQL Server
master instance
REST API containers
for models
SQL
Server
SQL Server master instance
Storage pool
Spark
MLeap Runtime
Spark Spark
Model
Scoring
Training Training Training
Managed SQL Server, Spark
and data lake
Store high volume data in a data lake and access
it easily using either SQL or Spark
Management services, admin portal, and
integrated security make it all easy to manage
SQL Server
Data virtualization
Combine data from many sources without
moving or replicating it
Scale out compute and caching to boost
performance
T-SQL
Analytics Apps
Open
database
connectivity
NoSQL Relational
databases
HDFS
Complete AI platform
Easily feed integrated data from many sources
to your model training
Ingest and prep data and then train, store, and
operationalize your models all in one system
SQL Server
External Tables
Compute pools and data pools
Spark
Scalable, shared storage (HDFS)
External
data sources
Admin portal and management services
Integrated AD-based security
SQL Server
ML Services
Spark & Spark
ML
HDFS
REST API
containers
for models
Apply to join the SQL Server
Early Adoption Program
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session

More Related Content

What's hot

Red Hat Container Strategy
Red Hat Container StrategyRed Hat Container Strategy
Red Hat Container StrategyRed Hat Events
 
Automating the Enterprise with CloudForms & Ansible
Automating the Enterprise with CloudForms & AnsibleAutomating the Enterprise with CloudForms & Ansible
Automating the Enterprise with CloudForms & AnsibleJerome Marc
 
MOUG17 Keynote: What's New from Oracle Database Development
MOUG17 Keynote: What's New from Oracle Database DevelopmentMOUG17 Keynote: What's New from Oracle Database Development
MOUG17 Keynote: What's New from Oracle Database DevelopmentMonica Li
 
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceDataWorks Summit
 
Cisco's MultiCloud Strategy
Cisco's MultiCloud StrategyCisco's MultiCloud Strategy
Cisco's MultiCloud StrategyMaulik Shyani
 
Implementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayImplementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayDataWorks Summit
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
 
Create B2B Exchanges with Cisco Connected Processes: an overview
Create B2B Exchanges with Cisco Connected Processes: an overviewCreate B2B Exchanges with Cisco Connected Processes: an overview
Create B2B Exchanges with Cisco Connected Processes: an overviewCisco DevNet
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
Still on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youStill on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youModusOptimum
 
Oracle database in cloud, dr in cloud and overview of oracle database 18c
Oracle database in cloud, dr in cloud and overview of oracle database 18cOracle database in cloud, dr in cloud and overview of oracle database 18c
Oracle database in cloud, dr in cloud and overview of oracle database 18cAiougVizagChapter
 
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017Codemotion
 
IBM + REDHAT "Creating the World's Leading Hybrid Cloud Provider..."
IBM + REDHAT "Creating the World's Leading Hybrid Cloud Provider..."IBM + REDHAT "Creating the World's Leading Hybrid Cloud Provider..."
IBM + REDHAT "Creating the World's Leading Hybrid Cloud Provider..."Gustavo Cuervo
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleHortonworks
 
Public Sector Virtual Town Hall
Public Sector Virtual Town HallPublic Sector Virtual Town Hall
Public Sector Virtual Town HallEDB
 

What's hot (20)

Red Hat Container Strategy
Red Hat Container StrategyRed Hat Container Strategy
Red Hat Container Strategy
 
Automating the Enterprise with CloudForms & Ansible
Automating the Enterprise with CloudForms & AnsibleAutomating the Enterprise with CloudForms & Ansible
Automating the Enterprise with CloudForms & Ansible
 
MOUG17 Keynote: What's New from Oracle Database Development
MOUG17 Keynote: What's New from Oracle Database DevelopmentMOUG17 Keynote: What's New from Oracle Database Development
MOUG17 Keynote: What's New from Oracle Database Development
 
IBM Cloud pak for data brochure
IBM Cloud pak for data   brochureIBM Cloud pak for data   brochure
IBM Cloud pak for data brochure
 
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
 
Cisco's MultiCloud Strategy
Cisco's MultiCloud StrategyCisco's MultiCloud Strategy
Cisco's MultiCloud Strategy
 
GIS Into to Cloud Microsoft Azure
GIS  Into  to Cloud Microsoft Azure GIS  Into  to Cloud Microsoft Azure
GIS Into to Cloud Microsoft Azure
 
Implementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayImplementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right Way
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Oracle OpenWorld 2017 Review (31st October 2017 - 250 slides)
Oracle OpenWorld 2017 Review (31st October 2017 - 250 slides)Oracle OpenWorld 2017 Review (31st October 2017 - 250 slides)
Oracle OpenWorld 2017 Review (31st October 2017 - 250 slides)
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
 
Create B2B Exchanges with Cisco Connected Processes: an overview
Create B2B Exchanges with Cisco Connected Processes: an overviewCreate B2B Exchanges with Cisco Connected Processes: an overview
Create B2B Exchanges with Cisco Connected Processes: an overview
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
Still on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youStill on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for you
 
Db2 event store
Db2 event storeDb2 event store
Db2 event store
 
Oracle database in cloud, dr in cloud and overview of oracle database 18c
Oracle database in cloud, dr in cloud and overview of oracle database 18cOracle database in cloud, dr in cloud and overview of oracle database 18c
Oracle database in cloud, dr in cloud and overview of oracle database 18c
 
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
 
IBM + REDHAT "Creating the World's Leading Hybrid Cloud Provider..."
IBM + REDHAT "Creating the World's Leading Hybrid Cloud Provider..."IBM + REDHAT "Creating the World's Leading Hybrid Cloud Provider..."
IBM + REDHAT "Creating the World's Leading Hybrid Cloud Provider..."
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at Scale
 
Public Sector Virtual Town Hall
Public Sector Virtual Town HallPublic Sector Virtual Town Hall
Public Sector Virtual Town Hall
 

Similar to Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session

Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018  SQL Server 2019 big data clusters - intro sessionMicrosoft ignite 2018  SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro sessionTravis Wright
 
SQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptxSQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptxQuyVo27
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"Lviv Startup Club
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage CCG
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptxFedoRam1
 
SQL Server Ground to Cloud.pptx
SQL Server Ground to          Cloud.pptxSQL Server Ground to          Cloud.pptx
SQL Server Ground to Cloud.pptxsaidbilgen
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudSQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudMark Kromer
 
Overview SQL Server 2019
Overview SQL Server 2019Overview SQL Server 2019
Overview SQL Server 2019Juan Fabian
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudMark Kromer
 
SQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT SolutionsSQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT SolutionsMichaela Murray
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSStéphane Fréchette
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekMark Kromer
 
Modern dataintegration azuredatafactory_ssis
Modern dataintegration azuredatafactory_ssisModern dataintegration azuredatafactory_ssis
Modern dataintegration azuredatafactory_ssisGaurav Malhotra
 
SQL Server Versions & Migration Paths
SQL Server Versions & Migration PathsSQL Server Versions & Migration Paths
SQL Server Versions & Migration PathsJeannette Browning
 

Similar to Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session (20)

Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018  SQL Server 2019 big data clusters - intro sessionMicrosoft ignite 2018  SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
 
SQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptxSQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptx
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
SQL Server Ground to Cloud.pptx
SQL Server Ground to          Cloud.pptxSQL Server Ground to          Cloud.pptx
SQL Server Ground to Cloud.pptx
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudSQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
 
Overview SQL Server 2019
Overview SQL Server 2019Overview SQL Server 2019
Overview SQL Server 2019
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
 
SQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT SolutionsSQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT Solutions
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data Week
 
Modern dataintegration azuredatafactory_ssis
Modern dataintegration azuredatafactory_ssisModern dataintegration azuredatafactory_ssis
Modern dataintegration azuredatafactory_ssis
 
SQL Server Versions & Migration Paths
SQL Server Versions & Migration PathsSQL Server Versions & Migration Paths
SQL Server Versions & Migration Paths
 

More from Travis Wright

PASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep DivePASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep DiveTravis Wright
 
SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017Travis Wright
 
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open ShiftMicrosoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open ShiftTravis Wright
 
SQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionSQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionTravis Wright
 
SQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionSQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionTravis Wright
 
SQL Server 2017 Overview and Partner Opportunities
SQL Server 2017 Overview and Partner OpportunitiesSQL Server 2017 Overview and Partner Opportunities
SQL Server 2017 Overview and Partner OpportunitiesTravis Wright
 
Data Amp South Africa - Keynote
Data Amp South Africa - KeynoteData Amp South Africa - Keynote
Data Amp South Africa - KeynoteTravis Wright
 
Data Amp South Africa - SQL Server 2017
Data Amp South Africa - SQL Server 2017Data Amp South Africa - SQL Server 2017
Data Amp South Africa - SQL Server 2017Travis Wright
 
NYC Data Amp - SQL Server 2017
NYC Data Amp - SQL Server 2017NYC Data Amp - SQL Server 2017
NYC Data Amp - SQL Server 2017Travis Wright
 
NYC Data Amp - Microsoft Azure and Data Services Overview
NYC Data Amp - Microsoft Azure and Data Services OverviewNYC Data Amp - Microsoft Azure and Data Services Overview
NYC Data Amp - Microsoft Azure and Data Services OverviewTravis Wright
 
Build 2017 SQL Server in Dev Ops
Build 2017 SQL Server in Dev OpsBuild 2017 SQL Server in Dev Ops
Build 2017 SQL Server in Dev OpsTravis Wright
 
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open ShiftRed Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open ShiftTravis Wright
 
SQL Server in DevOps Town Hall Webinar
SQL Server in DevOps Town Hall WebinarSQL Server in DevOps Town Hall Webinar
SQL Server in DevOps Town Hall WebinarTravis Wright
 
SQL Server vNext on Linux
SQL Server vNext on LinuxSQL Server vNext on Linux
SQL Server vNext on LinuxTravis Wright
 
SUSE Webinar - Introduction to SQL Server on Linux
SUSE Webinar - Introduction to SQL Server on LinuxSUSE Webinar - Introduction to SQL Server on Linux
SUSE Webinar - Introduction to SQL Server on LinuxTravis Wright
 
Nordic infrastructure Conference 2017 - SQL Server on Linux Overview
Nordic infrastructure Conference 2017 - SQL Server on Linux OverviewNordic infrastructure Conference 2017 - SQL Server on Linux Overview
Nordic infrastructure Conference 2017 - SQL Server on Linux OverviewTravis Wright
 
Nordic infrastructure Conference 2017 - SQL Server in DevOps
Nordic infrastructure Conference 2017 - SQL Server in DevOpsNordic infrastructure Conference 2017 - SQL Server in DevOps
Nordic infrastructure Conference 2017 - SQL Server in DevOpsTravis Wright
 

More from Travis Wright (17)

PASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep DivePASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep Dive
 
SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017
 
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open ShiftMicrosoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and Open Shift
 
SQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionSQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux Introduction
 
SQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionSQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux Introduction
 
SQL Server 2017 Overview and Partner Opportunities
SQL Server 2017 Overview and Partner OpportunitiesSQL Server 2017 Overview and Partner Opportunities
SQL Server 2017 Overview and Partner Opportunities
 
Data Amp South Africa - Keynote
Data Amp South Africa - KeynoteData Amp South Africa - Keynote
Data Amp South Africa - Keynote
 
Data Amp South Africa - SQL Server 2017
Data Amp South Africa - SQL Server 2017Data Amp South Africa - SQL Server 2017
Data Amp South Africa - SQL Server 2017
 
NYC Data Amp - SQL Server 2017
NYC Data Amp - SQL Server 2017NYC Data Amp - SQL Server 2017
NYC Data Amp - SQL Server 2017
 
NYC Data Amp - Microsoft Azure and Data Services Overview
NYC Data Amp - Microsoft Azure and Data Services OverviewNYC Data Amp - Microsoft Azure and Data Services Overview
NYC Data Amp - Microsoft Azure and Data Services Overview
 
Build 2017 SQL Server in Dev Ops
Build 2017 SQL Server in Dev OpsBuild 2017 SQL Server in Dev Ops
Build 2017 SQL Server in Dev Ops
 
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open ShiftRed Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
 
SQL Server in DevOps Town Hall Webinar
SQL Server in DevOps Town Hall WebinarSQL Server in DevOps Town Hall Webinar
SQL Server in DevOps Town Hall Webinar
 
SQL Server vNext on Linux
SQL Server vNext on LinuxSQL Server vNext on Linux
SQL Server vNext on Linux
 
SUSE Webinar - Introduction to SQL Server on Linux
SUSE Webinar - Introduction to SQL Server on LinuxSUSE Webinar - Introduction to SQL Server on Linux
SUSE Webinar - Introduction to SQL Server on Linux
 
Nordic infrastructure Conference 2017 - SQL Server on Linux Overview
Nordic infrastructure Conference 2017 - SQL Server on Linux OverviewNordic infrastructure Conference 2017 - SQL Server on Linux Overview
Nordic infrastructure Conference 2017 - SQL Server on Linux Overview
 
Nordic infrastructure Conference 2017 - SQL Server in DevOps
Nordic infrastructure Conference 2017 - SQL Server in DevOpsNordic infrastructure Conference 2017 - SQL Server in DevOps
Nordic infrastructure Conference 2017 - SQL Server in DevOps
 

Recently uploaded

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session

  • 1.
  • 2. Deep Dive On SQL Server and Big Data Travis Wright, Mihaela Blendea, Umachandar Jayachandran Program Managers, SQL Server
  • 3. Build intelligent apps and AI with all your data Analyzing all data Easily and securely manage data big and small Managing all data Simplified management and analysis through a unified deployment, governance, and tooling SQL Server enables intelligence over all your data Unified access to all your data with unparalleled performance Integrating all data
  • 5. Easily deploy and manage a SQL Server + Big Data cluster Easily deploy and manage a Big Data cluster using Microsoft’s Kubernetes-based Big Data solution built-in to SQL Server Hadoop Distributed File System (HDFS) storage, SQL Server relational engine, and Spark analytics are deployed as containers on Kubernetes in one easy-to manage package Benefits of containers and Kubernetes: Fast to deploy Self-contained – no installation required Upgrades are easy because - just upload a new image Scalable, multi-tenant, designed for elasticity
  • 6. SQL Server Big Data Cluster Layout Compute pool SQL Compute Node SQL Compute Node SQL Compute Node … Compute pool SQL Compute Node IoT data Directly read from HDFS Persistent storage … Storage pool SQL Server Spark HDFS Data Node SQL Server Spark HDFS Data Node SQL Server Spark HDFS Data Node Kubernetes pod Analytics Custom apps BI SQL Server master instance Node Node Node Node Node Node Node SQL Data pool SQL Data Node SQL Data Node Compute pool SQL Compute Node Storage Storage Controller
  • 7.
  • 8. Base node configuration Applies to nodes across all planes Services kubelet – K8s local agent kube-proxy – network config and forwarding supervisord – process monitor and control fluentd – node logging flanneld – Software defined network collectd – OS and application data collection SQL Big Data watchdog– config sync, watchdog, data collector (DMV, etc) Kubernetes node watchdog kubelet supervisord fluentd kube-proxy flanneld collectd
  • 9. Control plane External Endpoints Kubernetes (REST) Aris Control Service (REST) Knox Gateway (REST gateway for Hadoop APIs) SQL Server Master (TDS gateway for data marts and SQL Master Service) Services etcd Kubernetes Master Services Controller SQL Master instance SQL Big Data Admin Portal Knox Gateway HDFS Name Service YARN Master Hive Metastore InfluxDB (metrics store) Livy (REST interface for Spark) Spark Driver Kubernetes node Base node services + etcd Controller SQL Master Proxy HDFS Name Node Kubernetes node Base node services + etcd Kubernetes Master Services SQL Big Data Admin Portal Spark Driver InfluxDB Kubernetes node Base node services + etcd Livy Elastic Search Knox Hive Metastore Grafana Kibana Yarn Master
  • 10. Controller External REST/HTTPS Endpoint Bootstrap and Build out Manage Capacity Configure High Availability and recover from failure (AGs) Security (authN, authZ, certificate rotation) Lifecycle (upgrade/downgrade/rollback) Configuration management Monitoring - capacity, health, metrics, logs Troubleshooting – performance, failures Cluster Admin Portal Controller Service Buildout Upgrade/Rollback HADR Add/Remove Capacity Central AuthZ/AuthN Cluster Admin Portal Troubleshooting
  • 11. SQL Master instance TDS endpoint into the cluster High value data OLTP server Data connectors Machine learning & extensibility Scalable query engine with readable secondary replicas Built-in high availability with Always On Availability Groups (coming soon) Master Instance Availability Group
  • 12. Compute plane Hosts one or more SQL Compute Pools Compute pool is a group of instances that forms a data, security, and resource boundary. Compute pool processes complex distributed queries against the data plane. Local storage is used for shuffling data if necessary. Compute pool node Base node services SQL Engine Compute pool node Base node services SQL Engine Compute pool node Base node services SQL Engine Compute pool node Base node services SQL Engine
  • 13. Data plane Storage pool Data ingestion through Spark (batch and streaming) Data storage in HDFS Data access through HDFS and SQL endpoints SQL engine reads files in HDFS directly Data pool Partitioned, in-memory cache for external data or HDFS Scale-out data storage for append only data sets Data ingestion through Spark Storage pool node Base node services SQL Engine Data pool node Base node services SQL Engine HDFS Spark Storage pool node Base node services SQL Engine HDFS Spark
  • 14.
  • 15.
  • 18. SQL Server T-SQLAnalytics Apps ODBC NoSQL Relational databases Big Data PolyBase external tables SQL Server is the hub for integrating data Easily combine across relational and non-relational data stores
  • 19.
  • 20. Scale-out data pools combine and cache data from many sources for fast querying Scenario  A global car manufacturing company wants to join data from across multiple sources including HDFS, SQL Server, and Cosmos DB Solution • Query data in relational and non-relational data stores with new PolyBase connectors • Create a scale-out data pool cache of combined data • Expose the datasets as a shared data source, without writing code to move and integrate data SQL Server Scale-out data pool HDFS Cosmos DB SQL Server Polybase connectors Shard 1 Shard nShard 2IoT data
  • 21.
  • 22. SQL Server can now read directly from HDFS files Elastically scale compute and storage using HDFS-based storage pools with SQL Server and Spark built in Mount and manage remote stores through HDFS Mount various on-prem and cloud data stores Accelerate computation by caching data locally Disaster recovery/Data backup Storage pool SQL Server Master instance/Spark SQL Server HDFS Data Node Spark SQL Server HDFS Data Node Spark SQL Server HDFS Data Node Spark Other HDFS store Remote cloud store
  • 24.
  • 25.
  • 27. Data scientists can use familiar tools to analyze structured and unstructured data 1. Use SQL Ops Studio notebooks run a Spark job over structured and unstructured data 2. Spark SQL jobs can access data in SQL Server 3. Queries can be pushed down to other data sources like Oracle Database and Mongo DB 4. The Spark job returns the data to the notebook SQL Server master instance External data sources Storage pool Spark Spark Spark Azure Data Studio
  • 28. Model & serve Business/custom apps (Structured) Logs, files and media (unstructured) Sensors and IoT (unstructured) Predictive apps BI tools Store HDFS SQL Server data pools Ingest Spark streaming Prep & train Spark Spark ML SQL Server ML Services SQL Server master instance Simplified management and analysis through a unified deployment, governance, and tooling Integrate structured and unstructured data SQL Server master instance REST API containers for models
  • 30. SQL Server master instance Storage pool Spark MLeap Runtime Spark Spark Model Scoring Training Training Training
  • 31.
  • 32.
  • 33. Managed SQL Server, Spark and data lake Store high volume data in a data lake and access it easily using either SQL or Spark Management services, admin portal, and integrated security make it all easy to manage SQL Server Data virtualization Combine data from many sources without moving or replicating it Scale out compute and caching to boost performance T-SQL Analytics Apps Open database connectivity NoSQL Relational databases HDFS Complete AI platform Easily feed integrated data from many sources to your model training Ingest and prep data and then train, store, and operationalize your models all in one system SQL Server External Tables Compute pools and data pools Spark Scalable, shared storage (HDFS) External data sources Admin portal and management services Integrated AD-based security SQL Server ML Services Spark & Spark ML HDFS REST API containers for models
  • 34. Apply to join the SQL Server Early Adoption Program

Editor's Notes

  1. Challenge: Managing relational and Big Data is complicated Pillar: Simplify big data management with improved performance and security
  2. Kubernetes-based Big Data distribution is easy to deploy, upgrade and patch, and scales in seconds using compute pools visually tie this container to the containers in the previous slide zoom in to show you what's inside one of the containers
  3. Challenge: Data movement can be problematic Pillar: Gain unified access to all your data using data virtualization
  4. Query across relational and non-relational data stores including Oracle, Teradata, Mongo etc Besides supporting Hadoop, now PolyBase will let you query over RDBMS, NoSQL and generic ODBC sources.
  5. Increase performance for data virtualization using data pools in scale-out data pools SCENARIO: Scale out SQL Server with Hadoop and Spark to unlock Big Data Scenario: A car manufacturing company wants to join data from across multiple sources including Cloudera for sensor data, SQL Server that has customer data with PII (that they want to keep in SQL and mask), and Cosmos DB that has connected car data in Azure. Today the car manufacturer has a Cloudera cluster with 100 nodes on-prem and 1.5 Petabytes of data. They are running into issues with Hive performance for interactive queries. Problem: To join this data now, they would have to move it into a single system, a huge undertaking. Solution: Create a shared, scalable data lake based on HDFS. Expose these datasets as a shared semantic layer, so that it can be used to business analysts without moving the data. In this scenario, you still have to refresh the data periodically (still lives in its home database – not in the data lake)
  6. Application developers can access relational and non-relational data using familiar T-SQL commands
  7. Challenge: Managing relational and Big Data is complicated Pillar: Simplify big data management with improved performance and security
  8. Data scientists can easily analyze data through Spark jobs
  9. Easy tooling to ingest, store, prep & train, model and serve high velocity using unified data management Spark streaming in the box Data preparation tools Integrated Jupyter notebooks
  10. Open source effort A common serialization format and execution engine Train your pipeline and score it on a JVM Supports Spark ML, scikit-learn and Tensorflow Core execution engine implemented in Scala Choose from two portable serialization formats (JSON, Protobuf) MLeap DataFrame JSON/Avro/Binary
  11. Bring your own Java : - Windows: 1.10 (1.8 support is coming) - Linux 1.8 support
  12. Get started today with the SQL Server v.Next preview by going to https://aka.ms/eapsignup