The Future of SQL Server 2019
and Big Data
In 2016, 16.1 ZBs of data was generated
In 2025, 163 ZBs of data will be generated
*IDC White Paper, Data Age 2025: The Evolution of Data to Life-Critical
Barriers to insights are
barriers to success
The task of generating insights from ever-increasing data is tough
Organizations that transform data into insights
outperform the competition
Source: Keystone Strategy interviews Oct 2015 - Mar 2016
74% of leaders use predictive models
37% of leaders dynamically update data models
Leaders combine structured and
unstructured data in a data lake 8X
as often
Integrate data
without ETL
Combine data in a
central data store
Perform
predictive analytics
What do these organizations do differently?
Build intelligent apps and
AI with all your data
Analyzing all data
Easily and securely manage
data big and small
Managing all data
Simplified management and analysis through a unified deployment, governance, and tooling model
SQL Server enables
intelligence over all your data
Unified access to all your data with
unparalleled performance
Integrating all data
Integrating
all data
Data movement is a barrier to
faster insights
Costs
Duplicated storage costs
Engineering effort to build and
maintain data pipelines
Delays in integrating data before it
can be used
Increased data latency
Increased attack surface area
Inconsistent security models
Data quality issues can be created
by ETL pipelines
Increased governance
issues
Yes, 76%
No, 19%
Don't know, 5%
3/4 of respondents say that
untimely data has inhibited business opportunities
Speed
Security
Quality
Compliance
*IDC 3rd Platform Information Management Requirements Survey, Oct 2016
Data virtualization
creates solutions
Costs
Lower storage costs
Less dev time spent on integration
Rapid iterations and prototypes
Timely data
Smaller attack surface area
Consistent security model
Fresh and accurate data
Easier data governance
Speed
Security
Quality
Compliance
Data virtualization integrates data from disparate
sources, locations and formats, without replicating or
moving the data, to create a single "virtual" data fabric
SQL Server
T-SQL Analytics Apps
ODBC NoSQL Relational databases Big Data
PolyBase external tables
SQL Server is the hub for integrating data
Easily combine across relational and non-relational data stores
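As a rough illustration of what a PolyBase external table involves, the sketch below uses Python only to assemble the two T-SQL statements. The `CREATE EXTERNAL DATA SOURCE` and `CREATE EXTERNAL TABLE` forms follow SQL Server 2019 syntax; the host, credential, table, and column names are hypothetical placeholders, not details from the slides.

```python
def external_data_source_sql(name, location, credential):
    """Build a CREATE EXTERNAL DATA SOURCE statement for a remote store."""
    return (
        f"CREATE EXTERNAL DATA SOURCE {name}\n"
        f"WITH (LOCATION = '{location}', CREDENTIAL = {credential});"
    )


def external_table_sql(table, columns, data_source, remote_object):
    """Build a CREATE EXTERNAL TABLE statement that maps a remote object."""
    column_list = ",\n    ".join(f"{col} {sql_type}" for col, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n    {column_list}\n)\n"
        f"WITH (LOCATION = '{remote_object}', DATA_SOURCE = {data_source});"
    )


# Hypothetical names: a second SQL Server instance exposed as an external source.
source_sql = external_data_source_sql(
    "RemoteSales", "sqlserver://sql-host.example:1433", "RemoteSalesCredential"
)
table_sql = external_table_sql(
    "dbo.RemoteOrders",
    [("OrderId", "INT"), ("Amount", "DECIMAL(10, 2)")],
    "RemoteSales",
    "SalesDb.dbo.Orders",
)

print(source_sql)
print(table_sql)
```

Once such a table exists, it can be joined to local tables with ordinary T-SQL, which is the sense in which the data is combined without being moved.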
Managing
all data
Complex scale-out deployment
Time-consuming patching and upgrades
Cumbersome security management
Easily deploy and manage a
SQL Server + Big Data cluster
Easily deploy and manage a Big Data cluster using Microsoft’s
Kubernetes-based Big Data solution built into SQL Server
Hadoop Distributed File System (HDFS) storage, the SQL Server
relational engine, and Spark analytics are deployed as containers
on Kubernetes in one easy-to-manage package
Simplified deployment with
containers & Kubernetes
A container is a standardized unit of software that includes
everything needed to run it
Kubernetes is a container hosting platform
Benefits of containers and Kubernetes:
1. Fast to deploy
2. Self-contained – no installation required
3. Upgrades are easy: just upload a new image
4. Scalable, multi-tenant, designed for elasticity
Kubernetes pod
SQL Server
HDFS Data Node
Spark
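To make the pod idea concrete, here is a minimal sketch that builds a Kubernetes Pod manifest as a plain Python dictionary, with one container each for SQL Server, an HDFS data node, and Spark. The manifest fields (`apiVersion`, `kind`, `metadata`, `spec.containers`) are standard Kubernetes; the pod name and image names are hypothetical, and a real cluster's manifests are generated by its deployment tooling rather than written by hand like this.

```python
import copy
import json


def big_data_pod(name, images):
    """Return a minimal Kubernetes Pod manifest with one container per image."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": container, "image": image}
                for container, image in images.items()
            ]
        },
    }


def with_new_image(pod, container, image):
    """Return a copy of the manifest with one container pointed at a new image."""
    upgraded = copy.deepcopy(pod)
    for spec in upgraded["spec"]["containers"]:
        if spec["name"] == container:
            spec["image"] = image
    return upgraded


# Hypothetical registry and tags, for illustration only.
IMAGES = {
    "sql-server": "registry.example/sql-server:v1",
    "hdfs-datanode": "registry.example/hdfs-datanode:v1",
    "spark": "registry.example/spark:v1",
}

pod = big_data_pod("storage-0", IMAGES)
upgraded = with_new_image(pod, "spark", "registry.example/spark:v2")
print(json.dumps(upgraded, indent=2))
```

The upgrade in this sketch is only a change to the image reference, which is the point the slide makes about upgrades.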
SQL Server can now read directly from HDFS files
Elastically scale compute and storage using HDFS-based
storage pools with SQL Server and Spark built in
Apps, BI, and analytics access Big Data through the
SQL Server master instance
Scale Big Data on demand
[Diagram: custom apps, BI, and analytics issue SQL to the SQL Server master instance. Each of three nodes runs a Kubernetes pod holding SQL Server, an HDFS Data Node, and Spark, backed by persistent storage.]
Scale-out data pools combine and cache data from many
sources for fast querying
Scenario
• A global car manufacturing company wants to join data
from across multiple sources including HDFS, SQL Server,
and Cosmos DB
Solution
• Query data in relational and non-relational data stores with
new PolyBase connectors
• Create a scale-out data pool cache of combined data
• Expose the datasets as a shared data source, without
writing code to move and integrate data
SQL Server
Scale-out data pool
HDFS Cosmos DB SQL Server
PolyBase
connectors
Shard 1 Shard 2 Shard n
Persistent storage
SQL Server
Scale-out data pool
IoT data
Extend SQL Server with a scale-out storage tier by
partitioning the data across multiple instances
Speed up query performance by scaling out the filtering
and local aggregation across multiple instances
Shard 1 Shard 2 Shard n
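The scale-out idea on this slide can be sketched in a few lines of plain Python: rows are partitioned across shards, each shard filters and aggregates its own rows, and only the small partial results are combined. Round-robin placement, the device readings, and the temperature threshold are assumptions made for the example; this is not how SQL Server implements data pools internally.

```python
from collections import defaultdict


def shard_round_robin(rows, shard_count):
    """Distribute rows across shards in round-robin order."""
    shards = [[] for _ in range(shard_count)]
    for index, row in enumerate(rows):
        shards[index % shard_count].append(row)
    return shards


def local_aggregate(shard, min_temp):
    """Filter one shard and return per-device [sum, count] partials."""
    partials = defaultdict(lambda: [0.0, 0])
    for device, temp in shard:
        if temp >= min_temp:
            partials[device][0] += temp
            partials[device][1] += 1
    return partials


def combine(partials_per_shard):
    """Merge per-shard partials into a final average per device."""
    totals = defaultdict(lambda: [0.0, 0])
    for partials in partials_per_shard:
        for device, (total, count) in partials.items():
            totals[device][0] += total
            totals[device][1] += count
    return {device: total / count for device, (total, count) in totals.items()}


# Hypothetical IoT readings: (device id, temperature).
readings = [("a", 20.0), ("b", 30.0), ("a", 40.0),
            ("b", 10.0), ("a", 60.0), ("b", 50.0)]

shards = shard_round_robin(readings, 3)
averages = combine(local_aggregate(shard, min_temp=20.0) for shard in shards)
print(averages)
```

Because each shard returns only a sum and a count per device, the amount of data sent back for the final step stays small no matter how many rows each shard holds.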
Increase analytics and apps performance
[Diagram: analytics, custom apps, and BI issue SQL to the SQL Server master instance. Behind it sit compute pools of SQL Compute Nodes, a data pool of SQL Data Nodes with their own storage, and a storage pool of Kubernetes pods that each hold SQL Server, Spark, and an HDFS Data Node on persistent storage. Other labels: IoT data; directly read from HDFS.]
Azure Data Studio provides a unified tool for
querying data using a notebook experience for
both T-SQL and Spark
Easily access all your data across SQL Server and
HDFS
The cluster administration portal provides easy-to-use,
cloud-style managed services for HA,
monitoring, backup/recovery, security, and
provisioning
The REST API and command line tools simplify
automation
The development and management experience is
consistent regardless of where you run: on-premises
or on any of the major cloud providers
Integrated Big Data and SQL Server security model
Simple, single sign-on with Active Directory authentication
Manage data access with SQL Server security roles
Access reporting for audit and compliance
Central security
and governance
External data sources
Active Directory
App and AI Developer
Impersonation
Active Directory
Analyzing
all data
Developers struggle to access
insights from Big Data
Data science is siloed from
operational data
Lengthy time to train and
operationalize models
Access relational and non-relational data using familiar
T-SQL commands and development frameworks
Enrich apps with data from other sources like Oracle
Database and MongoDB
Build intelligent applications with access to unstructured,
high-volume, and high-velocity data
Train R and Python models against Big Data stored in
Hadoop and score your application data without ever
leaving SQL Server
Apply easy-to-use tools like Azure Data Studio and Visual
Studio Code
[Diagram: an app built on the Django framework connects to the SQL Server master instance, which fronts a storage pool of three pods, each holding SQL Server, an HDFS Data Node, and Spark.]
Data scientists can use familiar tools to analyze
structured and unstructured data
1. Use Azure Data Studio notebooks to run a Spark
job over structured and unstructured data
2. Spark jobs can access data in SQL Server
through JDBC, Tedious, etc.
3. Queries can access data from other sources
like Oracle Database and MongoDB via
external tables
4. The Spark job returns the data to the notebook
SQL Server master instance
External data
sources
Storage pool
Spark Spark Spark
SQL Ops
Studio
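For step 2, a Spark job typically reaches SQL Server through Spark's generic JDBC data source. The sketch below assembles the connection options and shows the read call; `url`, `driver`, `dbtable`, `user`, and `password` are standard Spark JDBC options, while the host, database, table, and login are hypothetical. `read_table` expects an existing `SparkSession` (for example, the one a notebook provides) and is not called here.

```python
def sqlserver_jdbc_options(host, port, database, table, user, password):
    """Build Spark JDBC options for reading one SQL Server table."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_table(spark, options):
    """Load the table as a Spark DataFrame; `spark` is a SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical connection details, for illustration only.
options = sqlserver_jdbc_options(
    host="master-instance.example",
    port=1433,
    database="Sales",
    table="dbo.Orders",
    user="spark_reader",
    password="<password>",
)
print(options["url"])
```

In a notebook, `read_table(spark, options)` would return a DataFrame that can be joined with data read from HDFS in the same job.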
Model & serve
Business/custom apps
(Structured)
Logs, files and media
(unstructured)
Sensors and IoT
(unstructured)
Predictive
apps
BI tools
Store
HDFS
SQL Server data
pools
Ingest
Spark streaming
Prep & train
Spark
Spark ML
SQL Server
ML Services
SQL Server
master instance
Simplified management and analysis through a unified deployment, governance, and tooling model
Integrate structured and unstructured data
SQL Server
master instance
REST API containers
for models
SQL Server
Integration Services
Volume Variety Velocity Veracity
Mount and manage remote stores through HDFS
Mount various on-prem and cloud data stores
Accelerate computation by caching data locally
Disaster recovery/Data backup
[Diagram: the SQL Server master instance and Spark read from a storage pool of three pods, each holding SQL Server, an HDFS Data Node, and Spark. The storage pool mounts another HDFS store and a remote cloud store.]
SQL Server 2019 big data & analytics
Managed SQL Server, Spark,
and data lake
Store high volume data in a data lake and access
it easily using either SQL or Spark
Management services, admin portal, and
integrated security make it all easy to manage
SQL
Server
Data virtualization
Combine data from many sources without
moving or replicating it
Scale out compute and caching to boost
performance
T-SQL
Analytics Apps
Open
database
connectivity
NoSQL Relational
databases
HDFS
Complete AI platform
Easily feed integrated data from many sources to
your model training
Ingest and prep data and then train, store, and
operationalize your models all in one system
SQL Server External Tables
Compute pools and data pools
Spark
Scalable, shared storage (HDFS)
External
data sources
Admin portal and management services
Integrated AD-based security
SQL Server
ML Services
Spark &
Spark ML
HDFS
REST API containers
for models
Intelligence
over all data
drives innovation
Simplified management and analysis through a unified deployment, governance, and tooling model
Analyzing all data
Managing all data
Integrating all data
Apply to join the SQL Server 2019
Early Adoption Program
Microsoft Ignite 2018: SQL Server 2019 big data clusters - intro session


Editor's Notes

  • #9 Source: 3rd Platform Information Management Requirements Survey, IDC, October 2016, n=110. An IDC InfoBrief, May 2017, “Choosing a DBMS to Address the Challenges of the Third Platform”