
Microsoft Ignite 2018: SQL Server 2019 big data clusters - deep dive session

SQL Server 2019 big data clusters bring HDFS and Spark into SQL Server for scalable compute and storage.

Published in: Technology

  1. 1. Deep Dive on SQL Server and Big Data. Travis Wright, Mihaela Blendea, Umachandar Jayachandran; Program Managers, SQL Server
  2. 2. SQL Server enables intelligence over all your data. Managing all data: easily and securely manage data big and small. Integrating all data: unified access to all your data with unparalleled performance. Analyzing all data: build intelligent apps and AI with all your data. Simplified management and analysis through a unified deployment, governance, and tooling.
  3. 3. Managing all data
  4. 4. Easily deploy and manage a SQL Server + Big Data cluster. Easily deploy and manage a Big Data cluster using Microsoft's Kubernetes-based Big Data solution built in to SQL Server. Hadoop Distributed File System (HDFS) storage, the SQL Server relational engine, and Spark analytics are deployed as containers on Kubernetes in one easy-to-manage package. Benefits of containers and Kubernetes: fast to deploy; self-contained, with no installation required; easy upgrades, since you just upload a new image; scalable, multi-tenant, and designed for elasticity.
  5. 5. SQL Server Big Data Cluster Layout. (Diagram labels: SQL Server master instance; Controller; compute pools of SQL Compute Nodes; storage pool of nodes, each with SQL Server, Spark, and an HDFS Data Node; SQL data pool of SQL Data Nodes; Kubernetes pod; Node; Storage; persistent storage; IoT data; "Directly read from HDFS"; Analytics, Custom apps, and BI.)
  6. 6. Base node configuration. Applies to nodes across all planes. Services: kubelet (K8s local agent); kube-proxy (network config and forwarding); supervisord (process monitor and control); fluentd (node logging); flanneld (software-defined network); collectd (OS and application data collection); SQL Big Data watchdog (config sync, watchdog, data collector for DMVs, etc.). (Diagram labels: Kubernetes node with watchdog, kubelet, supervisord, fluentd, kube-proxy, flanneld, and collectd.)
  7. 7. Control plane. External endpoints: Kubernetes (REST); Aris Control Service (REST); Knox Gateway (REST gateway for Hadoop APIs); SQL Server Master (TDS gateway for data marts and SQL Master Service). Services: etcd; Kubernetes Master Services; Controller; SQL Master instance; SQL Big Data Admin Portal; Knox Gateway; HDFS Name Service; YARN Master; Hive Metastore; InfluxDB (metrics store); Livy (REST interface for Spark); Spark Driver. (Diagram: three Kubernetes nodes, each with base node services and etcd. First node: Controller, SQL Master Proxy, HDFS Name Node. Second node: Kubernetes Master Services, SQL Big Data Admin Portal, Spark Driver, InfluxDB. Third node: Livy, Elasticsearch, Knox, Hive Metastore, Grafana, Kibana, YARN Master.)
  8. 8. Controller. External REST/HTTPS endpoint. Bootstrap and build-out; manage capacity; configure high availability and recover from failure (AGs); security (authN, authZ, certificate rotation); lifecycle (upgrade/downgrade/rollback); configuration management; monitoring (capacity, health, metrics, logs); troubleshooting (performance, failures); Cluster Admin Portal. (Diagram labels: Controller Service with Buildout, Upgrade/Rollback, HADR, Add/Remove Capacity, Central AuthZ/AuthN, Cluster Admin Portal, and Troubleshooting.)
  9. 9. SQL Master instance. TDS endpoint into the cluster; high-value data; OLTP server; data connectors; machine learning and extensibility; scalable query engine with readable secondary replicas; built-in high availability with Always On Availability Groups (coming soon). (Diagram labels: Master Instance, Availability Group.)
  10. 10. Compute plane. Hosts one or more SQL compute pools. A compute pool is a group of instances that forms a data, security, and resource boundary. A compute pool processes complex distributed queries against the data plane. Local storage is used for shuffling data if necessary. (Diagram: four compute pool nodes, each with base node services and a SQL engine.)
  11. 11. Data plane. Storage pool: data ingestion through Spark (batch and streaming); data storage in HDFS; data access through HDFS and SQL endpoints; the SQL engine reads files in HDFS directly. Data pool: partitioned, in-memory cache for external data or HDFS; scale-out data storage for append-only data sets; data ingestion through Spark. (Diagram: storage pool nodes with base node services, SQL engine, HDFS, and Spark; a data pool node with base node services and SQL engine.)
  12. 12. Integrating all data
  14. 14. SQL Server is the hub for integrating data. Easily combine data across relational and non-relational data stores. (Diagram labels: T-SQL, Analytics, Apps, SQL Server, ODBC, PolyBase external tables, NoSQL, Relational databases, Big Data.)
  15. 15. Scale-out data pools combine and cache data from many sources for fast querying. Scenario: a global car manufacturing company wants to join data from multiple sources, including HDFS, SQL Server, and Cosmos DB. Solution: query data in relational and non-relational data stores with new PolyBase connectors; create a scale-out data pool cache of combined data; expose the datasets as a shared data source, without writing code to move and integrate data. (Diagram labels: SQL Server; scale-out data pool with Shard 1, Shard 2, through Shard n; PolyBase connectors; HDFS; Cosmos DB; SQL Server; IoT data.)
  16. 16. SQL Server can now read directly from HDFS files. Elastically scale compute and storage using HDFS-based storage pools with SQL Server and Spark built in. Mount and manage remote stores through HDFS: mount various on-premises and cloud data stores; accelerate computation by caching data locally; disaster recovery and data backup. (Diagram labels: SQL Server master instance/Spark; storage pool of three nodes, each with SQL Server, an HDFS Data Node, and Spark; other HDFS store; remote cloud store.)
  18. 18. Analyzing all data
  19. 19. Data scientists can use familiar tools to analyze structured and unstructured data. (1) Use SQL Ops Studio notebooks to run a Spark job over structured and unstructured data. (2) Spark SQL jobs can access data in SQL Server. (3) Queries can be pushed down to other data sources, such as Oracle Database and MongoDB. (4) The Spark job returns the data to the notebook. (Diagram labels: Azure Data Studio; SQL Server master instance; storage pool with three Spark instances; external data sources.)
  20. 20. Integrate structured and unstructured data. Sources: business/custom apps (structured); logs, files, and media (unstructured); sensors and IoT (unstructured). Ingest: Spark streaming. Store: HDFS, SQL Server data pools. Prep & train: Spark, Spark ML, SQL Server ML Services. Model & serve: SQL Server master instance, REST API containers for models. Outputs: predictive apps, BI tools. Simplified management and analysis through a unified deployment, governance, and tooling.
  21. 21. SQL Server
  22. 22. (Diagram labels: SQL Server master instance; MLeap Runtime; Model; Scoring; storage pool with three Spark instances, each labeled Training.)
  23. 23. Managed SQL Server, Spark, and data lake: store high-volume data in a data lake and access it easily using either SQL or Spark; management services, admin portal, and integrated security make it all easy to manage. Data virtualization: combine data from many sources without moving or replicating it; scale out compute and caching to boost performance. Complete AI platform: easily feed integrated data from many sources to your model training; ingest and prep data and then train, store, and operationalize your models all in one system. (Diagram labels: SQL Server; T-SQL; Analytics; Apps; open database connectivity; NoSQL; relational databases; HDFS; SQL Server external tables; compute pools and data pools; Spark; scalable, shared storage (HDFS); external data sources; admin portal and management services; integrated AD-based security; SQL Server ML Services; Spark & Spark ML; REST API containers for models.)
  24. 24. Apply to join the SQL Server Early Adoption Program
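
Slides 10, 11, and 15 describe a data pool as partitioned, append-only, scale-out storage split into shards, with a compute pool running distributed queries against it. The sketch below is only a toy illustration of that idea in plain Python, not how SQL Server implements it. The names `DataPool`, `shard_for`, and `scatter_gather_sum`, the CRC32 placement rule, and the sample columns are all hypothetical and do not come from the deck.

```python
import zlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard with a stable hash (CRC32) so placement is repeatable.

    Assumption: hash placement is used here only for illustration; the deck
    does not say how rows are distributed across data pool shards.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


class DataPool:
    """Hypothetical stand-in for a scale-out, append-only data pool."""

    def __init__(self, num_shards: int) -> None:
        # Each shard is modeled as a simple in-memory list of rows.
        self.shards = [[] for _ in range(num_shards)]

    def append(self, row: dict, key_column: str = "vehicle_id") -> None:
        """Append a row to the shard chosen by its key column (no updates)."""
        index = shard_for(str(row[key_column]), len(self.shards))
        self.shards[index].append(row)

    def scatter_gather_sum(self, group_column: str, value_column: str) -> dict:
        """Aggregate per shard first, then merge the partial results."""
        partials = []
        for shard in self.shards:  # scatter: each shard computes locally
            local = defaultdict(float)
            for row in shard:
                local[row[group_column]] += row[value_column]
            partials.append(local)

        merged = defaultdict(float)
        for partial in partials:  # gather: combine the partial sums
            for group, subtotal in partial.items():
                merged[group] += subtotal
        return dict(merged)


# Example usage with made-up rows.
pool = DataPool(num_shards=3)
for row in [
    {"vehicle_id": "v1", "plant": "Detroit", "units": 10.0},
    {"vehicle_id": "v2", "plant": "Munich", "units": 5.0},
    {"vehicle_id": "v3", "plant": "Detroit", "units": 20.0},
    {"vehicle_id": "v4", "plant": "Munich", "units": 7.0},
]:
    pool.append(row)

totals = pool.scatter_gather_sum("plant", "units")
```

Aggregating inside each shard before merging keeps the amount of data that has to be combined centrally small, which is the general motivation for spreading a cache across shards.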