1. SQL Server
Big Data Clusters
Rock Pereira
SQL Saturday, Redmond
April 27, 2019
2. Contents
1. Kubernetes for Data Science
2. SQL Server Big Data Clusters
3. Understand the problem
4. Data exploration and analysis
5. Data-driven application development with Kubernetes
10. 2.1 What is a Big Data Cluster?
Unified data platform for analytics
Data-driven solutions using Kubernetes
Components of a BDC:
● Spark – distributed, in-memory compute
● HDFS – elastic storage
● SQL Server – data hub for structured and unstructured data
● Kubernetes – scale-out, fault-tolerant
11. 2.2 Features
● Deploy anywhere there is managed Kubernetes
● Management services for logging, monitoring, backup, and high availability
● Consistent portal for managing all your clusters
12. 2.3 Polybase
Query HDFS (Azure Blob Storage, Hortonworks, Cloudera) using External Tables in SQL Server
● Manage permissions with Active Directory
● No data duplication – the external data is not persisted in SQL Server
New in SQL Server 2019:
● Connectors to Azure SQL DB, Azure SQL DW, Oracle, Teradata, MongoDB, Azure Cosmos DB, plus any source with an ODBC-compliant driver (IBM DB2, SAP HANA, Microsoft Excel)
● Read CSV & Parquet files
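The External Table flow described above can be sketched in T-SQL roughly as follows. The names used here (StoragePool, csv_format, clicks_hdfs, the /clickstream path, and the column list) are hypothetical placeholders, not part of the deck:

```sql
-- Hypothetical sketch: expose a CSV file in HDFS as an external table.
-- All object names and the HDFS path below are illustrative placeholders.
CREATE EXTERNAL DATA SOURCE StoragePool
    WITH (LOCATION = 'sqlhdfs://controller-svc/default');

CREATE EXTERNAL FILE FORMAT csv_format
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

CREATE EXTERNAL TABLE clicks_hdfs (
    click_time DATETIME2,
    user_id    INT,
    url        NVARCHAR(400)
)
WITH (DATA_SOURCE  = StoragePool,
      LOCATION    = '/clickstream',
      FILE_FORMAT = csv_format);

-- Query it like any other table; no data is copied into SQL Server.
SELECT TOP 10 * FROM clicks_hdfs;
```

The external table is only metadata: dropping it removes the definition, not the files in HDFS.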
13. 2.4 Architecture
● Compute Pool: parallel ingest
● Storage Pool: scalable storage and data processing
● SQL Data Pool: caches external data, distributed across SQL Server instances
● SQL Server Master Pool: read-write OLTP; stores dimensional data
14. 2.5 Azure Data Studio
● Work with relational and big data in SQL Server
● HDFS browser – like Azure Storage Explorer
● External Table wizard, including column mapping
● Jupyter-based notebooks
● Collaboration
● Code with IntelliSense
● Submit Spark jobs
15. 2.6 Deploying a Big Data Cluster
Deployment targets: Minikube | On-Prem | Cloud (AKS)
● Minikube: single node; requirements – 32 GB memory, 8 CPUs, 100 GB disk
● On-Prem: use kubeadm
● Cloud (AKS): use the Python deployment script
Set environment variables before deploying.
Tools:
mssqlctl (app_commands, ref), kubectl, Azure CLI
Azure Data Studio + SQL Server 2019 extension
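The "set environment variables before deploying" step looked roughly like this in the SQL Server 2019 CTP releases. The variable names and values below are illustrative assumptions from that era; check the release notes for your build before relying on them:

```shell
# Deployment configuration read by mssqlctl (CTP-era names; illustrative,
# not authoritative -- consult the docs for your specific build).
export ACCEPT_EULA=yes
export CLUSTER_PLATFORM=aks                 # or: kubernetes (kubeadm), minikube
export CONTROLLER_USERNAME=admin
export CONTROLLER_PASSWORD='<strong-password>'   # placeholder, do not hardcode
export DOCKER_REGISTRY=mcr.microsoft.com

# With the variables set, the cluster is created with mssqlctl, e.g.:
# mssqlctl create cluster mssql-cluster
```

kubectl can then be pointed at the resulting namespace to watch the pods for each pool come up.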