Discovery Day 2019 Sofia - Big data clusters

  1. Powered by SQL Server 2019 Big Data Clusters (Rozalina Zaharieva & Dimitar Zahariev)
  2. SQL Server Big Data Cluster Layout
     [Diagram: a Kubernetes cluster made up of nodes with persistent storage, hosting pods in three planes. Control plane: Controller and SQL Server master instance. Compute plane: compute pools of SQL compute nodes. Storage plane: a data pool (SQL data nodes with storage) and a storage pool (HDFS data nodes, each with Spark and SQL Server, read directly from HDFS). Inputs: IoT data and external data sources. Consumers: analytics, custom apps, BI.]
  3. Architecture dissection
     • Kubernetes (K8s) concepts
     • SQL Server 2019 big data cluster (BDC) components
  4. Kubernetes concepts
  5. What is Kubernetes and what does it do?
     Kubernetes is a container orchestrator and is responsible for:
     • Running a cluster of hosts
     • Scheduling containers to run on different hosts
     • Facilitating the communication between the containers
     • Providing and controlling access to/from the outside world
     • Tracking and optimizing resource usage
     Similar solutions:
     • Docker Swarm, Mesos Marathon, Amazon ECS, HashiCorp Nomad
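To make "schedule containers to run on different hosts" concrete, here is a minimal Python sketch of one possible placement rule: each container goes to the host with the most free CPU that can still fit its request. This is an illustrative toy and not the actual kube-scheduler algorithm; the `schedule` function, host names, and CPU figures are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def schedule(
    containers: List[Tuple[str, float]],
    free_cpu: Dict[str, float],
) -> Dict[str, Optional[str]]:
    """Assign each (name, cpu_request) container to a host.

    Toy rule (an assumption, not the real kube-scheduler logic):
    pick the host with the most free CPU that can fit the request.
    Containers that fit nowhere are mapped to None.
    """
    remaining = dict(free_cpu)  # work on a copy, leave the input untouched
    placement: Dict[str, Optional[str]] = {}
    for name, request in containers:
        candidates = [h for h, cpu in remaining.items() if cpu >= request]
        if not candidates:
            placement[name] = None  # unschedulable in this toy model
            continue
        # Most free CPU first; ties broken by host name for determinism.
        best = min(candidates, key=lambda h: (-remaining[h], h))
        remaining[best] -= request
        placement[name] = best
    return placement


# Hypothetical hosts and container requests (in CPU cores).
hosts = {"node1": 2.0, "node2": 4.0}
work = [("web", 1.0), ("db", 2.5), ("cache", 1.0), ("big", 8.0)]
placement = schedule(work, hosts)
print(placement)
```

A real scheduler also weighs memory, affinity rules, taints, and more; the sketch only shows the basic idea of matching requests to free capacity.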
  6. K8s architecture overview
     [Diagram: three master nodes, each with Scheduler, Controller, api-server, and Key-Value Store; worker nodes Node 1 to Node K, each running kubelet, kube-proxy, and a set of pods.]
  7. Master nodes
     • Responsible for managing the cluster
     • Typically more than one is installed
     • In HA mode, one master node is the leader
     • Can be reached via CLI (kubectl), APIs, or the Dashboard
     Components of each master node:
     • Scheduler: schedules the work on different nodes
     • Controller: takes care of control loops and the desired state
     • api-server: performs administrative tasks and stores the cluster state
     • Key-value store: etcd is used; it can be part of the master or installed externally
  8. (Worker) nodes
     • Initially called Minions
     • Container runtime: containerd, rkt, lxd
     • Kubelet: communicates with the master; uses CRI shims
     • kube-proxy: network proxy
     [Diagram: Node running kube-proxy, Kubelet, and a container runtime that hosts Pod 1 and Pod 2.]
  9. Pods (1)
     • Smallest unit of scheduling
     • Contains one or more containers
     • Containers share the pod environment
     • Scheduled on nodes
     • Created via manifest files
     [Diagram: Pod containing a main container and supporting containers that share an environment (net, mount, ...).]
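As a sketch of what "created via manifest files" can look like, the Python snippet below builds a minimal Pod manifest with one main container and one supporting container and prints it as JSON, a format `kubectl apply -f` accepts alongside YAML. The helper `build_pod_manifest`, the pod name `demo-pod`, and the images `nginx` and `busybox` are illustrative assumptions, not taken from the talk.

```python
import json
from typing import Any, Dict


def build_pod_manifest(name: str, main_image: str, sidecar_image: str) -> Dict[str, Any]:
    """Return a minimal Pod manifest with a main and a supporting container.

    Both containers belong to the same pod, so they share its
    network namespace and can share mounted volumes.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {"name": "main", "image": main_image},
                {"name": "sidecar", "image": sidecar_image},
            ]
        },
    }


# Hypothetical pod name and images, chosen only for illustration.
manifest = build_pod_manifest("demo-pod", "nginx", "busybox")
print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file (for example `pod.json`) and passing it to `kubectl apply -f` would be one way to submit it to a cluster.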
  10. Pods (2)
      • Each pod has a unique IP address
      • Inter-pod communication is via a pod network
      • Intra-pod communication is via localhost and port
      [Diagram: Pod 1 (10.10.20.20) and Pod 2 (10.10.20.21) connected through the pod network; containers inside a pod talk over localhost.]
  11. Replication Controllers
      • Higher-level workload
      • Looks after a pod or a set of pods
      • Scales pods up/down
      • Sets the desired state
      [Diagram: Replication Controller wrapping a Pod.]
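The "desired state" idea can be sketched as a reconciliation pass: compare the desired replica count with the pods currently running, then start or stop pods until the two match. The Python below is a simplified simulation under that assumption; `reconcile` and the pod naming scheme are hypothetical and do not reflect how Kubernetes names or selects pods internally.

```python
from typing import List


def reconcile(desired: int, running: List[str], prefix: str = "pod") -> List[str]:
    """Return the pod list after one reconciliation pass.

    Scale up: add pods with unused names until `desired` is reached.
    Scale down: keep only the first `desired` pods.
    """
    pods = list(running)  # never mutate the observed state in place
    next_id = 0
    while len(pods) < desired:
        candidate = f"{prefix}-{next_id}"
        next_id += 1
        if candidate not in pods:
            pods.append(candidate)  # "start" a replacement pod
    return pods[:desired]  # "stop" surplus pods, if any


scaled_up = reconcile(3, ["pod-0"])
scaled_down = reconcile(1, ["pod-0", "pod-1", "pod-2"])
healed = reconcile(3, ["pod-0", "pod-2"])  # pod-1 has crashed
print(scaled_up, scaled_down, healed)
```

The same pass covers scaling and self-healing: a crashed pod simply shows up as a gap between the desired and the observed state.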
  12. Deployments
      • Even higher-level workload
      • Simplifies updates and rollbacks
      • Declarative and imperative approach
      • Self-documenting
      • Suitable for versioning
      [Diagram: Deployment wrapping a ReplicaSet, which wraps a Pod.]
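One way to picture "simplifies updates and rollbacks" is a rolling update that replaces pods one at a time while remembering the starting state. The Python below is a toy model under that assumption: a real Deployment achieves this by scaling a new ReplicaSet up and the old one down, and the `rolling_update` function and version strings here are hypothetical.

```python
from typing import List


def rolling_update(pods: List[str], new_version: str) -> List[List[str]]:
    """Return the pod versions after each step of a one-at-a-time update.

    The first entry of the returned history is the original state,
    which is what a rollback would restore in this toy model.
    """
    current = list(pods)
    history = [list(current)]
    for i in range(len(current)):
        if current[i] != new_version:
            current[i] = new_version  # replace a single pod
            history.append(list(current))
    return history


history = rolling_update(["v01", "v01", "v01"], "v02")
for step in history:
    print(step)

rolled_back = history[0]  # rollback: return to the recorded starting state
```

At every intermediate step some pods still serve the old version, which is why the update causes no full outage in this model.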
  13. Services (1)
      • Provide a reliable network endpoint: IP address, DNS name, port
      • Expose pods to the outside world: NodePort (cluster-wide port), LoadBalancer (cloud-based)
      • Use an Endpoints object to track pods
      [Diagram: Service (IP = 10.10.10.1, DNS = demo-svc, Port = 32000) with an Endpoints list (Pod A IP, Pod B IP, ...) pointing to Pod A (10.10.20.21) on Node 1 and Pod B (10.10.20.22) on Node 2.]
  14. Services (2)
      • Services use label selectors to do their magic
      [Diagram: Service selector (version=v01, app=myapp) matching two pods labeled version=v01, app=myapp.]
  15. Services (2)
      • Services use label selectors to do their magic
      [Diagram: Service selector (version=v01, app=myapp); pods labeled v01, v02, v02, v01, all with app=myapp.]
  16. Services (2)
      • Services use label selectors to do their magic
      [Diagram: Service selector changed to (version=v02, app=myapp); pods labeled v01, v02, v02, v01, all with app=myapp.]
  17. Services (2)
      • Services use label selectors to do their magic
      [Diagram: Service selector (version=v02, app=myapp) matching the two remaining pods labeled version=v02, app=myapp.]
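The selector "magic" in the four Services (2) slides comes down to an equality match: a pod belongs to a service when every key/value pair in the service's selector also appears in the pod's labels. A minimal Python sketch, reusing the labels from the slides; the `select_pods` helper and the pod names are hypothetical.

```python
from typing import Dict, List

Labels = Dict[str, str]


def select_pods(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Return the names of pods whose labels satisfy every selector pair."""
    return [
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    ]


# Four pods mid-rollout, as on the slides: two on v01, two on v02.
pods = {
    "pod-a": {"app": "myapp", "version": "v01"},
    "pod-b": {"app": "myapp", "version": "v02"},
    "pod-c": {"app": "myapp", "version": "v02"},
    "pod-d": {"app": "myapp", "version": "v01"},
}

v01 = select_pods({"app": "myapp", "version": "v01"}, pods)
v02 = select_pods({"app": "myapp", "version": "v02"}, pods)
print(v01)
print(v02)
```

Changing only the `version` value in the selector moves all traffic from the v01 pods to the v02 pods without touching the pods themselves, which is the switch the slides walk through.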
  18. SQL Server 2019 big data cluster (BDC) components
  19. SQL Server 2019 big data cluster
  20. Base node configuration
      Applies to nodes across all planes. Services:
      • kubelet: K8s local agent
      • kube-proxy: network config and forwarding
      • supervisord: process monitor and control
      • fluentd: node logging
      • flanneld: software-defined network
      • collectd: OS and application data collection
      • SQL Big Data watchdog: config sync, watchdog, data collector (DMVs, etc.)
      [Diagram: Kubernetes node running watchdog, kubelet, kube-proxy, supervisord, fluentd, flanneld, and collectd.]
  21. Control plane
      External endpoints:
      • Kubernetes (REST)
      • Aris Control Service (REST)
      • Knox Gateway (REST gateway for Hadoop APIs)
      • SQL Server Master (TDS gateway for data marts and SQL Master Service)
      Services:
      • etcd
      • Kubernetes Master Services Controller
      • SQL Master instance
      • SQL Big Data Admin Portal
      • Knox Gateway
      • HDFS Name Service
      • YARN Master
      • Hive Metastore
      • InfluxDB (metrics store)
      • Livy (REST interface for Spark)
      • Spark Driver
      [Diagram: three Kubernetes nodes, each running the base node services plus etcd. Node 1: K8s Master service, Spark driver, SQL Big Data Admin portal, InfluxDB, Grafana. Node 2: Controller, Proxy, SQL Master, HDFS Name Node, Kibana. Node 3: Livy, Knox, Elasticsearch, Hive Metastore, YARN Master.]
  22. Controller
      • External REST/HTTPS endpoint
      • Bootstrap and build-out
      • Manage capacity
      • Configure high availability and recover from failure (AGs)
      • Security (authN, authZ, certificate rotation)
      • Lifecycle (upgrade/downgrade/rollback)
      • Configuration management
      • Monitoring: capacity, health, metrics, logs
      • Troubleshooting: performance, failures
      • Cluster Admin Portal
      [Diagram: Controller service with build-out, upgrade/rollback, add/remove capacity, central AuthZ/AuthN, Cluster Admin Portal, troubleshooting, and controller metadata.]
  23. SQL Master instance
      • TDS endpoint into the cluster
      • High-value data
      • OLTP server
      • Data connectors
      • Machine learning & extensibility
      • Scalable query engine
      [Diagram: master instance as an availability group with one primary and two readable secondaries.]
  24. Compute plane
      • Hosts one or more SQL compute pools
      • A compute pool is a group of instances that forms a data, security, and resource boundary
      • A compute pool processes complex distributed queries against the data plane
      • Local storage is used for shuffling data if necessary
      [Diagram: four compute pool nodes, each running the base node services and a SQL engine.]
  25. Data plane
      Storage pool:
      • Data ingestion through Spark (batch and streaming)
      • Data storage in HDFS
      • Data access through HDFS and SQL endpoints; the SQL engine reads files in HDFS directly
      Data pool:
      • Partitioned, in-memory cache for external data
      • Scale-out data storage for append-only data sets
      • Data ingestion through Spark
      • Provides persistent SQL Server storage for the cluster
      [Diagram: storage pool nodes running the base node services, SQL engine, HDFS, and Spark; a data pool node running the base node services and SQL engine.]
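To illustrate what "partitioned" and "scale-out" can mean for the data pool, the Python sketch below spreads rows across data pool nodes in round-robin order. The slide does not state which distribution rule is used, so round-robin is an assumption made for illustration; `distribute_round_robin` and the node names are hypothetical.

```python
from typing import Dict, List, Sequence


def distribute_round_robin(
    rows: Sequence[dict], nodes: Sequence[str]
) -> Dict[str, List[dict]]:
    """Spread rows evenly over nodes, one row at a time in turn."""
    if not nodes:
        raise ValueError("at least one data pool node is required")
    shards: Dict[str, List[dict]] = {node: [] for node in nodes}
    for index, row in enumerate(rows):
        target = nodes[index % len(nodes)]  # cycle through the nodes
        shards[target].append(row)
    return shards


rows = [{"id": i} for i in range(5)]
shards = distribute_round_robin(rows, ["data-0", "data-1"])
for node, part in shards.items():
    print(node, [row["id"] for row in part])
```

Because each node holds only a slice of the rows, a query can scan the slices in parallel, and adding nodes adds both storage and scan capacity.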
  26. Installation, configurations and tools
      Installation methods:
      • Cloud: a platform such as Azure Kubernetes Service (AKS)
      • On-premises: VMs, bare metal
      • Localhost: using minikube (for training and testing only)
      Configurations:
      • All-in-one single node and different multi-node options
      Tools:
      • mssqlctl, kubectl, Azure Data Studio, SQL Server 2019 extension, Azure CLI (for AKS), mssql-cli, sqlcmd, curl
  27. Demonstrations
  28. Powered by