Microsoft Ignite 2018: SQL Server 2019 big data clusters - intro session

SQL Server 2019 big data clusters bring HDFS and Spark into SQL Server for scalable compute and storage.

Published in: Technology

  1. 1. The Future of SQL Server 2019 and Big Data
  2. 2. In 2016, 16.1 ZB of data was generated; in 2025, 163 ZB of data will be generated. (Source: IDC white paper, Data Age 2025: The Evolution of Data to Life-Critical)
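To put the two IDC figures in perspective, the following minimal sketch computes the constant yearly growth rate they imply. The assumption of steady compound growth between 2016 and 2025 is ours, not IDC's; under it, the figures work out to roughly 29% growth per year.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the slide: 16.1 ZB generated in 2016, 163 ZB projected for 2025.
# Assumption: growth compounds at a steady rate over the nine years in between.
cagr = compound_annual_growth_rate(16.1, 163.0, 2025 - 2016)
print(f"Implied growth: {cagr:.1%} per year")
```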
  3. 3. Barriers to insights are barriers to success. The task of generating insights from ever-increasing data is tough.
  4. 4. Organizations that transform data into insights outperform the competition. What do these organizations do differently? Integrate data without ETL; combine data in a central data store; perform predictive analytics. Leaders combine structured and unstructured data in a data lake 8X as often; 74% of leaders use predictive models; 37% of leaders dynamically update data models. (Source: Keystone Strategy interviews, Oct 2015 - Mar 2016)
  5. 5. SQL Server enables intelligence over all your data. Integrating all data: unified access to all your data with unparalleled performance. Managing all data: easily and securely manage data big and small. Analyzing all data: build intelligent apps and AI with all your data. Simplified management and analysis through unified deployment, governance, and tooling.
  6. 6. Integrating all data
  7. 7. Data movement is a barrier to faster insights. Costs: duplicated storage costs; engineering effort to build and maintain data pipelines. Speed: delays in integrating data before it can be used; increased data latency. Security: increased attack surface area; inconsistent security models. Quality: data quality issues can be created by ETL pipelines. Compliance: increased governance issues. 3/4 of respondents say that untimely data has inhibited business opportunities (Yes, 76%; No, 19%; Don't know, 5%). (Source: IDC 3rd Platform Information Management Requirements Survey, Oct 2016)
  8. 8. Data virtualization creates solutions. Data virtualization integrates data from disparate sources, locations, and formats, without replicating or moving the data, to create a single "virtual" data fabric. Costs: lower storage costs; less dev time spent on integration. Speed: rapid iterations and prototypes; timely data. Security: smaller attack surface area; consistent security model. Quality: fresh and accurate data. Compliance: easier data governance.
  9. 9. SQL Server is the hub for integrating data. Easily combine data across relational and non-relational data stores. Diagram labels: SQL Server; T-SQL; Analytics; Apps; PolyBase external tables; ODBC; NoSQL; Relational databases; Big Data.
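As a rough illustration of what a PolyBase external table involves, the sketch below builds the two T-SQL statements typically used: `CREATE EXTERNAL DATA SOURCE` to register the remote store and `CREATE EXTERNAL TABLE` to map a remote object to a local table name. The helper functions and every name passed to them (`SalesSource`, `SalesCredential`, the host, the table, and its columns) are hypothetical, and the sketch assumes a database scoped credential already exists. Identifiers are not sanitized, so this is for illustration only.

```python
def external_data_source_sql(name, location, credential):
    """Build T-SQL that registers a remote store as a PolyBase external data source."""
    return (
        f"CREATE EXTERNAL DATA SOURCE {name}\n"
        f"WITH (LOCATION = '{location}', CREDENTIAL = {credential});"
    )


def external_table_sql(table, columns, data_source, remote_object):
    """Build T-SQL that exposes a remote object as a queryable external table."""
    column_list = ",\n    ".join(f"{col} {sql_type}" for col, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n    {column_list}\n)\n"
        f"WITH (LOCATION = '{remote_object}', DATA_SOURCE = {data_source});"
    )


# Hypothetical example: a remote SQL Server database holding order data.
source_ddl = external_data_source_sql(
    name="SalesSource",
    location="sqlserver://sales-db.example.internal:1433",
    credential="SalesCredential",  # assumed to be created beforehand
)
table_ddl = external_table_sql(
    table="dbo.RemoteOrders",
    columns=[("order_id", "INT"), ("customer_name", "NVARCHAR(100)")],
    data_source="SalesSource",
    remote_object="Sales.dbo.Orders",
)
print(source_ddl)
print(table_ddl)
```

Once such a table exists, it can be joined to local tables with ordinary T-SQL, which is the "hub" role the slide describes.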
  10. 10. Managing all data
  11. 11. Complex scale-out deployment; time-consuming patching and upgrades; cumbersome security management.
  12. 12. Easily deploy and manage a SQL Server + Big Data cluster. Easily deploy and manage a Big Data cluster using Microsoft’s Kubernetes-based Big Data solution built into SQL Server. Hadoop Distributed File System (HDFS) storage, the SQL Server relational engine, and Spark analytics are deployed as containers on Kubernetes in one easy-to-manage package.
  13. 13. Simplified deployment with containers & Kubernetes. A container is a standardized unit of software that includes everything needed to run it. Kubernetes is a container hosting platform. Benefits of containers and Kubernetes: 1. Fast to deploy. 2. Self-contained: no installation required. 3. Upgrades are easy: just upload a new image. 4. Scalable, multi-tenant, designed for elasticity. Diagram labels: Kubernetes pod; SQL Server; HDFS Data Node; Spark.
  14. 14. SQL Server can now read directly from HDFS files. Elastically scale compute and storage using HDFS-based storage pools with SQL Server and Spark built in. Apps, BI, and analytics access Big Data through the SQL Server master instance. Scale Big Data on demand. Diagram labels: custom apps, BI, and analytics; SQL Server master instance; persistent storage; Kubernetes nodes, each with a pod running SQL Server, HDFS Data Node, and Spark.
  15. 15. Scale-out data pools combine and cache data from many sources for fast querying. Scenario: a global car manufacturing company wants to join data from across multiple sources including HDFS, SQL Server, and Cosmos DB. Solution: query data in relational and non-relational data stores with new PolyBase connectors; create a scale-out data pool cache of combined data; expose the datasets as a shared data source, without writing code to move and integrate data. Diagram labels: SQL Server; PolyBase connectors (HDFS, Cosmos DB, SQL Server); scale-out data pool (Shard 1, Shard 2, Shard n).
  16. 16. Extend SQL Server with a scale-out storage tier by partitioning the data across multiple instances. Speed up query performance by scaling out the filtering and local aggregation across multiple instances. Diagram labels: IoT data; SQL Server; scale-out data pool (Shard 1, Shard 2, Shard n); persistent storage.
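The idea of scaling out filtering and local aggregation can be sketched in plain Python. This is a conceptual model only, not how SQL Server implements data pools: rows are spread over shards (round-robin distribution is an assumption here), each shard filters and pre-aggregates its own rows, and a coordinator merges the small partial results. The sensor readings and function names are made up for illustration.

```python
from collections import defaultdict


def distribute_round_robin(rows, shard_count):
    """Spread rows evenly over shard_count shards (assumed distribution scheme)."""
    shards = [[] for _ in range(shard_count)]
    for index, row in enumerate(rows):
        shards[index % shard_count].append(row)
    return shards


def local_aggregate(shard, min_temp):
    """Filter one shard and keep a partial (sum, count) per device."""
    partial = defaultdict(lambda: [0.0, 0])
    for device, temp in shard:
        if temp >= min_temp:
            partial[device][0] += temp
            partial[device][1] += 1
    return partial


def merge_partials(partials):
    """Combine per-shard partial sums and counts into final averages."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for device, (temp_sum, count) in partial.items():
            total[device][0] += temp_sum
            total[device][1] += count
    return {device: temp_sum / count for device, (temp_sum, count) in total.items()}


# Made-up IoT readings: (device, temperature).
readings = [
    ("pump-1", 71.0), ("pump-2", 64.5), ("pump-1", 75.0),
    ("pump-2", 80.5), ("pump-1", 59.0), ("pump-2", 70.0),
]
shards = distribute_round_robin(readings, shard_count=3)
averages = merge_partials(local_aggregate(shard, min_temp=60.0) for shard in shards)
print(averages)
```

Because each shard returns only one (sum, count) pair per device, the coordinator handles far less data than it would if every raw row were shipped to a single instance.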
  17. 17. Increase analytics and apps performance. Directly read from HDFS. Diagram labels: analytics, custom apps, and BI; SQL Server master instance; compute pools (SQL Compute Nodes); data pool (SQL Data Nodes); storage pool (Kubernetes pods with SQL Server, Spark, and HDFS Data Node); IoT data; persistent storage; Kubernetes nodes.
  18. 18. Azure Data Studio provides a unified tool for querying data using a notebook experience for both T-SQL and Spark. Easily access all your data across SQL Server and HDFS. The cluster administration portal provides easy-to-use, cloud-style managed services for HA, monitoring, backup/recovery, security, and provisioning. The REST API and command-line tools simplify automation. The development and management experience is consistent regardless of where you run: on-premises or on any of the major cloud providers.
  19. 19. Integrated Big Data and SQL Server security model. Simple, single sign-on with Active Directory authentication. Manage data access with SQL Server security roles. Access reporting for audit and compliance. Central security and governance. Diagram labels: App and AI developer; Active Directory; impersonation; external data sources.
  20. 20. Analyzing all data
  21. 21. Developers struggle to access insights from Big Data; data science is siloed from operational data; lengthy time to train and operationalize models.
  22. 22. Access relational and non-relational data using familiar T-SQL commands and development frameworks. Enrich apps with data from other sources like Oracle Database and MongoDB. Build intelligent applications with access to unstructured, high-volume, and high-velocity data. Train R and Python models against Big Data stored in Hadoop and score your application data without ever leaving SQL Server. Apply easy-to-use tools like Azure Data Studio and Visual Studio Code. Diagram labels: Django framework; SQL Server master instance; storage pool (SQL Server, HDFS Data Node, and Spark on each node).
  23. 23. Data scientists can use familiar tools to analyze structured and unstructured data. 1. Use Azure Data Studio notebooks to run a Spark job over structured and unstructured data. 2. Spark jobs can access data in SQL Server through JDBC, Tedious, etc. 3. Queries can access data from other sources like Oracle Database and MongoDB via external tables. 4. The Spark job returns the data to the notebook. Diagram labels: SQL Ops Studio; SQL Server master instance; storage pool (Spark); external data sources.
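Step 2 above, a Spark job reading SQL Server over JDBC, can be sketched as follows. The helper only assembles the option dictionary that Spark's built-in JDBC data source expects, so it runs without a cluster. The host, port, database, table, and credentials are hypothetical placeholders, and the sketch assumes the Microsoft JDBC driver for SQL Server is available on the Spark classpath.

```python
def sqlserver_jdbc_options(host, port, database, table, user, password):
    """Assemble options for Spark's JDBC data source pointing at SQL Server."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Hypothetical connection details for the SQL Server master instance.
options = sqlserver_jdbc_options(
    host="master-instance.example.internal",
    port=1433,
    database="Sales",
    table="dbo.Orders",
    user="spark_reader",
    password="<secret>",
)
print(options["url"])

# Inside a notebook with an active SparkSession named `spark`, the options
# would typically be used like this:
#   orders = spark.read.format("jdbc").options(**options).load()
```

In practice the password would come from a secret store rather than a literal in the notebook.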
  24. 24. Integrate structured and unstructured data. Simplified management and analysis through unified deployment, governance, and tooling. Data sources: business/custom apps (structured); logs, files, and media (unstructured); sensors and IoT (unstructured). Ingest: Spark streaming; SQL Server Integration Services. Store: HDFS; SQL Server data pools. Prep & train: Spark; Spark ML; SQL Server ML Services. Model & serve: SQL Server master instance; REST API containers for models. Outputs: predictive apps; BI tools.
  25. 25. Volume, Variety, Velocity, Veracity
  26. 26. Mount and manage remote stores through HDFS. Mount various on-prem and cloud data stores. Accelerate computation by caching data locally. Disaster recovery/data backup. Diagram labels: SQL Server master instance/Spark; storage pool (SQL Server, HDFS Data Node, and Spark on each node); other HDFS store; remote cloud store.
  27. 27. SQL Server 2019 big data & analytics. Data virtualization: combine data from many sources without moving or replicating it; scale out compute and caching to boost performance. Managed SQL Server, Spark, and data lake: store high-volume data in a data lake and access it easily using either SQL or Spark; management services, admin portal, and integrated security make it all easy to manage. Complete AI platform: easily feed integrated data from many sources to your model training; ingest and prep data and then train, store, and operationalize your models all in one system. Diagram labels: SQL Server; T-SQL; Analytics; Apps; Open database connectivity; NoSQL; Relational databases; HDFS; SQL Server External Tables; Compute pools and data pools; Spark; Scalable, shared storage (HDFS); External data sources; Admin portal and management services; Integrated AD-based security; SQL Server ML Services; Spark & Spark ML; REST API containers for models.
  28. 28. Intelligence over all data drives innovation: integrating all data, managing all data, analyzing all data. Simplified management and analysis through a unified deployment, governance, and tooling model.
  29. 29. Apply to join the SQL Server 2019 Early Adoption Program
