PASS_Summit_2019_Azure_Storage_Options_for_Analytics

Storage options for Analytics are not one size fits all. To deliver the best solution, you need to understand the use case, performance requirements, and users of the system. This session will break down the options you have in Azure to build a data analytics ecosystem, and explain why everyone's talking about data lakes and where's best to build your data warehouse.

Published in: Data & Analytics

  1. 1. Azure Storage Options for Analytics Dustin Vannoy Data Engineer Cloud + Streaming
  2. 2. Please silence cell phones
  3. 3. Explore everything PASS has to offer • Free online webinar events • Free 1-day local training events • Local user groups around the world • Online special interest user groups • Business analytics training • Get involved • Free online resources • Newsletters • PASS.org
  4. 4. Dustin Vannoy Data Engineering Consultant Co-founder Data Engineering San Diego /dustinvannoy @dustinvannoy dustin@dustinvannoy.com Technologies • Azure & AWS • Spark • Kafka • Python Modern Data Systems • Data Lakes • Analytics in Cloud • Streaming
  5. 5. PASS Summit Learning Pathway: Becoming an Azure Data Engineer • Roles and Responsibilities of the Azure Data Engineer, Jes Borland, Wednesday, November 06, 10:15 AM, Room: TCC Tahoma 2 • Azure Storage Options for Analytics, Dustin Vannoy, Wednesday, November 06, 3:15 PM, Room: TCC Skagit 4 • An Azure Data Engineer’s ETL Toolkit, Simon Whiteley, Thursday, November 07, 3:15 PM, Room: TCC Tahoma 4 • Data Modeling Trends for 2019 and Beyond, Ike Ellis, Friday, November 08, 9:30 AM, Room: 2AB
  6. 6. Azure Storage for Analytics 1. Data Lakes 2. Data Warehouses 3. Analytics
  7. 7. Data Lakes in Azure
  8. 8. Data Lake Defined • Varied Data: raw, intermediate, and fully processed • Ready for Analysts: query layer, other analytic tools access • Big Data Capable: store first, evaluate and model later * Not just a file system
  9. 9. Why Data Lakes? Reason #1: Store Everything • CSV, JSON, Logs, Text • No schema on write • Cheaper storage
  10. 10. Why Data Lakes? Reason #2: Massive Scale (Big Data) • Serverless Hadoop • Span hot and cold storage • Pay for what you use
  11. 11. Why Data Lakes? Reason #3: Storage + Compute Separate • Cost savings • Multiple analytics tools / same data
  12. 12. DEMO: Example Data Lake Querying
  13. 13. Data Lake Best Practices • Metadata portal • Not just raw data • Dataset certification • Not too much governance
  14. 14. Azure Blob Storage • Storage for pretty much anything • Can choose from Block blob, Append blob, or Page blob • Low cost: $
  15. 15. Azure Blob Storage Structure Storage Account Containers Blobs
  16. 16. Azure Data Lake Storage • ADLS Gen 1: File system semantics, Granular security, Scale • ADLS Gen 2: Benefits from Gen 1 + Low cost + Hierarchical namespace
  17. 17. Data Lake Storage, Gen 2 • Built on Azure Blob Storage • Hadoop compatible access • Optimized for cloud analytics • Low cost: $$
  18. 18. ADLS Gen 2 Structure Storage Account File System Files
  19. 19. Getting Data into ADLS Gen 2: Options for Import • Azure Databricks • Azure Data Factory • AzCopy • Azure Storage Explorer
  20. 20. Accessing Data From ADLS Gen 2: Options for Access • Azure Databricks • HDInsight • PolyBase (SQL DW / SQL Server) • Power BI
  21. 21. DEMO: ADLS Gen 2 Setup and Upload
  22. 22. Archive Storage • Still part of Azure Blob Storage • Seamless integration with hot/cool • Keep everything • Very low cost but... • High read cost • Early deletion charges
  23. 23. Cost Comparison – Hot LRS
        Type                   | Storage (dollars/GB) | Reads (per 10,000) | Writes (per 10,000)
        Blob Storage (Hot)     | 0.021                | 0.004              | 0.055
        ADLS Gen 2 (Hot)       | 0.021                | 0.006              | 0.072
        * For ADLS, every 4 MB is considered an operation.
  24. 24. Cost Comparison – Cool LRS
        Type                   | Storage (dollars/GB) | Reads (per 10,000) | Writes (per 10,000)
        Blob Storage (Cool)    | 0.015                | 0.010              | 0.100
        ADLS Gen 2 (Cool)      | 0.015                | 0.013              | 0.130
        * For ADLS, every 4 MB is considered an operation.
  25. 25. Cost Comparison – Archive LRS
        Type                   | Storage (dollars/GB) | Reads (per 10,000) | Writes (per 10,000)
        Blob Storage (Archive) | 0.002                | 5.500              | 0.110
        ADLS Gen 2 (Archive)   | 0.002                | 7.150              | 0.143
        * For ADLS, every 4 MB is considered an operation.
  26. 26. Storage Redundancy Options • Review redundancy and cost implications: https://azure.microsoft.com/en-us/pricing/details/storage/
  27. 27. Data Warehouses in Azure
  28. 28. Data Warehouse Defined • Structured Data: processed and modeled for analytics use • Interactive queries: analysts can get answers to questions quickly • BI tool support: reporting tools can query efficiently
  29. 29. Why Data Warehouses? Reason #1: Speed of thought • Fast query response • Indexing or column store • SQL with analytic functions
  30. 30. Why Data Warehouses? Reason #2: Ready to use data • Useful column names • Cleaned and standardized • Focused
  31. 31. Why Data Warehouses? Reason #3: Update/Delete • Support for real-time ingestion • Keep latest view or manage history
  32. 32. Data Warehouse Best Practices • Staging data off limits • Star schema design • Indexing strategies • Read replicas
  33. 33. Azure SQL DB: Managed SQL Server • Good ole relational database • Less DBA work required • Scalable on demand • Medium cost: $$ - $$$$
  34. 34. Azure SQL DB – Elastic pools: Resources shared among DBs • DBs can auto-scale within the pool • Can move DB to different pool • Works best when the DBs peak at different times • Important to understand utilization of DBs
  35. 35. Azure SQL DB – Managed Instances: Best for migrations • Most on-premises features supported: SQL Agent jobs, Change Data Capture, CLR enabled, Cross-database queries, DB Mail, Service Broker, Transactional Replication
  36. 36. Azure SQL DB – Hyperscale: Highly scalable storage and compute • Storage, Compute, and Log scale separately • Backups, restores, and scaling not tied to volume of data • Optimized for OLTP, but supports analytical workloads • One-way migration
  37. 37. Hyperscale Architecture http://aka.ms/SQLDB_Hyperscale
  38. 38. DEMO: Azure SQL DB Analytics Querying
  39. 39. Azure Synapse Analytics - SQL DW: High performance Analytic DB • MPP - fast reads, many users • Supports PolyBase • Scalable on demand • High cost: $$$$
  40. 40. DEMO: Synapse Analytics (SQL DW) Analytics Querying
  41. 41. Cosmos DB: Managed NoSQL • Useful for in-app analytics • Best with known search key, e.g. CustomerID • Data models: Key-value, Column-family, Document, Graph • APIs: SQL, Cassandra, MongoDB, Gremlin, Table, etcd, Spark • Medium cost: $$ - $$$
  42. 42. Analytics in Azure
  43. 43. Azure Analysis Services • Shared semantic model: build calculations and aggregations into a model that can be used by many analytics tools • Cache data: improve query speeds by caching data
  44. 44. Power BI • Visual report tool: build interactive dashboards and reports or do exploratory data analysis • Supports most sources: connects to everything Azure and many other source types
  45. 45. DEMO: Power BI Connect to Data Lake
  46. 46. Final Thoughts
  47. 47. Keep Learning!
        Databricks / ETL
        • 10 Cool Things You Can Do With Azure Databricks – Ike, Simon, Dustin
        • An Azure Data Engineer's ETL Toolkit – Simon Whiteley
        • Code Like a Snake Charmer - Introduction to Python! – Jamey Johnston
        • Code Like a Snake Charmer – Advanced Data Modeling in Python! – Jamey Johnston
        Cosmos
        • Cosmic DBA - Cosmos DB for SQL Server Admins and Developers – Michael Donnelly
        • CosmosDB - Designing and Troubleshooting Lessons – Neil Hambly
        Data Modeling
        • Data Modeling Trends for 2019 and Beyond – Ike Ellis
        • Innovative Data Modeling for Cool Data Warehouses – Jeff Renz, Leslie Weed
        Data Warehouse / SQL DB
        • Best, Better, Hyperscale! The Last Database You will Ever Need in the Cloud – Denzil Ribeiro
        • Introducing Azure Synapse Analytics: The End-to-End Analytics Platform Built for Every Data Professional – Saveen Reddy
        • Azure SQL Database: Maximizing Cloud Performance and Availability – Joe Sack, Denzil Ribeiro
        • Delivering a Data Warehouse in the Cloud – Jeff Renz
        • Data Warehousing: Which of the Many Cloud Products is the Right One for You? – Ginger Grant
  48. 48. Session Evaluations • Submit by 5pm Friday, November 15th to win prizes. • 3 ways to access: download the GuideBook App and search "PASS Summit 2019"; follow the QR code link on session signage; go to PASSsummit.com
  49. 49. Thank You Dustin Vannoy @dustinvannoy dustin@dustinvannoy.com
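
The cost comparison slides can be turned into a rough monthly estimate. The sketch below is illustrative only: it uses the Hot LRS prices quoted in the deck (2019 figures), assumes the storage price is per GB per month, and reads the "every 4MB is considered an operation" footnote as one billed operation per started 4 MB of a request. The names `Tier`, `adls_operations`, and `monthly_cost` are hypothetical helpers, not part of any Azure SDK.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Prices in dollars, as listed on the cost comparison slides."""
    storage_per_gb: float   # assumed to be per GB per month
    reads_per_10k: float    # per 10,000 read operations
    writes_per_10k: float   # per 10,000 write operations


# Hot LRS prices copied from the slide; current prices may differ.
BLOB_HOT = Tier(storage_per_gb=0.021, reads_per_10k=0.004, writes_per_10k=0.055)
ADLS_GEN2_HOT = Tier(storage_per_gb=0.021, reads_per_10k=0.006, writes_per_10k=0.072)


def adls_operations(request_count: int, mb_per_request: float) -> int:
    """Assumption: ADLS bills one operation per started 4 MB of each request."""
    return request_count * max(1, math.ceil(mb_per_request / 4))


def monthly_cost(tier: Tier, gb_stored: float, read_ops: int, write_ops: int) -> float:
    """Storage charge plus read and write charges for one month."""
    return (
        gb_stored * tier.storage_per_gb
        + read_ops / 10_000 * tier.reads_per_10k
        + write_ops / 10_000 * tier.writes_per_10k
    )


if __name__ == "__main__":
    # 1 TB stored, 1 million reads and 100,000 writes of 16 MB each.
    blob = monthly_cost(BLOB_HOT, 1_000, 1_000_000, 100_000)
    adls = monthly_cost(
        ADLS_GEN2_HOT,
        1_000,
        adls_operations(1_000_000, 16),
        adls_operations(100_000, 16),
    )
    print(f"Blob Storage (Hot): ${blob:.2f}")
    print(f"ADLS Gen 2 (Hot):   ${adls:.2f}")
```

With large files the 4 MB rule multiplies the ADLS operation count, so the gap between the two options can be wider than the per-operation prices alone suggest.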

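The two structure slides (Storage Account > Containers > Blobs, and Storage Account > File System > Files) map directly onto how the data is addressed. As a minimal sketch, the helpers below build the standard Blob endpoint URL and the `abfss://` URI used by Hadoop-compatible tools such as Spark; the function names and the account, container, and path values are made up for illustration.

```python
def blob_url(account: str, container: str, blob_path: str) -> str:
    """HTTPS URL for a blob: storage account -> container -> blob."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob_path.lstrip('/')}"


def adls_gen2_uri(account: str, file_system: str, file_path: str) -> str:
    """abfss URI for ADLS Gen 2: storage account -> file system -> file."""
    return f"abfss://{file_system}@{account}.dfs.core.windows.net/{file_path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical account and paths, for illustration only.
    print(blob_url("exampleaccount", "raw", "sales/2019/orders.csv"))
    print(adls_gen2_uri("exampleaccount", "raw", "sales/2019/orders.csv"))
```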