Ai big dataconference_eugene_polonichko_azure data lake



Topic of presentation: Azure Data Lake: what is it? why is it? where is it?

The main points of the presentation:
What is Azure Data Lake? Why is this technology called Microsoft Big Data? Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and to do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics.

Published in: Engineering

  1. Azure Data Lake: What is it? Why is it? Where is it? Eugene Polonichko, Data Platform MVP, BI/DWH Architect
  2. About me: Eugene Polonichko has over 7 years of experience with SQL Server. He has mainly focused on BI projects (SSAS, SSIS, Power BI, Cognos, Informatica PowerCenter, Pentaho, Tableau). Eugene is a passionate speaker and SQL community volunteer who presents regularly at PASS SQL Saturday events and local user groups around Ukraine and Europe. He is a PASS Chapter Leader and holds Data Platform MVP status.
  3. Agenda
       • What is Data Lake?
       • Architecture of Azure Data Lake
       • Azure Data Lake Store
       • Overview of Azure Data Lake Store
       • Compare
       • For big data processing
       • Azure Data Lake Analytics
       • U-SQL
       • Concepts
       • U-SQL Script Structure
       • Extractors
       • U-SQL Jobs
       • U-SQL catalog
       • Monitoring and performance of U-SQL jobs
       • Data Lake Analytics pricing
  4. Data Lake
  5. Data Lake
  6. Architecture of Azure Data Lake
  7. Azure Data Lake Store
       • Azure Data Lake Store is a hyper-scale repository for big data analytic workloads. It enables you to capture data of any size, type, and ingestion speed in a single place for operational and exploratory analytics.
       • The Azure Data Lake Store is an Apache Hadoop file system compatible with the Hadoop Distributed File System (HDFS).
       • It can be accessed from Hadoop (available with an HDInsight cluster) using the WebHDFS-compatible REST APIs.
  8. Azure Data Lake Store Use Cases
       • Store social media posts, log files, sensor data
       • Store corporate data such as relational databases (as flat files)
  9. Data Lake Storage vs Azure Storage
       Data Lake Storage:
       • Optimized storage for big data analytics workloads
       • Batch, interactive, and streaming analytics; log files, etc.
       • Account contains folders, which in turn contain data stored as files
       • Optimized performance for parallel analytics workloads. High throughput and IOPS.
       Azure Storage:
       • General purpose object store for a wide variety of storage scenarios
       • Any type of text or binary data, such as application back end
       • Storage account has containers
       • Not optimized for analytics workloads
  10. Big Data requirements
  11. Pricing: transaction prices, storage prices
  12. DEMO
  13. Azure Data Lake Analytics
       Azure Data Lake Analytics is an on-demand analytics job service to simplify big data analytics. You can focus on writing, running, and managing jobs rather than on operating distributed infrastructure.
       • Dynamic scaling
       • Develop faster, debug, and optimize smarter using familiar tools
       • Affordable and cost effective
       • Works with all your Azure data
       • U-SQL: simple and familiar, powerful, and extensible
  14. U-SQL (diagram labels: T-SQL, C#, U-SQL)
  15. Concepts
       • Retrieve data from stored locations in rowset format
       • Transform the rowset(s)
       • Store the transformed rowset(s)
  16. U-SQL Script Structure
       Script         := Statement_List.
       Statement_List := { [Statement] ';' }.
       Statement      := Use_Statement
                       | If_Else_Statement
                       | Declare_Variable_Statement
                       | Reference_Assembly_Statement
                       | Deploy_Resource_Statement
                       | DDL_Statement
                       | Query_Statement
                       | Procedure_Call
                       | Import_Package_Statement
                       | DML_Statement
                       | Output_Statement.
  17. U-SQL Script Structure
  18. U-SQL Extractors
       Built-in extractors:
       • Extractors.Text()
       • Extractors.Csv()
       • Extractors.Tsv()
  19. U-SQL Jobs (diagram labels: UNIT, ADLAUs)
  20. U-SQL Jobs
       • ADLAU: Azure Data Lake Analytics Unit
       • Parallelism N = N ADLAUs
       • 1 ADLAU ≈ a VM with 2 cores and 6 GB of memory
  21. U-SQL Jobs
  22. U-SQL Catalog
       • Database
       • Table
       • Views
       • Procedures
  23. DEMO
  24. Monitoring: Azure Portal
  25. Monitoring: Visual Studio
  26. DEMO
  27. Pricing
  28. Links
  29. Questions?
  30. Thank you
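
Slide 7 mentions that the store can be reached through WebHDFS-compatible REST APIs. As a rough sketch, assuming the `<account>.azuredatalakestore.net/webhdfs/v1/` endpoint layout of Data Lake Store and a hypothetical account name `contosoadls`, a request URL could be assembled as follows. The helper name `adls_webhdfs_url` is illustrative only, and authentication (an OAuth bearer token on the request) is deliberately left out.

```python
from urllib.parse import quote, urlencode


def adls_webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style REST URL for a Data Lake Store account.

    `account` is the store name (hypothetical in the example below),
    `path` is the file or folder inside the store, and `op` is a
    WebHDFS operation such as LISTSTATUS, OPEN, or MKDIRS.
    """
    # WebHDFS puts the target path after /webhdfs/v1 and the operation in ?op=
    clean_path = quote(path.strip("/"))
    query = urlencode({"op": op, **params})
    return (
        f"https://{account}.azuredatalakestore.net"
        f"/webhdfs/v1/{clean_path}?{query}"
    )


# Example: URL for reading one log file (account name is made up).
open_url = adls_webhdfs_url("contosoadls", "/logs/2017/app.log", "OPEN")
print(open_url)
```

The function only builds the URL, so it can be checked without network access; sending the request would additionally need a valid token in the `Authorization` header.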
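
Slide 15 describes U-SQL's rowset model: retrieve data as a rowset, then transform it; slide 16 lists an Output_Statement for writing results. The following is a Python analogy of that flow, not U-SQL itself. The sample data, column names, and the per-region total are all invented for illustration.

```python
import csv
import io

# Invented sample input, standing in for a CSV file in the store.
raw = "region,duration\nen-us,120\nen-gb,45\nen-us,30\n"

# Extract: parse the text into a rowset (a list of dict rows),
# roughly the role a CSV extractor plays in a script.
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: derive a new rowset, here the total duration per region.
totals = {}
for row in rows:
    totals[row["region"]] = totals.get(row["region"], 0) + int(row["duration"])

# Output: serialize the transformed rowset back to CSV text.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["region", "total_duration"])
for region in sorted(totals):
    writer.writerow([region, totals[region]])
result = out.getvalue()
print(result)
```

Each stage consumes the rowset produced by the previous one, which is the part of the model the analogy is meant to show.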
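
Slide 20 gives a rule of thumb: parallelism N means N ADLAUs, and one ADLAU is roughly a VM with 2 cores and 6 GB of memory. A small sketch of that arithmetic follows. The names `JobFootprint` and `estimate_footprint` are hypothetical, and the AU-hours figure assumes usage is counted as ADLAUs multiplied by runtime; no prices are included because the slides do not state any.

```python
from dataclasses import dataclass

CORES_PER_ADLAU = 2       # approximate figure from slide 20
MEMORY_GB_PER_ADLAU = 6   # approximate figure from slide 20


@dataclass(frozen=True)
class JobFootprint:
    adlaus: int
    cores: int
    memory_gb: int
    au_hours: float


def estimate_footprint(adlaus: int, runtime_minutes: float) -> JobFootprint:
    """Estimate the resources behind a job run at a given parallelism."""
    if adlaus < 1:
        raise ValueError("a job needs at least one ADLAU")
    return JobFootprint(
        adlaus=adlaus,
        cores=adlaus * CORES_PER_ADLAU,
        memory_gb=adlaus * MEMORY_GB_PER_ADLAU,
        # Assumption: consumption = ADLAUs x runtime in hours.
        au_hours=adlaus * runtime_minutes / 60,
    )


# Example: a job submitted with parallelism 10 that runs for 30 minutes.
footprint = estimate_footprint(10, 30)
print(footprint)
```

With these inputs the estimate comes to about 20 cores, 60 GB of memory, and 5 AU-hours, which makes it easier to compare parallelism settings before submitting a job.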