Big Data Made Easy with Azure Data Lake
Kanio Dimitrov,
Tokyo Azure Meetup
About Me
Azure Architect & Advisor
Tokyo Azure Meetup Host
twitter: @azurekanio
blog: https://azurekan.wordpress.com/
Big Data Made Easy When
• Easy to manage
• Easy to debug
• Easy to optimize
Key Points
• Any Data
• Enterprise
• Developers
What is Azure Data Lake?
Cosmos
• Internal Microsoft System
• 10,000 developers
• Hundreds of thousands of interactive jobs per day
• Exabytes of data
• Microsoft Core System
From Cosmos to Azure Data Lake
Azure Data Lake Store | Azure Data Lake Analytics
• Ease of use
• Ability to Scale
• Offered to the public
Azure Data Lake – Based on Open Source
Easy to Start
• Create ADL Store Account
• Create ADL Analytics Account (90 seconds, free)
• Write & Submit U-SQL script
• U-SQL job executes
ADL Analytics
• Distributed analysis service
• Built on Apache YARN
• Dynamic scaling
ADL Analytics
• Pay per query
• Scale per query
• Federated query
ADL Analytics
• Uses U-SQL - C# & SQL
• No Scale limits
• Optimized to work with ADL Store
Data Source                 Read        Write
ADL Store                   Yes         Yes
Storage Blob                Yes         Yes
Azure SQL                   Yes         In Future
Azure SQL Data Warehouse    Yes         In Future
Azure SQL DB in VM          Yes         In Future
On Premise Data Sources     In Future   In Future
ADL Analytics Administration
• Web-based management in Azure Portal
• Automation with PowerShell
• Role-Based Access Control with Azure Active Directory
• Monitor service operations and activity
Development
• Author, debug & optimize Big Data applications in Visual Studio
• Languages: U-SQL & Hive (coming soon)
• .NET integration with U-SQL
ADL Analytics SDKs
JAVA C++ .NET Node.js Python
U-SQL
Extensibility
Yes
Management
Operations
By GA Yes Yes By GA
U-SQL
SQL
• Support of familiar SQL clauses
• Structured and unstructured data
• Relational metadata objects
.NET
• U-SQL - full C# expressions
• Reuse .NET code
• Use C# for defining types, functions, joins, aggregations, and I/O (extractors, outputters)
Logical Plan -> Physical Plan
• One node perspective
• Physical plan created
• Defines level of parallelism
HDInsight
• Managed Hadoop Cluster in the Cloud
• Deploy Storm jobs from Visual Studio
• Use C# to author event processing logic
• Integrate existing packages & code
ADLA vs HDInsight
Azure Data Lake Analytics
• Automatically scale
• Start quickly with C#, SQL, Visual Studio
• Jobs - convenient, efficient, automatic scale
HDInsight
• Leverage open source tech – Java, Eclipse, Hive
• Manage clusters – customization, control and flexibility
Azure Data Lake Store
• Hyperscale, WebHDFS-compatible store in the cloud
• Store any data in native format
• Enterprise grade
• No limits to Scale
• Optimized for analytic workload performance
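Because the store is described as a WebHDFS store, a client can in principle address files with WebHDFS-style REST URLs. The sketch below only builds such a URL; it sends nothing. The account name `contosoadls`, the path, and the helper name `webhdfs_url` are hypothetical, and the `azuredatalakestore.net` host pattern is an assumption about the service endpoint rather than something stated on the slides. A real request would also need an Azure Active Directory bearer token.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style URL for a file or folder (hypothetical helper).

    Assumes the endpoint pattern https://<account>.azuredatalakestore.net/webhdfs/v1/.
    """
    base = f"https://{account}.azuredatalakestore.net/webhdfs/v1/"
    # "op" is the standard WebHDFS query parameter naming the operation.
    query = urlencode({"op": op, **params})
    # quote() keeps "/" intact, so nested folders stay readable in the URL.
    return f"{base}{quote(path.lstrip('/'))}?{query}"


# List the contents of a hypothetical folder.
url = webhdfs_url("contosoadls", "/raw/logs", "LISTSTATUS")
print(url)
```

The same helper covers other standard WebHDFS operations such as `GETFILESTATUS` or `OPEN` by changing the `op` argument.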
Azure Data Lake Store
• Unlimited Storage (petabytes)
• Optimized for Analytics
• Parallel computing optimized
• Auto optimization for any throughput
• Reliable
• Automatically replicates data (3 copies)
• Highly available
Integration
ADL Store
HDInsight
ADL Store SDKs
JAVA C++ .NET Node.js Python
U-SQL
Extensibility
WebHDFS Client LibWebHDFS Yes Yes By GA
Management
Operations
By GA Yes Yes By GA
Visual Studio Tools
ADL Analytics Billing Model
• Account is free
• Pay for compute nodes for the duration of query
• Formula (GA) = 5 cents + (minutes x parallelism x Analytics Unit Price)
• Analytics Unit Price - $0.017 / minute
• Preview – 50% discount
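As a worked example of the formula above, here is a minimal sketch that estimates the cost of one job. The prices come from the slide; the function name `adla_job_cost` is hypothetical, and applying the 50% preview discount to the whole amount (including the fixed 5 cents) is an assumption, since the slide does not say what the discount covers.

```python
from decimal import Decimal

JOB_FEE = Decimal("0.05")               # fixed "5 cents" per job (from the slide)
ANALYTICS_UNIT_PRICE = Decimal("0.017")  # USD per unit of parallelism per minute
PREVIEW_DISCOUNT = Decimal("0.5")        # 50% discount during preview


def adla_job_cost(minutes, parallelism, preview=False):
    """Estimate the cost in USD of a single job (hypothetical helper)."""
    cost = JOB_FEE + Decimal(str(minutes)) * Decimal(str(parallelism)) * ANALYTICS_UNIT_PRICE
    if preview:
        # Assumption: the discount applies to the total, fixed fee included.
        cost *= 1 - PREVIEW_DISCOUNT
    return cost


# A 10-minute job at parallelism 20: 0.05 + 10 x 20 x 0.017 = 3.45 USD
print(adla_job_cost(10, 20))
```

`Decimal` is used instead of floats so that the cents add up exactly.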
ADL Store Billing Model
• Account is free
• Pay for amount of data - $0.08 / GB per month
• Pay for number of I/O operations - $0.14 / million transactions
• Preview – 50% discount
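The store charges above can be combined into a rough monthly estimate. This is a sketch under stated assumptions: the prices are the ones on the slide, the function name `adl_store_monthly_cost` is hypothetical, transactions are billed pro rata rather than in whole millions, and the preview discount is applied to the full amount.

```python
from decimal import Decimal

PRICE_PER_GB_MONTH = Decimal("0.08")              # USD per GB stored per month
PRICE_PER_MILLION_TRANSACTIONS = Decimal("0.14")  # USD per million I/O operations
STORE_PREVIEW_DISCOUNT = Decimal("0.5")           # 50% discount during preview


def adl_store_monthly_cost(stored_gb, transactions, preview=False):
    """Estimate one month of store charges in USD (hypothetical helper)."""
    storage = Decimal(str(stored_gb)) * PRICE_PER_GB_MONTH
    # Assumption: partial millions of transactions are billed proportionally.
    io = Decimal(str(transactions)) / Decimal(1_000_000) * PRICE_PER_MILLION_TRANSACTIONS
    total = storage + io
    if preview:
        total *= 1 - STORE_PREVIEW_DISCOUNT
    return total


# 1,000 GB stored and 5 million transactions: 80.00 + 0.70 = 80.70 USD
print(adl_store_monthly_cost(1000, 5_000_000))
```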
Security
• Based on Azure Active Directory
• Federate with Enterprise Active Directory
• Two factor authentication
Security - Access
• Role Based Access Control
• Custom access with POSIX ACLs
• Permissions for specific named users or groups
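To illustrate how POSIX-style ACLs with named users and groups are typically evaluated, here is a simplified sketch of the usual check order: owner, then named users, then groups, then everyone else, with the mask limiting named-user and group entries. It is a generic model for illustration, not the service's actual implementation; the class `Acl`, the function `is_allowed`, and all user and group names are hypothetical.

```python
from dataclasses import dataclass, field


def _perms(text):
    """Turn a string such as "r-x" into the set {"r", "x"}."""
    return set(text) - {"-"}


@dataclass
class Acl:
    owner: str
    owning_group: str
    owner_perms: str
    group_perms: str
    other_perms: str
    mask: str = "rwx"
    named_users: dict = field(default_factory=dict)   # user name -> perms
    named_groups: dict = field(default_factory=dict)  # group name -> perms


def is_allowed(acl, user, groups, wanted):
    """Return True if `user` (member of `groups`) holds all `wanted` permissions."""
    need, mask = _perms(wanted), _perms(acl.mask)
    if user == acl.owner:
        return need <= _perms(acl.owner_perms)          # owner entry ignores the mask
    if user in acl.named_users:
        return need <= (_perms(acl.named_users[user]) & mask)
    matching = [acl.group_perms] if acl.owning_group in groups else []
    matching += [p for g, p in acl.named_groups.items() if g in groups]
    if matching:
        # Any single matching group entry may grant the access, after masking.
        return any(need <= (_perms(p) & mask) for p in matching)
    return need <= _perms(acl.other_perms)


acl = Acl(owner="alice", owning_group="data", owner_perms="rwx",
          group_perms="r-x", other_perms="---", mask="r-x",
          named_users={"bob": "rwx"}, named_groups={"analysts": "rw-"})
print(is_allowed(acl, "bob", [], "w"))  # the mask "r-x" removes bob's write bit
```

The mask explains a common surprise: a named user can be listed with `rwx` and still be denied write access.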
Security - Runtime
• All user code runs in a VM
• VMs are locked down
• Detailed audit records – who, when, what, how long
• Audit logs available out of the box
Security - Encryption
• Encryption on the wire - Data uploaded via HTTPS
• Encryption at rest – after public preview
• Integration with Azure Key Vault for keys
• Encryption is optional
Security - Compliance
• Certification
• Azure compliance requirements
• External auditing
DEMO
Tokyo Azure Meetup
Learn | Share | Enjoy Cool Demos!

Tokyo Azure Meetup #2: Big Data Made Easy

Editor's Notes

  • #2 https://www.getpostman.com/collections/2449c4125d7af478aed8 http://azjobsdemo.azurewebsites.net/ https://searchsamples.azurewebsites.net/#/