Velocity
Big Data –
Store & Compute
About me
• Directeur Technique chez Cellenza
• MVP Azure
• Microsoft P-Seller
Machine Learning
and Analytics
Big Data Stores
Action
People
Automated
Systems
Apps
Web
Mobile
Bots
Intelligence
Dashboards &
Visualizations
Cortana
Bot
Framework
Cognitive
Services
Power BI
Information
Management
Event Hubs
Data Catalog
Data Factory
HDInsight
(Hadoop and Spark)
Stream Analytics
Intelligence
Data Lake
Analytics
Machine
Learning
SQL Data
Warehouse
Data Lake Store
Data
Sources
Apps
Sensors
and
devices
Data
Big Data as part of Cortana Intelligence
MICROSOFTBIGDATASOLUTIONS
Pourquoi le big data ?
Architecture fonctionnelle
Présentation Apache Hadoop
Hadoop is a platform with portfolio of projects
Governed by Apache Software Foundation (ASF)
Comprises core services of MapReduce, HDFS, and YARN
In addition to the core, includes functions across:
Validated by
top analysts
Forrester Wave for Big Data
Hadoop Cloud
• Named industry leader by
Forrester with the most
comprehensive, scalable, and
integrated platforms*
• Recognized for its cloud-first
strategy that is paying off*
*The Forrester WaveTM: Big Data Hadoop Cloud Solutions, Q2 2016.
Challenges avec Hadoop
However, there are challenges to Big Data…
Obtaining skills
and capabilities
Determining how
to get value
Integrating with
existing IT investments
Pourquoi Hadoop dans le Cloud?
FULLY MANAGED AND SUPPORTED Hadoop,
Spark, HBase and Storm
Available on LINUX and WINDOWS
Works on AZURE STORAGE or DATA LAKE
STORE
100% OPEN SOURCE Apache Hadoop (HDP 2.3)
Clusters up and RUNNING IN MINUTES
Use familiar BI TOOLS FOR ANALYSIS like Excel
Azure HDInsight
Hadoop as a Service on Azure
No hardware licenses or service-specific support agreement
Pay only for what you use, when you need it, not more than
you need
Independently scale storage and compute
No need to hire specialized operations team to do big data
63% lower total cost of ownership than on-premises*
*Pending IDC study found on a per TB basis, Microsoft customers using cloud-based Hadoop in Data
Lake have a 63% lower TCO than on-premises
No limits to SCALE
Store ANY DATA in its native format
HADOOP FILE SYSTEM (HDFS) for the cloud
ENTERPRISE READY access control, encryption
at rest
Optimized for analytic workload
PERFORMANCE
Azure Data Lake
Store
A hyper scale repository for big
data analytics workloads
IN PREVIEW
Hadoop Distributed File System (HDFS) For The
Cloud
Built from the ground-up as a Hadoop File
System
Support for file/folder objects and operations
Integrated w/ HDInsight, Hortonworks,
Cloudera
Accessible to all HDFS compliant projects
(Spark, Storm, Flume, Sqoop, Kafka, R, etc.)
HDInsight
Manage and Secure Your Data Assets
Azure Active Directory integration
File and folder level access control
Audit data access
Encryption of data-at-rest
Oozie Metastore
Hive Metastore
ADLS
Storage (Azure)
GW1
Head Node 1
Worker Node
Worker Node
Worker Node
Worker Node
Zookeeper1
Edge Node
HDI Management IP’s
168.61.49.99
23.99.5.239
168.61.48.131
138.91.141.162
VNET
Subnet
GW2Head Node 2
Zookeeper2 Zookeeper3
HDInsight Application Platform – Deep Dive
Distributed, parallel analytics
framework
U-SQL (based on C# and SQL)
Dial for scale
Hides infrastructure complexity
Visual Studio integration
Instant scale on demand
Reduced learning curve
Azure Data Lake Analytics
Azure Services for big data analytics
YARN
HDFS
HDInsightAnalytics
Service
Store
Partners
U-SQL
Clickstream
Sensors
Video
Social
Web
Devices
Relational
Applications
EXTRACT statement:
Schema on read that is powerful and simple
TABLES:
Structured big data with partitioning control
Deep integration to Visual Studio
Easy for novices to write simple queries
Robust environment for experts to also be productive
Integrated with U-SQL, Hive, and Storm
Playback that visualizes performance to identify
bottlenecks and areas for optimization
CREATE ASSEMBLY OrdersDB.SampleDotNetCode
FROM @"/DLLs/Helpers.dll";
REFERENCE ASSEMBLY OrdersDB.Helpers;
@rows =
SELECT
OrdersDB.Helpers.Normalize(Customer) AS Customer,
Amount AS Amount
FROM @orders;
Use code from the Assembly
Use Your own C# methods
Register an Assembly
Reference the Assembly
Démo
Azure Data Lake Analytics
Exemple Concret
Call Log Files
Customer Table
Call Log Files
Customer Table
Customer
Churn Table
Azure Data
Factory:
Data Sources
Customers
Likely to
Churn
Customer
Call Details
Transform & Analyze PublishIngest
Merci pour votre attention !
Q&A

Azure Big data

  • 1.
  • 2.
    About me • DirecteurTechnique chez Cellenza • MVP Azure • Microsoft P-Seller
  • 3.
    Machine Learning and Analytics BigData Stores Action People Automated Systems Apps Web Mobile Bots Intelligence Dashboards & Visualizations Cortana Bot Framework Cognitive Services Power BI Information Management Event Hubs Data Catalog Data Factory HDInsight (Hadoop and Spark) Stream Analytics Intelligence Data Lake Analytics Machine Learning SQL Data Warehouse Data Lake Store Data Sources Apps Sensors and devices Data Big Data as part of Cortana Intelligence MICROSOFTBIGDATASOLUTIONS
  • 4.
  • 5.
  • 6.
  • 7.
    Hadoop is aplatform with portfolio of projects Governed by Apache Software Foundation (ASF) Comprises core services of MapReduce, HDFS, and YARN In addition to the core, includes functions across:
  • 8.
    Validated by top analysts ForresterWave for Big Data Hadoop Cloud • Named industry leader by Forrester with the most comprehensive, scalable, and integrated platforms* • Recognized for its cloud-first strategy that is paying off* *The Forrester WaveTM: Big Data Hadoop Cloud Solutions, Q2 2016.
  • 9.
  • 10.
    However, there arechallenges to Big Data… Obtaining skills and capabilities Determining how to get value Integrating with existing IT investments
  • 11.
  • 12.
    FULLY MANAGED ANDSUPPORTED Hadoop, Spark, HBase and Storm Available on LINUX and WINDOWS Works on AZURE STORAGE or DATA LAKE STORE 100% OPEN SOURCE Apache Hadoop (HDP 2.3) Clusters up and RUNNING IN MINUTES Use familiar BI TOOLS FOR ANALYSIS like Excel Azure HDInsight Hadoop as a Service on Azure
  • 13.
    No hardware licensesor service-specific support agreement Pay only for what you use, when you need it, not more than you need Independently scale storage and compute No need to hire specialized operations team to do big data 63% lower total cost of ownership than on-premises* *Pending IDC study found on a per TB basis, Microsoft customers using cloud-based Hadoop in Data Lake have a 63% lower TCO than on-premises
  • 14.
    No limits toSCALE Store ANY DATA in its native format HADOOP FILE SYSTEM (HDFS) for the cloud ENTERPRISE READY access control, encryption at rest Optimized for analytic workload PERFORMANCE Azure Data Lake Store A hyper scale repository for big data analytics workloads IN PREVIEW
  • 15.
    Hadoop Distributed FileSystem (HDFS) For The Cloud Built from the ground-up as a Hadoop File System Support for file/folder objects and operations Integrated w/ HDInsight, Hortonworks, Cloudera Accessible to all HDFS compliant projects (Spark, Storm, Flume, Sqoop, Kafka, R, etc.) HDInsight
  • 16.
    Manage and SecureYour Data Assets Azure Active Directory integration File and folder level access control Audit data access Encryption of data-at-rest
  • 17.
    Oozie Metastore Hive Metastore ADLS Storage(Azure) GW1 Head Node 1 Worker Node Worker Node Worker Node Worker Node Zookeeper1 Edge Node HDI Management IP’s 168.61.49.99 23.99.5.239 168.61.48.131 138.91.141.162 VNET Subnet GW2Head Node 2 Zookeeper2 Zookeeper3 HDInsight Application Platform – Deep Dive
  • 18.
    Distributed, parallel analytics framework U-SQL(based on C# and SQL) Dial for scale Hides infrastructure complexity Visual Studio integration Instant scale on demand Reduced learning curve Azure Data Lake Analytics Azure Services for big data analytics YARN HDFS HDInsightAnalytics Service Store Partners U-SQL Clickstream Sensors Video Social Web Devices Relational Applications
  • 19.
    EXTRACT statement: Schema onread that is powerful and simple TABLES: Structured big data with partitioning control
  • 20.
    Deep integration toVisual Studio Easy for novices to write simple queries Robust environment for experts to also be productive Integrated with U-SQL, Hive, and Storm Playback that visualizes performance to identify bottlenecks and areas for optimization
  • 21.
    CREATE ASSEMBLY OrdersDB.SampleDotNetCode FROM@"/DLLs/Helpers.dll"; REFERENCE ASSEMBLY OrdersDB.Helpers; @rows = SELECT OrdersDB.Helpers.Normalize(Customer) AS Customer, Amount AS Amount FROM @orders; Use code from the Assembly Use Your own C# methods Register an Assembly Reference the Assembly
  • 22.
  • 23.
    Exemple Concret Call LogFiles Customer Table Call Log Files Customer Table Customer Churn Table Azure Data Factory: Data Sources Customers Likely to Churn Customer Call Details Transform & Analyze PublishIngest
  • 24.
    Merci pour votreattention ! Q&A