Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.
However, there are challenges to Big Data…
• Obtaining skills and capabilities
• Determining how to get value
• Integrating with existing IT investments
Azure HDInsight
A cloud Spark and Hadoop service for the enterprise
• Reliable with an industry-leading SLA
• Enterprise-grade security and monitoring
• Productive platform for developers and scientists
• Cost-effective cloud scale
• Integration with leading ISV applications
• Easy for administrators to manage
63% lower TCO than deploying your own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Big Data as a cornerstone of Cortana Intelligence
[Diagram: Data flows from Data Sources (apps, sensors and devices) through Intelligence to Action (people; automated systems; web, mobile and bot apps)]
• Information Management: Event Hubs, Data Catalog, Data Factory
• Big Data Stores: SQL Data Warehouse, Data Lake Store
• Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
• Intelligence: Cortana, Bot Framework, Cognitive Services
• Dashboards & Visualizations: Power BI
Azure HDInsight: Big Data made easy
• Analytics on any data, any size
• Easier and more productive for all users
• Enterprise-ready
Integration of OSS with world-class IDEs
• Submit Hive, Pig and Storm jobs to an Azure HDInsight cluster using Visual Studio
• Submit Spark jobs using IntelliJ and Eclipse
• Other tools supported by HDInsight include Hue and Hive Views (Ambari), among others
Manage clusters using popular admin tools
• HDInsight uses Ambari as the tool to manage and administer Hadoop in the cloud
• Ensures that customers get the latest open source innovation on their clusters fast
Easy notebook experience for data engineers
• The most popular notebooks, Jupyter and Zeppelin, out of the box
• Combine code, statistical equations and visualizations
• Worked with the Jupyter community to enhance the kernel to allow Spark execution through a REST endpoint
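To make the REST-based Spark execution more concrete, here is a minimal sketch of how a Spark batch job could be described for such an endpoint. It assumes the endpoint is Apache Livy exposed at `/livy/batches` on the cluster's public host name, with basic authentication and an `X-Requested-By` header; the slide does not name the endpoint, so treat those details as assumptions. The cluster name, credentials, jar path and class name are hypothetical placeholders. The request is only built, never sent.

```python
import base64
import json
import urllib.request


def build_livy_batch_request(cluster, user, password, app_file,
                             class_name=None, args=None):
    """Build (but do not send) a POST request describing a Spark batch job.

    Assumption: the cluster exposes Apache Livy at /livy/batches.
    """
    payload = {"file": app_file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)

    # Basic authentication token for the cluster login (assumed auth scheme).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    return urllib.request.Request(
        url=f"https://{cluster}.azurehdinsight.net/livy/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Requested-By": user,
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
example = build_livy_batch_request(
    cluster="mycluster",
    user="admin",
    password="placeholder-password",
    app_file="wasbs:///example/jars/app.jar",
    class_name="com.example.App",
)
print(example.get_method(), example.full_url)
print(example.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would actually submit the job; the sketch stops short of that so it can be read and run without a cluster.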
Easy for data scientists with familiar R language
R Server for HDInsight
• Largest R-compatible parallel analytics library
• Terabyte-scale machine learning: 1,000x larger than in open source R
• Up to 100x faster performance using Spark and optimized vector/math libraries
• Enterprise-grade security and support
Manage and secure your data by leveraging existing IT investments
• Auditing, alerting and access control, based on Apache Ranger
• File and folder level ACLs
• Azure Active Directory integration for identity and access management
Highest availability guarantee in the industry for peace of mind
• Managed, monitored and supported by Microsoft
• Enterprise-leading SLA: 99.9% uptime
• Top tier support from both Microsoft
*Applies to HDInsight only
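As a rough illustration of what a 99.9% uptime figure allows, the sketch below converts an uptime percentage into a downtime budget. It assumes uptime is measured over a plain calendar window (30 days by default); how the SLA itself defines and measures uptime is set by its terms, not by this calculation.

```python
def allowed_downtime_minutes(uptime_percent, period_days=30):
    """Return the downtime budget, in minutes, for a given uptime percentage.

    Assumption: uptime is measured over a simple window of `period_days` days.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


monthly = allowed_downtime_minutes(99.9)
yearly = allowed_downtime_minutes(99.9, period_days=365)
print(f"{monthly:.1f} minutes per 30-day month")
print(f"{yearly / 60:.2f} hours per 365-day year")
```

Under that assumption, 99.9% works out to about 43.2 minutes per 30-day month.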
Validated by top analysts
Forrester Wave for Big Data Hadoop Cloud
• Named industry leader by Forrester with the most comprehensive, scalable, and integrated platforms*
• Recognized for its cloud-first strategy that is paying off*
*The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016.
HDInsight Application Platform – Deep Dive
[Diagram: HDInsight cluster topology with a VNET and subnet]
• Gateways: GW1, GW2
• Head nodes: Head Node 1, Head Node 2
• Worker nodes: four shown
• ZooKeeper nodes: Zookeeper1, Zookeeper2, Zookeeper3
• Edge node
• Metastores and storage: Oozie Metastore, Hive Metastore, ADLS, Storage (Azure)
• HDI management IPs: 168.61.49.99, 23.99.5.239, 168.61.48.131, 138.91.141.162
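When a cluster sits inside a VNET, the management IPs listed above typically need to be allowed inbound. The sketch below models that as a small allow-list and checks traffic against it. The rule dictionaries are a hypothetical structure invented for illustration, not an Azure API, and the choice of TCP port 443 is an assumption that the slide does not state.

```python
import ipaddress

# The four management IPs shown on the slide.
HDI_MANAGEMENT_IPS = [
    "168.61.49.99",
    "23.99.5.239",
    "168.61.48.131",
    "138.91.141.162",
]


def build_inbound_rules(source_ips, port=443, start_priority=300):
    """Return one allow rule per source IP (hypothetical rule format)."""
    rules = []
    for offset, ip in enumerate(source_ips):
        address = ipaddress.ip_address(ip)  # raises ValueError if malformed
        rules.append({
            "name": f"allow-hdi-mgmt-{offset + 1}",
            "priority": start_priority + offset,
            "direction": "Inbound",
            "access": "Allow",
            "protocol": "Tcp",
            "source": f"{address}/32",
            "destination_port": port,  # assumed port, not from the slide
        })
    return rules


def is_allowed(rules, source_ip, port):
    """Check whether any rule admits traffic from source_ip to port."""
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(rule["source"])
        and rule["destination_port"] == port
        for rule in rules
    )


rules = build_inbound_rules(HDI_MANAGEMENT_IPS)
for rule in rules:
    print(rule["priority"], rule["name"], rule["source"], rule["destination_port"])
```

A real deployment would express the same intent as network security group rules on the subnet; the dictionaries here only mirror the fields such a rule usually carries.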
Why StreamSets
• Data movement from disparate sources
• KPIs on data transfer
• Manage data drift (unexpected changes) at scale
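To illustrate what "data drift" can mean in practice, here is a minimal sketch that compares an incoming record against an expected schema and reports added, missing and retyped fields. This is a generic illustration, not how StreamSets implements drift handling; the schema, field names and record are hypothetical.

```python
def detect_drift(expected_schema, record):
    """Compare a record to an expected schema (field name -> Python type).

    Returns the fields that were added, are missing, or changed type.
    """
    added = sorted(set(record) - set(expected_schema))
    missing = sorted(set(expected_schema) - set(record))
    retyped = sorted(
        field
        for field, expected_type in expected_schema.items()
        if field in record and not isinstance(record[field], expected_type)
    )
    return {"added": added, "missing": missing, "retyped": retyped}


# Hypothetical schema and record, for illustration only.
expected = {"id": int, "amount": float, "city": str}
incoming = {"id": 7, "amount": "12.50", "region": "EU"}

print(detect_drift(expected, incoming))
```

In this example the record gains a `region` field, loses `city`, and delivers `amount` as a string instead of a float, which are the kinds of unexpected changes the bullet above refers to.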
Deep Integration with Microsoft Azure
[Diagram: StreamSets Data Collector overlaid on the Cortana Intelligence architecture (Information Management, Big Data Stores, Machine Learning and Analytics, Intelligence, Dashboards & Visualizations). Asterisks mark individual Azure services; the extracted text does not preserve which ones.]
Use cases
• Building a hybrid solution
  • Hadoop on-prem and HDInsight in the cloud
• HDInsight platform benefits
  • Leverage the benefits of HDInsight as a PaaS platform
  • Scale out job execution using HDInsight
  • Enterprise-grade security: authentication
  • Network isolation
  • Monitoring
  • Cost effectiveness for cloud scale
Cask and Microsoft: Enabling Customers’ Journey from Data Ingestion to Action
[Diagram: Cask capabilities overlaid on the Cortana Intelligence architecture, from Data Sources through Information Management, Big Data Stores, Machine Learning and Analytics, Intelligence, and Dashboards & Visualizations to Action]
• Ingest, explore & serve data
• Develop, test, debug & deploy apps
• Capture, track and analyze (meta)data
• Enable governed self-service
Azure HDInsight application platform: Install solutions built for the Apache Hadoop ecosystem
• Install custom HDInsight applications
• Try Datameer on HDInsight
• Try AtScale on HDInsight
• Try DSS from Dataiku on HDInsight
• Try SDC from StreamSets on HDInsight
• Try CDAP from Cask on HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
