Big Data in the Cloud: The What, Why and How from the Experts
Nishant Thacker
Technical Product Manager – Big Data
Microsoft
@nishantthacker
Big Data in the Cloud
Traditional Clusters
Challenges with implementing clusters
Hadoop Clusters in the Cloud
Why Hadoop in the cloud?
Hadoop/Spark Clusters
Distributed Storage
• Files split across storage
• Files replicated
• Nearest node responds
• Abstracted administration
Extensible
• APIs to extend functionality
• Add new capabilities
• Allow for inclusion in custom environments
Automated Failover
• Unattended failover to replicated data
• Built for resiliency
• Metadata stored for later retrieval
Hyper-Scale
• Add resources as desired
• Built to include commodity configurations
• Performance scales directly with resources
Distributed Compute
• Distributed processing
• Resource utilization
• Cost-efficient method calls
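The storage and failover bullets above can be illustrated with a small in-memory sketch. It assumes fixed-size blocks, a simple round-robin replica placement (real HDFS uses a rack-aware policy) and a hypothetical per-node distance table standing in for "nearest node"; the names are illustrative, not a Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    data: bytes
    replicas: list[str]  # names of the nodes holding a copy of this block


def split_and_replicate(data: bytes, block_size: int, nodes: list[str], replication: int) -> list[Block]:
    """Split data into fixed-size blocks and place each block on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    blocks = []
    for i, start in enumerate(range(0, len(data), block_size)):
        # Assumed round-robin placement: block i starts at node i and wraps around,
        # so every replica of a block lands on a different node.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        blocks.append(Block(i, data[start:start + block_size], replicas))
    return blocks


def nearest_replica(block: Block, distance: dict[str, int], alive: set[str]) -> str:
    """Return the closest live node holding the block; dead nodes are skipped (failover)."""
    candidates = [n for n in block.replicas if n in alive]
    if not candidates:
        raise RuntimeError(f"no live replica for block {block.index}")
    return min(candidates, key=lambda n: distance[n])
```

If the nearest replica's node goes down, the same call simply picks the next-closest live copy, which is the unattended failover the slide describes.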
Big Data in the Cloud - Options
Scenarios for deploying as hybrid
Traditional Clusters: On-Premises
Hadoop Cluster
[Diagram: a client submits a job (jar) file to the master node; each worker node runs a Task Tracker that executes tasks over its local HDFS storage.]
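The job flow in this diagram can be sketched in-process. Threads stand in for Task Trackers, each input string stands in for an HDFS block held by one worker, and word count is used as a hypothetical job; this simulates the map/reduce pattern and is not Hadoop's API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_task(split: str) -> Counter:
    """Map: one task counts the words in the split stored on its worker."""
    return Counter(split.split())


def run_job(splits: list[str], workers: int = 3) -> Counter:
    """Master: schedule one map task per split, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)  # reduce: sum the counts for each word
    return total
```

For example, `run_job(["big data", "data in the cloud"]).most_common(1)` returns `[("data", 2)]`.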
Clusters in the Cloud
Azure HDInsight: Hadoop and Spark as a Service on Azure
• Fully managed Hadoop and Spark for the cloud
• 100% open source Hortonworks Data Platform
• Clusters up and running in minutes
• Managed, monitored and supported by Microsoft with the industry’s best enterprise SLA
• Use familiar BI tools for analysis, or open source notebooks for interactive data science
• 63% lower total cost of ownership than deploying your own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
HDInsight Cluster
[Diagram: an HDInsight cluster (head node, back-up and data nodes) using Azure Data Lake Storage and Azure Storage Blob, with domain credentials.]
HDInsight Cluster Security
[Diagram: the HDInsight cluster (head node, back-up and data nodes) with Azure Data Lake Storage and Azure Storage Blob, domain credentials from an AAD tenant, and Azure VNET-to-VNET peering.]
Big Data as a Service
[Diagram: compute requirement (U-SQL) served over ADLS and WASB storage.]
Decoupling Compute from Storage
When compute and storage are separated, every read crosses the network, which raises three questions: latency, consistency and bandwidth. Azure's answer:
• HDD-like latency
• 50 Tb+ aggregate bandwidth[1]
• Strong consistency
[1] Azure Flat Network Architecture
Decoupling - Benefits
Azure Data Lake Store
A hyper-scale repository for big data analytics workloads
• Hadoop Distributed File System (HDFS) for the cloud
• No limits to scale
• Store any data in its native format
• Enterprise-grade access control and encryption
• Optimized for analytic workload performance
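Because the cluster treats cloud storage as an HDFS-compatible file system, jobs address data by URI rather than by local disk, which is what lets the data outlive any one cluster. A minimal sketch of building such URIs, assuming the `wasbs://` scheme of the Hadoop Azure Blob connector and the `adl://` scheme of the Data Lake Store connector; the account, container and path names are hypothetical.

```python
from urllib.parse import quote


def wasbs_uri(account: str, container: str, path: str) -> str:
    """HDFS-style URI for a blob in Azure Storage (wasbs = WASB over TLS)."""
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"wasbs://{container}@{account}.blob.core.windows.net/{quote(path.lstrip('/'))}"


def adl_uri(account: str, path: str) -> str:
    """HDFS-style URI for a file in Azure Data Lake Store."""
    return f"adl://{account}.azuredatalakestore.net/{quote(path.lstrip('/'))}"
```

Two clusters given the same URI read the same files, so a cluster can be deleted and recreated without moving the data.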
Customize cluster?
Cluster customization options
• Ad hoc: RDP to the cluster and update config files (non-durable)
• Hive/Oozie metastore
• Storage accounts & VNETs
• ScriptAction
• Config values
• JAR file placement in the cluster
Customizations are applied via the Azure portal or via scripting / SDK.
HDInsight cluster provisioning states
Ready for deployment → Accepted → Cluster storage provisioned → AzureVM configuration → Configuring HDInsight → Customize cluster? (Yes: cluster customization, custom script running; No: skip) → Running (cluster operational)
A deployment can also end in Timed Out or Error.
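The provisioning flow can be read as a small state machine. The sketch below uses the state names from the slide; the transition table, the step budget that produces Timed Out, and all function names are assumptions about the diagram, not the HDInsight management API.

```python
from enum import Enum


class State(Enum):
    ACCEPTED = "Accepted"
    STORAGE_PROVISIONED = "Cluster storage provisioned"
    VM_CONFIGURATION = "AzureVM configuration"
    HDINSIGHT_CONFIGURATION = "Configuring HDInsight"
    CUSTOMIZATION = "Cluster customization"
    RUNNING = "Running"
    TIMED_OUT = "Timed Out"
    ERROR = "Error"


TERMINAL = {State.RUNNING, State.TIMED_OUT, State.ERROR}

# Linear part of the flow (assumed ordering read off the slide).
LINEAR = [State.ACCEPTED, State.STORAGE_PROVISIONED,
          State.VM_CONFIGURATION, State.HDINSIGHT_CONFIGURATION]


def next_state(state: State, customize: bool, ok: bool = True) -> State:
    """Advance one provisioning step; a failed step moves the cluster to Error."""
    if state in TERMINAL:
        raise ValueError(f"{state.value} is a terminal state")
    if not ok:
        return State.ERROR
    if state is State.HDINSIGHT_CONFIGURATION:
        # "Customize cluster?" decision: run the custom script only if requested.
        return State.CUSTOMIZATION if customize else State.RUNNING
    if state is State.CUSTOMIZATION:
        return State.RUNNING
    return LINEAR[LINEAR.index(state) + 1]


def provision(customize: bool, max_steps: int = 10) -> list[State]:
    """Walk the happy path; exceeding the step budget ends in Timed Out."""
    path = [State.ACCEPTED]
    while path[-1] not in TERMINAL:
        if len(path) > max_steps:
            path.append(State.TIMED_OUT)
            break
        path.append(next_state(path[-1], customize))
    return path
```

`[s.value for s in provision(customize=True)]` lists the states a customized cluster passes through, ending in "Running".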
Cluster integration options
Each cluster exposes REST endpoints for integration, secured with basic authentication over SSL:
• /thrift: ODBC & JDBC
• /templeton: job submission, metadata management
• /ambari: cluster health, monitoring
• /oozie: job orchestration, scheduling
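A minimal sketch of calling one of these endpoints with basic authentication, using only the Python standard library. It assumes the default `<cluster>.azurehdinsight.net` host name and the WebHCat (Templeton) status path `/templeton/v1/status`; the cluster name and credentials are placeholders, and the request is only built, not sent.

```python
import base64
import urllib.request


def cluster_request(cluster: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Build (without sending) a GET request to a cluster endpoint using HTTP basic auth over HTTPS."""
    url = f"https://{cluster}.azurehdinsight.net/{path.lstrip('/')}"
    # Basic auth: base64("user:password"), protected in transit only by TLS.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical cluster name and credentials.
req = cluster_request("mycluster", "/templeton/v1/status", "admin", "secret")
```

To send it, pass the request to `urllib.request.urlopen`; the same helper covers the other path prefixes listed above.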
Big Data Application Architecture
The Azure Architecture
[Diagram: three tiers (ingestion, backend, frontend).
• Ingestion: Sources A to D arrive by push (PowerShell), by stream (Stream Analytics) and through Data Factory, landing in Azure Data Lake Store
• Backend: HDInsight, Azure Data Lake Analytics and Azure SQL Data Warehouse
• Frontend: Azure Analysis Services
• Analysts query with DAX, T-SQL and HiveQL]
The Azure Architecture - Detailed
Introducing Cortana Intelligence Suite
[Diagram: Data → Intelligence → Action.
• Data sources: apps, sensors and devices
• Information management: Event Hubs, Data Catalog, Data Factory
• Big data stores: SQL Data Warehouse, Data Lake Store
• Machine learning and analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
• Intelligence: Cortana, Bot Framework, Cognitive Services
• Dashboards & visualizations: Power BI
• Action: people and automated systems, through apps, web, mobile and bots]
Where Big Data is a cornerstone
Summary
• For more information on HDInsight visit: http://azure.com/hdinsight
• For more information on Data Lake visit: http://azure.com/datalake
nishant.thacker@microsoft.com
© 2016 Microsoft Corporation. All rights reserved.


Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Big Data in the Cloud - The What, Why and How from the Experts

  • 1. Big Data in the Cloud: The What, Why and How from the Experts. Nishant Thacker, Technical Product Manager, Big Data, Microsoft (@nishantthacker)
  • 2. Big Data in the Cloud
  • 3. Big Data in the Cloud
  • 6. Hadoop Clusters in the Cloud
  • 7. Why Hadoop in the cloud?
  • 8. Hadoop/Spark Clusters: Distributed Storage (files split across storage; files replicated; nearest node responds; abstracted administration). Extensible (APIs to extend functionality; add new capabilities; allow for inclusion in custom environments). Automated Failover (failover to replicated data without manual monitoring; built for resiliency; metadata stored for later retrieval). Hyper-Scale (add resources as desired; built to include commodity configurations; performance scales directly with resources). Distributed Compute (distributed processing; resource utilization; cost-efficient method calls).
  • 9. Cloud: the same five capabilities as slide 8 (Distributed Storage, Extensible, Automated Failover, Hyper-Scale, Distributed Compute).
  • 10. Big Data in the Cloud: the same five capabilities as slide 8 (Distributed Storage, Extensible, Automated Failover, Hyper-Scale, Distributed Compute).
  • 11. Big Data in the Cloud
  • 12. Big Data in the Cloud - Options
  • 14. Traditional Clusters – On Prem: [Diagram: a Hadoop cluster with a master node and task tracker, worker nodes each running HDFS and tasks, and a client submitting a job (jar) file]
  • 16. Azure HDInsight: Hadoop and Spark as a Service on Azure. Fully managed Hadoop and Spark for the cloud; 100% open source Hortonworks Data Platform; clusters up and running in minutes; managed, monitored and supported by Microsoft with the industry's best enterprise SLA; use familiar BI tools for analysis, or open source notebooks for interactive data science; 63% lower total cost of ownership than deploying your own Hadoop on-premises.* *IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
  • 17. HDInsight Cluster: [Diagram: an HDInsight cluster (head node, back-up head node, data node) using domain credentials, with Azure Data Lake Storage and Azure Storage Blob as storage]
  • 18. HDInsight Cluster Security: [Diagram: the same HDInsight cluster, Azure Data Lake Storage and Azure Storage Blob, with an AAD tenant and Azure VNET-to-VNET peering added]
  • 19. Big Data as a Service: [Diagram: compute requirement; U-SQL; ADLS; WASB]
  • 20. Decoupling Compute from Storage: with the network in between, what about latency? Consistency? Bandwidth?
  • 21. Decoupling Compute from Storage: the network delivers HDD-like latency, 50 Tb+ aggregate bandwidth[1] and strong consistency. [1] Azure Flat Network Architecture
  • 23. Azure Data Lake Store: a hyper-scale repository for big data analytics workloads. HDFS (Hadoop Distributed File System) for the cloud; no limits to scale; store any data in its native format; enterprise-grade access control and encryption; optimized for analytic workload performance.
  • 24. HDInsight cluster provisioning states: [Flowchart. States: Ready for deployment; Accepted; Cluster storage provisioned; AzureVM configuration; Configuring HDInsight; Cluster customization (custom script running); Running (cluster operational); Timed Out; Error. Decision: "Customize cluster?" (Yes/No). Cluster customization options: ad hoc (RDP to cluster, update config files; non-durable); Hive/Oozie metastore; storage accounts and VNETs; ScriptAction via Azure portal or via scripting/SDK; config values; JAR file placement in cluster]
  • 25. Cluster integration options: each cluster surfaces a REST endpoint for integration, secured via basic authentication over SSL. /thrift: ODBC and JDBC; /templeton: job submission, metadata management; /ambari: cluster health, monitoring; /oozie: job orchestration, scheduling.
  • 26. Big Data in the Cloud
  • 27. Big Data Application Architecture
  • 28. The Azure Architecture: [Diagram organized into Ingestion, Backend and Frontend stages. Components: Sources A–D; Data Factory; PowerShell; Stream Analytics; Azure Data Lake Store; HDInsight; Azure Data Lake Analytics; Azure SQL Data Warehouse; Azure Analysis Services. Inputs arrive by push and stream; analysts query via HiveQL, T-SQL and DAX]
  • 29. The Azure Architecture - Detailed
  • 30. Introducing Cortana Intelligence Suite: [Diagram, from Data to Action. Data Sources: apps, sensors and devices. Information Management: Event Hubs, Data Catalog, Data Factory. Big Data Stores: SQL Data Warehouse, Data Lake Store. Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning. Intelligence: Cortana, Bot Framework, Cognitive Services; Dashboards & Visualizations: Power BI. Action: people, automated systems, apps (web, mobile, bots)]
  • 31. Where Big Data is a cornerstone: [The same Cortana Intelligence Suite diagram as slide 30]
  • 33. For more information on HDInsight, visit http://azure.com/hdinsight; for more information on Data Lake, visit http://azure.com/datalake
  • 35. © 2016 Microsoft Corporation. All rights reserved.
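Slide 25 lists the gateway paths each cluster exposes over HTTPS with basic authentication. The Python sketch below builds such a request using only the standard library. It assumes the cluster gateway URL pattern `https://<cluster>.azurehdinsight.net` and the WebHCat (Templeton) status path `/templeton/v1/status`; the cluster name and credentials are hypothetical placeholders, and the helper names are illustrative rather than part of any Azure SDK.

```python
import base64
import json
import urllib.request


def build_cluster_request(cluster_name: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for an HDInsight gateway path, using HTTP basic auth over HTTPS."""
    # Assumed URL pattern for the cluster's public HTTPS gateway.
    url = f"https://{cluster_name}.azurehdinsight.net{path}"
    # Basic auth: base64("user:password") in the Authorization header; only safe over TLS.
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {credentials}")
    return request


def get_json(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body (performs real network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical cluster name and credentials; replace with your own before calling get_json.
    req = build_cluster_request("mycluster", "/templeton/v1/status", "admin", "password")
    print(req.full_url)
    # print(get_json(req))  # uncomment to query a real cluster
```

Keeping request construction separate from sending it lets the credential handling be checked without a live cluster; the same builder can target the other paths listed on slide 25.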

Editor's Notes

  1. Challenges with implementing clusters on-premises: hardware acquisition (up-front capex); scale constrained by on-premises procurement (resource and capacity planning); the need for skilled Hadoop expertise; ongoing tuning and maintenance.
  2. Why Hadoop in the cloud? You can deploy Hadoop in a traditional on-site datacenter, but some companies, including Microsoft, also offer Hadoop as a cloud-based service. The obvious question is: why use Hadoop in the cloud? Here's why a growing number of organizations are choosing this option.
The cloud saves time and money. Open source doesn't mean free: deploying Hadoop on-premises still requires servers, plus skilled Hadoop experts to set up, tune, and maintain them. A cloud service lets you spin up a Hadoop cluster in minutes without up-front costs. See how Virginia Tech is using Microsoft's cloud instead of spending millions of dollars to establish their own supercomputing center.
The cloud is flexible and scales fast. In the Microsoft Azure cloud, you pay only for the compute and storage you use, when you use it. Spin up a Hadoop cluster, analyze your data, then shut it down to stop the meter. "We quickly spun up the Azure HDInsight cluster and processed six years' worth of data in just a few hours, and then we shut it down… processing the data in the cloud made it very affordable." (Paul Henderson, National Health Service, U.K.)
The cloud makes you nimble. Create a Hadoop cluster in minutes, and add nodes on demand. The cloud offers organizations immediate time to value. "It was simply so much faster to do this in the cloud with Windows Azure. We were able to implement the solution and start working with data in less than a week." (Morten Meldgaard, Chr. Hansen)
  3–8. This topic explores how you can get data into your Big Data solution. It describes several typical data ingestion techniques that are generally applicable to any Big Data solution, including ways to handle streaming data and to automate the ingestion process. While the focus is primarily on Microsoft Azure HDInsight, many of the techniques described here are equally relevant to solutions built on other Big Data frameworks and platforms. The figure shows an overview of the techniques and technologies covered in this section of the guide.
  9–10. Cortana Intelligence delivers an end-to-end platform with an integrated, comprehensive set of tools and services to help you build intelligent applications that take advantage of advanced analytics and intelligence capabilities.
First, Cortana Intelligence provides services to bring data in so that you can analyze it. Its information management capabilities include Azure Data Factory, which pulls data from any source (relational databases such as SQL, or non-relational sources such as your Hadoop cluster) in an automated, scheduled way while performing the necessary data transforms (for example, typing certain columns as dates or currency). Think ETL (Extract, Transform, Load) in the cloud. Event Hubs does the same for IoT-style ingestion of data streaming in from many endpoints.
The ingested data can then be persisted in flexible big data storage services such as Data Lake Store and Azure SQL Data Warehouse. You can then analyze the stored data with a wide range of analytics services, from Machine Learning to Azure Data Lake Analytics, Azure HDInsight and Azure Stream Analytics. This means you can create analytics services and models specific to your business needs (say, real-time demand forecasting).
The resulting analytics services and models can be surfaced as interactive dashboards and visualizations via Power BI. They can also be integrated into various UIs (web apps, mobile apps or rich client apps) or with Cortana, so that end users can interact with them naturally, for example via speech, and can be proactively notified by Cortana when the analytics model finds a new anomaly (such as unusual growth in certain product purchases, in the demand-forecasting example above) or anything else that deserves business users' attention. Similar integration can occur with Cognitive Services or Bot Framework based applications.
At a high level, Cortana Intelligence capabilities fall into three main areas: data, analytics and intelligence. <Transition>: We're going to dive into each one, starting with data.
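The note on Data Factory mentions transforms such as typing certain columns as dates or currency. The Python sketch below shows what that transform step does to individual records, run locally; it is not Data Factory's API, and the column names and the `SCHEMA` mapping are hypothetical.

```python
from datetime import date
from decimal import Decimal

# Hypothetical schema: maps each column name to the type its raw string value should become.
SCHEMA = {"order_date": "date", "amount": "currency", "region": "string"}


def transform_row(raw: dict, schema: dict = SCHEMA) -> dict:
    """Cast each raw string field to the type declared for its column."""
    typed = {}
    for column, kind in schema.items():
        value = raw[column].strip()
        if kind == "date":
            # Expects ISO 8601 dates (YYYY-MM-DD).
            typed[column] = date.fromisoformat(value)
        elif kind == "currency":
            # Drop a leading "$" and thousands separators; Decimal avoids float rounding.
            typed[column] = Decimal(value.lstrip("$").replace(",", ""))
        else:
            typed[column] = value
    return typed


def transform(rows: list) -> list:
    """Apply the per-row transform to a batch of raw records."""
    return [transform_row(row) for row in rows]


if __name__ == "__main__":
    raw_rows = [{"order_date": "2016-03-01", "amount": "$1,250.50", "region": "EMEA"}]
    for row in transform(raw_rows):
        print(row)
```

Using `Decimal` rather than `float` for the currency column keeps monetary values exact; a production pipeline would also need a policy for rows that fail to parse.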