Introduction to Microsoft Azure HDInsight
Dattatrey Sindhol
2
Agenda
Introduction
Hadoop Distributions
Microsoft Azure HDInsight
Microsoft BI and Data Platform
HDInsight - Use Cases
HDInsight - Typical Implementation
Further Learning
3
Introduction
What is Big Data?
“Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.”
4
Introduction
What is Hadoop?
Hadoop is an open source framework from the Apache Software Foundation, capable of processing very large volumes of heterogeneous data sets in a distributed fashion across clusters of commodity hardware using a simplified programming model.
5
Introduction
Conclusion
In simple terms, Big Data is the Challenge and Hadoop is the Solution.
6
Hadoop Distributions
• Amazon Elastic MapReduce (EMR)
• Cloudera
• Hortonworks
• IBM InfoSphere BigInsights
• MapR
• Pivotal
• Teradata
• Intel
• Azure HDInsight
Reference: How the 9 Leading Commercial Hadoop Distributions Stack Up
7
Which Distribution Should I Use?
Cost
Scalability
Availability
Existing Technology Stack
Existing Infrastructure
Existing Skillset
8
HDInsight - Overview
• Microsoft’s Hadoop Distribution in the Cloud
• Offers Hadoop on Windows Platform
• Based on Hortonworks Data Platform (HDP)
• Tightly integrated with Microsoft Technology Stack
9
HDInsight - Architecture
10
Microsoft Data Platform and Enterprise BI Ecosystem
11
Why HDInsight?
• Microsoft Stack
• Runs on Windows
• Create & Destroy On-Demand
• DFS Implementation in Blob Storage
• Store Data on Blob Storage for Later Use
• Automation using PowerShell
• Orchestration/Workflow using SSIS
• Scheduling using SQL Agent
• BI & Analytics with Power BI
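The "DFS Implementation in Blob Storage" point can be made concrete with a small sketch. It assumes the `wasb://<container>@<account>.blob.core.windows.net/<path>` URI form that Hadoop's Azure Blob Storage driver uses; that form is not stated on the slides, and the helper name, container, and account below are hypothetical.

```python
def wasb_uri(container, account, path="", secure=False):
    """Build a WASB-style URI for a blob path (illustrative helper, not an SDK call).

    Assumes the wasb[s]://<container>@<account>.blob.core.windows.net/<path> form.
    """
    scheme = "wasbs" if secure else "wasb"
    # Strip leading slashes so the result never contains a double slash after the host.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical container and storage account names.
print(wasb_uri("logs", "mystore", "/raw/2014/"))
```

Because the data lives at such a URI rather than on the cluster's own disks, it can outlive a cluster that is created and destroyed on demand.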
12
Considerations
• Requires dropping and re-creating the cluster to scale up/down
• Storage and Cluster should be in the same Data Center
13
HDInsight Versions
COMPONENT                        VERSION 1.6  VERSION 2.1       VERSION 3.0       VERSION 3.1 (Current/Default)
Hortonworks Data Platform (HDP)  1.1          1.3               2.0               2.1.7
Apache Hadoop & YARN             1.0.3        1.2.0             2.2.0             2.4.0
Tez                              -            -                 -                 0.4.0
Apache Pig                       0.9.3        0.11.0            0.12.0            0.12.1
Apache Hive & HCatalog           0.9.0        0.11.0            0.12.0            0.13.1
HBase                            -            -                 -                 0.98.0
Apache Sqoop                     1.4.2        1.4.3             1.4.4             1.4.4
Apache Oozie                     3.2.0        3.3.2             4.0.0             4.0.0
Apache HCatalog                  0.4.1        Merged with Hive  Merged with Hive  Merged with Hive
Apache Templeton                 0.1.4        Merged with Hive  Merged with Hive  Merged with Hive
Ambari                           -            API v1.0          1.4.1             >=1.5.1
Zookeeper                        -            -                 3.4.5             3.4.5
Storm                            -            -                 -                 0.9.1
Mahout                           -            -                 -                 0.9.0
Phoenix                          -            -                 -                 4.0.0.2.1.7.0-2162
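When choosing a cluster version, the table above can be treated as a lookup. The sketch below copies only the rows that list a value for all four HDInsight versions. The dictionary and function names are illustrative and not part of any HDInsight tooling.

```python
# HDInsight version -> bundled component versions, copied from the fully
# populated rows of the version table. All names here are illustrative.
HDINSIGHT_COMPONENTS = {
    "1.6": {"HDP": "1.1", "Hadoop": "1.0.3", "Pig": "0.9.3",
            "Hive": "0.9.0", "Sqoop": "1.4.2", "Oozie": "3.2.0"},
    "2.1": {"HDP": "1.3", "Hadoop": "1.2.0", "Pig": "0.11.0",
            "Hive": "0.11.0", "Sqoop": "1.4.3", "Oozie": "3.3.2"},
    "3.0": {"HDP": "2.0", "Hadoop": "2.2.0", "Pig": "0.12.0",
            "Hive": "0.12.0", "Sqoop": "1.4.4", "Oozie": "4.0.0"},
    "3.1": {"HDP": "2.1.7", "Hadoop": "2.4.0", "Pig": "0.12.1",
            "Hive": "0.13.1", "Sqoop": "1.4.4", "Oozie": "4.0.0"},
}


def version_tuple(version):
    """Turn '0.13.1' into (0, 13, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def clusters_with_at_least(component, minimum):
    """Return the HDInsight versions that bundle `component` at `minimum` or newer."""
    needed = version_tuple(minimum)
    return [hdi for hdi, components in HDINSIGHT_COMPONENTS.items()
            if version_tuple(components[component]) >= needed]


print(clusters_with_at_least("Hive", "0.12.0"))
```

Comparing versions as integer tuples avoids the string-comparison trap where "0.9.0" would sort after "0.11.0".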
14
HDInsight Use Case - Iterative Exploration
15
HDInsight Use Case - Data Warehouse on Demand
16
HDInsight Use Case - ETL Automation
17
HDInsight Use Case - BI Integration
18
Typical Implementation
Data sources: Transactional, Social, Warehouse, Web Logs, Clickstream, Files (TXT, XML, JSON, ..)
Azure Blob Storage
Multi-Node HDInsight Cluster
MapReduce
• Hive
• Java
Reporting and Analytics
• SSRS
• Excel
• Power BI
Collaboration
• Office 365 / SharePoint
19
Typical Implementation (Contd…)
Data sources: E-Commerce, Internal Systems, OLTP, Transactional, Warehouse, Customers, Team, Web Logs, Social
Data movement: Sqoop or AzCopy
Azure Blob Storage
Hive Metastore
Multi-Node HDInsight Cluster
MapReduce
• Hive
• Pig
• Java
• Python
Collaboration, Reporting, and Analytics
• SSRS
• Excel
• Power BI
PowerShell / SSIS / SQL Agent
Subscription & Cluster Management | Data Movement | Job Execution
20
Further Reading and Learning Resources
• HDInsight Emulator
• http://azure.microsoft.com
• Learning map for HDInsight: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-learn-map
21
References
• http://msdn.microsoft.com/en-us/library/dn749804.aspx
• http://azure.microsoft.com/en-us/documentation/articles/hdinsight-component-versioning/
• http://msdn.microsoft.com/en-us/library/dn749848.aspx
• http://msdn.microsoft.com/en-us/library/dn749787.aspx
• http://msdn.microsoft.com/en-us/library/dn749805.aspx
• http://msdn.microsoft.com/en-us/library/dn749876.aspx
22
Related Apache Projects
Term                            Description
Ambari / HUE                    Deployment, Configuration, and Monitoring
Avro / Parquet / RC / Sequence  Data serialization system
Flume / S4 / Storm              Collection and import of log and event data
HBase / Cassandra               Column-oriented database scaling to billions of rows
HCatalog                        Schema and Data Type Sharing over Pig, Hive, and MapReduce
Hive / Drill / Impala           Data Warehouse with SQL-Like Access
Hive-QL/HQL                     SQL-Like Language to Query Hive
Mahout                          Library of machine learning and data mining algorithms
Pig                             High-level programming for Hadoop computations
Oozie                           Orchestration and workflow management
Sqoop                           Imports data from relational databases
Tez                             Application framework for graph
Whirr                           Cloud-agnostic deployment of clusters
MapReduce / YARN                MapReduce is a programming model for distributed data processing. MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have Map-Reduce 2.0 (MRv2) or YARN.
Zookeeper                       Configuration management and coordination
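To illustrate the MapReduce programming model named in the table above, here is a minimal single-process sketch of a word count in plain Python. It only mimics the map, shuffle, and reduce steps in memory. It is not Hadoop's API, and all function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big Hadoop"]))
```

In a real cluster the map and reduce steps run in parallel on different nodes and the framework performs the shuffle. The sketch keeps everything in one process so the data flow is easy to follow.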
THANK YOU
