Session Objectives And Takeaways
• Understanding HDInsight cluster types & tiers in Azure
• HBase as a Hadoop NoSQL database
• Hive as data warehouse software for managing large datasets using SQL
• Understanding data processing options in the Hadoop ecosystem using Storm and Spark
• HDInsight is a cloud implementation on Microsoft Azure of the rapidly expanding Apache
Hadoop technology stack that is the go-to solution for big data analysis.
• It includes implementations of Apache Spark, HBase, Storm, Pig, Hive, Sqoop, Oozie,
Ambari, and so on.
• HDInsight also integrates with business intelligence (BI) tools such as Power BI, Excel, SQL
Server Analysis Services, and SQL Server Reporting Services.
• HDInsight is available on Windows and Linux
• HDInsight on Linux: A Hadoop cluster on Ubuntu
• HDInsight on Windows: A Hadoop cluster on Windows Server 2012 R2
What is HDInsight
• HDInsight provides cluster types & custom configurations for:
• Hadoop (HDFS)
• HBase
• Storm
• Spark
• R Server (Preview)
• Skip maintaining and purchasing hardware
• HDInsight has powerful programming extensions for languages including C#, Java,
and .NET. Use your programming language of choice on Hadoop to create, configure,
submit, and monitor Hadoop jobs.
HDInsight clusters on Azure
• Apache HBase is an open-source NoSQL database that is built on Hadoop and modeled
after Google Bigtable.
• HBase provides random access and strong consistency for large amounts of unstructured
and semistructured data in a schemaless database organized by column families.
• Data is stored in the rows of a table, and data within a row is grouped by column family.
• The open-source code scales linearly to handle petabytes of data on thousands of nodes.
It can rely on data redundancy, batch processing, and other features that are provided by
distributed applications in the Hadoop ecosystem.
What is HBase
Flat table:
Order No | Customer Name | Customer Phone | Company Name | Company Address
12012015 | Mostafa       | 101-232-2345   | Microsoft    | Redmond, WA

Grouped by column family (Customer, Company):
         | Customer                       | Company
Order No | Customer Name | Customer Phone | Company Name | Company Address
12012015 | Mostafa       | 101-232-2345   | Microsoft    | Redmond, WA
• HBase commands:
• create → Equivalent to CREATE TABLE in T-SQL
• get → Equivalent to a SELECT statement in T-SQL
• put → Equivalent to UPDATE and INSERT statements in T-SQL
• scan → Equivalent to SELECT (with no WHERE condition) in T-SQL
• The HBase shell is the query tool for running CRUD commands against an HBase cluster.
• Data can also be managed using the HBase C# API, which provides a client library on top
of the HBase REST API.
• An HBase database can also be queried from Hive by using HiveQL.
What is HBase
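The data model and the four shell commands above can be sketched with a small in-memory stand-in. This is an illustration only, not the HBase client API: the class ToyHBaseTable and its methods are hypothetical names, and it ignores cell versioning and distribution. It reuses the sample order row and the Customer and Company column families from the table above.

```python
from collections import defaultdict


class ToyHBaseTable:
    """In-memory stand-in for an HBase table (hypothetical, not the real client)."""

    def __init__(self, name, column_families):
        # "create": a table is defined by its name and its column families.
        self.name = name
        self.column_families = set(column_families)
        # row key -> {"family:qualifier": value}
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # "put" inserts a new cell or overwrites an existing one.
        family, _, _qualifier = column.partition(":")
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][column] = value

    def get(self, row_key):
        # "get" returns the cells of a single row, looked up by row key.
        return dict(self.rows.get(row_key, {}))

    def scan(self):
        # "scan" walks every row in row-key order, with no filter.
        for row_key in sorted(self.rows):
            yield row_key, dict(self.rows[row_key])


orders = ToyHBaseTable("Orders", ["Customer", "Company"])
orders.put("12012015", "Customer:Name", "Mostafa")
orders.put("12012015", "Customer:Phone", "101-232-2345")
orders.put("12012015", "Company:Name", "Microsoft")
orders.put("12012015", "Company:Address", "Redmond, WA")

row = orders.get("12012015")
all_rows = list(orders.scan())
```

Columns are addressed as family:qualifier, so only the column families are fixed when the table is created; qualifiers can vary from row to row.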
• Apache Hive is a data warehouse system for Hadoop, which enables data summarization,
querying, and analysis of data by using HiveQL (a query language similar to SQL).
• Hive understands how to work with structured and semi-structured data, such as text files
where the fields are delimited by specific characters.
• Hive also supports custom serializer/deserializers (SerDe) for complex or irregularly
structured data.
• Hive can also be extended through user-defined functions (UDF).
• A UDF allows you to implement functionality or logic that isn't easily modeled in HiveQL.
What is Hive
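One common way to plug custom logic into Hive is a streaming script called through HiveQL's TRANSFORM clause, which passes rows to the script as tab-delimited lines and reads tab-delimited lines back. The sketch below assumes a hypothetical two-column input (customer name, customer phone); the script name and the column names are illustrative and do not come from the slides.

```python
import io


def transform_line(line):
    """Turn one tab-delimited input row into one tab-delimited output row."""
    name, phone = line.rstrip("\n").split("\t")
    # Logic that is awkward to express in plain HiveQL: keep only the digits.
    digits = "".join(ch for ch in phone if ch.isdigit())
    return f"{name.upper()}\t{digits}"


def main(stream_in, stream_out):
    """Read rows from stream_in and write transformed rows to stream_out."""
    for line in stream_in:
        if line.strip():
            stream_out.write(transform_line(line) + "\n")


# When used from Hive, the script would call main(sys.stdin, sys.stdout).
# Here the same code is fed a single sample row from memory instead.
sample_rows = io.StringIO("Mostafa\t101-232-2345\n")
result = io.StringIO()
main(sample_rows, result)
output = result.getvalue()
```

On the Hive side, a query of the form SELECT TRANSFORM (customer_name, customer_phone) USING 'python normalize.py' AS (name, phone_digits) FROM orders would call such a script once the file has been added to the session; the table, column, and file names here are assumptions.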
• Apache Storm is a distributed, fault-tolerant, open-source computation system that allows
you to process data in real-time with Hadoop.
• Apache Storm on HDInsight allows you to create distributed, real-time analytics solutions
in the Azure environment by using Apache Hadoop.
• Storm solutions can also provide guaranteed processing of data, with the ability to replay
data that was not successfully processed the first time.
• Ability to write Storm components in C#, Java, and Python.
• Scale the cluster up or down in Azure without impacting running Storm topologies.
• Easy to provision and use in the Azure portal.
• Visual Studio project templates for Storm apps
What is Apache Storm
• Apache Storm apps are submitted as Topologies.
• A topology is a graph of computation that processes streams
• Stream: An unbounded collection of tuples. Streams are produced by spouts and bolts, and
they are consumed by bolts.
• Tuple: A named list of dynamically typed values.
• Spout: Consumes data from a data source and emits one or more streams.
• Bolt: Consumes streams, performs processing on tuples, and may emit streams. Bolts are
also responsible for writing data to external storage, such as a queue, HDInsight, HBase, a
blob, or other data store.
• Nimbus: The master node, similar to the JobTracker in Hadoop; it distributes jobs across the cluster and monitors for failures.
Apache Storm Components
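The relationship between spouts, bolts, tuples, and streams can be sketched with plain Python generators. This is a toy model under stated assumptions, not the Storm API: the function names are hypothetical, the stream is finite so the example terminates, and nothing is distributed or replayed.

```python
from collections import Counter


def sentence_spout():
    """Spout: reads from a data source and emits tuples into a stream."""
    source = ["storm processes streams", "bolts consume streams"]
    for sentence in source:
        yield {"sentence": sentence}  # a tuple: a named list of values


def split_bolt(stream):
    """Bolt: consumes a stream, processes each tuple, and emits a new stream."""
    for tup in stream:
        for word in tup["sentence"].split():
            yield {"word": word}


def count_bolt(stream, store):
    """Terminal bolt: consumes a stream and writes results to external storage."""
    for tup in stream:
        store[tup["word"]] += 1


# The "topology" is the graph spout -> split bolt -> count bolt.
word_counts = Counter()  # stands in for an external data store
count_bolt(split_bolt(sentence_spout()), word_counts)
```

In a real topology each component runs as many parallel tasks across the cluster and the spout never runs out of input; the chaining of generators only mirrors the shape of the graph.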
• Apache Spark™ is a fast and general engine for large-scale data processing.
• According to the Spark project, programs can run up to 100x faster than Hadoop MapReduce
in memory, or 10x faster on disk.
• Write applications quickly in Java, Scala, Python, R.
• Combine SQL, streaming, and complex analytics.
• Spark's in-memory computation capabilities
make it a good choice for iterative algorithms in
ML and graph computations.
• Spark is also compatible with Azure Blob storage (WASB) so your existing data stored in
Azure can easily be processed via Spark.
• Support for R Server & Azure Data Lake.
What is Apache Spark
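Why in-memory computation suits iterative algorithms can be shown without Spark itself. The plain-Python sketch below is only an analogy: CountingSource is a hypothetical stand-in for a dataset on disk, and the two loops contrast re-reading the input on every iteration with loading it once and reusing it from memory.

```python
class CountingSource:
    """Pretend dataset that counts how many times it is read in full."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(self._values)


def refine(estimate, data):
    """One step of a toy iterative algorithm: move halfway toward the mean."""
    mean = sum(data) / len(data)
    return estimate + 0.5 * (mean - estimate)


def iterate_rereading(source, steps):
    """Reload the input on every iteration."""
    estimate = 0.0
    for _ in range(steps):
        estimate = refine(estimate, source.read())
    return estimate


def iterate_cached(source, steps):
    """Load the input once and keep it in memory across iterations."""
    data = source.read()
    estimate = 0.0
    for _ in range(steps):
        estimate = refine(estimate, data)
    return estimate


disk_style = CountingSource([2.0, 4.0, 6.0])
memory_style = CountingSource([2.0, 4.0, 6.0])
result_a = iterate_rereading(disk_style, steps=10)
result_b = iterate_cached(memory_style, steps=10)
```

Both loops reach the same estimate, but the first reads the source once per iteration while the second reads it a single time, which is the saving that keeping a dataset in memory aims for in iterative workloads.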
Building Big data solutions in Azure

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

Building Big data solutions in Azure

  • 1.
  • 2. Session Objectives And Takeaways
       • Understanding HDInsight cluster types and tiers in Azure
       • HBase as a Hadoop NoSQL database
       • Hive as data warehouse software for managing large datasets using SQL
       • Understanding data processing options in the Hadoop ecosystem using Storm and Spark
  • 3. What is HDInsight
       • HDInsight is a cloud implementation on Microsoft Azure of the rapidly expanding Apache Hadoop technology stack, the go-to solution for big data analysis.
       • It includes implementations of Apache Spark, HBase, Storm, Pig, Hive, Sqoop, Oozie, Ambari, and so on.
       • HDInsight also integrates with business intelligence (BI) tools such as Power BI, Excel, SQL Server Analysis Services, and SQL Server Reporting Services.
       • HDInsight is available on Windows and Linux:
           • HDInsight on Linux: a Hadoop cluster on Ubuntu
           • HDInsight on Windows: a Hadoop cluster on Windows Server 2012 R2
  • 4. HDInsight clusters on Azure
       • HDInsight provides cluster types and custom configurations for:
           • Hadoop (HDFS)
           • HBase
           • Storm
           • Spark
           • R Server (Preview)
       • There is no hardware to purchase or maintain.
       • HDInsight has powerful programming extensions for languages including C#, Java, and .NET. Use your programming language of choice on Hadoop to create, configure, submit, and monitor Hadoop jobs.
  • 6. What is HBase
       • Apache HBase is an open-source NoSQL database that is built on Hadoop and modeled after Google BigTable.
       • HBase provides random access and strong consistency for large amounts of unstructured and semistructured data in a schemaless database organized by column families.
       • Data is stored in the rows of a table, and data within a row is grouped by column family.
       • The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop ecosystem.
  • 7. Order No | Customer Name | Customer Phone | Company Name | Company Address
       12012015 | Mostafa       | 101-232-2345   | Microsoft    | Redmond, WA

       The same row grouped by column family (Customer, Company):
                | Customer                       | Company
       Order No | Customer Name | Customer Phone | Company Name | Company Address
       12012015 | Mostafa       | 101-232-2345   | Microsoft    | Redmond, WA
  • 8. What is HBase
       • HBase commands:
           • create → equivalent to CREATE TABLE in T-SQL
           • get → equivalent to SELECT statements in T-SQL
           • put → equivalent to UPDATE and INSERT statements in T-SQL
           • scan → equivalent to SELECT (with no WHERE condition) in T-SQL
       • The HBase shell is the query tool for running CRUD commands against an HBase cluster.
       • Data can also be managed using the HBase C# API, which provides a client library on top of the HBase REST API.
       • An HBase database can also be queried from Hive using HiveQL.
  • 9. What is Hive
       • Apache Hive is a data warehouse system for Hadoop, which enables data summarization, querying, and analysis of data by using HiveQL (a query language similar to SQL).
       • Hive understands how to work with structured and semi-structured data, such as text files where the fields are delimited by specific characters.
       • Hive also supports custom serializers/deserializers (SerDes) for complex or irregularly structured data.
       • Hive can also be extended through user-defined functions (UDFs).
       • A UDF allows you to implement functionality or logic that isn't easily modeled in HiveQL.
  • 10.
  • 11. What is Apache Storm
       • Apache Storm is a distributed, fault-tolerant, open-source computation system that allows you to process data in real time with Hadoop.
       • Apache Storm on HDInsight allows you to create distributed, real-time analytics solutions in the Azure environment by using Apache Hadoop.
       • Storm solutions can also provide guaranteed processing of data, with the ability to replay data that was not successfully processed the first time.
       • Storm components can be written in C#, Java, and Python.
       • The cluster can be scaled up or down in Azure without impacting running Storm topologies.
       • Clusters are easy to provision and use from the Azure portal.
       • Visual Studio project templates are available for Storm apps.
  • 12. Apache Storm Components
       • Apache Storm apps are submitted as topologies.
       • A topology is a graph of computation that processes streams.
       • Stream: an unbounded collection of tuples. Streams are produced by spouts and bolts, and they are consumed by bolts.
       • Tuple: a named list of dynamically typed values.
       • Spout: consumes data from a data source and emits one or more streams.
       • Bolt: consumes streams, performs processing on tuples, and may emit streams. Bolts are also responsible for writing data to external storage, such as a queue, HDInsight, HBase, a blob, or other data store.
       • Nimbus: provides functionality similar to the Hadoop JobTracker; it distributes jobs and monitors failures.
  • 13.
  • 14. What is Apache Spark
       • Apache Spark™ is a fast and general engine for large-scale data processing.
       • Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
       • Write applications quickly in Java, Scala, Python, and R.
       • Combine SQL, streaming, and complex analytics.
       • Spark's in-memory computation capabilities make it a good choice for iterative algorithms in ML and graph computations.
       • Spark is also compatible with Azure Blob storage (WASB), so your existing data stored in Azure can easily be processed via Spark.
       • Support for R Server and Azure Data Lake.
  • 15.
  • 16. Session Objectives And Takeaways
       • Understanding HDInsight cluster types and tiers in Azure
       • HBase as a Hadoop NoSQL database
       • Hive as data warehouse software for managing large datasets using SQL
       • Understanding data processing options in the Hadoop ecosystem using Storm and Spark
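
The Storm vocabulary on slide 12 (topology, stream, tuple, spout, bolt) can be illustrated with a toy, single-process sketch in plain Python. This is not the Storm API, and it has none of Storm's distribution, fault tolerance, or replay; the function names `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical and exist only to show how the pieces connect.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Toy stand-ins for Storm concepts. None of these names come from the Storm API.


def sentence_spout(sentences: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout: reads from a data source and emits a stream of one-field tuples."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: consumes a stream and emits a new stream with one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Dict[str, int]:
    """Terminal bolt: aggregates tuples. A real bolt might write to external storage."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
    return dict(counts)


def run_topology(sentences: Iterable[str]) -> Dict[str, int]:
    """Topology: the graph spout -> split bolt -> count bolt."""
    return count_bolt(split_bolt(sentence_spout(sentences)))


word_counts = run_topology(["storm processes streams", "spark processes batches"])
print(word_counts)
```

In a real topology each spout and bolt would run as many parallel tasks across the cluster; chaining generators here only mirrors the direction of data flow.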

Editor's Notes

  1. The session covers how to get started building big data solutions in Azure. Azure provides different Hadoop clusters for the Hadoop ecosystem. The session gives a basic understanding of HDInsight clusters, including Apache Hadoop, HBase, Storm, and Spark, and covers how to integrate with HDInsight in .NET using different Hadoop integration frameworks and libraries. It is a jump start for engineers and DBAs with RDBMS experience who want to begin working with and developing Hadoop solutions. The session is demo-driven and covers the basics of Hadoop open-source products.
  2. The session covers how to get started building big data solutions in Azure. Azure provides different Hadoop clusters for the Hadoop ecosystem. The session gives a basic understanding of HDInsight clusters, including Apache Hadoop, HBase, Storm, and Spark, and covers how to integrate with HDInsight in .NET using different Hadoop integration frameworks and libraries. It is a jump start for engineers and DBAs with RDBMS experience who want to begin working with and developing Hadoop solutions. The session is demo-driven and covers the basics of Hadoop open-source products.
  3. Ref: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introduction/
  4. Ref: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-overview/
  5. Ref: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-overview/
  6. https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-tutorial-get-started/ A) Working with the HBase shell: create a table, insert a record, update a record, delete a record, then create a Hive table that maps to the HBase table just created. B) Working with Hive: use the dashboard to create a database and tables.
  7. Apache Storm in HDInsight https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-overview/
  8. Apache Storm in HDInsight https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-overview/ Tips: The Nimbus node provides similar functionality to the Hadoop JobTracker, and it assigns tasks to other nodes in the cluster through Zookeeper.
  9. Demo: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-develop-csharp-visual-studio-topology/ Overview of the HDInsight project templates in Visual Studio 2015: create a Storm application; create a Hive application.
  10. Ref: http://spark.apache.org/ https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-overview/
  11. Demo: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-ipython-notebook-machine-learning/ Apache Spark notebooks: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-jupyter-spark-sql/
  12. HDInsight main documentation: https://azure.microsoft.com/en-us/documentation/services/hdinsight/
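
The HBase shell walkthrough in note 6 (create a table, insert, update, and delete a record) together with the create/put/get/scan commands on slide 8 can be sketched with a small in-memory model in plain Python. This is a toy illustration of the column-family data model, not the HBase client or shell API. The class `ToyHBase`, the table name `Orders`, and the updated phone number are assumptions made up for the example; the other row values come from slide 7.

```python
from typing import Dict, List, Tuple


class ToyHBase:
    """In-memory toy model of HBase tables. NOT the HBase client API."""

    def __init__(self) -> None:
        # table name -> {"families": [...], "rows": {row_key: {"family:qualifier": value}}}
        self.tables: Dict[str, dict] = {}

    def create(self, table: str, *families: str) -> None:
        """Like 'create': a table is defined by its column families, not its columns."""
        self.tables[table] = {"families": list(families), "rows": {}}

    def put(self, table: str, row_key: str, column: str, value: str) -> None:
        """Like 'put': inserts a cell or overwrites it (insert and update are the same call)."""
        family = column.split(":", 1)[0]
        if family not in self.tables[table]["families"]:
            raise KeyError(f"unknown column family: {family}")
        self.tables[table]["rows"].setdefault(row_key, {})[column] = value

    def get(self, table: str, row_key: str) -> Dict[str, str]:
        """Like 'get': returns all cells of one row, looked up by row key."""
        return dict(self.tables[table]["rows"].get(row_key, {}))

    def scan(self, table: str) -> List[Tuple[str, Dict[str, str]]]:
        """Like 'scan': returns every row, ordered by row key."""
        rows = self.tables[table]["rows"]
        return [(key, dict(rows[key])) for key in sorted(rows)]

    def delete(self, table: str, row_key: str, column: str) -> None:
        """Like 'delete': removes a single cell from a row."""
        self.tables[table]["rows"].get(row_key, {}).pop(column, None)


db = ToyHBase()
db.create("Orders", "Customer", "Company")  # hypothetical table name
db.put("Orders", "12012015", "Customer:Name", "Mostafa")
db.put("Orders", "12012015", "Customer:Phone", "101-232-2345")
db.put("Orders", "12012015", "Company:Name", "Microsoft")
db.put("Orders", "12012015", "Company:Address", "Redmond, WA")
db.put("Orders", "12012015", "Customer:Phone", "101-232-0000")  # update; made-up value
db.delete("Orders", "12012015", "Company:Address")
row = db.get("Orders", "12012015")
print(row)
```

Because `put` covers both insert and update, the model has no separate update method, which matches the T-SQL mapping given on slide 8.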