Big data and Hadoop
Learn how Hadoop deals with the problems associated with Big Data analysis
Srikanth M V
There are 30 billion pieces of content
shared on Facebook every day.
Wal-Mart handles more than 1
million customer transactions an
hour.
More than 5 billion people are
calling, texting, tweeting and
browsing websites using
smartphones.
The 3-Vs of Big Data
Volume
Gigabytes, terabytes,
petabytes or zettabytes…
Velocity
The rate at which data
flows into an
organization
Variety
Structured and
Unstructured
So What is Big Data?
• Big data refers to large and complex data sets collected
from various sources such as sensors, social media,
satellite images, audio, video, RFID, etc.
• Big data is data that exceeds the processing
capacity of conventional database systems.
• How ‘big’ is big?
– GB, TB, PB, ZB? No.
– Data is big when it exceeds the organization’s capacity
to handle, store and analyze it.
Problem:
Storing and Analyzing
“Big Data”
Solution*:
Move compute to data
*One among many, but Hadoop is flexible, simple and reliable.
Hey there, I’m
Hadoop and I
can do that
for you..
• Created: 2005
• Creators: Doug Cutting and Mike Cafarella
• Contributors: Apache, Yahoo, Google
• Language: Java
How Hadoop deals with “Big data”
• Primary Components
– HDFS (Hadoop Distributed File System)
– MapReduce
– Hadoop YARN: job scheduling and resource management
– Hadoop Common: access to the file system
HDFS
• Distributed, scalable, reliable and portable file system.
• A Hadoop cluster is a set of Data Nodes and a Name Node.
• The client divides the data to be processed into blocks.
• Each block of data is replicated on 3 nodes*.
• More nodes, more efficiency.
• Robust: relies on software instead of hardware.
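The block splitting and replication described above can be sketched in a few lines of Python. This is only an illustration, not how HDFS is implemented: the 128 MB block size, the 300 MB file, the node names and the round-robin placement are all assumptions made for the example (real HDFS chooses replica locations with its own placement policy).

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size for this sketch


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file is cut into."""
    count = math.ceil(file_size / block_size)
    return [min(block_size, file_size - i * block_size) for i in range(count)]


def place_replicas(block_ids, nodes, replication=3):
    """Assign every block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i, block in enumerate(block_ids)
    }


def readable_after_failure(placement, failed_node):
    """The file stays readable if each block keeps at least one live replica."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


# A hypothetical 300 MB file stored on a hypothetical four-node cluster.
sizes = split_into_blocks(300 * 1024 * 1024)
blocks = [f"B{i + 1}" for i in range(len(sizes))]
nodes = ["datanode1", "datanode2", "datanode3", "datanode4"]
placement = place_replicas(blocks, nodes)

for block, replicas in placement.items():
    print(block, "->", ", ".join(replicas))
```

Because every block lives on three different nodes, losing any single Data Node in this sketch still leaves two copies of each block, which is the sense in which the file system relies on software rather than hardware for robustness.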
[Figure: HDFS Cluster. A Name Node (master) and Data Nodes (slaves) running on servers. Somefile.txt is split into blocks B1, B2 and B3, and replicas of the blocks are spread across the Data Nodes.]
MapReduce
• Divide and conquer
• Parallel computing
• Map(): performs filtering and sorting
• Reduce(): performs a summary operation
• Each node has a Task Tracker, which communicates with the Job Tracker.
• The output files are written to the job's output directory, from where
the client can retrieve them.
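The Map() and Reduce() roles can be illustrated with the classic word count, written here as a single-process Python sketch rather than as a real Hadoop job. The function names and the three input lines are made up for the example; on a cluster the framework would run the map calls in parallel on different nodes and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map(): emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key and sort the keys, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce(): summarize all values of one key, here by summing them."""
    return key, sum(values)


# Each string stands in for an input split handled by a separate map task.
splits = [
    "big data needs hadoop",
    "hadoop stores big data",
    "hadoop processes data",
]

mapped = [pair for line in splits for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

for word, total in counts.items():
    print(word, total)
```

The divide-and-conquer idea shows in the structure: each map call looks at only one split, and each reduce call looks at only one key, so both phases can be spread over many nodes.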
Hadoop Architecture
Hadoop Secondary Components
• Ambari
– A web tool for provisioning, managing and monitoring clusters
• HBase
– A scalable distributed database that supports structured data storage for large tables
• ZooKeeper
– A high-performance coordination service for distributed applications
• Pig
– A high-level data flow language and execution framework for parallel computation
• Hive
– A data warehouse infrastructure that provides data summarization and ad hoc querying
• Cassandra
– A scalable multi-master database with no single point of failure
• Chukwa
– A data collection system for managing large distributed systems
• Lucene and Solr
– Search engines, currently not part of Hadoop
Real World Example of Big data
Analytics using Hadoop
[Figure: numbered data flow (steps 1 to 7) between users, a MySQL database and Hadoop]
1. Users interact with Facebook using data in text, image and video formats.
2. Facebook transfers the core data to a MySQL database.
3. The MySQL data is replicated to Hadoop clusters.
4. The data is processed using Hadoop MapReduce functions.
5. The results are transferred back to MySQL.
6. Facebook uses the data to create recommendations for you based on
your interests.
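The round trip in the steps above can be mimicked end to end in a small Python sketch. Everything here is a stand-in chosen for illustration: an in-memory SQLite database plays the role of MySQL, a plain Python loop plays the role of the MapReduce job, and the table names, column names, users and topics are hypothetical. It says nothing about how Facebook actually implements this.

```python
import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")  # stand-in for the MySQL database
db.execute("CREATE TABLE interactions (user TEXT, topic TEXT)")
db.execute("CREATE TABLE recommendations (user TEXT PRIMARY KEY, topic TEXT)")

# Steps 1-2: user activity is stored in the database (hypothetical data).
db.executemany(
    "INSERT INTO interactions VALUES (?, ?)",
    [
        ("u1", "cricket"), ("u1", "cricket"), ("u1", "movies"),
        ("u2", "travel"), ("u2", "movies"), ("u2", "travel"),
    ],
)

# Step 3: copy the rows out to the processing side.
rows = db.execute("SELECT user, topic FROM interactions").fetchall()

# Step 4: map each row to ((user, topic), 1) and reduce by summing per key.
totals = defaultdict(int)
for user, topic in rows:
    totals[(user, topic)] += 1

# Keep the most frequent topic for each user.
best = {}
for (user, topic), count in totals.items():
    if user not in best or count > best[user][1]:
        best[user] = (topic, count)

# Step 5: write the results back to the database.
db.executemany(
    "INSERT INTO recommendations VALUES (?, ?)",
    [(user, topic) for user, (topic, _) in best.items()],
)


# Step 6: the application reads the precomputed recommendation.
def recommend(user):
    row = db.execute(
        "SELECT topic FROM recommendations WHERE user = ?", (user,)
    ).fetchone()
    return row[0] if row else None


print(recommend("u1"), recommend("u2"))
```

The point of the design is that the heavy aggregation happens away from the transactional database, which only has to serve a simple lookup when a recommendation is needed.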
Other users:
Why should an Enterprise move to Big
Data Analytics?
• Enterprises will be able to harness relevant data and
use it to make the best decisions:
– Increase the redemption rate
– Determine optimum prices
– Calculate risks in a minute, and understand future
possibilities to mitigate risk
– Enable new products
– Identify patterns that reveal trends in the business
The key lies in collecting quality data, not quantity.
What is in it for us?
Hadoop on Cloud
• Provision scalable storage for storing Big Data as blobs – PaaS
• Provision Linux VMs on the cloud – IaaS
• Language support for JS and C#
• Business intelligence – connect MS Excel to Hadoop Hive
• Remote access to Hadoop jobs via REST APIs, such as the WebHCat REST API
• Easy-to-access management portal for monitoring Hadoop jobs
• .NET SDK to execute Hive jobs on HDInsight
• … and more
Thank you
• Questions?
Vishwanath.srikanth@gmail.com
http://Vishwanathsrikanth.wordpress.com
