SlideShare a Scribd company logo
1 of 77
Download to read offline
Big Data Hadoop and Spark Developer
Introduction to Big Data and Hadoop
Describe the concepts of Big Data
Explain Hadoop and how it addresses Big Data challenges
Describe the components of Hadoop Ecosystem
Learning Objectives
By the end of this lesson, you will be able to:
Introduction to Big Data
Big Data Overview
Big Data is the data that has high volume, variety, velocity, veracity, and value.
6
5
1
3
4
Manufacturing
Consumer
Energy
Technology
2
Healthcare
Banking
According to US Bureau of Labour Statistics, Big Data alone will fetch 11.5 million jobs by 2026.
Traditional Decision-Making
Experience and Intuition
What We Think Rule of Thumb
Challenges of Traditional Decision-Making
Takes a long time to arrive at a decision, therefore losing the competitive advantage
Requires human intervention at various stages
Lacks systematic linkage among strategy, planning, execution, and reporting
Provides limited scope of data analytics, that is, it provides only a bird's eye view
Obstructs company’s ability to make fully informed decisions
Big Data Analytics
The Solution: Big Data Analytics
Solution
The decision-making is based on what you know which in turn is based on data
analytics.
It provides a comprehensive view of the overall picture which is a result of
analyzing data from various sources.
It provides streamlined decision-making from top to bottom.
Big data analytics helps in analyzing unstructured data.
It helps in faster decision-making thus improving the competitive advantage and
saving time and energy.
Case Study: Google’s Self-Driving Car
Technical Data
Community Data
Personal Data
Big Data Analytics Pipeline
What Is Big Data?
What Is Big Data?
Big data refers to extremely large data sets that may be analyzed computationally to reveal
patterns, trends, and associations, especially relating to human behavior and interactions.
Big Data at a Glance
Different Types of Data
Growth in Data
By 2020, data will show an exponential rise!
Four Vs of Big Data
Four Vs of Big Data
Volume Variety
Velocity
Veracity
• Overall amount of information
produced every day is rising
exponentially
• 2.3 trillion gigabytes of data is
generated every day on the internet
• Social media, CRM systems, e-mails,
audio and video forms produce
varied data
• Analytics tools are used to segregate
groups based on the type of data
generated
• More than 50,000 Google searches
are completed
• More than 125,000 YouTube videos
are viewed
• 7,000 tweets are sent out
• More than 2 million e-mails are sent
Inherent discrepancies in the data
collected results in inaccurate
predictions
Unstructured Data Conundrum
Unstructured Data Semi-structured Data
Web Logs Multimedia
Social Media
Case Study: Royal Bank of Scotland
Case Study: Royal Bank of Scotland
100% of this data could be processed whereas only 3% could be processed earlier
with traditional systems.
Case Study: Royal Bank of Scotland
Sentiment analysis
Improved customer
satisfaction
Reduced processing time
The case study of Royal Bank of Scotland gave the following three things:
Challenges of Traditional System
Challenges of Traditional Systems (RDBMS and DWH)
UNSTRUCTURED DATA
Relational databases can’t
categorize unstructured data.
GROWTH RATE
RDBMS systems are designed
for steady data retention
rather than rapid growth.
DATA SIZE
Data ranges from terabytes
(10^12
bytes) to exabytes
(10^18
bytes).
Advantages of Big Data
1 Processes all types of data at scale
Processes huge data quickly in real-time
3 Can run anywhere and additional hardware can be added
4 Better decision-making, thanks to Hadoop
2
2
Companies Using Big Data
Big Data: Case Study
1 When do users watch a show?
Where do they watch it?
3 On which device do they watch the show?
4 How often do they pause a program?
2
2
5 How often do they re-watch a program?
6 Do they skip the credits?
7 What are the keywords searched?
Big Data: Case Study
Solution
• Traditionally, the analysis of such data was done using a
computer algorithm that was designed to produce a correct
solution for any given instance.
• As the data started to grow, a series of computers were
employed to do the analysis.
• They were also known as distributed systems.
Multiple systems
Features of Big Data Analytics
Scalability in Big Data
● A scalable data platform accommodates rapid changes in the growth of data, either
in traffic or volume.
● It utilizes and adds hardware or software to increase the output and storage of data.
● When a company has a scalable data platform, it is prepared for the potential of
growth in its data needs.
Fault Tolerance in Big Data
● Fault tolerance in Big data or Hadoop HDFS refers to the working strength of a
system in unfavorable conditions and how that system can handle such a situation.
● HDFS also maintains the replication factor by creating a replica of data on other
available machines in the cluster if one machine fails unexpectedly.
Data Inconsistency in Big Data
● Once data is captured in big data, inconsistent or conflicting phenomena can occur
at various granularities.
● It occurs from knowledge content, data, information, knowledge, meta-knowledge, to
expertise, and can adversely affect the quality of the outcomes in Big data analysis
process.
Distributed Systems
Distributed Systems
A distributed system is a model in which components located on networked computers
communicate and coordinate their actions by passing messages.
How Does a Distributed System Work?
Data =1 Terabyte Data =1 Terabyte
In recent times, distributed systems have been replaced by Hadoop.
Challenges of Distributed Systems
System failure
1 2 Limited bandwidth 3
High programming
complexity
Any solution? Hadoop
Since, multiple computers are used in a distributed system, there are high chances of:
Introduction to Hadoop
What Is Hadoop?
Hadoop is a framework that allows distributed processing of large datasets
across clusters of commodity computers using simple programming models.
Doug Cutting discovered Hadoop and named it after his son’s yellow toy
elephant. It is inspired by the technical document published by Google.
Characteristics of Hadoop
Can follow both horizontal and
vertical scaling
Can store huge data and decide to
use it later
Stores copies of the data on
different machines and is
resistant to hardware failure
Can use ordinary computers
for data processing
Economical
Reliable
Scalable
Flexible
The four key characteristics of Hadoop are:
Traditional Database Systems vs. Hadoop
Data sent to the program Program sent to the data
Traditional System Hadoop
VS.
Analogy of Traditional System and Hadoop
Standards and Structured
Traditional Database Systems vs. Hadoop
Hadoop Core Components
YARN Resource
Management
Data Processing
Hadoop Core
Storage
Components of Hadoop Ecosystem
Components of Hadoop Ecosystem
Data Processing
YARN
Cluster Resource
Management
Data Visualization
Data Analysis Data Exploration
Data
Ingestion
Sqoop
Flume
Data
Storage
Components of Hadoop Ecosystem
HDFS (HADOOP DISTRIBUTED FILE SYSTEM)
A storage layer for
Hadoop
Suitable for the
distributed storage
and processing
Hadoop provides a
command line interface
to interact with HDFS
Streaming access to
file system data
Provides file
permissions and
authentication
Components of Hadoop Ecosystem
HBase
Mainly used when you need random,
real-time, read/write access to your Big Data
Provides support to high volume of data and
high throughput
A NoSQL database or
non-relational database
Stores data in HDFS
Table can have thousands of columns
Components of Hadoop Ecosystem
SQOOP
• Sqoop is a tool designed to transfer data between Hadoop and
relational database servers.
• It is used to import data from relational databases such as Oracle
and MySQL to HDFS and export data from HDFS to relational
databases.
Components of Hadoop Ecosystem
FLUME
Ideally suited for event
data from multiple
systems
A distributed service
for ingesting
streaming data
If you want to ingest event data such as, streaming data, sensor data, or log files, then
you can use Flume.
Components of Hadoop Ecosystem
SPARK
An open source cluster computing framework
Provides 100 times faster
performance than Map-Reduce
Supports machine learning, business intelligence,
streaming, and batch processing
Components of Hadoop Ecosystem
SPARK: COMPONENTS
Apache Spark
Spark Core and
Resilient
Distributed
Datasets
(RDDs)
Spark SQL
Spark
Streaming
Machine
Learning
Library
(MLlib)
GraphX
Components of Hadoop Ecosystem
HADOOP MAP-REDUCE
The original Hadoop
processing engine which is
primarily Java based
An extensive and
mature fault tolerance
framework
Based on the map and
reduce programming
model
Commonly used
Components of Hadoop Ecosystem
PIG
Best for ad-hoc queries
such as join and filter
An open source
dataflow system
An alternative to writing
Map-Reduce code
Converts pig script to
Map-Reduce code
Components of Hadoop Ecosystem
IMPALA
Very low latency – measured in milliseconds
High performance SQL engine which runs on Hadoop cluster
Ideal for interactive analysis
Supports a dialect of SQL (Impala SQL)
Components of Hadoop Ecosystem
HIVE
Executes queries using
Map-Reduce
Similar to Impala
Best for data processing and ETL
Components of Hadoop Ecosystem
CLOUDERA SEARCH
One of Cloudera's near-real-time access products
Enables nontechnical users to
search and explore data stored
in or ingested into Hadoop and
HBase
Eliminates the need to
move large datasets across
infrastructures to address
business tasks
A fully integrated data processing platform
Components of Hadoop Ecosystem
OOZIE
Start
End
Action A
B
C
Oozie Coordinator
Engine
Oozie Workflow
Engine
Action1
Action2
Action3
Oozie is a workflow or coordination system used to manage the Hadoop jobs
Components of Hadoop Ecosystem
HUE (HADOOP USER EXPERIENCE)
Hue is an acronym for Hadoop User Experience
Hue is an open source
Web interface for
analyzing data with
Hadoop
It provides SQL editors for Hive,
Impala, MySQL, Oracle,
PostgreSQL, Spark SQL, and Solr
SQL
Commercial Hadoop Distributions
Various Commercial Hadoop Distributions
HDInsight
Big Data Processing
Components of Hadoop ecosystem work together to process big data.
There are four stages of big data processing:
Walk-Through of the Simplilearn Cloud Lab Duration: 10 mins
Problem Statement: In this demonstration, we will walk you through the Simplilearn cloud lab.
Access: Click on the Practice Labs tab on the left side panel of the LMS. Copy or note the
username and password that is generated. Click on the Launch Lab button. On the page that
appears, enter the username and password in the respective fields, and click Login.
Describe the concepts of Big Data
Explain Hadoop and how it addresses Big Data challenges
Describe the components of Hadoop Ecosystem
Key Takeaways
You are now able to:
Knowledge Check
Knowledge
Check
a.
b.
c.
d.
Which of the following is a source of unstructured data?
1
Data from social media websites
Transactional data in Amazon’s database
Web and server logs
All of the above
The correct answer is
a.
b.
c.
d.
Knowledge
Check
Which of the following is a source of unstructured data?
Unstructured data comprises of data that is usually not easily searchable, including formats like audio, video, and social media
postings.
a.
1
Data from social media websites
Transactional data in Amazon’s database
Web and server logs
All of the above
Knowledge
Check
a.
b.
c.
d.
A bank wants to process 1000 transactions per second.
Which one of the following Vs reflects this real-world use case?
2
Volume
Variety
Velocity
Veracity
The correct answer is
a.
b.
c.
d.
Knowledge
Check A bank wants to process 1000 transactions per second.
Which one of the following Vs reflects this real-world use case?
Velocity is the frequency of incoming data that needs to be processed. Given use case is an example of an application that
handles the velocity of data.
c.
2
Volume
Variety
Velocity
Veracity
Knowledge
Check
a.
b.
c.
d.
Why has popularity of big data increased tremendously in the recent years?
3
Due to increased volume of data
Big data is an open source
Abundance of unstructured data
None of the above
The correct answer is
a.
b.
c.
d.
Knowledge
Check
Why has popularity of big data increased tremendously in the recent years?
Unstructured data is growing at astronomical rates, contributing to the big data deluge that's sweeping across enterprise data
storage environments.
a.
3
Due to increased volume of data
Big data is an open source
Abundance of unstructured data
None of the above
Knowledge
Check
a.
b.
c.
d.
What is Hadoop?
4
It is an in-memory tool used in Mahout algorithm computing.
It is a computing framework used for resource management.
It is a framework that allows distributed processing of large datasets across clusters of
commodity computers using a simple programming model.
It is a search and analytics tool that provides access to analyze data.
The correct answer is
a.
b.
c.
d.
Knowledge
Check
What is Hadoop?
Hadoop is a framework that allows distributed processing of large datasets across clusters of commodity computers using a
simple programming model.
c.
4
It is an in-memory tool used in Mahout algorithm computing.
It is a computing framework used for resource management.
It is a search and analytics tool that provides access to analyze data.
It is a framework that allows distributed processing of large datasets across clusters of
commodity computers using a simple programming model.
Knowledge
Check
a.
b.
c.
d.
Which of the following is a column-oriented NoSQL database that runs on
top of HDFS?
5
MongoDB
Flume
Ambari
HBase
The correct answer is
a.
b.
c.
d.
Knowledge
Check Which of the following is a column-oriented NoSQL database that runs on
top of HDFS?
Apache HBase is a NoSQL database that runs on top of Hadoop as a distributed and scalable big data store.
d.
5
MongoDB
Flume
Ambari
HBase
Knowledge
Check
a.
b.
c.
d.
Scoop is used to _______.
6
Execute queries using Map-Reduce
Stream event data from multiple systems
Enable nontechnical users to search and explore data stored in or ingested into
Hadoop and HBase
Import data from relational databases to Hadoop HDFS and export from Hadoop file
system to relational databases
The correct answer is
a.
b.
c.
d.
Knowledge
Check
Scoop is used to _______.
Scoop is used to import data from relational databases to Hadoop HDFS and export from Hadoop file system to relational
databases.
a.
6
Import data from relational databases to Hadoop HDFS and export from Hadoop file
system to relational databases
Execute queries using Map-Reduce
Enable nontechnical users to search and explore data stored in or ingested into
Hadoop and HBase
Stream event data from multiple systems
Lesson 1 introduction to_big_data_and_hadoop.pptx

More Related Content

What's hot

Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache HadoopSuman Saurabh
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataJoey Li
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data AnalyticsTUSHAR GARG
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An OverviewArvind Kalyan
 
Big Data in Distributed Analytics,Cybersecurity And Digital Forensics
Big Data in Distributed Analytics,Cybersecurity And Digital ForensicsBig Data in Distributed Analytics,Cybersecurity And Digital Forensics
Big Data in Distributed Analytics,Cybersecurity And Digital ForensicsSherinMariamReji05
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)SiamAhmed16
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...BigMine
 
10 Most Effective Big Data Technologies
10 Most Effective Big Data Technologies10 Most Effective Big Data Technologies
10 Most Effective Big Data TechnologiesMahindra Comviva
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data WarehousingThomas Kejser
 
Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035Neelam Rawat
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentalsrjain51
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop IntroductionJayant Mukherjee
 
Big Data
Big DataBig Data
Big DataNGDATA
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabatinabati
 

What's hot (20)

Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big data storage
Big data storageBig data storage
Big data storage
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
Big Data in Distributed Analytics,Cybersecurity And Digital Forensics
Big Data in Distributed Analytics,Cybersecurity And Digital ForensicsBig Data in Distributed Analytics,Cybersecurity And Digital Forensics
Big Data in Distributed Analytics,Cybersecurity And Digital Forensics
 
Motivation for big data
Motivation for big dataMotivation for big data
Motivation for big data
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research report
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
 
10 Most Effective Big Data Technologies
10 Most Effective Big Data Technologies10 Most Effective Big Data Technologies
10 Most Effective Big Data Technologies
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data Warehousing
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
 
Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Big Data
Big DataBig Data
Big Data
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 

Similar to Lesson 1 introduction to_big_data_and_hadoop.pptx

Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation Shivanee garg
 
Sycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptxSycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptxshujee381
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big DataMrinal Kumar
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfkalai75
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewIRJET Journal
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeeling Cheung
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Datajdijcks
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupScott Mitchell
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training reportSarvesh Meena
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big DataNetApp
 

Similar to Lesson 1 introduction to_big_data_and_hadoop.pptx (20)

Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
Big Data
Big DataBig Data
Big Data
 
Big data
Big dataBig data
Big data
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Big Data
Big DataBig Data
Big Data
 
Sycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptxSycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptx
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big Data
 
Big data Question bank.pdf
Big data Question bank.pdfBig data Question bank.pdf
Big data Question bank.pdf
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Data
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Hadoop(Term Paper)
Hadoop(Term Paper)Hadoop(Term Paper)
Hadoop(Term Paper)
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big Data
 

Recently uploaded

Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...Delhi Call girls
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?RemarkSemacio
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...HyderabadDolls
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridihmeghakumariji156
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 

Recently uploaded (20)

Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 

Lesson 1 introduction to_big_data_and_hadoop.pptx

  • 1. Big Data Hadoop and Spark Developer
  • 2. Introduction to Big Data and Hadoop
  • 3. Describe the concepts of Big Data Explain Hadoop and how it addresses Big Data challenges Describe the components of Hadoop Ecosystem Learning Objectives By the end of this lesson, you will be able to:
  • 5. Big Data Overview Big Data is the data that has high volume, variety, velocity, veracity, and value. 6 5 1 3 4 Manufacturing Consumer Energy Technology 2 Healthcare Banking According to US Bureau of Labour Statistics, Big Data alone will fetch 11.5 million jobs by 2026.
  • 6. Traditional Decision-Making Experience and Intuition What We Think Rule of Thumb
  • 7. Challenges of Traditional Decision-Making Takes a long time to arrive at a decision, therefore losing the competitive advantage Requires human intervention at various stages Lacks systematic linkage among strategy, planning, execution, and reporting Provides limited scope of data analytics, that is, it provides only a bird's eye view Obstructs company’s ability to make fully informed decisions
  • 9. The Solution: Big Data Analytics Solution The decision-making is based on what you know which in turn is based on data analytics. It provides a comprehensive view of the overall picture which is a result of analyzing data from various sources. It provides streamlined decision-making from top to bottom. Big data analytics helps in analyzing unstructured data. It helps in faster decision-making thus improving the competitive advantage and saving time and energy.
  • 10. Case Study: Google’s Self-Driving Car Technical Data Community Data Personal Data
  • 11. Big Data Analytics Pipeline
  • 12. What Is Big Data?
  • 13. What Is Big Data? Big data refers to extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.
  • 14. Big Data at a Glance
  • 16. Growth in Data By 2020, data will show an exponential rise!
  • 17. Four Vs of Big Data
  • 18. Four Vs of Big Data Volume Variety Velocity Veracity • Overall amount of information produced every day is rising exponentially • 2.3 trillion gigabytes of data is generated every day on the internet • Social media, CRM systems, e-mails, audio and video forms produce varied data • Analytics tools are used to segregate groups based on the type of data generated • More than 50,000 Google searches are completed • More than 125,000 YouTube videos are viewed • 7,000 tweets are sent out • More than 2 million e-mails are sent Inherent discrepancies in the data collected results in inaccurate predictions
  • 19. Unstructured Data Conundrum Unstructured Data Semi-structured Data Web Logs Multimedia Social Media
  • 20. Case Study: Royal Bank of Scotland
  • 21. Case Study: Royal Bank of Scotland 100% of this data could be processed whereas only 3% could be processed earlier with traditional systems.
  • 22. Case Study: Royal Bank of Scotland Sentiment analysis Improved customer satisfaction Reduced processing time The case study of Royal Bank of Scotland gave the following three things:
  • 24. Challenges of Traditional Systems (RDBMS and DWH) UNSTRUCTURED DATA Relational databases can’t categorize unstructured data. GROWTH RATE RDBMS systems are designed for steady data retention rather than rapid growth. DATA SIZE Data ranges from terabytes (10^12 bytes) to exabytes (10^18 bytes).
  • 25. Advantages of Big Data 1 Processes all types of data at scale Processes huge data quickly in real-time 3 Can run anywhere and additional hardware can be added 4 Better decision-making, thanks to Hadoop 2 2
  • 27. Big Data: Case Study 1 When do users watch a show? Where do they watch it? 3 On which device do they watch the show? 4 How often do they pause a program? 2 2 5 How often do they re-watch a program? 6 Do they skip the credits? 7 What are the keywords searched?
  • 28. Big Data: Case Study Solution • Traditionally, the analysis of such data was done using a computer algorithm that was designed to produce a correct solution for any given instance. • As the data started to grow, a series of computers were employed to do the analysis. • They were also known as distributed systems. Multiple systems
  • 29. Features of Big Data Analytics
  • 30. Scalability in Big Data ● A scalable data platform accommodates rapid changes in the growth of data, either in traffic or volume. ● It utilizes and adds hardware or software to increase the output and storage of data. ● When a company has a scalable data platform, it is prepared for the potential of growth in its data needs.
  • 31. Fault Tolerance in Big Data ● Fault tolerance in Big data or Hadoop HDFS refers to the working strength of a system in unfavorable conditions and how that system can handle such a situation. ● HDFS also maintains the replication factor by creating a replica of data on other available machines in the cluster if one machine fails unexpectedly.
  • 32. Data Inconsistency in Big Data ● Once data is captured in big data, inconsistent or conflicting phenomena can occur at various granularities. ● It occurs from knowledge content, data, information, knowledge, meta-knowledge, to expertise, and can adversely affect the quality of the outcomes in Big data analysis process.
  • 34. Distributed Systems A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages.
  • 35. How Does a Distributed System Work? Data =1 Terabyte Data =1 Terabyte In recent times, distributed systems have been replaced by Hadoop.
  • 36. Challenges of Distributed Systems System failure 1 2 Limited bandwidth 3 High programming complexity Any solution? Hadoop Since, multiple computers are used in a distributed system, there are high chances of:
  • 38. What Is Hadoop? Hadoop is a framework that allows distributed processing of large datasets across clusters of commodity computers using simple programming models. Doug Cutting discovered Hadoop and named it after his son’s yellow toy elephant. It is inspired by the technical document published by Google.
  • 39. Characteristics of Hadoop Can follow both horizontal and vertical scaling Can store huge data and decide to use it later Stores copies of the data on different machines and is resistant to hardware failure Can use ordinary computers for data processing Economical Reliable Scalable Flexible The four key characteristics of Hadoop are:
  • 40. Traditional Database Systems vs. Hadoop Data sent to the program Program sent to the data Traditional System Hadoop VS.
  • 41. Analogy of Traditional System and Hadoop
  • 42. Standards and Structured Traditional Database Systems vs. Hadoop
  • 43. Hadoop Core Components YARN Resource Management Data Processing Hadoop Core Storage
  • 44. Components of Hadoop Ecosystem
  • 45. Components of Hadoop Ecosystem Data Processing YARN Cluster Resource Management Data Visualization Data Analysis Data Exploration Data Ingestion Sqoop Flume Data Storage
  • 46. Components of Hadoop Ecosystem HDFS (HADOOP DISTRIBUTED FILE SYSTEM) A storage layer for Hadoop Suitable for the distributed storage and processing Hadoop provides a command line interface to interact with HDFS Streaming access to file system data Provides file permissions and authentication
  • 47. Components of Hadoop Ecosystem HBase Mainly used when you need random, real-time, read/write access to your Big Data Provides support to high volume of data and high throughput A NoSQL database or non-relational database Stores data in HDFS Table can have thousands of columns
  • 48. Components of Hadoop Ecosystem SQOOP • Sqoop is a tool designed to transfer data between Hadoop and relational database servers. • It is used to import data from relational databases such as Oracle and MySQL to HDFS and export data from HDFS to relational databases.
  • 49. Components of Hadoop Ecosystem FLUME Ideally suited for event data from multiple systems A distributed service for ingesting streaming data If you want to ingest event data such as, streaming data, sensor data, or log files, then you can use Flume.
  • 50. Components of Hadoop Ecosystem SPARK An open source cluster computing framework Provides 100 times faster performance than Map-Reduce Supports machine learning, business intelligence, streaming, and batch processing
  • 51. Components of Hadoop Ecosystem SPARK: COMPONENTS Apache Spark Spark Core and Resilient Distributed Datasets (RDDs) Spark SQL Spark Streaming Machine Learning Library (MLlib) GraphX
  • 52. Components of Hadoop Ecosystem HADOOP MAP-REDUCE The original Hadoop processing engine which is primarily Java based An extensive and mature fault tolerance framework Based on the map and reduce programming model Commonly used
  • 53. Components of Hadoop Ecosystem PIG Best for ad-hoc queries such as join and filter An open source dataflow system An alternative to writing Map-Reduce code Converts pig script to Map-Reduce code
  • 54. Components of Hadoop Ecosystem IMPALA Very low latency – measured in milliseconds High performance SQL engine which runs on Hadoop cluster Ideal for interactive analysis Supports a dialect of SQL (Impala SQL)
  • 55. Components of Hadoop Ecosystem HIVE Executes queries using Map-Reduce Similar to Impala Best for data processing and ETL
  • 56. Components of Hadoop Ecosystem CLOUDERA SEARCH One of Cloudera's near-real-time access products Enables nontechnical users to search and explore data stored in or ingested into Hadoop and HBase Eliminates the need to move large datasets across infrastructures to address business tasks A fully integrated data processing platform
  • 57. Components of Hadoop Ecosystem OOZIE Start End Action A B C Oozie Coordinator Engine Oozie Workflow Engine Action1 Action2 Action3 Oozie is a workflow or coordination system used to manage the Hadoop jobs
  • 58. Components of Hadoop Ecosystem HUE (HADOOP USER EXPERIENCE) Hue is an acronym for Hadoop User Experience Hue is an open source Web interface for analyzing data with Hadoop It provides SQL editors for Hive, Impala, MySQL, Oracle, PostgreSQL, Spark SQL, and Solr SQL
  • 60. Various Commercial Hadoop Distributions HDInsight
  • 61. Big Data Processing Components of Hadoop ecosystem work together to process big data. There are four stages of big data processing:
  • 62. Walk-Through of the Simplilearn Cloud Lab Duration: 10 mins Problem Statement: In this demonstration, we will walk you through the Simplilearn cloud lab. Access: Click on the Practice Labs tab on the left side panel of the LMS. Copy or note the username and password that is generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the respective fields, and click Login.
  • 63. Describe the concepts of Big Data Explain Hadoop and how it addresses Big Data challenges Describe the components of Hadoop Ecosystem Key Takeaways You are now able to:
  • 65. Knowledge Check a. b. c. d. Which of the following is a source of unstructured data? 1 Data from social media websites Transactional data in Amazon’s database Web and server logs All of the above
  • 66. The correct answer is a. b. c. d. Knowledge Check Which of the following is a source of unstructured data? Unstructured data comprises of data that is usually not easily searchable, including formats like audio, video, and social media postings. a. 1 Data from social media websites Transactional data in Amazon’s database Web and server logs All of the above
  • 67. Knowledge Check a. b. c. d. A bank wants to process 1000 transactions per second. Which one of the following Vs reflects this real-world use case? 2 Volume Variety Velocity Veracity
  • 68. The correct answer is a. b. c. d. Knowledge Check A bank wants to process 1000 transactions per second. Which one of the following Vs reflects this real-world use case? Velocity is the frequency of incoming data that needs to be processed. Given use case is an example of an application that handles the velocity of data. c. 2 Volume Variety Velocity Veracity
  • 69. Knowledge Check a. b. c. d. Why has popularity of big data increased tremendously in the recent years? 3 Due to increased volume of data Big data is an open source Abundance of unstructured data None of the above
  • 70. The correct answer is a. b. c. d. Knowledge Check Why has popularity of big data increased tremendously in the recent years? Unstructured data is growing at astronomical rates, contributing to the big data deluge that's sweeping across enterprise data storage environments. a. 3 Due to increased volume of data Big data is an open source Abundance of unstructured data None of the above
  • 71. Knowledge Check a. b. c. d. What is Hadoop? 4 It is an in-memory tool used in Mahout algorithm computing. It is a computing framework used for resource management. It is a framework that allows distributed processing of large datasets across clusters of commodity computers using a simple programming model. It is a search and analytics tool that provides access to analyze data.
  • 72. The correct answer is a. b. c. d. Knowledge Check What is Hadoop? Hadoop is a framework that allows distributed processing of large datasets across clusters of commodity computers using a simple programming model. c. 4 It is an in-memory tool used in Mahout algorithm computing. It is a computing framework used for resource management. It is a search and analytics tool that provides access to analyze data. It is a framework that allows distributed processing of large datasets across clusters of commodity computers using a simple programming model.
  • 73. Knowledge Check a. b. c. d. Which of the following is a column-oriented NoSQL database that runs on top of HDFS? 5 MongoDB Flume Ambari HBase
  • 74. The correct answer is a. b. c. d. Knowledge Check Which of the following is a column-oriented NoSQL database that runs on top of HDFS? Apache HBase is a NoSQL database that runs on top of Hadoop as a distributed and scalable big data store. d. 5 MongoDB Flume Ambari HBase
  • 75. Knowledge Check a. b. c. d. Scoop is used to _______. 6 Execute queries using Map-Reduce Stream event data from multiple systems Enable nontechnical users to search and explore data stored in or ingested into Hadoop and HBase Import data from relational databases to Hadoop HDFS and export from Hadoop file system to relational databases
  • 76. The correct answer is a. b. c. d. Knowledge Check Scoop is used to _______. Scoop is used to import data from relational databases to Hadoop HDFS and export from Hadoop file system to relational databases. a. 6 Import data from relational databases to Hadoop HDFS and export from Hadoop file system to relational databases Execute queries using Map-Reduce Enable nontechnical users to search and explore data stored in or ingested into Hadoop and HBase Stream event data from multiple systems