Big Data and Hadoop Overview
What is Big Data?
Big data is a term that describes the large volume of data – both structured and unstructured – that
inundates a business on a day-to-day basis. In short, big data is so large and complex that none of the
traditional data management tools are able to store it or process it efficiently.
Who Generates Big Data?
More and more data are being produced by the growing number of electronic devices that surround us
and by activity on the internet. The amount of data and the frequency at which they are produced are so
vast that they are referred to as “Big Data”.
Why Is Big Data Important?
The importance of big data doesn’t revolve around how much data you have, but what you do with it.
You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time
reductions, 3) new product development and optimized offerings, and 4) smart decision making. When
you combine big data with high-powered analytics, you can accomplish business-related tasks such as:
• Determining root causes of failures, issues and defects in near-real time.
• Generating coupons at the point of sale based on the customer’s buying habits.
• Recalculating entire risk portfolios in minutes.
• Detecting fraudulent behaviour before it affects your organization.
Brief History of Big Data
While the term “big data” is relatively new, the act of gathering and storing large amounts of
information for eventual analysis is ages old. The concept gained momentum in the early 2000s when
industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs:
Volume: Organizations collect data from a variety of sources, including business transactions, social
media and sensor or machine-to-machine data. In the past, storing it would’ve been a
problem – but new technologies have eased the burden.
Velocity: Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID
tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.
Variety: Data comes in all types of formats – from structured, numeric data in traditional databases to
unstructured text documents, email, video, audio, stock ticker data and financial transactions.
More recently, two further dimensions have been added:
Variability: In addition to the increasing velocities and varieties of data, data flows can be highly
inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal and
event-triggered peak data loads can be challenging to manage.
Complexity: Today's data comes from multiple sources, which makes it difficult to link, match, cleanse
and transform data across systems. However, it’s necessary to connect and correlate relationships,
hierarchies and multiple data linkages or your data can quickly spiral out of control.
Categories of 'Big Data'
'Big data' can be found in three forms:
1. Structured
2. Unstructured
3. Semi-structured
Structured: Data stored in a relational database management system is one example of
'structured' data.
Unstructured: The output returned by a Google search is one example.
Semi-structured: Personal data stored in an XML file is one example.
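The three forms can be illustrated with a minimal Python sketch. The table name, the XML document and the sample sentence are invented for illustration and are not taken from any real data set.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows with a fixed schema in a relational table (hypothetical "person" table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.execute("INSERT INTO person VALUES (?, ?)", ("Asha", 31))
structured_age = conn.execute(
    "SELECT age FROM person WHERE name = ?", ("Asha",)
).fetchone()[0]
conn.close()

# Semi-structured: tags describe the data, but no rigid schema is enforced,
# so one record may carry fields (such as "hobby") that others lack.
xml_doc = "<person><name>Asha</name><age>31</age><hobby>chess</hobby></person>"
root = ET.fromstring(xml_doc)
semi_structured_age = int(root.findtext("age"))

# Unstructured: free text; any structure has to be inferred by analysis.
free_text = "Asha, who turned 31 this year, enjoys chess."
unstructured_word_count = len(free_text.split())
```

The same fact (a person's age) is a typed column in the first case, a tagged value in the second, and just a token buried in a sentence in the third.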
Evolution of Hadoop
As the World Wide Web grew in the late 1990s and early 2000s, search engines and indexes were
created to help locate relevant information amid the text-based content. In the early years, search
results were returned by humans. But as the web grew from dozens to millions of pages, automation
was needed. Web crawlers were created, many as university-led research projects, and search engine
start-ups took off (Yahoo, AltaVista, etc.).
One such project was an open-source web search engine called Nutch – the brainchild of Doug Cutting
and Mike Cafarella. They wanted to return web search results faster by distributing data and
calculations across different computers so multiple tasks could be accomplished simultaneously. During
this time, another search engine project called Google was in progress. It was based on the same
concept – storing and processing data in a distributed, automated way so that relevant web search
results could be returned faster.
In 2006, Cutting joined Yahoo and took with him the Nutch project as well as ideas based on Google’s
early work with automating distributed data storage and processing. The Nutch project was divided –
the web crawler portion remained as Nutch and the distributed computing and processing portion
became Hadoop. In 2008, Hadoop became a top-level open-source project of the Apache Software
Foundation (ASF). Today, Hadoop’s framework and ecosystem of technologies are managed and
maintained by the non-profit ASF, a global community of software developers and contributors.
Fun fact: “Hadoop” was the name of a yellow toy elephant owned by the son of one of its inventors.
Why is Hadoop important?
• Ability to store and process huge amounts of any kind of data, quickly. With data volumes and
varieties constantly increasing, especially from social media and the Internet of Things (IoT),
that's a key consideration.
• Computing power. Hadoop's distributed computing model processes big data fast. The more
computing nodes you use, the more processing power you have.
• Fault tolerance. Data and application processing are protected against hardware failure. If a
node goes down, jobs are automatically redirected to other nodes to make sure the distributed
computing does not fail. Multiple copies of all data are stored automatically.
• Flexibility. Unlike traditional relational databases, you don’t have to pre-process data before
storing it. You can store as much data as you want and decide how to use it later. That includes
unstructured data like text, images and videos.
• Low cost. The open-source framework is free and uses commodity hardware to store large
quantities of data.
• Scalability. You can easily grow your system to handle more data simply by adding nodes. Little
administration is required.
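The fault-tolerance point above can be sketched with a toy model in Python. This is only an illustration under stated assumptions: the round-robin placement, the node names and the block names are hypothetical, and real HDFS uses its own placement policy. The replication factor of 3 matches the HDFS default.

```python
def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin, not HDFS's policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


# Hypothetical four-node cluster holding five blocks of one file.
nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_0", "blk_1", "blk_2", "blk_3", "blk_4"]
placement = place_blocks(blocks, nodes, replication=3)

# One node goes down: every block is still readable from another replica.
after_one_failure = readable_blocks(placement, failed_nodes={"node2"})
```

Because every block lives on three different nodes, losing one node (or even two) leaves at least one copy of each block available; data is lost only if all the nodes holding a block's replicas fail together.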
What are the key components of Hadoop?
The three core components of the Hadoop framework are:
• MapReduce – A software programming model for processing large sets of data in parallel.
• HDFS – The Java-based distributed file system that can store all kinds of data without prior
organization.
• YARN – A resource management framework for scheduling and handling resource requests
from distributed applications.
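The MapReduce model listed above can be sketched in plain Python with the classic word-count example. This is a single-process illustration of the map, shuffle and reduce steps, not Hadoop's API; the function names and the input lines are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input; on a cluster each line (or file split) could be mapped on a different node.
lines = ["big data needs big storage", "hadoop stores big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on one key, which is what allows a framework to run many of them in parallel on different nodes.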
Types of Hadoop installation
Hadoop can be downloaded, installed and run in several ways. The main modes are described
below.
Standalone mode
Although Hadoop is a distributed platform for working with big data, it can also be installed on a
single node as a standalone instance. In this mode the entire Hadoop platform runs as a single Java
process. It is used mostly for debugging: you can check your MapReduce applications on a single
node before running them on a large Hadoop cluster.
Fully Distributed mode
This is the distributed mode proper, in which several nodes of commodity hardware are connected to
form the Hadoop cluster. In such a setup the NameNode, JobTracker and Secondary NameNode run on the
master node, whereas the DataNode and TaskTracker daemons run on each slave node. (JobTracker and
TaskTracker belong to Hadoop 1; from Hadoop 2 onward, YARN's ResourceManager and NodeManager take
over their roles.)
Pseudo-distributed mode
In effect, this is a single machine that simulates an entire Hadoop cluster. The various daemons,
such as the NameNode, DataNode, JobTracker and TaskTracker, all run on that one machine, each in its
own Java process.
Hadoop ecosystem
• HBase – A scalable distributed database that supports structured data storage for large tables.
• Hive – A data warehouse infrastructure that provides data summarization and ad hoc querying.
• Mahout – A scalable machine learning and data mining library.
• Pig – A high-level data-flow language and execution framework for parallel computation.
• Flume – A distributed, reliable, and available service for efficiently collecting, aggregating, and
moving large amounts of log data.
• Oozie – A workflow scheduler system to manage Apache Hadoop jobs.
• Sqoop – A tool designed for efficiently transferring bulk data between Apache Hadoop and
structured data stores such as relational databases.
• ZooKeeper – An effort to develop and maintain an open-source server that enables highly
reliable distributed coordination.
• Oracle R Connector for Hadoop (ORCH) – Provides access to a Hadoop cluster from R,
enabling manipulation of HDFS-resident data and the execution of MapReduce jobs.

Big data and Hadoop overview

  • 1. Big Data and Hadoop Overview What it Big Data? Big data is a term that describes the large volume of data – both structured and unstructured – that inundate a business on a day-to-day basis. In short, big data is so large and complex that none of the traditional data management tools are able to store it or process it efficiently. Who Generates Big Data? More and more data are being produced by an increasing number of electronic devices surrounding us and on the internet. The amount of data and the frequency at whichthey are produced are so vast that they are referred as “BIGData”. Why Is Big Data Important? The importance of big data doesn’t revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:  Determining root causes of failures, issues and defects in near-real time.  Generating coupons at the point of sale based on the customer’s buying habits.  Recalculating entire risk portfoliosin minutes.  Detecting fraudulent behaviour before it affectsyour organization.
  • 2. Brief History of Big Data While the term “big data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs: Volume Organizations collect data from a variety of sources, including business transactions, social media and information from sensor or machine-to-machine data. In the past, storing it would’ve been a problem – but new technologies have eased the burden. Velocity Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Variety Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions. Now a day’s below two Vs got added Variability In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal and event- triggered peak data loads can be challenging to manage. Complexity Today's data comes from multiple sources, which makes it difficult to link, match, cleanse and transform data across systems. However, it’s necessary to connect and correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out of control. Categories of 'Big Data' Big data' couldbe found in three forms: 1. Structured 2. Unstructured 3. Semi-structured Structured:Data stored in a relational database management system is one example of a 'structured' data. Unstructured:Output returned by 'GoogleSearch'. Semi-structured: Personal data stored in a XML file.
  • 3. Evolutionof Hadoop As the World Wide Web grew in the late 1900s and early 2000s, search engines and indexes were created to help locate relevant information amid the text-based content. In the early years, search results were returned by humans. But as the web grew from dozens to millions of pages, automation was needed. Web crawlers were created, many as university-led research projects, and search engine start-ups took off (Yahoo, AltaVista, etc.). One such project was an open-source web search engine called Nutch – the brainchild of Doug Cutting and Mike Cafarella. They wanted to return web search results faster by distributing data and calculations across different computers so multiple tasks could be accomplished simultaneously. During this time, another search engine project called Google was in progress. It was based on the same concept – storing and processing data in a distributed, automated way so that relevant web search results could be returned faster. In 2006, Cutting joined Yahoo and took with him the Nutch project as well as ideas based on Google’s early work with automating distributed data storage and processing. The Nutch project was divided – the web crawler portion remained as Nutch and the distributed computing and processing portion became Hadoop. In 2008, Yahoo released Hadoop as an open-source project. Today, Hadoop’s framework and ecosystem of technologies are managed and maintained by the non-profit Apache Software Foundation (ASF), a global community of software developers and contributors. FunFact: "Hadoop”was thenameof a yellow toy elephant owned by the son of one of its inventors.
  • 4. Why is Hadoop important?
    - Ability to store and process huge amounts of any kind of data, quickly. With data volumes and varieties constantly increasing, especially from social media and the Internet of Things (IoT), that is a key consideration.
    - Computing power. Hadoop's distributed computing model processes big data fast. The more computing nodes you use, the more processing power you have.
    - Fault tolerance. Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes to make sure the distributed computing does not fail. Multiple copies of all data are stored automatically.
    - Flexibility. Unlike traditional relational databases, you don’t have to pre-process data before storing it. You can store as much data as you want and decide how to use it later. That includes unstructured data like text, images and videos.
    - Low cost. The open-source framework is free and uses commodity hardware to store large quantities of data.
    - Scalability. You can easily grow your system to handle more data simply by adding nodes. Little administration is required.
    What are the key components of Hadoop?
    The Hadoop framework has three core components:
    - MapReduce: a software programming model for processing large sets of data in parallel.
    - HDFS: the Java-based distributed file system that can store all kinds of data without prior organization.
    - YARN: a resource management framework for scheduling and handling resource requests from distributed applications.
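To make the MapReduce programming model more concrete, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop's API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical and only mirror the three conceptual steps. On a real cluster, the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped independently, which is what allows parallelism.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    # Each key could likewise be reduced independently of the others.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data node"])
```

Because every map call sees only one line and every reduce call sees only one key, the work can be split across as many nodes as are available, which is the property the "computing power" and "scalability" points above rely on.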
  • 5. Types of Hadoop installation
    Hadoop can be downloaded, installed and run in three modes.
    Standalone mode
    Although Hadoop is a distributed platform for working with big data, it can also be installed on a single node as a standalone instance. In this mode the entire Hadoop platform runs as a single Java process. It is mostly used for debugging: it lets you check your MapReduce applications on a single node before running them on a large Hadoop cluster.
    Fully distributed mode
    This is the distributed mode, in which several nodes of commodity hardware are connected to form the Hadoop cluster. In such a setup the NameNode, JobTracker and Secondary NameNode run on the master node, whereas the DataNode and TaskTracker run on the slave nodes. (JobTracker and TaskTracker are the MapReduce daemons of Hadoop 1; under YARN their roles are taken by the ResourceManager and NodeManager.)
    Pseudo-distributed mode
    This is in effect a single-node setup that simulates an entire Hadoop cluster. The various daemons, such as the NameNode, DataNode, TaskTracker and JobTracker, all run on the same machine, each in its own Java process.
  • 6. Hadoop ecosystem
    - HBase: a scalable distributed database that supports structured data storage for large tables.
    - Hive: a data warehouse infrastructure that provides data summarization and ad hoc querying.
    - Mahout: a scalable machine learning and data mining library.
    - Pig: a high-level data-flow language and execution framework for parallel computation.
    - Flume: a distributed, reliable and available service for efficiently collecting, aggregating and moving large amounts of log data.
    - Oozie: a workflow scheduler system to manage Apache Hadoop jobs.
    - Sqoop: a tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases.
    - ZooKeeper: an effort to develop and maintain an open-source server that enables highly reliable distributed coordination.
    - Oracle R Connector for Hadoop (ORCH): provides access to a Hadoop cluster from R, enabling manipulation of HDFS-resident data and the execution of MapReduce jobs.