What is Big Data?
• Big data is a term that describes the large volume of data, both structured
and unstructured, that inundates a business on a day-to-day basis. Big
data can be analyzed for insights that lead to better decisions and strategic
business moves.
Where does it come from?
• Social Media
• Business Transactions
• Smart Phones
• Vehicles
• Satellites
• Log Files
• Smart Devices
• Sensors
Fact Check
3 V’s of Big Data
• Velocity : The rate at which data is generated and changed.
• Variety : The number of different data sources and types.
• Volume : The total amount of data that is generated and stored.
Importance of Big Data
The importance of big data doesn’t revolve around how much data
you have, but what you do with it. You can take data from any
source and analyze it to find answers that enable 1) cost reductions,
2) time reductions, 3) new product development and optimized
offerings, and 4) smart decision making. When you combine big data
with high-powered analytics, you can accomplish business-related
tasks such as:
• Determining root causes of failures, issues and defects in near-real
time.
• Generating coupons at the point of sale based on the customer’s
buying habits.
• Recalculating entire risk portfolios in minutes.
• Detecting fraudulent behavior before it affects your organization.
Applications of Big Data
• A 360-degree view of the customer
• Internet of Things
• Healthcare
• Information Security
• E-Commerce
• Data warehouse optimization
Emergence of Hadoop
• Hadoop grew out of Nutch, an open-source web search engine project that was
the brainchild of Doug Cutting and Mike Cafarella. They aimed to return web
search results faster by distributing data and calculations across different
computers so multiple tasks could be accomplished simultaneously.
• In 2006, Cutting joined Yahoo. The Nutch project was divided: the web
crawler portion remained as Nutch, and the distributed computing and
processing portion became Hadoop.
• In 2008, Hadoop became a top-level project at the Apache Software
Foundation. Today, Hadoop’s framework and ecosystem of technologies are
managed and maintained by the non-profit Apache Software Foundation (ASF),
a global community of software developers and contributors.
Importance of Hadoop
• Ability to store and process huge amounts of any kind of data,
quickly. With data volumes and varieties constantly increasing, especially
from social media and the Internet of Things (IoT), that's a key
consideration.
• Computing power. Hadoop's distributed computing model processes big
data fast. The more computing nodes you use, the more processing power
you have.
• Fault tolerance. Data and application processing are protected against
hardware failure. If a node goes down, jobs are automatically redirected to
other nodes so that the distributed computation does not fail. Multiple
copies of all data are stored automatically.
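The fault-tolerance point above can be illustrated with a toy sketch of block replication. This is not HDFS code: the function names, node names and block IDs are hypothetical, and the round-robin placement is a simplification chosen for the example (real HDFS placement is rack-aware). The replication factor of 3 mirrors the HDFS default but is an assumption of this sketch.

```python
REPLICATION_FACTOR = 3  # assumption for this sketch; mirrors the HDFS default


def place_blocks(block_ids, nodes, replicas=REPLICATION_FACTOR):
    """Assign each block to `replicas` distinct nodes (toy round-robin policy).

    Assumes len(nodes) >= replicas so that the chosen nodes are distinct.
    """
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {nodes[(i + k) % len(nodes)] for k in range(replicas)}
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


# Hypothetical cluster of four nodes storing four blocks.
nodes = ["node-a", "node-b", "node-c", "node-d"]
blocks = ["blk-1", "blk-2", "blk-3", "blk-4"]

placement = place_blocks(blocks, nodes)
survivors = readable_blocks(placement, failed_nodes=["node-b"])
print(sorted(survivors))
```

With three replicas per block, losing a single node leaves every block readable from another node, which is the property the slide describes.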
Importance of Hadoop (contd.)
• Flexibility. Unlike traditional relational databases, you don’t have to
preprocess data before storing it. You can store as much data as you want
and decide how to use it later. That includes unstructured data like text,
images and videos.
• Low cost. The open-source framework is free and uses commodity
hardware to store large quantities of data.
• Scalability. You can easily grow your system to handle more data simply
by adding nodes. Little administration is required.
Hadoop Distributed File System (HDFS)
MapReduce
• MapReduce is a programming model and an associated implementation
for processing and generating large data sets with a parallel, distributed
algorithm on a cluster.
• The term MapReduce actually refers to two separate and distinct tasks that
Hadoop programs perform. The first is the map job, which takes a set of
data and converts it into another set of data, where individual elements are
broken down into tuples (key/value pairs). The reduce job takes the output
from a map as input and combines those data tuples into a smaller set of
tuples. As the sequence of the name MapReduce implies, the reduce job is
always performed after the map job.
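The map-then-reduce flow described above can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop's API: the names `run_mapreduce`, `word_mapper` and `count_reducer`, and the word-count task itself, are assumptions chosen for the example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Toy in-memory model of the map -> group-by-key -> reduce flow."""
    # Map phase: each input record becomes zero or more (key, value) tuples.
    mapped = []
    for record in records:
        mapped.extend(mapper(record))

    # Grouping step: collect all values that share the same key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Reduce phase: combine each key's values into a single result.
    return {key: reducer(key, values) for key, values in grouped.items()}


def word_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in a line of text.
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(key, values):
    # Hypothetical reducer: sum the counts emitted for one word.
    return sum(values)


lines = ["big data", "Big insights from data"]
word_counts = run_mapreduce(lines, word_mapper, count_reducer)
print(word_counts)
```

The reduce step only runs once all mapped tuples have been grouped, which matches the ordering the slide describes: reduce always follows map.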
MapReduce : Example
• Let’s look at a simple example. Assume you have three files, and each file
contains two columns (a key and a value in Hadoop terms) that represent
a city and the corresponding temperature recorded in that city on the
various measurement days. We’ve kept this example very simple so it’s
easy to follow. A real application is likely to contain millions or even
billions of rows, and they might not be neatly formatted rows at all.
Whatever the amount of data you need to analyze, the key principles
covered here remain the same. In this example, city is the key and
temperature is the value.
MapReduce : Example (contd.)
File 1
Key Value
Toronto 20
Whitby 25
New York 22
Rome 32
Toronto 14
Rome 33
New York 18
File 2
Key Value
Toronto 18
Whitby 22
New York 25
Rome 35
Toronto 22
Rome 38
New York 21
File 3
Key Value
Toronto 22
Whitby 26
New York 24
Rome 36
Toronto 12
Rome 35
New York 19
• Out of all the data we have collected, we want to find the maximum
temperature for each city across all of the data files (note that each file
might have the same city represented multiple times).
MapReduce : Example (contd.)
• After mapping, each file returns the maximum temperature recorded for
each city within that file, as shown below. This is called the mapped data.
File 1
Key Value
Toronto 20
Whitby 25
New York 22
Rome 33
File 2
Key Value
Whitby 22
New York 25
Toronto 22
Rome 38
File 3
Key Value
Toronto 22
Whitby 26
New York 24
Rome 36
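The map step for this example can be sketched as follows. It is a plain-Python illustration, not Hadoop code: the function name `map_file` and the in-memory lists standing in for the three files are assumptions made for the sketch.

```python
# The three input files from the example, as lists of (city, temperature) rows.
file_1 = [("Toronto", 20), ("Whitby", 25), ("New York", 22), ("Rome", 32),
          ("Toronto", 14), ("Rome", 33), ("New York", 18)]
file_2 = [("Toronto", 18), ("Whitby", 22), ("New York", 25), ("Rome", 35),
          ("Toronto", 22), ("Rome", 38), ("New York", 21)]
file_3 = [("Toronto", 22), ("Whitby", 26), ("New York", 24), ("Rome", 36),
          ("Toronto", 12), ("Rome", 35), ("New York", 19)]


def map_file(rows):
    """Hypothetical map task: keep the highest temperature per city in one file."""
    best = {}
    for city, temperature in rows:
        if city not in best or temperature > best[city]:
            best[city] = temperature
    return best


# Each file is processed independently, so the three calls could run in parallel.
mapped_data = [map_file(rows) for rows in (file_1, file_2, file_3)]
for number, result in enumerate(mapped_data, start=1):
    print(f"File {number}: {result}")
```

Because each file is handled on its own, no map task needs to see another file's rows; combining the per-file results is left to the reduce step.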
MapReduce : Example (contd.)
• After mapping, the reduce phase is performed and the final result is
produced. The mapped data from the three files is compared key by key
to find the highest temperature for each city.
• The final result is as follows:
Key Value
Toronto 22
Whitby 26
New York 25
Rome 38
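The reduce step can be sketched in the same way. This is a plain-Python illustration, not Hadoop code: the function name `reduce_max` is hypothetical, and the input is the per-file mapped data from the previous slide, typed in by hand.

```python
# Mapped data: the per-file maximum temperature for each city.
mapped_data = [
    {"Toronto": 20, "Whitby": 25, "New York": 22, "Rome": 33},  # File 1
    {"Toronto": 22, "Whitby": 22, "New York": 25, "Rome": 38},  # File 2
    {"Toronto": 22, "Whitby": 26, "New York": 24, "Rome": 36},  # File 3
]


def reduce_max(mapped_outputs):
    """Hypothetical reduce task: keep the highest temperature per city overall."""
    final = {}
    for output in mapped_outputs:
        for city, temperature in output.items():
            # Compare against the best value seen so far for this city.
            final[city] = max(temperature, final.get(city, temperature))
    return final


final_result = reduce_max(mapped_data)
for city, temperature in final_result.items():
    print(city, temperature)
```

For New York, the three mapped values are 22, 25 and 24, so the reduced value is 25.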
Conclusion
• Big data is changing the way people within organizations work together. It
is creating a culture in which business and IT leaders must join forces to
realize value from all data. Insights from big data can enable all employees
to make better decisions: deepening customer engagement, optimizing
operations, preventing threats and fraud, and capitalizing on new sources
of revenue. But escalating demand for insights requires a fundamentally
new approach to architecture, tools and practices.
• Competitive advantage
• Better decision making
• Value of data
What is Big Data and Hadoop Framework

More Related Content

What's hot

R programming analysis
R programming analysisR programming analysis
R programming analysisdigitaladitya
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportAhmad El Tawil
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Robert Grossman
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins Edureka!
 
Aginity Big Data Research Lab V3
Aginity Big Data Research Lab V3Aginity Big Data Research Lab V3
Aginity Big Data Research Lab V3mcacicio
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analyticsAvinash Pandu
 
Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analyticsAvinash Pandu
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions csandit
 
Big Data Analytics: From SQL to Machine Learning and Graph Analysis
Big Data Analytics: From SQL to Machine Learning and Graph AnalysisBig Data Analytics: From SQL to Machine Learning and Graph Analysis
Big Data Analytics: From SQL to Machine Learning and Graph AnalysisYuanyuan Tian
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5RojaT4
 
Case of success: Visualization as an example for exercising democratic transp...
Case of success: Visualization as an example for exercising democratic transp...Case of success: Visualization as an example for exercising democratic transp...
Case of success: Visualization as an example for exercising democratic transp...Big Data Spain
 
Repartition join in mapreduce
Repartition join in mapreduceRepartition join in mapreduce
Repartition join in mapreduceUday Vakalapudi
 

What's hot (17)

R programming analysis
R programming analysisR programming analysis
R programming analysis
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases report
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
 
Late Arrival Facts
Late Arrival FactsLate Arrival Facts
Late Arrival Facts
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
 
Aginity Big Data Research Lab V3
Aginity Big Data Research Lab V3Aginity Big Data Research Lab V3
Aginity Big Data Research Lab V3
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
 
Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analytics
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
 
Big Data Analytics: From SQL to Machine Learning and Graph Analysis
Big Data Analytics: From SQL to Machine Learning and Graph AnalysisBig Data Analytics: From SQL to Machine Learning and Graph Analysis
Big Data Analytics: From SQL to Machine Learning and Graph Analysis
 
Big data
Big dataBig data
Big data
 
BIG DATA – Beyond the Hype
BIG DATA – Beyond the HypeBIG DATA – Beyond the Hype
BIG DATA – Beyond the Hype
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5
 
Big data no company
Big data   no companyBig data   no company
Big data no company
 
Hadoop Mapreduce joins
Hadoop Mapreduce joinsHadoop Mapreduce joins
Hadoop Mapreduce joins
 
Case of success: Visualization as an example for exercising democratic transp...
Case of success: Visualization as an example for exercising democratic transp...Case of success: Visualization as an example for exercising democratic transp...
Case of success: Visualization as an example for exercising democratic transp...
 
Repartition join in mapreduce
Repartition join in mapreduceRepartition join in mapreduce
Repartition join in mapreduce
 

Viewers also liked

O que é preciso para criar um ambiente de trabalho saudável e produtivo?
O que é preciso para criar um ambiente de trabalho saudável e produtivo?O que é preciso para criar um ambiente de trabalho saudável e produtivo?
O que é preciso para criar um ambiente de trabalho saudável e produtivo?Ítalo de Oliveira Mendonça
 
IBM's big data seminar programme -moving beyond Hadoop - Ian Radmore, IBM
IBM's big data seminar programme -moving beyond Hadoop - Ian Radmore, IBMIBM's big data seminar programme -moving beyond Hadoop - Ian Radmore, IBM
IBM's big data seminar programme -moving beyond Hadoop - Ian Radmore, IBMInternet World
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overviewharithakannan
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP vinoth kumar
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvewKunal Khanna
 
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala Desing Pathshala
 
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...Usama Fayyad
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
BigData - Hadoop -by 侯圣文@secooler
BigData - Hadoop -by 侯圣文@secooler BigData - Hadoop -by 侯圣文@secooler
BigData - Hadoop -by 侯圣文@secooler Shengwen HOU(侯圣文)
 
1 s2.0-s0099239911004882-main
1 s2.0-s0099239911004882-main1 s2.0-s0099239911004882-main
1 s2.0-s0099239911004882-mainCabinet Lupu
 

Viewers also liked (16)

O que é preciso para criar um ambiente de trabalho saudável e produtivo?
O que é preciso para criar um ambiente de trabalho saudável e produtivo?O que é preciso para criar um ambiente de trabalho saudável e produtivo?
O que é preciso para criar um ambiente de trabalho saudável e produtivo?
 
designing
designingdesigning
designing
 
Expo de opv
Expo de opvExpo de opv
Expo de opv
 
IBM's big data seminar programme -moving beyond Hadoop - Ian Radmore, IBM
IBM's big data seminar programme -moving beyond Hadoop - Ian Radmore, IBMIBM's big data seminar programme -moving beyond Hadoop - Ian Radmore, IBM
IBM's big data seminar programme -moving beyond Hadoop - Ian Radmore, IBM
 
Anju
AnjuAnju
Anju
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overview
 
OAOP
OAOPOAOP
OAOP
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
 
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala
 
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
BigData - Hadoop -by 侯圣文@secooler
BigData - Hadoop -by 侯圣文@secooler BigData - Hadoop -by 侯圣文@secooler
BigData - Hadoop -by 侯圣文@secooler
 
Medidas de forma
Medidas de formaMedidas de forma
Medidas de forma
 
1 s2.0-s0099239911004882-main
1 s2.0-s0099239911004882-main1 s2.0-s0099239911004882-main
1 s2.0-s0099239911004882-main
 

Similar to What is Big Data and Hadoop Framework

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
introduction to big data frameworks
introduction to big data frameworksintroduction to big data frameworks
introduction to big data frameworksAmal Targhi
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoopAnusha sweety
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training reportSarvesh Meena
 
A NOVEL APPROACH FOR PROCESSING BIG DATA
A NOVEL APPROACH FOR PROCESSING BIG DATAA NOVEL APPROACH FOR PROCESSING BIG DATA
A NOVEL APPROACH FOR PROCESSING BIG DATAijdms
 
Big data & Hadoop
Big data & HadoopBig data & Hadoop
Big data & HadoopAhmed Gamil
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics iosrjce
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challengesijcisjournal
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewIRJET Journal
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poliivascucristian
 
Big Data Paradigm - Analysis, Application and Challenges
Big Data Paradigm - Analysis, Application and ChallengesBig Data Paradigm - Analysis, Application and Challenges
Big Data Paradigm - Analysis, Application and ChallengesUyoyo Edosio
 

Similar to What is Big Data and Hadoop Framework (20)

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
introduction to big data frameworks
introduction to big data frameworksintroduction to big data frameworks
introduction to big data frameworks
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
 
A NOVEL APPROACH FOR PROCESSING BIG DATA
A NOVEL APPROACH FOR PROCESSING BIG DATAA NOVEL APPROACH FOR PROCESSING BIG DATA
A NOVEL APPROACH FOR PROCESSING BIG DATA
 
Hadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and AssessmentHadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and Assessment
 
Big Data
Big DataBig Data
Big Data
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Big Data
Big DataBig Data
Big Data
 
Big data & Hadoop
Big data & HadoopBig data & Hadoop
Big data & Hadoop
 
Big Data
Big DataBig Data
Big Data
 
B017320612
B017320612B017320612
B017320612
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
 
12575474.ppt
12575474.ppt12575474.ppt
12575474.ppt
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
 
Big Data Paradigm - Analysis, Application and Challenges
Big Data Paradigm - Analysis, Application and ChallengesBig Data Paradigm - Analysis, Application and Challenges
Big Data Paradigm - Analysis, Application and Challenges
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

What is Big Data and Hadoop Framework

  • 1.
  • 2. What is BIG DATA ?  Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
  • 3. Where does it come from ?  Social Media  Business Transaction  Smart Phones  Vehicles  Satellite  Log Files  Smart Devices  Sensors
  • 5. 3 V’s of Big Data  Velocity : The rate at which data is generated and changed.  Variety : The number of different data sources and types.  Volume : The average quantity of data units/category.
  • 6. Importance of Big Data The importance of big data doesn’t revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:  Determining root causes of failures, issues and defects in near-real time.  Generating coupons at the point of sale based on the customer’s buying habits.  Recalculating entire risk portfolios in minutes.  Detecting fraudulent behavior before it affects your organization.
  • 7. Applications of Big Data  A 360 degree view of a customer.  Internet of Things  Healthcare  Information Security  E-Commerce  Data warehouse optimization
  • 8.
  • 9. Emergence of Hadoop  An Open Source project Nutch (A search engine) – the brainchild of Doug Cutting and Mike Cafarella, aimed at returning web search results faster by distributing data and calculations across different computers so multiple tasks could be accomplished simultaneously.  In 2006, Cutting joined Yahoo. The Nutch project was divided – the web crawler portion remained as Nutch and the distributed computing and processing portion became Hadoop.  In 2008, Yahoo released Hadoop as an open-source project. Today, Hadoop’s framework and ecosystem of technologies are managed and maintained by the non-profit Apache Software Foundation (ASF), a global community of software developers and contributors.
  • 10. Importance
      • Ability to store and process huge amounts of any kind of data, quickly. With data volumes and varieties constantly increasing, especially from social media and the Internet of Things (IoT), that's a key consideration.
      • Computing power. Hadoop's distributed computing model processes big data fast. The more computing nodes you use, the more processing power you have.
      • Fault tolerance. Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes to make sure the distributed computing does not fail. Multiple copies of all data are stored automatically.
  • 11. Importance (contd.)
      • Flexibility. Unlike traditional relational databases, you don’t have to preprocess data before storing it. You can store as much data as you want and decide how to use it later. That includes unstructured data like text, images and videos.
      • Low cost. The open-source framework is free and uses commodity hardware to store large quantities of data.
      • Scalability. You can easily grow your system to handle more data simply by adding nodes. Little administration is required.
  • 12. Hadoop Distributed File System (HDFS)
  • 13. MapReduce
      • MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
      • The term MapReduce refers to two separate and distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key/value pairs). The second is the reduce job, which takes the output from a map as input and combines those tuples into a smaller set of tuples. As the order of the name MapReduce implies, the reduce job is always performed after the map job.
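The two phases described above can be sketched in a few lines of single-process Python. This is only a toy illustration of the programming model, not Hadoop's API: `run_mapreduce`, `parse_line`, `total` and the sample input lines are hypothetical names and data introduced for this sketch, and the grouping step in the middle (often called the shuffle) is shown explicitly because the reduce job needs all values for one key together.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Toy in-memory MapReduce: map, group by key, then reduce."""
    # Map phase: each input record yields zero or more (key, value) tuples.
    mapped = []
    for record in records:
        mapped.extend(mapper(record))

    # Grouping (shuffle) phase: collect every value emitted under the same key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Reduce phase: combine each key's values into one smaller result.
    return {key: reducer(key, values) for key, values in grouped.items()}


def parse_line(line):
    """Hypothetical mapper: turn a 'key,value' text line into one tuple."""
    key, value = line.split(",")
    yield key.strip(), int(value)


def total(key, values):
    """Hypothetical reducer: sum all values seen for a key."""
    return sum(values)


lines = ["a,1", "b,2", "a,3"]  # made-up input records
result = run_mapreduce(lines, parse_line, total)
print(result)
```

On a real cluster the map calls run in parallel on different nodes and the grouped data is moved between machines; the sketch only shows the order of the steps.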
  • 14. MapReduce: Example
      • Let’s look at a simple example. Assume you have three files, and each file contains two columns (a key and a value in Hadoop terms) that represent a city and the temperature recorded in that city on the various measurement days. In this example, city is the key and temperature is the value.
      • The example is deliberately simple so that it is easy to follow. A real application is likely to contain millions or even billions of rows, and they might not be neatly formatted rows at all. No matter how big or small the amount of data you need to analyze, the key principles covered here remain the same.
  • 15. MapReduce: Example (contd.)

      File 1               File 2               File 3
      Key        Value     Key        Value     Key        Value
      Toronto    20        Toronto    18        Toronto    22
      Whitby     25        Whitby     22        Whitby     26
      New York   22        New York   25        New York   24
      Rome       32        Rome       35        Rome       36
      Toronto    14        Toronto    22        Toronto    12
      Rome       33        Rome       38        Rome       35
      New York   18        New York   21        New York   19

      • Out of all the data we have collected, we want to find the maximum temperature for each city across all of the data files (note that each file might have the same city represented multiple times).
  • 16. MapReduce: Example (contd.)
      • After mapping, each file returns the data shown below: the highest temperature recorded for each city within that file. This is called the mapped data.

      File 1               File 2               File 3
      Key        Value     Key        Value     Key        Value
      Toronto    20        Whitby     22        Toronto    22
      Whitby     25        New York   25        Whitby     26
      New York   22        Toronto    22        New York   24
      Rome       33        Rome       38        Rome       36
  • 17. MapReduce: Example (contd.)
      • After mapping, the reduce phase is performed and the final result is displayed. The mapped values from the three files are compared key by key to find the highest temperature for each city.
      • The final result is as follows:

      Key        Value
      Toronto    22
      Whitby     26
      New York   25
      Rome       38
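The whole example can be reproduced with a short Python sketch. It is a single-process illustration under stated assumptions, not Hadoop code: the readings are copied from the three example files, while `FILES`, `map_file` and `reduce_max` are hypothetical names chosen here. The map step keeps the highest temperature per city within one file, and the reduce step merges those per-file results.

```python
# City/temperature readings copied from the three example files.
FILES = {
    "File 1": [("Toronto", 20), ("Whitby", 25), ("New York", 22), ("Rome", 32),
               ("Toronto", 14), ("Rome", 33), ("New York", 18)],
    "File 2": [("Toronto", 18), ("Whitby", 22), ("New York", 25), ("Rome", 35),
               ("Toronto", 22), ("Rome", 38), ("New York", 21)],
    "File 3": [("Toronto", 22), ("Whitby", 26), ("New York", 24), ("Rome", 36),
               ("Toronto", 12), ("Rome", 35), ("New York", 19)],
}


def map_file(rows):
    """Map step: keep the highest temperature seen per city in one file."""
    best = {}
    for city, temp in rows:
        if city not in best or temp > best[city]:
            best[city] = temp
    return best


def reduce_max(mapped_outputs):
    """Reduce step: merge the per-file results into one maximum per city."""
    final = {}
    for mapped in mapped_outputs:
        for city, temp in mapped.items():
            final[city] = max(temp, final.get(city, temp))
    return final


# One map call per file (these could run in parallel), then a single reduce.
mapped_data = {name: map_file(rows) for name, rows in FILES.items()}
final_result = reduce_max(mapped_data.values())

for city, temp in final_result.items():
    print(city, temp)
```

Run on the three files, the sketch yields 22 for Toronto, 26 for Whitby, 25 for New York and 38 for Rome. Because each map call touches only its own file, the per-file work can be spread across nodes, and only the small mapped results need to be brought together for the reduce step.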
  • 18. Conclusion
      • Big data is changing the way people within organizations work together. It is creating a culture in which business and IT leaders must join forces to realize value from all data. Insights from big data can enable all employees to make better decisions: deepening customer engagement, optimizing operations, preventing threats and fraud, and capitalizing on new sources of revenue. But escalating demand for insights requires a fundamentally new approach to architecture, tools and practices.
      • Competitive advantage
      • Better decision making
      • Value of data