SlideShare a Scribd company logo
Presented By:
SADHANA SINGH
M.TECH 1ST YEAR (COMPUTER
SCIENCE)
 Introduction
 Evolution of big data
 Characteristics
 Examples of big data generation
 Big data v/s RDBMS
 Hadoop
 HDFS
 MapReduce
 references
 Big data is a term for DATASETS that are so
large or complex that traditional data
processing applications are inadequate.
 Big data is the capability to manage huge
volume of disparate data, at the right speed
and within the right time frame to allow real
time analysis.
Wave 1: Creating
manageable data structures
Wave 2: Web and content
management
Wave 3: Managing big data
source: understanding big data analytics for enterprise class hadoop and
streaming data
V’sofbigdata
VOLUME
VELOCITY
VARIETY
VERACITY
VARIABILITY
Black box data
Social media data
Stock exchange data
Power grid data
Transport data
Search engine data
 Huge competition in market:
retails- customer analytics and predictive analytics
travel- travel pattern of customers
website- understand users navigation pattern,
interest, conversion etc.
 Sensors, satellite and geospatial data
 Military and intelligence
 Big Data includes huge volume, high
velocity, and extensible variety of data.
The data in it will be of three types.
• Relational data.Structured
data:
• XML data
Semi
Structured
data:
• Word, PDF, Text,
Media Logs
Unstructured
data:
Relational database Big data
 Single-computer
platform that scales
with better CPUs,
centralized
processing.
 Relational database
(SQL), centralized
storage.
 Batched, descriptive,
centralized
 Cluster platforms that
scale to thousands of
nodes, distributed
process
 Non-relational
databases that manage
varied data types and
formats (NoSQL),
distributed storage.
 Real-time, predictive
and prescriptive,
distributed analytics
 An open source apache foundation
framework.
 It allows distributed processing of large
datasets across clusters of computers using
simple programming models.
 Hadoop runs applications using the
MapReduce algorithm, where the data is
processed in parallel with others.
 It Uses the concept of Data locality.
 Processing/Comput
ation layer
(MapReduce), and
 Storage layer
(Hadoop
Distributed File
System).
source : hadoop tutorial on www.tutorialspoint.com
 Hadoop framework allows the user to
quickly write and test distributed systems. It
is efficient, and it automatic distributes the
data and work across the machines and in
turn, utilizes the underlying parallelism of
the CPU cores.
 Hadoop does not rely on hardware to
provide fault-tolerance and high availability
(FTHA), rather Hadoop library itself has been
designed to detect and handle failures at the
application layer.
 hadoop is designed to be self-healing.
 HDFS is a file system designed for storing
very large files with streaming data access
patterns, running on clusters of commodity
hardware.
 It can be defined as, "A reliable, high
bandwidth, low-cost, data-storage cluster
that facilitates the management of related
files across machines.”
Basic architecture of HDFS
source: J. Hurwitz, et al., “Big Data for Dummies,” Wiley, 2013, ISBN:978-1-118-50422-2.
Replica placement
source: Hadoop: The Definitive Guide, by Tom White, 2015, ISBN: 978-1-491-90163-2
 Hadoop mapReduce is an implementation of
mapReduce algorithm.
 Map reduce is a batch query processor, and
the ability to run an adhoc query against
whole dataset and get the results in a
reasonable time is TRANSFORMATIVE.
source: J. Hurwitz, et al., “Big Data for Dummies,” Wiley, 2013, ISBN:978-1-118-50422-2.
 Example of air temperature analysis.
 Problems :
 Dividing the work into equal size pieces is not
easy.
 Combininng the results from independent process
may requirefurther processing.
 The processing capacity of a single machine is
limited.
source: Hadoop: The Definitive Guide, by Tom White, 2015, ISBN: 978-1-491-90163-2
source: Hadoop: The Definitive Guide, by Tom White, 2015, ISBN: 978-1-491-90163-2
i. J. Hurwitz, et al., “Big Data for Dummies,”
Wiley, 2013, ISBN:978-1-118-50422-2.
ii. http://www.cse.wustl.edu/~jain/cse570-
13/
iii. Hadoop: The Definitive Guide, by Tom
White, 2015, ISBN: 978-1-491-90163-2
iv. Hadoop tutorials on
www.tutorialspoint.com
Big_data_ppt

More Related Content

What's hot

Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
Aswadmehar
 
Big data
Big dataBig data
Big data
Nimish Kochhar
 
Big Data
Big DataBig Data
Big Data
Vinayak Kamath
 
Big data
Big dataBig data
Big data
Nausheen Hasan
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
Sivashankar Ganapathy
 
Big Data
Big DataBig Data
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
Ashraf Uddin
 
Big data introduction
Big data introductionBig data introduction
Big data introduction
Chirag Ahuja
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Haluan Irsad
 
Big data ppt
Big data pptBig data ppt
Big data ppt
IDBI Bank Ltd.
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
Bernard Marr
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
Maruf Abdullah (Rion)
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data
Srinath Perera
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Simplilearn
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
Manish Borkar
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Ghulam Imaduddin
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
Md. Salman Ahmed
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Simplilearn
 

What's hot (20)

Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 
Big data
Big dataBig data
Big data
 
Big Data
Big DataBig Data
Big Data
 
Big data
Big dataBig data
Big data
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
 
Big Data
Big DataBig Data
Big Data
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Big data introduction
Big data introductionBig data introduction
Big data introduction
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 

Similar to Big_data_ppt

Big data
Big dataBig data
Big data
Mina Soltani
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
Sysfore Technologies
 
Big data
Big dataBig data
Big data
revathireddyb
 
Big data
Big dataBig data
Big data
revathireddyb
 
Hadoop
HadoopHadoop
Hadoop
Ankit Prasad
 
TCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYATCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYAAditya Srinivasan
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
saisreealekhya
 
HDFS
HDFSHDFS
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help business
Ajay Ohri
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
Nalini Mehta
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
Cognizant
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
Manoj Jangalva
 
Big data and apache hadoop adoption
Big data and apache hadoop adoptionBig data and apache hadoop adoption
Big data and apache hadoop adoption
faizrashid1995
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
himanshu arora
 
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
Samsung Business USA
 
Presentation on BigData by Swapnaja
Presentation on BigData by Swapnaja Presentation on BigData by Swapnaja
Presentation on BigData by Swapnaja
Swapnaja Tandale
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
Md. Hasan Basri (Angel)
 

Similar to Big_data_ppt (20)

Big data
Big dataBig data
Big data
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
paper
paperpaper
paper
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Hadoop
HadoopHadoop
Hadoop
 
TCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYATCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYA
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
 
HDFS
HDFSHDFS
HDFS
 
Hadoop
HadoopHadoop
Hadoop
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help business
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
Big data and apache hadoop adoption
Big data and apache hadoop adoptionBig data and apache hadoop adoption
Big data and apache hadoop adoption
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
 
Presentation on BigData by Swapnaja
Presentation on BigData by Swapnaja Presentation on BigData by Swapnaja
Presentation on BigData by Swapnaja
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 

Recently uploaded

一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 

Recently uploaded (20)

一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 

Big_data_ppt

  • 1. Presented By: SADHANA SINGH M.TECH 1ST YEAR (COMPUTER SCIENCE)
  • 2.  Introduction  Evolution of big data  Characteristics  Examples of big data generation  Big data v/s RDBMS  Hadoop  HDFS  MapReduce  references
  • 3.  Big data is a term for DATASETS that are so large or complex that traditional data processing applications are inadequate.  Big data is the capability to manage huge volume of disparate data, at the right speed and within the right time frame to allow real time analysis.
  • 4. Wave 1: Creating manageable data structures Wave 2: Web and content management Wave 3: Managing big data
  • 5. source: understanding big data analytics for enterprise class hadoop and streaming data
  • 7. Black box data Social media data Stock exchange data Power grid data Transport data Search engine data
  • 8.  Huge competition in market: retails- customer analytics and predictive analytics travel- travel pattern of customers website- understand users navigation pattern, interest, conversion etc.  Sensors, satellite and geospatial data  Military and intelligence
  • 9.  Big Data includes huge volume, high velocity, and extensible variety of data. The data in it will be of three types. • Relational data.Structured data: • XML data Semi Structured data: • Word, PDF, Text, Media Logs Unstructured data:
  • 10. Relational database Big data  Single-computer platform that scales with better CPUs, centralized processing.  Relational database (SQL), centralized storage.  Batched, descriptive, centralized  Cluster platforms that scale to thousands of nodes, distributed process  Non-relational databases that manage varied data types and formats (NoSQL), distributed storage.  Real-time, predictive and prescriptive, distributed analytics
  • 11.  An open source apache foundation framework.  It allows distributed processing of large datasets across clusters of computers using simple programming models.  Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel with others.  It Uses the concept of Data locality.
  • 12.  Processing/Comput ation layer (MapReduce), and  Storage layer (Hadoop Distributed File System). source : hadoop tutorial on www.tutorialspoint.com
  • 13.  Hadoop framework allows the user to quickly write and test distributed systems. It is efficient, and it automatic distributes the data and work across the machines and in turn, utilizes the underlying parallelism of the CPU cores.  Hadoop does not rely on hardware to provide fault-tolerance and high availability (FTHA), rather Hadoop library itself has been designed to detect and handle failures at the application layer.  hadoop is designed to be self-healing.
  • 14.  HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.  It can be defined as, "A reliable, high bandwidth, low-cost, data-storage cluster that facilitates the management of related files across machines.”
  • 15. Basic architecture of HDFS source: J. Hurwitz, et al., “Big Data for Dummies,” Wiley, 2013, ISBN:978-1-118-50422-2.
  • 16. Replica placement source: Hadoop: The Definitive Guide, by Tom White, 2015, ISBN: 978-1-491-90163-2
  • 17.  Hadoop mapReduce is an implementation of mapReduce algorithm.  Map reduce is a batch query processor, and the ability to run an adhoc query against whole dataset and get the results in a reasonable time is TRANSFORMATIVE.
  • 18. source: J. Hurwitz, et al., “Big Data for Dummies,” Wiley, 2013, ISBN:978-1-118-50422-2.
  • 19.  Example of air temperature analysis.  Problems :  Dividing the work into equal size pieces is not easy.  Combininng the results from independent process may requirefurther processing.  The processing capacity of a single machine is limited. source: Hadoop: The Definitive Guide, by Tom White, 2015, ISBN: 978-1-491-90163-2
  • 20. source: Hadoop: The Definitive Guide, by Tom White, 2015, ISBN: 978-1-491-90163-2
  • 21. i. J. Hurwitz, et al., “Big Data for Dummies,” Wiley, 2013, ISBN:978-1-118-50422-2. ii. http://www.cse.wustl.edu/~jain/cse570- 13/ iii. Hadoop: The Definitive Guide, by Tom White, 2015, ISBN: 978-1-491-90163-2 iv. Hadoop tutorials on www.tutorialspoint.com

Editor's Notes

  1. There is a data explosion, according to an figure by IBM 2.5 quintilloins of data is created each day which is very huge amount of data. And to oracle ,in 2012 data growth rate was 40% compound annual rate. Data is growing exponentially.
  2. in the late 1960s, data was stored in flat files that imposed no structure Later in the 1970s, things changed with the invention of the relational data model and the relational database management system (RDBMS) that imposed structure and a method for improving performance Enterprise Content Management systems evolved in the 1980s to provide businesses with the capability to better manage unstructured data, mostly documents. In the 1990s with the rise of the web, organizations wanted to move beyond documents and store and manage web content, images, audio, and video. As with other waves in data management, big data is built on top of the evolution of data management practices over the past five decades. With big data, it is now possible to virtualize data so that it can be stored efficiently and, utilizing cloud-based storage, more cost-effectively as well.
  3. Volume: How much data Velocity: How fast that data is processed Variety: The various types of data Even more important is the fourth V: veracity. How accurate is that data in predicting business value? Variability : Inconsistency of the data set can hamper processes to handle and manage it.
  4. Data processing Data management Analytics
  5. Hadoop is designed to parallelize data processing across computing nodes to speed computations and hide latency. Hadoop has two primary layers: Mapreduce HDFS YARN Common utilities
  6. Servers can be added or removed from the cluster dynamically and Hadoop continues to operate without interruption. Or we can say, hadoop is able to detect changes, including failures, and adjust to the changes and continues to operate without interruption.
  7. Very large files “Very large” in this context means files that are hundreds of megabytes, gigabytes, or terabytes in size. There are Hadoop clusters running today that store petabytes of data. Streaming data access HDFS is built around the idea that the most efficient data processing pattern is a write-once, read-many-times pattern. A dataset is typically generated or copied from source, and then various analyses are performed on that dataset over time. Each analysis will involve a large proportion, if not all, of the dataset, so the time to read the whole dataset is more important than the latency in reading the first record. Commodity hardware Hadoop doesn’t require expensive, highly reliable hardware. It’s designed to run on clusters of commodity hardware (commonly available hardware that can be obtained from multiple vendors) for which the chance of node failure across the cluster is high, at least for large clusters. HDFS is designed to carry on working without a noticeable interruption to the user in the face of such failure. Low-latency data access Lots of small files Multiple writers, arbitrary file modifications
  8. placing replicas in different data centers may maximize redundancy, but at the cost of bandwidth. Even in the same data center (which is what all Hadoop clusters to date have run in), there are a variety of possible placement strategies. Data Hadoop’s default strategy is to place the first replica on the same node as the client (for clients running outside the cluster, a node is chosen at random, although the system tries not to pick nodes that are too full or too busy). The second replica is placed on a different rack from the first (off-rack), chosen at random. The third replica is placed on the same rack as the second, but on a different node chosen at random