Big Data: Hype or Necessity
Presented by: Swapnaja Tandale
BE CSE (WIT, Solapur)
Introduction:
• Big data
• Why study big data?
• The three V’s
• Data analysis and storage
Big Data technology:
• Hadoop
  ◦ HDFS
  ◦ MapReduce
Conclusion
Big data refers to datasets that grow so large that they become difficult to capture, store, manage, share, analyze and visualize with typical database software tools.
Big data is generated by:
• Social media and networks
• Scientific instruments
• Mobile devices
• Sensor technology and networks
Why analyze it? Big data analytics examines large amounts of data of different types in an effort to uncover hidden patterns, unknown correlations and other useful information. In short, big data is a goldmine.
Big data examples
• 5+ billion people on the Web by the end of 2014
• 30 billion RFID tags today (1.3 billion in 2005)
• 4.6 billion camera phones worldwide
• Hundreds of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009, 200 million by 2014
• 12+ TB of tweet data every day
• 10 billion people (1 PB)
• ? TB of data every day
• Volume: the amount of data is large.
• Velocity:
  ◦ How fast data becomes available for analysis
  ◦ How fast you can use the data
• Variety:
  ◦ Structured
  ◦ Semi-structured
  ◦ Unstructured
Other V’s: veracity, variability, visualization, value.
Data volume
• 44x increase from 2009 to 2020: from 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially.
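As a rough check on the figures above, the jump from 0.8 ZB to 35 ZB can be converted into an implied yearly growth rate. This is only a sketch: the helper name `annual_growth_factor` and the assumption of steady compound growth over the whole period are illustrative and do not come from the slides.

```python
def annual_growth_factor(start: float, end: float, years: int) -> float:
    """Constant yearly multiplier that turns `start` into `end` after `years` years."""
    return (end / start) ** (1 / years)


start_zb, end_zb = 0.8, 35.0   # figures quoted on the slide, in zettabytes
years = 2020 - 2009            # 11 years between the two data points

total_factor = end_zb / start_zb                        # overall increase
yearly = annual_growth_factor(start_zb, end_zb, years)  # assumed steady rate

print(f"overall increase: {total_factor:.1f}x")
print(f"implied yearly growth factor: {yearly:.2f}x")
```

Under that assumption the overall factor comes out close to the quoted 44x, and the yearly multiplier shows what "exponential" means here: the same percentage increase compounding every year.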
Structured data
• Pre-defined schema imposed on the data
• Highly patterned structure
• Usually stored in a relational database system
Examples:
• Numbers: 20, 3.14
• Strings: “Hello World”
• Dates: 08/04/2014
Roughly 20% of all data is structured.
Semi-structured data
• Inconsistent structure
• Typically cannot be stored in database tables
• Information often comes as self-describing label/value pairs
• No fixed data model
Examples:
• XML: Extensible Markup Language
• SGML: Standard Generalized Markup Language
• Logs: catlogs, web logs, graph logs
• Tweets
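To make the idea of self-describing label/value pairs concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The sample records and the field names (`name`, `email`, `phone`) are invented for illustration; the point is only that each record carries its own labels, so two records need not share the same fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two records whose fields differ from each other.
SAMPLE = """
<people>
  <person><name>Asha</name><email>asha@example.com</email></person>
  <person><name>Ravi</name><phone>555-0100</phone></person>
</people>
"""


def to_pairs(xml_text: str) -> list[dict[str, str]]:
    """Turn each child of the root element into a dict of label -> value."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in person} for person in root]


records = to_pairs(SAMPLE)
for record in records:
    print(record)
```

A relational table would need a fixed set of columns for these two records; here each record simply lists the labels it has.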
Unstructured data
• Data does not reside in any fixed form, such as rows and columns
• The opposite of structured data
Examples:
• Multimedia: videos, photos, audio, files
• Email and messages
• Presentations and reports
• Free-form text
• Word processing documents
According to experts, 80-90% of the data in any organization is unstructured.
Data analysis and storage
• Storage capacity of hard drives has increased massively over the years.
• Access speeds have not kept up.
Year   Storage capacity   Transfer speed   Time to read the whole drive
1990   1,370 MB           4.4 MB/s         about 5 min
2010   1 TB               100 MB/s         more than 2.5 hr
• The problem and its solution: Big Data technology.
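The read times in the table can be reproduced with simple arithmetic. The sketch below assumes the transfer speeds are megabytes per second and uses decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes). The function name `read_time_seconds` and the 100-drive scenario at the end are illustrative assumptions, not figures from the slides.

```python
MB = 10**6    # decimal megabyte, in bytes
TB = 10**12   # decimal terabyte, in bytes


def read_time_seconds(capacity_bytes: float, speed_bytes_per_s: float) -> float:
    """Time to read an entire drive sequentially at a constant transfer speed."""
    return capacity_bytes / speed_bytes_per_s


t_1990 = read_time_seconds(1370 * MB, 4.4 * MB)
t_2010 = read_time_seconds(1 * TB, 100 * MB)

print(f"1990 drive: {t_1990 / 60:.1f} minutes")
print(f"2010 drive: {t_2010 / 3600:.1f} hours")

# Hypothetical scenario: the same 1 TB spread evenly over 100 drives
# that are read in parallel, each holding one hundredth of the data.
drives = 100
print(f"2010 data on {drives} drives: {t_2010 / drives / 60:.1f} minutes")
```

The last line hints at why spreading data across many drives helps: capacity grew far faster than transfer speed, so reading in parallel is the practical way to bring the read time back down.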
Hadoop to the rescue!
Apache Hadoop is a framework for running applications on large clusters built of commodity hardware.
With many machines in use, hardware failure becomes the first problem. A common way of avoiding data loss is replication, and the Hadoop Distributed Filesystem (HDFS) takes care of this.
The second problem, combining data spread across many machines for analysis, is solved by a simple programming model: MapReduce. Hadoop is a popular open-source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets.
HDFS
“Moving computation is cheaper than moving data”
HDFS is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes).
• Redundant copies of the data are kept by the system so that, in the event of a failure, another copy is available.
• Portable across heterogeneous hardware and software platforms.
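The redundancy idea can be illustrated with a toy model in Python. This is not HDFS code and does not use any Hadoop API: the round-robin placement, the node names and the helper functions `place_blocks` and `readable_blocks` are all assumptions made for the sketch. Real HDFS uses its own placement policy; the only detail borrowed from it is the common default of three copies per block.

```python
def place_blocks(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Toy placement: put each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict[int, list[str]], failed: set[str]) -> set[int]:
    """Blocks that still have at least one copy on a node that has not failed."""
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


nodes = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical cluster
placement = place_blocks(6, nodes)

survivors = readable_blocks(placement, failed={"node-b"})
print("all blocks readable after one failure:", survivors == set(placement))
```

With three copies of each block, losing a single node leaves every block readable. Setting `replication=1` in the same model shows the contrast: every block stored on the failed node is lost.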
MapReduce
• MapReduce is a programming model and an associated implementation for processing and generating large data sets.
• Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines.
MAP: a map function processes a key/value pair to generate a set of intermediate key/value pairs.
REDUCE: a reduce function merges all intermediate values associated with the same intermediate key.
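The map and reduce roles described above can be sketched with the classic word-count example. The Python below runs in a single process and does not use Hadoop or any of its APIs; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the sample documents are illustrative assumptions. On a real cluster, the framework would run many map and reduce calls in parallel and perform the grouping step itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: take one (key, value) pair and emit intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (done by the framework on a cluster)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def word_count(docs: dict[str, str]) -> dict[str, int]:
    intermediate = [
        pair for doc_id, text in docs.items() for pair in map_words(doc_id, text)
    ]
    return dict(reduce_counts(word, counts) for word, counts in shuffle(intermediate).items())


docs = {"d1": "big data is big", "d2": "data is gold"}   # hypothetical input
counts = word_count(docs)
print(counts)
```

Because each map call looks at only one document and each reduce call at only one key, the calls are independent of one another, which is what allows a framework to parallelize them automatically.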
References
o Tom White, Hadoop: The Definitive Guide, O’Reilly / Yahoo! Press, 2009.
o http://en.wikipedia.org/wiki/Big_data
o www.technologyreview.in/featured-story/401775/10-emerging-technologies-that-will-change-the/
Conclusion
Big data is a key to innovation and has high potential for value creation. There are huge opportunities, for example in healthcare, location-related data, retail, manufacturing and social data. There are also challenges, for example in data volume, data quality, data capture and data management, including privacy, security and governance.
Thank you
