What is Big data?
• ‘Big Data’ is similar to ‘small data’, but bigger in size.
• Big data is a term for data sets that are so large or complex
that traditional data processing applications are inadequate
to deal with them.
Evolution of Technology
Conventional Systems to Smart Systems
• Telephone to Mobile
• Desktop to Cloud
• Car to Smart Car
Social Media
• 204,000,000 emails
• 1,736,111 pics
• 4,166,667 likes and 200,000 pics
• 300 hours of video uploaded
• 347,222 tweets
Three Vs of Big Data
• Velocity: data speed
• Volume: data quantity
• Variety: data types
Volume
[Chart: data volume by year, 2010 to 2020; vertical axis from 10,000 to 40,000]
The 4.4 zettabytes of data that exist today will grow to 44 zettabytes,
or 44 trillion gigabytes, by 2020.
A large amount of data is generated every second.
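
The unit conversion on this slide can be checked with a few lines of Python. This is only a sketch, assuming decimal (SI) units where one zettabyte is 10^21 bytes and one gigabyte is 10^9 bytes; the function and variable names are illustrative.

```python
# Assumption: decimal (SI) prefixes, not binary ones (ZiB/GiB).
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

def zettabytes_to_gigabytes(zb):
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * BYTES_PER_ZB // BYTES_PER_GB

projected_gb = zettabytes_to_gigabytes(44)
growth_factor = 44 / 4.4   # projected volume relative to today's volume

print(projected_gb)        # 44000000000000, i.e. 44 trillion GB
print(round(growth_factor))
```

Under these assumptions, the projection on the slide corresponds to roughly a tenfold increase in volume.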
Variety
Different kinds of data are being generated from various sources:
Structured, Semi-Structured, Unstructured
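
To make the three categories concrete, here is a small Python sketch with one made-up sample of each kind. The sample records and field names are hypothetical and are not taken from the slides.

```python
import csv
import io
import json

# Structured: fixed columns, fits a relational table (hypothetical sample row).
structured = io.StringIO("user_id,age,city\n42,31,Pune\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing, fields may vary per record (hypothetical JSON).
semi_structured = json.loads(
    '{"user_id": 42, "tags": ["hadoop", "hdfs"], "profile": {"city": "Pune"}}'
)

# Unstructured: free text with no predefined fields.
unstructured = "Loved the talk on Hadoop, the HDFS demo was great!"

print(rows[0]["city"])                      # column lookup by name
print(semi_structured["profile"]["city"])   # nested key lookup
print(len(unstructured.split()))            # only crude processing is possible without a schema
```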
Velocity
Mainframe → Client/Server → Internet → Mobile, Social Media, Cloud …
Data is being generated at an alarming rate.
Every 60 Seconds
• 100,000+ tweets
• 650,000+ status updates
• 11,000,000+ instant chats
• 698,445+ Google searches
• 168,000,000+ emails
• 217+ new users
Problems with Big Data
Problem 1: Storing exponentially growing huge data sets
Solution: A distributed file system
Problem 2: Processing data that has a complex structure
Solution: Storage that does not impose any particular schema on the
data.
Problem 3: Processing data faster
Hadoop is the solution
Hadoop is a framework that allows us to store and process large data sets of
different types in a parallel and distributed fashion.
• HDFS (Storage): allows various data formats to be stored across a data cluster
• MapReduce (Processing): allows parallel processing of the data stored in HDFS
History of Hadoop
• Hadoop was created by computer scientists Doug Cutting and
Mike Cafarella in 2005.
• It was inspired by Google's MapReduce, a software framework
in which an application is broken down into numerous small
parts.
• Doug named it after his son’s toy elephant.
Hadoop Distributed File System
HDFS creates a layer of abstraction over the cluster's resources, so that the
whole of HDFS can be viewed as a single unit.
HDFS has two core components: NameNode and DataNode.
The NameNode is the main node; it holds the metadata about the stored data.
The data itself is stored on the DataNodes, which are commodity hardware in the
distributed environment.
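
The division of labour between the two components can be illustrated with a toy model in Python. This is a sketch only: plain dictionaries stand in for the nodes, blocks are placed round-robin, and replication is left out. None of the names below belong to the real Hadoop API.

```python
# Toy model: dictionaries stand in for the NameNode and the DataNodes.
data_nodes = {"dn1": {}, "dn2": {}, "dn3": {}}   # node name -> {block_id: bytes}
name_node = {}                                    # file path -> [(block_id, node name)]

def put(path, blocks):
    """Store each block on a DataNode (round-robin) and record its location in the NameNode."""
    node_names = sorted(data_nodes)
    name_node[path] = []
    for i, block in enumerate(blocks):
        node = node_names[i % len(node_names)]
        block_id = f"{path}#blk{i}"
        data_nodes[node][block_id] = block          # actual bytes live on the DataNode
        name_node[path].append((block_id, node))    # NameNode keeps metadata only

def get(path):
    """Look up block locations in the NameNode, then read the bytes from the DataNodes."""
    return b"".join(data_nodes[node][block_id] for block_id, node in name_node[path])

put("/logs/a.txt", [b"hello ", b"big ", b"data ", b"world"])
print(name_node["/logs/a.txt"])   # metadata: which block sits on which node
print(get("/logs/a.txt"))         # the client sees a single file
```

The point of the sketch is that the client reads one file, while the metadata lookup and the block reads happen on different nodes.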
Storing Data (Solution is HDFS)
Storage unit of Hadoop
It is a distributed file system
Divides files (input data) into smaller chunks and stores them across the cluster
Scalable as per requirement
Example: a 512 MB file is distributed as four 128 MB blocks
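
The 512 MB example can be reproduced with a short Python sketch. It assumes the 128 MB block size used on the slide; the function name is illustrative, and the second call shows how a file that is not an exact multiple of the block size ends with a smaller final block.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # block size used in the slide's example

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file of file_size bytes is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)   # the last block holds only the leftover bytes
    return sizes

print([s // MB for s in split_into_blocks(512 * MB)])  # [128, 128, 128, 128]
print([s // MB for s in split_into_blocks(300 * MB)])  # [128, 128, 44]
```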
WORD COUNT USING MapReduce
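
The slide's diagram is not available in text form, so the following is a minimal single-process sketch of the word count idea in plain Python rather than actual Hadoop code. The function names and the sample input are assumptions made for illustration; on a real cluster the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

sample = ["Deer Bear River", "Car Car River", "Deer Car Bear"]  # hypothetical input
print(word_count(sample))
```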
Big data analytics.
