Building A Smarter Planet
INTRODUCTION TO BIG DATA AND HADOOP
Presented By: Avishek Ghosh
ACADEMY OF TECHNOLOGY, ADISAPTAGRAM
BIG DATA AND HADOOP ECOSYSTEM
What is Big Data?
Big Data: a growing torrent of data
$600 buys a disk drive that can store all of the world's music
What Launched the Big Data Era?
Data Torrent
Computing Anytime, Anywhere
Big Data Era
Where Does Big Data Come From?
Machines, People, Organizations
The 3 major sources of Big Data
Machine-Generated Data: It’s Everywhere and There’s a Lot!
Big Plane -> Big Data?
More Data = More Safety
• Sensors: temperature, pressure, malfunctions
• Real-time problem detection
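The idea of real-time problem detection from sensors can be illustrated with a minimal sketch, assuming readings arrive one at a time and each sensor type has a fixed safe range. The sensor names, units, and limits below are hypothetical; real aircraft health monitoring uses far richer models than simple thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Reading:
    sensor: str   # hypothetical sensor type, e.g. "temperature" or "pressure"
    value: float


# Hypothetical safe operating ranges (temperature in deg C, pressure in kPa).
SAFE_RANGES = {
    "temperature": (-60.0, 120.0),
    "pressure": (20.0, 110.0),
}


def detect_problems(stream: Iterable[Reading]) -> Iterator[str]:
    """Yield an alert as soon as a reading falls outside its safe range."""
    for reading in stream:
        low, high = SAFE_RANGES[reading.sensor]
        if not low <= reading.value <= high:
            yield f"ALERT {reading.sensor}={reading.value} outside [{low}, {high}]"


if __name__ == "__main__":
    readings = [
        Reading("temperature", 85.0),
        Reading("pressure", 101.3),
        Reading("temperature", 135.5),
    ]
    for alert in detect_problems(readings):
        print(alert)  # prints: ALERT temperature=135.5 outside [-60.0, 120.0]
```

Because `detect_problems` is a generator, each alert is produced as soon as its reading is seen, rather than after the whole batch has been collected.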
Big Data Generated by People: The Unstructured Challenge
Text-heavy
Unstructured
Company     Data Processed Daily
eBay        100 petabytes (PB)
Google      100 PB
Facebook    30+ PB
Twitter     100 terabytes (TB)
Spotify     64 TB
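To compare the daily volumes in the table on one scale, the sketch below converts them to terabytes and ranks them. It assumes decimal SI units (1 PB = 1,000 TB) and treats Facebook's "30+ PB" as 30 PB; the figures are the slide's own, and the code only performs the conversion.

```python
TB_PER_PB = 1_000  # assumption: decimal SI units

# Daily volumes from the slide, converted to terabytes.
daily_volume_tb: dict[str, int] = {
    "eBay": 100 * TB_PER_PB,
    "Google": 100 * TB_PER_PB,
    "Facebook": 30 * TB_PER_PB,  # slide says "30+ PB"; 30 is a lower bound
    "Twitter": 100,
    "Spotify": 64,
}


def ranked(volumes: dict[str, int]) -> list[tuple[str, int]]:
    """Return (company, TB per day) pairs, largest first; ties keep input order."""
    return sorted(volumes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for company, tb in ranked(daily_volume_tb):
        print(f"{company:<9}{tb:>9,} TB/day")
```

On a common scale the spread is clearer: the petabyte-scale companies process roughly a thousand times more data per day than the terabyte-scale ones.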
The Unstructured Data Challenge
Structure
80%-90% of all data is unstructured!
Tools + Data + Skilled People -> Value
Organization-Generated Data: Structured but Often Siloed
• Commercial transactions
• Banking/stock records
• Credit cards
• Government
• Open data
• E-commerce
• Medical records
• …
Real-World Examples
16 million shipments per day
40 million tracking records
UPS is estimated to have 16 PB of data about its operations.
Can you guess how much money UPS can save by reducing each driver’s route by just 1 mile?
$50 million!
• How much are companies spending on Big Data?
• Benefits of using Big Data:
  Efficient operations
  Higher sales
  Improved safety
  Customer satisfaction
  Better profit margins
  Improved product placement
Characteristics of Big Data: The V’s of Big Data
Getting Started: Why Hadoop?
The Hadoop Ecosystem Is Great for Big Data
Major Goals
• Enable Scalability
• Optimize for a Variety of Data Types
• Facilitate a Shared Environment
• Provide Value
• Provide Fault Tolerance
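Fault tolerance in Hadoop largely comes from keeping several copies of each data block on different machines. The sketch below is a simplified illustration, not HDFS's actual placement policy (which is rack-aware): it assigns each block to three distinct nodes round-robin, then checks whether every block still has a live copy after some nodes fail. The block and node names are hypothetical.

```python
def place_replicas(blocks: list[str], nodes: list[str],
                   replication: int = 3) -> dict[str, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplification for illustration: real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement: dict[str, list[str]] = {}
    for index, block in enumerate(blocks):
        # Consecutive nodes starting at a shifting offset are always distinct.
        placement[block] = [nodes[(index + r) % len(nodes)]
                            for r in range(replication)]
    return placement


def readable_after_failure(placement: dict[str, list[str]],
                           failed: set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    layout = place_replicas(["b0", "b1", "b2", "b3", "b4"],
                            ["n1", "n2", "n3", "n4"])
    print(readable_after_failure(layout, {"n1", "n2"}))        # True
    print(readable_after_failure(layout, {"n1", "n2", "n3"}))  # False
```

With three replicas on distinct nodes, any two node failures still leave every block readable; losing three nodes can make some blocks unavailable, which is why a real cluster re-replicates blocks as soon as it detects a failed node.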
The Hadoop Ecosystem
Main Hadoop Components
MapReduce
YARN
HDFS
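Of the three components, HDFS stores the data, YARN manages cluster resources and schedules work, and MapReduce expresses a computation as a map phase, a shuffle that groups intermediate results by key, and a reduce phase. The sketch below runs the classic word-count example in a single Python process purely to show that data flow; it is not Hadoop's Java MapReduce API, and the function names are hypothetical. In a real cluster each phase runs as many parallel tasks scheduled by YARN.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key before reducing."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain map -> shuffle -> reduce over all input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data big value", "data everywhere"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'value': 1, 'everywhere': 1}
```

Because each map call looks at one line and each reduce call looks at one key, both phases can be split across machines independently; only the shuffle needs to move data between them.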
HDFS = Foundation for the Hadoop Ecosystem
What is HDFS? The Hadoop Distributed File System.
Up to 200 petabytes and 1 billion files and blocks!
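HDFS stores each file as a sequence of fixed-size blocks and replicates every block across the cluster. The sketch below estimates a single file's footprint, assuming the common defaults dfs.blocksize = 128 MB (134,217,728 bytes) and dfs.replication = 3; clusters can configure both, and the function name is hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # bytes; matches the dfs.blocksize default
REPLICATION = 3                 # matches the dfs.replication default


def hdfs_footprint(file_size: int, block_size: int = BLOCK_SIZE,
                   replication: int = REPLICATION) -> tuple[int, int, int]:
    """Return (blocks, block replicas, raw bytes stored) for one file.

    The last block only holds the remaining bytes, so raw storage is
    file_size * replication rather than blocks * block_size * replication.
    """
    blocks = (file_size + block_size - 1) // block_size  # ceiling division
    return blocks, blocks * replication, file_size * replication


if __name__ == "__main__":
    print(hdfs_footprint(1024 ** 3))  # (8, 24, 3221225472)
```

For example, a 1 GiB file becomes 8 blocks and 24 block replicas, occupying 3 GiB of raw disk. Because the final block holds only the leftover bytes, a 300 MiB file needs 3 blocks but only 900 MiB of raw storage, not 3 × 3 × 128 MiB.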
QUESTIONS?
SOURCES:
• University of California, San Diego (Super Computer)
• http://www.cloudera.com/
• http://www.ibm.com/big-data/us/en/
ACKNOWLEDGEMENTS:
I would like to thank Prof. Prasenjit Das for her cordial support and encouragement, which were key to this presentation. I also thank all the faculty of the CSE department for their support.