Vademecum Big Data
Adam Kawa, Spotify, Compendium CE
About Me
Spotify/Compendium, WHUG/SHUG, HakunaMapData.com, +2.5Y
And The 20-Minute Story About ...




Image source: http://www.containsmoderateperil.com/wp-content/uploads/2012/09/Dev-Diary-Epic-Story.jpg
A Really Data-Driven Company …




Image source: http://wwwimg.roku.com/hero-images/home2_1.jpg
And Some Inevitable Problems ...




Image source: http://www.digitalnewsasia.com/sites/default/files/images/digital%20economy/data%20explosion.jpg
And Some Inevitable Problems ...




Image source: http://p.alejka.pl/i2/p_new/36/42/grosz-na-szczescie-ze-zlota-m-1z-doskonaly-na-kazda-okazje_0_b.jpg
And Some Inevitable Problems ...




Image source: http://25.media.tumblr.com/d1038e7831eae86f5e84d0d09a2e6fad/tumblr_mfh5srmNAR1s06a3to1_500.jpg
Start!
The First Approach Works Fine ...
Until Data Gets Bigger ...
And More Diverse ...
The Data Monster Becomes A Problem




Image source: http://cloudtimes.org/wp-content/uploads/2012/05/big-data.jpg
Apache Hadoop Becomes A Solution




Image source: http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg
Orchestra Of Nodes




Image source: http://www.dsn.jhu.edu/images/orchestra.gif
Fault-Tolerant Orchestra Of Nodes
Atypical Orchestra Of Typical* Nodes
* However, buying very cheap nodes is a false economy
Highly Scalable Orchestra Of Nodes
Hadoop Distributed File System (HDFS)




Image source: http://www.wallcoo.net/car/Trucks/images/Big_Truck_on_Road_.jpg
HDFS Blocks And Replication
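The block-and-replication arithmetic can be sketched in a few lines of Python. This is an illustration only: the 128 MB block size and the replication factor of 3 are assumed values (both are configurable per cluster), and `hdfs_footprint` is a hypothetical helper, not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 128      # assumed block size; configurable per cluster
REPLICATION_FACTOR = 3   # assumed replication factor; configurable per file

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (blocks, block replicas, raw MB stored) for one file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different nodes.
    replicas = blocks * replication
    # A partial last block only occupies its actual size on disk.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb

print(hdfs_footprint(1000))  # (8, 24, 3000)
```

Under these assumptions a 1000 MB file becomes 8 blocks, stored as 24 block replicas that take about 3000 MB of raw disk across the cluster.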
HDFS Self-Healing Features




Image source: http://www.mwctoys.com/images/review_hydra_3.jpg
HDFS Scales And Shines With MapReduce




Image source: http://www.kkkp.pl/graph/gr_kdz_char3.jpg
MapReduce Is A Change
Slide labels: DATA; Map And Reduce
Image source: http://2.bp.blogspot.com/-Kl1ADjd3_7I/T6a8ZQV7ITI/AAAAAAAAKfE/qVyTQdJl2Do/s1600/make-big-changes-in-small-steps.png
Map And Reduce Functions
MapReduce Paradigm
Artist Count Example
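A minimal sketch of an artist count in the MapReduce style, run locally in plain Python rather than on Hadoop. The record format (`user<TAB>artist`) and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are assumptions made for illustration; the slides do not show the actual data layout.

```python
from collections import defaultdict

def map_fn(record):
    """Map: emit (artist, 1) for one play record 'user<TAB>artist' (assumed format)."""
    _user, artist = record.split("\t")
    yield artist, 1

def reduce_fn(artist, counts):
    """Reduce: sum all the 1s emitted for a single artist."""
    yield artist, sum(counts)

def run_mapreduce(records):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)      # shuffle: collect values per key
    result = {}
    for key in sorted(groups):             # reducers see keys in sorted order
        for out_key, out_value in reduce_fn(key, groups[key]):
            result[out_key] = out_value
    return result

plays = ["u1\tColdplay", "u2\tQueen", "u3\tColdplay"]
print(run_mapreduce(plays))  # {'Coldplay': 2, 'Queen': 1}
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the framework performs the grouping step between map and reduce.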
Sending Computation To Data
Slide labels: Data Is Here!; Computation
Image source: http://www.conservationmagazine.org/wp-content/uploads/2011/03/ElephantAndMouse1.jpg
MapReduce Implementation




Image source: http://i3.mirror.co.uk/incoming/article1360046.ece/ALTERNATES/s615/Male+drones+tend+to+honeycomb+cells+in+a+bee+colony
First Success: 5-Node Hadoop Cluster




Image source: http://www.smallbiztechnology.com/wp-content/uploads/2012/12/success.jpg
Apache Whirr And The Cloud
===== hadoop.properties =============
whirr.cluster-name=production_cluster
# One master node and four worker nodes; a trailing backslash continues the value
whirr.instance-templates=\
  1 hadoop-jobtracker+hadoop-namenode,\
  4 hadoop-datanode+hadoop-tasktracker
# Use cloudservers-us instead for Rackspace
whirr.provider=aws-ec2
...
=====================================

$ whirr launch-cluster --config hadoop.properties
$ whirr destroy-cluster --config hadoop.properties
First Sad (Non-Java Speaking) Developers




Image source: http://www.shivayanaturals.com/wp-content/uploads/2012/01/Unhappy.jpg
Hadoop Streaming For Scripting Languages




Image source: http://www.mightystreamradio.com/PHOTOS/STREAM%20PHOTO%202.jpg
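Hadoop Streaming lets the mapper and the reducer be any programs that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below shows what such scripts could look like in Python for a per-artist play count. The input format (`user<TAB>artist`) and all function names are assumptions for illustration, and the stages are exercised locally here instead of being submitted with the streaming jar's `-mapper` and `-reducer` options.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit 'artist<TAB>1' for each 'user<TAB>artist' input line (assumed format)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2:               # skip malformed records
            yield f"{fields[1]}\t1"

def reducer(sorted_lines):
    """Sum counts per artist; input must be sorted by key, as Streaming delivers it."""
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for artist, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{artist}\t{total}"

def run_stage(stage, stdin=sys.stdin, stdout=sys.stdout):
    """In a real Streaming script, call this with mapper or reducer."""
    for out in stage(stdin):
        stdout.write(out + "\n")

# Local simulation: sorted() stands in for Hadoop's shuffle-and-sort phase.
lines = ["u1\tColdplay\n", "u2\tQueen\n", "u3\tColdplay\n"]
shuffled = sorted(mapper(lines))
print(list(reducer(shuffled)))  # ['Coldplay\t2', 'Queen\t1']
```

Because the reducer only sees a sorted stream of lines rather than grouped values, it has to detect key boundaries itself, which is what `groupby` does here.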
Apache Hive Makes You Feel Younger




Image source: http://majapszczolka.blox.pl/resource/Pszczolka_Maja_Baje_Pl_6.jpg
Speak ~SQL, But Run As MapReduce
HUE - Browser-Based Environment




Image source: http://www.sentric.ch/wp-content/uploads/2013/01/Create-table-in-Hive.png
Hive Is Based On & Limited By Hadoop
Apache Pig Makes Them Happier!


                        




Image source: http://vetnolimits.files.wordpress.com/2012/02/pumba.jpg
Pig Accelerates Development


        
Need To Add More Relational Data To HDFS




Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
SQL To Hadoop = Sqoop




Image source: http://3.bp.blogspot.com/_uuOo8x3WXWE/SuNV4y7qzeI/AAAAAAAAkYM/6RUExOMQPno/s400/pumpkin_eating_elephant.jpg
Sqoop Import/Export Data Using MR




Image source: http://blog.cloudera.com/blog/2011/10/apache-sqoop-overview/
Apache Oozie For Defining Workflows




Image source: Apache Oozie website
Apache Oozie For Scheduling




Image source: http://risingtechies.files.wordpress.com/2012/05/schedule.jpg
Need To Add Even More Logs To HDFS




Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
Apache Flume For Data Collection
Channel types: e.g. JDBC, Memory, File




Image source: Apache Flume website
How To Manage A Larger Cluster
Apache Avro + Snappy/Deflate_6




Image source: http://www.funkydiva.pl/wp-content/uploads/2012/10/lego-tapety-na-pulpit-duze-zdjecia-16.jpg
When Latency Is Too High




Image source: http://www.pharmacyowners.com/Portals/37772/images/It-can-be-a-LONG-wait-at-the-pharmacy-resized-600.jpg
Cloudera Impala – Real-Time ~SQL Queries




Image source: http://static.cargurus.com/images/site/2010/07/02/12/24/1969_chevrolet_impala-pic-2868587530424686499.jpeg
Apache HBase - Random, Real-Time Access To Big Data




Image source: http://www.superhqwallpapers.com/wp-content/uploads/2012/01/Super-Ferrari.jpg
YARN – A More Robust Hadoop Cluster




Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
Hadoop Is Successfully Deployed




Image source: http://bogdankipko.com/wp-content/uploads/2012/03/lessons-learned.jpg
Learn More About Apache Hadoop?
Use Hadoop To Solve Real-World Problems?
Oozie And YARN At WHUG, Today @18:00
Thank You! Any Questions About Them?




Image source: http://xn--gryprzegldarkowe-43b.com.pl/wp-content/uploads/2012/05/me-free-zoo1.jpg
Apache Hadoop Ecosystem (based on an exemplary data-driven…
