Apache Hadoop Ecosystem (based on an exemplary data-driven…

An introduction to the Apache Hadoop ecosystem, told through the example of a data-driven company that needs to store and process large amounts of data.

Published in: Technology

  1. Vademecum Big Data
     Adam Kawa, Spotify, Compendium CE
  2. About Me: Spotify/Compendium, WHUG/SHUG, HakunaMapData.com, +2.5Y
  3. And The 20-Minute Story About ...
     Image source: http://www.containsmoderateperil.com/wp-content/uploads/2012/09/Dev-Diary-Epic-Story.jpg
  4. A Really Data-Driven Company …
     Image source: http://wwwimg.roku.com/hero-images/home2_1.jpg
  5. And Some Inevitable Problems ...
     Image source: http://www.digitalnewsasia.com/sites/default/files/images/digital%20economy/data%20explosion.jpg
  6. And Some Inevitable Problems ...
     Image source: http://p.alejka.pl/i2/p_new/36/42/grosz-na-szczescie-ze-zlota-m-1z-doskonaly-na-kazda-okazje_0_b.jpg
  7. And Some Inevitable Problems ...
     Image source: http://25.media.tumblr.com/d1038e7831eae86f5e84d0d09a2e6fad/tumblr_mfh5srmNAR1s06a3to1_500.jpg
  8. Start!
  9. The First Approach Works Fine ...
  10. Until Data Gets Bigger ...
  11. And More Diverse ...
  12. The Data Monster Becomes A Problem
      Image source: http://cloudtimes.org/wp-content/uploads/2012/05/big-data.jpg
  13. Apache Hadoop Becomes A Solution
      Image source: http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg
  14. Orchestra Of Nodes
      Image source: http://www.dsn.jhu.edu/images/orchestra.gif
  15. Fault-Tolerant Orchestra Of Nodes
  16. Untypical Orchestra Of Typical* Nodes
      * However, having very cheap nodes is a false economy.
  17. Highly Scalable Orchestra Of Nodes
  18. Hadoop Distributed File System (HDFS)
      Image source: http://www.wallcoo.net/car/Trucks/images/Big_Truck_on_Road_.jpg
  19. HDFS Blocks And Replication
  20. HDFS Self-Healing Features
      Image source: http://www.mwctoys.com/images/review_hydra_3.jpg
  21. HDFS Scales And Shines With MapReduce
      Image source: http://www.kkkp.pl/graph/gr_kdz_char3.jpg
  22. MapReduce Is A Change (slide labels: DATA; Map And Reduce)
      Image source: http://2.bp.blogspot.com/-Kl1ADjd3_7I/T6a8ZQV7ITI/AAAAAAAAKfE/qVyTQdJl2Do/s1600/make-big-changes-in-small-steps.png
  23. Map And Reduce Functions
  24. MapReduce Paradigm
  25. Artist Count Example
  26. Sending Computation To Data (slide labels: Data Is Here!; Computation)
      Image source: http://www.conservationmagazine.org/wp-content/uploads/2011/03/ElephantAndMouse1.jpg
  27. MapReduce Implementation
      Image source: http://i3.mirror.co.uk/incoming/article1360046.ece/ALTERNATES/s615/Male+drones+tend+to+honeycomb+cells+in+a+bee+colony
  28. First Success: 5-Node Hadoop Cluster
      Image source: http://www.smallbiztechnology.com/wp-content/uploads/2012/12/success.jpg
  29. Apache Whirr And The Cloud
      ===== hadoop.properties =============
      whirr.cluster-name=production_cluster
      whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,4 hadoop-datanode+hadoop-tasktracker
      # provider: aws-ec2, or cloudservers-us for Rackspace
      whirr.provider=aws-ec2
      ...
      =====================================
      $ whirr launch-cluster --config hadoop.properties
      $ whirr destroy-cluster --config hadoop.properties
  30. First Sad (Non-Java Speaking) Developers
      Image source: http://www.shivayanaturals.com/wp-content/uploads/2012/01/Unhappy.jpg
  31. Hadoop Streaming For Scripting Languages
      Image source: http://www.mightystreamradio.com/PHOTOS/STREAM%20PHOTO%202.jpg
  32. Apache Hive Makes You Feel Younger
      Image source: http://majapszczolka.blox.pl/resource/Pszczolka_Maja_Baje_Pl_6.jpg
  33. Speak ~SQL, But Run As MapReduce
  34. HUE - Browser-Based Environment
      Image source: http://www.sentric.ch/wp-content/uploads/2013/01/Create-table-in-Hive.png
  35. Hive Is Based On & Limited By Hadoop
  36. Apache Pig Makes Them Happier!
      Image source: http://vetnolimits.files.wordpress.com/2012/02/pumba.jpg
  37. Pig Accelerates Development
  38. Need To Add More Relational Data To HDFS
      Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
  39. SQL To Hadoop = Sqoop
      Image source: http://3.bp.blogspot.com/_uuOo8x3WXWE/SuNV4y7qzeI/AAAAAAAAkYM/6RUExOMQPno/s400/pumpkin_eating_elephant.jpg
  40. Sqoop Import/Export Data Using MR
      Image source: http://blog.cloudera.com/blog/2011/10/apache-sqoop-overview/
  41. Apache Oozie For Defining Workflows
      Image source: Apache Oozie website
  42. Apache Oozie For Scheduling
      Image source: http://risingtechies.files.wordpress.com/2012/05/schedule.jpg
  43. Need To Add Even More Logs To HDFS
      Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
  44. Apache Flume For Data Collection, e.g. JDBC, Memory, File
      Image source: Apache Flume website
  45. How To Manage A Larger Cluster
  46. Apache Avro + Snappy/Deflate_6
      Image source: http://www.funkydiva.pl/wp-content/uploads/2012/10/lego-tapety-na-pulpit-duze-zdjecia-16.jpg
  47. When Latency Is Too High
      Image source: http://www.pharmacyowners.com/Portals/37772/images/It-can-be-a-LONG-wait-at-the-pharmacy-resized-600.jpg
  48. Cloudera Impala – Real-Time ~SQL Queries
      Image source: http://static.cargurus.com/images/site/2010/07/02/12/24/1969_chevrolet_impala-pic-2868587530424686499.jpeg
  49. Apache HBase - Random, Real-Time Access To Big Data
      Image source: http://www.superhqwallpapers.com/wp-content/uploads/2012/01/Super-Ferrari.jpg
  50. YARN – Hadoop Cluster More Robust
      Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
  51. Hadoop Is Successfully Deployed
      Image source: http://bogdankipko.com/wp-content/uploads/2012/03/lessons-learned.jpg
  52. Learn More About Apache Hadoop?
  53. Use Hadoop To Solve Real-World Problems?
  54. Oozie And YARN At WHUG, Today @18:00
  55. Thank You! Any Questions About Them?
      Image source: http://xn--gryprzegldarkowe-43b.com.pl/wp-content/uploads/2012/05/me-free-zoo1.jpg
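
The transcript keeps only the titles of the "Map And Reduce Functions", "MapReduce Paradigm" and "Artist Count Example" slides, so the sketch below is an illustration of the general paradigm rather than the code shown in the talk. It assumes a hypothetical play log with one `user<TAB>artist` record per line and counts plays per artist; the names `SAMPLE_LOG`, `map_fn`, `shuffle`, `reduce_fn` and `run_job` are made up for this example, and the whole job runs in a single Python process instead of on a Hadoop cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: one "user<TAB>artist" record per played track.
SAMPLE_LOG = [
    "alice\tColdplay",
    "bob\tOasis",
    "carol\tColdplay",
    "alice\tOasis",
    "bob\tColdplay",
]


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit an (artist, 1) pair for every play record."""
    _user, artist = line.split("\t")
    yield artist, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    On a real cluster the framework does this between map and reduce.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(artist: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one artist."""
    return artist, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in-process, in key order."""
    mapped = (pair for line in lines for pair in map_fn(line))
    grouped = shuffle(mapped)
    return dict(reduce_fn(artist, counts)
                for artist, counts in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))  # {'Coldplay': 3, 'Oasis': 2}
```

The map and reduce functions only see one record or one key at a time, which is what lets a framework such as Hadoop run many copies of them in parallel on the nodes that hold the data.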
