Apache Hadoop Ecosystem (based on an exemplary data-driven…


An introduction to the Apache Hadoop ecosystem, following an exemplary data-driven company that wants to store and process large amounts of data.


Transcript

  • 1. Vademecum Big Data. Adam Kawa, Spotify, Compendium CE
  • 2. About Me: Spotify/Compendium, WHUG/SHUG, HakunaMapData.com, +2.5Y
  • 3. And The 20-Minute Story About ... (Image source: http://www.containsmoderateperil.com/wp-content/uploads/2012/09/Dev-Diary-Epic-Story.jpg)
  • 4. A Really Data-Driven Company … (Image source: http://wwwimg.roku.com/hero-images/home2_1.jpg)
  • 5. And Some Inevitable Problems ... (Image source: http://www.digitalnewsasia.com/sites/default/files/images/digital%20economy/data%20explosion.jpg)
  • 6. And Some Inevitable Problems ... (Image source: http://p.alejka.pl/i2/p_new/36/42/grosz-na-szczescie-ze-zlota-m-1z-doskonaly-na-kazda-okazje_0_b.jpg)
  • 7. And Some Inevitable Problems ... (Image source: http://25.media.tumblr.com/d1038e7831eae86f5e84d0d09a2e6fad/tumblr_mfh5srmNAR1s06a3to1_500.jpg)
  • 8. Start!
  • 9. The First Approach Works Fine ...
  • 10. Until Data Gets Bigger ...
  • 11. And More Diverse ...
  • 12. The Data Monster Becomes A Problem (Image source: http://cloudtimes.org/wp-content/uploads/2012/05/big-data.jpg)
  • 13. Apache Hadoop Becomes A Solution (Image source: http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg)
  • 14. Orchestra Of Nodes (Image source: http://www.dsn.jhu.edu/images/orchestra.gif)
  • 15. Fault-Tolerant Orchestra Of Nodes
  • 16. Untypical Orchestra Of Typical* Nodes (* however, having very cheap nodes is a false economy)
  • 17. Highly Scalable Orchestra Of Nodes
  • 18. Hadoop Distributed File System (HDFS) (Image source: http://www.wallcoo.net/car/Trucks/images/Big_Truck_on_Road_.jpg)
  • 19. HDFS Blocks And Replication
  • 20. HDFS Self-Healing Features (Image source: http://www.mwctoys.com/images/review_hydra_3.jpg)
  • 21. HDFS Scales And Shines With MapReduce (Image source: http://www.kkkp.pl/graph/gr_kdz_char3.jpg)
  • 22. MapReduce Is A Change (slide labels: DATA; Map And Reduce) (Image source: http://2.bp.blogspot.com/-Kl1ADjd3_7I/T6a8ZQV7ITI/AAAAAAAAKfE/qVyTQdJl2Do/s1600/make-big-changes-in-small-steps.png)
  • 23. Map And Reduce Functions
  • 24. MapReduce Paradigm
  • 25. Artist Count Example
  • 26. Sending Computation To Data (slide labels: Data Is Here!; Computation) (Image source: http://www.conservationmagazine.org/wp-content/uploads/2011/03/ElephantAndMouse1.jpg)
  • 27. MapReduce Implementation (Image source: http://i3.mirror.co.uk/incoming/article1360046.ece/ALTERNATES/s615/Male+drones+tend+to+honeycomb+cells+in+a+bee+colony)
  • 28. First Success: 5-Node Hadoop Cluster (Image source: http://www.smallbiztechnology.com/wp-content/uploads/2012/12/success.jpg)
  • 29. Apache Whirr And The Cloud
        ===== hadoop.properties =============
        whirr.cluster-name=production_cluster
        whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,4 hadoop-datanode+hadoop-tasktracker
        # or Rackspace: cloudservers-us
        whirr.provider=aws-ec2
        ...
        =====================================
        $ whirr launch-cluster --config hadoop.properties
        $ whirr destroy-cluster --config hadoop.properties
  • 30. First Sad (Non-Java Speaking) Developers (Image source: http://www.shivayanaturals.com/wp-content/uploads/2012/01/Unhappy.jpg)
  • 31. Hadoop Streaming For Scripting Languages (Image source: http://www.mightystreamradio.com/PHOTOS/STREAM%20PHOTO%202.jpg)
  • 32. Apache Hive Makes You Feel Younger (Image source: http://majapszczolka.blox.pl/resource/Pszczolka_Maja_Baje_Pl_6.jpg)
  • 33. Speak ~SQL, But Run As MapReduce
  • 34. HUE - Browser-Based Environment (Image source: http://www.sentric.ch/wp-content/uploads/2013/01/Create-table-in-Hive.png)
  • 35. Hive Is Based On & Limited By Hadoop
  • 36. Apache Pig Makes Them Happier! (Image source: http://vetnolimits.files.wordpress.com/2012/02/pumba.jpg)
  • 37. Pig Accelerates Development
  • 38. Need To Add More Relational Data To HDFS (Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/)
  • 39. SQL To Hadoop = Sqoop (Image source: http://3.bp.blogspot.com/_uuOo8x3WXWE/SuNV4y7qzeI/AAAAAAAAkYM/6RUExOMQPno/s400/pumpkin_eating_elephant.jpg)
  • 40. Sqoop Import/Export Data Using MR (Image source: http://blog.cloudera.com/blog/2011/10/apache-sqoop-overview/)
  • 41. Apache Oozie For Defining Workflows (Image source: Apache Oozie website)
  • 42. Apache Oozie For Scheduling (Image source: http://risingtechies.files.wordpress.com/2012/05/schedule.jpg)
  • 43. Need To Add Even More Logs To HDFS (Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/)
  • 44. Apache Flume For Data Collection, e.g. JDBC, Memory, File (Image source: Apache Flume website)
  • 45. How To Manage A Larger Cluster
  • 46. Apache Avro + Snappy/Deflate_6 (Image source: http://www.funkydiva.pl/wp-content/uploads/2012/10/lego-tapety-na-pulpit-duze-zdjecia-16.jpg)
  • 47. When Latency Is Too High (Image source: http://www.pharmacyowners.com/Portals/37772/images/It-can-be-a-LONG-wait-at-the-pharmacy-resized-600.jpg)
  • 48. Cloudera Impala – Real-Time ~SQL Queries (Image source: http://static.cargurus.com/images/site/2010/07/02/12/24/1969_chevrolet_impala-pic-2868587530424686499.jpeg)
  • 49. Apache HBase - Random, Real-Time Access To Big Data (Image source: http://www.superhqwallpapers.com/wp-content/uploads/2012/01/Super-Ferrari.jpg)
  • 50. YARN – Hadoop Cluster More Robust (Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg)
  • 51. Hadoop Is Successfully Deployed (Image source: http://bogdankipko.com/wp-content/uploads/2012/03/lessons-learned.jpg)
  • 52. Learn More About Apache Hadoop?
  • 53. Use Hadoop To Solve Real-World Problems?
  • 54. Oozie And YARN At WHUG, Today @18:00
  • 55. Thank You! Any Questions About Them? (Image source: http://xn--gryprzegldarkowe-43b.com.pl/wp-content/uploads/2012/05/me-free-zoo1.jpg)
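
Slides 23 to 25 name the map and reduce functions and an artist-count example, but the transcript carries no code for them. Below is a minimal sketch of how such a count might look, assuming a hypothetical tab-separated play log with the fields user, artist and track. The record layout and the names map_fn, reduce_fn and run_mapreduce are illustrative and do not come from the slides. The sketch runs in plain Python and simulates the shuffle step locally; it does not use Hadoop itself.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(record):
    """Map step: emit (artist, 1) for one play-log record.

    Assumed (hypothetical) record layout: user<TAB>artist<TAB>track.
    """
    _user, artist, _track = record.split("\t")
    yield artist, 1


def reduce_fn(artist, counts):
    """Reduce step: sum all the counts emitted for one artist."""
    yield artist, sum(counts)


def run_mapreduce(records):
    """Local stand-in for a MapReduce job: map, shuffle (sort and group), reduce."""
    # Map phase: every input record yields zero or more (key, value) pairs.
    pairs = [pair for record in records for pair in map_fn(record)]

    # Shuffle phase: bring all values that share a key together.
    pairs.sort(key=itemgetter(0))

    # Reduce phase: one reduce call per distinct key.
    result = {}
    for artist, group in groupby(pairs, key=itemgetter(0)):
        for key, total in reduce_fn(artist, (count for _, count in group)):
            result[key] = total
    return result


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    logs = [
        "u1\tColdplay\tYellow",
        "u2\tOasis\tWonderwall",
        "u3\tColdplay\tClocks",
    ]
    for artist, plays in sorted(run_mapreduce(logs).items()):
        print(artist, plays)
```

On a real cluster the framework performs the sort-and-group step between the two functions, and the map calls run on the nodes that hold the data blocks, which is the point of slide 26.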