Apache Hadoop Ecosystem (based on an exemplary data-driven…

Introduction to Apache Hadoop Ecosystem based on some exemplary data-driven company that wants to store and process large amounts of data.

© All Rights Reserved


    Apache Hadoop Ecosystem (based on an exemplary data-driven… Presentation Transcript

    • Vademecum Big Data. Adam Kawa, Spotify, Compendium CE
    • About Me: Spotify/Compendium, WHUG/SHUG, HakunaMapData.com, +2.5Y
    • And The 20-Minute Story About ... (Image source: http://www.containsmoderateperil.com/wp-content/uploads/2012/09/Dev-Diary-Epic-Story.jpg)
    • A Really Data-Driven Company … (Image source: http://wwwimg.roku.com/hero-images/home2_1.jpg)
    • And Some Inevitable Problems ... (Image source: http://www.digitalnewsasia.com/sites/default/files/images/digital%20economy/data%20explosion.jpg)
    • And Some Inevitable Problems ... (Image source: http://p.alejka.pl/i2/p_new/36/42/grosz-na-szczescie-ze-zlota-m-1z-doskonaly-na-kazda-okazje_0_b.jpg)
    • And Some Inevitable Problems ... (Image source: http://25.media.tumblr.com/d1038e7831eae86f5e84d0d09a2e6fad/tumblr_mfh5srmNAR1s06a3to1_500.jpg)
    • Start!
    • The First Approach Works Fine ...
    • Until Data Gets Bigger ...
    • And More Diverse ...
    • The Data Monster Becomes A Problem (Image source: http://cloudtimes.org/wp-content/uploads/2012/05/big-data.jpg)
    • Apache Hadoop Becomes A Solution (Image source: http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg)
    • Orchestra Of Nodes (Image source: http://www.dsn.jhu.edu/images/orchestra.gif)
    • Fault-Tolerant Orchestra Of Nodes
    • Untypical Orchestra Of Typical* Nodes (* however, having very cheap nodes is a false economy)
    • Highly Scalable Orchestra Of Nodes
    • Hadoop Distributed File System (HDFS) (Image source: http://www.wallcoo.net/car/Trucks/images/Big_Truck_on_Road_.jpg)
    • HDFS Blocks And Replication
    • HDFS Self-Healing Features (Image source: http://www.mwctoys.com/images/review_hydra_3.jpg)
    • HDFS Scales And Shines With MapReduce (Image source: http://www.kkkp.pl/graph/gr_kdz_char3.jpg)
    • MapReduce Is A Change (slide labels: DATA, Map And Reduce) (Image source: http://2.bp.blogspot.com/-Kl1ADjd3_7I/T6a8ZQV7ITI/AAAAAAAAKfE/qVyTQdJl2Do/s1600/make-big-changes-in-small-steps.png)
    • Map And Reduce Functions
    • MapReduce Paradigm
    • Artist Count Example
    • Sending Computation To Data (slide labels: "Data Is Here!", "Computation") (Image source: http://www.conservationmagazine.org/wp-content/uploads/2011/03/ElephantAndMouse1.jpg)
    • MapReduce Implementation (Image source: http://i3.mirror.co.uk/incoming/article1360046.ece/ALTERNATES/s615/Male+drones+tend+to+honeycomb+cells+in+a+bee+colony)
    • First Success: 5-Node Hadoop Cluster (Image source: http://www.smallbiztechnology.com/wp-content/uploads/2012/12/success.jpg)
    • Apache Whirr And The Cloud
      ===== hadoop.properties =============
      whirr.cluster-name=production_cluster
      whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,4 hadoop-datanode+hadoop-tasktracker
      whirr.provider=aws-ec2 # or Rackspace cloudservers-us
      ...
      =====================================
      $ whirr launch-cluster --config hadoop.properties
      $ whirr destroy-cluster --config hadoop.properties
    • First Sad (Non-Java Speaking) Developers (Image source: http://www.shivayanaturals.com/wp-content/uploads/2012/01/Unhappy.jpg)
    • Hadoop Streaming For Scripting Languages (Image source: http://www.mightystreamradio.com/PHOTOS/STREAM%20PHOTO%202.jpg)
    • Apache Hive Makes You Feel Younger (Image source: http://majapszczolka.blox.pl/resource/Pszczolka_Maja_Baje_Pl_6.jpg)
    • Speak ~SQL, But Run As MapReduce
    • HUE - Browser-Based Environment (Image source: http://www.sentric.ch/wp-content/uploads/2013/01/Create-table-in-Hive.png)
    • Hive Is Based On & Limited By Hadoop
    • Apache Pig Makes Them Happier! (Image source: http://vetnolimits.files.wordpress.com/2012/02/pumba.jpg)
    • Pig Accelerates Development
    • Need To Add More Relational Data To HDFS (Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/)
    • SQL To Hadoop = Sqoop (Image source: http://3.bp.blogspot.com/_uuOo8x3WXWE/SuNV4y7qzeI/AAAAAAAAkYM/6RUExOMQPno/s400/pumpkin_eating_elephant.jpg)
    • Sqoop Import/Export Data Using MR (Image source: http://blog.cloudera.com/blog/2011/10/apache-sqoop-overview/)
    • Apache Oozie For Defining Workflows (Image source: Apache Oozie website)
    • Apache Oozie For Scheduling (Image source: http://risingtechies.files.wordpress.com/2012/05/schedule.jpg)
    • Need To Add Even More Logs To HDFS (Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/)
    • Apache Flume For Data Collection, e.g. JDBC, Memory, File (Image source: Apache Flume website)
    • How To Manage A Larger Cluster
    • Apache Avro + Snappy/Deflate_6 (Image source: http://www.funkydiva.pl/wp-content/uploads/2012/10/lego-tapety-na-pulpit-duze-zdjecia-16.jpg)
    • When Latency Is Too High (Image source: http://www.pharmacyowners.com/Portals/37772/images/It-can-be-a-LONG-wait-at-the-pharmacy-resized-600.jpg)
    • Cloudera Impala – Real-Time ~SQL Queries (Image source: http://static.cargurus.com/images/site/2010/07/02/12/24/1969_chevrolet_impala-pic-2868587530424686499.jpeg)
    • Apache HBase - Random, Real-Time Access To Big Data (Image source: http://www.superhqwallpapers.com/wp-content/uploads/2012/01/Super-Ferrari.jpg)
    • YARN – Hadoop Cluster More Robust (Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg)
    • Hadoop Is Successfully Deployed (Image source: http://bogdankipko.com/wp-content/uploads/2012/03/lessons-learned.jpg)
    • Learn More About Apache Hadoop?
    • Use Hadoop To Solve Real-World Problems?
    • Oozie And YARN At WHUG, Today @18:00
    • Thank You! Any Questions About Them? (Image source: http://xn--gryprzegldarkowe-43b.com.pl/wp-content/uploads/2012/05/me-free-zoo1.jpg)
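
The transcript names the "Map And Reduce Functions", "MapReduce Paradigm" and "Artist Count Example" slides but their code did not survive extraction. As a rough illustration of the paradigm only, here is a minimal sketch in plain Python that simulates the map, shuffle and reduce phases in a single process. The input format (`user<TAB>artist` per line), the sample data, and the names `map_fn`, `reduce_fn` and `run_job` are all assumptions made for this example; they are not taken from the slides and this is not Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (artist, 1) for one line of the assumed form 'user<TAB>artist'."""
    parts = line.rstrip("\n").split("\t")
    # Skip malformed lines instead of failing the whole job.
    if len(parts) == 2 and parts[1]:
        yield parts[1], 1


def reduce_fn(artist: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: sum all the 1s emitted for a single artist."""
    return artist, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # shuffle: collect values per key
    # Process keys in sorted order, as a reducer would receive them.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    # Hypothetical sample data, invented for this sketch.
    sample = ["u1\tColdplay", "u2\tOceania", "u1\tColdplay", "bad line"]
    for artist, count in run_job(sample).items():
        print(f"{artist}\t{count}")
```

On a real cluster the framework performs the grouping step between the two functions and runs many mappers and reducers in parallel on the nodes that hold the data; the two functions themselves stay about this small.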