Apache Hadoop Ecosystem (based on an exemplary data-driven…
 


An introduction to the Apache Hadoop ecosystem, told through the story of an exemplary data-driven company that wants to store and process large amounts of data.


Usage Rights

© All Rights Reserved


    Presentation Transcript

    • Vademecum Big Data (Adam Kawa, Spotify, Compendium CE)
    • About Me: Spotify/Compendium, WHUG/SHUG, HakunaMapData.com, +2.5Y
    • And The 20-Minute Story About ... (Image source: http://www.containsmoderateperil.com/wp-content/uploads/2012/09/Dev-Diary-Epic-Story.jpg)
    • A Really Data-Driven Company … (Image source: http://wwwimg.roku.com/hero-images/home2_1.jpg)
    • And Some Inevitable Problems ... (Image source: http://www.digitalnewsasia.com/sites/default/files/images/digital%20economy/data%20explosion.jpg)
    • And Some Inevitable Problems ... (Image source: http://p.alejka.pl/i2/p_new/36/42/grosz-na-szczescie-ze-zlota-m-1z-doskonaly-na-kazda-okazje_0_b.jpg)
    • And Some Inevitable Problems ... (Image source: http://25.media.tumblr.com/d1038e7831eae86f5e84d0d09a2e6fad/tumblr_mfh5srmNAR1s06a3to1_500.jpg)
    • Start!
    • The First Approach Works Fine ...
    • Until Data Gets Bigger ...
    • And More Diverse ...
    • The Data Monster Becomes A Problem (Image source: http://cloudtimes.org/wp-content/uploads/2012/05/big-data.jpg)
    • Apache Hadoop Becomes A Solution (Image source: http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg)
    • Orchestra Of Nodes (Image source: http://www.dsn.jhu.edu/images/orchestra.gif)
    • Fault-Tolerant Orchestra Of Nodes
    • Untypical Orchestra Of Typical* Nodes (* however, having very cheap nodes is a false economy)
    • Highly Scalable Orchestra Of Nodes
    • Hadoop Distributed File System (HDFS) (Image source: http://www.wallcoo.net/car/Trucks/images/Big_Truck_on_Road_.jpg)
    • HDFS Blocks And Replication
    • HDFS Self-Healing Features (Image source: http://www.mwctoys.com/images/review_hydra_3.jpg)
    • HDFS Scales And Shines With MapReduce (Image source: http://www.kkkp.pl/graph/gr_kdz_char3.jpg)
    • MapReduce Is A Change DATA Map And Reduce (Image source: http://2.bp.blogspot.com/-Kl1ADjd3_7I/T6a8ZQV7ITI/AAAAAAAAKfE/qVyTQdJl2Do/s1600/make-big-changes-in-small-steps.png)
    • Map And Reduce Functions
    • MapReduce Paradigm
    • Artist Count Example
    • Sending Computation To Data (slide labels: "Data Is Here!", "Computation") (Image source: http://www.conservationmagazine.org/wp-content/uploads/2011/03/ElephantAndMouse1.jpg)
    • MapReduce Implementation (Image source: http://i3.mirror.co.uk/incoming/article1360046.ece/ALTERNATES/s615/Male+drones+tend+to+honeycomb+cells+in+a+bee+colony)
    • First Success: 5-Node Hadoop Cluster (Image source: http://www.smallbiztechnology.com/wp-content/uploads/2012/12/success.jpg)
    • Apache Whirr And The Cloud
      ===== hadoop.properties =============
      whirr.cluster-name=production_cluster
      whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,4 hadoop-datanode+hadoop-tasktracker
      whirr.provider=aws-ec2 # or Rackspace cloudservers-us
      ...
      =====================================
      $ whirr launch-cluster --config hadoop.properties
      $ whirr destroy-cluster --config hadoop.properties
    • First Sad (Non-Java Speaking) Developers (Image source: http://www.shivayanaturals.com/wp-content/uploads/2012/01/Unhappy.jpg)
    • Hadoop Streaming For Scripting Languages (Image source: http://www.mightystreamradio.com/PHOTOS/STREAM%20PHOTO%202.jpg)
    • Apache Hive Makes You Feel Younger (Image source: http://majapszczolka.blox.pl/resource/Pszczolka_Maja_Baje_Pl_6.jpg)
    • Speak ~SQL, But Run As MapReduce
    • HUE - Browser-Based Environment (Image source: http://www.sentric.ch/wp-content/uploads/2013/01/Create-table-in-Hive.png)
    • Hive Is Based On & Limited By Hadoop
    • Apache Pig Makes Them Happier! (Image source: http://vetnolimits.files.wordpress.com/2012/02/pumba.jpg)
    • Pig Accelerates Development
    • Need To Add More Relational Data To HDFS (Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/)
    • SQL To Hadoop = Sqoop (Image source: http://3.bp.blogspot.com/_uuOo8x3WXWE/SuNV4y7qzeI/AAAAAAAAkYM/6RUExOMQPno/s400/pumpkin_eating_elephant.jpg)
    • Sqoop Import/Export Data Using MR (Image source: http://blog.cloudera.com/blog/2011/10/apache-sqoop-overview/)
    • Apache Oozie For Defining Workflows (Image source: Apache Oozie website)
    • Apache Oozie For Scheduling (Image source: http://risingtechies.files.wordpress.com/2012/05/schedule.jpg)
    • Need To Add Even More Logs To HDFS (Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/)
    • Apache Flume For Data Collection (e.g. JDBC, Memory, File) (Image source: Apache Flume website)
    • How To Manage A Larger Cluster
    • Apache Avro + Snappy/Deflate_6 (Image source: http://www.funkydiva.pl/wp-content/uploads/2012/10/lego-tapety-na-pulpit-duze-zdjecia-16.jpg)
    • When Latency Is Too High (Image source: http://www.pharmacyowners.com/Portals/37772/images/It-can-be-a-LONG-wait-at-the-pharmacy-resized-600.jpg)
    • Cloudera Impala – Real-Time ~SQL Queries (Image source: http://static.cargurus.com/images/site/2010/07/02/12/24/1969_chevrolet_impala-pic-2868587530424686499.jpeg)
    • Apache HBase - Random, Real-Time Access To Big Data (Image source: http://www.superhqwallpapers.com/wp-content/uploads/2012/01/Super-Ferrari.jpg)
    • YARN – Hadoop Cluster More Robust (Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg)
    • Hadoop Is Successfully Deployed (Image source: http://bogdankipko.com/wp-content/uploads/2012/03/lessons-learned.jpg)
    • Learn More About Apache Hadoop?
    • Use Hadoop To Solve Real-World Problems?
    • Oozie And YARN At WHUG, Today @18:00
    • Thank You! Any Questions About Them? (Image source: http://xn--gryprzegldarkowe-43b.com.pl/wp-content/uploads/2012/05/me-free-zoo1.jpg)
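
The transcript only names the "Map And Reduce Functions", "MapReduce Paradigm" and "Artist Count Example" slides; their contents are not preserved. The following is a minimal sketch of what such an artist count might look like, written in plain Python and run locally rather than on a cluster. The input format (tab-separated `user_id`, `artist`, `track`) and the names `map_fn`, `reduce_fn` and `run_job` are assumptions made for illustration, not taken from the slides or from any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (artist, 1) for one log line.

    Assumed (hypothetical) format: user_id<TAB>artist<TAB>track
    """
    fields = line.rstrip("\n").split("\t")
    # Skip malformed lines that have no artist field.
    if len(fields) >= 2 and fields[1]:
        yield fields[1], 1


def reduce_fn(artist: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Sum all the 1s emitted for a single artist."""
    return artist, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Local stand-in for the framework: map, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # the "shuffle" step: group values by key
    # Keys are processed in sorted order, as reducers would see them.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    sample = [
        "u1\tColdplay\tYellow",
        "u2\tOceana\tEndless Summer",
        "u1\tColdplay\tClocks",
    ]
    for artist, count in run_job(sample).items():
        print(f"{artist}\t{count}")
```

On a real cluster the grouping step is performed by the framework between the map and reduce phases; here `run_job` imitates it in memory so the two functions can be tried out on a handful of lines.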