
Hadoop



  1. SOFTWARE DEVELOPMENT DONE RIGHT Netherlands | USA | India | UK | France
  2. What is Big Data? Generally refers to data that traditional systems cannot process efficiently, mainly because of its size. Examples: Facebook generates 500 TB of data daily; Twitter handles 250 million tweets daily. 90% of data has been generated in the last 2-3 years.
  3. Big Data Sources: social networking sites such as Twitter and Facebook; smartphones; trading platforms; machines; log files. This data is used for purposes such as product trends and market analysis.
  4. What is Hadoop? Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. It transparently provides applications with both reliability and data motion. It implements a computational paradigm named Map/Reduce, in which the application is divided into small fragments of work. It provides a distributed file system (HDFS) and moves code close to the data. Hadoop opened the gates for processing Big Data.
  5. Hadoop's History: Hadoop is based on work done by Google. Each Google system has a Hadoop counterpart: GFS and HDFS; Google MapReduce and Hadoop MapReduce; BigTable and HBase.
  6. Hadoop Features: partial failure support, data recoverability, component recovery, consistency, scalability.
  7. Hadoop Components. Core components: HDFS (Hadoop Distributed File System) and Map/Reduce. Projects in the Hadoop ecosystem: Pig, Hive, HBase, Flume, Oozie, Sqoop, etc.
  8. HDFS
  9. Map/Reduce
  10. Case Study. Product: data quality and cleansing product solutions. Before Hadoop: two-node DB cluster; multi-threaded Java application for de-duplication; 1 million records took 10 hours to process. After Hadoop: 8 GB RAM, 4 cores, 4 machines in the cluster; 1 million records took 30 minutes to process, roughly 20 times faster.
  11. Hadoop In Use. Suited to any application that has more than 10 TB of data and needs fast, cheap processing. Typical uses: log analysis, recommendation engines, feed analysis, data mining, statistical analysis, ETL processing, business intelligence.
  12. Cloudera. Cloudera is “The commercial Hadoop company”. Founded by leading experts on Hadoop from Facebook, Google, Oracle and Yahoo. Provides consulting and training services for Hadoop users. Staff includes committers to virtually all Hadoop projects.
  13. Resources. Books: Hadoop: The Definitive Guide (by Tom White); HBase: The Definitive Guide (by Lars George); MapReduce Design Patterns (by Donald Miner). Web: http://hadoop.apache.org/ ; http://hbase.apache.org/ ; http://research.google.com/archive/bigtable.html ; http://research.google.com/archive/mapreduce-osdi04.pdf
  14. Contact us @ Xebia India. Website: www.xebia.com, www.xebia.in, www.xebia.fr. Thought Leadership: http://blog.xebia.com, http://podcast.xebia.com
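
Slide 4 describes Map/Reduce as dividing an application into small fragments of work. A minimal single-process sketch of that idea in Python follows. It is not Hadoop's API and not the code behind the case study on slide 10: `run_mapreduce`, the mappers, the reducers, and the sample records are all hypothetical names and data chosen for illustration. A real Hadoop job would run the map and reduce fragments on different machines, close to the data in HDFS.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Imitate the map, shuffle, and reduce phases in one process (illustrative only)."""
    groups = defaultdict(list)
    for record in records:
        # Map: each record is handled independently and emits (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: values that share a key are collected together.
            groups[key].append(value)
    # Reduce: each key's values are combined without looking at other keys.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    return sum(values)


def dedup_mapper(record):
    # Hypothetical normalization: trim whitespace and ignore case.
    yield record.strip().lower(), record


def first_reducer(key, values):
    # Keep the first record seen for each normalized key.
    return values[0]


lines = [
    "Hadoop moves code to data",
    "data lives in HDFS",
    "Hadoop runs on commodity hardware",
]
counts = run_mapreduce(lines, word_count_mapper, sum_reducer)

emails = ["Alice@example.com", "alice@example.com ", "bob@example.com"]
unique = run_mapreduce(emails, dedup_mapper, first_reducer)

print(counts["hadoop"], counts["data"], len(unique))
```

Because every mapper call looks at a single record and every reducer call looks at a single key, the fragments can be scheduled independently, which is what lets a cluster spread the work across machines.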
