SOFTWARE DEVELOPMENT DONE RIGHT Netherlands | USA | India | UK | France
What is Big Data?
• Generally refers to data that traditional systems cannot process efficiently, mainly because of its size.
• Twitter/Facebook example: Facebook – 500 TB of data daily; Twitter – 250 million tweets daily.
• 90% of data has been generated in the last 2-3 years.
Big Data Sources
Sources:
• Social networking sites such as Twitter and Facebook
• Smartphones
• Trading platforms
• Machines
• Log files
This data is used for different purposes, such as:
• Product trends
• Market analysis
What is Hadoop?
• Apache Hadoop is a framework for running applications on large clusters built of commodity hardware.
• Transparently provides applications with both reliability and data motion.
• Implements a computational paradigm named Map/Reduce, in which the application is divided into small fragments of work.
• Provides a distributed file system (HDFS).
• Moves code close to the data.
• Hadoop opened the gates for processing Big Data.
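The Map/Reduce idea above can be illustrated with a minimal, single-process sketch in plain Python. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration. On a real cluster each input fragment would be processed on a different node, close to where HDFS stores it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(fragment: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input fragment."""
    for word in fragment.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(fragments: List[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over a list of input fragments."""
    pairs = (pair for fragment in fragments for pair in map_phase(fragment))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Two hypothetical fragments standing in for blocks of a large file.
    fragments = ["big data needs hadoop", "hadoop stores big data"]
    print(word_count(fragments))
```

Because each call to `map_phase` depends only on its own fragment, the fragments can be processed independently, which is what lets a framework spread the work over many machines.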
Hadoop's History
Hadoop is based on work done by Google:
• GFS – HDFS
• Google MapReduce – Hadoop MapReduce
• BigTable – HBase
Hadoop Features
• Partial failure support
• Data recoverability
• Component recovery
• Consistency
• Scalability
Hadoop Components
Core components:
• HDFS – Hadoop Distributed File System
• MapReduce
Projects in the Hadoop ecosystem:
• Pig, Hive, HBase, Flume, Oozie, Sqoop, etc.
Case Study
Product: data quality and cleansing product solutions.
Before Hadoop:
• Two-node DB cluster
• Multi-threaded Java application for de-duplication
• 1 million records took 10 hours to process
After Hadoop:
• 8 GB RAM, 4 cores, 4 machines in the cluster
• 1 million records took 30 minutes to process
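The improvement in the case study can be put into numbers with a small calculation. This sketch uses only the figures quoted above (1 million records, 10 hours before, 30 minutes after); the function names `throughput` and `speedup` are introduced here for illustration.

```python
RECORDS = 1_000_000
BEFORE_SECONDS = 10 * 60 * 60  # 10 hours on the two-node DB cluster
AFTER_SECONDS = 30 * 60        # 30 minutes on the 4-machine Hadoop cluster


def throughput(records: int, seconds: int) -> float:
    """Records processed per second."""
    return records / seconds


def speedup(before_seconds: int, after_seconds: int) -> float:
    """How many times faster the second run is than the first."""
    return before_seconds / after_seconds


if __name__ == "__main__":
    print(f"Before: {throughput(RECORDS, BEFORE_SECONDS):.1f} records/s")
    print(f"After:  {throughput(RECORDS, AFTER_SECONDS):.1f} records/s")
    print(f"Speedup: {speedup(BEFORE_SECONDS, AFTER_SECONDS):.0f}x")
```

With these figures the run time drops by a factor of 20, from roughly 28 to roughly 556 records per second.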
Hadoop in Use
• Any application that has more than 10 TB of data
• Needs fast and cheap processing
• Log analysis
• Recommendation engine
• Feed analysis
• Data mining
• Statistical analysis
• ETL processing
• Business intelligence
Cloudera
• Cloudera is “The commercial Hadoop company”.
• Founded by leading experts on Hadoop from Facebook, Google, Oracle and Yahoo.
• Provides consulting and training services for Hadoop users.
• Staff includes committers to virtually all Hadoop projects.
Resources
Books:
• Hadoop: The Definitive Guide (by Tom White)
• HBase: The Definitive Guide (by Lars George)
• MapReduce Design Patterns (by Donald Miner)
Web:
• http://hadoop.apache.org/
• http://hbase.apache.org/
• http://research.google.com/archive/bigtable.html
• http://research.google.com/archive/mapreduce-osdi04.pdf
Contact us @ Xebia India
Website:
• www.xebia.com
• www.xebia.in
• www.xebia.fr
Thought Leadership:
• http://blog.xebia.com
• http://podcast.xebia.com