What is Big Data?
According to Wikipedia (http://en.wikipedia.org/wiki/Big_data), the definition of Big Data is: "In information technology, Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools."
Workload distribution across installations
Pig and Hive play an important role in the Hadoop ecosystem.
http://www.cloudera.com/blog/2012/09/what-do-real-life-hadoop-workloads-look-like/
Different Big Data scenarios
Scenario | Is Hadoop good for it? | What are the alternatives?
Real time processing | No | HStreaming, Twitter Storm
Iterative processing | No | Apache Hama, Apache Giraph, Jung
Ad hoc interactive querying | No | Apache Drill, OpenDremel
Batch processing | Yes |
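Batch processing, the one scenario above where Hadoop is a good fit, typically follows a map, shuffle, reduce pattern. The sketch below imitates that pattern in plain Python on a handful of log lines. It is not the Hadoop API: the sample log format and the names map_phase, shuffle, reduce_phase and run_job are assumptions made up for illustration, and a real job would run the same steps distributed across a cluster.

```python
from collections import defaultdict

# Hypothetical sample log lines; the "METHOD PATH STATUS" format is an
# assumption for illustration only.
LOG_LINES = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /index.html 200",
]


def map_phase(line):
    """Emit one (status_code, 1) pair for a single log line."""
    status = line.split()[-1]
    yield status, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

Because each map call looks at one line and each reduce call looks at one key, the work can be split across many machines, which is why this style suits batch jobs and not real time or iterative workloads.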
How have Big Data frameworks evolved?
There has been a gap of 4-5 years between Google releasing a paper and us seeing an implementation of it.
Google Paper | Apache Component
The Google File System (October, 2003) | HDFS (became Apache TLP in 2008)
MapReduce: Simplified Data Processing on Large Clusters (December, 2004) | MapReduce (became Apache TLP in 2008)
Bigtable: A Distributed Storage System for Structured Data (November, 2006) | HBase (became Apache TLP in 2010), Cassandra (became Apache TLP in 2010)
Large-scale graph computing at Google (June, 2009) | Hama, Giraph (became Apache TLP in 2012)
Dremel: Interactive Analysis of Web-Scale Datasets (2010) | Apache Drill (incubated in August, 2012)
Spanner: Google's Globally-Distributed Database (September, 2012) | ????
What happens to the data once it is stored?
If you aren’t taking advantage of big data, then you don’t have big data; you just have a pile of data.
Descriptive analytics
 - What happened?
 - When did it happen?
 - What was its impact?
Predictive and prescriptive analytics
 - Why did it happen?
 - When will it happen again?
 - What caused it to happen?
 - What can be done to avoid it?
Evolution of Big Data use cases
Hadoop grew out of the massive text processing requirements of Web 2.0 companies such as Yahoo and Google, for example:
 - log processing
 - search index
 - recommendations
 - context based advertising
Use cases now span: Ads & E-commerce, Astronomy, Social Networks, Bioinformatics/Medical Informatics, Machine Translation, Spatial Data Processing, Information Extraction and Text Processing, Artificial Intelligence/Machine Learning/Data Mining, Search Query Analysis, Information Retrieval (Search), Spam & Malware Detection, Image and Video Processing, Networking, Simulation, Statistics, Numerical Mathematics, Sets & Graphs.
http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may-2011/
A few of the Big Data use cases
 - The World Bank kicked off an initiative to improve sanitation and water that would impact 1B people.
 - Neural networks for breast cancer (prize by Google).
 - Fraud detection in the financial industry.
 - Predictive maintenance scheduling (for example, aircraft engines).
 - Walmart and Sears Holdings use POS information to decide which products to stock in their stores, and also for SCM.
 - Customer profiling and segmentation for targeted campaigns.
Follow the competitions on Kaggle for more use cases.
Democratization of Education
From Machine Learning to Music:
 - https://www.coursera.org/
 - http://www.udacity.com/
 - http://www.khanacademy.org/
 - http://www.youtube.com/user/nptelhrd/
 - https://www.edx.org/
Keep Looking Out
There is a lot more than Hadoop; some of these technologies are mature and some are still evolving!