3. What's Big Data?
According to Wikipedia (http://en.wikipedia.org/wiki/Big_data), Big Data is defined as follows:
"In information technology, Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools."
5. Workload distribution across installations
Pig and Hive play an important role in the Hadoop ecosystem.
http://www.cloudera.com/blog/2012/09/what-do-real-life-hadoop-workloads-look-like/
6. Different Big Data scenarios
Scenario                      Is Hadoop good for it?   What are the alternatives?
Real-time processing          No                       HStreaming, Twitter Storm
Iterative processing          No                       Apache Hama, Apache Giraph, JUNG
Ad hoc interactive querying   No                       Apache Drill, OpenDremel
Batch processing              Yes
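Batch processing, the one scenario above where Hadoop fits, follows the MapReduce model: a map step emits key/value pairs, the framework groups them by key (the shuffle), and a reduce step aggregates each group. The sketch below simulates those three phases in a single Python process for a word count. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and this is not Hadoop's API, which runs the same phases distributed across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical helpers: a single-process imitation of the MapReduce model,
# not Hadoop's actual (Java) API.


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum all the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over a batch of lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data is big", "data is data"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can run in parallel on different machines; that independence is what lets a framework like Hadoop scale batch jobs.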
7. How have Big Data frameworks evolved?
Google paper -> Apache component
- The Google File System (October 2003) -> HDFS (became an Apache TLP in 2008)
- MapReduce: Simplified Data Processing on Large Clusters (December 2004) -> MapReduce (became an Apache TLP in 2008)
- Bigtable: A Distributed Storage System for Structured Data (November 2006) -> HBase (became an Apache TLP in 2010), Cassandra (became an Apache TLP in 2010)
- Large-scale graph computing at Google (June 2009) -> Hama, Giraph (became Apache TLPs in 2012)
- Dremel: Interactive Analysis of Web-Scale Datasets (2010) -> Apache Drill (incubated in August 2012)
- Spanner: Google's Globally-Distributed Database (September 2012) -> ????
There has been a 4-5 year gap between Google releasing a paper and us seeing an implementation of it.
8. What happens to the data once it is stored?
If you aren’t taking advantage of big data, then you don’t have big data; you just have a pile of data.
Descriptive analytics
- What happened?
- When did it happen?
- What was its impact?
Predictive and Prescriptive analytics
- Why did it happen?
- When will it happen again?
- What caused it to happen?
- What can be done to avoid it?
9. Evolution of Big Data use cases
Hadoop evolved out of the massive text-processing requirements of Web 2.0 companies such as Yahoo and Google, for example:
- log processing
- search index
- recommendations
- context-based advertising
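As an illustration of the "search index" requirement, here is a minimal single-process sketch of building an inverted index in the map/reduce style. The function names (`map_doc`, `reduce_term`, `build_index`, `search`) are hypothetical, the grouping loop stands in for the framework's shuffle, and this is not how Hadoop or any particular search engine actually implements indexing.

```python
from collections import defaultdict
from typing import Iterator

# Hypothetical helpers illustrating an inverted index built map/reduce style.


def map_doc(doc_id: int, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit (term, doc_id) once for each distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_term(term: str, doc_ids: list[int]) -> tuple[str, list[int]]:
    """Reduce: turn the collected ids into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map every document, group by term (the 'shuffle'), then reduce each term."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in docs.items():
        for term, d in map_doc(doc_id, text):
            grouped[term].append(d)
    return dict(reduce_term(t, ids) for t, ids in grouped.items())


def search(index: dict[str, list[int]], query: str) -> list[int]:
    """AND query: return ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    docs = {
        1: "hadoop stores big data",
        2: "hive queries data",
        3: "pig scripts run on hadoop",
    }
    index = build_index(docs)
    print(index["data"], index["hadoop"], search(index, "hadoop data"))
```

With the three sample documents, the posting list for "data" is [1, 2], for "hadoop" is [1, 3], and the query "hadoop data" matches only document 1.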
Since then, Big Data use cases have spread to many other domains:
Ads & E-commerce, Astronomy, Social Networks, Bioinformatics/Medical Informatics, Machine Translation,
Spatial Data Processing, Information Extraction and Text Processing,
Artificial Intelligence/Machine Learning/Data Mining, Search Query Analysis, Information Retrieval (Search),
Spam & Malware Detection, Image and Video Processing, Networking,
Simulation, Statistics, Numerical Mathematics, Sets & Graphs
http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may-2011/
10. A few Big Data use cases
The World Bank kicked off an initiative to improve sanitation and water, which would impact 1B people.
Neural networks for breast cancer diagnosis, which won a prize from Google.
Fraud detection in the financial industry.
Predictive maintenance scheduling (e.g., aircraft engines).
Walmart and Sears Holdings use POS information to stock different products in their stores and also for SCM (supply chain management).
Customer profiling and segmentation for targeted campaigns.
Follow the competitions on Kaggle for more use cases.
11. Democratization of Education
https://www.coursera.org/
http://www.udacity.com/
http://www.khanacademy.org/
http://www.youtube.com/user/nptelhrd/
https://www.edx.org/
Courses range from Machine Learning to Music.
12. Keep Looking Out
There is a lot more out there than Hadoop; some of these frameworks are mature and some are still evolving!