NoSQL & Big Data Analytics: History, Hype, Opportunities

analyze(NoSQL,BigData);
/* history, hype, opportunities */

// By: Vishy Poosala
// Head of Bell Labs, India
// poosala@alcatel-lucent.com
// @vishyp
1

The dark ages of COBOL

2

..then Codd said
let there be tables

Rows &
Columns

Normal Forms
SQL

ACID

3

www.data-for-humans.com

WHAT COLUMNS?
SET-VALUED ATTRIBUTES

Schema Evolution
XML

4

Billions of Keys & Values

GFS

Google
Big Table

Hadoop

Cassandra
Dynamo

5

How would you build a super-fast,
FB-scale chat service, in 2012?

(for example)

6

I want my own DB!
• Main Memory: Memcached, redis
• Distr. K-V: MongoDB
• Versions: CouchDB
• Social Graphs: Neo4j

7

Era        60’s    80-96      96-’07            ’07-
BIG        KB      GB         TB                PB
Data       FILES   TABLES     Semi-Structured   Variety, Dynamic
Analytics  STATS   OLAP Cube  Apps              Mahout
Language   COBOL   SQL        XML               NoSQL

8

Following *AMAZING* Slides Courtesy: Gregory Piatetsky-Shapiro, kdnuggets.com

You can find all the slides from his talk at:

http://www.slideshare.net/gpiatetskyshapiro/analytics-and-data-mining-industry-overview

9

Data Tsunami
• In 2010 enterprises
stored 7 exabytes
= 7,000,000,000 GB
of new data (McKinsey)
• 90 percent of the
world's data has been
generated in the past
two years (IBM)
Image with apologies to KDD-2011
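
The exabyte figure above can be sanity-checked with a few lines of Python. This is a minimal sketch, assuming decimal (SI) units, where 1 GB = 10^9 bytes and 1 EB = 10^18 bytes. The function name `exabytes_to_gigabytes` is made up for illustration and does not come from the McKinsey report.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption;
# binary units such as GiB/EiB would give a different ratio).
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18


def exabytes_to_gigabytes(exabytes: int) -> int:
    """Convert a whole number of exabytes to gigabytes (SI prefixes)."""
    return exabytes * BYTES_PER_EB // BYTES_PER_GB


new_data_gb = exabytes_to_gigabytes(7)
print(f"7 EB = {new_data_gb:,} GB")
```

With these units, one exabyte is a billion gigabytes, which is where the 7,000,000,000 GB on the slide comes from.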
10

Pre-history

Statistics is the most frequent term in the 20th century, but
data mining and analytics appear only in the late
1990s
From Google Ngram viewer – English language books
Note: Our analysis uses only English language data.
Other languages, especially Chinese, need to be considered for the full picture
11

Recent History:
Analytics, Data Mining, Knowledge Discovery

Analytics has been used since 1800, but started to rise in 2005
Data Mining jumps around 1996 (soon after the first KDD conference) but declines after
2003 (TIA controversy, associated with government invasion of privacy).
Knowledge Discovery appears in 1989, jumps in 1996, and plateaus after 2000
12

Google Trends:
After 2006, Data Mining < Analytics

13

Google Insights: searches for
data mining, analytics -google
are most popular in India, US

14

Analytics > Data Mining > Data
Science

15

Data Science, Big Data

16

Data Types Analyzed/Mined

www.KDnuggets.com/polls/2011/data-types-analyzed-mined.html
17

Largest Dataset Analyzed?
2011 median dataset
size ~10-20 GB,
vs 8-10 GB in 2010.

Increase in
10 GB to 1 PB range

www.KDnuggets.com/polls/2011/largest-dataset-analyzed-data-mined.html
18

Which methods/algorithms did you
use for data analysis in 2011
(bar chart: % of analysts who used each method; bar values not preserved)

Decision Trees
Regression
Clustering
Statistics
Visualization
Time series/Sequence analysis
Support Vector (SVM)
Association rules
Ensemble methods
Text Mining
Neural Nets
Boosting
Bayesian
Bagging
Factor Analysis
Anomaly/Deviation detection
Social Network Analysis
Survival Analysis
Genetic algorithms
Uplift modeling

www.KDnuggets.com/polls/2011/algorithms-analytics-data-mining.html
19

Cloud Analytics is not common
(yet)

www.KDnuggets.com/polls/2011/algorithms-analytics-data-mining.html
20

Shortage of Skills
• McKinsey: shortage by 2018 in the US of
– 140,000-190,000 people with deep analytical skills

– 1.5 M managers/analysts with the know-how
to use the analysis of big data to make
effective decisions.

Source:
www.mckinsey.com/mgi/publications/big_data/
21

Job data: Data Scientist

22

Jobs: Data Mining >> Data
Scientist

23

“Ground” Analytics (LinkedIn
Skills)
~ 75,000 with Data Mining skill

~ 7,000 with Predictive Modeling

Also
~ 20,000 with Predictive
Analytics
(not related to Predictive
Modeling??)

24

Analytics LinkedIn Skills

Predictive Analytics Machine Learning

Text
Mining MapReduce

25

Big Data Bubble?

Big Data

Gartner Hype Cycle

26
