NoSQL & Big Data Analytics: History, Hype, Opportunities

A look at NoSQL and Big Data Analytics as an evolution starting from relational databases, going behind the hype. You can find more on this topic on my blog at: http://innovation-edge.blogspot.com/

Thanks to Gregory Piatetsky-Shapiro for the 2nd half of the slides.

Published in: Technology, Education

  1. analyze(NoSQL,BigData); /* history, hype, opportunities */ // By: Vishy Poosala // Head of Bell Labs, India // poosala@alcatel-lucent.com // @vishyp
  2. The dark ages of COBOL
  3. ..then Codd said, let there be tables: Rows & Columns, Normal Forms, SQL, ACID
  4. www.data-for-humans.com: Set-valued attributes, What columns?, Schema evolution, XML
  5. Billions of Keys & Values: GFS, Google Big Table, Hadoop, Cassandra, Dynamo
  6. How would you build a super-fast, FB-scale chat service, in 2012? (for example)
  7. I want my own DB! (Main Memory, Distr. K-V, Versions, Social Graphs) • Memcached • redis • MongoDB • CouchDB • Neo4j
  8. By era (60's, 80-96, 96-'07, '07-): BIG Data: KB, GB, TB, PB. Variety: Files, Tables, Semi-Structured, Dynamic. Analytics: Stats, OLAP Cube, Apps, Mahout. Language: COBOL, SQL, XML, NoSQL.
  9. Following *AMAZING* Slides Courtesy: Gregory Piatetsky-Shapiro, kdnuggets.com. You can find all the slides from his talk at: http://www.slideshare.net/gpiatetskyshapiro/analytics-and-data-mining-industry-overview
  10. Data Tsunami • In 2010 enterprises stored 7 exabytes = 7,000,000,000 GB of new data (McKinsey) • 90 percent of the world's data has been generated in the past two years (IBM). Image with apologies to KDD-2011.
  11. Pre-history: "Statistics" is the biggest term in the 20th century, but "data mining" and "analytics" appear in the late 1990s. From Google Ngram viewer, English language books. Note: Our analysis uses only English language data. Other languages, especially Chinese, need to be considered for the full picture.
  12. Recent History: Analytics, Data Mining, Knowledge Discovery. Analytics has been used since 1800, but started to rise in 2005. Data Mining jumps around 1996 (soon after first KDD conference) but declines after 2003 (TIA controversy, associated with gov. invasion of privacy). Knowledge Discovery appears in 1989, jumps in 1996, and plateaus after 2000.
  13. Google Trends: After 2006, Data Mining < Analytics
  14. Google Insights: searches for data mining, analytics -google are most popular in India, US
  15. Analytics > Data Mining > Data Science
  16. Data Science, Big Data
  17. Data Types Analyzed/Mined. www.KDnuggets.com/polls/2011/data-types-analyzed-mined.html
  18. Largest Dataset Analyzed? 2011 median dataset size ~10-20 GB, vs 8-10 GB in 2010. Increase in 10 GB to 1 PB range. www.KDnuggets.com/polls/2011/largest-dataset-analyzed-data-mined.html
  19. Which methods/algorithms did you use for data analysis in 2011 (chart: % analysts who used it): Decision Trees, Regression, Clustering, Statistics, Visualization, Time series/Sequence analysis, Support Vector (SVM), Association rules, Ensemble methods, Text Mining, Neural Nets, Boosting, Bayesian, Bagging, Factor Analysis, Anomaly/Deviation detection, Social Network Analysis, Survival Analysis, Genetic algorithms, Uplift modeling. www.KDnuggets.com/polls/2011/algorithms-analytics-data-mining.html
  20. Cloud Analytics is not common (yet). www.KDnuggets.com/polls/2011/algorithms-analytics-data-mining.html
  21. Shortage of Skills • McKinsey: shortage by 2018 in the US of: 140-190,000 people with deep analytical skills; 1.5 M managers/analysts with the know-how to use the analysis of big data to make effective decisions. Source: www.mckinsey.com/mgi/publications/big_data/
  22. Job data: Data Scientist
  23. Jobs: Data Mining >> Data Scientist
  24. "Ground" Analytics (LinkedIn Skills): ~ 75,000 with Data Mining skill, ~ 7,000 with Predictive Modeling. Also ~ 20,000 with Predictive Analytics (not related with Predictive Modeling ??)
  25. Analytics LinkedIn Skills: Predictive Analytics, Machine Learning, Text Mining, MapReduce
  26. Big Data Bubble? Big Data Gartner Hype Cycle