Data Science: Not Just For Big Data

From the webinar presentation "Data Science: Not Just for Big Data", hosted by Kalido and presented by:

David Smith, Data Scientist at Revolution Analytics, and
Gregory Piatetsky, Editor, KDnuggets

These are the slides for David Smith's portion of the presentation.

Watch the full webinar at:
http://www.kalido.com/data-science.htm

  1. Data Science: Not just for big data! David Smith, Revolution Analytics (@revodavid), October 16, 2013 (Revolution Confidential)
  2. Big Data: the new oil? (Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0)
  3. Big Data is just raw material
     - Data Distillation:
       - Extract quantities of interest
       - Find complete cases
       - Derive missing information
     - Big Data Pitfalls:
       - Data cleanliness & accuracy
       - Observational bias: do the data I have represent the population I’m interested in?
  4. Surveys & Experiments
     - Even with Big Data, the data you need isn’t always in the building!
     - … so ask (survey)!
       - Survey design
       - Stratified sampling
     - … or experiment!
       - A/B Testing
       - Experimental Design
  5. Data Exploration & Visualization
     - Limited by pixels
       - Big data = a big black blob
     - Extract signal from noise:
       - Aggregations
       - Heat maps
       - Smoothing
       - Small multiples
  6. Statistical Modeling & Forecasting
     - You don’t always need big data
       - Sampling can help with observational bias
     - Model selection
       - Feature extraction
       - Confounding?
       - Interactions?
     - Model validation
       - Overfitting
     - Prediction
       - Extrapolation
       - Confidence
     - http://xkcd.com/605/
  7. Summary
     - Big Data is great, but think of it as the “raw materials” for data science
     - After refining, “big” isn’t always so “Big”
     - Use statistical insight to avoid pitfalls:
       - Inferences: observational bias / sampling bias
       - Predictions: confounding / overfitting
       - Think about variances and means (risk!)
     - Some data scientists may miss these issues
       - Look for statistical expertise
     - Further reading: ComputerWorld, “12 predictive analytics screw-ups”
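
Slide 3 lists "Find complete cases" as a distillation step and "Observational bias" as a pitfall. The two interact: if whether a value is missing depends on the value itself, keeping only complete cases skews the result. The following is a minimal Python sketch of that effect, assuming hypothetical log-normal incomes and an invented missingness rule (higher earners leave the field blank more often); the helper name `complete_cases` is also hypothetical.

```python
import random
from statistics import mean

def complete_cases(rows, fields):
    """Keep only the rows in which every listed field is present (not None)."""
    return [r for r in rows if all(r.get(f) is not None for f in fields)]

rng = random.Random(7)
# Hypothetical population: 10,000 incomes (in thousands), log-normally distributed.
incomes = [rng.lognormvariate(4, 0.5) for _ in range(10_000)]
# Assumed missingness mechanism: the higher the income, the likelier it is left blank.
rows = [{"income": x if rng.random() > min(0.9, x / 200) else None} for x in incomes]

cc = complete_cases(rows, ["income"])
print(f"kept {len(cc)} of {len(rows)} rows")
print(f"true mean {mean(incomes):.1f}, complete-case mean {mean(r['income'] for r in cc):.1f}")
```

Under this assumed mechanism, the complete-case mean comes out noticeably lower than the true mean, even though thousands of rows remain.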
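
Slide 4 mentions stratified sampling as part of survey design. One common variant, proportional allocation, draws the same fraction from every stratum so that each group keeps its population share in the sample. The Python sketch below assumes that variant; the population, the `stratum_of` key function and the 10% fraction are illustrative choices, not taken from the talk.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(rows, stratum_of, fraction, seed=0):
    """Sample the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)
    sample = []
    for members in strata.values():
        # Take at least one member so that small strata are never left out entirely.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 900 urban and 100 rural customers.
population = [{"id": i, "area": "urban" if i < 900 else "rural"} for i in range(1000)]
sample = stratified_sample(population, stratum_of=lambda r: r["area"], fraction=0.1)
counts = Counter(r["area"] for r in sample)
print(counts)  # Counter({'urban': 90, 'rural': 10})
```

A simple random sample of 100 would only hit the 90/10 split on average. Stratifying guarantees it. In practice, small strata are often deliberately oversampled instead, with weights applied afterwards.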
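
For the A/B testing bullet on slide 4, a standard way to compare two conversion rates is a two-proportion z-test with a pooled variance estimate. This Python sketch uses only the standard library (`statistics.NormalDist`, Python 3.8+). The conversion counts are made up for illustration, and the talk does not say which test it has in mind.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # rate under the null hypothesis of no difference
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: 5,000 visitors per variant, 4% vs 5% conversion.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these numbers z is roughly 2.4 and the two-sided p-value falls below 0.05. Deciding the sample size before the experiment, rather than stopping as soon as p dips below a threshold, is part of the experimental design the slide refers to.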
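
Slide 5's point that big data becomes "a big black blob" when plotted point by point is usually addressed by aggregating before drawing. A heat map is one example: it plots counts per grid cell instead of individual points. The Python sketch below does the binning step only, with no plotting library. The data, the window (-4, 4) and the 20 x 20 grid are arbitrary illustrative assumptions.

```python
import random
from collections import Counter

def bin_2d(points, x_range, y_range, nx, ny):
    """Count points per cell of an nx-by-ny grid: the summary table behind a heat map."""
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if x0 <= x < x1 and y0 <= y < y1:  # points outside the window are dropped
            i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
            j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
            counts[(i, j)] += 1
    return counts

rng = random.Random(42)
# Hypothetical data: 100,000 points, enough to overplot into a solid blob in a scatter plot.
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]
grid = bin_2d(points, (-4, 4), (-4, 4), nx=20, ny=20)
print(f"{len(grid)} non-empty cells; busiest cell {max(grid, key=grid.get)}")
```

Only the (at most) 400 cell counts need to be drawn, for example as colour intensities. The concentration near the centre, which a raw scatter plot of 100,000 points would hide, becomes visible.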
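
Slide 6 lists overfitting under model validation. The core check is to score a model on data it was not fitted to. In this Python sketch, a "memorising" 1-nearest-neighbour model gets zero training error but does worse on held-out data than a plain least-squares line. The data-generating process (a noisy straight line), the 50/50 split and both helper names are hypothetical choices made for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns the fitted function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest_neighbour(xs, ys):
    """'Memorise' the training data: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared prediction error of a model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)
# Hypothetical data: a straight line plus Gaussian noise with standard deviation 1.
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
train_x, test_x, train_y, test_y = xs[:100], xs[100:], ys[:100], ys[100:]

models = {"line": fit_line(train_x, train_y), "1-NN": fit_nearest_neighbour(train_x, train_y)}
for name, model in models.items():
    print(f"{name}: train MSE {mse(model, train_x, train_y):.2f}, test MSE {mse(model, test_x, test_y):.2f}")
```

Judged on training error alone, the 1-NN model looks perfect. The held-out error shows that it has fitted the noise.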
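
Slide 6 also flags extrapolation as a prediction risk. A model can fit well inside the range of the data and still be badly wrong outside it. This Python sketch fits a straight line to a hypothetical saturating process observed only for x between 0 and 5, then compares predictions inside and far outside that range. The process `truth` and the observation grid are invented for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns the fitted function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def truth(x):
    """Hypothetical saturating process: rises quickly, then levels off at 10."""
    return 10 * (1 - math.exp(-x / 5))

xs = [i * 0.5 for i in range(11)]  # observed only for x in [0, 5]
model = fit_line(xs, [truth(x) for x in xs])

for x in (2.5, 5, 20, 50):
    print(f"x={x}: line {model(x):.2f}, truth {truth(x):.2f}")
```

Within the observed range the line is off by about half a unit at most. At x = 50 it predicts a value several times the true level, because nothing in the fitted data shows the curve flattening.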
