Why do we really need big data? Can't things work with small data too?

These slides accompany my talk on YouTube, a recording of my keynote lecture at Data Natives 2017.

Published in: Data & Analytics

  1. MAKING BIG DATA COME ALIVE. Why is big data fundamentally a replacement for the lack of a good algorithm? And why is this a good thing? Danko Nikolić, Prof. Dr., Senior Data Scientist. 17. November 2017
  2. Big Data. Small data.
  3. Textbooks make you believe that a set of tools will cover it all. “You just need to select the right one.”
  4. Rarely will an off-the-shelf model be outright optimal for a real-life problem.
  5. Misconception: a data scientist applies a model. Correction: a data scientist creates a model.
  6. Commonly used specialization tool: data wrangling + feature engineering. Feature engineering extracts from the data what is important (the signal!), in a form that is suitable for an off-the-shelf model. Example: equations for data wrangling. Neural net + specific wrangling steps applied to the data -> together form a highly specialized model. Here, data wrangling plays a role similar to that of convolution in deep neural nets. Less thought may be needed to apply a neural net, because the neural net alone provides an eclectic algorithm/architecture. Extensive thought is given instead to data wrangling and feature engineering.
  7. 12/20/2017 © 2015 Think Big, a Teradata Company
  8. Algorithm (network architecture) + training. Algorithm: a human does the work. Training: the machine does the work.
  9. Algorithm. Training.
  10. Big Data. Small data.
  13. Relative contributions to a model’s knowledge. Highly specialized architecture/algorithm + “small” data: this is the ratio we prefer. Eclectic architecture + big data: this tradeoff is often successful.
  14. [Figure: “The slope of optimal model application.” Axis labels: specialized model vs. eclectic model; fast learners vs. slow learners; low training effort, often high performance vs. high training effort, lower performance. Models plotted: laws of physics, linear regression, deep learning, genetic algorithms, SVM, decision tree, random forest, naïve Bayes. Region labels: “Doing something wrong?” and “the black triangle of fantasy”.]
  17. No free lunch theorem: “Any two optimization algorithms are equivalent when their performance is averaged across all possible problems.”
  19. [Figure: axes labeled specialized model vs. eclectic model and high training effort vs. low training effort. Region labels: “Avoid this!” and “A naïve ‘data scientist’ would hope to end up here.”]
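
The tradeoff on slides 13 and 14 (specialized model + small data vs. eclectic model + big data) can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: the data follow a free-fall law d = 0.5·g·t² with Gaussian noise, the "specialized" model knows that law and fits the single parameter g, and the "eclectic" model is a plain k-nearest-neighbour average that assumes nothing about the shape of the curve. All names and constants below (`G`, `SIGMA`, `fit_specialized`, `fit_eclectic`, `compare`) are hypothetical and chosen for illustration.

```python
import numpy as np

G = 9.81      # assumed "true" constant used only to simulate data
SIGMA = 0.5   # assumed standard deviation of the measurement noise


def simulate(n, rng):
    """Return n noisy free-fall measurements on evenly spaced times in [0, 3]."""
    t = np.linspace(0.0, 3.0, n)
    d = 0.5 * G * t**2 + rng.normal(0.0, SIGMA, size=n)
    return t, d


def fit_specialized(t, d):
    """Specialized model: the law d = 0.5*g*t^2 is built in, only g is learned."""
    basis = 0.5 * t**2
    g_hat = (basis @ d) / (basis @ basis)  # one-parameter least squares
    return lambda t_new: 0.5 * g_hat * t_new**2


def fit_eclectic(t, d):
    """Eclectic model: k-nearest-neighbour average, no assumption about the law."""
    k = max(1, int(np.sqrt(len(t))))  # a common rule of thumb for k

    def predict(t_new):
        dist = np.abs(t_new[:, None] - t[None, :])
        nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]
        return d[nearest].mean(axis=1)

    return predict


def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


def compare(sizes=(5, 50, 5000), seed=0):
    """Error of both models against the noise-free law, per training-set size."""
    rng = np.random.default_rng(seed)
    t_test = np.linspace(0.0, 3.0, 1000)
    truth = 0.5 * G * t_test**2
    results = {}
    for n in sizes:
        t, d = simulate(n, rng)
        results[n] = {
            "specialized": rmse(fit_specialized(t, d)(t_test), truth),
            "eclectic": rmse(fit_eclectic(t, d)(t_test), truth),
        }
    return results


if __name__ == "__main__":
    for n, errors in compare().items():
        print(n, errors)
```

In this toy setting the specialized model should already be accurate with five data points, while the eclectic model needs far more data to approach the same accuracy. That is the sense in which big data can stand in for a missing algorithm. The sketch says nothing about cases where the built-in law is wrong, which is where an eclectic model has the advantage.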
