Why do we really need big data? Can't things work with small data too?
MAKING BIG DATA COME ALIVE
Why is big data fundamentally a substitute for a
good algorithm? And why is this a good thing?
Danko Nikolić, Prof. Dr., Senior Data Scientist
17. November 2017
Textbooks make you believe
that a set of tools will cover it
“You just need to select the
Rarely will an off-the-shelf model be
outright optimal for a real-life
Correction: a data scientist creates a model.
Misconception: a data scientist applies a model.
Commonly used specialization tool: data wrangling
+ feature engineering.
Feature engineering extracts what is important from the data (the signal!), in a
form that is suitable for an off-the-shelf model. Example:
Neural net + Specific wrangling steps -> form together a highly specialized
Here, data wrangling plays a role similar to that of convolution in deep
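A minimal sketch of the idea that a wrangling step plus a generic model forms a specialized one. Everything in it is an illustrative assumption rather than material from the talk: the two-ring toy data, the hypothetical `radius_feature` helper, and a simple nearest-centroid classifier standing in for the off-the-shelf model (it is used here in place of a neural net to keep the example short and dependency-free).

```python
import math

# Toy data (assumed for illustration): two concentric rings around the origin.
# Label 0 sits at radius 1, label 1 at radius 3.
points = [(1, 0), (-1, 0), (0, 1), (0, -1),
          (3, 0), (-3, 0), (0, 3), (0, -3)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def radius_feature(point):
    """Hypothetical wrangling step: replace (x, y) by the distance from the origin."""
    x, y = point
    return (math.hypot(x, y),)


class NearestCentroid:
    """Generic stand-in for an off-the-shelf model: predict the closest class mean."""

    def fit(self, rows, targets):
        self.centroids = {}
        for label in sorted(set(targets)):
            members = [r for r, t in zip(rows, targets) if t == label]
            # Column-wise mean of all rows belonging to this class.
            self.centroids[label] = tuple(
                sum(col) / len(members) for col in zip(*members)
            )
        return self

    def predict(self, rows):
        return [
            min(self.centroids, key=lambda c: math.dist(row, self.centroids[c]))
            for row in rows
        ]


def accuracy(model, rows, targets):
    predictions = model.predict(rows)
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


# Same generic model, once on raw coordinates and once on the engineered feature.
raw_acc = accuracy(NearestCentroid().fit(points, labels), points, labels)

engineered = [radius_feature(p) for p in points]
eng_acc = accuracy(NearestCentroid().fit(engineered, labels), engineered, labels)

print(f"raw features:        {raw_acc:.2f}")
print(f"engineered feature:  {eng_acc:.2f}")
```

On the raw coordinates both class means coincide at the origin, so the generic model cannot tell the rings apart and stays at chance level. With the radius feature the class means are 1 and 3, and the same model classifies every point correctly. The knowledge that "distance from the center matters" came from the person who wrote the wrangling step, not from the data.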
Less thought may be needed to apply a
neural net. This is because a neural net
alone provides an eclectic
given to data
Relative contributions to model’s knowledge
This is the ratio
This tradeoff is
high training effort,
low training effort,
often high performance
The slope of