The REAL face of Big Data
Myths and facts
Data is never clean!
Most of the time will be spent cleansing
and preparing data for analytics.
80% of a typical data science project is
sourcing, cleaning, and preparing the data,
while the remaining 20% is actual data
analysis.
“It’s an absolute myth that you can send an
algorithm over raw data and have insights
pop up.”
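To give a feel for what "cleaning and preparing" tends to mean in practice, here is a minimal sketch in plain Python. The record layout (customer, amount), the function name clean_records, and the sample rows are all made up for illustration; real projects will have their own rules.

```python
def clean_records(raw_rows):
    """Return cleaned rows: trimmed names, numeric amounts, no duplicates."""
    cleaned, seen = [], set()
    for row in raw_rows:
        # Normalize whitespace and capitalization in the (hypothetical) name field.
        name = (row.get("customer") or "").strip().title()
        # Remove thousands separators before parsing the amount.
        amount_text = (row.get("amount") or "").replace(",", "").strip()
        try:
            amount = float(amount_text)
        except ValueError:
            continue  # skip rows whose amount is missing or not numeric
        if not name:
            continue  # skip rows with no customer name
        key = (name, amount)
        if key in seen:
            continue  # skip exact duplicates after normalization
        seen.add(key)
        cleaned.append({"customer": name, "amount": amount})
    return cleaned


# Made-up raw input showing typical problems.
raw = [
    {"customer": "  alice ", "amount": "1,200.50"},
    {"customer": "ALICE", "amount": "1200.50"},   # duplicate once normalized
    {"customer": "bob", "amount": "n/a"},         # amount is not a number
    {"customer": "", "amount": "15"},             # missing name
    {"customer": "carol", "amount": None},        # missing amount
    {"customer": "Dave", "amount": " 42 "},
]
rows = clean_records(raw)
print(rows)
```

Only two of the six raw rows survive, and no analysis has been run yet.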
AWS has great support services.
Amazon is not transparent about the
underlying infrastructure and will not provide
diagrams, machine details, etc., so it is difficult
to get performance and measurement reports.
Even though some EC2 managed services are
available through third parties, only paid
support is available for “the most common
third-party software running on AWS,”
excluding managed support services.
No one cares how you did it
Most technical presentations in industry are 1)
far too long and 2) focused on the parts that
don’t matter (the details of the method applied),
hiding the parts that do (what the results mean
for this prospect).
However, I think a more constructive way to
phrase this is “know your audience”. So make
awesome presentations. It sounds strange, but
the presentation is usually more important than
the algorithm itself.
Data Lakes Will Replace The Data Warehouse
It's "misleading" for vendors to position data
lakes as replacements for data warehouses.
A data lake's foundational technologies lack the
maturity and breadth of the features found in
established data warehouse technologies.
Many organizations get stuck at the pilot stage
because they don't tie the technology to
business processes or concrete use cases.
Big data will give a black-and-white, concrete answer
Real analytics is combining, weighting, and
judging multiple sources of information. The
more data you have, the more analysis you
must run.
Big data has a strong capacity to bring up other
marketing questions and other insights, and it
makes leaders think better.
Machine learning will be the key!
In 90% of cases, generalized linear regression
will do the trick.
The most basic and commonly used predictive
analysis will do the job.
Regression estimates are used to describe data
and to explain the relationship between one
dependent variable and one or more
independent variables.
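As a rough illustration of the regression idea above, here is a minimal sketch of ordinary least squares with a single independent variable, in plain Python. This is the simplest special case, not a full generalized linear model; the function name fit_simple_linear_regression and the ad-spend and sales numbers are made up for illustration.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: ad spend is the independent variable, sales the dependent one.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(ad_spend, sales)
print(f"sales ~= {intercept:.2f} + {slope:.2f} * ad_spend")
```

The slope is the estimated change in the dependent variable for a one-unit change in the independent variable, which is usually the number the audience actually wants.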
Get a senior data scientist team
Doctorate in math, a background in computer
science, and what amounts to an MBA, not to
mention actual work experience in all of those
fields. "How old is this person, 90?"
It is almost impossible to find this data-scientist
unicorn; the alternative is to create a working
group with a cross-section of expertise. This is
in fact what you have to do.
95% of tasks do not require deep learning.
It’s a hands-on job.
A super-intelligent, automated algorithm that
solves every problem with a magic touch
does not exist.
This role requires a lot of dirty data modeling,
coding, patience, and focus.
