Data is the new oil! Modern analytical methods are a decisive success factor for service-oriented business models in IoT and Industry 4.0. A new white paper explains the state of the art and shows what the latest methods can achieve in practice.
Table of contents

1 Introduction 4
2 A historical perspective 6
   Analytics trends over the past 200 years (abbreviated) 7
   Ancient times 8
   Consolidation and axiomatization 8
   Computer time 8
   AI 10
   Stochastic control 10
   Financial mathematics 10
   Modern times 11
   Neural networks 11
   Change of paradigm 11
   Prediction, prediction, prediction 12
   Summary 13
3 Analytics landscape 14
   Algorithmic purpose 15
   Regression 15
   Classification 15
   Anomaly detection 15
   Clustering 15
   Reinforcement learning 15
   Dimensional reduction 15
   Software landscape 16
4 The data science pipeline 17
   The basic pipeline 18
   Feature engineering 18
5 Data science in product business 19
   Products and digital twin 20
   Conclusion 20
6 Bibliography, online resources and references 21
   Learning data science 22
Introduction
Analytics today is driven by, and sits at the convergence point of, four megatrends: cloud computing, IoT, big data and algorithmic computing. These areas have become, or are well on the way to becoming, an overwhelming success within a very short time frame and are already delivering on their promises. The latter point is best illustrated by the rise of cloud computing over the past five years, now generating billions of dollars of revenue for Amazon (AWS), Microsoft (Azure) and Google (GCP). Understanding the current hype cycle surrounding analytics and taking appropriate action is therefore a concern for every company, especially those considering themselves technology leaders in their respective market segments and exposed to some or all of the driving megatrends.
Since analytics is in itself a huge topic and the megatrends are big topics in their own right, it is easy to get lost in the details and be overwhelmed by the sheer volume of news and stories coming in every day. We will try to give a simple overview of the field here, with emphasis on describing the main factors influencing its past, current and future development, in order to enable the reader to assess the importance of the topic and dig deeper if needed. Subsequently we elucidate some practical issues concerning data science and its importance in the field of PLM (Product Lifecycle Management) and give some insight into the analytics roadmap of CIM Database.
The last few years have seen an explosion of interest in analytic methods for exploring data. Drivers of this development have been trends in the availability of data ("big data"), especially the large volumes of data publicly available from social networks; breakthroughs in specific areas of algorithmic research (e.g. neural networks applied to classification); the general availability of ever increasing storage and computing capacity at rapidly falling cost (i.e. Moore's law in connection with cloud computing); and, recently, the rise of stunningly cheap devices able to connect to the internet and provide steadily increasing amounts of data about their environment and their own status (IoT, the internet of things). These factors contributed to the recent notion of data science as an independent scientific endeavour apart from the classical statistics curriculum, and more specifically to the notion of data scientist as a job title for analysts who combine the necessary skills in programming, statistics and algorithmics to navigate and exploit the new opportunities.
What sets the current hype cycle surrounding analytics apart from many others we have seen in the IT industry is that it is not coming our way as an isolated topic, unlike hype cycles such as XML, ESB, SOAP or Web 2.0, which we have watched pop up and fade out over the years. Instead, our topic is driven by, and sits at the convergence point of, the megatrends described above.
Figure 1: Current megatrends, their mutual dependencies and influence on analytics
Analytics trends over the
past 200 years (abbreviated)
To put things in perspective, let us briefly review how we got here and what happened to the ancient field of statistics to let it rise from a hundred-year sleep (at least regarding the public attention it has received; the research community wouldn't agree with that characterization). Analytics, which up to 1990 was considered the realm of applied statistics, is an ancient topic going back more than 200 years. Its birth date may be set at the moment Gauss calculated the orbits of minor planets by fitting orbit parameters to observations via the method of least squares. The prolific Gauss, who usually didn't bother to publish "minor" work, considered it important enough to assert his priority of discovery after Legendre published the method independently in 1805.1 Not surprisingly, given its usefulness, the method is still one of the most heavily used tools for fitting data to an empirical model. In modern terms it may be described as consisting of an empirical model, e.g. a linear model like
y = a + b·x + ε

where x is the input (in various contexts also called by fancier names such as predictor, regressor, independent or controlled variable), y is the output (response, regressand, outcome etc.) and ε is an error term due to measurement inaccuracies or noise. The parameters a and b of the model are initially unknown, and our goal is to determine them as best we can from many observations of the input x and output y. The method of least squares gives an explicit formula for this, taking the observations (xᵢ, yᵢ) as input and delivering an estimate for the parameters; e.g. the estimate for b is given by:

b̂ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²   (sum over i = 1, …, n)

In addition, according to the statistical model for the error term, accuracy guarantees can be computed in a probabilistic manner; e.g. the statistician's statement may be: "The parameter b is contained in the interval (b̂ − c, b̂ + c) with probability 95%" (of course the actual stated probability depends on the width c of the interval: it is obviously more likely that the "real" parameter b is contained in an interval with wide error margins, i.e. big c, than in a small one). For more than a hundred years these ingredients would be the hallmarks of statistical science:
■ Data generated from experiment
■ A method for producing an estimate
in explicit form
■ A (probability) model for the data
■ Accuracy guarantees justifying the use
of the method
The last step is usually called inference. Without it, our estimates from data lose much of their appeal, especially in the physical sciences, since we cannot be sure whether their use is sensible at all: the fancy algorithm we have used might just have produced a number no better than guessing.
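As an illustration, the closed-form least-squares estimate above takes only a few lines of Python; the sample data below is made up for the purpose of this sketch.

```python
def least_squares_slope(xs, ys):
    """Explicit least-squares estimate for b in y = a + b*x + eps:
    b_hat = sum((x_i - x_mean)(y_i - y_mean)) / sum((x_i - x_mean)^2)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Made-up noisy observations of a line with a = 1, b = 2:
xs = [0, 1, 2, 3, 4, 5]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]

b_hat = least_squares_slope(xs, ys)
a_hat = sum(ys) / len(ys) - b_hat * (sum(xs) / len(xs))
print(f"estimated line: y = {a_hat:.2f} + {b_hat:.2f} x")
```

The estimate alone, however, comes with no accuracy guarantee; that is exactly the inference step discussed above.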
A historical perspective
Figure 2: The modern field of data science at the intersection of statistics, machine learning and computer science.
Ancient times
During the first 150 years of the development of statistical science as sketched above, from Gauss up to about 1900, the methods were, as one might expect, ad hoc and application specific. Gauss, Laplace and Pascal contributed a good deal of ideas revolving around concrete problems, with the notable exception of Bayes, who formulated the celebrated Bayes' theorem2, which lay dormant for a long time only to make a vigorous return once modern computational power finally became available to unlock its many useful applications in modern statistical analysis.
Consolidation and
axiomatization
The first half of the twentieth century saw the mathematization of the discipline via the axiomatic formulation by Kolmogorov, which provides the solid framework to this day. The great statisticians Fisher and Pearson contributed many of the now standard ideas and statistical methods in use today. The body of this work up to 1950 is basically what can be found in textbooks today: hypothesis testing, confidence intervals, p-values, χ²-tests etc. It cannot be denied that the rigorous mathematization gave the discipline a slightly bureaucratic and boring flavour, probably stifling playful advancement until the following period. All these methods operated on small data sets and were optimized for computing with pencil and paper.
Computer time
The second half of the twentieth century saw the rise of computer programs. Whereas the goal in former times was to avoid computations in a field where gathering and evaluating data was already a cumbersome task, the advent of computers changed this attitude, albeit slowly. The breakthrough was probably the invention of the bootstrap method. The idea sounds simple and intuitive looking back now, but certainly sounded crazy then: instead of throwing heavy mathematics at each special case, the bootstrap method simply resamples from the existing dataset to arrive at probabilistic estimates of parameters.

Of course this adds considerable computational overhead to the procedure. But the big advantage is that the procedure is simple, automatic and universal, in the sense that it is immediately applicable to arbitrarily complicated curves instead of only the linear regression for which the classical formula provided the analytical solution. As simple as it is, this shift to unabashed use of modern computer power ushered in the era of computers and software in the statistical sciences, ultimately resulting in software packages such as SAS or R and a more playful and experimental approach to statistics.
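A minimal sketch of the bootstrap idea in Python, using only the standard library; the data set and the number of resamples are made up for illustration.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05):
    """Percentile bootstrap confidence interval: resample the data with
    replacement, recompute the statistic each time, and read off the
    empirical quantiles of the resulting estimates."""
    estimates = []
    for _ in range(n_resamples):
        resample = random.choices(sample, k=len(sample))  # draw with replacement
        estimates.append(stat(resample))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

random.seed(0)
data = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.5]
low, high = bootstrap_ci(data)
print(f"mean = {statistics.mean(data):.2f}, 95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

Note that nothing in this procedure depends on the statistic being the mean of a linear model: swapping in a median, a fitted curve parameter or any other estimator works the same way, which is exactly the universality described above.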
Another point of view, coming into its own only with the advent of powerful computers, is Bayesian statistics, as mentioned above. Whereas its origins lie with Bayes's theorem, the full force of its application is
Figure 3: A timeline of computational statistics since its inception, with a rough measure of the computer power needed for the methods in MIPS (million instructions per second).
hard to understand from its original formulation. The gist of the Bayesian method is to regard everything as a probability statement. A simple example is estimating the probability p of a coin coming up heads for a coin suspected of being unfair (i.e. p ≠ 0.5: the coin comes up heads more often than tails or vice versa). In the classical line of thought one would conduct a hypothesis test with the null hypothesis that the coin is fair, and accept or reject it depending on the outcome of an extensive coin-flipping experiment. In the Bayesian view the "real" value of p is itself a random variable, and if you flip the coin, say, a hundred times and it comes up heads 65 times, you are interested in the distribution of p given the observed number of heads, i.e. P(p | k=65) (read: probability of p given that the number of heads is 65). Of course you don't know p, but according to Bayes' theorem this can be deconstructed to:

P(p | k=65) = P(k=65 | p) P(p) / P(k=65)

Now the first term in the numerator is given by the binomial distribution and, up to a factor not depending on p, equals

P(k=65 | p) ~ p⁶⁵ (1 − p)³⁵

If we factor in a probability distribution for P(p) (called the prior in Bayesian statistics), we have a functional form for the desired distribution P(p | k=65) (the posterior in Bayesian parlance). The "real" value
of p can then be estimated, e.g. by the maximum or the expectation value of the posterior. The real power of the method shows when further experiments are conducted: we can use the distribution of p calculated so far as the new prior P(p) and compute further refinements as we go. Note that these tricks are only possible because the parameter p, which is an unknown but fixed quantity in classical statistics, is regarded as a random variable itself. This innocuous method has revolutionized the way statistics is done, especially in the field of parameter estimation. Usually there are many continuously distributed parameters, leading to multidimensional integrals, and therefore Bayesian statistics is practically tractable only with considerable use of computer power. A further element necessitating a change of view, or an enlargement of the toolset, was modern science in the form of the human genome project, leading to ever larger arrays of genome sequences (microarrays) on which simultaneous testing of, e.g., gene expression levels was conducted. The analysis of and inference on the resulting data was impossible without computers and led to modern methods such as HMMs (Hidden Markov Models), MCMC (Markov chain Monte Carlo) and more, likewise impossible to apply without modern computer power. In summary, the field of statistics evolved slowly but steadily from a theoretically driven field into a very successful practical endeavour heavily dominated by computational tools.
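The coin example above can be sketched numerically: a grid over p, a flat prior and the binomial likelihood give the posterior, and the sequential refinement reuses the posterior as the new prior. This is a pure-Python illustration; the grid resolution and the follow-up flip counts are chosen arbitrarily.

```python
def posterior(heads, tails, prior, grid):
    """Posterior over the grid of candidate p values:
    likelihood p^heads * (1-p)^tails times the prior, normalized to sum to 1."""
    unnorm = [(p ** heads) * ((1 - p) ** tails) * pr for p, pr in zip(grid, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

grid = [i / 1000 for i in range(1, 1000)]   # candidate values for p
prior = [1.0 for _ in grid]                 # flat (uniform) prior
post = posterior(65, 35, prior, grid)       # after 65 heads in 100 flips

mean_p = sum(p * w for p, w in zip(grid, post))
print(f"posterior mean of p after 65/100 heads: {mean_p:.3f}")

# Sequential refinement: the posterior becomes the new prior
# for, say, 30 further flips with 18 heads.
post2 = posterior(18, 12, post, grid)
mean_p2 = sum(p * w for p, w in zip(grid, post2))
print(f"after 30 further flips (18 heads): {mean_p2:.3f}")
```

With a flat prior this grid approximation reproduces the known Beta-posterior means, and the update step shows why treating p as a random variable pays off: new evidence simply multiplies in.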
Figure 4: Bootstrap example. Sample points generated from f(x) = 5x − 3x² + 0.3x³ with added Gaussian noise; the fitting curves are obtained by resampling from the sample points.
AI

In contrast to the steady evolution of statistics, AI has had a comparative roller-coaster history with many ups and downs. Artificial intelligence came into being with the advent of the first computers and, as a result, had much closer ties to computer science and algorithmic experiments than to statistics and probability. Interestingly, the idea of neural networks was formulated3 even before the first computers were available, and a viable training algorithm (backpropagation) was proposed in the early days of computer usage.4 But the history of AI is less glorious when it comes to delivering on its promises. This was largely due to much higher expectations, which simply couldn't be met with the computer power available before the millennium. The idea of neural network computing therefore had to wait 30 years to be put into practice, for lack of computational power. During the intervening years AI was a science characterized by big hopes and big setbacks, leading to several (failed) hype cycles and coining the term AI winter (cf. nuclear winter)5, i.e. phases during which funding was drastically cut and the reputation of the field badly damaged. But since the millennium the field has made a comeback, especially in the form of machine learning (see Modern times below).

Stochastic control

In the late 50s and early 60s Kalman and Bucy6 formulated a framework based on SDEs (stochastic differential equations) to accurately estimate the state of a dynamical system whose parameters can only be measured with considerable noise. Their formulation was a breakthrough which found immediate application in the Apollo space program for tracking the position of the space capsule. Nowadays it is a ubiquitous tool for tracking the positions of spacecraft, planes, trucks or drones and steering them reliably.

Financial mathematics

The framework of SDEs was brilliantly applied in the field of finance, which underwent a revolution in the 1970s when Fischer Black and Myron Scholes solved the problem of pricing European-style options by assuming that the underlying asset, a stock, is governed by Brownian motion and postulating an efficient market free of arbitrage. The resulting PDE for the price V(t)7 of the option as a function of time,

0 = ∂V/∂t + (1/2) σ² S² ∂²V/∂S² + r S ∂V/∂S − r V,

though understandable only with considerable mathematical expertise and solvable only with heavy use of computer power, was soon applied everywhere and the options market exploded.

The explosion and complete revolution of the financial markets, and their ensuing meltdown in 2008, for which the extension of these models and their unlimited use without proper risk management is at least partially responsible, demonstrate the power (and the danger) of (blindly) applying algorithms to practical problems.
Figure 5: A timeline of AI and machine learning with its various failures and the computational power needed to use the various methods. Computer power in red means that the methods would have needed this much power, but it wasn't available at the time.
Analytics Dr. Udo Göbel 27.07.
6 https://en.wikipedia.org/wiki/Kalman_filter
7 The option price V(t) depends on the stock price S(t), its volatility σ and the interest rate r. The movement of the stock is governed by the stochastic differential equation dS = r S dt + σ S dW, where W is a so-called Wiener process modelling Brownian motion.
This digression into a seemingly unconnected sideline of statistics is quite informative as a template for things to come in fields which are just now entering a phase of mathematization and algorithmic innovation.
Modern times
The years 2000 to 2010 brought a renewed interest in AI, breakthroughs in neural networks and an explosion in large-scale computing available to the public in the form of cheap GPUs and cloud computing. Especially from 2010 onwards, a shift to algorithmic computing and a playful expansion of its scope into various niche domains can be seen, with image classification and speech recognition probably the most prominent.
Neural networks
The current hype surrounding neural networks, probably the most visible part of the movement towards automatic prediction, can be traced back to some iconic problems where their application has brought spectacular improvements. One of these problems is image classification. By 2010 a large body of images had been gathered and made freely available for research by researchers at Stanford University, and since 2010 the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)9 has been organized. One of its tasks is classifying an image into given categories, e.g. dogs, cats etc. Whereas during the first two years "classical" algorithms like SVMs (Support Vector Machines) and Bayesian methods dominated the competition and were a mixed success regarding the ability to classify images correctly, in 2012 a participating team applied neural network algorithms and dramatically lowered the error rate, i.e. the fraction of misclassified pictures. Over the next years a combination of ever more refined neural networks and massive amounts of computing power in the form of GPUs led to spectacular improvements, to the point that the error rate is now comparable to or even better than human error rates.10

Since 2014 more than 90% of all entries in the contest use massive amounts of computer power in the form of GPUs, some teams being sponsored by NVIDIA.
Change of paradigm
It is to be noted that the process of generating insight from data via neural networks, or more generally machine learning, follows a somewhat different path than that established by the classical statistical sciences:

■ Data generated or collected "somehow" (often generated for the express purpose of analysis, e.g. a corpus for analyzing language or the pictures in image classification contests)
■ An algorithm to generate insight, e.g. by classification
■ Measuring the score of the algorithm as the criterion of success

Figure 6: Daily average option volume in millions of trades since 1973⁸

Steps two and three often go in cycles to tune the bells and whistles of the algorithm. Note the absence of any probability model and, in consequence, the lack of any accuracy guarantees: the score may become very good by tuning the parameters, but we don't really know what this means for the robustness of the algorithm when new data comes in.
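The data → algorithm → score loop can be sketched with a deliberately simple stand-in classifier: a nearest-centroid rule on made-up 2-D points. There is no probability model anywhere, only a score on held-out data.

```python
def train(points, labels):
    """'Training': compute the centroid of each class."""
    centroids = {}
    for label in set(labels):
        cluster = [p for p, l in zip(points, labels) if l == label]
        centroids[label] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids

def predict(centroids, point):
    """Classify a point by its nearest class centroid (squared distance)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], point))

def score(centroids, points, labels):
    """The 'criterion of success': fraction of correctly classified points."""
    hits = sum(predict(centroids, p) == l for p, l in zip(points, labels))
    return hits / len(points)

# Made-up data: two well-separated classes.
train_pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_lbl = ["a", "a", "a", "b", "b", "b"]
model = train(train_pts, train_lbl)

test_pts = [(0.5, 0.5), (5.5, 5.5), (1, 1), (6, 6)]
test_lbl = ["a", "b", "a", "b"]
print("score:", score(model, test_pts, test_lbl))
```

A perfect score on this toy data says nothing about robustness on genuinely new data; that is precisely the missing-guarantee problem described above.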
A second characteristic, notable especially with the use of neural networks, is best explained by comparison with the expert systems en vogue in the 70s and early 80s of the last century (old-style AI). While expert systems were rule based, meaning that at least in principle their mode of operation is understandable (although in practice, due to the number of rules in working systems, it is not), neural networks pose a new challenge because they function as a perfect black box. This shift is especially problematic when something unforeseen occurs. Consider, for example, an autonomous car running on a neural network crashing into something. If this were due to a rule-based expert system, one would simply follow the path of rules through the decision tree taken by the system and add or change a rule to ameliorate the behavior. With a neural network the only answer is: add a few thousand crashes of this type and retrain the network to avoid them.
Due to the success of machine learning in various fields, combined with its promise of delivering immediate business value, there is an increasing willingness to trust the computer beyond any reason. Signs of what can go wrong surfaced in the neural network world in 201412, when Goodfellow et al. showed that imperceptible changes to images can fool a neural network image classifier into misclassifying images in an arbitrary way. Follow-up work has shown that this is not restricted to the niche of image classifiers.13 These examples show that neural networks may, after a lot of training, perform better than humans on a corpus of sample images, but can go wrong in subtle ways not yet fully understood and, perhaps even more disturbing, may be fooled at will by determined attackers.
Prediction, prediction,
prediction
Figure 7: ImageNet contest, error rates for classification11 (2010: 28%, 2011: 26%; 2012: 15% after the switch from SVMs to neural networks; then 11%, 7% and 3-4% in the following years).

The common denominator of this long and varied history of statistics and machine learning in the
broadest sense, and of its various developments and methods, is the goal to peek into the unknown: prediction. Prediction can take various forms depending on the domain:

■ prediction of an unknown value to be discovered or at least pinpointed later (old-style statistics, e.g. determining the fraction of faulty products coming from a production line)
■ prediction of what an object is (classification, e.g. tagging an image by processing it with a neural network)
■ prediction of the future (e.g. option pricing, predicting the price of an asset in the near future)

It is to be expected that the various fields will cross-fertilize each other and we will see each method expand hugely outside its original field.
Summary

As detailed above, data science today sits at the convergence point of several subdisciplines of statistics and artificial intelligence. At this unique point in history several trends converge: for the first time the computer power needed to solve practical machine learning and large-scale statistical problems is commonly and cheaply available (at least compared to the earlier use of specialized supercomputers), in the form of cheap specialised hardware (graphics processing units) or as a service (cloud computing), enabling large-scale experimentation and modelling of big data sets for everyone. The old ideas in machine learning have been refined and researched to the point that they solve rather complex practical problems like image processing and natural language processing in a highly efficient and quite usable manner, delivering immediate business value.
The pendulum has swung back from academic mathematization to free experimentation with algorithmic approaches to data. The emergence of social networks and big shopping platforms has provided data volumes in quantities never seen before, and the advent of the internet of things will further explode the volume of data in need of analysis, offering huge opportunities for new digital businesses.

It is not hard to see that these developments will accelerate from here on. We will see an ever greater reliance on powerful computers tackling an increasing breadth of problems. Especially the field of artificial intelligence, in the guise of modern machine learning, is now able to fulfill its promises for the first time in its roller-coaster history. Thus we stand at a historical point in time where computational challenges and algorithmic ideas meet, for the first time, with adequate computer resources available to everyone. Innovation and ideas in algorithmic computing are no longer limited by computer power. Perhaps the history of the financial industry since the 70s can be a lesson in how an industry evolves when modeling and mathematization of business processes take place on a large scale. There is no holding back from this point: machine learning will explode in the years to come.
Figure 8: The pictures in the left column of both panels are correctly classified by a neural network (e.g. the yellow bus as "bus", the wildcat as "cheetah", the white dog as "dog" etc.). The pictures in the right column are uniformly classified as "ostrich", the difference between left and right consisting of some pixel values depicted in the middle. Humans cannot see a difference between the left and right columns, showcasing that the neural network "thinks" quite differently than humans do and may be made to think in arbitrary ways by someone with knowledge of the inner workings of the network.
Analytics landscape

After assessing the history and importance of analytics, let us map out the analytics landscape by various differentiators, to give an impression of the breadth and purpose of today's analytic toolset. The landscape may be broken up by classification of algorithms or by their intended purpose. On the other hand, as always in the software industry, there are plenty of software frameworks available for use on various infrastructures.

Algorithmic purpose

The easiest differentiator is algorithmic purpose: here we classify algorithms by what we want to achieve. The available categories are broadly as follows:

■ Regression
■ Classification
■ Anomaly detection
■ Clustering
■ Reinforcement learning
■ Dimensional reduction

Regression

With regression techniques one attempts to estimate the relationship between variables for the purpose of prediction or forecasting. Examples are standard curves in medicine for vital variables to predict health status, or examining a timeline with the purpose of predicting a future value.

Classification

Classification aims to categorize objects with given features. These may be pictures with various motifs which the computer has to tag with labels, or customers with certain properties who are categorized according to potential business value.

Anomaly detection

In anomaly detection we try to detect outliers in data which do not conform to an expected pattern or behaviour. This may be seen as a subfield of classification, but it has its own unique methods setting it largely apart from classification tasks.

Clustering

Clustering algorithms partition data points into clusters with similar features. This is akin to classification, the key difference being that the labels are missing and the algorithm has to come up with the clusters by itself. Imagine handing a group of people in a social network, with many features (number of posts, age, number of friends, content of posts, gender etc.), to a machine learning algorithm which should come up with groups of "similar" people without being told what "similar" means.

Reinforcement learning

Reinforcement learning is mainly applied in robot control, machine movement problems and game-theoretical problems, where the computer should learn how best to behave in an initially unknown environment through rewards specified for achieving various goals.

Dimensional reduction

These methods are often used in a preliminary data preprocessing step to reduce huge feature sets to a manageable subset, by selecting the most important features or by constructing various forms of combined features that capture the essential characteristics without having to use the whole feature set.

Depending on the task at hand, one or more methods from these categories have to be picked; but naturally occurring data is typically messy, and a mixture of various methods has to be employed to be successful.
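To make the clustering category concrete, here is a minimal k-means sketch in plain Python on made-up 2-D points; in practice one would reach for a library implementation such as scikit-learn's.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means: pick k random points as centroids, then alternately
    assign each point to its nearest centroid and move each centroid to the
    mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster went empty
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters

# Two obvious groups of points; the algorithm must find them without labels.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (8, 8), (9, 8), (8, 9), (9, 9)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Note the contrast to classification: no labels go in, only the number of groups to look for, and the notion of "similar" is implicit in the distance function.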
Software landscape
As of 2017 there are already many players in the field competing for market share. The existing software may be roughly categorized by the specificity of the task it tries to solve. On one end of the spectrum there are solutions for very specific industries like retail or finance; on the other end there are broad frameworks for data science, bundling the main algorithms for ease of use by a data scientist.
It is worth noting that most basic data science relies on open source software. Especially important here are the basic statistical libraries, where two languages dominate: R14 and Python.
R is a statistical language with an emphasis on classical statistics, but it has also been enriched with libraries for machine learning.
Python has evolved into the main language data scientists use to develop and analyze models, especially for machine learning. Noteworthy are the data science libraries pandas15 for statistical analysis and scikit-learn16 for machine learning.
All relevant libraries for Python, including interfaces to R, are bundled in the Anaconda17 framework, which is a complete solution for doing data science.
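A tiny sketch of the pandas side of such a bundle, e.g. a quick statistical summary of tabular data (machine names and readings invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "machine": ["A", "A", "B", "B"],
    "temp":    [71.2, 70.8, 85.5, 86.1],
})
# mean temperature per machine
summary = df.groupby("machine")["temp"].mean()
print(round(summary["B"], 1))  # -> 85.8
```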
Figure 7: Chart of available frameworks and tools in the field of machine learning. The chart appeared in the O'Reilly article "The current state of machine intelligence", which is updated yearly; this is the third installment in their series.
[Figure 10: diagram of the data science pipeline — Wrangle/Cleanse, then a modeling cycle of Explore/Preprocess/Model, then Validate Results, leading to Actionable Insight and Automation/Deployment of the model.]
The data science pipeline
The basic pipeline
Data science consists foremost of cleaning and analyzing often messy data.
The first step is cleaning the data, which begins with obtaining and importing it in the first place; this is often laborious and has earned the apt name wrangling in the community. After some preliminary cleaning, the second step is modeling. It comprises exploration, preprocessing and the actual modeling, and proceeds in cycles until a satisfactory model has been found that describes the data well enough. After that it is essential to validate the model, for various reasons. One problem is overfitting the data with the model, meaning that the model describes the data at hand very well but will behave very badly on future data. Usually this quality assurance is done by keeping some of the data back as a test data set, to be used only in the validation step. Finally we arrive at a result from which business value may be generated, by automating a process or by deploying the model on a machine and using it to drive some process.
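The holdout step can be sketched with scikit-learn's train_test_split; the synthetic data below is invented for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + rng.normal(0, 0.5, 100)

# keep 25 % of the data back; it is used only in the validation step
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on held-back data, close to 1.0
```

A score that is high on the training data but poor on the held-back test data is the typical symptom of overfitting.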
Feature engineering
Much of the modeling phase, especially in machine learning, consists of preparing the data in a form ingestible by the usual algorithms, which expect their input as numerical features.
For example, if you want to feed a timeline of sensor data into an anomaly detection algorithm, it makes little sense to pass single data points as input. Instead you may first define time windows on the time series and, within each window, extract features like mean, standard deviation, minimum, maximum, etc. In this way the machine may be able to figure out which of the time windows shows anomalous behaviour. This sort of work has been aptly named feature engineering, and the skillful application of features to data determines to a great extent success or failure in the field of machine learning.18
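The windowing idea can be sketched with pandas; the sensor values below are invented, a flat signal with one spike:

```python
import numpy as np
import pandas as pd

values = np.ones(60)
values[45] = 12.0            # one anomalous spike
ts = pd.Series(values)

# split the series into windows of 10 points and extract summary features
features = ts.groupby(ts.index // 10).agg(["mean", "std", "min", "max"])
print(features["max"].idxmax())  # -> 4, the window containing the spike
```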
Figure 10
The data science pipeline
[Figure 11: diagram of the CONTACT stack — devices, apps and users connected via connectivity, device management and software management to storage, product management and the digital master; the digital twin links customer operations, designers and providers with monitoring/analytics.]
Data science in product businesses
Products and digital twin
We give an overview of what the analytics landscape consists of today, what tools are available, and how to actually put them to work.
Product development and product services are an especially interesting field for the future expansion of data science. Users, sensors and tools accumulate tons of data, and with the advent of IoT the tasks of collecting, structuring and analyzing that data, reporting insights and executing measures based on those insights will become even more important.
The main tool for connecting data in the field to the abstract product is the digital twin: a proxy for each physical instance of a product in the field, together with a connection to the virtual model used in product development.
CONTACT offers tested and widely used open source libraries for data science through integration with the Anaconda framework. The data model is also enhanced with all the necessary tools, notably the digital twin, to function as a coordinating hub for product data science.
For data connection, cloud connectors are available that allow seamless integration with field devices, either directly or, for big data scenarios, via cloud data paths through providers like AWS or Azure.
Conclusion
We have given some rough sketches of what today's analytics landscape consists of, what tools are available, and how to actually put them to work. Further suggestions can be found in the bibliography and the list of online resources.
Figure 11
CONTACT stack for
complete data science
and IoT architecture
Bibliography, online resources and references
Here are some suggestions for further reading and hints for digging deeper into the available literature. Of course, the opinions expressed here reflect the biases of the author:
Efron, B., Hastie, T. (2016). Computer Age
Statistical Inference. Cambridge: Cambridge
University Press.
A scholarly but enjoyable text with a historical account of statistics, from which the author took most of the historical references. Knowledge of statistical methods is assumed if you want to follow the text, but the historical account may be read without paying too much attention to the math.
Evans, L. C. (2013). An Introduction to Stochastic Differential Equations. American Mathematical Society.
A little-known gem introducing stochastic differential equations in under 150 pages. If you want to learn something about SDEs and have a little training in probability theory, this is a pleasure to read. (If you are new to the entire field: SDEs are used in financial math; they are not needed for neural networks.)
Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning. MIT Press.
A fine book, mainly mapping out the theory of neural networks. It is quite self-contained, requiring few prerequisites, and gives you a primer in linear algebra and probability theory before diving into machine learning. Combined with a practical book like Raschka's, you're quickly up to speed.
Held, L. (2008). Methoden der statistischen
Inferenz: Likelihood und Bayes. Heidelberg:
Spektrum Akademischer Verlag.
A good overview of classical statistics with a modern, concise presentation of Bayesian and likelihood methods. The author summarizes the main points without getting lost in mathematical detail. An updated English version, under the title "Applied Statistical Inference" (2013), is also available from the same author.
James, G., Witten, D., Hastie, T., Tibshirani, R. (2017). An Introduction to Statistical Learning. New York: Springer.
A standard reference for the field of machine learning, appropriate and enjoyable for beginners. An accompanying online course at Stanford University can be attended for free. Strongly recommended.
Raschka, S. (2016). Python Machine Learning. Birmingham: Packt Publishing Ltd.
A practical book, especially for Python coders and the Anaconda framework, giving you a hands-on learning experience. Combined with Goodfellow, this may be a good start into the field of machine learning.
Campolieti, G., Makarov, R. N. (2014). Financial Mathematics: A Comprehensive Treatment. Boca Raton: Chapman and Hall/CRC Financial Mathematics Series.
A truly comprehensive reference to modern financial mathematics, if you are really interested. The advanced chapters on SDEs and option valuation are probably not understandable without a university degree in math, but it is one of the best references in the field.
Learning data science
There are a number of very good free online resources available:
■ edX
a platform for MOOCs (massive open online courses); search for data science courses
■ Newsletters at O'Reilly
newsletters for data science and AI, which one can subscribe to at http://www.oreilly.com/data/newsletter.html and http://www.oreilly.com/ai/newsletter.html, feature links to interesting resources and stories
For beginning data science, especially learning the R and Python frameworks, one may consult DataCamp, which specializes in data science courses. Subscription fees are low and some courses are free.
Software resources are freely available, mainly for Python and R:
The Anaconda framework from Continuum Analytics is a bundle which already contains the main machine learning libraries like scikit-learn, numpy and pandas.
R is freely available, with many tutorials available online.
15 http://pandas.pydata.org/
16 http://scikit-learn.org/stable/
17 https://www.continuum.io/
18 see for example: Alice Zheng, Mastering Feature Engineering: Principles and Techniques for Data Scientists, O'Reilly, 2017
1 https://en.wikipedia.org/wiki/Least_squares
2 https://en.wikipedia.org/wiki/Bayes%27_theorem
3 McCulloch, Warren; Walter Pitts (1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity". Bulletin of Mathematical Biophysics. 5 (4): 115–133
4 https://en.wikipedia.org/wiki/Artificial_neural_network
5 https://en.wikipedia.org/wiki/AI_winter
6 https://en.wikipedia.org/wiki/Kalman_filter
7 The option price V(t) depends on the stock price S(t), its volatility σ and the interest rate r. The movement of the stock is governed by a stochastic differential equation dS = rS dt + σS dW, where W is a so-called Wiener process modelling Brownian motion.
8 Source: OCC historical volumes of trade, https://www.theocc.com/webapps/historical-volume-query
9 http://www.image-net.org/about-stats
10 for an interesting overview of the challenge, see https://arxiv.org/abs/1409.0575v3
11 Data taken from the ImageNet project site (http://image-net.org); see the official results published for each year. The displayed error rates are those in the table "classification + localization ordered by classification", rounded to the nearest integer percent. The labeling of the algorithmic switch from SVM to NN is an oversimplification by the author: the actual algorithms used in the contest are complicated mixes of various methods; see the project site for details.
12 Explaining and Harnessing Adversarial Examples, Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy, https://arxiv.org/abs/1412.6572
13 Adversarial Perturbations Against Deep Neural
Networks for Malware Classification, Kathrin Grosse
et al., https://arxiv.org/abs/1606.04435
14 see the R project site https://www.r-project.org/ for more information and downloads
References