Machine learning (using Accord.NET and FSharp)
1. Machine learning with Accord.NET: Wine quality
This example is based on the Wine Quality dataset from the
University of California Irvine Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/Wine+Quality
2. Machine learning
• Current cloud providers (Microsoft, Amazon, Google, …) have an interest in selling computing power as an API.
• Machine learning takes a lot of computing power.
So they have an interest in making it the next buzzword.
They made "cloud" a buzzword, and they can do it again,
e.g. https://azure.microsoft.com/en-in/services/machine-learning/
However, this time we won’t use any cloud APIs, but an open-source library called Accord.NET.
3. A case for machine learning
We have some existing sample data.
We want to estimate a variable that is known in the sample data but not in real life.
We expect real-world results to follow the sample data.
Shuffle the rows of the sample data and split them into two parts:
1) Training set
• Used to fit the model.
2) Model evaluation set
• Used to verify that the model also works on data outside the training samples.
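The shuffle-and-split step can be sketched in a few lines. This is a minimal Python illustration, not the Accord.NET/F# code from the talk; the function name `shuffle_and_split`, the 70/30 ratio, and the seed are all assumptions chosen for the example.

```python
import random

def shuffle_and_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (training, evaluation) lists.

    train_fraction and seed are illustrative defaults, not values from the talk.
    """
    shuffled = list(rows)                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))  # stand-in for real data rows
training, evaluation = shuffle_and_split(rows)
print(len(training), len(evaluation))
```

Fixing the seed keeps the split reproducible, so two models can be compared on exactly the same evaluation rows.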
4. A sample dataset: Wine quality
The dataset contains just measured parameters of wines and a quality score from 0 to 10, as voted by people:
https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
Can we estimate the quality of an unlisted wine based on the features we know?
Two example rows (one feature per line):
Feature                 Row 1    Row 2
fixed acidity           7.4      7.3
volatile acidity        0.7      0.65
citric acid             0        0
residual sugar          1.9      1.2
chlorides               0.076    0.065
free sulfur dioxide     11       15
total sulfur dioxide    34       21
density                 0.9978   0.9946
pH                      3.51     3.39
sulphates               0.56     0.47
alcohol                 9.4      10
quality                 5        7
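Reading such rows into a program might look like the sketch below. It is written in Python for brevity and assumes the CSV file is semicolon-delimited with a quoted header row; the two data rows are the ones shown on this slide, and the name `load_wines` is hypothetical.

```python
import csv
import io

# Header plus the two example rows from the slide, in the assumed
# semicolon-delimited layout of winequality-red.csv.
SAMPLE = '''"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.3;0.65;0;1.2;0.065;15;21;0.9946;3.39;0.47;10;7
'''

def load_wines(text):
    """Parse the CSV text into a list of {column name: float} dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [{name: float(value) for name, value in row.items()} for row in reader]

wines = load_wines(SAMPLE)
print(wines[0]["alcohol"], wines[0]["quality"])
```

The last column, quality, is the value to estimate; the other eleven columns are the features.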
5. (Linear) Regression
Creating a linear regression over
one feature is relatively simple.
y = k x + b
The dataset has a large number of wines with different alcohol levels and qualities.
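For a single feature, the slope k and intercept b of y = k x + b have a closed-form least-squares solution. The Python sketch below shows that calculation; the function name `fit_line` is hypothetical, and the four points are made-up toy values lying exactly on y = 2x + 1, not wine data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = k*x + b over one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept follows from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b

# Made-up toy points on the line y = 2x + 1 (not from the wine dataset).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
k, b = fit_line(xs, ys)
print(k, b)  # 2.0 1.0
```

With real data, xs would be the alcohol column and ys the quality column.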
But the dataset has 10 other features as well, so how do we fit a regression over all 11 variables combined? Does it take forever…?
y = k1 x1 + k2 x2 + … + kn xn + b
Original picture from: http://brandewinder.com/2016/08/06/gradient-boosting-part-1/
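It does not take forever: one common way to fit y = k1 x1 + … + kn xn + b is gradient descent on the mean squared error. The Python sketch below is only an illustration of that idea, not what Accord.NET does internally; the function names, learning rate, and step count are assumptions, and the toy rows are made up so that the true answer is k1 = 1, k2 = 2, b = 3.

```python
def predict(weights, bias, features):
    """Evaluate y = k1*x1 + ... + kn*xn + b for one row."""
    return sum(k * x for k, x in zip(weights, features)) + bias

def fit_gradient_descent(rows, targets, learning_rate=0.1, steps=5000):
    """Fit weights and bias by gradient descent on mean squared error.

    learning_rate and steps are illustrative values tuned for the toy data.
    """
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(steps):
        # Errors are computed once per step, so all parameters update together.
        errors = [predict(weights, bias, r) - t for r, t in zip(rows, targets)]
        for i in range(len(weights)):
            gradient = 2.0 / n * sum(e * r[i] for e, r in zip(errors, rows))
            weights[i] -= learning_rate * gradient
        bias -= learning_rate * 2.0 / n * sum(errors)
    return weights, bias

# Made-up toy data generated from y = 1*x1 + 2*x2 + 3.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [3.0, 4.0, 5.0, 6.0]
weights, bias = fit_gradient_descent(rows, targets)
print([round(w, 3) for w in weights], round(bias, 3))
```

Real features with very different scales (for example density versus total sulfur dioxide) would usually need to be normalized first for this to converge well.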
6. Accord.NET cancer example
Age Smokes Had cancer
55 0 FALSE
28 0 FALSE
65 1 FALSE
46 0 TRUE
86 1 TRUE
56 1 TRUE
85 0 FALSE
33 0 FALSE
21 1 FALSE
42 1 TRUE
Feature Odds ratio
Age 1.02
Smoking 5.86
Calculation: y(x0, x1) = 0.0206451183100222*x0 + 1.76788931343272*x1 - 2.45774643623285
where x0 = age and x1 = smokes (0 or 1)
Decide()
http://fssnip.net/7Sz
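The numbers on this slide fit together as follows: each odds ratio is the exponential of the matching coefficient, and the linear score y is pushed through the logistic function to get a probability. The Python sketch below reproduces that arithmetic with the coefficients from the slide. It is not the Accord.NET API; the `decide` helper and its 0.5 threshold are assumptions about how a yes/no decision would typically be made.

```python
import math

# Coefficients and intercept as printed on the slide (x0 = age, x1 = smokes).
COEFFICIENTS = (0.0206451183100222, 1.76788931343272)
INTERCEPT = -2.45774643623285

def score(age, smokes):
    """Linear part of the model: y(x0, x1)."""
    return COEFFICIENTS[0] * age + COEFFICIENTS[1] * smokes + INTERCEPT

def probability(age, smokes):
    """Logistic function applied to the linear score."""
    return 1.0 / (1.0 + math.exp(-score(age, smokes)))

def decide(age, smokes, threshold=0.5):
    """Hypothetical yes/no decision; the 0.5 threshold is an assumption."""
    return probability(age, smokes) >= threshold

odds_ratios = [round(math.exp(c), 2) for c in COEFFICIENTS]
print(odds_ratios)  # [1.02, 5.86]
print(decide(86, 1), decide(28, 0))
```

Read this way, each extra year of age multiplies the odds by about 1.02, and smoking multiplies them by about 5.86.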
7. Decision trees
Instead of combining slopes, combine simple feature-condition "stumps".
A tree estimates a few (discrete) categories from a combination of decision nodes.
What method should I choose?
http://scikit-learn.org/stable/_static/ml_map.png
Example decision nodes from the figure:
pH > 3.5
Alcohol > 10.6
Manual example and theory:
http://brandewinder.com/2016/08/06/gradient-boosting-part-1/
http://fssnip.net/7Tz
The figure has just 2 stumps, but real-life machine learning can generate huge trees.
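A single stump is just one condition of the form "feature > threshold" with one prediction on each side. The Python sketch below shows one plausible way to learn such a stump by trying each midpoint between sorted feature values and keeping the one with the lowest squared error. The function names are hypothetical, and the alcohol and quality values are made-up toy numbers, not rows from the dataset.

```python
def fit_stump(xs, ys):
    """Find the threshold on one feature that minimises squared error.

    Returns (threshold, prediction_if_below_or_equal, prediction_if_above).
    Needs at least two distinct feature values.
    """
    def mean(values):
        return sum(values) / len(values)

    best = None
    points = sorted(set(xs))
    for low, high in zip(points, points[1:]):
        threshold = (low + high) / 2          # candidate split between neighbours
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict_stump(stump, x):
    """Apply the learned condition 'x > threshold'."""
    threshold, left_mean, right_mean = stump
    return right_mean if x > threshold else left_mean

# Made-up toy values (not from the wine dataset).
alcohol = [9.0, 10.0, 11.0, 12.0]
quality = [5.0, 5.0, 7.0, 7.0]
stump = fit_stump(alcohol, quality)
print(stump)  # (10.5, 5.0, 7.0)
```

A tree repeats this search inside each side of the split, which is how two stumps on different features, such as pH and alcohol, end up combined.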
8. Use-case: Quality for our event’s wine from Alko
https://www.alko.fi/tuotteet/455518/Frontera-Cabernet-Sauvignon-2016-hanapakkaus
Data from Alko analysis laboratory, wine entry L2BIBS34016:
In Finnish                  In English
Alk-% 12,01                 Alcohol 12.01 %
Sokeri 3,5 g/l              Sugar 3.5 g/l
Haihtuvat hapot 0,5 g/l     Volatile acidity 0.5 g/l
Kokonaisrikki 96 mg/l       Total sulfur 96 mg/l
Vapaa rikki 36 mg/l         Free sulfur 36 mg/l
Sitruunahappo 0,045 g/l     Citric acid 0.045 g/l
• This sample is from Chile and the sample data is from Portugal, so our algorithm has to be able to work outside the dataset.
• Parameter mismatch, two options:
1) Convert the parameters, or
2) Remove the parameter from the learning process.
Then measure the error and the effect on model quality.
Features we don’t have, with the dataset mean:
Fixed acidity   8.32 g/l
Chlorides       0.087 g/l
Density         0.9967 g/cm³
pH              3.31
Sulphates       0.66 g/l
Extra data we know:
Total acids     4.62 g/l
Extract         29.7
Density         “medium”
Cabernet Sauvignon
(Alko provided the data I asked for by email.)
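Turning the lab sheet into model input could look like the sketch below. It is a Python illustration under several assumptions: Finnish decimal commas are converted to points, Alko's "sugar", "total sulfur" and "free sulfur" are mapped to the dataset's residual sugar, total sulfur dioxide and free sulfur dioxide columns, and the five missing features are filled with the dataset means listed above. Filling with the mean is only one option; dropping those features from training is the other one named on this slide.

```python
# Column order of the dataset's 11 input features.
FEATURE_ORDER = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]

# Dataset means for the features the lab sheet lacks (from the slide).
DATASET_MEANS = {
    "fixed acidity": 8.32, "chlorides": 0.087, "density": 0.9967,
    "pH": 3.31, "sulphates": 0.66,
}

# Lab values as written in Finnish notation; the key mapping is an assumption.
ALKO_LAB = {
    "alcohol": "12,01", "residual sugar": "3,5", "volatile acidity": "0,5",
    "total sulfur dioxide": "96", "free sulfur dioxide": "36",
    "citric acid": "0,045",
}

def parse_finnish_number(text):
    """Convert a decimal-comma string such as '12,01' to a float."""
    return float(text.replace(",", "."))

def build_feature_vector(lab_values, fallback_means):
    """Use the lab value when present, otherwise the dataset mean."""
    vector = []
    for name in FEATURE_ORDER:
        if name in lab_values:
            vector.append(parse_finnish_number(lab_values[name]))
        else:
            vector.append(fallback_means[name])
    return vector

vector = build_feature_vector(ALKO_LAB, DATASET_MEANS)
print(vector)
```

Comparing predictions made this way against a model trained without the five missing features gives a rough measure of how much the substitution costs.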