
Machine learning (using Accord.NET and FSharp)

Analysing wine quality, F#.
https://www.meetup.com/FSharpHelsinki/events/242919339/

Published in: Software
  1. Machine learning with Accord.NET: Wine quality
     This example is based on the Wine Quality dataset from the University of California Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Wine+Quality
  2. Machine learning
     • Current cloud providers (Microsoft, Amazon, Google, …) have an interest in selling computing power as an API.
     • Machine learning takes a lot of computing power.
     They have an interest in making it the next buzzword. They made "cloud" a buzzword, and they can do it again, e.g. https://azure.microsoft.com/en-in/services/machine-learning/
     However, this time we won't use any APIs, but an open-source tool called Accord.NET.
  3. A case for machine learning
     • We have some existing sample data.
     • We want to estimate a variable that is known in the sample data but not in real life.
     • We expect that real results will follow the sample data.
     Randomize the order of the sample data rows and split them into two parts:
     1) Training set: used to find the correct model.
     2) Model evaluation set: used to verify that the model works on data outside the training samples.
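The shuffle-and-split step from this slide can be sketched in a few lines. This is a minimal illustration in plain Python rather than the deck's F#; the 80/20 ratio, the fixed seed, and the name `split_rows` are assumptions made for the example, not something the slides specify.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split it into (training, evaluation)."""
    shuffled = list(rows)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for the dataset rows
training, evaluation = split_rows(rows)
```

The model is fitted on `training` only; `evaluation` stays unseen until the model is checked.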
  4. A sample dataset: Wine quality
     The data contains only measured parameters of wines and a quality score from 0 to 10 voted by people:
     https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
     Can we estimate the quality of an unlisted wine based on the features we know?
     Feature                 Sample row 1   Sample row 2
     fixed acidity           7.4            7.3
     volatile acidity        0.7            0.65
     citric acid             0              0
     residual sugar          1.9            1.2
     chlorides               0.076          0.065
     free sulfur dioxide     11             15
     total sulfur dioxide    34             21
     density                 0.9978         0.9946
     pH                      3.51           3.39
     sulphates               0.56           0.47
     alcohol                 9.4            10
     quality                 5              7
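Reading the dataset into features and a target could look like the sketch below. It is plain Python rather than the deck's F#, and it embeds the two sample rows from the slide instead of downloading the file. It assumes the semicolon-separated layout with a quoted header row that the UCI file uses; `load_wines` is a hypothetical helper name.

```python
import csv
import io

# Two rows from the slide, in the semicolon-separated layout of the UCI file.
SAMPLE = (
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";'
    '"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";'
    '"pH";"sulphates";"alcohol";"quality"\n'
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.3;0.65;0;1.2;0.065;15;21;0.9946;3.39;0.47;10;7\n"
)


def load_wines(text):
    """Return (features, qualities): a list of feature dicts and a list of scores."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    features, qualities = [], []
    for row in reader:
        qualities.append(int(row.pop("quality")))  # the target variable
        features.append({name: float(value) for name, value in row.items()})
    return features, qualities


features, qualities = load_wines(SAMPLE)
```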
  5. (Linear) regression
     Creating a linear regression over one feature is relatively simple:
     y = k x + b
     The dataset has a large number of wines with different alcohol levels and qualities. But the dataset also has 10 other features, so how do we make a regression over all 11 variables combined? Does it take forever…?
     y = k1 x1 + k2 x2 + … + kn xn + b
     Original picture from: http://brandewinder.com/2016/08/06/gradient-boosting-part-1/
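For the one-feature case, k and b have a closed-form least-squares solution. The sketch below shows it in plain Python; the alcohol and quality numbers are made up for illustration and are not taken from the dataset, and `fit_line` and `predict` are hypothetical helper names. The multi-feature formula is only evaluated here, not fitted.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = k*x + b over a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


def predict(weights, bias, features):
    """Evaluate y = k1*x1 + k2*x2 + ... + kn*xn + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Made-up illustration data: alcohol level versus voted quality.
alcohol = [9.0, 10.0, 11.0, 12.0]
quality = [5.0, 5.5, 6.0, 6.5]
k, b = fit_line(alcohol, quality)
estimate = predict([k], b, [10.5])
```

With 11 features the same idea applies, but the weights are usually found with a library solver instead of by hand.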
  6. Accord.NET cancer example
     Age   Smokes   Had cancer
     55    0        FALSE
     28    0        FALSE
     65    1        FALSE
     46    0        TRUE
     86    1        TRUE
     56    1        TRUE
     85    0        FALSE
     33    0        FALSE
     21    1        FALSE
     42    1        TRUE
     Feature    Odds ratio
     Age        1.02
     Smoking    5.86
     Calculation:
     y(x0, x1) = 0.0206451183100222*x0 + 1.76788931343272*x1 - 2.45774643623285
     Classification: Decide()
     http://fssnip.net/7Sz
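The odds ratios on this slide are the exponentials of the coefficients in the calculation line. The sketch below reproduces them in plain Python, without Accord.NET. It assumes that y is the log-odds of a logistic regression and that the decision is made at a probability threshold of 0.5; the function names are hypothetical and only mirror the idea of `Decide()`.

```python
import math

# Coefficients copied from the slide: x0 = age, x1 = smokes (0 or 1).
COEFFICIENTS = (0.0206451183100222, 1.76788931343272)
INTERCEPT = -2.45774643623285


def odds_ratios():
    """Odds ratio of each feature: exp(coefficient)."""
    return [math.exp(c) for c in COEFFICIENTS]


def probability(age, smokes):
    """Logistic function applied to the linear term y(x0, x1)."""
    y = COEFFICIENTS[0] * age + COEFFICIENTS[1] * smokes + INTERCEPT
    return 1.0 / (1.0 + math.exp(-y))


def decide(age, smokes, threshold=0.5):
    """Assumed decision rule: positive when the probability reaches the threshold."""
    return probability(age, smokes) >= threshold


ratios = [round(r, 2) for r in odds_ratios()]
```

Read this way, each extra year of age multiplies the odds by about 1.02, and smoking multiplies them by about 5.86.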
  7. Decision trees
     Instead of combining slopes, create a combination of feature-condition stumps. A tree estimates a few (discrete) categories based on a combination of decision nodes.
     What method should I choose? http://scikit-learn.org/stable/_static/ml_map.png
     Example stumps from the figure: pH > 3.5, Alcohol > 10.6
     Manual example and theory: http://brandewinder.com/2016/08/06/gradient-boosting-part-1/ http://fssnip.net/7Tz
     The figure has just 2 stumps, but real-life AI can generate huge trees.
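A tree built from the two stumps could be evaluated as in the sketch below, written in plain Python. Only the two thresholds come from the slide. The nesting order (alcohol first, then pH) and the category labels at the leaves are hypothetical, chosen only to show how conditions combine into a discrete estimate.

```python
def classify(wine):
    """Walk a two-level tree of feature-condition stumps.

    Thresholds are from the slide; nesting order and leaf labels are made up.
    """
    if wine["alcohol"] > 10.6:
        if wine["pH"] > 3.5:
            return "high"
        return "medium"
    return "low"


label = classify({"alcohol": 11.0, "pH": 3.6})
```

A learned tree has the same shape, but the algorithm picks the features, thresholds, and leaf values from the training data.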
  8. Use case: quality of our event's wine from Alko
     https://www.alko.fi/tuotteet/455518/Frontera-Cabernet-Sauvignon-2016-hanapakkaus
     Data from the Alko analysis laboratory, wine entry L2BIBS34016:
     In Finnish                   In English
     Alk-% 12,01                  Alcohol 12.01
     Sokeri 3,5 g/l               Sugar 3.5
     Haihtuvat hapot 0,5 g/l      Volatile acidity 0.5
     Kokonaisrikki 96 mg/l        Total sulfur 96
     Vapaa rikki 36 mg/l          Free sulfur 36
     Sitruunahappo 0,045 g/l      Citric acid 0.045
     • This sample is from Chile, while the wines in the dataset are Portuguese (vinho verde), so our algorithm has to be able to work outside the dataset.
     • Parameter mismatch: 1) convert parameters, or 2) remove the parameter from the learning process, then measure the error and the effect on model quality.
     We don't have     Dataset mean
     Fixed acidity     8.32 g/l
     Chlorides         0.087 g/l
     Density           0.9967 g/cm³
     pH                3.31
     Sulphates         0.66 g/l
     Extra data        Known
     Total acids       4.62 g/l
     Extract           29.7
     Density           "medium"
     Grape             Cabernet Sauvignon
     (Alko provided the data I asked for by email.)
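One way to turn the Alko values into a model input is to parse the Finnish decimal commas and fill each unmeasured feature with its dataset mean. The sketch below does this in plain Python. Mapping "Sugar" to residual sugar and the sulfur values to the dataset's sulfur dioxide columns is an assumption, as are the helper names; removing the missing parameters from training instead is the other option the slide names.

```python
# Column order of the wine quality dataset (without the quality target).
FEATURE_ORDER = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]

# Alko laboratory values as written on the slide (Finnish decimal commas).
ALKO_RAW = {
    "alcohol": "12,01",
    "residual sugar": "3,5",        # assumed match for "Sokeri" / Sugar
    "volatile acidity": "0,5",
    "total sulfur dioxide": "96",   # assumed match for "Kokonaisrikki"
    "free sulfur dioxide": "36",    # assumed match for "Vapaa rikki"
    "citric acid": "0,045",
}

# Dataset means from the slide for the features Alko did not provide.
DATASET_MEANS = {
    "fixed acidity": 8.32, "chlorides": 0.087, "density": 0.9967,
    "pH": 3.31, "sulphates": 0.66,
}


def parse_decimal_comma(text):
    """Convert a number written with a decimal comma, e.g. '12,01', to float."""
    return float(text.replace(",", "."))


def build_feature_vector(raw, means):
    """Measured values where available, dataset means otherwise."""
    known = {name: parse_decimal_comma(value) for name, value in raw.items()}
    return [known[name] if name in known else means[name] for name in FEATURE_ORDER]


vector = build_feature_vector(ALKO_RAW, DATASET_MEANS)
```

Filling in means keeps all 11 inputs, at the cost of treating this wine as average on five of them, so the resulting error is worth measuring on the evaluation set.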
