
.NET Fest 2017. Igor Kochetov. Classifying performance test results with Machine Learning

In this talk we will cover the basic algorithms and application areas of Machine Learning (ML), then walk through a practical example of building a system that classifies performance measurement results, collected at Unity with the internal Performance Test Framework, in order to find performance regressions or unstable tests. We will also try to work out the criteria by which the performance of ML algorithms can be evaluated, and ways to debug them.


  1. Applying Machine Learning to Classify Performance Test Results. By Igor Kochetov (@k04a). Kiev 2017
  2. What dog are you? .NET developer since 2007. Python developer since 2015. Toolsmith for Unity Technologies. Religious about good code, software design, TDD, SOLID. Love to learn new stuff. Fun: Microsoft booth at NDC Oslo 2016
  3. In this talk ❏ Applications of machine learning and the most common algorithms ❏ Using machine learning to classify performance test results in Unity, implemented in .NET ❏ How to debug machine learning algorithms
  4. The definition of Machine Learning (ML). "Field of study that gives computers the ability to learn without being explicitly programmed" - Arthur Samuel (1959). "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." - Tom Mitchell (1997)
  5. Cat or Dog?
  6. Applications of Machine Learning ❏ Handwriting recognition ❏ Natural language processing (NLP) ❏ Computer vision (self-driving cars) ❏ Self-customizing programs and user activity monitoring ❏ Medical records ❏ Spam filters
  7. Types of learning algorithms ➢ Supervised learning (labeled data) ○ Regression ○ Classification ○ Neural networks ➢ Unsupervised learning (unlabeled data) ○ Clustering ○ Dimensionality reduction and PCA ○ Anomaly detection
  8. What type of problem do we have at hand?
  9. Performance Tests - the problem we are solving. In Performance Tests we have: ● Around 120 runtime tests ● Around 500 native tests ● Which run nightly on 8 platforms: iOS, Android, mac/win editor/standalone, PS4, Xbox ● Also about 25 editor tests for 2 platforms. Totals of ~5000 tests producing historical data points (performance of the measured component in ms) nightly across a few major branches
  10. Performance Tests - classify into 1 of 4 categories ❏ Stable ❏ Unstable ❏ Progression ❏ Regression. 200 inputs - a chronologically ordered set of samples from performance tests. 4 outputs - regression, progression, unstable, stable
  11. Classifying the MNIST dataset is the "Hello world" of ML
  12. Introducing Neural Networks
  13. Activation unit modeling a neuron
  14. Logistic (sigmoid) function
  15. Classification problem and decision boundary. Classify input data into one of two discrete classes (yes/no, 1/0, etc.). Find the best "line" separating negative and positive examples (y = 1, y = 0)
  16. To better fit the data we need a more complex model
  17. Every node receives its input from the previous layer (forward propagation)
  18. There could be more layers
  19. And more than one output
  20. How do we build and train a NN? Structure: ● Define the input layer (number of input nodes) ● Define the output layer (number of output nodes) ● Define the hidden layers (number of nodes and layers). Training: ● Randomize the weights and apply them to the inputs (forward propagation) ● Adjust the weights guided by the output error (back propagation). Objective: minimize the error between the network's outputs and the labeled examples
  21. Demo
  22. How do we know we did anything good?
  23. To assess the performance of the algorithm, split training data into 3 subsets ● Training set (about 60% of your data) ● Cross validation set (20%) ● Test set (20%). Use the test set to validate the % of correct answers on unseen data. Use the cross validation (CV) set to fine-tune your algorithm; plot errors as a function for both the training and CV sets
  24. Learning curves, or "do we need more data?" A smaller sample size usually means less error on the training data but more error on 'unseen' data. With more training data the CV error should go down, but watch the gap between Jcv and Jtrain (less is better)
  25. More complex models try to fit all training data but tend to perform worse on 'real' data
  26. Plot errors as you tweak parameters. As you increase d, both the training error and the cross validation error go down as we better fit our data. But at some point the CV error starts to go up again, since we are overfitting our training data and failing to generalize to new unseen data
  27. Is your data distributed evenly?
  28. Precision, recall and FScore ● True positive (we guessed 1, it was 1) ● False positive (we guessed 1, it was 0) ● True negative (we guessed 0, it was 0) ● False negative (we guessed 0, it was 1). P = TP / (TP + FP). R = TP / (TP + FN). FScore = 2 * (P * R) / (P + R)
  29. Mean normalization and feature scaling
  30. Conclusions
  31. In order to successfully solve a machine learning problem ● Identify the task at hand and figure out a suitable algorithm ● Carefully select your training (and validation and testing) data ● Normalize your data ● Validate results ● Debug your model and diagnose problems instead of randomly tweaking parameters
  32. References. C# version developed based on AForge.NET: https://github.com/IgorKochetov/Machine-Learning-PerfTests-Classifying http://www.aforgenet.com/framework/docs/ http://accord-framework.net/ Stanford University course on Machine Learning by prof. Andrew Ng: https://www.coursera.org/learn/machine-learning Book by Tariq Rashid, "Make Your Own Neural Network": https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork
  33. How to reach me. Twitter: @k04a. LinkedIn: Igor Kochetov
  34. Q & A

Editor's Notes

  • Instead of programming some rules we feed training data (learning examples) into the algorithm and assess the results
  • Web data (click-stream or click through data)
    Mine to understand users better
    Huge segment of silicon valley

    Self customizing programs
    Netflix
    Amazon
    iTunes genius
    Take users info
    Learn based on your behavior

    Next - types of learning tasks
  • Unsupervised - unlabeled data. Given the data find patterns and structure in the data

    Anomaly Detection (Fraud detection, Manufacturing, DataCenter monitoring)
    Anomaly detection vs. supervised learning: very small number of positive examples

    Content based recommendation and Collaborative filtering (if we have a set of features for movie rating you can learn a user's preferences, and vice versa, If you have your users preferences you can therefore determine a film's features)

    More examples: the cocktail party algorithm. More details on recommender systems: recommender systems typically produce a list of recommendations in one of two ways: through collaborative and content-based filtering, or the personality-based approach. Collaborative filtering approaches build a model from a user's past behaviour (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in. Content-based filtering approaches utilize a series of discrete characteristics of an item in order to recommend additional items with similar properties. These approaches are often combined (see Hybrid Recommender Systems).
  • Each test run provides us with a decimal value as a result: milliseconds needed to complete. So we have historic data for every measured feature and want to know if it increases, decreases, stays the same, or jumps all around.
  • Our problem could be modeled as Handwriting recognition one
  • Every image is just an array of numbers Which we feed into an algorithm (i.e. input)
    And the output is one of 10 digits

    Which brings us back to our problem:
  • Brain
    Does loads of crazy things
    Hypothesis is that the brain has a single learning algorithm
    Neuron:
    Three things to notice
    Cell body
    Number of input wires (dendrites)
    Output wire (axon)
    Simple level
    Neuron gets one or more inputs through dendrites
    Does processing
    Sends output down axon
  • a neuron is a logistic unit
    That logistic computation is just like logistic regression hypothesis calculation
    X vector is our input (X0 is a constant, known as bias)
    Ɵ vector is our parameters which may also be called the weights of a model (that’s what we want to learn)
  • This is the sigmoid function, or the logistic function

    Crosses 0.5 at the origin, then flattens out, Asymptotes at 0 and 1

    Which gives us DECISION BOUNDARY


    When using linear regression we did hθ(x) = θᵀx
    For the classification hypothesis representation we do hθ(x) = g(θᵀx)
    Where we define g(z)
    z is a real number
    g(z) = 1 / (1 + e^(-z))
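In code, the sigmoid is a one-liner (Python used here purely for illustration; the talk's own implementation is in C#):

```python
import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); crosses 0.5 at the origin
    # and asymptotes at 0 and 1
    return 1.0 / (1.0 + math.exp(-z))
```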
  • It could be more than a line, actually
  • In order to achieve that we can apply higher order polynomial or use NN
  • First layer is the input layer
    Final layer is the output layer - produces value computed by a hypothesis
    Middle layer(s) are called the hidden layers
    ai(j) - activation of unit i in layer j
    Ɵ(j) - matrix of parameters controlling the function mapping from layer j to layer j + 1
    Every input/activation goes to every node in following layer
  • A NN is logistic regression at scale
    Neural networks learn their own features!
    ai(j) - activation of unit i in layer j
    Ɵ(j) - matrix of parameters controlling the function mapping from layer j to layer j + 1
    Every input/activation goes to every node in following layer


    Next - multiclass

  • Recognizing stable, unstable, regression or progression
    Build a neural network with four output units
    Output a vector of four numbers
    1 is 0/1 stable
    2 is 0/1 unstable
    3 is 0/1 regression
    4 is 0/1 progression
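The four-number output vector described above is plain one-hot encoding; a minimal Python sketch (illustrative only, the talk's code is C#):

```python
CATEGORIES = ["stable", "unstable", "regression", "progression"]

def to_output_vector(label):
    # One 0/1 entry per output unit, in the order listed above
    vec = [0] * len(CATEGORIES)
    vec[CATEGORIES.index(label)] = 1
    return vec
```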
  • Inputs = features
    Outputs = number of classification categories

    Flip back to explain forward and back propagation
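Forward propagation as described can be sketched roughly as follows (Python for illustration; the weight layout with the bias as each row's last element is an assumption for this sketch, not the AForge.NET API used in the talk):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, layers):
    # layers: list of weight matrices; each row holds one node's
    # input weights, with the node's bias as the last element.
    # Each layer's activations feed every node of the next layer.
    activations = inputs
    for weights in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row[:-1], activations)) + row[-1])
            for row in weights
        ]
    return activations
```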
  • We will use AForge.NET library.
    We have to prepare Inputs and Outputs, choose Activation function and Network Structure (number of nodes, layers)
    And train the network until error is small enough
  • Having a single value to measure the performance of the algorithm is really important

    So the first step is to compare labeled inputs with algorithm outputs and calculate the % of correct results
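That first step amounts to a simple accuracy calculation (Python sketch; the names are illustrative):

```python
def accuracy(predicted, expected):
    # % of examples where the algorithm's output matches the label
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return 100.0 * correct / len(expected)
```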
  • Jtrain
    Error on smaller sample sizes is smaller (as less variance to accommodate)
    So as m grows error grows
    Jcv
    Error on cross validation set
    When you have a tiny training set you generalize badly
    But as training set grows your hypothesis generalize better
    So cv error will decrease as m increases
    High bias
    e.g. setting straight line to data
    Jtrain
    Training error is small at first and grows
    Training error becomes close to cross validation
    So the performance of the cross validation and training set end up being similar (but very poor)
    Jcv
    Straight line fit is similar for a few vs. a lot of data
    So it doesn't generalize any better with lots of data because the function just doesn't fit the data
    The problem with high bias is because cross validation and training error are both high
    Also implies that if a learning algorithm has high bias, the cross validation error doesn't decrease as we get more examples
    So if an algorithm is already suffering from high bias, more data does not help

    High variance
    e.g. high order polynomial
    Jtrain
    When set is small, training error is small too
    As training set sizes increases, value is still small
    But slowly increases (in a near linear fashion)
    Error is still low
    Jcv
    Error remains high, even when you have a moderate number of examples
    Because the problem with high variance (overfitting) is your model doesn't generalize
    An indicative diagnostic that you have high variance is that there's a big gap between training error and cross validation error
    If a learning algorithm is suffering from high variance, more data is probably going to help
  • Applying higher order polynomial (or complex NN)
  • Precision
    How often does our algorithm cause a false alarm?
    Of all patients we predicted have cancer, what fraction of them actually have cancer
    = true positives / # predicted positive
    = true positives / (true positive + false positive)
    High precision is good (i.e. closer to 1)
    You want a big number, because you want false positive to be as close to 0 as possible
    Recall
    How sensitive is our algorithm?
    Of all patients in set that actually have cancer, what fraction did we correctly detect
    = true positives / # actual positives
    = true positive / (true positive + false negative)
    High recall is good (i.e. closer to 1)
    You want a big number, because you want false negative to be as close to 0 as possible


    F1Score (fscore)
    = 2 * (P * R) / (P + R)
    Fscore is like taking the average of precision and recall giving a higher weight to the lower value
    Many formulas for computing comparable precision/accuracy values
    If P = 0 or R = 0 the Fscore = 0
    If P = 1 and R = 1 then Fscore = 1
    The remaining values lie between 0 and 1
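The definitions above translate directly into code (Python sketch, with the zero cases handled explicitly; the talk's code is C#):

```python
def precision_recall_fscore(tp, fp, fn):
    # P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Because F1 is a harmonic mean, it gives a higher weight to the lower of the two values, as the notes say.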
  • Find the average value (mean) and subtract it, then divide by the range (or by the standard deviation)
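A minimal sketch of mean normalization with range-based feature scaling (Python for illustration, one feature at a time):

```python
def normalize(values):
    # Subtract the mean so the feature is centered around 0,
    # then divide by the range so all features share a similar scale
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    if spread == 0:
        return [0.0 for _ in values]  # constant feature carries no signal
    return [(v - mean) / spread for v in values]
```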
  • Don’t be afraid to try, even small projects could be fun and useful
