
Machine Learning with Spark and Cassandra - Testing


Testing is how we estimate the efficacy of our machine learning models out in the real world. The basics may seem obvious, but specific test metrics can help you emphasize performance on the parts of your application that matter most.

The previous part in this series (found here: https://www.youtube.com/watch?v=ahqWq6Gkwbw, https://blog.anant.us/spark-and-cassandra-for-machine-learning-data-pre-processing/) discussed data pre-processing methods. This part focuses on how we test the efficacy of our machine learning models, which tells us how well they might generalize to real data.

The first part (found here: https://blog.anant.us/spark-and-cassandra-for-machine-learning-setup/) helps set up the environment we work in and discusses why we might want to use Spark and Cassandra here.
Code for the environment can be found here: https://github.com/HadesArchitect/CaSpark
Extra Notebooks and Datasets not included above can be found here: https://github.com/anomnaco/CaSparkExtension

Webinar Recording: https://youtu.be/mHFUJGntk78

Follow Us and Reach Us At:

Anant:
https://www.anant.us/home

Cassandra.Link:
https://cassandra.link/

Email:
solutions@anant.us

LinkedIn:
https://www.linkedin.com/organization...

Twitter:
https://twitter.com/anantcorp

Eventbrite:
https://www.eventbrite.com/o/anant-10...

Facebook:
https://www.facebook.com/AnantCorp/


  1. Machine Learning with Spark and Cassandra - Testing
     Tests for Binary Classification Models, Regression Models, and Multi-class Classification Models
  2. Series: Machine Learning with Spark and Cassandra
     ● Environment Setup
     ● Data Pre-processing
     ● Testing
     ● Validation
     ● Model Selection Tests
  3. How do we test machine learning models?
  4. ● Tests are a statistical measure of how well our models work.
     ● They are calculated by running a model on held-out data with known properties and comparing model predictions to known labels.
     ● Testing works differently for different types of ML models.
     ● Tests are an attempt to capture the potential performance on data the model will see in day-to-day operation.
  5. When do we test? On what data?
  6. When to test?
     ● Whenever we have a trained model, we can start testing. Depending on what we find and where we are in the process, a test can send us on to the next step or back to a previous one.
     ○ Sometimes we go back and tune the parameters of our model.
     ○ Sometimes we pick a new algorithm to train altogether.
     ○ Other times we move forward to more complex testing strategies or on to deployment.
     ● The same calculations used for test statistics can also be part of the mathematical process of training the model.
  7. What data to test on?
     ● Always test on held-out data, never the same data that was used to train the model.
     ○ ML algorithms often optimize test statistics on the training dataset directly, so testing on the training set completely fails to tell us how the model generalizes to real data.
     ● There are multiple methods for choosing which data to hold out; the choice should always be made randomly.
     ○ The simplest method is to split the data into two random chunks, train on one, and test on the other.
     ○ We can also split into three chunks: one for training, one for testing, and one for final validation.
     ○ More complex schemes exist, to be covered next time in the talk on validation.
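The simple two-chunk split described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the webinar's notebook code; the function name and 75/25 ratio are assumptions for the example.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Randomly hold out a fraction of the examples for testing.

    Shuffling before the cut is what makes the held-out chunk random
    rather than, say, the last rows of a time-ordered table.
    """
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]   # (train, test)

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))    # 75 25
```

In the Spark notebooks the same idea is usually expressed with `DataFrame.randomSplit([0.75, 0.25])`, which returns a list of randomly partitioned DataFrames.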
  8. Binary Classification Tests
  9. ● Binary classifiers predict a value with a boolean type. Sometimes they detect the presence or absence of a particular thing; other times they pick between two categories.
     ● To test a binary classification model we use something called a confusion matrix. It categorizes our predictions based on what value we predicted and what the actual value is.
  10. ● We use the confusion matrix counts to compute more meaningful metrics.
      ● The most commonly used is accuracy, computed as correct predictions divided by all predictions. It is a general measure of how likely we are to correctly predict a given example.
      ● Recall is computed as the number of correctly identified positive values divided by the number of actual positive values. It measures how well our model detects the presence of positive values.
      ● Precision is calculated as the number of correctly identified positive values divided by the number of positive predictions. It measures the reliability of a positive prediction.
      ● Recall and precision combine into a composite value, the F1 score. If either recall or precision is low, the F1 score will also be low, so it emphasizes the cost of incorrect predictions.
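The confusion-matrix counts and the four metrics above can be computed directly from two label lists. A minimal pure-Python sketch (the function name and sample labels are illustrative, not from the webinar):

```python
def binary_metrics(y_true, y_pred):
    # Confusion matrix counts: true/false positives and negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)              # correct / all
    recall = tp / (tp + fn) if tp + fn else 0.0     # found / actual positives
    precision = tp / (tp + fp) if tp + fp else 0.0  # found / predicted positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Because F1 is the harmonic mean of precision and recall, a weak score on either side drags the composite down much harder than an arithmetic mean would.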
  11. Test Error for Regression Models
  12. ● Regression models estimate functions and produce predictions in the form of scalar values, so classification tests do not work for them. Instead we use the difference between predicted and actual values as a simple error metric.
      ● Adding raw error values without extra processing is a bad idea, since errors in different directions can cancel out.
      ● Instead we use metrics like the sum of squared errors (SSE), a simple measure that captures error over the entire test set.
      ● We can also use mean squared error (MSE), which in some cases is better since it is independent of the number of examples in the test set.
      ● Root mean squared error (RMSE) is sometimes preferable since it is expressed in the same units as our predictions rather than units squared, while still maintaining many of the statistical features of the MSE.
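The three squared-error metrics differ only in scaling, which a short sketch makes concrete (names and sample values are illustrative):

```python
import math

def squared_error_metrics(y_true, y_pred):
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)   # grows with test-set size
    mse = sse / len(errors)            # independent of test-set size
    rmse = math.sqrt(mse)              # same units as the target variable
    return sse, mse, rmse

sse, mse, rmse = squared_error_metrics([3.0, 5.0, 2.0, 7.0],
                                       [2.5, 5.0, 4.0, 8.0])
print(sse, mse, rmse)   # 5.25 1.3125 ~1.146
```

In Spark, `pyspark.ml.evaluation.RegressionEvaluator` exposes the same family of metrics through its `metricName` parameter (`"rmse"`, `"mse"`, `"mae"`).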
  13. ● We sometimes prefer absolute error measures to squared error measures; these are calculated by taking the absolute value of each error rather than squaring it.
      ● Squared error measures emphasize large error values, and therefore outliers, more heavily.
      ● However, the absolute value function is not differentiable at zero, which makes gradients difficult to calculate during training.
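The outlier sensitivity is easy to demonstrate: two prediction sets with the same total absolute error get the same MAE, but the one whose error is concentrated in a single outlier gets a larger RMSE. A small sketch with made-up values:

```python
import math

def mae(y_true, y_pred):
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

y_true = [0.0] * 4
even   = [1.0, 1.0, 1.0, 1.0]   # total error 4, spread evenly
spiked = [4.0, 0.0, 0.0, 0.0]   # total error 4, one outlier

print(mae(y_true, even),   rmse(y_true, even))    # 1.0 1.0
print(mae(y_true, spiked), rmse(y_true, spiked))  # 1.0 2.0
```

MAE cannot tell the two apart, while RMSE doubles for the spiked set; which behavior is "right" depends on how costly outliers are for your application.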
  14. Confusion Matrices for Multiclass Classification Models
  15. ● Multiclass classifiers predict a value that can take more than two, but still finitely many, possible values.
      ● We test them by building confusion matrices, similar to binary classification, but these cannot be turned directly into test metrics.
      ● We build an n-by-n grid, where n is the number of possible classes, and place each test result into a cell based on the predicted class and the actual class of the example.
      ● We can then turn that grid into n individual matrices, one per class. Correct predictions on a particular class count as true positives, and all other predictions are classified based on their relation to the class that the matrix is for.
  16. ● From these per-class matrices we can calculate our test metrics for each class, then combine the values in various ways based on what is important for our application.
      ● We can average the per-class scores directly, weighting each class equally (called a macro-average), or we can pool the underlying counts before computing the metric, so that each example counts equally and classes are weighted by their number of examples (called a micro-average).
      ● The macro-average can act as a general score, though it may obscure very high or low performance on particular classes. If performance on a particular class is important, we may choose to micro-average or even look at the individual per-class scores.
  17. Demo
  18. Any Questions?
  19. Strategy: Scalable Fast Data Architecture: Cassandra, Spark, Kafka
      Engineering: Node, Python, JVM, CLR
      Operations: Cloud, Container
      Rescue: Downtime!! I need help.
      www.anant.us | solutions@anant.us | (855) 262-6826
      3 Washington Circle, NW | Suite 301 | Washington, DC 20037
