With the latest technologies like Machine Learning evolving rapidly, QAs must know the right strategy to test applications, as AI apps like personal assistants and smart cars have a direct impact on our lives.
2. AGENDA
1. Intro + quick agenda walkthrough (brief talk)
a. What is AI/ML?
b. How technology is shifting towards AI and ML
c. Where does a QA step in?
d. Challenges while testing AI/ML applications
Hands-On Activity:
1. Create and test a basic beer-wine classifier
2. Create an image classifier (via CLI)
a. Retrain a MobileNet
b. Generate test data
c. Create optimized graphs
d. Test your classifier
3. Real-time image classifier via Android app (OPTIONAL)
a. Retrain a MobileNet
b. Generate test data
c. Create optimized graphs
d. Test your classifier
3. PREREQUISITES
Please complete all the following steps:
● Clone the following repositories locally:
a. https://github.com/tarunmaini16/beer-wine-classifier
b. https://github.com/tarunmaini16/image-classifier
c. https://github.com/tarunmaini16/android-image-classifier
● Pull the following Docker images (optional):
a. https://cloud.docker.com/u/tarunmaini/repository/docker/tarunmaini/wine-beer-classification
b. https://cloud.docker.com/u/tarunmaini/repository/docker/tarunmaini/image-classifier
● Install Python on your system and the Python plugin in IntelliJ
● Install TensorFlow via terminal: $ pip install --upgrade "tensorflow==1.9.*"
● Android Studio setup [v3.1+]
● Android device OR virtual emulator (API level 27/28, target Android 8.1/9)
● Bring your data cables to connect a mobile device
● ADB setup
6. "Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed."
14. Frequent terms used in ML
● Label: what you are attempting to predict or forecast.
● Features: individual measurable properties, or the descriptive attributes, of an example.
● Feature Vector: a vector in which each dimension represents a certain feature of an example.
● Learning Rate: the step size used when updating model parameters during training; it controls how quickly the model adapts to the data. (The number of times the training data is reread is the number of epochs.)
● Hyperparameters: parameters whose values are set before the learning process begins, used to fine-tune performance, such as the learning rate or the regularization strength of a logistic regression model.
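To make the learning-rate definition concrete, here is a minimal hypothetical sketch in plain Python (toy data, not from the workshop): the learning rate scales each parameter update, and the epoch count is how many times the data is reread.

```python
# Minimal gradient-descent sketch: fit y = w * x to toy data.
# The learning rate scales each update step; epochs control how
# many times the training data is reread.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs, true w = 2

def train(learning_rate, epochs):
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            error = w * x - y               # prediction error
            w -= learning_rate * error * x  # gradient step for squared loss
    return w

w = train(learning_rate=0.05, epochs=100)
print(round(w, 3))  # converges to ~2.0 after training
```

A much larger learning rate would make the updates overshoot and diverge; a much smaller one would need far more epochs, which is exactly the trade-off the term describes.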
19. Training data Vs Test data
● Training set: the subset of data used to train the model
● Test set: the subset of data used to test the trained model
You could imagine slicing the single data set as follows:
25. Testing the feature
● Test whether the value of each feature lies between the threshold values
● Test whether the feature importance changed with respect to the previous QA run
● Test feature suitability by measuring RAM usage, inference latency, etc.
● Test/review whether a generated feature violates data-compliance requirements
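The first two checks above can be sketched as simple assertions. This is a hypothetical sketch assuming features arrive as a dict of values with known thresholds and a stored importance snapshot from the previous QA run:

```python
# Hypothetical QA checks for generated features: values in range,
# and importance stable compared to a previous QA run.
features = {"color": 0.72, "acidity": 0.31}            # current feature values
thresholds = {"color": (0.0, 1.0), "acidity": (0.0, 1.0)}
importance_now = {"color": 0.64, "acidity": 0.36}
importance_prev = {"color": 0.60, "acidity": 0.40}

# 1. Each feature value lies between its threshold values.
for name, value in features.items():
    lo, hi = thresholds[name]
    assert lo <= value <= hi, f"{name} out of range"

# 2. Feature importance has not drifted too far since the previous run.
for name in importance_now:
    drift = abs(importance_now[name] - importance_prev[name])
    assert drift < 0.1, f"{name} importance drifted by {drift:.2f}"

print("feature checks passed")
```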
27. Some algorithmic models
It depends on the application type. Examples:
● Decision tree, Random forest → classification
● Linear regression → regression
● Naive Bayes → classification
APIs of a few libraries used to develop/test ML models:
● TensorFlow
● Cloud Vision API
● Natural Language API
● Google Speech API
37. Precision
Out of all the predictions classified as beer, how many are correctly classified as beer?
Precision = True Positives / (True Positives + False Positives)
38. Recall
Out of all the drinks actually labeled as beer, how many were correctly predicted?
Recall = True Positives / (True Positives + False Negatives)
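The two formulas can be checked with a tiny plain-Python example (toy labels for illustration, not from the workshop data):

```python
# Toy predictions for the beer/wine example: compute precision and
# recall for the "beer" class from true/predicted label pairs.
actual    = ["beer", "beer", "wine", "beer", "wine", "wine"]
predicted = ["beer", "wine", "beer", "beer", "beer", "wine"]

tp = sum(a == "beer" and p == "beer" for a, p in zip(actual, predicted))
fp = sum(a == "wine" and p == "beer" for a, p in zip(actual, predicted))
fn = sum(a == "beer" and p == "wine" for a, p in zip(actual, predicted))

precision = tp / (tp + fp)   # of everything predicted beer, how much is beer
recall    = tp / (tp + fn)   # of all actual beer, how much did we find

print(precision, recall)     # → 0.5 0.666...
```

Here two wines were misclassified as beer (false positives), so precision drops to 0.5, while one missed beer (false negative) puts recall at 2/3.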
39. Metrics used for Regression Model
● Root Mean Square Error (RMSE): a measure of accuracy used to compare the forecasting errors of different models on a particular dataset (not across datasets).
● Mean Absolute Error (MAE): the average absolute difference between the model's predictions and the actual values. (The percentage version of this idea is the Mean Absolute Percentage Error, MAPE.)
● Entropy: used as an impurity measure, e.g. when growing decision trees; it is a classification-side measure rather than a regression metric.
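Both error metrics are easy to compute by hand; a small plain-Python example with made-up numbers:

```python
import math

# Toy regression example: compute RMSE and MAE by hand.
actual    = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

errors = [p - a for p, a in zip(predicted, actual)]
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))  # penalizes large errors
mae  = sum(abs(e) for e in errors) / len(errors)             # average absolute miss

print(round(rmse, 3), round(mae, 3))  # → 0.935 0.75
```

Note that RMSE ≥ MAE always: squaring before averaging weights the large errors more heavily, which is why the two metrics can rank models differently.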
41. Challenges in testing
● Access to fast machines and processors
● Generating training data
● Generating test data
● Knowing the thresholds and testing with new data
● Data filtering / quality of data: enhancing data, preventing overfitting and underfitting
42. PREREQUISITES (same as slide 3 above)
"Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed." - Arthur Samuel, 1959
Machine Learning is the science of programming computers so they can “learn from data”
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. - Tom Mitchell, 1997
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed
Machine learning is a form of AI that enables a system to learn from data rather than through explicit programming. However, machine learning is not a simple process. As the algorithms ingest training data, it is then possible to produce more precise models based on that data. A machine learning model is the output generated when you train your machine learning algorithm with data. After training, when you provide a model with an input, you will be given an output. For example, a predictive algorithm will create a predictive model. Then, when you provide the predictive model with data, you will receive a prediction based on the data that trained the model.
Artificial intelligence is the science and engineering of making computers behave in ways that, until recently, we thought required human intelligence.
Within the domain of neural networks, there is an area called Deep Learning (DL), in which neural networks have more than three layers, i.e. more than one hidden layer. The neural networks used in Deep Learning are called Deep Neural Networks (DNNs). So, Deep Learning is a technique for implementing Machine Learning.
Social Networking: Facebook automatically recognises faces and suggests tagging a friend.
Banking / Finance: Fraud detection algorithms to classify fraudulent transactions are in place.
Mobile:
-Personal Assistants
-Voice to text
-Technology
Online Shopping: Recommendations of similar products
Search Engines: Google’s autocomplete suggestions for search
Medicine : Researches on using ML for disease diagnosis. - Google’s DeepMind Health
Machine learning has the potential to automate a large portion of skilled labor, but the degree to which this affects a workforce depends on the level of difficulty involved in the job.
Education :
1.) Algorithms can analyze test results, drastically reducing the time teachers spend on grading.
2.)A student's attendance and academic history can help determine gaps in knowledge and learning disabilities.
Law:
J.P. Morgan, for example, uses a software program dubbed COIN (Contract Intelligence) to review, in seconds, documents and previous cases that would otherwise take 360,000 hours.
Transportation :
1.) Rolls-Royce and Google have teamed up to design and launch the world's first self-driving ship by 2020.
2.) NASA has successfully launched and landed an autonomous space shuttle.
Manual Labour :
Driverless trucks operate in mining pits in Australia, controlled remotely from a distant control center. Automation suits particular jobs that involve some element of danger or potential harm, such as work in factories and mining.
Healthcare:
Hospitals are currently using AI algorithms to more accurately detect tumors in radiology scans and analyze different moles for skin cancer, and machine learning is being adapted to accelerate research toward a cure for cancer.
Alexa:
voice-activated control of your smart-home (the dimming of lights, closing of blinds, locking of doors, etc., all at your command).
Supervised: Supervised learning identifies patterns in data given pre-determined features and labeled data.
Unsupervised: Unsupervised learning identifies patterns in data, which is particularly helpful for unlabeled and unstructured data.
Semi-supervised: A blend of supervised and unsupervised learning. Best in situations where there is some labeled data but not a lot.
Reinforcement: Reinforcement learning provides feedback to the algorithm as it trains; it is essentially experience-driven decision making.
Typical business uses of supervised learning include recognizing objects in images, predicting financial results, detecting fraud, and evaluating risk.
Unsupervised : Categorizing news, books, and other things, recommending items to customers.
Semi-supervised: detecting spam, classifying web content, and analyzing speech.
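The supervised/unsupervised distinction can be made concrete with a tiny scikit-learn sketch (assuming scikit-learn is installed, as the workshop's other examples do): the classifier learns from the provided labels, while k-means only ever sees the raw points.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Two well-separated groups of labeled points.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Supervised: learns the mapping from features to the given labels.
clf = LogisticRegression().fit(X, y)
print("train accuracy:", clf.score(X, y))

# Unsupervised: finds structure without ever seeing the labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
```

On well-separated blobs both approaches recover the two groups, but only the supervised model knows which group carries which label.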
https://semanti.ca/blog/?glossary-of-machine-learning-terms
A feature vector is a one-dimensional array of numbers used to describe a feature of an image. It can describe an entire image (global feature) or a feature present at a particular location in the image space (local feature).
Bias
The bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting)
The only steps in the process that have human intervention:
Gathering data
Preparing that data
Choosing a model
Training
Evaluation
Hyperparameter tuning
Prediction.
Features used for the beer-wine classifier:
Color
Taste (differs with acidity / alcohol content)
Things to explain here:
The approach to development and testing isn't traditional here. But does that mean NO QA for ML applications?
No. The answer is being adaptive enough to learn how to test those predictions. As of now, due to lack of QA knowledge in this area, data scientists both develop and test the models they create.
In the long term this will not work when applications scale.
So now, the problem at hand is:
How do we test predictions? [The challenge for QA]
Training set: 80%, Test set: 20%. Make sure that your test set meets the following two conditions:
Is large enough to yield statistically meaningful results.
Is representative of the data set as a whole. In other words, don't pick a test set with different characteristics than the training set.
Never train on test data. If you are seeing surprisingly good results on your evaluation metrics, it might be a sign that you are accidentally training on the test set. For example, high accuracy might indicate that test data has leaked into the training set.
For example, consider a model that predicts whether an email is spam, using the subject line, email body, and sender's email address as features. We apportion the data into training and test sets, with an 80-20 split. After training, the model achieves 99% precision on both the training set and the test set. We'd expect a lower precision on the test set, so we take another look at the data and discover that many of the examples in the test set are duplicates of examples in the training set (we neglected to scrub duplicate entries for the same spam email from our input database before splitting the data). We've inadvertently trained on some of our test data, and as a result, we're no longer accurately measuring how well our model generalizes to new data.
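The 80/20 split and the duplicate-leak check from the example above can be sketched in plain Python. This is a hypothetical sketch with toy records (not the workshop dataset); the record generator deliberately includes duplicates so the leak check has something to find:

```python
import random

# Toy dataset of (subject, label) records; ids repeat past 80, so
# some records are exact duplicates (like duplicate spam emails).
records = [(f"subject {i % 80}", i % 2) for i in range(100)]

random.seed(42)
random.shuffle(records)
split = int(len(records) * 0.8)          # 80/20 split
train, test = records[:split], records[split:]

# Leak check: no test example should also appear in the training set.
train_set = set(train)
leaks = [r for r in test if r in train_set]
print(f"train={len(train)} test={len(test)} leaked={len(leaks)}")
```

Any nonzero leak count means the evaluation no longer measures generalization: deduplicate before splitting, not after.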
Data snooping bias:
The test set has to be created immediately after receiving the dataset. Otherwise, as humans we derive a pattern from all the data, and there is a possibility of bias while training the model, which is called "data snooping bias".
A validation dataset is a dataset of examples used to tune the hyperparameters (i.e. the architecture) of a classifier. It is sometimes also called the development set or the "dev set". In artificial neural networks, a hyperparameter is, for example, the number of hidden units. The validation set, like the test set mentioned above, should follow the same probability distribution as the training dataset.
In the figure, "Tweak model" means adjusting anything about the model you can dream up—from changing the learning rate, to adding or removing features, to designing a completely new model from scratch. At the end of this workflow, you pick the model that does best on the test set.
Dividing the data set into two sets is a good idea, but not a panacea. You can greatly reduce your chances of overfitting by partitioning the data set into the three subsets shown in the following figure:
Use the validation set to evaluate results from the training set. Then, use the test set to double-check your evaluation after the model has "passed" the validation set. The following figure shows this new workflow:
In this improved workflow:
Pick the model that does best on the validation set.
Double-check that model against the test set.
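The improved workflow can be sketched in a few lines of plain Python: several candidate "models" (here just hypothetical prediction rules, standing in for real tweaked models) compete on the validation set, and only the winner touches the test set.

```python
# Sketch of the train/validation/test workflow: candidate models are
# scored on the validation set; only the winner is checked on test.
validation = [(1, 2), (2, 4), (3, 6)]   # (x, y) pairs
test       = [(4, 8), (5, 10)]

candidates = {
    "double":  lambda x: 2 * x,
    "square":  lambda x: x * x,
    "add_one": lambda x: x + 1,
}

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Pick the model that does best on the validation set...
best_name = min(candidates, key=lambda n: mse(candidates[n], validation))
# ...then double-check that model against the test set.
test_error = mse(candidates[best_name], test)
print(best_name, test_error)
```

Because the test set never influences which model is picked, its error remains an honest estimate of generalization.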
Some of the ways of generating data are:
E.g., in linear regression, make_regression() takes several inputs: the number of data points to generate (n_samples), the number of input features (n_features), and the noise level in the output data (noise).
In clustering, make_blobs() from sklearn can be used to generate clustering data for any number of features (n_features), with the corresponding labels.
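A runnable sketch of both helpers (assuming scikit-learn is installed); the parameter values are arbitrary examples:

```python
from sklearn.datasets import make_regression, make_blobs

# Regression data: 100 samples, 3 input features, noisy targets.
X_reg, y_reg = make_regression(n_samples=100, n_features=3, noise=0.1)
print(X_reg.shape, y_reg.shape)   # (100, 3) (100,)

# Clustering data: 100 samples in 2 features, grouped into 3 blobs,
# with the corresponding cluster labels returned alongside.
X_blob, labels = make_blobs(n_samples=100, n_features=2, centers=3)
print(X_blob.shape, set(labels))  # (100, 2) {0, 1, 2}
```

Generated data like this is handy for QA because the ground truth is known exactly, so model predictions can be checked against it.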
Underfitting:
A statistical model or a machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data (it's just like trying to fit into undersized pants!). Underfitting destroys the accuracy of our machine learning model. Its occurrence simply means that our model or algorithm does not fit the data well enough. It usually happens when we have too little data to build an accurate model, or when we try to build a linear model with non-linear data. In such cases the rules of the machine learning model are too simple to capture the data, and the model will probably make a lot of wrong predictions. Underfitting can be addressed by using a more flexible model or adding more informative features.
Overfitting:
A statistical model is said to be overfitted when it learns the training data too closely (just like fitting ourselves into oversized pants!). When a model has too much freedom relative to the data, it starts learning from the noise and inaccurate entries in the data set. The model then fails to categorize new data correctly, because it has memorized too much detail and noise. Overfitting is often caused by non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. Solutions to avoid overfitting include using a linear algorithm if the data is linear, or constraining parameters like the maximal depth when using decision trees.
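Both failure modes can be demonstrated with a small NumPy sketch (toy sine-wave data, not from the workshop): a degree-1 polynomial underfits, while a very high-degree polynomial drives training error down by fitting the noise.

```python
import numpy as np

# Under/overfitting demo: fit polynomials of different degrees to
# noisy samples of a sine wave and compare train vs. test error.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=x.size)
x_train, y_train = x[::2], y[::2]   # every other point for training
x_test, y_test = x[1::2], y[1::2]

def fit_and_score(degree):
    coeffs = np.polyfit(x_train, y_train, degree)
    err = lambda xs, ys: float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))
    return err(x_train, y_train), err(x_test, y_test)

under_train, under_test = fit_and_score(1)   # too simple: underfits
good_train, good_test = fit_and_score(4)     # reasonable fit
over_train, over_test = fit_and_score(12)    # fits the noise: overfits

print(f"deg 1:  train={under_train:.3f} test={under_test:.3f}")
print(f"deg 4:  train={good_train:.3f} test={good_test:.3f}")
print(f"deg 12: train={over_train:.3f} test={over_test:.3f}")
```

The telltale signature: training error always falls as the degree grows, but test error bottoms out at a moderate degree and then worsens again.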
What do you think was involved in building this algorithm?
Think of it as your mind reading information after it has been fed similar information!
In this exercise, we will retrain a MobileNet. MobileNet is a small, efficient convolutional neural network. "Convolutional" just means that the same calculations are performed at each location in the image.
Tensorflow: is used for acquiring data, training models, serving predictions, and refining future results
Cloud Vision API provides a REST API to understand and extract information from an image. It uses powerful machine learning models to classify images into thousands of categories, detect faces, identify adult content, emotions, OCR support and more.
Natural Language API is used to identify parts of speech and to detect multiple types of entities like persons, monuments, etc. It can also perform sentiment analysis. It currently supports three languages: English, Spanish and Japanese
Speech API is used to translate audio files into text. It is able to identify over 80 languages and their variants, and can work with most audio files
Description
TensorFlow is an open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks
TensorFlow can train and run deep neural networks for handwritten digit classification, image recognition, word embeddings, recurrent neural networks, sequence-to-sequence models for machine translation, natural language processing, and PDE (partial differential equation) based simulations. Best of all, TensorFlow supports production prediction at scale, with the same models used for training
TensorFlow allows developers to create dataflow graphs—structures that describe how data moves through a graph, or a series of processing nodes. Each node in the graph represents a mathematical operation, and each connection or edge between nodes is a multidimensional data array, or tensor.
Decision trees can be applied to both classification & regression tasks.
For regression task, decision trees use the MSE instead of gini score.
Scikit uses CART Algorithm to grow decision trees.
Main issue with Decision trees is the sensitivity to change in training data
--------------Random Forest ------------------
Random forest is an ensemble of Decision trees.
Instead of searching for the best feature to split a node, it searches for the best feature among a random subset of features, introducing more randomness; this extra diversity trades a slightly higher bias for a lower variance.
An important quality of random forests is that they make it easy to measure the relative importance of each feature: a feature's importance reflects how much it reduces impurity on average across the trees.
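The feature-importance property is directly available in scikit-learn (assuming scikit-learn is installed); a minimal sketch on generated data where only some features are informative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 5 features, of which only 2 actually carry signal about the class.
X, y = make_classification(n_samples=200, n_features=5,
                           n_informative=2, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Importances are impurity-reduction averages and sum to 1.
for i, imp in enumerate(clf.feature_importances_):
    print(f"feature_{i}: {imp:.3f}")
```

For QA, this is a cheap way to implement the earlier check "did feature importance change with respect to the previous run": snapshot `feature_importances_` and diff it across runs.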
------Naive Bayes-----------
Naive Bayes applies Bayes' theorem with the "naive" assumption of conditional independence between every pair of features given the class label.
Despite this simplifying assumption, it trains very fast and works well in practice for text classification tasks such as spam filtering.
If you specify a small learning_rate, like 0.005, the training will take longer, but the overall precision might increase
For example, 'mobilenet_1.0_224' will pick a model that is 17 MB in size and takes 224-pixel input images, while 'mobilenet_0.25_128_quantized' will choose a much less accurate, but smaller and faster network that's 920 KB on disk and takes 128x128 images.
Should we talk about F-beta score ?
When false positives are OK but false negatives are NOT OK → optimize for recall. E.g., you cannot tell a sick person that he is healthy, but you may tell a healthy person that he is sick and needs a re-test.
When false negatives are OK but false positives are not OK → optimize for precision. E.g., an important mail going to spam is wrong, but a spam mail in the inbox might be OK.
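In answer to the F-beta question above: F-beta combines precision and recall into one number, weighting recall beta times as much as precision (beta = 1 gives the familiar F1 score). A plain-Python sketch with arbitrary example values:

```python
# F-beta score from precision and recall; beta > 1 favors recall,
# beta < 1 favors precision, beta = 1 is the harmonic mean (F1).
def f_beta(precision, recall, beta=1.0):
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.5, 0.8
print(round(f_beta(p, r, beta=1.0), 3))  # → 0.615 (F1)
print(round(f_beta(p, r, beta=2.0), 3))  # → 0.714 (recall-weighted F2)
```

Note how F2 sits closer to the (higher) recall than F1 does, which is exactly what you want for sick/healthy-style problems where false negatives are costly.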
RMSE: used in meteorology, for example, to see how effectively a mathematical model predicts the behavior of the atmosphere. This is a regression-type problem.
So, we have data for which we are trying to achieve a prediction/output, and we have to choose the best model/algorithm to achieve accurate predictions. We therefore evaluate the model using the different metrics above.