3. WORKING BACKWARD FROM
PRODUCT REQUIREMENTS
• In classical AI benchmarks such as CIFAR and MNIST, Kaggle contests, and other online challenges, we are accustomed to hearing about algorithms in the 95-99%+ accuracy range
• Frequently, predictive solutions have been built upon one another over a period of years, sometimes decades, with new state-of-the-art models improving performance by a fraction of a percentage point
6. FOR MANY APPLICATIONS WITH NEW
DATASETS THE RETURN ON
INVESTMENT AND WAITING PERIOD TO
REACH THE 99TH PERCENTILE IN
ACCURACY IS PROHIBITIVE
• New datasets require long periods of data exploration to determine and eliminate errors in the data collection process
• Frequently, basic models can reach acceptable baseline levels of accuracy within a time period that is acceptable for early prototype development
• Not all applications require the highest level of accuracy
• Models with higher and higher levels of accuracy can have diminishing returns, as they take longer and are more expensive to train, especially when using cloud services such as AWS
• Simple models such as MLP and KNN can be implemented quickly using tools like scikit-learn and can deliver decent results
8. HOW CAN YOU DETERMINE
WHEN YOU NEED TO INVEST
IN DEVELOPING MORE
COMPLEX METHODS?
9. HIGHEST RISK VS. LOWER RISK
APPLICATIONS
• For many applications the risks are clear, and the lowest possible error rates are desirable
  • Self-driving cars
  • Medical applications
• For many applications, the risk associated with an error is not fatal, and the costs associated with 99+ percentile accuracy are large. In some cases, decision boundaries are not clear to human observers and/or labels (such as appraisal values) are not agreed upon. These types of applications are frequent when business, financial, or economic subject matter is the target of a prediction problem, but they can also appear in other low-risk applications such as chatbots, where occasional errors may not dissuade prospects from converting to sales.
10. IDEA – USE MONTE CARLO
SIMULATION
• Use Monte Carlo simulation to estimate algorithm performance on real data before developing the algorithm
• For example, you can assess the impact of different levels of accuracy on your product's performance before investing time and money into developing an AI algorithm
12. SIMULATION
• Select a percentage p of known labels, in this hypothetical case "buy" recommendations for hypothetical stocks with returns above a threshold, and create an "AI"-selected data set by randomly sampling a fraction (1 − p) of negative examples to be mislabeled by the hypothetical AI
• Create a model of your product or business performance
• Simulate the performance of the product or business using Monte Carlo trials. In this case a portfolio of 50 hypothetical stocks was chosen by the AI and compared to those chosen by a hypothetical human, with some information, from the same universe of stocks
• The probability distribution of false identifications in feature space can be specified and tested for distributions with the same mean precision
• In this hypothetical example, an algorithm with 99% accuracy would be a good target, but we should consider whether 80-90% would be sufficient
• Also: we should consider what level of accuracy is possible (for example, by considering variability between human experts, in light of Big Data)
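The stock-picking simulation above can be sketched in a few lines of numpy. Everything here is a stated assumption for illustration: the normal return distribution, the 10% "buy" threshold, the portfolio size of 50, and the trial count are hypothetical, and "accuracy" is modeled as the probability that a label is not flipped:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_portfolio(accuracy, n_stocks=1000, n_pick=50, n_trials=2000):
    """Mean return of a 50-stock portfolio picked by a hypothetical AI
    whose 'buy' labels are correct with probability `accuracy`."""
    results = []
    for _ in range(n_trials):
        r = rng.normal(0.05, 0.20, n_stocks)   # hypothetical annual returns
        good = r > 0.10                        # true "buy" label: return above threshold
        flip = rng.random(n_stocks) > accuracy # label errors at rate (1 - accuracy)
        ai_buy = good ^ flip                   # AI label = truth, flipped on error
        candidates = np.flatnonzero(ai_buy)
        if len(candidates) < n_pick:
            continue
        picks = rng.choice(candidates, n_pick, replace=False)
        results.append(r[picks].mean())
    return float(np.mean(results))

for acc in (0.80, 0.90, 0.99):
    print(f"accuracy {acc:.0%}: mean portfolio return {simulate_portfolio(acc):.3f}")
```

Comparing the three accuracy levels shows how much portfolio return is actually bought by each additional point of model accuracy, before any model exists.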
13. INCOMPLETE DATA
•Another question companies frequently face is whether or not the
cost and time required to gather additional data will significantly
improve model performance
•Concept: utilize simulation on existing data to estimate performance
improvements
•Can sub-sample from data to simulate missing data, either in feature
or label space
•Eliminate field entries or entire examples and track degradation of
algorithm performance
• If performance does not decrease significantly, then more data is unlikely to be helpful
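A minimal sketch of this subsampling idea, again using scikit-learn's digits dataset as a stand-in for existing data: train the same model on growing fractions of the training set and watch whether the test accuracy curve is still rising or has flattened.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
scores = {}
for frac in (0.1, 0.25, 0.5, 1.0):
    # Subsample the training set to simulate having less data
    idx = rng.choice(len(X_tr), int(frac * len(X_tr)), replace=False)
    clf = KNeighborsClassifier().fit(X_tr[idx], y_tr[idx])
    scores[frac] = clf.score(X_te, y_te)
    print(f"{frac:>4.0%} of data -> accuracy {scores[frac]:.3f}")
```

If the accuracy at 50% of the data is already close to the accuracy at 100%, doubling the dataset again is unlikely to pay off; the same loop can drop feature columns instead of rows to test missing fields.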
14. NEW DATA
• Some population data may be available for target populations at a high level, but predicting labels for individuals from the population requires data to be gathered and significant predictive features identified. Companies need to decide if the investment is worth it.
• For example: should we gather data to predict income in the U.S. or Canada first? We can simulate performance to determine which country would be more profitable to predict on a per-capita basis, given a product or business model, for example targeted advertising based on predicted income. Different distribution assumptions can be tested.
15. ALL OF THIS CAN BE
ACCOMPLISHED BEFORE AN
ALGORITHM IS DEVELOPED OR
DURING EARLY STAGES OF
ALGORITHM DEVELOPMENT
As the error rates of simple algorithms on MNIST data show, baseline models can be rapidly prototyped using existing packages. A product prototype can therefore be built while we weigh the added benefits of further development on the dataset we need to work with, by simulating performance within our business or product model.
16. MODEL CALIBRATION
Guo, C., Pleiss, G., Sun, Y., Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. Proceedings of the 34th International Conference on Machine Learning, 70, pp. 1321-1330.
17. IMPORTANCE OF CALIBRATION
• Useful when decisions need to be made or risks need to be assessed at the level of single predictions
• For example, in human-AI collaboration paradigms in which human assistance is requested for cases where machine confidence falls below a threshold
• Investors buying single artworks require risk assessments on a per-item basis
• Current calibration methods, as reviewed in the referenced article, assess calibration across all of feature space
• However, there is no reason to assume that an algorithm is equally well calibrated across all subsets of feature space
• For example, there have been many cases in which facial recognition and sentiment analysis fail for protected subgroups
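A simple way to probe this is to compute a calibration metric such as Expected Calibration Error (ECE) per subgroup rather than only overall. The sketch below uses synthetic predictions with a hypothetical binary group attribute; the overconfidence of group 1 is an assumption built into the data for illustration.

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error: confidence-vs-accuracy gap, weighted by bin size."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return total

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 10_000)        # model confidences (synthetic)
group = rng.integers(0, 2, 10_000)          # hypothetical subgroup attribute
# Assumption for illustration: group 0 is well calibrated, group 1 is overconfident
p_correct = np.where(group == 0, conf, conf - 0.15)
correct = rng.random(10_000) < p_correct

print("overall ECE:", round(ece(conf, correct), 3))
for g in (0, 1):
    m = group == g
    print(f"group {g} ECE:", round(ece(conf[m], correct[m]), 3))
```

The overall ECE averages away the subgroup problem: only the per-group breakdown reveals that group 1's confidences cannot be trusted, which is exactly the failure mode described above.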