1. Jongwook Woo
HiPIC
CalStateLA
IDEAS Live Webinar 2019
May 4 2019
Jongwook Woo, PhD, jwoo5@calstatela.edu
Big Data AI Center (BigDAI)
California State University Los Angeles
Big Data and Predictive Analysis
2. Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
Myself
Introduction To Big Data
Big Data Predictive Analysis
Summary
3. Big Data Artificial Intelligence Center (BigDAI)
Myself
Experience:
Since 2002, Professor at California State University Los Angeles
– PhD in 2001: Computer Science and Engineering at USC
4. Big Data Artificial Intelligence Center (BigDAI)
Universities in Los Angeles
West
North
5. Big Data Artificial Intelligence Center (BigDAI)
Universities in Los Angeles
6. Big Data Artificial Intelligence Center (BigDAI)
California State University
Los Angeles
7. Big Data Artificial Intelligence Center (BigDAI)
Myself: S/W Development Lead
http://www.mobygames.com/game/windows/matrix-online/credits
8. Big Data Artificial Intelligence Center (BigDAI)
Myself: Isaac Engineering, HDP, CDH, Oracle
using Hadoop Big Data
https://www.cloudera.com/more/customers/csula.html
9. Big Data Artificial Intelligence Center (BigDAI)
Myself: Partners for Services
10. Big Data Artificial Intelligence Center (BigDAI)
Myself: Collaborations
11. Big Data Artificial Intelligence Center (BigDAI)
Contents
Myself
Introduction To Big Data
Big Data Predictive Analysis
Summary
12. Big Data Artificial Intelligence Center (BigDAI)
Data Issues
Large-Scale data
Terabyte (10^12), Petabyte (10^15)
– Because of web
– Sensor Data (IoT), Bioinformatics, Social Computing, Streaming data,
smart phone, online game…
Legacy approach
Can do
– Improve the speed of CPU
Increase the storage size
Only Problem
– Too expensive
13. Big Data Artificial Intelligence Center (BigDAI)
Data Handling: Traditional Way
14. Big Data Artificial Intelligence Center (BigDAI)
Data Handling: Traditional Way
Becomes too Expensive
15. Big Data Artificial Intelligence Center (BigDAI)
Data Issues
Cannot handle with the legacy approach
Too big
Non-/Semi-structured data
3 Vs, 4 Vs,…
– Velocity, Volume, Variety
Traditional Systems can handle them
– But Again, Too expensive
Need new systems
Non-expensive
16. Big Data Artificial Intelligence Center (BigDAI)
Two Cores in Big Data
How to store Big Data
How to compute Big Data
Google
How to store Big Data
– GFS
– Distributed Systems on non-expensive commodity computers
How to compute Big Data
– MapReduce
– Parallel Computing with non-expensive computers
Own super computers
Published papers in 2003, 2004
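To make the MapReduce idea above concrete, here is a minimal single-machine sketch of its three stages (map, shuffle, reduce) applied to word counting. The function names are illustrative only and are not taken from Google's papers or from Hadoop; on a real cluster each stage would run in parallel across many commodity machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line could be mapped on a different machine; here we loop locally.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big compute", "data store"])
```

Because each map call sees only its own line and each reduce call sees only its own key, the work can be spread over inexpensive machines without coordination inside a stage.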
17. Big Data Artificial Intelligence Center (BigDAI)
Data Handling: Another Way
Not Expensive
From the 2017 Korean blockbuster movie “The Fortress” (남한산성)
18. Big Data Artificial Intelligence Center (BigDAI)
Data Handling: Another Way
But Works Well with the crazy massive data set
Battle of Nagashino,
1575, Japan
19. Big Data Artificial Intelligence Center (BigDAI)
Data Handling: Another Way
Need Resource Management
20. Big Data Artificial Intelligence Center (BigDAI)
What is Hadoop?
Apache Hadoop project split from Nutch in Jan 2006
Hadoop Founder:
o Doug Cutting
Apache Committer:
Lucene, Nutch, …
21. Big Data Artificial Intelligence Center (BigDAI)
Super Computer vs Hadoop vs Cloud
Parallel vs. Distributed file systems by Michael Malak
Updated by Jongwook Woo
[Figure labels: Cluster for Store; Cluster for Compute; Cluster for Compute/Store; "Cloud Computing adopts this architecture, with High Speed N/W"]
22. Big Data Artificial Intelligence Center (BigDAI)
Definition: Big Data
Non-expensive platform of distributed parallel systems that can store large-scale data and process it in parallel [1, 2]
Hadoop
– Non-expensive Super Computer
– More public than the traditional super computers
• You can store and process your applications
– In your university labs, small companies, research centers
Others with storage and computing services
– Spark
• normally integrated into Hadoop with Hadoop community
– NoSQL DB (Cassandra, MongoDB, Redis, Hbase,…)
– ElasticSearch
23. Big Data Artificial Intelligence Center (BigDAI)
Big Data Analysis & Visualization
Sentiment Map of AlphaGo
Positive
Negative
24. Big Data Artificial Intelligence Center (BigDAI)
K-Election 2017
(April 29 – May 9)
25. Big Data Artificial Intelligence Center (BigDAI)
Popular businesses within 5 miles of CalStateLA, USC, and UCLA
26. Big Data Artificial Intelligence Center (BigDAI)
Jams and other traffic incidents reported
by users in Dec 2017 – Jan 2018:
(Dalyapraz Dauletbak)
27. Big Data Artificial Intelligence Center (BigDAI)
Contents
Myself
Introduction To Big Data
Big Data Predictive Analysis
Summary
28. Big Data Artificial Intelligence Center (BigDAI)
Big Data Analysis and Prediction
Big Data Analysis
Hadoop, Spark, NoSQL DB, SAP HANA, ElasticSearch,..
Big Data for Data Analysis
– How to store, compute, analyze massive dataset?
Big Data Science
How to predict the future trend and pattern with the massive
dataset? => Machine Learning
29. Big Data Artificial Intelligence Center (BigDAI)
Spark
Limitation in MapReduce
Hard to program in Java
Batch Processing
– Not interactive
Disk storage for intermediate data
– Performance issue
Spark by UC Berkeley AMPLab
Started by Matei Zaharia in 2009,
– and open sourced in 2010
In-Memory storage for intermediate data
20 ~ 100 times faster than
– MapReduce
Good in Machine Learning => Big Data Science
– Iterative algorithms
30. Big Data Artificial Intelligence Center (BigDAI)
Spark (Cont’d)
Spark ML
Supports Machine Learning libraries
Process massive data set to build prediction models
31. Big Data Artificial Intelligence Center (BigDAI)
Deep Learning
Machine Learning
Has been popular since Google TensorFlow
Multiple Cores in GPU
– Even with multiple GPUs and CPUs
Parallel Computing
GPU (Nvidia GTX 1660 Ti)
1536 CUDA cores
Deep Learning Libraries
TensorFlow
PyTorch
Keras
Caffe, Caffe2
Microsoft Cognitive Toolkit (Previously CNTK)
Apache MXNet
DeepLearning4j
…
33. Big Data Artificial Intelligence Center (BigDAI)
Deep Learning
CNN
Image Recognition
Video Analysis
NLP for classification, Prediction
RNN
Time Series Prediction
Speech Recognition/Synthesis
Image/Video Captioning
Text Analysis
– Conversation Q&A
GAN
Media Generation
– Photo Realistic Images
Human Image Synthesis: Fake faces
34. Big Data Artificial Intelligence Center (BigDAI)
Deep Learning with Spark
What if we combine Deep Learning and Spark?
35. Big Data Artificial Intelligence Center (BigDAI)
Deep Learning with Spark
Deep Learning Pipelines for Apache Spark
Databricks
TensorFlowOnSpark
Yahoo! Inc
BigDL (Distributed Deep Learning Library for Apache Spark)
Intel
DL4J (Deeplearning4j On Spark)
Skymind
Distributed Deep Learning with Keras & Spark
Elephas
36. Big Data Artificial Intelligence Center (BigDAI)
Contents
Myself
Introduction To Big Data
Big Data Predictive Analysis: Use Case
Summary
37. Big Data Artificial Intelligence Center (BigDAI)
Use Case in Spark
“Predicting AD click fraud using Azure and Spark ML”,
Accepted at The 14th Asia Pacific International Conference on Information
Science and Technology (APIC-IST 2019), June 23-26 2019, Beijing, China
– By Neha Gupta, Hai Anh Le, Maria Boldina
Machine Learning
Distributed Parallel Computing
– using Spark with Hadoop and Cloud Computing
Not Deep Learning
38. Big Data Artificial Intelligence Center (BigDAI)
Ad Click Fraud
A person, automated script, or computer program imitates a legitimate user
– clicking on an ad without having an actual interest in the target of the ad's link
– resulting in misleading click data and wasted money
Companies suffer from huge volumes of fraudulent traffic
– Especially in the global mobile market
Goal
Predict who will download the apps
Using Classification model
Traditional and Big Data approach
39. Big Data Artificial Intelligence Center (BigDAI)
Ad Click Fraud (Cont’d)
TalkingData
China’s largest independent big data service platform
– covers over 70% of active mobile devices nationwide
handles 3 billion clicks per day
– 90% of which are potentially fraudulent
Goal of the Predictive Analysis
Predict whether a user will download an app
– after clicking on a mobile app advertisement
To better target the audience,
– to avoid fraudulent practices
– and save money
40. Big Data Artificial Intelligence Center (BigDAI)
Data Set
Dataset: TalkingData AdTracking Fraud Detection
https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data
Dataset Property:
Original dataset size: 7GB
– contains 200 million clicks over a 4-day period
Dataset format: csv
Fields: 8
– Target Column to Predict: ‘is_attributed’
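As a small illustration of the data set properties above, the sketch below reads CSV rows and measures how often the target column `is_attributed` equals 1. The column names are my assumption based on the public Kaggle competition page, and the four sample rows are invented for the example; they are not taken from the real file.

```python
import csv
import io

# Hypothetical miniature sample laid out like the Kaggle training file.
SAMPLE_CSV = """ip,app,device,os,channel,click_time,attributed_time,is_attributed
1001,3,1,13,379,2017-11-06 14:32:21,,0
1002,3,1,19,379,2017-11-06 14:33:34,,0
1003,9,1,13,244,2017-11-06 14:34:12,2017-11-06 15:01:02,1
1001,3,1,13,379,2017-11-06 14:34:52,,0
"""

def class_balance(csv_text, target="is_attributed"):
    """Return (number of rows, fraction of rows whose target equals 1)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    positives = sum(1 for row in rows if row[target] == "1")
    return len(rows), positives / len(rows)

n_rows, positive_rate = class_balance(SAMPLE_CSV)
```

Running the same count on the real file is one way to check how rare the positive class is before choosing a sampling strategy.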
41. Big Data Artificial Intelligence Center (BigDAI)
Data Set Details
42. Big Data Artificial Intelligence Center (BigDAI)
Experiment Environment:
Traditional and Big Data Systems
43. Big Data Artificial Intelligence Center (BigDAI)
Experiment Environment: Traditional
Azure ML Studio:
Traditional for small data set
Free Workspace
10GB storage
Single node
Implement fundamental prediction models
– Using Sample data: 80MB (1.1% of the original data set)
Select the best model among a number of classifiers
44. Big Data Artificial Intelligence Center (BigDAI)
Experiment Environment: Spark
Spark ML:
Data Filtering:
– 1 GB from 8 GB
• Implemented Python code to reduce size to 1GB (15%)
– We also have experimental results with 8GB
• For another publication
Databricks Subscription
– Cluster 4.0 (includes Apache Spark 2.3.0, Scala 2.11)
• 2 Spark Workers with total of 16 GB Memory and 4 Cores
• Python 2.7
• File System : Databricks File System
45. Big Data Artificial Intelligence Center (BigDAI)
Experiment Environment: Spark (Cont’d)
Oracle Big Data Spark Cluster
Oracle BDCE
Python 2.7.x, Spark 2.1.x
10 nodes,
– 20 OCPUs, 300GB Memory, 1,154GB Storage
46. Big Data Artificial Intelligence Center (BigDAI)
Work Flow in Azure ML
Relatively Easy to build and test
Drag and Drop GUI
Work Flow
1. Data Engineering
– Understanding Data
– Data preparation
– Balancing data statistically
2. Data Science: Machine Learning (ML)
– Model building and validation
• Classification algorithms
– Model evaluation
– Model interpretation
47. Big Data Artificial Intelligence Center (BigDAI)
Data Engineering
Unbalanced dataset
– 1: 0.19% App downloaded
– 0: 99.81% App not downloaded
1GB filtered dataset
– still too large for the traditional systems: Azure ML Studio
– More sampling needed for Azure ML
48. Big Data Artificial Intelligence Center (BigDAI)
Data Engineering
SMOTE (Synthetic Minority Over-sampling Technique) takes a subset of data from the minority class and creates new, similar synthetic instances
– Helps balance data & avoid overfitting
– Increased percent of minority class (1) from 0.19% to 11%
Stratified Split ensures that the output dataset contains a representative sample of the values in the selected column
– Ensures that the random sample does not contain all rows with just 0s
– 8% sample used = 80 MB
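The SMOTE idea described above can be sketched in a few lines: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the line between them. This is a simplified illustration under my own assumptions (numeric features, Euclidean distance, hypothetical function and parameter names); it is not the Azure ML Studio module's implementation.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base inside the minority class.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical minority-class samples with two numeric features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=8)
```

Because every synthetic point lies between two real minority samples, the new rows stay inside the region the minority class already occupies rather than duplicating existing rows exactly.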
49. Big Data Artificial Intelligence Center (BigDAI)
Algorithms in Azure ML Studio
Two-Class Classification:
classify the elements of a given set into two groups
– either downloaded, is_attributed (1)
– or not downloaded, is_attributed (0)
Decision trees
often perform well on imbalanced datasets
– as their hierarchical structure allows them to learn signals from both classes.
Tree ensembles almost always outperform singular decision trees
– Algorithm #1: Two-class Decision Jungle
– Algorithm #2: Two-class Decision Forest
50. Big Data Artificial Intelligence Center (BigDAI)
Selecting Performance Metrics
False Positives indicate
the model predicted an app was downloaded when in fact it wasn’t
Goal: minimize the FP => To save $$$
51. Big Data Artificial Intelligence Center (BigDAI)
AZURE ML MODEL #1: TWO-CLASS DECISION JUNGLE
• 8% Sample
• SMOTE 5000%
• 70:30 Split Train/Test
• Cross-Validation
• Tune Model Hyperparameters
• Features used: all 7
52. Big Data Artificial Intelligence Center (BigDAI)
AZURE ML MODEL #1: Tune Model Hyperparameters
Without Tune
Hyperparameters
With Tune
Hyperparameters
AUC = 0.905 vs 0.606
Precision = 1.0
TP = 35, FP = 0
53. Big Data Artificial Intelligence Center (BigDAI)
AZURE ML MODEL #2: TWO-CLASS DECISION FOREST
• 8% Sample
• SMOTE 5000%
• 70:30 Split Train/Test
• Cross-Validation
• Tune Model Hyperparameters
• Permutation Feature Importance
54. Big Data Artificial Intelligence Center (BigDAI)
AZURE ML MODEL #2: Improving Precision
By increasing the threshold from 0.5 to 0.8:
– Precision increased to 0.992
– FP decreased from 1,659 to 377
– FN increased from 1,834 to 5,142
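The trade-off on this slide (fewer false positives, more false negatives) follows directly from how a decision threshold works. The sketch below uses invented scores and labels, not the experiment's data, to show the same effect when the threshold moves from 0.5 to 0.8.

```python
def confusion_at(scores, labels, threshold):
    """Count (TP, FP, FN, TN) when predicting 1 for score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical model scores and true labels (1 = app downloaded).
scores = [0.95, 0.85, 0.70, 0.65, 0.60, 0.55, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

low = confusion_at(scores, labels, 0.5)    # (TP, FP, FN, TN) at threshold 0.5
high = confusion_at(scores, labels, 0.8)   # (TP, FP, FN, TN) at threshold 0.8
```

In this toy data the higher threshold removes both false positives but turns two true positives into false negatives, which mirrors the direction of the changes reported above.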
55. Big Data Artificial Intelligence Center (BigDAI)
Experimental Results in Azure ML Studio
Performance:
Execution time with sample data set: 1GB
Decision Forest
– takes 2.5 hours
Decision Jungle
– takes 3 hours 19 min
Good Guide from the models of Azure ML Studio
to adopt the 2 similar algorithms for Spark ML
– Decision Tree
– Random Forest
56. Big Data Artificial Intelligence Center (BigDAI)
Experimental Results in AzureML
Two-class Decision Forest is the best model!
57. Big Data Artificial Intelligence Center (BigDAI)
Experiment with Spark ML in Databricks
1. Load the data source
1.03 GB
Same filtered data set as Azure ML
2. Train and build the models
o Balanced data statistically
3. Evaluate
58. Big Data Artificial Intelligence Center (BigDAI)
Data Engineering
Generate features
Feature 1: extract day of the week and hour of the day from the click time
Feature 2: group clicks by combination of
– (Ip, Day_of_week_number and Hour)
Feature 3: group clicks by combination of
– (Ip, App, Operating System, Day_of_week_number and Hour)
Feature 4: group clicks by combination of
– (App, Day_of_week_number and Hour)
Feature 5: group clicks by combination of
– (Ip, App, Device and Operating System)
Feature 6: group clicks by combination of
– (Ip, Device and Operating System)
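The features listed above can be sketched as follows. I am assuming that "group clicks" means counting the clicks that share the same key, which the slides do not state explicitly; the sample rows and helper names are hypothetical, and the experiment itself used Spark rather than plain Python.

```python
from collections import Counter
from datetime import datetime

# Hypothetical clicks: (ip, app, device, os, click_time)
clicks = [
    (1001, 3, 1, 13, "2017-11-06 14:32:21"),
    (1001, 3, 1, 13, "2017-11-06 14:48:02"),
    (1001, 9, 1, 13, "2017-11-06 15:05:40"),
    (1002, 3, 1, 19, "2017-11-07 14:10:00"),
]

def time_parts(click_time):
    """Feature 1: day-of-week number (Monday = 0) and hour of the click."""
    ts = datetime.strptime(click_time, "%Y-%m-%d %H:%M:%S")
    return ts.weekday(), ts.hour

def add_group_count(rows, key_fn):
    """Append to each row the number of clicks sharing its grouping key."""
    counts = Counter(key_fn(row) for row in rows)
    return [row + (counts[key_fn(row)],) for row in rows]

def ip_day_hour(row):
    """Grouping key for Feature 2: (ip, day_of_week, hour)."""
    day, hour = time_parts(row[4])
    return row[0], day, hour

with_counts = add_group_count(clicks, ip_day_hour)
```

Features 3 to 6 would reuse `add_group_count` with a different key function, for example one returning (ip, device, os) for Feature 6.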
59. Big Data Artificial Intelligence Center (BigDAI)
Spark ML MODEL #1: Decision Tree Classifier
Confusion Matrix
60. Big Data Artificial Intelligence Center (BigDAI)
Spark ML MODEL #2: Random Forest Classifier
Confusion Matrix
61. Big Data Artificial Intelligence Center (BigDAI)
Spark ML Result Comparison
Decision Tree Classifier is relatively the better model!
           Decision Tree Classifier  Random Forest Classifier
AUC        0.815                     0.746
PRECISION  0.822                     0.878
RECALL     0.633                     0.495
TP         86,683                    67,726
FP         18,727                    9,408
TN         7,112,961                 7,122,280
FN         50,074                    69,031
RMSE       0.0972                    0.1038
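As a quick check of the comparison above, precision and recall can be recomputed from the reported TP, FP, and FN counts using the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The variable names are illustrative.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)

# Confusion-matrix counts reported for the two Spark ML models.
decision_tree = {"tp": 86_683, "fp": 18_727, "fn": 50_074}
random_forest = {"tp": 67_726, "fp": 9_408, "fn": 69_031}

dt_precision = precision(decision_tree["tp"], decision_tree["fp"])
dt_recall = recall(decision_tree["tp"], decision_tree["fn"])
rf_precision = precision(random_forest["tp"], random_forest["fp"])
rf_recall = recall(random_forest["tp"], random_forest["fn"])
```

The recomputed values agree with the reported PRECISION and RECALL rows to within rounding in the third decimal place.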
62. Big Data Artificial Intelligence Center (BigDAI)
Experiment in Oracle Cluster
Oracle Big Data Spark Cluster
10 nodes, 20 OCPUs, 300GB Memory, 1,154GB Storage
1. Load the data source
1.03 GB
2. Sample the balanced data based on Downloaded
116 MB
3. Train and build the models
o Balanced data statistically
4. Evaluate
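The slides do not say how the balanced sample in step 2 was drawn. One common approach, shown here purely as an assumption, is random undersampling: keep every downloaded (positive) row and only a random subset of the not-downloaded rows. The ratio of 4 negatives per positive is a hypothetical parameter, chosen because it roughly matches the class counts implied by the Oracle columns of the result table.

```python
import random

def undersample(rows, label_of, ratio=1.0, seed=0):
    """Keep every positive row and a random subset of negatives so that
    the number of negatives is about ratio * number of positives."""
    rng = random.Random(seed)
    positives = [r for r in rows if label_of(r) == 1]
    negatives = [r for r in rows if label_of(r) == 0]
    keep = min(len(negatives), int(ratio * len(positives)))
    return positives + rng.sample(negatives, keep)

# Hypothetical data: 5 downloads among 1,000 clicks, as (id, label) pairs.
rows = [(i, 1 if i < 5 else 0) for i in range(1000)]
balanced = undersample(rows, label_of=lambda r: r[1], ratio=4.0)
```

A smaller, more balanced training set like this is one plausible reason the data shrinks from 1.03 GB to 116 MB between steps 1 and 2.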
63. Big Data Artificial Intelligence Center (BigDAI)
Azure ML Studio and Spark ML Result Comparison
           TWO-CLASS         TWO-CLASS         DECISION TREE  RANDOM FOREST  DECISION TREE      RANDOM FOREST
           DECISION JUNGLE   DECISION FOREST   CLASSIFIER     CLASSIFIER     CLASSIFIER         CLASSIFIER
           (AzureML)         (AzureML)         (Databricks)   (Databricks)   (Balanced Sample   (Balanced Sample
                                                                             Data, Oracle)      Data, Oracle)
AUC        0.905             0.997             0.815          0.746          0.896              0.893
PRECISION  1.0               0.992             0.822          0.878          0.935              0.934
RECALL     0.001             0.902             0.633          0.495          0.807              0.800
TP         35                47,199            86,683         67,726         111,187            110,220
FP         0                 377               18,727         9,408          7,712              7,791
TN         406,605           406,228           7,112,961      7,122,280      545,302            545,223
FN         52,306            5,142             50,074         69,031         26,604             27,571
Run Time   2 hrs             2-3 hrs           22 mins        50 mins        24 sec             2 mins
64. Big Data Artificial Intelligence Center (BigDAI)
Azure ML Studio and Spark ML Result Comparison
• Azure ML Two-class Decision Forest is the best model!
• Spark ML code needs to be updated for better accuracy
• Balanced Sampling based on the fraud in Oracle:
• Decision Tree has 0.935 in Precision
• Execution Time: 24 secs
65. Big Data Artificial Intelligence Center (BigDAI)
Contents
Myself
Introduction To Big Data
Big Data Predictive Analysis
Summary
66. Big Data Artificial Intelligence Center (BigDAI)
Summary
Introduction to Big Data
Ad Click Prediction models in Traditional and Big Data Systems
Azure ML Studio shows the best accuracy with the Two-Class Decision Forest model
Spark ML is 3.5 – 7 times faster than Azure ML Studio with the 1 GB data set, but less accurate
– With a 2-node Spark cluster
Balanced sample data in Oracle has accuracy close to the traditional systems while being 300 times faster
– With a 10-node Spark cluster
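The speed-up figures in this summary can be related to the run times in the comparison table with simple arithmetic. The pairing of runs below (the 2-hour Azure ML Decision Jungle run as the baseline) is my reading of the table, not something the slides spell out.

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds

# Run times reported in the comparison table, converted to seconds.
azure_jungle = 2 * 3600      # 2 hrs (Azure ML Studio)
oracle_tree = 24             # 24 sec (Oracle, balanced sample)
databricks_tree = 22 * 60    # 22 mins (Databricks)

oracle_vs_azure = speedup(azure_jungle, oracle_tree)
databricks_vs_azure = speedup(azure_jungle, databricks_tree)
```

With these inputs the Oracle run comes out 300 times faster than the baseline, and the Databricks Decision Tree run about 5.5 times faster, which falls inside the 3.5 to 7 range quoted above.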
67. Big Data Artificial Intelligence Center (BigDAI)
Questions?
68. Big Data Artificial Intelligence Center (BigDAI)
Data Set Details (Cont‘d)
69. Big Data Artificial Intelligence Center (BigDAI)
Precision vs Recall
True Positive (TP): Fraud? Yes it is
False Negative (FN): No fraud? but it is
False Positive (FP): Fraud? but it is not
Precision
TP / (TP + FP)
Recall
TP / (TP + FN)
Ref: https://en.wikipedia.org/wiki/Precision_and_recall
Positive: Event occurs (Fraud)
Negative: Event does not occur (non-Fraud)
70. Big Data Artificial Intelligence Center (BigDAI)
References
1. Priyanka Purushu, Niklas Melcher, Bhagyashree Bhagwat, Jongwook Woo, "Predictive Analysis of Financial Fraud Detection using Azure and Spark ML", Asia Pacific Journal of Information Systems (APJIS), Vol. 28, No. 4, December 2018, pp308-319
2. Jongwook Woo, DMKD-00150, “Market Basket Analysis Algorithms with MapReduce”, Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery, Oct 28 2013, Volume 3, Issue 6, pp445-452, ISSN 1942-4795
3. Jongwook Woo, “Big Data Trend and Open Data”, UKC 2016, Dallas, TX, Aug 12 2016
4. How to choose algorithms for Microsoft Azure Machine Learning, https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice
5. Manik Katyal, Parag Chhadva, Shubhra Wahi & Jongwook Woo, “Big Data Analysis using Spark for Collision Rate Near CalStateLA”, https://globaljournals.org/GJCST_Volume16/1-Big-Data-Analysis-using-Spark.pdf
6. Spark Programming Guide: http://spark.apache.org/docs/latest/programming-guide.html
7. TensorFrames: Google Tensorflow on Apache Spark, https://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark
8. Deep learning and Apache Spark, https://www.slideshare.net/QuantUniversity/deep-learning-and-apache-spark
71. Big Data Artificial Intelligence Center (BigDAI)
References
9. Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark, https://www.slideshare.net/SparkSummit/which-is-deeper-comparison-of-deep-learning-frameworks-on-spark
10. Accelerating Machine Learning and Deep Learning At Scale with Apache Spark, https://www.slideshare.net/SparkSummit/accelerating-machine-learning-and-deep-learning-at-scalewith-apache-spark-keynote-by-ziya-ma
11. Deep Learning with Apache Spark and TensorFlow, https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html
12. Tensor Flow Deep Learning Open SAP
13. Overview of Smart Factory, https://www.slideshare.net/BrendanSheppard1/overview-of-smart-factory-solutions-68137094/6
14. https://dzone.com/articles/sqoop-import-data-from-mysql-tohive
15. https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data
16. https://blogs.msdn.microsoft.com/andreasderuiter/2015/02/09/performance-measures-in-azure-ml-accuracy-precision-recall-and-f1-score/