USER BEHAVIOUR ANALYSIS
Why are users doing what they are doing? How can we make sense of it?
Muhammad Ali Norozi
Tracking, collecting and assessing user data and activities using ML.
Outline
● Introduction: User Behaviour Analysis (UBA)
● Generic UBA
○ ML and UBA
● Anomalous and outlier detection
● Negative anomaly
● Positive anomaly
● Discussion
User Behaviour Analysis
● User behavior encompasses all the actions users take on a product: where and what they click on, how they move from one state to another (e.g., from active to inactive), where they stumble, and where they eventually drop off and leave.
● Tracking user behavior gives you an inside look at how people interact with your product (a credit card, for example) and what obstacles or hooks they experience in their journey as your customers.
● User behavior analysis (UBA) is a tailored method for collecting, combining, and analyzing quantitative and qualitative user data to understand how users interact with a product, and why.
What?
When you want answers to pressing business or research questions such as “Why are users coming to my product/service?” or “Why are they leaving, or not coming at all?”, traditional analytics alone can tell you that quantitative activity is happening, but cannot give you any of the ‘whys’. That is where user behavior analysis comes into play, with specific tools and techniques that help you get a full picture of user behavior.
● Demographics: who the users are
● Retention: how regularly they use the product
● Engagement: how much time they spend in your product
● Average revenue: how much they spend
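As a minimal sketch of how these metrics can be derived from raw events, the code below assumes a hypothetical session-level event schema (`user_id`, `day`, `session_minutes`, `revenue`) and illustrative definitions: retention as the share of users seen on two or more distinct days, engagement as average total minutes per user, and average revenue per user (ARPU). Demographics are omitted because they come from profile data rather than events.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event schema: one row per user session.
    user_id: str
    day: int                # day index, e.g. days since launch
    session_minutes: float
    revenue: float


def user_metrics(events):
    """Compute simple retention, engagement and revenue metrics from events."""
    days = defaultdict(set)
    minutes = defaultdict(float)
    revenue = defaultdict(float)
    for e in events:
        days[e.user_id].add(e.day)
        minutes[e.user_id] += e.session_minutes
        revenue[e.user_id] += e.revenue

    n = len(days)
    # Retention (assumed definition): share of users active on 2+ distinct days.
    retained = sum(1 for u in days if len(days[u]) >= 2)
    return {
        "users": n,
        "retention": retained / n,
        # Engagement: average total minutes spent per user.
        "engagement_minutes": sum(minutes.values()) / n,
        # Average revenue per user (ARPU).
        "arpu": sum(revenue.values()) / n,
    }


log = [
    Event("alice", 1, 12.0, 0.0),
    Event("alice", 3, 8.0, 9.99),
    Event("bob", 1, 5.0, 0.0),
    Event("carol", 2, 20.0, 4.99),
    Event("carol", 2, 10.0, 0.0),
]
print(user_metrics(log))
```

Here only alice returns on a second day, so retention is 1/3; carol's two sessions fall on the same day and do not count as a return visit under this definition.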
It’s all about events!
Drivers: What brings the users in?
Hooks: What persuaded the users to act?
Barriers: Where and why do the users stumble and abandon?
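Barriers are commonly located with a funnel analysis: count how many users reach each step of a flow and find the step with the weakest conversion from the previous one. A minimal sketch, assuming a hypothetical card-application flow whose step names and users are invented for illustration:

```python
def funnel(user_steps, steps):
    """Return (step, users_reached, conversion_from_previous) for each step.

    user_steps maps user_id -> set of completed step names. A user counts
    for a step only if they also completed every earlier step.
    """
    report = []
    remaining = set(user_steps)
    previous = len(remaining)
    for step in steps:
        remaining = {u for u in remaining if step in user_steps[u]}
        reached = len(remaining)
        rate = reached / previous if previous else 0.0
        report.append((step, reached, rate))
        previous = reached
    return report


def biggest_drop(report):
    """Step with the lowest conversion from the previous step (the main barrier)."""
    return min(report, key=lambda row: row[2])[0]


# Hypothetical flow and data, for illustration only.
STEPS = ["landing", "form_started", "id_uploaded", "submitted"]
users = {
    "u1": {"landing", "form_started", "id_uploaded", "submitted"},
    "u2": {"landing", "form_started", "id_uploaded"},
    "u3": {"landing", "form_started"},
    "u4": {"landing", "form_started"},
    "u5": {"landing"},
}
report = funnel(users, STEPS)
for step, reached, rate in report:
    print(f"{step:13s} {reached:2d} users  {rate:.0%} of previous step")
print("Largest barrier:", biggest_drop(report))
```

When two steps tie on conversion, `min` returns the earlier one; the funnel only shows *where* users drop off, while the *why* still needs qualitative follow-up.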
Benefits
● Get real, first-hand insight into what people are interested in, gravitating towards, or ignoring
● Verify and validate hypotheses
● Track how specific user needs change over time
● Investigate how specific flows and sections are performing
● Understand what your customers want and care about, then align the product accordingly and seize the opportunities
In a nutshell: find answers to the core question of user satisfaction.
Machine Learning and UBA
Machine Learning (ML)
Arthur Samuel (1959):
“Machine Learning is a Science of getting the computers to learn, without
being explicitly programmed!”
Tom Mitchell (1997):
“A computer is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by
P, improves with experience E.”
ML and UBA
● Self-adjusting, dynamic behaviour patterns
● Find hidden patterns in user behaviour
● Move beyond post-mortem rules and signatures (on their own, not a viable solution)
● Detect unknown patterns and make them visible and usable for uncovering users’ intent
● Behavioural profiles
ML still needs expert knowledge and human intervention.
ML ... (rocket science?)
● ML tasks (given the right data, the right parameters and enough time, they give results with good accuracy)
○ Clustering
○ Regression
○ Classification
○ Anomaly detection
○ …
● Learning patterns from data (learn from seen data to predict unseen data)
○ Supervised learning
○ Unsupervised learning
○ Semi-supervised learning, with hints from both data and humans
○ Reinforcement learning, with a performance feedback loop
○ ...
ML ...
● ML model (a snapshot of the trained algorithms, parameters, features and environment; most of the time is spent tuning parameters, not developing)
○ Feature extraction and engineering (finding the best set of features requires domain-expert knowledge)
○ Model parameters (learned)
○ Model hyperparameters (architecture)
● ML features
○ Categorical
○ Statistical
○ Empirical
○ Continuous
○ Binary
○ ...
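The difference between learned model parameters and hyperparameters can be illustrated with a small logistic-regression classifier trained by gradient descent. The model is chosen only as an example (the talk does not prescribe one) and the data is synthetic: the weights `w` and bias `b` are learned from data, while `learning_rate` and `epochs` are hyperparameters fixed before training.

```python
import numpy as np


def train_logistic(X, y, learning_rate=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent.

    learning_rate and epochs are hyperparameters: chosen before training.
    w and b are model parameters: learned from the data.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        grad_w = X.T @ (p - y) / n_samples       # gradient of the mean log-loss
        grad_b = np.mean(p - y)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(X, w, b):
    return (X @ w + b > 0).astype(int)


# Synthetic data: two Gaussian blobs (e.g. "churned" vs "active" users).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

w, b = train_logistic(X, y, learning_rate=0.1, epochs=500)
accuracy = np.mean(predict(X, w, b) == y)
print("learned parameters:", w, b)
print("training accuracy:", accuracy)
```

Changing `learning_rate` or `epochs` changes how the parameters are found, not what the model can represent; architecture choices such as adding layers would be hyperparameters of a different kind.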
It's all about networks
● User profile data: the interaction data
● Represent the complex tree/network of users and items as a matrix
● Reverse search:
○ User profile and item profile: user features, item features and joint features obtained by matrix factorization (collaborative filtering). Feed these to a learning algorithm and let it predict the probability of a user purchasing an item.
● Cluster users and items, and look at the correlation between user clusters and item clusters.
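A minimal sketch of the matrix-factorization idea behind collaborative filtering: the user-item matrix R is approximated by U @ V.T, where each row of U is a user's latent features and each row of V an item's, fitted only on observed entries with stochastic gradient descent. The toy matrix (0 = no interaction), the latent dimension `k`, the learning rate and the regularization strength are illustrative assumptions; the filled-in values for unobserved cells can then be read as rough propensity scores.

```python
import numpy as np


def factorize(R, mask, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Approximate R ~ U @ V.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))   # user latent features
    V = rng.normal(scale=0.1, size=(n_items, k))   # item latent features
    observed = list(zip(*np.nonzero(mask)))
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - U[u] @ V[i]
            u_old = U[u].copy()
            # SGD step on the squared error with L2 regularization.
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V


# Toy user x item interaction matrix; 0 means "not observed".
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
mask = R > 0

U, V = factorize(R, mask)
pred = U @ V.T
rmse = np.sqrt(np.mean((pred[mask] - R[mask]) ** 2))
print(np.round(pred, 1))
print("RMSE on observed entries:", round(rmse, 3))
```

Clustering the rows of U and V (e.g. with K-Means) is one way to obtain the user and item clusters mentioned above.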
Data is at the centre
● Data Sources (all direct or indirect information available)
○ APIs
○ Logs
○ Databases
○ Log archives
○ Log management tools (e.g., Humio, Dynatrace and others)
○ Monitoring tools (e.g., Prometheus)
○ …
● Data formats (JSON, CSV, TSV, ...)
○ Syslog
○ Key-value sources
○ Files on distributed file systems such as the Hadoop Distributed File System (HDFS)
○ ...
Data normalization
● Understand the data sources and formats
● Bring all formats to the same convention
● Find duplicate and missing fields
○ One action generates several entries
○ Users do not fill in a field in the application
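A minimal normalization sketch, assuming two hypothetical sources that describe the same actions with different field names: JSON API events and `key=value` log lines. Both are mapped onto one common schema, duplicate entries produced by a single action are dropped, and records with missing fields are reported. The field names and the duplicate key (user, action, timestamp) are assumptions for illustration.

```python
import json

SCHEMA = ("user_id", "action", "ts")
# Hypothetical per-source field-name mappings onto the common schema.
JSON_FIELDS = {"userId": "user_id", "event": "action", "timestamp": "ts"}
KV_FIELDS = {"uid": "user_id", "act": "action", "time": "ts"}


def from_json(line):
    raw = json.loads(line)
    return {JSON_FIELDS[k]: v for k, v in raw.items() if k in JSON_FIELDS}


def from_kv(line):
    raw = dict(pair.split("=", 1) for pair in line.split())
    return {KV_FIELDS[k]: v for k, v in raw.items() if k in KV_FIELDS}


def normalize(json_lines, kv_lines):
    """Bring both sources to one convention, dedupe, and flag missing fields."""
    records = [from_json(line) for line in json_lines]
    records += [from_kv(line) for line in kv_lines]
    seen, unique, incomplete = set(), [], []
    for rec in records:
        missing = [f for f in SCHEMA if not rec.get(f)]
        if missing:
            incomplete.append((rec, missing))
            continue
        key = tuple(str(rec[f]) for f in SCHEMA)
        if key in seen:   # one action logged more than once
            continue
        seen.add(key)
        unique.append(rec)
    return unique, incomplete


json_lines = [
    '{"userId": "u1", "event": "login", "timestamp": "2020-09-09T10:00:00Z"}',
    '{"userId": "u2", "event": "pay", "timestamp": "2020-09-09T10:05:00Z"}',
]
kv_lines = [
    "uid=u1 act=login time=2020-09-09T10:00:00Z",  # same action as the first JSON event
    "uid=u3 act=pay",                              # timestamp field missing
]
unique, incomplete = normalize(json_lines, kv_lines)
print(unique)
print(incomplete)
```

Exact-match deduplication assumes the sources agree on timestamp format and precision; real pipelines usually normalize timestamps first and may dedupe within a time window.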
UBA in Practice
IoT and UBA
IoT-related sensors, e.g., in the energy industry (electricity usage and generation)
UBA & EL segways - annoying?
Trendiness and non-trendiness
Is UBA grey? Yes, it is :)
● GDPR or other regulatory compliance.
● Processing large amounts of user data without the users being aware of it.
● Honeypot products that harvest user data and then monetize it.
● ...
Anomaly Deviation Outliers
Anomaly (Statistical deviations?)
● A deviant, unusual data point: when the data-generating process behaves unusually, it produces outliers
● Outlier detection is well researched in both the statistics and data science worlds.
● Static anomalies (analyzed individually)
○ Unusual action
○ Unusual context
● Temporal anomalies
○ Unusual time
○ Unexpected event
○ Huge events volume
○ ...
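One simple way to flag the "huge events volume" case is a rolling z-score: compare each period's event count with the mean and standard deviation of the preceding window. The window length, the threshold of 3 standard deviations and the hourly login counts are illustrative assumptions.

```python
from statistics import mean, stdev


def volume_anomalies(counts, window=6, threshold=3.0):
    """Indices whose count deviates more than `threshold` standard deviations
    from the mean of the previous `window` counts (rolling z-score)."""
    flagged = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue   # flat history: z-score undefined
        z = (counts[t] - mu) / sigma
        if abs(z) > threshold:
            flagged.append(t)
    return flagged


hourly_logins = [40, 42, 38, 41, 39, 43, 40, 41, 180, 42, 39, 40]
print(volume_anomalies(hourly_logins))   # → [8]
```

Note that once the spike enters the window it inflates the standard deviation for the next few periods, which can mask a second anomaly; robust statistics (median and MAD) or excluding flagged points from the history are common mitigations.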
Anomaly categorization
The abnormal data points can be categorized further by whether the data being examined is a
set, a sequence, or a structure of inherently higher dimensionality, which leads to:
1. Point Anomaly:
a. One or more data points in the collection are anomalous
2. Contextual Anomaly:
a. A data point is anomalous with respect to its neighbors or to other data points that share a context or some features
3. Collective Anomaly:
a. A collection of possibly similar data points that behaves differently from the rest of the
collection.
Supervised and Unsupervised
● Supervised
○ The training data is pre-labelled or characterized by domain experts, and anomaly detection then merely involves measuring how much new data points deviate from models built on that data.
● Unsupervised
○ Unsupervised methods, on the other hand, are used when the data is not labelled or characterized, because labelling is a laborious task and/or because domain experts are lacking. So there are no prior labels that conclusively distinguish abnormal data points from normal ones. These algorithms focus primarily on identifying anomalies within a finite set of data points. The distance between two data points is measured using, for instance, the Euclidean distance or another distance measure (e.g., the Mahalanobis distance, i.e., the distance between a point p and a distribution D).
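The Mahalanobis distance of a point p from a distribution with mean μ and covariance S is D_M(p) = sqrt((p − μ)ᵀ S⁻¹ (p − μ)). Unlike the Euclidean distance, it accounts for correlated and differently scaled features. A minimal NumPy sketch on synthetic, strongly correlated two-feature data (the features and candidate points are made up for illustration):

```python
import numpy as np


def mahalanobis(points, data):
    """Mahalanobis distance of each row of `points` from the distribution of `data`."""
    mu = data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = points - mu
    # Row-wise sqrt(diff_i^T S^-1 diff_i)
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


rng = np.random.default_rng(1)
# Synthetic, strongly correlated features: x2 closely tracks x1.
x1 = rng.normal(0, 1, 500)
x2 = x1 + rng.normal(0, 0.2, 500)
data = np.column_stack([x1, x2])

candidates = np.array([
    [1.5, 1.5],    # follows the correlation: looks normal
    [1.5, -1.5],   # each value is plausible alone, the combination is not
])
print("Euclidean:  ", np.linalg.norm(candidates - data.mean(axis=0), axis=1))
print("Mahalanobis:", mahalanobis(candidates, data))
```

Both candidates are roughly equally far from the mean in Euclidean terms, but the second one breaks the correlation between the features and gets a Mahalanobis distance about ten times larger.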
Methods categorization
● Density-based
○ DBSCAN (non-parametric; outliers are points that lie alone in low-density regions)
○ LOF (local outlier factor: the local density deviation of a given data point with respect to its neighbours)
● Distance-based
○ K-NN
○ K-Means
○ Distance to the regression hyperplane (if regression is used)
● Parametric (assume some sort of “form” for the data)
○ GMM (Gaussian mixture model)
○ One-class SVM
○ Extreme value theory
● Others (non-ML)
○ Statistical tests: Z-score
Spatial proximity
Isolation Forest uses a different approach: instead of trying to build a model of normal instances, it explicitly
isolates anomalous points in the dataset.
Multivariate anomaly detection techniques, e.g., PCA-based methods.
Methods based on artificial neural networks, e.g., autoencoders.
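As an example of the distance-based family, a point's distance to its k-th nearest neighbour can serve as an outlier score: points in dense regions have close neighbours, isolated points do not. A brute-force NumPy sketch, where `k` and the synthetic data are assumptions; for large datasets a spatial index or a library implementation (LOF, Isolation Forest) would be more practical.

```python
import numpy as np


def knn_outlier_scores(X, k=5):
    """Distance from each point to its k-th nearest neighbour (brute force, O(n^2))."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)   # ignore each point's distance to itself
    return np.sort(dists, axis=1)[:, k - 1]


rng = np.random.default_rng(42)
normal_users = rng.normal(0, 1, size=(200, 2))
odd_users = np.array([[6.0, 6.0], [-7.0, 5.0]])
X = np.vstack([normal_users, odd_users])

scores = knn_outlier_scores(X, k=5)
top = np.argsort(scores)[-2:]
print("most isolated points:", sorted(top.tolist()))
```

A global score like this can miss points that are unusual only relative to their local neighbourhood; LOF addresses that by comparing each point's density with the density of its neighbours.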
Negative anomalies: fraud, money laundering, malicious activity
Application to Banking
● Retail bank (consumer banking)
○ Credit card fraud
○ Money laundering through the retail bank
● Private bank (banking for high-net-worth individuals, HNWIs)
○ Market abuse
○ Money laundering through the private bank
○ Other fraud
● Investment bank (serves governments, corporations & institutions)
○ Market abuse
○ Money laundering through the investment bank
○ Other fraud
Because money laundering differs across these three types of bank, each requires different detection approaches and different red flags.
Supervised vs Unsupervised
Automated fraud detection is an inherently different problem from automated detection of money laundering and market abuse.
● In credit card (CC) fraud, we know what a true positive (TP) looks like. How? The customer tells us. A supervised approach can be used here.
● In market abuse detection we also often know what a TP looks like. How? Positive PnL and price moves can indicate it. Supervised!
● In automated money-laundering detection we do not know whether a data record is a TP or not, so the problem is unsupervised in nature (labelling the data is impractical).
Unsupervised - DIFFICULT?
The main issues with an analytical approach to AML (anti-money laundering):
● SEVERE CLASS IMBALANCE - positives are estimated at less than 0.1%
● SEVERE CLASS OVERLAP - money laundering is mixed with legal financial activity
● CONCEPT DRIFT - money-laundering techniques keep changing, even within the same criminal organization
● UNCERTAINTY AROUND THE DATA MODEL - the confusion matrix
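A worked example of why severe class imbalance matters when reading a confusion matrix. Assuming a positive rate of exactly 0.1% (the upper estimate above) and an illustrative one million transactions, a "model" that never flags anything scores 99.9% accuracy while catching nothing; a detector with 80% recall and a 1% false-positive rate (assumed figures) still produces mostly false alarms.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


n = 1_000_000
positives = n // 1000   # 0.1% suspicious transactions

# A "model" that labels every transaction as legitimate:
acc, prec, rec = confusion_metrics(tp=0, fp=0, fn=positives, tn=n - positives)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# accuracy=0.999 precision=0.000 recall=0.000

# A detector catching 80% of positives with a 1% false-positive rate:
fp = round(0.01 * (n - positives))
acc, prec, rec = confusion_metrics(tp=800, fp=fp, fn=200, tn=n - positives - fp)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# accuracy=0.990 precision=0.074 recall=0.800
```

In the second case roughly 13 of every 14 alerts are false positives, which is why precision and recall, rather than accuracy, are the figures to watch.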
Automated AML
● It is not a simple anomaly detection problem; it is not really an outlier detection problem either
○ Many patterns of transactions associated with money laundering differ little from legitimate transactions.
● Outliers are often hidden in the unusual local behaviour of low-dimensional subspaces
● What counts as normal depends on the subject matter
Learning from “negativity”
● Allow through a few % of the transactions that were tagged/classified as “bad”. This approach has a double benefit:
○ First and foremost, the learner keeps learning from new cases instead of blocking them and eventually “forgetting” them
○ Second, and also importantly, it reduces false positive (FP) cases
Other approaches
Risk-based approaches: focus on the group of high-risk clients.
Positive Anomalies = Good anomalies = Good opportunity!
Good anomalies (minority community of good anomalies)
Humans and machines tend to classify anomalies as “bad” while failing to recognize “good” anomalies.
● Withdrawal of huge amounts at once or small amounts regularly
● Deposits/receipts of huge amounts, or of very small amounts regularly:
○ The first step is to identify the transactions that are abnormal or outliers; the second is to predict the occurrence of such events.
■ What causes the event to occur? Possible reasons:
■ Is it the time of year, e.g., Christmas?
■ Is it a change in the weather?
■ Or is it totally random?
● Changes in life situation:
○ New marital status
○ Having children
○ Needing a new loan
○ Children old enough to live on their own
○ ...
Any other cases of good anomalies?
Machine learning
● Find the right algorithm for the task at hand, i.e., positive (+ve) anomaly detection (for temporal anomalies, e.g., LSTM, which has a feedback loop)
● Implement the algorithm and its environment
● Optimize the model for the best accuracy
Model parameters (engineering task?)
● Architecture (global or hyperparameters that define the high-level behavior of the neural network; in other words, translating domain expertise into numbers and algorithms)
○ Number of layers, number of neurons, activation function, loss function, optimizer, ...
● Data (how the data is prepared)
○ Features, knowledge base, sequence length, normalization, ...
● Training (how we evaluate the results and tune accuracy)
○ Epochs, batch size, threshold, distance, smoothing, ...
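Threshold, distance and smoothing typically meet in the last stage of a forecasting-based detector (for example one built on an LSTM): the model predicts the next value, the distance between prediction and observation is smoothed, and a threshold turns the smoothed error into flags. The sketch shows only that post-processing step; the predictions are hard-coded numbers standing in for a trained model, and the smoothing factor `alpha` and `threshold` are assumptions.

```python
def flag_anomalies(observed, predicted, alpha=0.5, threshold=3.0):
    """Flag time steps whose exponentially smoothed absolute error exceeds threshold.

    alpha: smoothing factor of the exponentially weighted moving average (EWMA).
    threshold: smoothed-error level above which a step is flagged.
    """
    flags = []
    smoothed = 0.0
    for t, (obs, pred) in enumerate(zip(observed, predicted)):
        distance = abs(obs - pred)   # distance: absolute prediction error
        smoothed = alpha * distance + (1 - alpha) * smoothed
        if smoothed > threshold:
            flags.append(t)
    return flags


observed = [10, 11, 10, 12, 30, 31, 11, 10, 10]
predicted = [10, 10, 11, 11, 12, 12, 12, 11, 10]
print(flag_anomalies(observed, predicted))   # → [4, 5, 6, 7]
```

The design trade-off is visible in the output: smoothing suppresses one-off noise, but it also keeps steps 6 and 7 flagged after the spike has ended. A larger `alpha` reacts faster and decays faster; a higher `threshold` trades missed anomalies for fewer false positives.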
Conclusions & Discussions
● UBA is often grey
● UBA is valuable and gives the users’ perspective of the system.
○ Monitoring the system from the users’ point of view
○ Testing the product as the user sees it
● RPA and UBA in general, and anomaly detection in particular?
● Successful automatic anomaly detection starts with asking the right questions about what is truly unusual and building a set of data models to mimic this: “exploring low-dimensional subspaces with flags (maybe red/green)”
● What could be good anomalies in a specific use case?
○ How can they be turned into opportunities?
○ Can ML help?
Thank You!
muhammad.norozi@kantega.no

User behavior analyses JavaZone 2020

  • 1. USER BEHAVIOUR ANALYSIS Why Users are doing what they are doing? How to make sense? Muhammad Ali Norozi
  • 2. Why Users are doing what they are doing? How to make sense? Tracking, collecting and assessing of user data and activities using ML. U SER B EHAVIOUR A NALYSIS Muhammad Ali Norozi
  • 3. Outline ● Introduction: User Behaviour Analysis (UBA) ● Generic UBA ○ ML and UBA ● Anomalous and outlier detection ● Negative anomaly ● Positive anomaly ● Discussion
  • 5. ● User behavior encompasses all the actions users take on a product: where and what they click on, how they move from one state to another (being active to inactive), where they stumble, and where they eventually drop off and leave. ● Tracking user behavior gives you an inside look at how people interact with your product (credit card for example) and what obstacles or hooks they experience in their journey as your customers. ● User behavior analyses (UBA) is a tailored method for collecting, combining, and analyzing quantitatively and qualitatively users data to understand how users interact with a product, and why. What?
  • 6. When you want an answer to pressing business/research questions such as “Why are users coming to my product/services?” or “Why are they leaving or not coming?,” Traditional analytics alone can tell you that quantitative activity is happening, but can’t give you any of the ‘whys’. That's where user behavior analyses comes in play, with specific tools and techniques that help you get a full picture of user behavior. ●
  • 7. ● Demographics information: who the users are? ● ● Retention: how regularly they use the product ● ● Engagement: how much time they spend in your product ● ● Average Revenue: how much they spend
  • 8. it’s all about events! Drivers What are bringing the users? Hooks What persuaded the users to act? Barriers Where and why the users stumble and abandon?
  • 9. Benefits ● Get real, first-hand insight into what people are interested in, gravitating towards, or ignoring ● Verify and validate the hypothesis ● How specific user needs change over time ● Investigate how specific flow and sections are performing ● Understand what your customers want and care about and subsequently align the product and avail the opportunities In a nutshell find answer to the core question of “users satisfaction”.
  • 11. Machine Learning (ML) Arthur Samuel (1959): “Machine Learning is a Science of getting the computers to learn, without being explicitly programmed!” Tom Mitchell (1998): “A computer is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
  • 12. ML and UBA ● Self-adjusting, dynamic behaviour patterns ● Find hidden patterns in user behaviour ● Move beyond post-mortem rules and signatures (not a viable solution) ● Detect unknown patterns and make them visible and usable for uncovering users’ intent ● Behavioural profiles ● ML still needs expert knowledge and human intervention
  • 13. ML ... (rocket science?) ● ML tasks (given the right data, the right parameters and enough time, they give results with good accuracy) ○ Clustering ○ Regression ○ Classification ○ Anomaly detection ○ … ● Learning patterns from data (learn from seen data to predict unseen data) ○ Supervised learning ○ Unsupervised learning ○ Semi-supervised learning, with hints from both data and humans ○ Reinforcement learning, with a performance feedback loop ○ ...
  • 14. ML ... ● ML model (a snapshot of all trained algorithms, parameters, features and environments; most of the time is spent tuning parameters, not developing) ○ Feature extraction and engineering (finding the best set of features requires domain expert knowledge) ○ Model parameters (learned) ○ Model hyperparameters (architecture) ● ML features ○ Categorical ○ Statistical ○ Empirical ○ Continuous ○ Binary ○ ...
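To make the feature types above concrete, here is a minimal Python sketch that turns a raw event log into per-user features of several of the listed kinds: a count (statistical), an average session length (continuous), a purchase flag (binary) and a one-hot encoded channel (categorical). The event schema, the `extract_features` helper and the `CHANNELS` vocabulary are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw event log: (user_id, event_type, session_seconds, channel)
EVENTS = [
    ("u1", "login", 120, "mobile"),
    ("u1", "purchase", 300, "web"),
    ("u1", "login", 60, "mobile"),
    ("u2", "login", 30, "web"),
]
CHANNELS = ["mobile", "web"]  # assumed fixed vocabulary for one-hot encoding


def extract_features(events):
    """Aggregate raw events into one feature dict per user."""
    per_user = defaultdict(list)
    for user, event_type, seconds, channel in events:
        per_user[user].append((event_type, seconds, channel))

    features = {}
    for user, rows in per_user.items():
        channels = [channel for _, _, channel in rows]
        main_channel = max(CHANNELS, key=channels.count)  # categorical: most used channel
        features[user] = {
            "n_events": len(rows),                                          # statistical (count)
            "mean_session_s": mean(s for _, s, _ in rows),                  # continuous
            "has_purchased": int(any(e == "purchase" for e, _, _ in rows)),  # binary
            # the categorical value expanded into one binary column per category (one-hot)
            **{f"channel_{c}": int(c == main_channel) for c in CHANNELS},
        }
    return features


print(extract_features(EVENTS)["u1"])
```

In practice the feature set would come from domain experts, as the slide notes; the code only shows how the different feature kinds are derived from the same events.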
  • 15. It's all about networks ● User profile data: the interaction data ● Represent the complex tree and network of users and items as a matrix ● Reverse search ○ User profile and item profile: user features, item features and joint features obtained with matrix factorization (collaborative filtering). Feed these to a learning algorithm and let it predict the probability of a user purchasing an item. ● Cluster users and cluster items, and look at the correlation of user clusters with item clusters.
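The matrix-factorization idea can be sketched with plain stochastic gradient descent: learn a small latent-factor vector per user and per item from the observed cells only, then use their dot product to estimate the empty cells. The interaction matrix `R`, the helper names and the hyperparameters (`k`, `lr`, `reg`, `epochs`) are illustrative assumptions, not the author's pipeline.

```python
import random

# Hypothetical user x item interaction matrix (e.g. ratings or purchase scores);
# None marks an unobserved cell we want to estimate.
R = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, None, 4],
    [None, 1, 5, 4],
]


def factorize(R, k=2, lr=0.01, reg=0.02, epochs=5000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates R[u][i]."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(k)] for _ in range(len(R))]     # user factors
    Q = [[rng.random() for _ in range(k)] for _ in range(len(R[0]))]  # item factors
    observed = [(u, i, r) for u, row in enumerate(R) for i, r in enumerate(row) if r is not None]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # gradient step on the squared error with L2 regularisation
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


P, Q = factorize(R)
# Keep observed values and fill the unobserved cells with the model's estimates
for u, row in enumerate(R):
    print([r if r is not None else round(predict(P, Q, u, i), 1) for i, r in enumerate(row)])
```

The estimated scores, or the learned factor vectors themselves, could then serve as user and item features for a downstream purchase-prediction model, as the slide suggests.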
  • 16. Data is at the centre ● Data sources (all direct or indirect information available) ○ APIs ○ Logs ○ Databases ○ Log archives ○ Log management tools (e.g., Humio, Dynatrace and others) ○ Monitoring tools (e.g., Prometheus) ○ … ● Data formats (JSON, CSV, TSV, ...) ○ Syslog ○ Key-value sources ○ Distributed file systems such as the Hadoop Distributed File System (HDFS) ○ ...
  • 17. Data normalization ● Understand the data sources and formats ● Bring all formats to the same convention ● Find duplicate and missing fields ○ One action generates several entries ○ Users do not fill in a field in the application
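A minimal sketch of these normalization steps, assuming three hypothetical inputs (JSON lines, a key-value log line and a CSV export) and an assumed key mapping `ALIASES`: records are renamed to one convention, repeated log entries for the same action are dropped, and absent or empty fields are reported per record.

```python
import csv
import io
import json

# Hypothetical raw inputs describing the same kind of event in three formats.
JSON_LINES = (
    '{"userId": "u1", "action": "LOGIN", "ts": "2021-03-01T10:00:00"}\n'
    '{"userId": "u1", "action": "LOGIN", "ts": "2021-03-01T10:00:00"}'  # one action logged twice
)
KV_LINE = "user=u2 action=purchase ts=2021-03-01T10:05:00"
CSV_TEXT = "user_id,action,timestamp,country\nu3,logout,2021-03-01T10:07:00,NO\n"

REQUIRED = ("user_id", "action", "timestamp", "country")
ALIASES = {"userId": "user_id", "user": "user_id", "ts": "timestamp"}  # source key -> common key


def normalize(record):
    """Rename source-specific keys to one convention and lower-case the action."""
    out = {ALIASES.get(key, key): value for key, value in record.items()}
    out["action"] = out["action"].lower()
    return out


def load_all():
    records = [json.loads(line) for line in JSON_LINES.splitlines()]
    records.append(dict(pair.split("=", 1) for pair in KV_LINE.split()))
    records.extend(csv.DictReader(io.StringIO(CSV_TEXT)))
    return [normalize(r) for r in records]


def dedupe_and_check(records):
    """Drop repeated entries for the same action and list missing/empty fields per record."""
    seen, unique, missing = set(), [], []
    for r in records:
        key = (r["user_id"], r["action"], r["timestamp"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(r)
        missing.append([f for f in REQUIRED if not r.get(f)])
    return unique, missing


unique, missing = dedupe_and_check(load_all())
print(len(unique), missing)  # 3 [['country'], ['country'], []]
```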
  • 19. IoT and UBA ● IoT-related sensors, e.g., in the energy industry (electricity usage and generation)
  • 20. UBA & EL segways - annoying?
  • 22. Is UBA grey? Yes, it is :) ● GDPR and other regulatory compliance ● Processing huge amounts of user data without users being aware of it ● Honeypot products that harvest user data and then monetize it ● ...
  • 24. Anomaly (statistical deviations?) ● A deviant, unusual data point: when the data-generating process behaves unusually, it produces outliers ● Outlier detection is well researched in both the statistics and the data science worlds ● Static anomalies (analyzed individually) ○ Unusual action ○ Unusual context ● Temporal anomalies ○ Unusual time ○ Unexpected event ○ Huge event volume ○ ...
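As a simple illustration of a temporal "huge event volume" anomaly, the sketch below applies a Z-score test to hypothetical hourly event counts. The data, the `volume_anomalies` helper and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, pstdev

# Hypothetical number of login events per hour over one day; hour 12 has a burst.
HOURLY = [12, 9, 11, 10, 8, 13, 10, 11, 9, 12, 10, 11,
          250, 10, 9, 12, 11, 10, 8, 13, 10, 9, 11, 12]


def volume_anomalies(counts, threshold=3.0):
    """Return the indexes whose count lies more than `threshold` std devs from the mean."""
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []  # perfectly flat series: nothing deviates
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


print(volume_anomalies(HOURLY))  # [12]
```

A plain Z-score is sensitive to the very outliers it looks for, since they inflate the mean and the standard deviation; with n points and the population standard deviation, no value can exceed a Z-score of √(n − 1). Median-based (robust) variants are therefore often preferred on short series.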
  • 25. Anomaly categorization The abnormal data points can be categorized further by whether the data being examined is a set, a sequence, or a structure of inherently higher dimensionality, which leads to: 1. Point Anomaly: a. One or more data points in the collection are anomalous 2. Context Anomaly: a. A data point is anomalous with respect to its neighbors or to other data points which share a context or some features 3. Collective Anomaly: a. A collection of possibly similar data points that behaves differently from the rest of the collection.
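A context anomaly can be illustrated by judging each transaction against its owner's own history instead of the whole population: an amount of 500 is ordinary for user `b` but anomalous for user `a`. The transactions, the leave-one-out comparison against the same user's other transactions and the threshold are assumptions for this sketch.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (user, amount) transactions.
TXNS = [("a", 20), ("a", 25), ("a", 22), ("a", 18), ("a", 21), ("a", 500),
        ("b", 480), ("b", 510), ("b", 495), ("b", 505), ("b", 490)]


def contextual_anomalies(transactions, threshold=3.0):
    """Flag amounts that deviate strongly from the same user's other transactions."""
    by_user = defaultdict(list)
    for user, amount in transactions:
        by_user[user].append(amount)
    flagged = []
    for user, amounts in by_user.items():
        for j, amount in enumerate(amounts):
            context = amounts[:j] + amounts[j + 1:]  # leave-one-out: the user's other amounts
            if len(context) < 2:
                continue  # not enough history to judge
            mu, sigma = mean(context), pstdev(context)
            if sigma > 0 and abs(amount - mu) / sigma > threshold:
                flagged.append((user, amount))
    return flagged


print(contextual_anomalies(TXNS))  # [('a', 500)]
```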
  • 26. Supervised and Unsupervised ● Supervised ○ The training data is pre-labelled or characterized by domain experts, and anomaly detection merely involves measuring how much new data points vary from such models. ● Unsupervised ○ Unsupervised methods, on the other hand, are used when the data is not labelled or characterized, because labelling is a laborious task and/or domain experts are lacking. There are therefore no prior labels that conclusively distinguish abnormal data points from normal ones. These algorithms focus primarily on identifying anomalies within a finite set of data points. The distance between two data points is measured using, for instance, the Euclidean distance or another distance measure (e.g., the Mahalanobis distance, which measures the distance between a point p and a distribution D).
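To show why the Mahalanobis distance can be preferable to the Euclidean distance, the sketch below computes both for two points against strongly correlated, hypothetical 2-D data. The point (3, 6) is closer to the centre in Euclidean terms but lies far off the trend the data follows, so its Mahalanobis distance is much larger. Only the 2-D case is written out, with the 2 × 2 covariance inverse computed by hand; `mahalanobis_2d` is a hypothetical helper.

```python
from math import dist, sqrt

# Hypothetical, strongly correlated 2-D behaviour data (e.g. logins vs. page views per day).
DATA = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1), (6, 5.8), (7, 7.2), (8, 7.9)]


def mahalanobis_2d(point, data):
    """Mahalanobis distance of `point` from the empirical distribution of 2-D `data`."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    c = sum((y - my) ** 2 for _, y in data) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = a * c - b * b  # assumed non-zero (data not perfectly collinear)
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = v^T S^-1 v, where S^-1 = [[c, -b], [-b, a]] / det
    return sqrt((c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)


centre = (sum(x for x, _ in DATA) / len(DATA), sum(y for _, y in DATA) / len(DATA))
for p in [(7, 7), (3, 6)]:
    print(p, "euclidean:", round(dist(p, centre), 2), "mahalanobis:", round(mahalanobis_2d(p, DATA), 2))
```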
  • 27. Methods categorization ● Density-based ○ DBSCAN (non-parametric; outliers are points that lie alone in low-density regions) ○ LOF (local outlier factor: the local deviation of a given data point with respect to its neighbours) ● Distance-based ○ k-NN ○ k-means ○ Distance to the regression hyperplane (if regression is used) ● Parametric (assume some sort of “form” for the data) ○ GMM (Gaussian mixture model) ○ One-class SVM ○ Extreme value theory ● Others (non-ML) ○ Statistical tests: Z-score ○ Spatial proximity
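One of the distance-based methods listed above, scoring each point by the distance to its k-th nearest neighbour, fits in a few lines. The points and `k = 2` are hypothetical; a real system would use scaled features and a spatial index rather than this O(n²) loop.

```python
from math import dist

# Hypothetical 2-D feature vectors: a tight cluster plus one far-away point.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]


def knn_outlier_scores(points, k=2):
    """Distance from each point to its k-th nearest neighbour; larger means more isolated."""
    scores = []
    for i, p in enumerate(points):
        neighbours = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(neighbours[k - 1])
    return scores


scores = knn_outlier_scores(POINTS)
print(max(range(len(POINTS)), key=scores.__getitem__))  # 5, i.e. the point (8, 8)
```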
  • 28. Isolation Forest uses a different approach: instead of trying to build a model of normal instances, it explicitly isolates anomalous points in the dataset. ● Multivariate anomaly detection techniques, e.g., PCA-based ● Methods based on (artificial) neural networks, e.g., autoencoders
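A compact, illustrative version of the Isolation Forest idea: random axis-aligned splits isolate a point, anomalies need fewer splits, and the average path length h is turned into a score s = 2^(−E[h]/c(ψ)), where c(ψ) is the average path length of an unsuccessful binary-search-tree search over ψ points. To keep it short, splits are drawn lazily along each point's own path. This gives the same path-length distribution for that point, but each point is scored against its own random trees rather than one shared forest as in the reference algorithm. The data, helper names and parameter values are assumptions.

```python
import math
import random


def _c(n):
    """Average path length of an unsuccessful BST search over n points (normalising term)."""
    if n <= 1:
        return 0.0
    return 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n


def _path_length(point, data, rng, depth, max_depth):
    """Follow `point` down one random isolation tree, drawing each split lazily."""
    if depth >= max_depth or len(data) <= 1:
        return depth + _c(len(data))  # unexpanded subtree: add its expected depth
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:
        return depth + _c(len(data))
    split = rng.uniform(lo, hi)
    # keep only the rows that fall on the same side of the split as `point`
    same_side = [row for row in data if (row[dim] < split) == (point[dim] < split)]
    return _path_length(point, same_side, rng, depth + 1, max_depth)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1) per point; values near 1 mean 'isolated quickly'."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))  # needs at least 2 points
    max_depth = math.ceil(math.log2(psi))
    scores = []
    for point in data:
        avg = sum(_path_length(point, rng.sample(data, psi), rng, 0, max_depth)
                  for _ in range(n_trees)) / n_trees
        scores.append(2 ** (-avg / _c(psi)))
    return scores


# Hypothetical data: a 5 x 4 grid of "normal" points plus one distant point.
DATA = [(x * 0.1, y * 0.1) for x in range(5) for y in range(4)] + [(5.0, 5.0)]
scores = isolation_scores(DATA)
print(max(range(len(DATA)), key=scores.__getitem__))  # 20, the point (5.0, 5.0)
```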
  • 29. Negative anomaly Fraud Money Laundering Malicious
  • 30. Application to Banking ● Retail bank (consumer banking) ○ Credit card fraud ○ Money laundering through the retail bank ● Private bank (banking for high-net-worth individuals, HNWIs) ○ Market abuse ○ Money laundering through the private bank ○ Other fraud ● Investment bank (serves governments, corporations and institutions) ○ Market abuse ○ Money laundering through the investment bank ○ Other fraud. Because money laundering differs across these three types of bank, each needs different detection approaches and different red flags.
  • 31. Supervised vs Unsupervised Automated fraud detection is an inherently different problem from automated detection of money laundering and market abuse. ● In credit card (CC) fraud, we know what a true positive (TP) looks like. How? The customer tells us. A supervised approach can be used here. ● In market abuse detection we also often know what a TP looks like. How? Positive PnL and price moves can indicate it. Supervised! ● In automated money laundering detection we do not know whether a data record is a TP or not, so the problem is unsupervised in nature (labelling the data is impractical).
  • 32. Unsupervised: DIFFICULT? The main issues with an analytical approach to AML (anti-money laundering): ● SEVERE CLASS IMBALANCE: laundering is estimated at less than 0.1% ● SEVERE CLASS OVERLAP: money laundering is mixed with legal financial activity ● CONCEPT DRIFT: laundering techniques keep changing, even within the same criminal organization ● UNCERTAINTY AROUND THE DATA MODEL: the confusion matrix
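A quick calculation shows why severe class imbalance makes accuracy a misleading measure. The transaction volume, the 0.1% laundering rate and the detection rates below are hypothetical; `confusion_metrics` is an illustrative helper.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from the four cells of a confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical volumes: 1,000,000 transactions, 0.1% (1,000) of them laundering.
N, POS = 1_000_000, 1_000

# A "detector" that never raises an alert still scores 99.9% accuracy:
print(confusion_metrics(tp=0, fp=0, fn=POS, tn=N - POS))  # (0.999, 0.0, 0.0)

# Catching 80% of cases while flagging 1% of legitimate traffic (9,990 false alerts)
# lowers accuracy, and most alerts are still false (precision about 0.074):
print(confusion_metrics(tp=800, fp=9_990, fn=200, tn=N - POS - 9_990))
```

Precision and recall (or cost-weighted measures) are therefore more informative than accuracy here, provided the labels needed to fill the confusion matrix are available at all.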
  • 33. Automated AML ● It is not a simple anomaly detection problem, nor really an outlier detection problem ○ Many transaction patterns associated with money laundering differ little from legitimate transactions. ● Outliers are often hidden in unusual local behaviour within low-dimensional subspaces ● What counts as normal depends on the subject matter
  • 34. Learning from “negativity” ● Let through a few % of the transactions that were tagged/classified as “bad”. This approach has a double benefit: ○ First and foremost, the learner keeps learning the new cases, instead of stopping them and eventually “forgetting” them ○ Secondly, and also importantly, it reduces the FP (false positive) cases
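One possible way to allow a few percent of flagged transactions through is to pick them deterministically by hashing the transaction ID, so the same transaction is always treated the same way and the released ones can later be labelled and fed back to the learner. The 2% rate and the helper names `release_for_feedback` and `route` are assumptions for this sketch.

```python
import hashlib


def release_for_feedback(txn_id: str, rate: float = 0.02) -> bool:
    """Deterministically select roughly `rate` of flagged transactions to let through."""
    digest = hashlib.sha256(txn_id.encode()).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # 48 bits -> uniform in [0, 1)
    return bucket < rate


def route(flagged_ids, rate=0.02):
    """Split flagged transactions into (released for labelling, blocked)."""
    released = [t for t in flagged_ids if release_for_feedback(t, rate)]
    blocked = [t for t in flagged_ids if not release_for_feedback(t, rate)]
    return released, blocked


released, blocked = route([f"txn-{i}" for i in range(10_000)])
print(len(released), len(blocked))
```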
  • 35. Other approaches ● Risk-based approaches: focus on the group of high-risk clients
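A minimal sketch of a risk-based approach: score each client from a few risk factors and keep the highest-scoring fraction for closer monitoring. The factors, weights, the 10% cut-off and the helper names are hypothetical placeholders.

```python
# Hypothetical risk factors and weights; a real programme would derive these from policy and data.
WEIGHTS = {"high_risk_country": 3.0, "cash_intensive_business": 2.0,
           "politically_exposed": 4.0, "new_customer": 1.0}


def risk_score(client):
    """Sum the weights of the risk factors that apply to this client."""
    return sum(weight for factor, weight in WEIGHTS.items() if client.get(factor))


def high_risk_group(clients, top_fraction=0.1):
    """Return the highest-scoring fraction of clients (at least one) for closer monitoring."""
    ranked = sorted(clients, key=risk_score, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return ranked[:k]


CLIENTS = [{"id": i} for i in range(8)] + [
    {"id": 8, "new_customer": True, "cash_intensive_business": True},
    {"id": 9, "politically_exposed": True, "high_risk_country": True},
]
print([c["id"] for c in high_risk_group(CLIENTS)])  # [9]
```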
  • 36. Positive Anomalies = Good anomalies = Good opportunity!
  • 37. Good anomalies (a minority community of good anomalies) Humans and machines tend to classify anomalies as “bad” while failing to successfully classify “good” anomalies. ● Withdrawal of huge amounts at once or of small amounts regularly ● Deposit/receipt of huge amounts at once or of very small amounts regularly: ○ The first step is to identify the transactions that are abnormal or outliers. Secondly, predict the occurrence of such an event. ■ What causes this event to occur? Possible reasons: ■ Is it because of the time of year, e.g., Christmas? ■ Is it because of a change in the weather? ■ Or is it totally random? ● Changes in life situation: ○ New marital status ○ Having children ○ Needing a new loan ○ Kids are old enough to live on their own ○ ... Any other cases of good anomalies?
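The first step above (identify abnormal transactions, then look for a cause such as the time of year) can be sketched with the common median/MAD "modified z-score" rule, flagging only unusually large deposits and counting the months in which they occur. The deposit history, with a December bonus, and the helper names are hypothetical.

```python
from collections import Counter
from statistics import median

# Hypothetical monthly deposits over two years: (month, amount); December carries a bonus.
DEPOSITS = [(m, 3000 if m == 12 else 1000 + 10 * (m % 5))
            for _year in range(2) for m in range(1, 13)]


def large_deposits(amounts, threshold=3.5):
    """Indexes of unusually *large* amounts, using the median/MAD modified z-score."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to compare against
    return [i for i, a in enumerate(amounts) if 0.6745 * (a - med) / mad > threshold]


def months_of_large_deposits(deposits):
    """Count in which months the flagged deposits fall, to check for seasonality."""
    flagged = large_deposits([amount for _, amount in deposits])
    return Counter(deposits[i][0] for i in flagged)


print(months_of_large_deposits(DEPOSITS))  # Counter({12: 2})
```

A recurring month suggests a seasonal cause (and a predictable opportunity), whereas flagged deposits scattered across months point towards a random or life-event cause.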
  • 38. Machine learning ● Find the right algorithm for the task at hand, i.e., positive anomaly detection (for temporal anomalies, e.g., an LSTM, which has a feedback loop) ● Implement the algorithm and its environment ● Optimize the model for the best accuracy
  • 39. Model parameters (an engineering task?) ● Architecture (global or hyperparameters that define the high-level behaviour of the neural network; in other words, translating domain expertise into numbers and algorithms) ○ Number of layers, number of neurons, activation function, loss function, optimizer, ... ● Data (how the data is prepared) ○ Features, knowledge base, sequence length, normalization, ... ● Training (how we look at the results, tune accuracy and evaluate) ○ Epochs, batch size, threshold, distance, smoothing, ...
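The data and training knobs above (sequence length, threshold, smoothing) can be seen in a small detection loop. In this sketch the median of the last `sequence_length` values stands in for a trained forecaster such as an LSTM, so the architecture parameters do not appear; the series, the parameter values and the class and function names are assumptions.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class DataParams:
    sequence_length: int = 5   # how many past steps the model sees


@dataclass
class TrainingParams:
    smoothing: float = 0.5     # EWMA weight of the newest error (1.0 = no smoothing)
    threshold: float = 3.0     # alert when the smoothed error exceeds this value


def windows(series, length):
    """Cut a series into (history, next_value) pairs of the given sequence length."""
    return [(series[i:i + length], series[i + length]) for i in range(len(series) - length)]


def detect(series, data=None, training=None):
    data = data or DataParams()
    training = training or TrainingParams()
    flagged, smoothed = [], 0.0
    for t, (history, actual) in enumerate(windows(series, data.sequence_length)):
        predicted = median(history)  # stand-in for a trained model such as an LSTM
        error = abs(actual - predicted)
        smoothed = training.smoothing * error + (1 - training.smoothing) * smoothed
        if smoothed > training.threshold:
            flagged.append(t + data.sequence_length)  # index in the original series
    return flagged


SERIES = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10, 11, 12]
print(detect(SERIES))                                          # [7, 8]
print(detect(SERIES, training=TrainingParams(smoothing=1.0)))  # [7]
```

With smoothing, the alert for the spike at index 7 persists for one more step; without smoothing only the spike itself is flagged. This is the usual trade-off between suppressing noise and reacting quickly.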
  • 40. Conclusions & Discussions ● UBA is often grey ● UBA is valuable and gives the users' perspective of the system ○ Monitoring the system from the users' point of view ○ Testing the product as the user sees it ● RPA (robotic process automation) and UBA in general, and anomaly detection in particular? ● Successful automatic anomaly detection starts with asking the right questions about what is truly unusual and building a set of data models to mimic this: “exploring low-dimensional subspaces with flags (maybe red / green)” ● What could be good anomalies in a specific use case? ○ How can they be turned into opportunities? ○ Can ML help?
  • 42. References ● C. C. Aggarwal, Outlier Analysis, 2nd edition, Springer, 2017. http://charuaggarwal.net/outlierbook.pdf ● Kaggle, Anomaly detection: credit card fraud analysis. https://www.kaggle.com/pavansanagapati/anomaly-detection-credit-card-fraud-analysis ● V. Veselovsky, Good and bad anomalies, Medium. https://medium.com/@veniaminveselovsky/good-and-bad-anomalies-41f11ce5e6f