1st edition | July 8-11, 2019
BigML, Inc #DutchMLSchool
Association Rules and
Topic Modeling
Charles Parker
VP, Machine Learning Algorithms
Association Rules
Association Discovery
An unsupervised learning technique
• No labels necessary
• Useful for data discovery
Finds "significant" correlations/associations/relations
• Shopping cart: Coffee and sugar
• Medical: High plasma glucose and diabetes
Expresses them as "if then rules"
• If "antecedent" then "consequent"
Review of methods: clustering
date customer account auth class zip amount
Mon Bob 3421 pin clothes 46140 135
Tue Bob 3421 sign food 46140 401
Tue Alice 2456 pin food 12222 234
Wed Sally 6788 pin gas 26339 94
Wed Bob 3421 pin tech 21350 2459
Wed Bob 3421 pin gas 46140 83
Thr Sally 6788 sign food 26339 51
(Same table as above; the slide marks a group of rows as "similar".)
Review: anomaly detection
(Same table as above; the slide marks a row as an "anomaly".)
Association Discovery
(Same table as above; the slides build up the following rules one highlight at a time.)
Rules:
Antecedent                          Consequent
{customer = Bob, account = 3421}    zip = 46140
{class = gas}                       amount < 100
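As a rough check of the two rules against the seven-row table, the sketch below counts how many rows match each antecedent and how many of those also match the consequent. The helper name `count_rule` and the dictionary layout are illustrative assumptions, not part of BigML's API.

```python
# Rows of the transaction table from the slides.
COLUMNS = ["date", "customer", "account", "auth", "class", "zip", "amount"]
ROWS = [
    ("Mon", "Bob", 3421, "pin", "clothes", 46140, 135),
    ("Tue", "Bob", 3421, "sign", "food", 46140, 401),
    ("Tue", "Alice", 2456, "pin", "food", 12222, 234),
    ("Wed", "Sally", 6788, "pin", "gas", 26339, 94),
    ("Wed", "Bob", 3421, "pin", "tech", 21350, 2459),
    ("Wed", "Bob", 3421, "pin", "gas", 46140, 83),
    ("Thr", "Sally", 6788, "sign", "food", 26339, 51),
]
TRANSACTIONS = [dict(zip(COLUMNS, row)) for row in ROWS]


def count_rule(antecedent, consequent, transactions=TRANSACTIONS):
    """Hypothetical helper: return (rows matching the antecedent,
    rows matching both the antecedent and the consequent)."""
    matching_a = [t for t in transactions if antecedent(t)]
    matching_both = [t for t in matching_a if consequent(t)]
    return len(matching_a), len(matching_both)


# {customer = Bob, account = 3421} -> zip = 46140
bob_rule = count_rule(
    lambda t: t["customer"] == "Bob" and t["account"] == 3421,
    lambda t: t["zip"] == 46140,
)
# {class = gas} -> amount < 100
gas_rule = count_rule(
    lambda t: t["class"] == "gas",
    lambda t: t["amount"] < 100,
)
print(bob_rule)  # (4, 3)
print(gas_rule)  # (2, 2)
```

On this table the Bob rule holds for three of Bob's four rows, while the gas rule holds for both gas purchases.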
Use Cases
• Data Discovery: how do instances relate?
• Market Basket Analysis: Items that go together
• Behaviors that occur together
• Web usage patterns
• Intrusion detection
• Fraud detection
• Medical risk factors
Association Metrics
• Coverage
• Support
• Confidence
• Lift
• Leverage
Associations between grocery items
Association Metrics: coverage
Coverage
Percentage of instances which match antecedent "A"
(Venn diagram: Instances, A, C)
Association Metrics: support
Support
Percentage of instances which match antecedent "A" and consequent "C"
(Venn diagram: Instances, A, C)
Association Metrics: confidence
Confidence
Percentage of instances in the antecedent which also contain the consequent.
Confidence = Support / Coverage
(Venn diagram: Instances, A, C)
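The three definitions so far can be sketched in a few lines. The carts below are invented for illustration, and the function names are assumptions rather than BigML calls. Exact fractions are used so the results are easy to verify by hand.

```python
from fractions import Fraction

# Hypothetical shopping carts, invented for this example.
CARTS = [
    {"coffee", "sugar", "milk"},
    {"coffee", "sugar"},
    {"coffee", "bread"},
    {"milk", "bread"},
    {"sugar", "honey"},
]


def coverage(antecedent, carts):
    """Fraction of instances that match the antecedent A."""
    return Fraction(sum(antecedent <= cart for cart in carts), len(carts))


def support(antecedent, consequent, carts):
    """Fraction of instances that match both A and C."""
    both = antecedent | consequent
    return Fraction(sum(both <= cart for cart in carts), len(carts))


def confidence(antecedent, consequent, carts):
    """Support divided by coverage; undefined if A never occurs."""
    return support(antecedent, consequent, carts) / coverage(antecedent, carts)


A, C = {"coffee"}, {"sugar"}
print(coverage(A, CARTS))       # 3/5
print(support(A, C, CARTS))     # 2/5
print(confidence(A, C, CARTS))  # 2/3
```

Note that `confidence` raises `ZeroDivisionError` when no instance matches the antecedent; a real implementation would have to decide how to handle that case.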
Association Metrics: confidence
(Diagram: confidence scale from 0% to 100%, with Venn diagrams of Instances, A, C)
0%: A never implies C
In between: A sometimes implies C
100%: A always implies C
Panel labels: A >> C, A = C, A << C
Association Metrics: lift
Lift
Ratio of observed support to support if A and C were statistically independent.
Lift = Support / [ p(A) * p(C) ] = Confidence / p(C)
(Venn diagrams: A and C when independent vs. as observed)
Problem: if p(C) is "small" then lift may be large.
Association Metrics: lift
Lift < 1: Negative correlation
Lift = 1: No correlation
Lift > 1: Positive correlation
(Venn diagrams: A and C when independent vs. as observed, for each case)
Association Metrics: leverage
Leverage
Difference of observed support and support if A and C were statistically independent.
Leverage = Support - [ p(A) * p(C) ]
(Venn diagrams: A and C when independent vs. as observed)
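Lift and leverage compare the same two quantities, once as a ratio and once as a difference. A minimal sketch, using invented carts and assumed function names, makes the relationship concrete:

```python
from fractions import Fraction

# Hypothetical shopping carts, invented for this example.
CARTS = [
    {"coffee", "sugar", "milk"},
    {"coffee", "sugar"},
    {"coffee", "bread"},
    {"milk", "bread"},
    {"sugar", "honey"},
]


def prob(items, carts):
    """Fraction of carts that contain every item in `items`."""
    return Fraction(sum(items <= cart for cart in carts), len(carts))


def lift(antecedent, consequent, carts):
    """Observed support divided by the support expected under independence."""
    observed = prob(antecedent | consequent, carts)
    expected = prob(antecedent, carts) * prob(consequent, carts)
    return observed / expected


def leverage(antecedent, consequent, carts):
    """Observed support minus the support expected under independence."""
    observed = prob(antecedent | consequent, carts)
    expected = prob(antecedent, carts) * prob(consequent, carts)
    return observed - expected


A, C = {"coffee"}, {"sugar"}
print(lift(A, C, CARTS))      # 10/9
print(leverage(A, C, CARTS))  # 1/25
```

Here p(A) = p(C) = 3/5 and the observed support is 2/5, so lift is slightly above 1 and leverage slightly above 0: a weak positive association.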
Association Metrics: leverage
Leverage < 0: Negative correlation
Leverage = 0: No correlation
Leverage > 0: Positive correlation
(Diagram: leverage scale, labeled from -1…, with Venn diagrams of A and C when independent vs. as observed)
Items Type
items: coffee, sugar, milk, honey, dish soap, bread
• Canonical example: shopping cart contents
• Single feature describing a list of items
• Each item separated by a comma (default)
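A field of this kind can be split into individual items with very little code. The function below is a hypothetical illustration of the idea, not BigML's parser; the `separator` parameter mirrors the comma default mentioned above.

```python
def parse_items(field, separator=","):
    """Split one items-type field into a list of trimmed, non-empty items."""
    return [item.strip() for item in field.split(separator) if item.strip()]


cart = parse_items("coffee, sugar, milk, honey, dish soap, bread")
print(cart)
# ['coffee', 'sugar', 'milk', 'honey', 'dish soap', 'bread']
```

Multi-word items such as "dish soap" survive intact because only the separator splits the field.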
Use Cases
GOAL: Discover "interesting" rules about what store items are typically purchased together.
• Dataset of 9,834 grocery cart transactions
• Each row is a list of all items in a cart at checkout
Association Demo
Summary
• Unsupervised learning technique for discovering
interesting associations
• Outputs antecedent/consequent rules
• Metrics: Support / Coverage / Confidence / Lift / Leverage
• Useful for “items” type and market basket analysis
• Applicable to understanding clusters and anomaly detectors
Topic Modeling
What is Topic Modeling?
• Unsupervised algorithm
• Learns only from text fields
• Finds hidden topics that model the text
Questions:
• How is this different from the Text Analysis that BigML already offers?
• What does it output and how do we use it?
• Finds topics in your text fields
• A topic is a distribution over terms
• Terms with high probability in the same topic often occur
together in the same document
• Topics often correspond to real-world things that the
document may be “about” (e.g., sports, cooking,
technology)
• Each document is “about” one or more topics
• Usually each document is only about one or two topics
• But in practice we assign a probability to every topic for
every document
Text Analysis
Be not afraid of greatness:
some are born great, some
achieve greatness, and
some have greatness
thrust upon 'em.
great: appears 4 times
1. Stem Words -> Tokens
2. Remove tokens that
occur too often
3. Remove tokens that do
not occur often enough
4. Count occurrences of
remaining “interesting”
tokens
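The counting steps can be sketched on the quoted passage. Two simplifications are assumptions of this example: the "stemmer" only strips a trailing "ness", and a small hand-written stop list stands in for "tokens that occur too often". Step 3 (dropping rare tokens) is skipped because it only makes sense across many documents.

```python
import re
from collections import Counter

TEXT = (
    "Be not afraid of greatness: some are born great, some achieve "
    "greatness, and some have greatness thrust upon 'em."
)

# Assumption: stand-in for tokens that occur too often to be interesting.
STOP_WORDS = {"be", "not", "of", "some", "are", "and", "have", "upon", "em"}


def stem(word):
    """Toy stemmer (assumption): only strips a trailing 'ness'."""
    if word.endswith("ness") and len(word) > 4:
        return word[:-4]
    return word


def token_counts(text):
    """Lowercase, split into words, stem, drop stop words, and count."""
    words = re.findall(r"[a-z]+", text.lower())
    tokens = [stem(word) for word in words]
    return Counter(token for token in tokens if token not in STOP_WORDS)


counts = token_counts(TEXT)
print(counts["great"])   # 4
print(counts["afraid"])  # 1
```

The three occurrences of "greatness" and the one of "great" collapse to the same token, which gives the count of 4 shown on the slide.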
(Token counts for the passage above, as one row of the dataset:)
…  great  afraid  born  achieve  …  …
…  4      1       1     1        …  …
…  …      …       …     …        …  …
Model
• The token "great" occurs more than 3 times
• The token "afraid" occurs no more than once
Hodor!
Text Analysis vs. Topic Modeling
Text                                            Topic Model
Creates thousands of hidden token counts        Creates tens of topics that model the text
Token counts are independently uninteresting    Topics are independently interesting
No semantic importance                          Semantic meaning extracted
Co-occurrence limited to consecutive n-grams    Topics indicate broader co-occurrences
Generating Documents
• "Machine" that generates a random word with equal probability with each pull.
• Pull random number of times to generate a document.
• All documents can be generated, but most are nonsense.
(Diagram: a machine holding the words "cat shoe zebra ball tree jump pen asteroid cable box step cabinet yellow plate flashlight…" emits documents such as "shoe asteroid flashlight pizza…", "plate giraffe purple jump…" and "Be not afraid of greatness: some are born great, some achieve greatness…")
word        probability
shoe        ϵ
asteroid    ϵ
flashlight  ϵ
pizza       ϵ
…           ϵ
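The word machine is simple to simulate. In the sketch below the vocabulary is the visible part of the slide's word list, while the seed and the maximum document length are arbitrary assumptions chosen only to make the example reproducible.

```python
import random

# Visible part of the word list on the slide; the real list is longer.
VOCABULARY = [
    "cat", "shoe", "zebra", "ball", "tree", "jump", "pen", "asteroid",
    "cable", "box", "step", "cabinet", "yellow", "plate", "flashlight",
]


def generate_document(rng, vocabulary=VOCABULARY, max_length=8):
    """Pull the machine a random number of times; every word is equally likely."""
    length = rng.randint(1, max_length)
    return [rng.choice(vocabulary) for _ in range(length)]


rng = random.Random(0)  # fixed seed (assumption) so runs are repeatable
document = generate_document(rng)
print(" ".join(document))
```

Because every word has the same probability, almost every generated document is an unrelated jumble, which is the point the slide makes.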
Topic Model
Intuition:
• Written documents have meaning - one way to describe meaning is to assign a topic.
• For our random machine, the topic can be thought of as increasing the probability of certain words.
Topic: travel (sample output: "airplane passport pizza …")
word      probability
travel    23,55 %
airplane  2,33 %
mars      0,003 %
mantle    ϵ
…         ϵ
Topic: space (sample output: "mars quasar lightyear soda")
word      probability
space     38,94 %
airplane  ϵ
mars      13,43 %
mantle    0,05 %
…         ϵ
Topic Model
• Each text field in a row is concatenated into a document
• The documents are analyzed to generate "k" related topics
• Each topic is represented by a distribution of term probabilities
Topic: "1"
word      probability
travel    23,55 %
airplane  2,33 %
mars      0,003 %
mantle    ϵ
…         ϵ
Topic: "2"
word      probability
space     38,94 %
airplane  ϵ
mars      13,43 %
mantle    0,05 %
…         ϵ
…
Topic: "k"
word       probability
shoe       12,12 %
coffee     3,39 %
telephone  13,43 %
paper      4,11 %
…          ϵ
Training Topic Models
Topic Distribution
Intuition:
• Any given document is likely a mixture of the modeled topics…
• This can be represented as a distribution of topic probabilities
Document: "Will 2020 be the year that humans will embrace space exploration and finally travel to Mars?"
Topic: travel  11%
Topic: space   89%
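One way to read a topic distribution is as a recipe for the random word machine: for each word, first pick a topic according to the document's topic probabilities, then pick a word according to that topic. The sketch below follows that reading, which is an assumption of this example rather than a description of BigML's implementation. The word weights are the partial values from the slides, with "ϵ" replaced by a small invented number.

```python
import random

# Assumption: stand-in for the slides' "ϵ" (a very small probability).
EPSILON = 0.0001

# Partial word weights from the slides; random.choices normalizes them,
# so they do not need to sum to 1.
TOPICS = {
    "travel": {"travel": 23.55, "airplane": 2.33, "mars": 0.003, "mantle": EPSILON},
    "space": {"space": 38.94, "airplane": EPSILON, "mars": 13.43, "mantle": 0.05},
}

# Topic distribution of the example document on the slide.
DOCUMENT_TOPICS = {"travel": 0.11, "space": 0.89}


def generate_word(rng):
    """Pick a topic from the document's distribution, then a word from that topic."""
    topic = rng.choices(
        list(DOCUMENT_TOPICS), weights=list(DOCUMENT_TOPICS.values())
    )[0]
    words = TOPICS[topic]
    return rng.choices(list(words), weights=list(words.values()))[0]


rng = random.Random(0)  # fixed seed (assumption) so runs are repeatable
sample = [generate_word(rng) for _ in range(1000)]
space_share = sample.count("space") / len(sample)
print(f"share of 'space' in the sample: {space_share:.2f}")
```

With an 89% weight on the space topic, words such as "space" and "mars" dominate the sample, while "travel" still appears occasionally.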
Topic Distributions
Prediction?
(Diagram: Clustering takes Unlabelled Data through a Batch Centroid to a Centroid Label; a Topic Model takes Unlabelled Data with Text Fields through a Batch Topic Distribution to topic 1 prob, topic 3 prob, …, topic k prob)
Topic Model Use Cases
• As a preprocessor for other techniques
• Building better models
• Bootstrapping categories for classification
• Recommendation
• Discovery in large, heterogeneous text datasets
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 

DutchMLSchool 1st Edition Topic Modeling

  • 1. 1st edition | July 8-11, 2019
  • 2. BigML, Inc #DutchMLSchool. Association Rules and Topic Modeling. Charles Parker, VP, Machine Learning Algorithms
  • 4. Association Discovery
    An unsupervised learning technique
      • No labels necessary
      • Useful for data discovery
    Finds "significant" correlations/associations/relations
      • Shopping cart: coffee and sugar
      • Medical: high plasma glucose and diabetes
    Expresses them as "if then" rules
      • If "antecedent" then "consequent"
  • 5. Review of methods: clustering
        date  customer  account  auth  class    zip    amount
        Mon   Bob       3421     pin   clothes  46140  135
        Tue   Bob       3421     sign  food     46140  401
        Tue   Alice     2456     pin   food     12222  234
        Wed   Sally     6788     pin   gas      26339  94
        Wed   Bob       3421     pin   tech     21350  2459
        Wed   Bob       3421     pin   gas      46140  83
        Thr   Sally     6788     sign  food     26339  51
  • 6. Review of methods: clustering. Same transaction table as slide 5, annotated "similar".
  • 7. Review: anomaly detection. Same transaction table as slide 5.
  • 8. Review: anomaly detection. Same transaction table as slide 5, annotated "anomaly".
  • 9. Association Discovery. Same transaction table as slide 5.
  • 10. Association Discovery. Same table, highlighting {customer = Bob, account = 3421}.
  • 11. Association Discovery. Same table, highlighting {customer = Bob, account = 3421} and zip = 46140.
  • 12. Association Discovery. Same table, additionally highlighting {class = gas}.
  • 13. Association Discovery. Same table, additionally highlighting amount < 100.
  • 14. Association Discovery. Same table, with the highlights read as rules:
        Antecedent                          Consequent
        {customer = Bob, account = 3421}    zip = 46140
        {class = gas}                       amount < 100
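The two rules on slide 14 can be checked against the transaction table directly. The following is a minimal sketch, not BigML's implementation: the table values come from the slides, while the predicate encoding, the rule names, and the `match_counts` helper are assumptions made for illustration.

```python
# Transaction table copied from the slides.
COLUMNS = ["date", "customer", "account", "auth", "class", "zip", "amount"]
ROWS = [
    ("Mon", "Bob", 3421, "pin", "clothes", 46140, 135),
    ("Tue", "Bob", 3421, "sign", "food", 46140, 401),
    ("Tue", "Alice", 2456, "pin", "food", 12222, 234),
    ("Wed", "Sally", 6788, "pin", "gas", 26339, 94),
    ("Wed", "Bob", 3421, "pin", "tech", 21350, 2459),
    ("Wed", "Bob", 3421, "pin", "gas", 46140, 83),
    ("Thr", "Sally", 6788, "sign", "food", 26339, 51),
]
TRANSACTIONS = [dict(zip(COLUMNS, row)) for row in ROWS]

# Hypothetical encoding: each rule is an (antecedent, consequent) pair of
# predicates over a single row. The rule names are made up for this sketch.
RULES = {
    "bob_account -> zip_46140": (
        lambda r: r["customer"] == "Bob" and r["account"] == 3421,
        lambda r: r["zip"] == 46140,
    ),
    "gas -> amount_under_100": (
        lambda r: r["class"] == "gas",
        lambda r: r["amount"] < 100,
    ),
}


def match_counts(antecedent, consequent, rows):
    """Return (rows matching the antecedent, rows matching both sides)."""
    matched = [r for r in rows if antecedent(r)]
    both = [r for r in matched if consequent(r)]
    return len(matched), len(both)


for name, (antecedent, consequent) in RULES.items():
    n_antecedent, n_both = match_counts(antecedent, consequent, TRANSACTIONS)
    print(f"{name}: antecedent matches {n_antecedent} rows, rule holds in {n_both}")
```

In this table the first rule holds in three of the four rows that match its antecedent (Bob's tech purchase has a different zip), so a rule does not have to hold in every case to be reported.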
  • 15. Use Cases
      • Data Discovery: how do instances relate?
      • Market Basket Analysis: items that go together
      • Behaviors that occur together
      • Web usage patterns
      • Intrusion detection
      • Fraud detection
      • Medical risk factors
  • 16. Association Metrics
      • Coverage
      • Support
      • Confidence
      • Lift
      • Leverage
    [Figure: associations between grocery items]
  • 17. Association Metrics: coverage. Coverage is the percentage of instances that match the antecedent “A”. [Diagram labels: Instances, A, C]
  • 18. Association Metrics: support. Support is the percentage of instances that match both the antecedent “A” and the consequent “C”. [Diagram labels: Instances, A, C]
  • 19. Association Metrics: confidence. Confidence is the percentage of instances matching the antecedent that also match the consequent, that is, Support / Coverage. [Diagram labels: Instances, A, C]
  • 20. Association Metrics: confidence. Confidence ranges from 0% to 100%:
      • 0%: A never implies C
      • In between: A sometimes implies C
      • 100%: A always implies C
    [Diagram labels: Instances, A, C; A >> C, A = C, A << C]
  • 21. Association Metrics: lift. Lift is the ratio of observed support to the support expected if A and C were statistically independent:
        Lift = Support / (p(A) * p(C)) = Confidence / p(C)
    Problem: if p(C) is "small" then… lift may be large. [Diagram labels: Independent A, C; Observed A, C]
  • 22. Association Metrics: lift
      • Lift < 1: negative correlation
      • Lift = 1: no correlation
      • Lift > 1: positive correlation
    [Diagram labels: Independent A, C; Observed A, C]
  • 23. Association Metrics: leverage. Leverage is the difference between observed support and the support expected if A and C were statistically independent:
        Leverage = Support - [ p(A) * p(C) ]
    [Diagram labels: Independent A, C; Observed A, C]
  • 24. Association Metrics: leverage
      • Leverage < 0: negative correlation
      • Leverage = 0: no correlation
      • Leverage > 0: positive correlation
    [Diagram labels: Independent A, C; Observed A, C; scale: -1…]
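As a worked example of the five metrics, the sketch below evaluates them for both rules from the transaction table on the earlier slides. It assumes the standard definitions given on the metric slides; the `rule_metrics` helper and the row layout are illustrative choices, not BigML's API. Exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction

# (customer, account, class, zip, amount) columns of the slide's table.
ROWS = [
    ("Bob", 3421, "clothes", 46140, 135),
    ("Bob", 3421, "food", 46140, 401),
    ("Alice", 2456, "food", 12222, 234),
    ("Sally", 6788, "gas", 26339, 94),
    ("Bob", 3421, "tech", 21350, 2459),
    ("Bob", 3421, "gas", 46140, 83),
    ("Sally", 6788, "food", 26339, 51),
]


def rule_metrics(rows, antecedent, consequent):
    """Compute the slide's five metrics for one rule.

    Assumes both the antecedent and the consequent match at least one row,
    otherwise confidence and lift are undefined (division by zero).
    """
    n = len(rows)
    p_a = Fraction(sum(1 for r in rows if antecedent(r)), n)   # coverage
    p_c = Fraction(sum(1 for r in rows if consequent(r)), n)
    support = Fraction(
        sum(1 for r in rows if antecedent(r) and consequent(r)), n
    )
    return {
        "coverage": p_a,
        "support": support,
        "confidence": support / p_a,
        "lift": support / (p_a * p_c),
        "leverage": support - p_a * p_c,
    }


# {customer = Bob, account = 3421} -> zip = 46140
BOB_ZIP = rule_metrics(
    ROWS,
    lambda r: r[0] == "Bob" and r[1] == 3421,
    lambda r: r[3] == 46140,
)
# {class = gas} -> amount < 100
GAS_CHEAP = rule_metrics(ROWS, lambda r: r[2] == "gas", lambda r: r[4] < 100)

for label, metrics in (("Bob -> zip", BOB_ZIP), ("gas -> cheap", GAS_CHEAP)):
    print(label, {name: str(value) for name, value in metrics.items()})
```

For the first rule this gives coverage 4/7, support 3/7, confidence 3/4, lift 7/4 and leverage 9/49. Both lift and leverage indicate a positive correlation, but on a seven-row table these numbers are only illustrative.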
  • 25. Items Type
      • Canonical example: shopping cart contents
      • Single feature describing a list of items
      • Each item separated by a comma (default)
    Example items field: coffee, sugar, milk, honey, dish soap, bread
  • 26. Use Cases
    GOAL: Discover “interesting” rules about which store items are typically purchased together.
      • Dataset of 9,834 grocery cart transactions
      • Each row is a list of all items in a cart at checkout
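To make the items type concrete, here is a small sketch that splits comma-separated items fields and counts which pairs of items share a cart. The first cart is the example from the slide; the other two carts and the helper names `parse_items` and `pair_counts` are hypothetical, and this is a plain co-occurrence count rather than the rule search BigML performs.

```python
from collections import Counter
from itertools import combinations


def parse_items(field, separator=","):
    """Split one items field into a set of trimmed, non-empty item names."""
    return {item.strip() for item in field.split(separator) if item.strip()}


CARTS = [
    "coffee, sugar, milk, honey, dish soap, bread",  # example from the slide
    "coffee, sugar, bread",                           # hypothetical cart
    "milk, bread",                                    # hypothetical cart
]


def pair_counts(carts):
    """Count how many carts contain each unordered pair of items."""
    counts = Counter()
    for field in carts:
        # Sorting makes every pair appear in one canonical order.
        items = sorted(parse_items(field))
        counts.update(combinations(items, 2))
    return counts


PAIRS = pair_counts(CARTS)
for pair, count in PAIRS.most_common(3):
    print(pair, count)
```

Pair counts like these are the raw material for support: dividing a pair's count by the number of carts gives the support of a rule between the two items.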
  • 28. Summary
      • Unsupervised learning technique for discovering interesting associations
      • Outputs antecedent/consequent rules
      • Metrics: Support / Coverage / Confidence / Lift / Leverage
      • Useful for “items” type and market basket analysis
      • Applicable to understanding clusters and anomaly detectors
  • 30. What is Topic Modeling?
      • Unsupervised algorithm
      • Learns only from text fields
      • Finds hidden topics that model the text
    Questions:
      • How is this different from the Text Analysis that BigML already offers?
      • What does it output and how do we use it?
  • 31. What is Topic Modeling?
      • Finds topics in your text fields
          • A topic is a distribution over terms
          • Terms with high probability in the same topic often occur together in the same document
          • Topics often correspond to real-world things that the document may be “about” (e.g., sports, cooking, technology)
      • Each document is “about” one or more topics
          • Usually each document is only about one or two topics
          • But in practice we assign a probability to every topic for every document
  • 32. Text Analysis
    Example: "Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em."
    Result: "great" appears 4 times
      1. Stem words -> tokens
      2. Remove tokens that occur too often
      3. Remove tokens that do not occur often enough
      4. Count occurrences of remaining “interesting” tokens
  • 33. Text Analysis
    The same sentence becomes a row of token counts:
        …  great  afraid  born  achieve  …
        …  4      1       1     1        …
    Model conditions on those counts:
      • The token “great” occurs more than 3 times
      • The token “afraid” occurs no more than once
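The four text-analysis steps can be sketched on the slide's own sentence. Everything beyond the sentence is an assumption: the toy stemmer only strips a trailing "ness", and the `TOO_COMMON` set stands in for "tokens that occur too often", which in practice would be decided from frequencies across a whole corpus rather than from a hand-written list.

```python
import re
from collections import Counter

QUOTE = (
    "Be not afraid of greatness: some are born great, some achieve "
    "greatness, and some have greatness thrust upon 'em."
)

# Assumed stand-in for "tokens that occur too often" (hand-picked here).
TOO_COMMON = {"be", "not", "of", "some", "are", "and", "have", "upon", "em"}


def stem(word):
    """Toy stemmer (assumption): only strips a trailing 'ness'."""
    if word.endswith("ness") and len(word) > 4:
        return word[:-4]
    return word


def token_counts(text, too_common=TOO_COMMON, min_count=1):
    """Stem, drop over-frequent and under-frequent tokens, count the rest."""
    words = re.findall(r"[a-z]+", text.lower())          # crude tokenizer
    counts = Counter(stem(word) for word in words)       # step 1 and step 4
    return {
        token: count
        for token, count in counts.items()
        if token not in too_common                        # step 2
        and count >= min_count                            # step 3
    }


COUNTS = token_counts(QUOTE)
print(COUNTS)
```

With these assumptions the three occurrences of "greatness" and the one of "great" collapse into a single token with count 4, matching the slide.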
  • 34. Text Analysis
  • 36. Text Analysis vs. Topic Modeling
    Text:
      • Creates thousands of hidden token counts
      • Token counts are independently uninteresting
      • No semantic importance
      • Co-occurrence limited to consecutive n-grams
    Topic Model:
      • Creates tens of topics that model the text
      • Topics are independently interesting
      • Semantic meaning extracted
      • Topics indicate broader co-occurrences
  • 37. Generating Documents
      • "Machine" that generates a random word with equal probability with each pull.
      • Pull a random number of times to generate a document.
      • All documents can be generated, but most are nonsense.
    Machine vocabulary: cat shoe zebra ball tree jump pen asteroid cable box step cabinet yellow plate flashlight…
    Generated documents: "shoe asteroid flashlight pizza…", "plate giraffe purple jump…", "Be not afraid of greatness: some are born great, some achieve greatness…"
        word        probability
        shoe        ϵ
        asteroid    ϵ
        flashlight  ϵ
        pizza       ϵ
        …           ϵ
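The random "machine" can be sketched in a few lines. The vocabulary is the visible part of the word list on the slide; the `generate_document` helper, the `max_pulls` limit, and the fixed seed are assumptions added so the sketch is reproducible.

```python
import random

# Visible part of the machine's vocabulary on the slide (the slide's list
# continues beyond these words).
VOCABULARY = [
    "cat", "shoe", "zebra", "ball", "tree", "jump", "pen", "asteroid",
    "cable", "box", "step", "cabinet", "yellow", "plate", "flashlight",
]


def generate_document(vocabulary, rng, max_pulls=20):
    """Pull the machine a random number of times; each pull is uniform."""
    n_pulls = rng.randint(1, max_pulls)   # max_pulls is an assumed limit
    return [rng.choice(vocabulary) for _ in range(n_pulls)]


RNG = random.Random(0)  # fixed seed only so repeated runs agree
DOCUMENT = generate_document(VOCABULARY, RNG)
print(" ".join(DOCUMENT))
```

Because every word is equally likely on every pull, any word sequence can come out, which is why almost all generated documents are nonsense.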
  • 38. Topic Model
    Intuition:
      • Written documents have meaning; one way to describe meaning is to assign a topic.
      • For our random machine, the topic can be thought of as increasing the probability of certain words.
    Topic: travel (same machine vocabulary; generated document: "airplane passport pizza …")
        word      probability
        travel    23.55%
        airplane  2.33%
        mars      0.003%
        mantle    ϵ
        …         ϵ
    Topic: space (same machine vocabulary; generated document: "mars quasar lightyear soda")
        word      probability
        space     38.94%
        airplane  ϵ
        mars      13.43%
        mantle    0.05%
        …         ϵ
  • 39. Topic Model
      • Each text field in a row is concatenated into a document
      • The documents are analyzed to generate "k" related topics
      • Each topic is represented by a distribution of term probabilities
    Example documents: "airplane passport pizza …", "plate giraffe purple jump…"
    Topic "1": travel 23.55%, airplane 2.33%, mars 0.003%, mantle ϵ, … ϵ
    Topic "2": space 38.94%, airplane ϵ, mars 13.43%, mantle 0.05%, … ϵ
    …
    Topic "k": shoe 12.12%, coffee 3.39%, telephone 13.43%, paper 4.11%, … ϵ
  • 40. Training Topic Models
  • 41. Topic Distribution
    Intuition:
      • Any given document is likely a mixture of the modeled topics…
      • This can be represented as a distribution of topic probabilities
    Example document: "Will 2020 be the year that humans will embrace space exploration and finally travel to Mars?"
      • Topic: travel (travel 23.55%, airplane 2.33%, mars 0.003%, mantle ϵ, … ϵ): 11%
      • Topic: space (space 38.94%, airplane ϵ, mars 13.43%, mantle 0.05%, … ϵ): 89%
  • 42. Topic Distributions
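One way to see how a document gets a topic distribution is to hold the two topics from the slides fixed and estimate the mixing proportions with a few expectation-maximization steps. This is a simplified sketch and not the inference BigML uses, so it is not expected to reproduce the 11% / 89% split on the slide. The word probabilities come from the slides; the value of `EPSILON` (standing in for ϵ) and the `infer_topic_distribution` helper are assumptions.

```python
import re

# Word probabilities per topic, taken from the slides (percent -> fraction).
TOPICS = {
    "travel": {"travel": 0.2355, "airplane": 0.0233, "mars": 0.00003},
    "space": {"space": 0.3894, "mars": 0.1343, "mantle": 0.0005},
}
EPSILON = 1e-6  # assumed value for the slides' unspecified "ϵ"


def word_prob(topic, word):
    """Probability of a word under a topic, falling back to EPSILON."""
    return TOPICS[topic].get(word, EPSILON)


def infer_topic_distribution(text, n_iter=50):
    """Estimate topic proportions for one document with fixed topics."""
    names = list(TOPICS)
    theta = {name: 1.0 / len(names) for name in names}
    tokens = re.findall(r"[a-z]+", text.lower())
    # Only words that some topic lists explicitly carry information here.
    known = [t for t in tokens if any(t in TOPICS[name] for name in names)]
    if not known:
        return theta
    for _ in range(n_iter):
        totals = dict.fromkeys(names, 0.0)
        for token in known:
            # E-step: share of this word attributed to each topic.
            weights = {n: theta[n] * word_prob(n, token) for n in names}
            norm = sum(weights.values())
            for name in names:
                totals[name] += weights[name] / norm
        # M-step: new proportions are the average attributions.
        theta = {name: totals[name] / len(known) for name in names}
    return theta


DOC = ("Will 2020 be the year that humans will embrace space exploration "
       "and finally travel to Mars?")
THETA = infer_topic_distribution(DOC)
print({name: round(value, 3) for name, value in THETA.items()})
```

In this sketch "space" and "mars" are attributed almost entirely to the space topic and "travel" to the travel topic, so the space topic ends up with the larger share, in the same direction as the slide's example.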
  • 43. Prediction?
      • Clustering: Unlabelled Data → Batch Centroid → Centroid Label
      • Topic Model: Unlabelled Data (Text Fields) → Batch Topic Distribution → topic 1 prob, topic 3 prob, …, topic k prob
  • 44. Topic Model Use Cases
      • As a preprocessor for other techniques
      • Building better models
      • Bootstrapping categories for classification
      • Recommendation
      • Discovery in large, heterogeneous text datasets