Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Autonomous Data Warehouse
Oracle Machine Learning
Oracle Analytics Cloud
Market Basket Analysis Revisited using SQL Pattern Matching
Shankar Somayajula
May 29th, 2019
Confidential – Oracle Internal
Agenda
• Patterns Everywhere
• Typical MBA
• MBA Revisited – Motivations & Design
• SQL Pattern Matching Intro/Usage
• Issues/Challenges
• Demo/Screenshots
• Feedback
Finding Patterns in Data
Typical use cases in today’s world of fast exploration of data
• Financial Services: Money Laundering, Fraud, Tracking Stock Market
• Law & Order: Monitoring Suspicious Activities
• Retail: Returns Fraud, Buying Patterns, Sessionization
• Telcos: Money Laundering, SIM Card Fraud, Call Quality
• Utilities: Network Analysis, Fraud, Unusual Usage
• Lots of Data
Typical Pattern Matching Use Cases
Use Case | Input Data | Pattern | Result
Sessionization | Weblogs | Continuous clicks by same user | Generate reports on number of distinct sessions, average page views per session, etc.
Fraud | Credit card transactions | Two transactions in different locations within a short period of time | Find cases in which a credit card may have been used fraudulently, since a physical person cannot be in two places at once
In-game purchases | Games logs | Events leading up to an in-game purchase | Detect common sequences of events that result in an in-game purchase
Fraud (mobiles) | CDR logs | SIM card being used in multiple handsets | Flag individual SIM cards being used by multiple handsets within a specified time period
Stock market analysis | Ticker logs | Track possible fraudulent linked patterns of behavior | Track known patterns of behavior such as head and shoulders, triangles, channels and wedges
Typical Pattern Matching Use Cases
Use Case | Input Data | Pattern | Result
Auditing/Compliance | Application logs | Analyze changes to secure customer data | Find instances where operator has made suspect modifications to secure client data
Money laundering | Transaction logs | Search for small transfers within a time window followed by a large transfer within “x” days of last small transfer | Detect suspicious money transfer pattern for an account and report account, date of first small transfer, date of last large transfer
Call service quality | CDR logs | Search for dropped/reconnected calls | Identify how many times calls were restarted in a session, total effective call duration and total interrupted duration
Login security | Application logs | Search for attempted logins | Identify attempts to gain access to application/schema that can be linked to hackers or inappropriate access
Typical MBA
• Transaction Data of type Master-Detail
• Rules of type "IF … THEN … "
• with KPIs – Support, Confidence and Lift
Some examples of MB Rules/Insights
• (diapers) => (beer)
• (peanutButter, jelly) => (bread)
Multidimensional Rules with artificial/virtual products …
• (Item=X, isOver18=TRUE, isNewCustomer=TRUE) => (Item=Y)
• (buyerAge >= 63, loyaltyAge>= 2) => (toothBrushBuy >=2)
• age(X,"20...29"), income(X,"52k...58k") => buys(X, "iPad")
Typical MBA
• MB Rule: A => B [ s, c ]
– Support: denotes the frequency of the rule within transactions. A high value means
that the rule covers a large part of the database. Here p(A U B) is the probability
that a transaction contains every item of A and every item of B, also written sup(A,B).
support(A => B [ s, c ]) = p(A U B)
– Confidence: denotes the percentage of transactions containing A which also contain B.
It is an estimate of the conditional probability.
confidence(A => B [ s, c ]) = p(B|A) = sup(A,B)/sup(A)
– Lift: a measure of how much better a rule is at predicting the result than just assuming
the result in the first place.
lift(A => B [ s, c ]) = p(B|A)/p(B) = sup(A,B)/(sup(A).sup(B))
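The three standard KPIs can be checked by hand on a small basket set. The sketch below is in Python rather than SQL purely for illustration; the transactions and the helper names (`support`, `confidence`, `lift`) are made up for this example and are not part of the solution described in these slides.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, trxs):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in trxs) / len(trxs)

def confidence(a, b, trxs):
    """p(B|A) = sup(A,B) / sup(A)."""
    return support(set(a) | set(b), trxs) / support(a, trxs)

def lift(a, b, trxs):
    """p(B|A) / p(B) = sup(A,B) / (sup(A) * sup(B))."""
    return confidence(a, b, trxs) / support(b, trxs)

A, B = {"diapers"}, {"beer"}
sup_ab = support(A | B, transactions)     # 2 of 5 baskets hold both items
conf_ab = confidence(A, B, transactions)  # 2 of the 3 diaper baskets hold beer
lift_ab = lift(A, B, transactions)        # above 1: positive association
print(sup_ab, conf_ab, lift_ab)
```

A lift above 1 suggests that A and B occur together more often than independence would predict; a lift below 1 suggests the opposite.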
Typical MBA
• MB Rule: A => B [ s, c ]
– Conviction: compares how often the rule would be incorrect (A sold without B) if the association
between A and B were purely random chance with how often it is actually incorrect. Conviction is a
measure of the implication and has value 1 if items are unrelated. Sensitive to the directionality
of the rule (unlike lift). B’ denotes transactions that do not contain B.
conviction(A => B [ s, c ]) = sup(A).sup(B’)/sup(A,B’)
– Leverage (or Piatetsky‐Shapiro): the proportion of additional transactions covered by both A and B
above what would be expected if A and B were independent.
leverage(A => B [ s, c ]) = sup(A,B) - sup(A).sup(B)
– Max Confidence: KPI max_conf is the maximum confidence of the two association rules related
to A and B, namely, “p(A|B)” and “p(B|A)”.
max_conf(A => B [ s, c ]) = max(p(B|A), p(A|B))
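As a rough illustration of the three formulas above, the following Python sketch computes them directly from supports. The support values are hypothetical, and the function names are chosen for this example only. It also shows that conviction changes when the rule is reversed, while leverage does not.

```python
def conviction(sup_a, sup_b, sup_ab):
    """sup(A) * sup(B') / sup(A, B'), where B' means "B is absent"."""
    sup_a_not_b = sup_a - sup_ab           # sup(A, B')
    if sup_a_not_b == 0:
        return float("inf")                # the rule is never wrong
    return sup_a * (1.0 - sup_b) / sup_a_not_b

def leverage(sup_a, sup_b, sup_ab):
    """Observed joint support minus the support expected under independence."""
    return sup_ab - sup_a * sup_b

def max_conf(sup_a, sup_b, sup_ab):
    """max(p(B|A), p(A|B))."""
    return max(sup_ab / sup_a, sup_ab / sup_b)

# Hypothetical supports: sup(A) = 0.5, sup(B) = 0.4, sup(A,B) = 0.3
sup_a, sup_b, sup_ab = 0.5, 0.4, 0.3

conv_a_to_b = conviction(sup_a, sup_b, sup_ab)   # rule A => B
conv_b_to_a = conviction(sup_b, sup_a, sup_ab)   # rule B => A (direction matters)
lev = leverage(sup_a, sup_b, sup_ab)
mc = max_conf(sup_a, sup_b, sup_ab)
print(conv_a_to_b, conv_b_to_a, lev, mc)
```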
Typical MBA
• MB Rule: A => B [ s, c ]
– All Confidence: Null-invariant KPI. all_conf is the minimum confidence of the two association
rules related to A and B, namely, “p(A|B)” and “p(B|A)”.
all_conf(A => B [ s, c ]) = min(p(B|A), p(A|B))
– Cosine: Null-invariant KPI. The cosine measure can be viewed as a harmonized lift measure: the two
formulae are similar except that for cosine, the square root is taken of the product of the
probabilities of A and B.
cosine(A => B [ s, c ]) = sqrt(p(B|A).p(A|B))
– Kulc (Kulczynski): Null-invariant KPI. The Kulczynski measure is the average of the two
conditional probabilities, with a preference for skewed patterns. Range: [0; 1].
kulc(A => B [ s, c ]) = (p(A|B) + p(B|A)) / 2 = sup(A,B)/2 * (1/sup(A) + 1/sup(B))
– IR (Imbalance Ratio): Null-invariant KPI. IR measures the degree of imbalance between the two
itemsets A and B of a rule. The ratio is close to 0 if the conditional probabilities are similar
(i.e., very balanced) and close to 1 if they are very different. Range: [0; 1]
(0 indicates a balanced rule).
IR(A => B [ s, c ]) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A,B))
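The four null-invariant measures can be sketched in Python using raw transaction counts instead of supports. Note that none of the functions needs the total number of transactions. The counts and function names below are hypothetical and serve only as an illustration.

```python
import math

def all_conf(n_a, n_b, n_ab):
    """min(p(B|A), p(A|B))."""
    return min(n_ab / n_a, n_ab / n_b)

def cosine(n_a, n_b, n_ab):
    """sqrt(p(B|A) * p(A|B))."""
    return math.sqrt((n_ab / n_a) * (n_ab / n_b))

def kulc(n_a, n_b, n_ab):
    """Average of the two conditional probabilities."""
    return 0.5 * (n_ab / n_a + n_ab / n_b)

def imbalance_ratio(n_a, n_b, n_ab):
    """|sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A,B)), expressed in counts."""
    return abs(n_a - n_b) / (n_a + n_b - n_ab)

# Hypothetical counts: 50 trx contain A, 40 contain B, 30 contain both.
n_a, n_b, n_ab = 50, 40, 30
measures = {
    "all_conf": all_conf(n_a, n_b, n_ab),
    "cosine": cosine(n_a, n_b, n_ab),
    "kulc": kulc(n_a, n_b, n_ab),
    "IR": imbalance_ratio(n_a, n_b, n_ab),
}
print(measures)
```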
MB Rules >> Patterns >> Insights … #1
• MB Rules
– IF antecedents (set of products/items) THEN consequent (single product/item)
– This is extracted from an Association Rule (AR) model after running the Apriori algorithm on the input Transactional data
– An MB Rule can be captured/stored in a single-row format as well as in a multi-row format (and in many flavors at that).
– It is possible to store the same rule in many ways. For example, the rule "b, p, r => c" can be stored in the following ways:
• single row "b, p, r => c" format
• multi-row model format with separate columns for antecedents and a single column for the consequent (consequent repeats
across rows)
– Row #1: antecedent = "b", consequent = "c", sequence = 1
– Row #2: antecedent = "p", consequent = "c", sequence = 2
– Row #3: antecedent = "r", consequent = "c", sequence = 3
• multi-row metadata format with generic columns for products/items and metadata tags for order sequence and for whether
the product belongs to the antecedents or the consequent (no repeats)
– Row #1: product = "b", tag = "ant", sequence = 1
– Row #2: product = "p", tag = "ant", sequence = 2
– Row #3: product = "r", tag = "ant", sequence = 3
– Row #4: product = "c", tag = "cons", sequence = 4
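The three storage formats listed above can be derived from one another mechanically. The Python sketch below is only an illustration: the function names and the dictionary keys (which mirror the column names used in the rows above) are assumptions, not the actual schema of the solution.

```python
def parse_rule(rule_text):
    """Split a single-row rule such as 'b, p, r => c' into its parts."""
    ant_text, cons_text = rule_text.split("=>")
    antecedents = [p.strip() for p in ant_text.split(",")]
    return antecedents, cons_text.strip()

def to_model_rows(rule_text):
    """Multi-row model format: one row per antecedent, consequent repeated."""
    antecedents, consequent = parse_rule(rule_text)
    return [
        {"antecedent": a, "consequent": consequent, "sequence": i}
        for i, a in enumerate(antecedents, start=1)
    ]

def to_metadata_rows(rule_text):
    """Multi-row metadata format: generic product column plus an ant/cons tag."""
    antecedents, consequent = parse_rule(rule_text)
    rows = [
        {"product": a, "tag": "ant", "sequence": i}
        for i, a in enumerate(antecedents, start=1)
    ]
    rows.append(
        {"product": consequent, "tag": "cons", "sequence": len(antecedents) + 1}
    )
    return rows

model_rows = to_model_rows("b, p, r => c")
metadata_rows = to_metadata_rows("b, p, r => c")
for row in metadata_rows:
    print(row)
```

The metadata format has no repeated values, which makes it convenient to join against transaction line items later on.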
MB Rules >> Patterns >> Insights … #2
• Insights: How does one come up with them? How do we begin?
• How do we go from "finding patterns in data" to "generating meaningful insights from data"?
• Tool/Framework for “Patterns identified … What then”
• Localized /Partition / Global Patterns
• Trust/Distrust but verify … Allow for ‘critical’ Evaluation of the patterns under different contexts.
– Against newer data
– Against data from a specific period of interest (Promotion/Campaign/Festival)
– Across regions/geography /organizations etc.
• BI Actions on ML output/artifacts can popularize content with the ability to comment, collaborate,
highlight, share etc.
MB Rules >> Patterns >> Insights … #3
• Make it possible to act on the MB Rule independent of the entire model
– The output of a modeling exercise is a lot of MB Rules, but not all patterns are useful.
– Association Rules Model based Patterns (i.e. MB Rules) are independent of each other. This allows focused analysis. Unlike a
Decision Tree based Rule or a Clustering Model, we can zoom in on a set of rules or even a single rule of interest and
analyze it without affecting the rest of the patterns/Rules.
– We have ways to identify "interesting" rules using technical criteria/KPIs, but context/business exigency trumps technical
analysis.
– Multiple MB Models can be in play at the same time, working on the same input transactional dataset but each baking
business context into the model. E.g. analyze product purchase patterns with model1, analyze mode of payment choices in
model2, analyze behavior of customer segments (say, newly signed up customers or customers responding to a Marketing
Campaign) in model3, etc.
MB Rules >> Patterns >> Insights … #4
• Taking the MB Rule and analyzing it in different contexts is typically an offline exercise
– Typically this would involve a lot of offline actions/modeling exercises to look at the Transactional dataset from different
perspectives
– [Image from Frinkiac]
– Well, there is a way … and that’s where SQL Pattern Matching comes in. Not easy :D
MB Rules >> Patterns >> Insights … #5
• Allow for What-if actions on MB Rules/Patterns
– [Image from Frinkiac]
– SQL tools allow what-if analysis. This use case also lets end users perform what-if actions via BI Tools.
MBA Revisited
• Transaction Data augmented with "Tags"
• Rules of type "IF … THEN …"
• with more rules involving the added tags …
MBA Revisited
• with all the standard KPIs …
MBA Revisited
• and a lot more …
MB Rule Transformations/Metadata
• Extract from AR model
• Transformed to
• With additional rule level metadata (names shown here, same for ids)
MB Rule Transformations/Metadata
• With additional rule level metadata (same for ids)
• Also in single row format
KPIs … MB Rule and MB Rule-MB Trx levels
• (Rule KPIs) aSold – Count of Trx … antecedents part of the MB Rule has been sold
• (Rule KPIs) bSold – Count of Trx …. consequent part of the MB Rule …
• (Rule KPIs) abSold – Count of Trx … both antecedents and consequent parts of the MB Rule …
• (Rule KPIs) aobSold - Count of Trx … either antecedents or consequent parts of the MB Rule …
• (Sequential KPIs) aSeq - … antecedents (as per MB Rule defn) have been sold in the same order in the Trx
• (Sequential KPIs) abSeq - … antecedents (in any order, all of them) sold before consequent
• (Sequential KPIs) aSeqb - … antecedents (in order, all of them ) sold before consequent
• (Sequential KPIs) bSeqa - … consequent sold before antecedents (any order but all of them)
• (Sequential KPIs) baSeq - … consequent sold before antecedents (in order as per MB Rule defn)
• (Share of wallet/Trx KPIs) aSA – a Sold Alone i.e. Trx is wholly composed of antecedents
• (Share of wallet/Trx KPIs) bSA – b Sold Alone i.e. Trx is wholly composed of consequent
• (Share of wallet/Trx KPIs) abSA – ab Sold Alone i.e. Trx is wholly composed of antecedents and consequent
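To make a few of the KPIs above concrete, here is a small Python sketch that counts them for one rule over a handful of ordered transactions. It rests on assumptions that the slides do not spell out: "antecedents sold" is read as "all antecedents present", aobSold as "all antecedents present or consequent present", and the position of an item is its first occurrence in the transaction. The data and names are hypothetical.

```python
def rule_kpis(antecedents, consequent, trxs):
    """Count a subset of the rule-level KPIs for one rule.

    antecedents: list of items in the order given by the MB Rule definition
    trxs: list of transactions, each a list of items in purchase order
    """
    ants = set(antecedents)
    names = ["aSold", "bSold", "abSold", "aobSold", "abSeq", "aSeqb", "abSA"]
    kpis = dict.fromkeys(names, 0)
    for trx in trxs:
        items = set(trx)
        has_a = ants <= items          # all antecedents present (assumption)
        has_b = consequent in items
        kpis["aSold"] += has_a
        kpis["bSold"] += has_b
        kpis["abSold"] += has_a and has_b
        kpis["aobSold"] += has_a or has_b
        if has_a and has_b:
            b_pos = trx.index(consequent)
            ant_pos = [trx.index(a) for a in antecedents]
            all_before = all(p < b_pos for p in ant_pos)
            # abSeq: antecedents in any order, all before the consequent
            kpis["abSeq"] += all_before
            # aSeqb: antecedents in rule order, all before the consequent
            kpis["aSeqb"] += all_before and ant_pos == sorted(ant_pos)
            # abSA: trx consists only of antecedents and consequent
            kpis["abSA"] += items == ants | {consequent}
    return kpis

# Hypothetical transactions for the rule "p, q, r => d"
trxs = [
    ["p", "q", "r", "d"],        # rule order, then consequent, nothing else
    ["q", "r", "p", "d", "x"],   # antecedents out of order, extra item
    ["d", "p", "q", "r"],        # consequent sold first
    ["p", "q", "x"],             # antecedent r missing
    ["d", "y"],                  # consequent only
]
kpis = rule_kpis(["p", "q", "r"], "d", trxs)
print(kpis)
```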
MBA Revisited – Design Considerations
• Other KPIs needed: the R package arules has a lot of additional KPIs available. There is a lot of literature
on why the standard set of 3 KPIs is not enough.
– Lift: Not directional. Susceptible to null Trx (transactions that contain neither A nor B).
– Support, Confidence: Sometimes rules can have higher support and higher confidence but be a worse
representation of reality than similar rules with lower support and lower confidence (s, c as well as lift can mislead).
• Sequential KPIs: Co-occurrence may not suffice, depending on the domain/business. Useful for analyses
like Path to purchase, Recommendations, Purchase/Usage/Fraud Behavior patterns.
– If the Rule extracted from the model is “P, Q, R => D” but Q, R, P is the dominant sequence of purchase (9/12) within (P, Q, R),
then maybe the MB Rule is better represented and consumed as “Q, R, P => D” instead of “P, Q, R => D”.
– Recommendation (Shopping Cart) … For Next Purchase Recommendation, the recommended product should be
a popular choice after all items of the antecedent (P, Q, R) have been added previously (without considering the strict order
of adding items to the cart).
• Null-invariant KPIs are ideal for Exploratory Data Analysis (EDA), wherein Analysts can explore data
under the Rule/Pattern context using standard BI Techniques (Slice and Dice actions).
– Null-invariant KPI measures (all_conf, cosine, kulc and IR): the KPI value is only influenced by the supports of A, B, and
(A ∪ B), or more exactly, by the conditional probabilities p(A|B) and p(B|A), but not by the total number of transactions.
Another common property is that each measure ranges from 0 to 1, and the higher the value, the closer the
relationship between A and B.
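The effect of null transactions can be seen with a short Python sketch. It compares lift with the cosine measure for the same counts of A, B and (A, B), first in a small dataset and then after padding the dataset with transactions that contain neither A nor B. The counts are hypothetical.

```python
import math

def lift(n_a, n_b, n_ab, n_total):
    """p(A,B) / (p(A) * p(B)); depends on the total number of transactions."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))

def cosine(n_a, n_b, n_ab, n_total):
    """p(A,B) / sqrt(p(A) * p(B)); the total cancels out under the square root."""
    return (n_ab / n_total) / math.sqrt((n_a / n_total) * (n_b / n_total))

# Hypothetical counts: 50 trx contain A, 40 contain B, 30 contain both.
n_a, n_b, n_ab = 50, 40, 30

lift_small = lift(n_a, n_b, n_ab, n_total=100)
lift_large = lift(n_a, n_b, n_ab, n_total=10_000)    # 9,900 extra null trx
cos_small = cosine(n_a, n_b, n_ab, n_total=100)
cos_large = cosine(n_a, n_b, n_ab, n_total=10_000)

print(f"lift:   {lift_small:.3f} -> {lift_large:.3f}")
print(f"cosine: {cos_small:.3f} -> {cos_large:.3f}")
```

Lift grows a hundredfold simply because unrelated transactions were added, while cosine stays the same. This is why the null-invariant measures remain comparable when an analyst slices the data by period, region or segment.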
MBA Architecture/Process Flow
[Diagram: Market Basket Analysis process]
• Data sources: POS Ingestion of Rtl Trx; Weather Data Ingestion?; Other … ClickStream, IoT, Mktg etc.
• Oracle Db/ADW (Application Schema)
• MBA Settings Config: Define scope, parameters for MBA
• Pre-processing (ETL): Prepare Data for MBA
• Build OAA/ODM models using dbms_data_mining package
• Extract MB Rules and related reporting info
• Post-processing ETL: Prepare Data for MBA Reporting
• Store MBA Reporting results back to Oracle Db
• OAC/OBIEE Reporting – Market Basket Analysis
• Reporting – OJET, Others
• UI Config (Read Write) – OBIEE/APEX
SQL Pattern Matching: Match Process Preparation
• Match Process Preparation (ETL or view)
SQL Pattern Matching: Input, Processing, Output
1. Define input
2. Partition/order input
3. Process pattern
4. using defined conditions
5. Output: rows per match
6. Output: columns per row
7. Go where after match?
8. Return Data at match level (select * …) or higher levels (based on group bys)

-- Template only: KPIs and the measure names are placeholders.
-- Numbers in comments refer to the steps listed above.
select                                          -- (8)
  rule, --get KPIs at rule level
  --trx, --uncomment to get KPIs at rule-trx level
  KPIs
from INPUT_TABLE_OR_QUERY                       -- (1)
match_recognize(
  partition by "model", rule, trx               -- (2)
  order by rule_mbcomp_seq                      -- (2)
  measures text_kpi_meas, kpi_meas,             -- (6)
    agg_kpi_meas, kpi_partial_meas etc
  one row per match                             -- (5)
  after match skip past last row                -- (7)
  pattern (PERMUTE(apli*, bpli*, opli*))        -- (3)
  define                                        -- (4)
    apli as (mb_comp = 'a_part'),
    bpli as (mb_comp = 'b_part'),
    opli as (mb_comp = 'o_part')
)
group by                                        -- (8)
  rule
  --, trx --uncomment to get KPIs at rule-trx level
;
Slide Credit: Customized from Stew Ashton’s Advanced Pattern Matching pptx which was part of Oracle Code Paris 2018
Some Issues/Challenges
• Pattern Matching SQL needs a fixed pattern to use for matching.
– We can write SQL for a single Rule … to match against a dataset (many Trx)
• For example, for rule “p, q, r => a” we use …
PATTERN ( permute(p,q,r) | a )
DEFINE
p as trxli_prod_nm = 'p',
q as trxli_prod_nm = 'q',
r as trxli_prod_nm = 'r',
a as trxli_prod_nm = 'a'
– When we need to match many patterns (say, act on a whole AR model with 100 rules)
against a trx dataset we should define the patterns via metadata/component structures.
PATTERN (PERMUTE(apli*, bpli*, opli*))
DEFINE
apli as (mb_comp = 'a_part'),
bpli as (mb_comp = 'b_part'),
opli as (mb_comp = 'o_part')
• Same SQL for any pattern => allows integration into ETL, or use in a SQL view to match
dynamically via SQL queries (issued by BI Tools).
Metadata based pattern, SQL
Data driven pattern, Dyn SQL
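The metadata-based approach relies on a preparation step that tags every transaction line item with its role relative to each rule, so that one fixed pattern can serve all rules. The Python sketch below illustrates that tagging. It assumes that 'a_part' marks antecedent items, 'b_part' the consequent and 'o_part' any other item (the slides use these tag values but do not define them), and the rule ids, transaction ids and function name are hypothetical.

```python
# Hypothetical rule metadata: rule id -> (antecedents, consequent)
rules = {
    "R1": (["p", "q", "r"], "a"),
    "R2": (["b"], "c"),
}

# Hypothetical transaction line items: (trx id, product)
trx_lines = [
    ("T1", "p"), ("T1", "q"), ("T1", "r"), ("T1", "a"), ("T1", "z"),
    ("T2", "b"), ("T2", "c"),
]

def tag_line_items(rules, trx_lines):
    """Cross every rule with every line item and assign an mb_comp tag."""
    rows = []
    for rule_id, (antecedents, consequent) in rules.items():
        for trx_id, product in trx_lines:
            if product in antecedents:
                mb_comp = "a_part"   # assumed meaning: antecedent item
            elif product == consequent:
                mb_comp = "b_part"   # assumed meaning: consequent item
            else:
                mb_comp = "o_part"   # assumed meaning: other item
            rows.append((rule_id, trx_id, product, mb_comp))
    return rows

tagged = tag_line_items(rules, trx_lines)
for row in tagged:
    print(row)
```

Because the pattern variables test only the tag and never a product name, the same query text works for every rule in the model; only the tagged input changes.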
Some Issues/Challenges
• How should one handle the issue of merging multiple dimensions into a single
analysis dimension? This problem comes up in Graphs too (all dimension members
need to be combined into a single set of nodes … "node Id" identifier conflicts).
• Currently using offsets for each dimension to base their Ids. Product has the highest
offset (unbounded).
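One way to picture the offset scheme is the Python sketch below. The dimension names and offset values are invented for illustration; the only property taken from the slide is that every dimension gets its own base offset and that Product sits on the highest one, so its id range can grow without colliding with the others.

```python
# Hypothetical base offsets per dimension. Product has the highest offset,
# so its id range is open-ended; the others must fit below the next offset.
OFFSETS = {
    "payment_mode": 1_000,
    "customer_segment": 2_000,
    "product": 1_000_000,
}

def to_analysis_id(dimension, member_id):
    """Map a dimension-local id to an id that is unique across dimensions."""
    return OFFSETS[dimension] + member_id

def from_analysis_id(analysis_id):
    """Recover (dimension, member id): pick the largest offset <= the id."""
    candidates = [d for d, off in OFFSETS.items() if off <= analysis_id]
    dimension = max(candidates, key=OFFSETS.get)
    return dimension, analysis_id - OFFSETS[dimension]

# The same local id 5 no longer collides across dimensions.
pay_id = to_analysis_id("payment_mode", 5)
seg_id = to_analysis_id("customer_segment", 5)
prod_id = to_analysis_id("product", 7)
print(pay_id, seg_id, prod_id)
print(from_analysis_id(prod_id))
```

The trade-off of this design is that each bounded dimension must stay within the gap up to the next offset, which is why the dimension with the most members is placed last.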
Sample Screenshots
Demo
• Autonomous Data Warehouse (ADW) … Oracle 18c database
– ADW is optional, but Oracle Database 12c or later is mandatory. The SQL row pattern
matching feature (keyword MATCH_RECOGNIZE) must be available in the Db.
– SQL scripts used for pre-processing/data preparation of the input data
– SQL scripts also used for post-processing
• Oracle Machine Learning (OML) bundled/packaged with ADW
– OML is optional, but Oracle Data Mining (part of Oracle Advanced Analytics) is mandatory
– The ODM based PL/SQL API is used to call the Apriori Algorithm for performing Market
Basket Analysis
– SQL queries used to extract the patterns from the Association Rules (AR) model.
• Oracle Analytics Cloud (OAC)
– (optional component) You can replace OAC with Tableau/Looker/… without losing core
functionality
– However, many advanced features of the solution leverage the rpd (data modeling layer)
component of OAC
• Useful?
• Very little shown of ADW/OML currently (end goal), using SQL Developer for most Db actions
• Need more details on SQL Pattern Matching?
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 

Market Basket Analysis Revisited using SQL Pattern Matching

  • 1.
  • 2. Market Basket Analysis Revisited using SQL Pattern Matching
      Autonomous Data Warehouse | Oracle Machine Learning | Oracle Analytics Cloud
      Shankar Somayajula, May 29th, 2019
      Copyright © 2018, Oracle and/or its affiliates. All rights reserved. Confidential – Oracle Internal
  • 3. Agenda
      • Patterns Everywhere
      • Typical MBA
      • MBA Revisited – Motivations & Design
      • SQL Pattern Matching Intro/Usage
      • Issues/Challenges
      • Demo/Screenshots
      • Feedback
  • 4. Finding Patterns in Data
      Typical use cases in today’s world of fast exploration of data (lots of data):
      • Financial Services: Money Laundering, Fraud, Tracking Stock Market
      • Law & Order: Monitoring Suspicious Activities
      • Retail: Returns Fraud, Buying Patterns, Sessionization
      • Telcos: Money Laundering, SIM Card Fraud, Call Quality
      • Utilities: Network Analysis, Fraud, Unusual Usage
  • 5. Typical Pattern Matching Use Cases
      Use case | Input data | Pattern | Result
      Sessionization | Weblogs | Continuous clicks by the same user | Generate reports on number of distinct sessions, average page views per session, etc.
      Fraud | Credit card transactions | Two transactions in different locations within a short period of time | Find cases in which a credit card may have been used fraudulently, since a physical person cannot be in two places at once
      In-game purchases | Game logs | Events leading up to an in-game purchase | Detect common sequences of events that result in an in-game purchase
      Fraud (mobiles) | CDR logs | SIM card being used in multiple handsets | Flag individual SIM cards being used by multiple handsets within a specified time period
      Stock market analysis | Ticker logs | Track possible fraudulent linked patterns of behavior | Track known patterns of behavior such as head and shoulders, triangles, channels and wedges
  • 6. Typical Pattern Matching Use Cases
      Use case | Input data | Pattern | Result
      Auditing/Compliance | Application logs | Analyze changes to secure customer data | Find instances where an operator has made suspect modifications to secure client data
      Money laundering | Transaction logs | Search for small transfers within a time window, followed by a large transfer within “x” days of the last small transfer | Detect a suspicious money transfer pattern for an account and report the account, date of first small transfer and date of last large transfer
      Call service quality | CDR logs | Search for dropped/reconnected calls | Identify how many times calls were restarted in a session, total effective call duration and total interrupted duration
      Login security | Application logs | Search for attempted logins | Identify attempts to gain access to an application/schema that can be linked to hackers or inappropriate access
  • 7. Typical MBA
      • Transaction Data of type Master-Detail
      • Rules of type "IF … THEN …"
      • with KPIs – Support, Confidence and Lift
  • 8. Some examples of MB Rules/Insights
      • (diapers) => (beer)
      • (peanutButter, jelly) => (bread)
      Multidimensional Rules with artificial/virtual products …
      • (Item=X, isOver18=TRUE, isNewCustomer=TRUE) => (Item=Y)
      • (buyerAge >= 63, loyaltyAge >= 2) => (toothBrushBuy >= 2)
      • age(X, "20...29"), income(X, "52k...58k") => buys(X, "iPad")
  • 9. Typical MBA
      • MB Rule: A => B [ s, c ]
        – Support: denotes the frequency of the rule within transactions. A high value means that the rule involves a large part of the database. Here p(A U B) is the share of transactions that contain every item of both A and B.
            support(A => B [ s, c ]) = p(A U B)
        – Confidence: denotes the percentage of transactions containing A which also contain B. It is an estimate of the conditional probability.
            confidence(A => B [ s, c ]) = p(B|A) = sup(A,B) / sup(A)
        – Lift: a measure of how much better a rule is at predicting the result than just assuming the result in the first place.
            lift(A => B [ s, c ]) = p(B|A) / p(B) = sup(A,B) / (sup(A) * sup(B))
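The three standard KPIs above can be checked by hand on a tiny dataset. The following is a minimal sketch, assuming transactions are plain sets of item names; the toy data and the helper names `sup` and `rule_kpis` are illustrative and not part of the deck.

```python
from fractions import Fraction

# Hypothetical toy transactions (sets of item names); not taken from the deck.
transactions = [
    {"diapers", "beer", "bread"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer"},
    {"bread", "jelly"},
]


def sup(items, trxs):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return Fraction(sum(items <= trx for trx in trxs), len(trxs))


def rule_kpis(antecedents, consequent, trxs):
    """Support, confidence and lift for the rule antecedents => consequent."""
    s_ab = sup(set(antecedents) | set(consequent), trxs)
    s_a = sup(antecedents, trxs)
    s_b = sup(consequent, trxs)
    return {
        "support": s_ab,
        "confidence": s_ab / s_a,
        "lift": s_ab / (s_a * s_b),
    }


kpis = rule_kpis({"diapers"}, {"beer"}, transactions)
print(kpis)
```

Exact fractions are used so the results can be compared with a manual calculation; a lift above 1 suggests the antecedent and consequent co-occur more often than independence would predict.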
  • 10. Typical MBA
      • MB Rule: A => B [ s, c ]
        – Conviction: a measure of the number of times the rule would be incorrect if the association between A and B were purely random chance. Conviction is a measure of the implication and has value 1 if the items are unrelated. It is sensitive to the directionality of the rule (unlike lift). B' denotes the absence of B.
            conviction(A => B [ s, c ]) = sup(A) * sup(B') / sup(A,B')
        – Leverage or Piatetsky-Shapiro: the proportion of additional elements covered by both A and B above what is expected if they were independent.
            leverage(A => B [ s, c ]) = sup(A,B) - sup(A) * sup(B)
        – Max Confidence: KPI max_conf is the maximum confidence of the two association rules related to A and B, namely “p(A|B)” and “p(B|A)”.
            max_conf(A => B [ s, c ]) = max(p(B|A), p(A|B))
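A short sketch of these three KPIs, computed from raw transaction counts, may help show why conviction is directional while leverage is not. The function name `directional_kpis` and the counts are assumptions made for illustration; sup(A,B') is derived as sup(A) - sup(A,B) and sup(B') as 1 - sup(B).

```python
from fractions import Fraction


def directional_kpis(n_total, n_a, n_b, n_ab):
    """Conviction, leverage and max_conf for A => B from transaction counts.

    n_a / n_b: transactions containing A / B; n_ab: transactions containing both.
    """
    s_a = Fraction(n_a, n_total)
    s_b = Fraction(n_b, n_total)
    s_ab = Fraction(n_ab, n_total)
    s_not_b = 1 - s_b          # sup(B')
    s_a_not_b = s_a - s_ab     # sup(A, B'): A present, B absent
    # Conviction is unbounded when A never occurs without B.
    conviction = None if s_a_not_b == 0 else s_a * s_not_b / s_a_not_b
    return {
        "conviction": conviction,
        "leverage": s_ab - s_a * s_b,
        "max_conf": max(s_ab / s_a, s_ab / s_b),
    }


# Hypothetical counts: 10 transactions, A in 4, B in 6, both in 3.
a_to_b = directional_kpis(n_total=10, n_a=4, n_b=6, n_ab=3)
b_to_a = directional_kpis(n_total=10, n_a=6, n_b=4, n_ab=3)
print(a_to_b, b_to_a)
```

Swapping the roles of A and B changes conviction but leaves leverage and max_conf unchanged, which matches the remark that conviction is sensitive to the direction of the rule.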
  • 11. Typical MBA
      • MB Rule: A => B [ s, c ]
        – All Confidence: null-invariant KPI. all_conf is the minimum confidence of the two association rules related to A and B, namely “p(A|B)” and “p(B|A)”.
            all_conf(A => B [ s, c ]) = min(p(B|A), p(A|B))
        – Cosine: null-invariant KPI. The cosine measure can be viewed as a harmonized lift measure: the two formulae are similar except that, for cosine, the square root is taken of the product of the probabilities of A and B.
            cosine(A => B [ s, c ]) = sqrt(p(B|A) * p(A|B))
        – Kulc (Kulczynski): null-invariant KPI with a preference for skewed patterns. Range: [0; 1].
            kulc(A => B [ s, c ]) = (p(A|B) + p(B|A)) / 2 = (sup(A,B) / 2) * (1/sup(A) + 1/sup(B))
        – IR (Imbalance Ratio): null-invariant KPI. IR measures the degree of imbalance between the two events that A and B are contained in a transaction. The ratio is close to 0 if the conditional probabilities are similar (i.e., very balanced) and close to 1 if they are very different. Range: [0; 1] (0 indicates a balanced rule).
            IR(A => B [ s, c ]) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A,B))
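To illustrate what "null invariant" means in practice, here is a minimal sketch that computes the four KPIs from counts and compares them with lift when transactions containing neither A nor B are added. The function names and the counts are hypothetical.

```python
from math import isclose, sqrt


def null_invariant_kpis(n_a, n_b, n_ab):
    """all_conf, cosine, kulc and IR from counts; the total count is not needed."""
    p_b_given_a = n_ab / n_a
    p_a_given_b = n_ab / n_b
    return {
        "all_conf": min(p_b_given_a, p_a_given_b),
        "cosine": sqrt(p_b_given_a * p_a_given_b),
        "kulc": (p_a_given_b + p_b_given_a) / 2,
        "ir": abs(n_a - n_b) / (n_a + n_b - n_ab),
    }


def lift(n_a, n_b, n_ab, n_total):
    """Lift depends on the total number of transactions, so it is not null invariant."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))


# Hypothetical counts: A in 100 trx, B in 50 trx, both in 40 trx.
kpis = null_invariant_kpis(n_a=100, n_b=50, n_ab=40)
lift_small = lift(100, 50, 40, n_total=1_000)
lift_large = lift(100, 50, 40, n_total=10_000)  # 9,000 extra "null" transactions
print(kpis, lift_small, lift_large)
```

The null-invariant KPIs never see the total transaction count, so they stay fixed as null transactions are added, whereas lift grows tenfold in this example.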
  • 12. MB Rules >> Patterns >> Insights … #1
      • MB Rules
        – IF antecedents (set of products/items) THEN consequent (single product/item)
        – This is extracted from an Association Rule (AR) model after running the Apriori algorithm on the input transactional data.
        – An MB Rule can be captured/stored in a single-row format as well as in a multi-row format (and in many flavors at that).
        – It is possible to store the same rule in many ways. E.g. for rule "b, p, r => c", we can store the rule in the following ways:
          • single-row "b, p, r => c" format
          • multi-row model format with separate columns for antecedents and a single column for the consequent (consequent repeats across rows)
            – Row #1: antecedent = "b", consequent = "c", sequence = 1
            – Row #2: antecedent = "p", consequent = "c", sequence = 2
            – Row #3: antecedent = "r", consequent = "c", sequence = 3
          • multi-row metadata format with generic columns for products/items and metadata tags for order sequence and for whether the product belongs to the antecedents or the consequent (no repeats)
            – Row #1: product = "b", tag = "ant", sequence = 1
            – Row #2: product = "p", tag = "ant", sequence = 2
            – Row #3: product = "r", tag = "ant", sequence = 3
            – Row #4: product = "c", tag = "cons", sequence = 4
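The conversion between these storage formats is mechanical. As a rough sketch, assuming the single-row text uses "," between antecedents and "=>" before the consequent, the multi-row metadata format could be produced as follows; the function name `rule_to_metadata_rows` is hypothetical.

```python
def rule_to_metadata_rows(rule_text):
    """Expand a single-row rule such as "b, p, r => c" into metadata rows."""
    antecedents_text, consequent_text = rule_text.split("=>")
    antecedents = [part.strip() for part in antecedents_text.split(",")]
    rows = [
        {"product": product, "tag": "ant", "sequence": seq}
        for seq, product in enumerate(antecedents, start=1)
    ]
    # The single consequent always takes the last sequence number.
    rows.append(
        {
            "product": consequent_text.strip(),
            "tag": "cons",
            "sequence": len(antecedents) + 1,
        }
    )
    return rows


rows = rule_to_metadata_rows("b, p, r => c")
for row in rows:
    print(row)
```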
  • 13. MB Rules >> Patterns >> Insights … #2
      • Insights: How does one come up with them? How do we begin?
      • How do we go from "finding patterns in data" to "generating meaningful insights from data"?
      • Tool/Framework for "Patterns identified … What then"
      • Localized/Partition/Global Patterns
      • Trust/Distrust but verify … Allow for "critical" evaluation of the patterns under different contexts.
        – Against newer data
        – Against data from a specific period of interest (Promotion/Campaign/Festival)
        – Across regions/geography/organizations etc.
      • BI Actions on ML output/artifacts can popularize content with the ability to comment, collaborate, highlight, share etc.
  • 14. MB Rules >> Patterns >> Insights … #3
      • Make it possible to act on the MB Rule independent of the entire model
        – The output of a modeling exercise is a lot of MB Rules, but not all patterns are useful.
        – Association Rules Model based Patterns (i.e. MB Rules) are independent of each other. This allows focused analysis. Unlike a Decision Tree based Rule or a Clustering Model, we can zoom in on a set of rules or even a single rule of interest and analyze it w/o affecting the rest of the patterns/rules.
        – We have ways to identify "interesting" rules using technical criteria/KPIs, but context/business exigency trumps technical analysis.
        – Multiple MB Models can be in play at the same time, working on the same input transactional dataset but baking business context into the model. E.g. analyze product purchase patterns with model1, analyze mode of payment choices in model2, analyze behavior of customer segments (say, newly signed up customers or customers responding to a Marketing Campaign) in model3 etc.
  • 15. MB Rules >> Patterns >> Insights … #4
      • Taking the MB Rule and analyzing it in different contexts is typically an offline exercise
        – Typically this would involve a lot of offline actions/modeling exercises to look at the transactional dataset from different perspectives
        – (Image from Frinkiac)
        – Well, there is a way … and that’s where SQL Pattern Matching comes in. Not easy :D
  • 16. MB Rules >> Patterns >> Insights … #5
      • Allow for what-if actions on MB Rules/Patterns
        – (Image from Frinkiac)
        – SQL tools allow what-if ... This use case can also enable end users to perform what-if actions via BI Tools.
  • 17. MBA Revisited
      • Transaction Data augmented with "Tags"
      • Rules of type "IF … THEN …"
      • with more rules involving the added tags …
  • 18. MBA Revisited • with all the standard KPIs …
  • 19. MBA Revisited • and a lot more …
  • 20. MB Rule Transformations/Metadata
      • Extract from AR model
      • Transformed to
      • With additional rule-level metadata (names shown here; same for IDs)
  • 21. MB Rule Transformations/Metadata
      • With additional rule-level metadata (same for IDs)
      • Also in single-row format
  • 22. KPIs … MB Rule and MB Rule-MB Trx levels
      • (Rule KPIs) aSold – Count of Trx … antecedents part of the MB Rule has been sold
      • (Rule KPIs) bSold – Count of Trx … consequent part of the MB Rule …
      • (Rule KPIs) abSold – Count of Trx … both antecedents and consequent parts of the MB Rule …
      • (Rule KPIs) aobSold – Count of Trx … either antecedents or consequent parts of the MB Rule …
      • (Sequential KPIs) aSeq – … antecedents (as per MB Rule defn) have been sold in the same order in the Trx
      • (Sequential KPIs) abSeq – … antecedents (in any order, all of them) sold before consequent
      • (Sequential KPIs) aSeqb – … antecedents (in order, all of them) sold before consequent
      • (Sequential KPIs) bSeqa – … consequent sold before antecedents (any order but all of them)
      • (Sequential KPIs) baSeq – … consequent sold before antecedents (in order as per MB Rule defn)
      • (Share of wallet/Trx KPIs) aSA – a Sold Alone, i.e. Trx is wholly composed of antecedents
      • (Share of wallet/Trx KPIs) bSA – b Sold Alone, i.e. Trx is wholly composed of consequent
      • (Share of wallet/Trx KPIs) abSA – ab Sold Alone, i.e. Trx is wholly composed of antecedents and consequent
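The deck computes these KPIs in SQL; the sketch below only illustrates one possible reading of a few of them in Python. It assumes each transaction is an ordered list of items with no repeats, that "antecedents sold" means all antecedents are present, and that "in order" does not require the items to be adjacent. The function names are hypothetical.

```python
from collections import Counter


def rule_trx_flags(antecedents, consequent, trx):
    """Per-transaction flags for one rule; `trx` is an ordered list of unique items."""
    pos = {item: i for i, item in enumerate(trx)}
    has_a = all(a in pos for a in antecedents)
    has_b = consequent in pos
    a_pos = [pos[a] for a in antecedents] if has_a else []
    a_in_order = has_a and a_pos == sorted(a_pos)
    a_before_b = has_a and has_b and max(a_pos) < pos[consequent]
    return {
        "aSold": has_a,
        "bSold": has_b,
        "abSold": has_a and has_b,
        "aobSold": has_a or has_b,
        "aSeq": a_in_order,
        "abSeq": a_before_b,
        "aSeqb": a_in_order and a_before_b,
        "abSA": has_a and has_b and set(trx) == set(antecedents) | {consequent},
    }


def rule_kpi_counts(antecedents, consequent, trxs):
    """Rule-level KPIs: number of transactions for which each flag is true."""
    totals = Counter()
    for trx in trxs:
        flags = rule_trx_flags(antecedents, consequent, trx)
        totals.update({name: int(flag) for name, flag in flags.items()})
    return dict(totals)


# Hypothetical transactions for the rule "P, Q, R => D".
trxs = [
    ["P", "Q", "R", "D"],
    ["Q", "R", "P", "D", "X"],
    ["D", "P", "Q", "R"],
    ["P", "Q"],
    ["D", "X"],
]
counts = rule_kpi_counts(["P", "Q", "R"], "D", trxs)
print(counts)
```

The per-transaction flags correspond to the rule-trx level, and summing them gives the rule level, mirroring the two levels named in the slide title.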
  • 23. MBA Revisited – Design Considerations
      • Other KPIs needed: the R package arules has a lot of additional KPIs available. There is a lot of literature on why the standard set of 3 KPIs is not enough.
        – Lift: not directional. Susceptible to null Trx.
        – Support, Confidence: sometimes rules can have higher support and higher confidence but be a worse representation of reality than similar rules with lower support and lower confidence (s, c as well as lift can mislead).
      • Sequential KPIs: co-occurrence may not suffice, depending on the domain/business. Useful for analyses like Path to purchase, Recommendations, Purchase/Usage/Fraud behavior patterns.
        – If the rule extracted from the model is "P, Q, R => D" but Q, R, P is the dominant sequence of purchase (9/12) within (P, Q, R), then maybe the MB Rule is better represented and consumed as "Q, R, P => D" instead of "P, Q, R => D".
        – Recommendation (Shopping Cart) … For Next Purchase Recommendation, the recommended product should be a popular choice after all items of the antecedent (P, Q, R) have been added previously (w/o considering the strict order of adding items to the cart).
      • Null-invariant KPIs are ideal for Exploratory Data Analysis (EDA), wherein analysts can explore data under the Rule/Pattern context using standard BI techniques (slice and dice actions).
        – Null-invariant KPI measures (all_conf, cosine, kulc and IR): the KPI value is only influenced by the supports of A, B, and (A U B), or more exactly, by the conditional probabilities of A|B and B|A, but not by the total number of transactions. Another common property is that each measure ranges from 0 to 1; for all_conf, cosine and kulc, the higher the value, the closer the relationship between A and B, while for IR 0 indicates a balanced rule.
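The "dominant sequence" idea in the sequential KPI bullet can be sketched as a simple count of the order in which the antecedents appear within each transaction. This assumes ordered transactions with no repeated items; the function name `dominant_antecedent_order` and the data (chosen to reproduce the 9/12 example) are illustrative.

```python
from collections import Counter


def dominant_antecedent_order(antecedents, trxs):
    """Return (most common order of the antecedents, its count, total matching trx)."""
    wanted = set(antecedents)
    orders = Counter()
    for trx in trxs:
        seen = [item for item in trx if item in wanted]
        # Only count transactions that contain every antecedent exactly once.
        if len(seen) == len(wanted) and set(seen) == wanted:
            orders[tuple(seen)] += 1
    if not orders:
        return None
    order, count = orders.most_common(1)[0]
    return order, count, sum(orders.values())


# Hypothetical data: 12 transactions contain P, Q and R; 9 of them in the order Q, R, P.
trxs = (
    [["Q", "R", "P", "D"]] * 9
    + [["P", "Q", "R", "D"]] * 2
    + [["R", "P", "Q"]]
    + [["P", "D"]]
)
result = dominant_antecedent_order(["P", "Q", "R"], trxs)
print(result)
```

With this result, the rule extracted as "P, Q, R => D" could be presented as "Q, R, P => D", as suggested on the slide.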
  • 24. MBA Architecture/Process Flow (labels recovered from the "Market Basket Analysis process" diagram)
      • Data sources: POS ingestion of Rtl Trx; Weather Data Ingestion?; Other … ClickStream, IoT, Mktg etc.
      • Oracle Db/ADW (Application Schema):
        – MBA Settings Config: define scope, parameters for MBA
        – Pre-processing (ETL): prepare data for MBA
        – Build OAA/ODM models using the dbms_data_mining package
        – Extract MB Rules and related reporting info
        – Post-processing ETL: prepare data for MBA Reporting
        – Store MBA Reporting results back to Oracle Db
      • Reporting and UI: OAC/OBIEE Reporting – Market Basket Analysis; Reporting – OJET, Others; UI Config (Read Write) – OBIEE/APEX
  • 25. SQL Pattern Matching: Match Process Preparation • Match Process Preparation (ETL or view)
  • 26. SQL Pattern Matching: Input, Processing, Output
      1. Define input
      2. Partition/order input
      3. Process pattern
      4. using defined conditions
      5. Output: rows per match
      6. Output: columns per row
      7. Go where after match?
      8. Return data at match level (select * …) or higher levels (based on group bys)

          select rule,   --get KPIs at rule level
                 --trx,  --uncomment to get KPIs at rule-trx level
                 KPIs
          from INPUT_TABLE_OR_QUERY
          match_recognize(
            partition by "model", rule, trx
            order by rule_mbcomp_seq
            measures text_kpi_meas, kpi_meas, agg_kpi_meas, kpi_partial_meas etc
            one row per match
            after match skip past last row
            pattern (PERMUTE(apli*, bpli*, opli*))
            define apli as (mb_comp = 'a_part'),
                   bpli as (mb_comp = 'b_part'),
                   opli as (mb_comp = 'o_part')
          )
          group by rule --, trx --uncomment to get KPIs at rule-trx level
          ;

      Slide credit: customized from Stew Ashton’s Advanced Pattern Matching pptx, which was part of Oracle Code Paris 2018.
  • 27. Some Issues/Challenges
      • Pattern Matching SQL needs a fixed pattern to use for matching.
        – We can write SQL for a single rule … to match against a dataset (many Trx)
          • E.g. for rule "p, q, r => a" we use …

              PATTERN ( permute(p,q,r) | a )
              DEFINE p as trxli_prod_nm = 'p',
                     q as trxli_prod_nm = 'q',
                     r as trxli_prod_nm = 'r',
                     a as trxli_prod_nm = 'a'

        – When we need to match many patterns (say, act on a whole AR model with 100 rules) against a trx dataset, we should define the patterns via metadata/component structures.

              PATTERN (PERMUTE(apli*, bpli*, opli*))
              DEFINE apli as (mb_comp = 'a_part'),
                     bpli as (mb_comp = 'b_part'),
                     opli as (mb_comp = 'o_part')

      • Same SQL for any pattern => allows integration into ETL or use in a SQL view to match dynamically via a SQL query (issued by BI Tools). Metadata-based pattern, SQL data-driven pattern, dynamic SQL.
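The metadata-driven pattern works because every transaction line item is first labeled with its role in a given rule, so the same fixed pattern can serve any rule. The deck does this preparation in SQL (ETL or a view); the following Python sketch only illustrates the labeling idea. The `mb_comp` values come from the slides, while the function name, the row layout and the sample rule are assumptions.

```python
def tag_line_items(rule, trx):
    """Label each line item of one transaction as a_part, b_part or o_part for one rule."""
    antecedents = set(rule["antecedents"])
    rows = []
    for seq, item in enumerate(trx, start=1):
        if item in antecedents:
            mb_comp = "a_part"   # item belongs to the antecedents
        elif item == rule["consequent"]:
            mb_comp = "b_part"   # item is the consequent
        else:
            mb_comp = "o_part"   # any other item in the transaction
        rows.append({"rule": rule["name"], "item": item, "seq": seq, "mb_comp": mb_comp})
    return rows


# Hypothetical rule and transaction.
rule = {"name": "p, q, r => a", "antecedents": ["p", "q", "r"], "consequent": "a"}
tagged = tag_line_items(rule, ["q", "x", "p", "r", "a"])
for row in tagged:
    print(row)
```

Because the pattern variables test only `mb_comp`, adding a new rule means adding rows of metadata rather than writing new pattern-matching SQL.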
  • 28. Some Issues/Challenges
      • How should one handle the issue of merging multiple dimensions into a single analysis dimension? This problem comes up in graphs too (all dimension members need to be combined into a set of nodes … "node Id" identifier conflicts).
      • Currently using offsets for each dimension to base their Ids. Product has the highest offset (unbounded).
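One way to picture the offset approach is a small mapping from (dimension, member Id) to a single analysis Id and back. This is only a sketch under assumed values: the dimension names and offsets below are hypothetical, and the deck does not state the actual ranges used.

```python
# Hypothetical offsets, in ascending order; the last dimension (product) is unbounded.
DIMENSION_OFFSETS = [
    ("payment_mode", 1_000),
    ("customer_segment", 2_000),
    ("product", 1_000_000),
]


def to_analysis_id(dimension, member_id):
    """Map a dimension member Id into the shared analysis Id space."""
    for i, (name, offset) in enumerate(DIMENSION_OFFSETS):
        if name != dimension:
            continue
        is_last = i == len(DIMENSION_OFFSETS) - 1
        # Bounded dimensions must not spill into the next dimension's range.
        if member_id < 0 or (not is_last and offset + member_id >= DIMENSION_OFFSETS[i + 1][1]):
            raise ValueError(f"member id {member_id} does not fit in {dimension}")
        return offset + member_id
    raise KeyError(dimension)


def from_analysis_id(analysis_id):
    """Recover (dimension, member Id) from an analysis Id."""
    for name, offset in reversed(DIMENSION_OFFSETS):
        if analysis_id >= offset:
            return name, analysis_id - offset
    raise ValueError(f"analysis id {analysis_id} is below every offset")


print(to_analysis_id("product", 42), from_analysis_id(2_005))
```

Giving the product dimension the highest offset means it can keep growing without colliding with the bounded ranges below it.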
  • 29. Sample Screenshots
  • 30. Sample Screenshots
  • 31. Demo
      • Autonomous Data Warehouse (ADW) … Oracle 18c database
        – ADW is optional but Oracle Database 12c+ is mandatory. The ANSI SQL feature Row Pattern Matching, via the keyword MATCH_RECOGNIZE, should be available in the Db.
        – SQL scripts used for pre-processing/data preparation of the input data
        – SQL scripts also used for post-processing
      • Oracle Machine Learning (OML) bundled/packaged with ADW
        – OML is optional but Oracle Data Mining (part of Oracle Advanced Analytics) is mandatory
        – The ODM-based PL/SQL API is used to make a call to the Apriori algorithm for performing Market Basket Analysis
        – SQL queries used to extract the patterns from the Association Rules (AR) model
      • Oracle Analytics Cloud (OAC)
        – (optional component) You can replace OAC with Tableau/Looker/… w/o losing core functionality
        – However, many advanced features of the solution leverage the rpd (data modeling layer) component of OAC
  • 32. Feedback
      • Useful?
      • Very little shown of ADW/OML currently (end goal); using SQL Developer for most Db actions
      • Need more details on SQL Pattern Matching?

Editor's Notes

  1. This is a Branded Title Slide with Graphic slide ideal for including a picture with a brief title, subtitle and presenter information. Do not customize this slide with your own picture.
  2. With Big Data it is just not possible to slice & dice data sets using a BI table. However, with SQL Pattern Matching, available from Oracle Database 12c onwards, you can process these types of data sets to uncover the really interesting stories hidden inside the huge mass of data points. You can then bake the SQL Pattern Matching logic into the BI/Analytical schema (via views, say) to actually slice and dice data just like the good old days of star schema based SQL BI Tools. Most discovery workflows are based on finding patterns, and this is true across a broad range of industries. Everything starts with pattern discovery that then drives other analytical workflows…