Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Autonomous Data Warehouse
Oracle Machine Learning
Oracle Analytics Cloud
Market Basket Analysis Revisited using SQL Pattern Matching
Shankar Somayajula
May 29th, 2019
Confidential – Oracle Internal
Agenda
• Patterns Everywhere
• Typical MBA
• MBA Revisited – Motivations & Design
• SQL Pattern Matching Intro/Usage
• Issues/Challenges
• Demo/Screenshots
• Feedback
Finding Patterns in Data
Typical use cases in today’s world of fast exploration of data
• Financial Services: Money Laundering, Fraud, Tracking Stock Market
• Law & Order: Monitoring Suspicious Activities
• Retail: Returns Fraud, Buying Patterns, Sessionization
• Telcos: Money Laundering, SIM Card Fraud, Call Quality
• Utilities: Network Analysis, Fraud, Unusual Usage
• Lots of Data
Typical Pattern Matching Use Cases
• Sessionization
– Input data: Weblogs
– Pattern: continuous clicks by the same user
– Result: Generate reports on number of distinct sessions, average page views per session, etc.
• Fraud
– Input data: Credit card transactions
– Pattern: two transactions in different locations within a short period of time
– Result: Find cases in which a credit card may have been used fraudulently, since a physical person cannot be in two places at once
• In-game purchases
– Input data: Games logs
– Pattern: events leading up to an in-game purchase
– Result: Detect common sequences of events that result in an in-game purchase
• Fraud (mobiles)
– Input data: CDR logs
– Pattern: SIM card being used in multiple handsets
– Result: Flag individual SIM cards being used by multiple handsets within a specified time period
• Stock market analysis
– Input data: Ticker logs
– Pattern: Track possible fraudulent linked patterns of behavior
– Result: Track known patterns of behavior such as head and shoulders, triangles, channels and wedges
Typical Pattern Matching Use Cases
• Auditing/Compliance
– Input data: Application logs
– Pattern: Analyze changes to secure customer data
– Result: Find instances where an operator has made suspect modifications to secure client data
• Money laundering
– Input data: Transaction logs
– Pattern: Search for small transfers within a time window followed by a large transfer within “x” days of the last small transfer
– Result: Detect suspicious money transfer pattern for an account and report account, date of first small transfer, date of last large transfer
• Call service quality
– Input data: CDR logs
– Pattern: Search for dropped/reconnected calls
– Result: Identify how many times calls were restarted in a session, total effective call duration and total interrupted duration
• Login security
– Input data: Application logs
– Pattern: Search for attempted logins
– Result: Identify attempts to gain access to application/schema that can be linked to hackers or inappropriate access
Typical MBA
• Transaction Data of type Master-Detail
• Rules of type "IF … THEN … "
• with KPIs – Support, Confidence and Lift
Some examples of MB Rules/Insights
• (diapers) => (beer)
• (peanutButter, jelly) => (bread)
Multidimensional Rules with artificial/virtual products …
• (Item=X, isOver18=TRUE, isNewCustomer=TRUE) => (Item=Y)
• (buyerAge >= 63, loyaltyAge>= 2) => (toothBrushBuy >=2)
• age(X,"20...29"), income(X,"52k...58k") => buys(X, "iPad")
Typical MBA
• MB Rule: A => B [ s, c ]
– Support: the frequency of the rule within the transactions. A high value means
that the rule covers a large part of the database.
support(A => B [ s, c ]) = p(A ∪ B), the fraction of transactions that contain both A and B
– Confidence: the percentage of transactions containing A that also contain B.
It is an estimate of the conditional probability.
confidence(A => B [ s, c ]) = p(B|A) = sup(A,B)/sup(A)
– Lift: a measure of how much better a rule is at predicting the result than just assuming
the result in the first place.
lift(A => B [ s, c ]) = p(B|A)/p(B) = sup(A,B)/(sup(A) * sup(B))
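As an illustration of the three standard KPIs, here is a minimal Python sketch. The toy transactions and the function names are assumptions made for this example and are not part of the original solution, which computes these values in the database.

```python
from fractions import Fraction

# Hypothetical toy transactions (each set is one basket); not from the deck.
transactions = [
    {"diapers", "beer", "bread"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer"},
    {"bread", "jelly"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for trx in transactions if itemset <= trx)
    return Fraction(hits, len(transactions))

def confidence(a, b, transactions):
    """p(B|A) = sup(A,B) / sup(A)."""
    return support(a | b, transactions) / support(a, transactions)

def lift(a, b, transactions):
    """p(B|A) / p(B) = sup(A,B) / (sup(A) * sup(B))."""
    return confidence(a, b, transactions) / support(b, transactions)

A, B = {"diapers"}, {"beer"}
print(support(A | B, transactions))   # 2/5
print(confidence(A, B, transactions)) # 2/3
print(lift(A, B, transactions))       # 10/9
```

A lift above 1 (here 10/9) suggests that A and B occur together more often than they would if they were independent.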
Typical MBA
• MB Rule: A => B [ s, c ]
– Conviction: a measure of the number of times the rule would be incorrect if the association
between A and B were purely random chance. Conviction is a measure of implication and has
value 1 if the items are unrelated. It is sensitive to the direction of the rule (unlike lift).
conviction(A => B [ s, c ]) = (sup(A) * sup(B’)) / sup(A,B’), where B’ denotes the absence of B
– Leverage (Piatetsky-Shapiro): the proportion of additional elements covered by both A and B
above what would be expected if they were independent.
leverage(A => B [ s, c ]) = sup(A,B) - sup(A) * sup(B)
– Max Confidence: KPI max_conf is the maximum confidence of the two association rules related
to A and B, namely, “p(A|B)” and “p(B|A)”.
max_conf(A => B [ s, c ]) = max(p(B|A), p(A|B))
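The three formulas on this slide can be sketched in Python directly from the supports. The function names and the sample support values are assumptions for illustration only. The sketch uses sup(B’) = 1 - sup(B) and sup(A,B’) = sup(A) - sup(A,B).

```python
from fractions import Fraction

def conviction(sup_a, sup_b, sup_ab):
    """(sup(A) * sup(B')) / sup(A,B'); None when the rule is never wrong."""
    sup_not_b = 1 - sup_b            # sup(B')
    sup_a_not_b = sup_a - sup_ab     # sup(A,B'): A present, B absent
    if sup_a_not_b == 0:
        return None                  # conviction is unbounded in this case
    return (sup_a * sup_not_b) / sup_a_not_b

def leverage(sup_a, sup_b, sup_ab):
    """sup(A,B) - sup(A) * sup(B)."""
    return sup_ab - sup_a * sup_b

def max_conf(sup_a, sup_b, sup_ab):
    """max(p(B|A), p(A|B))."""
    return max(sup_ab / sup_a, sup_ab / sup_b)

# Hypothetical supports: A in 50% of Trx, B in 25%, both in 20%.
sup_a, sup_b, sup_ab = Fraction(1, 2), Fraction(1, 4), Fraction(1, 5)
print(conviction(sup_a, sup_b, sup_ab))  # 5/4
print(leverage(sup_a, sup_b, sup_ab))    # 3/40
print(max_conf(sup_a, sup_b, sup_ab))    # 4/5
```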
Typical MBA
• MB Rule: A => B [ s, c ]
– All Confidence: null-invariant KPI. all_conf is the minimum confidence of the two association
rules related to A and B, namely “p(A|B)” and “p(B|A)”.
all_conf(A => B [ s, c ]) = min(p(B|A), p(A|B))
– Cosine: null-invariant KPI. The cosine measure can be viewed as a harmonized lift measure: the two
formulae are similar except that for cosine the square root is taken of the product of the
probabilities of A and B.
cosine(A => B [ s, c ]) = sqrt(p(B|A) * p(A|B))
– Kulc (Kulczynski): null-invariant KPI. The Kulczynski measure has a preference for skewed patterns.
Range: [0; 1].
kulc(A => B [ s, c ]) = (p(A|B) + p(B|A)) / 2 = (sup(A,B)/2) * (1/sup(A) + 1/sup(B))
– IR (Imbalance Ratio): null-invariant KPI. IR measures the degree of imbalance between the two
events that A is contained in a transaction and that B is contained in a transaction. The ratio is
close to 0 if the conditional probabilities are similar (i.e., very balanced) and close to 1 if they
are very different. Range: [0; 1] (0 indicates a balanced rule).
IR(A => B [ s, c ]) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A,B))
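To make "null invariant" concrete, the following Python sketch computes the four measures from raw counts and then repeats the calculation with many extra transactions that contain neither A nor B. The counts and the function name are assumptions chosen for illustration.

```python
import math

def measures(n_a, n_b, n_ab, n_total):
    """KPIs for A => B from counts of Trx containing A, B, both, and all Trx."""
    sup_a, sup_b, sup_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    p_b_given_a = sup_ab / sup_a
    p_a_given_b = sup_ab / sup_b
    return {
        "lift": sup_ab / (sup_a * sup_b),
        "all_conf": min(p_b_given_a, p_a_given_b),
        "cosine": math.sqrt(p_b_given_a * p_a_given_b),
        "kulc": (p_a_given_b + p_b_given_a) / 2,
        "ir": abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab),
    }

# Same A/B counts; the second dataset only adds Trx with neither A nor B.
small = measures(n_a=50, n_b=25, n_ab=20, n_total=100)
large = measures(n_a=50, n_b=25, n_ab=20, n_total=10_000)

for name in ("all_conf", "cosine", "kulc", "ir"):
    print(name, round(small[name], 4), round(large[name], 4))  # unchanged
print("lift", round(small["lift"], 4), round(large["lift"], 4))  # changes
```

In this example the four null-invariant measures stay the same while lift grows from 1.6 to 160, which illustrates why lift can mislead when null transactions dominate.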
MB Rules >> Patterns >> Insights … #1
• MB Rules
– IF antecedents (set of products/items) THEN consequent (single product/item)
– This is extracted from an Association Rule (AR) model after running the Apriori algorithm on the input Transactional data
– An MB Rule can be captured/stored in a single-row format as well as in a multi-row format (and in many flavors at that).
– The same rule can be stored in many ways. For example, the rule "b, p, r => c" can be stored in the following ways:
• single row "b, p, r => c" format
• multi-row model format with separate columns for antecedents and a single column for the consequent (the consequent repeats
across rows)
– Row #1: antecedent = "b", consequent = "c", sequence = 1
– Row #2: antecedent = "p", consequent = "c", sequence = 2
– Row #3: antecedent = "r", consequent = "c", sequence = 3
• multi-row metadata format with generic columns for products/items and metadata tags for the order sequence and for whether
the product belongs to the antecedents or the consequent (no repeats)
– Row #1: product = "b", tag = "ant", sequence = 1
– Row #2: product = "p", tag = "ant", sequence = 2
– Row #3: product = "r", tag = "ant", sequence = 3
– Row #4: product = "c", tag = "cons", sequence = 4
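The relationship between the three storage formats can be sketched in Python. The function names and dictionary keys below are assumptions for illustration. In the actual solution these transformations are done with SQL on the rules extracted from the AR model.

```python
def parse_rule(rule_text):
    """Split a single-row rule such as 'b, p, r => c' into its parts."""
    left, right = rule_text.split("=>")
    antecedents = [item.strip() for item in left.split(",")]
    return antecedents, right.strip()

def to_model_rows(rule_text):
    """Multi-row model format: one row per antecedent, consequent repeated."""
    antecedents, consequent = parse_rule(rule_text)
    return [
        {"antecedent": a, "consequent": consequent, "sequence": seq}
        for seq, a in enumerate(antecedents, start=1)
    ]

def to_metadata_rows(rule_text):
    """Multi-row metadata format: generic product column plus a tag."""
    antecedents, consequent = parse_rule(rule_text)
    rows = [
        {"product": a, "tag": "ant", "sequence": seq}
        for seq, a in enumerate(antecedents, start=1)
    ]
    rows.append(
        {"product": consequent, "tag": "cons", "sequence": len(antecedents) + 1}
    )
    return rows

for row in to_metadata_rows("b, p, r => c"):
    print(row)
```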
MB Rules >> Patterns >> Insights … #2
• Insights: How does one come up with them? How do we begin?
• How do we go from "finding patterns in data" to "generating meaningful insights from data"?
• Tool/Framework for “Patterns identified … What then”
• Localized/Partition/Global Patterns
• Trust/Distrust but verify … Allow for ‘critical’ evaluation of the patterns under different contexts.
– Against newer data
– Against data from a specific period of interest (Promotion/Campaign/Festival)
– Across regions/geography/organizations etc.
• BI Actions on ML output/artifacts can popularize content with the ability to comment, collaborate,
highlight, share etc.
MB Rules >> Patterns >> Insights … #3
• Make it possible to act on the MB Rule independent of the entire model
– The output of a modeling exercise is a large number of MB Rules, but not all patterns are useful.
– Association Rules Model based Patterns (i.e. MB Rules) are independent of each other. Allows focused analysis. Unlike a
Decision Tree based Rule or a Clustering Model, we can zoom in on a set of rules or even a single rule of interest and
analyze it w/o affecting the rest of the patterns/Rules.
– We have ways to identify "interesting" rules using technical criteria/KPIs but context/business exigency trumps technical
analysis.
– Multiple MB Models can be in play at the same time working on the same input transactional dataset but baking in
business context into the model. E.g. analyze product purchase patterns with model1, analyze mode of payment choices in
model2, analyze behavior of customer segments say, newly signed up customers or customers responding to a Marketing
Campaign in model3 etc.
MB Rules >> Patterns >> Insights … #4
• Taking the MB Rule and analyzing it in different contexts is typically an offline exercise
– Typically this would involve a lot of offline actions/modeling exercises to look at the Transactional dataset from different
perspectives
– From frinkiac :D
– Well, there is a way … and that’s where SQL Pattern Matching comes in. Not easy :D
MB Rules >> Patterns >> Insights … #5
• Allow for What-if actions on MB Rules/Patterns
– From frinkiac :D
– SQL tools allow what-if analysis ... This use case can also let end users perform what-if actions via BI tools.
MBA Revisited
• Transaction Data augmented with "Tags"
• Rules of type "IF … THEN … "
• with more rules involving the added tags …
MBA Revisited
• with all the standard KPIs …
MBA Revisited
• and a lot more …
MB Rule Transformations/Metadata
• Extract from AR model
• Transformed to
• With additional rule level metadata (names shown here, same for ids)
MB Rule Transformations/Metadata
• With additional rule level metadata (same for ids)
• Also in single row format
KPIs … MB Rule and MB Rule-MB Trx levels
• (Rule KPIs) aSold – Count of Trx … antecedents part of the MB Rule has been sold
• (Rule KPIs) bSold – Count of Trx …. consequent part of the MB Rule …
• (Rule KPIs) abSold – Count of Trx … both antecedents and consequent parts of the MB Rule …
• (Rule KPIs) aobSold - Count of Trx … either antecedents or consequent parts of the MB Rule …
• (Sequential KPIs) aSeq - … antecedents (as per MB Rule defn) have been sold in the same order in the Trx
• (Sequential KPIs) abSeq - … antecedents (in any order, all of them) sold before consequent
• (Sequential KPIs) aSeqb - … antecedents (in order, all of them ) sold before consequent
• (Sequential KPIs) bSeqa - … consequent sold before antecedents (any order but all of them)
• (Sequential KPIs) baSeq - … consequent sold before antecedents (in order as per MB Rule defn)
• (Share of wallet/Trx KPIs) aSA – a Sold Alone i.e. Trx is wholly composed of antecedents
• (Share of wallet/Trx KPIs) bSA – b Sold Alone i.e. Trx is wholly composed of consequent
• (Share of wallet/Trx KPIs) abSA – ab Sold Alone i.e. Trx is wholly composed of antecedents and consequent
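A few of these KPIs can be sketched at the rule-Trx level in Python. This is only one possible reading of the definitions above: it assumes the Trx is an ordered list of line items, that only the first occurrence of each product counts, and that "in order" does not require the antecedents to be contiguous. The function name and the flag names as dictionary keys are assumptions; the original solution derives these values with SQL Pattern Matching.

```python
def rule_trx_flags(antecedents, consequent, trx):
    """Evaluate one MB Rule against one ordered Trx (list of products)."""
    pos = {}
    for i, item in enumerate(trx):
        pos.setdefault(item, i)          # keep first occurrence only

    a_sold = all(a in pos for a in antecedents)
    b_sold = consequent in pos
    ab_sold = a_sold and b_sold
    a_pos = [pos[a] for a in antecedents] if a_sold else []

    a_seq = a_sold and a_pos == sorted(a_pos)             # rule order kept
    ab_seq = ab_sold and max(a_pos) < pos[consequent]     # all a before b
    b_seq_a = ab_sold and pos[consequent] < min(a_pos)    # b before all a
    return {
        "aSold": a_sold,
        "bSold": b_sold,
        "abSold": ab_sold,
        "aobSold": a_sold or b_sold,
        "aSeq": a_seq,
        "abSeq": ab_seq,
        "aSeqb": a_seq and ab_seq,
        "bSeqa": b_seq_a,
        "abSA": ab_sold and set(trx) == set(antecedents) | {consequent},
    }

# Rule "P, Q, R => D" against a Trx where the items were bought as Q, R, P, D.
flags = rule_trx_flags(["P", "Q", "R"], "D", ["Q", "R", "P", "D"])
print(flags)
```

Summing each flag over all Trx of a rule would give the rule-level counts.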
MBA Revisited – Design Considerations
• Other KPIs needed: The R package arules has many additional KPIs available. There is a lot of literature
on why the standard set of 3 KPIs is not enough.
– Lift: Not directional. Susceptible to null Trx.
– Support, Confidence: Sometimes some rules can have higher support and higher confidence but be a worse
representation of reality than similar rules with lower support and lower confidence (s, c as well as lift can mislead).
• Sequential KPIs: Co-occurrence may not suffice based on domain/business. Useful for analyses
like Path to purchase, Recommendations, Purchase/Usage/Fraud Behavior patterns.
– If Rule extracted from model is “P, Q, R => D” but Q, R, P is the dominant sequence of purchase (9/12) within (P, Q, R)
then maybe the MB Rule is better represented and consumed as “Q, R, P => D” instead of “P, Q, R => D”.
– Recommendation (Shopping Cart) … For Next Purchase Recommendation, the recommended product should be such
that it’s a popular choice after all items of antecedent (P,Q,R) have been added previously (w/o considering strict order
of adding items to cart).
• Null invariant KPIs ideal for Exploratory Data Analysis (EDA) wherein Analysts can explore data
under the Rule/Pattern context using standard BI Techniques (Slice and Dice actions).
– Null-invariant KPI measures (all_conf, cosine, kulc and IR): KPI value is only influenced by the supports of A, B, and (A ∪ B),
or more exactly, by the conditional probabilities of A|B and B|A, but not by the total number of transactions.
Another common property is that each measure ranges from 0 to 1, and the higher the value, the closer the
relationship between A and B.
MBA Architecture/Process Flow
[Diagram: Market Basket Analysis process]
• Data ingestion: POS Ingestion of Rtl Trx; Weather Data Ingestion?; Other … ClickStream, IoT, Mktg etc.
• Oracle Db/ADW (Application Schema)
• MBA Settings Config: Define scope, parameters for MBA
• Pre-processing (ETL): Prepare Data for MBA
• Build OAA/ODM models using dbms_data_mining package
• Extract MB Rules and related reporting info
• Post-processing ETL: Prepare Data for MBA Reporting
• Store MBA Reporting results back to Oracle Db
• OAC/OBIEE Reporting – Market Basket Analysis
• Reporting – OJET, Others
• UI Config (Read Write) – OBIEE/APEX
SQL Pattern Matching: Match Process Preparation
• Match Process Preparation (ETL or view)
SQL Pattern Matching: Input, Processing, Output
1. Define input
2. Partition/order input
3. Process pattern
4. using defined conditions
5. Output: rows per match
6. Output: columns per row
7. Go where after match?
8. Return Data at match level (select * …) or higher levels (based on group bys)

select
  rule, --get KPIs at rule level
  --trx, --uncomment to get KPIs at rule-trx level
  KPIs
from INPUT_TABLE_OR_QUERY
match_recognize(
  partition by "model", rule, trx
  order by rule_mbcomp_seq
  measures text_kpi_meas, kpi_meas, agg_kpi_meas, kpi_partial_meas etc
  one row per match
  after match skip past last row
  pattern (PERMUTE(apli*, bpli*, opli*))
  define
    apli as (mb_comp = 'a_part'),
    bpli as (mb_comp = 'b_part'),
    opli as (mb_comp = 'o_part')
)
group by
  rule
  --, trx --uncomment to get KPIs at rule-trx level
;
Slide Credit: Customized from Stew Ashton’s Advanced Pattern Matching pptx which was part of Oracle Code Paris 2018
Some Issues/Challenges
• Pattern Matching SQL needs a fixed pattern to use for matching.
– We can write SQL for a single Rule … to match against a dataset (many Trx)
• For example, for rule “p, q, r => a” we use:
PATTERN ( permute(p,q,r) | a )
DEFINE
p as trxli_prod_nm = 'p',
q as trxli_prod_nm = 'q',
r as trxli_prod_nm = 'r',
a as trxli_prod_nm = 'a'
– When we need to match many patterns (say, act on a whole AR model with 100 rules)
against a trx dataset we should define the patterns via metadata/component structures.
PATTERN (PERMUTE(apli*, bpli*, opli*))
DEFINE
apli as (mb_comp = 'a_part'),
bpli as (mb_comp = 'b_part'),
opli as (mb_comp = 'o_part')
• Same sql for any pattern => Allows integration into ETL or use in sql view to match
dynamically via sql query (issued by BI Tools).
Metadata based pattern, SQL
Data driven pattern, Dyn SQL
Some Issues/Challenges
• How should one handle the issue of merging multiple dimensions into a single
analysis dimension? This problem comes up in Graphs too (All dimension members
need to be combined into a set of nodes … "node Id" identifier conflicts).
• Currently using offsets for each dimension to base their Ids. Product has the highest
offset (unbounded).
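The offset approach can be sketched in Python as follows. The dimension names and offset values are hypothetical, chosen only to show how per-dimension ids can be mapped into one shared id space without conflicts, with Product placed last so that its range is unbounded.

```python
# Hypothetical offsets, in ascending order; Product is last and unbounded.
DIMENSION_OFFSETS = [
    ("payment_mode", 1_000),
    ("customer_segment", 2_000),
    ("product", 1_000_000),
]

def to_node_id(dimension, local_id):
    """Map a dimension-local id into the shared analysis id space."""
    names = [name for name, _ in DIMENSION_OFFSETS]
    idx = names.index(dimension)            # ValueError if unknown dimension
    offset = DIMENSION_OFFSETS[idx][1]
    is_last = idx + 1 == len(DIMENSION_OFFSETS)
    upper = None if is_last else DIMENSION_OFFSETS[idx + 1][1]
    node_id = offset + local_id
    if local_id < 0 or (upper is not None and node_id >= upper):
        raise ValueError(f"{dimension} id {local_id} does not fit its range")
    return node_id

def from_node_id(node_id):
    """Recover (dimension, local id) from a shared id."""
    for name, offset in reversed(DIMENSION_OFFSETS):
        if node_id >= offset:
            return name, node_id - offset
    raise ValueError(f"{node_id} is below the smallest offset")

print(to_node_id("payment_mode", 7))   # 1007
print(to_node_id("product", 7))        # 1000007
print(from_node_id(2_005))             # ('customer_segment', 5)
```

The range check makes the main limitation visible: every bounded dimension must fit between its own offset and the next one.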
Sample Screenshots
Demo
• Autonomous Data Warehouse (ADW) … Oracle 18c database
– ADW is optional but Oracle Database 12c+ is mandatory. ANSI SQL Feature Row Pattern
Matching via keyword MATCH_RECOGNIZE should be available in the Db.
– SQL scripts used for pre-processing/data preparation of the input data
– SQL scripts also used for post-processing
• Oracle Machine Learning (OML) bundled/packaged with ADW
– OML is optional but Oracle Data Mining (part of Oracle Advanced Analytics) is mandatory
– ODM based pl/sql api is used to make a call to the Apriori Algorithm for performing Market
Basket Analysis
– SQL queries used to extract the patterns from the Association Rules (AR) model.
• Oracle Analytics Cloud (OAC)
– (optional component) You can replace OAC with Tableau/Looker/… w/o losing core
functionality
– However many advanced features of the solution leverage the rpd (data modeling layer)
component of OAC
• Useful?
• Very little shown of ADW/OML currently (end goal), using SQL Developer for most Db actions
• Need more details on SQL Pattern Matching?
Market Basket Analysis Revisited using SQL Pattern Matching

Market Basket Analysis Revisited using SQL Pattern Matching

  • 2.
    Copyright © 2018,Oracle and/or its affiliates. All rights reserved. | Autonomous Data Warehouse Oracle Machine Learning Oracle Analytics Cloud Market Basket Analysis Revisited using SQL Pattern Matching Shankar Somayajula May 29th, 2019 Confidential – Oracle Internal
  • 3.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | • Patterns Everywhere • Typical MBA • MBA Revisted – Motivations & Design • SQL Pattern Matching Intro/Usage • Issues/ Challenges • Demo / Screenshots • Feedback 3 Agenda
  • 4.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Finding Patterns in Data Typical use cases in today’s world of fast exploration of data Financial Services Money Laundering Fraud Tracking Stock Market Law & Order Monitoring Suspicious Activities Retail Returns FraudBuying Patterns Session- ization Telcos Money Laundering SIM Card Fraud Call Quality Utilities Network Analysis Fraud Unusual Usage Lots of Data
  • 5.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Typical Pattern Matching Use Cases Input Data Pattern Result Sessionization Weblogs continuous clicks by same user Generate reports on number of distinct sessions, average page views per session, etc Fraud Credit card transactions two transactions in different locations within a short period of time Find cases in which a credit card may have been used fraudulently since a physical person cannot be in two places at once In-game purchases Games logs events leading up to an in- game purchase Detect common sequences of event that results in an in-game purchase Fraud (mobiles) CDR logs SIM card being used in multiple handsets Flag individual SIM cards being used by multiple handsets within a specified time period Stock market analysis Ticker logs Track possible fraudulent linked patterns of behavior Track known patterns of behavior such as head and shoulders, triangles, channels and wedges
  • 6.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Typical Pattern Matching Use Cases Input Data Pattern Result Auditing/Complia nce Application logs Analyze changes to secure customer data Find instances where operator has made suspect modifications to secure client data Money laundering Transaction logs Search for small transfers within a time window following by large transfer within “x” days of last small transfer Detect suspicious money transfer pattern for an account and report account, date of first small transfer, date of last large transfer Call service quality CDR logs Search for dropped/reconnected calls Identify how many times calls were restarted in a session, total effective call duration and total interrupted duration Login security Application logs Search for attempted logins Identify attempts to gain access to application/schema that can be linked to hackers or inappropriate access
  • 7.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Confidential – Oracle Internal 7 Typical MBA • Transaction Data of type Master-Detail • Rules of type "IF … THEN … " • with KPIs – Support, Confidence and Lift
  • 8.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 8 Some examples of MB Rules/Insights • (diapers) => (beer) • (peanutButter, jelly) => (bread) Multidimensional Rules with artificial/virtual products … • (Item=X, isOver18=TRUE, isNewCustomer=TRUE) => (Item=Y) • (buyerAge >= 63, loyaltyAge>= 2) => (toothBrushBuy >=2) • age(X,"20...29"), income(X,"52k...58k") => buys(X, "iPad") •
  • 9.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 9 Typical MBA • MB Rule: A => B [ s, c ] – Support: denotes the frequency of the rule within transactions. A high value means that the rule involves a great part of database. support(A => B [ s, c ]) = p(A U B) – Confidence: denotes the percentage of transactions containing A which also contain B. It is an estimation of conditioned probability . confidence(A => B [ s, c ]) = p(B|A) = sup(A,B)/sup(A) – Lift: a measure of how much better a rule is at predicting the result than just assuming the result in the first place. lift(A => B [ s, c ]) = p(B|A)/p(B) = sup(A,B)/sup(A).sup(B)
  • 10.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 10 Typical MBA • MB Rule: A => B [ s, c ] – Conviction: measure of the number of times the rule would be incorrect if the association between A and B was purely random chance. Conviction is a measure of the implication and has value 1 if items are unrelated. Sensitive to the directionality of the rule (unlike lift) conviction(A => B [ s, c ]) = sup(A).sup( B’)/sup(A,B’) – Leverage or Piatetsky‐Shapiro: is the proportion of additional elements covered by both A and B above the expected if independent. leverage(A => B [ s, c ]) = sup(A,B) - sup(A). sup(B) – Max Confidence: KPI max_conf is the maximum confidence of the two association rules related to A and B, namely, “p(A|B)” and “p(B|A)”. max_conf(A => B [ s, c ]) = max(p(B|A), p(A|B))
  • 11.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 11 Typical MBA • MB Rule: A => B [ s, c ] – All Confidence: Null invariant KPI all_conf is also the minimum confidence of the two association rules related to A and B, namely, “p(A|B)” and “p(B|A)”. all_conf(A => B [ s, c ]) = min(p(B|A), p(A|B)) – Cosine: Null invariant KPI cosine measure can be viewed as a harmonized lift measure: The two formulae are similar except that for cosine, the square root is taken on the product of the probabilities of A and B. cosine(A => B [ s, c ]) = sqrt(p(B|A).p(A|B)) – Kulc (Kulczynski) : Null invariant KPI Kulczynski measure with a preference for skewed patterns. Range: [0; 1]. kulc(A => B [ s, c ]) = (P(A/B) + P(B/A)) / 2 = support(A,B)/2 * (1/support(A) + 1/support(B)) – IR (Imbalance Ratio): Null invariant KPI IR measures the degree of imbalance between two events that A and B are contained in a transaction. The ratio is close to 0 if the conditional probabilities are similar (i.e., very balanced) and close to 1 if they are very different. Range: [0; 1] (0 indicates a balanced rule). IR(A => B [ s, c ]) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A,B) )
  • 12.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Confidential – Oracle Internal 12 MB Rules >> Patterns >> Insights … #1 • MB Rules – IF antecedents (set of products/items) THEN consequent (single product/item) – This is extracted from an Association Rule (AR) model after running the Apriori algorithm on the input Transactional data – MB Rule can be captured/stored in a single row format as well as in a multi-row format (and in many flavors at that). – Possible to store the same rule in many ways. For e.g. for rule "b, p, r => c“, we can store the rule in the following ways: • single row "b, p, r => c" format • multi-row model format with separate columns for antecedents and single column for consequent (consequent repeats across rows) – Row #1: antecedent = "b“, consequent = "c", sequence = 1 – Row #2: antecedent = "p", consequent = "c", sequence = 2 – Row #3: antecedent = "r", consequent = "c", sequence = 3 • multi-row metadata format with generic columns for products/items and metadata tags for order sequence, whether product belongs to antecedents or consequent (no repeats) – Row #1: product = "b", tag = "ant", sequence = 1 – Row #2: product = "p", tag = "ant", sequence = 2 – Row #3: product = "r", tag = "ant", sequence = 3 – Row #4: product = "c", tag = “cons", sequence = 4
  • 13.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 13 MB Rules >> Patterns >> Insights … #2 • Insights: How does one come up with them? How do we begin? • How do we go from "finding patterns in data“ to "generating meaningful insights from data”? • Tool/Framework for “Patterns identified … What then” • Localized /Partition / Global Patterns • Trust/Distrust but verify … Allow for ‘critical’ Evaluation of the patterns under different contexts. – Against newer data – Against data from a specific period of interest (Promotion/Campaign/Festival) – Across regions/geography /organizations etc. • BI Actions on ML output/artifacts can popularize content with the ability to comment, collaborate, highlight, share etc.
  • 14.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 14 MB Rules >> Patterns >> Insights … #3 • Make it possible to act on the MB Rule independent of the entire model – The output of modeling exercise is a lot of MB Rules but Not all patterns are useful. – Association Rules Model based Patterns (i.e. MB Rules) are independent of each other. Allows focused analysis. Unlike a Decision Tree based Rule or a Clustering Model, we can zoom in on a set of rules or even a single rule of interest and analyze it w/o affecting the rest of the patterns/Rules. – We have ways to identify "interesting" rules using technical criteria/KPIs but context/business exigency trumps technical analysis. – Multiple MB Models can be in play at the same time working on the same input transactional dataset but baking in business context into the model. E.g. analyze product purchase patterns with model1, analyze mode of payment choices in model2, analyze behavior of customer segments say, newly signed up customers or customers responding to a Marketing Campaign in model3 etc.
  • 15.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 15 MB Rules >> Patterns >> Insights … #4 • Taking the MB Rule and analyzing it in different contexts is typically an offline exercise – Typically this would involve a lot of offline actions/modeling exercises to look at the Transactional dataset from different perspectives – From frinkiac :D – Well, There is a way … and that’s where SQL Pattern Matching comes in. Not easy :D
  • 16.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 16 MB Rules >> Patterns >> Insights … #5 • Allow for What-if actions on MB Rules/Patterns – From frinkiac :D – SQL Tools allow what-if ... This use case can also facilitate end users to perform what if actions via BI Tools.
  • 17.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 17 MBA Revisited • Transaction Data augmented with "Tags" • Rules of type "IF … THEN … “ • with more rules involving the added tags …
  • 18.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 18 MBA Revisited • with all the standard KPIs …
  • 19.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 19 MBA Revisited • and a lot more …
  • 20.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 20 MB Rule Transformations/Metadata • Extract from AR model • Transformed to • With additional rule level metadata (nm’s shown here, same for ids)
  • 21.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 21 MB Rule Transformations/Metadata • With additional rule level metadata (same for ids) • Also in single row format
  • 22.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 22 KPIs … MB Rule and MB Rule-MB Trx levels • (Rule KPIs) aSold – Count of Trx … antecedents part of the MB Rule has been sold • (Rule KPIs) bSold – Count of Trx …. consequent part of the MB Rule … • (Rule KPIs) abSold – Count of Trx … both antecedents and consequent parts of the MB Rule … • (Rule KPIs) aobSold - Count of Trx … either antecedents or consequent parts of the MB Rule … • (Sequential KPIs) aSeq - … antecedents (as per MB Rule defn) have been sold in the same order in the Trx • (Sequential KPIs) abSeq - … antecedents (in any order, all of them) sold before consequent • (Sequential KPIs) aSeqb - … antecedents (in order, all of them ) sold before consequent • (Sequential KPIs) bSeqa - … consequent sold before antecedents (any order but all of them) • (Sequential KPIs) baSeq - … consequent sold before antecedents (in order as per MB Rule defn) • (Share of wallet/Trx KPIs) aSA – a Sold Alone i.e. Trx is wholly composed of antecedents • (Share of wallet/Trx KPIs) bSA – b Sold Alone i.e. Trx is wholly composed of consequent • (Share of wallet/Trx KPIs) abSA – ab Sold Alone i.e. Trx is wholly composed of antecendents and consequent
  • 23.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | 23 MBA Revisted – Design Considerations • Other KPIs needed: R package arules has lot of additional KPIs available. Lots of literature relating to why the standard set of 3 KPIs are not enough. – Lift: Not directional. Susceptible to null Trx. – Support, Confidence: Sometimes some rules can have higher support and higher confidence but be a worse representation of reality than similar rules with lower support and lower confidence (s, c as well as lift can mislead). • Sequential KPIs: Co-occurrence may not suffice based on domain/business. Useful for analyses like Path to purchase, Recommendations, Purchase/Usage/Fraud Behavior patterns. – If Rule extracted from model is “P, Q, R => D” but Q, R, P is the dominant sequence of purchase (9/12) within (P, Q, R) then maybe the MB Rule is better represented and consumed as “Q, R, P => D” instead of “P, Q, R => D”. – Recommendation (Shopping Cart) … For Next Purchase Recommendation, the recommended product should be such that it’s a popular choice after all items of antecedent (P,Q,R) have been added previously (w/o considering strict order of adding items to cart). • Null invariant KPIs ideal for Exploratory Data Analysis (EDA) wherein Analysts can explore data under the Rule/Pattern context using standard BI Techniques (Slice and Dice actions). – Null-invariant KPI measures (all_conf, cosine, kulc and IR): KPI value is only influenced by the supports of A, B, and (A  B), or more exactly, by the conditional probabilities of A|B and B|A, but not by the total number of transactions. Another common property is that each measure ranges from 0 to 1, and the higher the value, the closer the relationship between A and B.
  • 24.
    MBA Architecture/Process Flow
    (Diagram: Market Basket Analysis process. Box labels, in extraction order:)
    • Oracle Db/ADW (Application Schema)
    • Extract MB Rules and related reporting info
    • Pre-processing (ETL): Prepare Data for MBA
    • MBA Settings Config: Define scope, parameters for MBA
    • Other … ClickStream, IoT, Mktg etc.
    • Weather Data Ingestion?
    • POS Ingestion of Rtl Trx
    • Build OAA/ODM models using dbms_data_mining package
    • Post-processing ETL: Prepare Data for MBA Reporting
    • Store MBA Reporting results back to Oracle Db
    • OAC/OBIEE Reporting – Market Basket Analysis
    • Reporting – OJET, Others
    • UI Config (Read Write) – OBIEE/APEX
  • 25.
    SQL Pattern Matching: Match Process Preparation
    • Match Process Preparation (ETL or view)
  • 26.
    SQL Pattern Matching: Input, Processing, Output
    1. Define input
    2. Partition/order input
    3. Process pattern
    4. using defined conditions
    5. Output: rows per match
    6. Output: columns per row
    7. Go where after match?
    8. Return data at match level (select * …) or higher levels (based on group bys)

    select rule,   -- 8. get KPIs at rule level
           --trx,  -- uncomment to get KPIs at rule-trx level
           KPIs
    from INPUT_TABLE_OR_QUERY                           -- 1. define input
    match_recognize(
      partition by "model", rule, trx                   -- 2. partition input
      order by rule_mbcomp_seq                          -- 2. order input
      measures text_kpi_meas, kpi_meas,
               agg_kpi_meas, kpi_partial_meas etc       -- 6. columns per row
      one row per match                                 -- 5. rows per match
      after match skip past last row                    -- 7. go where after match
      pattern (PERMUTE(apli*, bpli*, opli*))            -- 3. process pattern
      define apli as (mb_comp = 'a_part'),              -- 4. defined conditions
             bpli as (mb_comp = 'b_part'),
             opli as (mb_comp = 'o_part')
    )
    group by rule  --, trx  -- uncomment to get KPIs at rule-trx level
    ;

    Slide Credit: Customized from Stew Ashton’s Advanced Pattern Matching pptx, which was part of Oracle Code Paris 2018
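    To make the template above concrete, here is a small self-contained sketch that derives one sequential KPI (abSeq: all antecedents sold before the consequent, in any order). It is an illustration under assumptions, not the deck's actual query. The CTE `tagged`, the columns `rule_id`, `trx_id` and `line_seq`, and the sample rows are hypothetical. It further assumes that `rule_mbcomp_seq` orders rows so that antecedent, consequent and other rows are each contiguous, while `line_seq` holds the purchase order within the transaction.

    ```sql
    -- Hypothetical tagged input for rule R1 ("P, Q => D").
    -- trx 1 was bought as P, Q, D; trx 2 was bought as D, P, X, Q.
    WITH tagged (rule_id, trx_id, line_seq, rule_mbcomp_seq, mb_comp) AS (
      SELECT 'R1', 1, 1, 1, 'a_part' FROM dual UNION ALL  -- P
      SELECT 'R1', 1, 2, 2, 'a_part' FROM dual UNION ALL  -- Q
      SELECT 'R1', 1, 3, 3, 'b_part' FROM dual UNION ALL  -- D
      SELECT 'R1', 2, 2, 1, 'a_part' FROM dual UNION ALL  -- P
      SELECT 'R1', 2, 4, 2, 'a_part' FROM dual UNION ALL  -- Q
      SELECT 'R1', 2, 1, 3, 'b_part' FROM dual UNION ALL  -- D
      SELECT 'R1', 2, 3, 4, 'o_part' FROM dual            -- X
    )
    SELECT rule_id,
           COUNT(*)        AS trx_matched,   -- one match row per rule-trx partition
           SUM(a_before_b) AS ab_seq         -- trx where every antecedent precedes the consequent
    FROM tagged
    MATCH_RECOGNIZE (
      PARTITION BY rule_id, trx_id
      ORDER BY rule_mbcomp_seq
      MEASURES
        COUNT(apli.*) AS a_cnt,
        COUNT(bpli.*) AS b_cnt,
        -- last antecedent purchase must come before the first consequent purchase;
        -- a NULL on either side (no such rows) falls through to 0
        CASE WHEN MAX(apli.line_seq) < MIN(bpli.line_seq) THEN 1 ELSE 0 END AS a_before_b
      ONE ROW PER MATCH
      AFTER MATCH SKIP PAST LAST ROW
      PATTERN (PERMUTE(apli*, bpli*, opli*))
      DEFINE
        apli AS mb_comp = 'a_part',
        bpli AS mb_comp = 'b_part',
        opli AS mb_comp = 'o_part'
    )
    GROUP BY rule_id;
    ```

    The sample rows are built so that trx 1 satisfies the abSeq condition and trx 2 does not. Because the pattern only refers to component labels, the same statement works for any rule whose line items have been tagged this way.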
  • 27.
    Some Issues/Challenges
    • Pattern Matching SQL needs a fixed pattern to use for matching.
      – We can write SQL for a single Rule … to match against a dataset (many Trx)
        • For example, for rule “p, q, r => a” we use ….
            PATTERN ( permute(p,q,r) | a )
            DEFINE p as trxli_prod_nm = 'p',
                   q as trxli_prod_nm = 'q',
                   r as trxli_prod_nm = 'r',
                   a as trxli_prod_nm = 'a'
      – When we need to match many patterns (say, act on a whole AR model with 100 rules) against a trx dataset, we should define the patterns via metadata/component structures.
            PATTERN (PERMUTE(apli*, bpli*, opli*))
            DEFINE apli as (mb_comp = 'a_part'),
                   bpli as (mb_comp = 'b_part'),
                   opli as (mb_comp = 'o_part')
    • Same SQL for any pattern => allows integration into ETL, or use in a SQL view to match dynamically via SQL queries (issued by BI Tools).
    Metadata based pattern, SQL Data driven pattern, Dyn SQL
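    One way the metadata-driven approach could be prepared is sketched below: rule definitions are stored as rows, and a join labels every transaction line item with its component (`a_part`, `b_part`, `o_part`) for each rule, so the generic pattern can be applied afterwards. The CTEs `rule_items` and `trx_items`, the columns `rule_id`, `trx_id`, `line_seq` and `prod_nm`, and the sample rows are hypothetical. Only `trxli_prod_nm` and `mb_comp` come from the slide.

    ```sql
    -- Hypothetical rule metadata: one row per item of each rule.
    WITH rule_items (rule_id, prod_nm, mb_comp) AS (
      SELECT 'R1', 'P', 'a_part' FROM dual UNION ALL   -- R1: P, Q => D
      SELECT 'R1', 'Q', 'a_part' FROM dual UNION ALL
      SELECT 'R1', 'D', 'b_part' FROM dual UNION ALL
      SELECT 'R2', 'D', 'a_part' FROM dual UNION ALL   -- R2: D => X
      SELECT 'R2', 'X', 'b_part' FROM dual
    ),
    -- Hypothetical transaction line items.
    trx_items (trx_id, line_seq, trxli_prod_nm) AS (
      SELECT 1, 1, 'P' FROM dual UNION ALL
      SELECT 1, 2, 'Q' FROM dual UNION ALL
      SELECT 1, 3, 'D' FROM dual UNION ALL
      SELECT 2, 1, 'D' FROM dual UNION ALL
      SELECT 2, 2, 'X' FROM dual
    ),
    rules AS (
      SELECT DISTINCT rule_id FROM rule_items
    )
    SELECT r.rule_id,
           t.trx_id,
           t.line_seq,
           t.trxli_prod_nm,
           -- items that are not part of the rule are labelled as "other"
           NVL(ri.mb_comp, 'o_part') AS mb_comp
    FROM rules r
    CROSS JOIN trx_items t
    LEFT JOIN rule_items ri
           ON ri.rule_id = r.rule_id
          AND ri.prod_nm = t.trxli_prod_nm
    ORDER BY r.rule_id, t.trx_id, t.line_seq;
    ```

    The cross join pairs every rule with every transaction, which is fine for a sketch but would probably need to be restricted (for example, to transactions containing at least one item of the rule) on real data volumes.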
  • 28.
    Some Issues/Challenges
    • How should one handle the issue of merging multiple dimensions into a single analysis dimension? This problem comes up in Graphs too (all dimension members need to be combined into a set of nodes … "node Id" identifier conflicts).
    • Currently using offsets for each dimension to base their Ids. Product has the highest offset (unbounded).
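    The offset idea can be illustrated as follows. This is only a sketch: the CTEs `store` and `product`, their columns, and the offset values 1,000,000 and 2,000,000 are hypothetical, and it assumes the store dimension never exceeds the gap between the two offsets.

    ```sql
    -- Hypothetical dimensions whose surrogate keys overlap (both start at 1).
    WITH store (store_id, nm) AS (
      SELECT 1, 'Downtown' FROM dual UNION ALL
      SELECT 2, 'Airport'  FROM dual
    ),
    product (prod_id, nm) AS (
      SELECT 1, 'P' FROM dual UNION ALL
      SELECT 2, 'Q' FROM dual
    )
    -- Each dimension gets its own Id range in the merged analysis dimension.
    SELECT 'STORE' AS dim, store_id AS src_id, 1000000 + store_id AS node_id, nm
    FROM store
    UNION ALL
    -- Product takes the highest offset, so its range can grow without
    -- colliding with any other dimension.
    SELECT 'PRODUCT', prod_id, 2000000 + prod_id, nm
    FROM product;
    ```

    Placing the largest or fastest-growing dimension last is what keeps its range open-ended. Every other dimension has to fit below the next offset.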
  • 29.
    Sample Screenshots
  • 30.
    Sample Screenshots
  • 31.
    Demo
    • Autonomous Data Warehouse (ADW) … Oracle 18c database
      – ADW is optional but Oracle Database 12c+ is mandatory. The ANSI SQL feature Row Pattern Matching, via the keyword MATCH_RECOGNIZE, should be available in the Db.
      – SQL scripts used for pre-processing/data preparation of the input data
      – SQL scripts also used for post-processing
    • Oracle Machine Learning (OML) bundled/packaged with ADW
      – OML is optional but Oracle Data Mining (part of Oracle Advanced Analytics) is mandatory
      – The ODM-based PL/SQL API is used to call the Apriori algorithm for performing Market Basket Analysis
      – SQL queries used to extract the patterns from the Association Rules (AR) model
    • Oracle Analytics Cloud (OAC)
      – (optional component) You can replace OAC with Tableau/Looker/… w/o losing core functionality
      – However, many advanced features of the solution leverage the rpd (data modeling layer) component of OAC
  • 32.
    • Useful?
    • Very little shown of ADW/OML currently (end goal); using SQL Developer for most Db actions
    • Need more details on SQL Pattern Matching?

Editor's Notes

  • #5 With Big Data, it is just not possible to slice and dice data sets using a BI table. However, with SQL Pattern Matching, available from Oracle Database 12c onwards, you can process these types of data sets to uncover the interesting stories hidden inside the huge mass of data points. You can then bake the SQL Pattern Matching logic into the BI/Analytical schema (via views, say) to slice and dice the data just like in the good old days of star-schema-based SQL BI tools. Most discovery workflows are based on finding patterns, and this is true across a broad range of industries. Everything starts with pattern discovery, which then drives other analytical workflows…