Market Basket Analysis Revisited using SQL Pattern Matching

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Autonomous Data Warehouse
Oracle Machine Learning
Oracle Analytics Cloud
Market Basket Analysis Revisited using SQL
Pattern Matching
Shankar Somayajula
May 29th, 2019
Confidential – Oracle Internal

• Patterns Everywhere
• Typical MBA
• MBA Revisted – Motivations & Design
• SQL Pattern Matching Intro/Usage
• Issues/ Challenges
• Demo / Screenshots
• Feedback
3
Agenda

Finding Patterns in Data
Typical use cases in today’s world of fast exploration of data
Financial
Services
Money
Laundering
Fraud
Tracking Stock
Market
Law
&
Order
Monitoring
Suspicious
Activities
Retail
Returns
FraudBuying
Patterns
Session-
ization
Telcos
Money
Laundering
SIM Card
Fraud
Call
Quality
Utilities
Network
Analysis
Fraud
Unusual
Usage
Lots of
Data

Typical Pattern Matching Use Cases
Input Data Pattern Result
Sessionization Weblogs continuous clicks by same
user
Generate reports on number of distinct
sessions, average page views per session, etc
Fraud Credit card
transactions
two transactions in different
locations within a short
period of time
Find cases in which a credit card may have been
used fraudulently since a physical person cannot
be in two places at once
In-game
purchases
Games logs events leading up to an in-
game purchase
Detect common sequences of event that results
in an in-game purchase
Fraud (mobiles) CDR logs SIM card being used in
multiple handsets
Flag individual SIM cards being used by multiple
handsets within a specified time period
Stock market
analysis
Ticker logs Track possible fraudulent
linked patterns of behavior
Track known patterns of behavior such as head
and shoulders, triangles, channels and wedges

Typical Pattern Matching Use Cases
Input Data Pattern Result
Auditing/Complia
nce
Application
logs
Analyze changes to secure
customer data
Find instances where operator has made
suspect modifications to secure client data
Money
laundering
Transaction
logs
Search for small transfers
within a time window
following by large transfer
within “x” days of last small
transfer
Detect suspicious money transfer pattern for an
account and report account, date of first small
transfer, date of last large transfer
Call service
quality
CDR logs Search for
dropped/reconnected calls
Identify how many times calls were restarted in
a session, total effective call duration and total
interrupted duration
Login security Application
logs
Search for attempted logins Identify attempts to gain access to
application/schema that can be linked to
hackers or inappropriate access

7
Typical MBA
• Transaction Data of type Master-Detail
• Rules of type "IF … THEN … "
• with KPIs – Support, Confidence and Lift

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | 8
Some examples of MB Rules/Insights
• (diapers) => (beer)
• (peanutButter, jelly) => (bread)
Multidimensional Rules with artificial/virtual products …
• (Item=X, isOver18=TRUE, isNewCustomer=TRUE) => (Item=Y)
• (buyerAge >= 63, loyaltyAge>= 2) => (toothBrushBuy >=2)
• age(X,"20...29"), income(X,"52k...58k") => buys(X, "iPad")
•

Typical MBA
• MB Rule: A => B [ s, c ]
– Support: denotes the frequency of the rule within transactions. A high value means
that the rule involves a great part of database.
support(A => B [ s, c ]) = p(A U B)
– Confidence: denotes the percentage of transactions containing A which also contain B.
It is an estimation of conditioned probability .
confidence(A => B [ s, c ]) = p(B|A) = sup(A,B)/sup(A)
– Lift: a measure of how much better a rule is at predicting the result than just assuming
the result in the first place.
lift(A => B [ s, c ]) = p(B|A)/p(B) = sup(A,B)/sup(A).sup(B)

Typical MBA
• MB Rule: A => B [ s, c ]
– Conviction: measure of the number of times the rule would be incorrect if the association
between A and B was purely random chance. Conviction is a measure of the implication and has
value 1 if items are unrelated. Sensitive to the directionality of the rule (unlike lift)
conviction(A => B [ s, c ]) = sup(A).sup( B’)/sup(A,B’)
– Leverage or Piatetsky‐Shapiro: is the proportion of additional elements covered by both A and B
above the expected if independent.
leverage(A => B [ s, c ]) = sup(A,B) - sup(A). sup(B)
– Max Confidence: KPI max_conf is the maximum confidence of the two association rules related
to A and B, namely, “p(A|B)” and “p(B|A)”.
max_conf(A => B [ s, c ]) = max(p(B|A), p(A|B))

Typical MBA
• MB Rule: A => B [ s, c ]
– All Confidence: Null invariant KPI all_conf is also the minimum confidence of the two association
rules related to A and B, namely, “p(A|B)” and “p(B|A)”.
all_conf(A => B [ s, c ]) = min(p(B|A), p(A|B))
– Cosine: Null invariant KPI cosine measure can be viewed as a harmonized lift measure: The two
formulae are similar except that for cosine, the square root is taken on the product of the
probabilities of A and B.
cosine(A => B [ s, c ]) = sqrt(p(B|A).p(A|B))
– Kulc (Kulczynski) : Null invariant KPI Kulczynski measure with a preference for skewed patterns.
Range: [0; 1].
kulc(A => B [ s, c ]) = (P(A/B) + P(B/A)) / 2 = support(A,B)/2 * (1/support(A) + 1/support(B))
– IR (Imbalance Ratio): Null invariant KPI IR measures the degree of imbalance between two
events that A and B are contained in a transaction. The ratio is close to 0 if the conditional
probabilities are similar (i.e., very balanced) and close to 1 if they are very different. Range: [0; 1]
(0 indicates a balanced rule).
IR(A => B [ s, c ]) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A,B) )

12
MB Rules >> Patterns >> Insights … #1
• MB Rules
– IF antecedents (set of products/items) THEN consequent (single product/item)
– This is extracted from an Association Rule (AR) model after running the Apriori algorithm on the input Transactional data
– MB Rule can be captured/stored in a single row format as well as in a multi-row format (and in many flavors at that).
– Possible to store the same rule in many ways. For e.g. for rule "b, p, r => c“, we can store the rule in the following ways:
• single row "b, p, r => c" format
• multi-row model format with separate columns for antecedents and single column for consequent (consequent repeats
across rows)
– Row #1: antecedent = "b“, consequent = "c", sequence = 1
– Row #2: antecedent = "p", consequent = "c", sequence = 2
– Row #3: antecedent = "r", consequent = "c", sequence = 3
• multi-row metadata format with generic columns for products/items and metadata tags for order sequence, whether
product belongs to antecedents or consequent (no repeats)
– Row #1: product = "b", tag = "ant", sequence = 1
– Row #2: product = "p", tag = "ant", sequence = 2
– Row #3: product = "r", tag = "ant", sequence = 3
– Row #4: product = "c", tag = “cons", sequence = 4

• Insights: How does one come up with them? How do we begin?
• How do we go from "finding patterns in data“ to "generating meaningful insights from data”?
• Tool/Framework for “Patterns identified … What then”
• Localized /Partition / Global Patterns
• Trust/Distrust but verify … Allow for ‘critical’ Evaluation of the patterns under different contexts.
– Against newer data
– Against data from a specific period of interest (Promotion/Campaign/Festival)
– Across regions/geography /organizations etc.
• BI Actions on ML output/artifacts can popularize content with the ability to comment, collaborate,
highlight, share etc.

• Make it possible to act on the MB Rule independent of the entire model
– The output of modeling exercise is a lot of MB Rules but Not all patterns are useful.
– Association Rules Model based Patterns (i.e. MB Rules) are independent of each other. Allows focused analysis. Unlike a
Decision Tree based Rule or a Clustering Model, we can zoom in on a set of rules or even a single rule of interest and
analyze it w/o affecting the rest of the patterns/Rules.
– We have ways to identify "interesting" rules using technical criteria/KPIs but context/business exigency trumps technical
analysis.
– Multiple MB Models can be in play at the same time working on the same input transactional dataset but baking in
business context into the model. E.g. analyze product purchase patterns with model1, analyze mode of payment choices in
model2, analyze behavior of customer segments say, newly signed up customers or customers responding to a Marketing
Campaign in model3 etc.

• Taking the MB Rule and analyzing it in different contexts is typically an offline exercise
– Typically this would involve a lot of offline actions/modeling exercises to look at the Transactional dataset from different
perspectives
– From frinkiac :D
– Well, There is a way … and that’s where SQL Pattern Matching comes in. Not easy :D

• Allow for What-if actions on MB Rules/Patterns
– From frinkiac :D
– SQL Tools allow what-if ... This use case can also facilitate end users to perform what if actions via BI Tools.

MBA Revisited
• Transaction Data augmented with "Tags"
• Rules of type "IF … THEN … “
• with more rules involving the added tags …

MBA Revisited
• with all the standard KPIs …

MBA Revisited
• and a lot more …

MB Rule Transformations/Metadata
• Extract from AR model
• Transformed to
• With additional rule level metadata (nm’s shown here, same for ids)

MB Rule Transformations/Metadata
• With additional rule level metadata (same for ids)
• Also in single row format

KPIs … MB Rule and MB Rule-MB Trx levels
• (Rule KPIs) aSold – Count of Trx … antecedents part of the MB Rule has been sold
• (Rule KPIs) bSold – Count of Trx …. consequent part of the MB Rule …
• (Rule KPIs) abSold – Count of Trx … both antecedents and consequent parts of the MB Rule …
• (Rule KPIs) aobSold - Count of Trx … either antecedents or consequent parts of the MB Rule …
• (Sequential KPIs) aSeq - … antecedents (as per MB Rule defn) have been sold in the same order in the Trx
• (Sequential KPIs) abSeq - … antecedents (in any order, all of them) sold before consequent
• (Sequential KPIs) aSeqb - … antecedents (in order, all of them ) sold before consequent
• (Sequential KPIs) bSeqa - … consequent sold before antecedents (any order but all of them)
• (Sequential KPIs) baSeq - … consequent sold before antecedents (in order as per MB Rule defn)
• (Share of wallet/Trx KPIs) aSA – a Sold Alone i.e. Trx is wholly composed of antecedents
• (Share of wallet/Trx KPIs) bSA – b Sold Alone i.e. Trx is wholly composed of consequent
• (Share of wallet/Trx KPIs) abSA – ab Sold Alone i.e. Trx is wholly composed of antecendents and consequent

MBA Revisted – Design Considerations
• Other KPIs needed: R package arules has lot of additional KPIs available. Lots of literature
relating to why the standard set of 3 KPIs are not enough.
– Lift: Not directional. Susceptible to null Trx.
– Support, Confidence: Sometimes some rules can have higher support and higher confidence but be a worse
representation of reality than similar rules with lower support and lower confidence (s, c as well as lift can mislead).
• Sequential KPIs: Co-occurrence may not suffice based on domain/business. Useful for analyses
like Path to purchase, Recommendations, Purchase/Usage/Fraud Behavior patterns.
– If Rule extracted from model is “P, Q, R => D” but Q, R, P is the dominant sequence of purchase (9/12) within (P, Q, R)
then maybe the MB Rule is better represented and consumed as “Q, R, P => D” instead of “P, Q, R => D”.
– Recommendation (Shopping Cart) … For Next Purchase Recommendation, the recommended product should be such
that it’s a popular choice after all items of antecedent (P,Q,R) have been added previously (w/o considering strict order
of adding items to cart).
• Null invariant KPIs ideal for Exploratory Data Analysis (EDA) wherein Analysts can explore data
under the Rule/Pattern context using standard BI Techniques (Slice and Dice actions).
– Null-invariant KPI measures (all_conf, cosine, kulc and IR): KPI value is only influenced by the supports of A, B, and (A 
B), or more exactly, by the conditional probabilities of A|B and B|A, but not by the total number of transactions.
Another common property is that each measure ranges from 0 to 1, and the higher the value, the closer the
relationship between A and B.

MBA Architecture/Process Flow
Oracle Db/ADW
(Application Schema)
Extract MB Rules
and related
reporting info
Pre – processing
(ETL): Prepare Data
for MBA
MBA Settings Config:
Define scope,
parameters for MBA
Other …
ClickStream, IoT,
Mktg etc.
Weather Data
Ingestion?
POS Ingestion of
Rtl Trx
Build OAA/ODM models
using dbms_data_mining
package
Post – processing
ETL: Prepare Data
for MBA Reporting
Store MBA
Reporting results
back to Oracle Db
OAC/OBIEE Reporting –
Market Basket Analysis
Reporting – OJET,
Others
UI Config (Read Write)
– OBIEE/APEX
Market Basket
Analysis
process

SQL Pattern Matching: Match Process Preparation
• Match Process Preparation (ETL or view)
•

1. Define input
2. Partition/order input
3. Process pattern
4. using defined conditions
5. Output: rows per match
6. Output: columns per row
7. Go where after match?
8. Return Data at match level (select * …)
or higher levels (based on group bys)
27
select
rule, --get KPIs at rule level
--trx, --uncomment to get KPIs at rule-trx level
KPIs
from INPUT_TABLE_OR_QUERY
match_recognize(
partition by "model", rule, trx order by
rule_mbcomp_seq
measures text_kpi_meas, kpi_meas,
agg_kpi_meas, kpi_partial_meas etc
one row per match
after match skip past last row
pattern (PERMUTE(apli*, bpli*, opli*))
define apli as (mb_comp = 'a_part'), bpli as
(mb_comp = 'b_part'), opli as (mb_comp = 'o_part')
)
group by
rule
--, trx --uncomment to get KPIs at rule-trx level
;
Slide Credit: Customized from Stew Ashton’s Advanced Pattern Matching pptx which was part of Oracle Code Paris 2018
SQL Pattern Matching: Input, Processing, Output

Some Issues/Challenges
• Pattern Matching SQL needs a fixed pattern to use for matching.
– We can write SQL for a single Rule … to match against a dataset (many Trx)
• For e.g. for rule “p, q, r => a” we use ….
PATTERN ( permute(p,q,r) | a )
DEFINE
p as trxli_prod_nm = 'p',
q as trxli_prod_nm = 'q',
r as trxli_prod_nm = 'r',
a as trxli_prod_nm = 'a'
– When we need to match many patterns (say, act on a whole AR model with 100 rules)
against a trx dataset we should define the patterns via metadata/component structures.
PATTERN (PERMUTE(apli*, bpli*, opli*))
DEFINE
apli as (mb_comp = 'a_part'),
bpli as (mb_comp = 'b_part'),
opli as (mb_comp = 'o_part')
• Same sql for any pattern => Allows integration into ETL or use in sql view to match
dynamically via sql query (issued by BI Tools).
Metadata based pattern, SQL
Data driven pattern, Dyn SQL

Some Issues/Challenges
• How should one handle the issue of merging multiple dimensions into a single
analysis dimension? This problems comes up in Graphs too (All dimension members
need to be combined into a set of nodes … "node Id" identifier conflicts).
• Currently using offsets for each dimension to base their Ids. Product has the highest
offset (unbounded).
•

Sample Screenshots

Demo
• Autonomous Database Warehouse (ADW) … Oracle 18c database
– ADW is optional but Oracle Database 12c+ is mandatory. ANSI SQL Feature Row Pattern
Matching via keyword MATCH_RECOGNIZE should be available in the Db.
– SQL scripts used for pre-processing/data preparation of the input data
– SQL scripts also used for post-processing
• Oracle Machine Learning (OML) bundled/packaged with ADW
– OML is optional but Oracle Data Mining (part of Oracle Advanced Analytics) is mandatory
– ODM based pl/sql api is used to make a call to the Apriori Algorithm for performing Market
Basket Analysis
– SQL queries used to extract the patterns from the Association Rules (AR) model.
• Oracle Analytics Cloud (OAC)
– (optional component) You can replace OAC with Tableau/Looker/… w/o losing core
functionality
– However many advanced features of the solution leverage the rpd (data modeling layer)
component of OAC

• Useful?
• Very little shown of ADW/OML currently (end goal), using SQL Developer for most Db actions
• Need more details on SQL Pattern Matching?

Market Basket Analysis Revisited using SQL Pattern Matching

Market Basket Analysis Revisited using SQL Pattern Matching

Recommended

Recommended

More Related Content

What's hot

What's hot (9)

Similar to Market Basket Analysis Revisited using SQL Pattern Matching

Similar to Market Basket Analysis Revisited using SQL Pattern Matching (20)

Recently uploaded

Recently uploaded (20)

Market Basket Analysis Revisited using SQL Pattern Matching

Editor's Notes