This document discusses using a data model approach for pattern analysis. It begins with an agenda that includes defining a pattern analysis data model as an extension to an analytical star schema. It then provides examples of patterns that could be analyzed using different logical partitions of a dataset. Key points covered include defining global and partitioned patterns, logical partitions for pattern matching, and calculating different types of key performance indicators at the global, partition, and non-partition levels. The goal is to support flexible ad-hoc reporting and analysis of patterns and rules across datasets or subsets defined by attribute fields.
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p... – Jean Ihm
2nd in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
With property graphs in Oracle Database, you can perform powerful analysis on big data such as social networks, financial transactions, sensor networks, and more.
To use property graphs, first, you’ll need a graph model. For a new user, modeling and generating a suitable graph for an application domain can be a challenge. This month, we’ll describe key steps required to construct a meaningful graph, and offer a few tips on validating the generated graph.
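The modeling step described above (deciding which records become vertices and which relationships become edges) can be sketched in miniature. The account and transfer records below are hypothetical illustration data, not from the talk:

```python
# A minimal sketch of mapping relational-style records into a property graph:
# entities become vertices, foreign-key relationships become edges.

accounts = [
    {"id": 1, "owner": "Alice"},
    {"id": 2, "owner": "Bob"},
]
transfers = [
    {"from": 1, "to": 2, "amount": 250.0},
]

# Vertices keyed by id, each carrying its properties.
vertices = {a["id"]: {"label": "Account", "owner": a["owner"]} for a in accounts}

# Edges as (source, target, properties) triples.
edges = [(t["from"], t["to"], {"label": "TRANSFER", "amount": t["amount"]})
         for t in transfers]

print(vertices[1]["owner"])   # Alice
print(edges[0][2]["label"])   # TRANSFER
```

A quick validation pass (do all edge endpoints resolve to vertices?) is one of the sanity checks the session recommends for a generated graph.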
Albert Godfrind (EMEA Solutions Architect), Zhe Wu (Architect), and Jean Ihm (Product Manager) walk you through, and take your questions.
4th in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
Learn how to visualize graphs – a powerful, intuitive way to interact with data. Using open source tools like Cytoscape or third party tools, you have several choices on how to visualize and interact with graphs from Oracle Database and big data platforms. Albert Godfrind (EMEA Solutions Architect) and Gabriela Montiel-Moreno (Software Development Manager) share all you need to get started, with detailed demos using a banking customer data set.
Oracle Spatial Studio: Fast and Easy Spatial Analytics and Maps – Jean Ihm
Learn about a new tool, Spatial Studio, that lets you quickly and easily do spatial analytics and create maps, even if you don't have GIS or Spatial knowledge. Now business users and non-GIS developers have a simple user interface to access the spatial features in Oracle Database.
Spatial Studio lets you prepare your data for spatial analysis, perform spatial analysis operations, and publish and share the results – as well as access spatial analysis results via REST and incorporate them into applications and workflows. Presented by Carol Palmer, Sr. Principal Product Manager, and David Lapp, Sr. Principal Product Manager, Oracle Spatial and Graph.
Presentation video including demo and resources available here: https://devgym.oracle.com/pls/apex/dg/office_hours/3084 .
5th in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
PGQL: A Query Language for Graphs
Learn how to query graphs using PGQL, an expressive and intuitive graph query language that's a lot like SQL. With PGQL, it's easy to get going writing graph analysis queries to the database in a very short time. Albert and Oskar show what you can do with PGQL, and how to write and execute PGQL code.
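To give a flavor of the idea, the PGQL query in the comment below is a small illustrative example, and the toy Python evaluator beneath it mimics the query's pattern-matching semantics on an in-memory edge list (all names and data are made up):

```python
# Illustrative PGQL:
#
#   SELECT a.name
#   FROM MATCH (a:Person)-[:knows]->(b:Person)
#   WHERE b.name = 'Carol'

people = {1: "Alice", 2: "Bob", 3: "Carol"}
knows = [(1, 3), (2, 1)]  # (source, target) vertex-id pairs

# Find every person with a 'knows' edge pointing at Carol.
result = [people[src] for src, dst in knows if people[dst] == "Carol"]
print(result)  # ['Alice']
```

A real PGQL engine, of course, plans and executes such patterns over the stored graph rather than scanning Python lists; the point is only how closely the query text mirrors the SQL SELECT/FROM/WHERE shape.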
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ... – Spark Summit
This talk will cover the tools we used, the hurdles we faced, and the workarounds we developed with help from Databricks support in our attempt to build a custom machine learning model and use it to predict TV ratings for different networks and demographics.
The Apache Spark machine learning and dataframe APIs make it incredibly easy to produce a machine learning pipeline to solve an archetypal supervised learning problem. In our applications at Cadent, we face a challenge with high dimensional labels and relatively low dimensional features; at first pass such a problem is all but intractable but thanks to a large number of historical records and the tools available in Apache Spark, we were able to construct a multi-stage model capable of forecasting with sufficient accuracy to drive the business application.
Over the course of our work we came across many tools that made our lives easier, and others that forced workarounds. In this talk we will review our custom multi-stage methodology, review the challenges we faced, and walk through the key steps that made our project successful.
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your Data – Jean Ihm
AnD Summit '19 slides - Souri Das, Matthew Perry, Melli Annamalai. This presentation covers knowledge graphs built using the RDF capabilities of Oracle Spatial and Graph. We will illustrate how to define a knowledge graph, create virtual or materialized graphs from existing data (relational tables, CSV files, etc.), derive new knowledge through logical inference, navigate and query graphs using W3C standards, analyze knowledge graphs with graph algorithms, and more. Real-world use cases from various industries will also be shared.
An Introduction to Graph: Database, Analytics, and Cloud Services – Jean Ihm
Graph analysis employs powerful algorithms to explore and discover relationships in social network, IoT, big data, and complex transaction data. Learn how graph technologies are used in applications such as fraud detection for banking, customer 360, public safety, and manufacturing. This session will provide an overview and demos of graph technologies for Oracle Cloud Services, Oracle Database, NoSQL, Spark and Hadoop, including PGX analytics and PGQL property graph query language.
Presented at Analytics and Data Summit, March 20, 2018
Introduction to Property Graph Features (AskTOM Office Hours part 1) – Jean Ihm
1st in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
Xavier Lopez (PM Senior Director) and Zhe Wu (Graph Architect) will share a brief intro to what property graphs can do for you, and take your questions - on property graphs or any other aspect of Oracle Database Spatial and Graph features. With property graphs, you can analyze relationships in Big Data like social networks, financial transactions, or IoT sensor networks; identify influencers; discover patterns of fraudulent behavior; recommend products, and much more -- right inside Oracle Database.
3rd in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
See the magic of graphs in this session. Graph analysis can answer questions like detecting patterns of fraud or identifying influential customers - and do it quickly and efficiently. We’ll show you the APIs for accessing graphs and running analytics such as finding influencers, communities, and anomalies, and how to use them from various languages including Groovy, Python, and JavaScript, with Jupyter and Zeppelin notebooks.
Albert Godfrind (EMEA Solutions Architect), Zhe Wu (Architect), and Jean Ihm (Product Manager) walk you through, and take your questions.
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra... – Databricks
This talk presents how we accelerated deep learning processing, from preprocessing to inference and training, on Apache Spark at SK Telecom. SK Telecom has roughly half of the Korean population as customers. To support them, we operate 400,000 cell towers, which generate logs with geospatial tags.
The slides give an overview of how Spark can be used to tackle Machine learning tasks, such as classification, regression, clustering, etc., at a Big Data scale.
Presented by David Taieb, Architect, IBM Cloud Data Services
Along with Spark Streaming, Spark SQL and GraphX, MLlib is one of the four key architectural components of Spark. It provides easy-to-use (even for beginners), powerful Machine Learning APIs that are designed to work in parallel using Spark RDDs. In this session, we’ll introduce the different algorithms available in MLlib, e.g. supervised learning with classification (binary and multi-class) and regression, but also unsupervised learning with clustering (K-means) and recommendation systems. We’ll conclude the presentation with a deep dive on a sample machine learning application built with Spark MLlib that predicts whether a scheduled flight will be delayed or not. This application trains a model using data from real flight information. The labeled flight data is combined with weather data from the “Insight for Weather” service available on the IBM Bluemix Cloud Platform to form the training, test and blind data. Even if you are not a black belt in machine learning, you will learn in this session how to leverage the powerful Machine Learning algorithms available in Spark to build interesting predictive and prescriptive applications.
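One of the algorithms mentioned above, K-means, is easy to illustrate with a toy single-machine version; MLlib's value is running the same assignment/update loop in parallel over RDDs. Everything below (the data, the function name) is a hypothetical sketch, not MLlib code:

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Toy 1-D k-means: the same two-step loop MLlib parallelizes."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: each center moves to its cluster's mean.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two obvious clumps around 1.0 and 9.5; the centers converge to their means.
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.1], k=2))
```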
About the Speaker: For the last 4 years, David has been the lead architect for the Watson Core UI & Tooling team based in Littleton, Massachusetts. During that time, he led the design and development of a Unified Tooling Platform to support all the Watson Tools including accuracy analysis, test experiments, corpus ingestion, and training data generation. Before that, he was the lead architect for the Domino Server OSGi team responsible for integrating the eXpeditor J2EE Web Container in Domino and building first class APIs for the developer community. He started with IBM in 1996, working on various globalization technologies and products including Domino Global Workbench (used to develop multilingual Notes/Domino NSF applications) and a multilingual Content Management system for the Websphere Application Server. David enjoys sharing his experience by speaking at conferences. You’ll find him at various events like the Unicode conference, Eclipsecon, and Lotusphere. He’s also passionate about building tools that help improve developer productivity and overall experience.
In this webinar, Thomas Cook, Sales Director at AnzoGraph DB, provides a history lesson on the origins of SPARQL, including its roots in the Semantic Web and how linked open data is used to create knowledge graphs. He then dives into "What is RDF?", "What is a URI?" and "What is SPARQL?", wrapping up with a real-world demonstration via a Zeppelin notebook.
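The RDF and SPARQL ideas the webinar covers can be shown with a toy example. The triples and the commented query below are hypothetical, and a real SPARQL engine is of course far richer:

```python
# RDF models data as subject-predicate-object triples.
triples = [
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
    ("ex:Bob",   "ex:worksFor", "ex:Acme"),
    ("ex:Acme",  "ex:locatedIn", "ex:Boston"),
]

# Roughly the pattern:  SELECT ?who WHERE { ?who ex:worksFor ex:Acme }
who = [s for s, p, o in triples if p == "ex:worksFor" and o == "ex:Acme"]
print(who)  # ['ex:Alice', 'ex:Bob']
```

In real RDF, `ex:` would expand to a full URI prefix; URIs are what let triples from different linked-data sources join into one knowledge graph.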
Large Scale Fuzzy Name Matching with a Custom ML Pipeline in Batch and Stream... – Databricks
ING bank is a Dutch multinational, multi-product bank that offers banking services to 33 million retail and commercial customers in over 40 countries. At this scale, ING naturally faces a multitude of data consolidation tasks across its disparate sources. A common consolidation problem is fuzzy name matching: given a name (streaming) or a list of names (batch), find out the most similar name(s) from a different list.
Popular methods such as Levenshtein distance are not appropriate because of their time complexity and the sheer volume of names involved. In this talk, we will introduce how we use a custom Spark ML pipeline and Structured Streaming to build fuzzy name matching products in batch and streaming. This approach can match 8,000 names per second against a 10-million-name list using a ten-node cluster. First, we will give an introduction to the name matching problem.
Second, we will explain why the Levenshtein distance approach is limited and demonstrate a faster one: token-based cosine similarity matching. Next, we will show how an ML pipeline helps build an elegant solution. Here, we will dive into the details of each stage, including customized preprocessing, tokenization, term frequency, customized inverse document frequency, customized cosine similarity with distributed sparse matrix multiplication, and a customized supervision stage.
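The core token-based cosine similarity idea can be sketched with raw token counts. The real pipeline adds IDF weighting and distributed sparse matrix multiplication, and the names below are made up:

```python
import math
from collections import Counter

def tokens(name):
    """Lowercase word tokens; production pipelines often use character n-grams."""
    return name.lower().split()

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

candidates = ["John A. Smith", "Jon Smit", "Maria Garcia"]
query = "John Smith"
best = max(candidates, key=lambda c: cosine(query, c))
print(best)  # John A. Smith
```

Unlike Levenshtein distance, which compares the query against each candidate character by character, the vectorized form lets one sparse matrix product score a query against millions of names at once.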
Finally, we will show how we deploy the ML pipeline within a batch data pipeline, and additionally as a fuzzy search engine in a streaming manner. The main conclusions are: (1) a custom Spark ML pipeline provides a powerful way to handle complicated data science problems, and (2) a uniform ML pipeline can easily serve both batch and streaming products from the same codebase.
Building A Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and ... – Yuanyuan Tian
With the advent of big data, the enterprise analytics landscape has changed dramatically. HDFS has become an important data repository for business analytics. Enterprises use various big data technologies to process data and drive actionable insights. HDFS serves as the storage where distributed processing frameworks such as Hadoop and Spark access and operate on large volumes of data. At the same time, enterprise data warehouses (EDWs) continue to support critical business analytics. EDWs are usually shared-nothing parallel databases that support complex SQL processing, updates, and transactions. As a result, they manage up-to-date data and support business analytics tools such as reporting and dashboards. A new generation of applications has emerged, requiring access to and correlation of data stored in both HDFS and EDWs. This has created the need for a special federation between Hadoop-like big data platforms and EDWs, which we call the hybrid warehouse. In this talk, we identify the best hybrid warehouse architecture by studying various algorithms for joining database and HDFS tables.
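The join algorithms such systems study build on the classic hash join: hash the smaller (database) side, then stream the larger (HDFS) side through it. Below is a generic sketch with hypothetical tables, not the talk's actual algorithm:

```python
db_rows = [(1, "Alice"), (2, "Bob")]          # (customer_id, name), small EDW side
hdfs_rows = [(1, 30.0), (1, 12.5), (3, 9.9)]  # (customer_id, amount), large HDFS side

# Build phase: index the small side by its join key.
build = {}
for cid, name in db_rows:
    build.setdefault(cid, []).append(name)

# Probe phase: stream the big side, emitting a row per match.
joined = [(cid, name, amount)
          for cid, amount in hdfs_rows
          for name in build.get(cid, [])]
print(joined)  # [(1, 'Alice', 30.0), (1, 'Alice', 12.5)]
```

In a hybrid warehouse, the interesting question is where the build and probe phases run and which side gets shipped across the network, which is exactly the trade-off the studied algorithms explore.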
Data Science, Statistical Analysis and R... Learn what those mean, how they can help you find answers to your questions and complement the existing toolsets and processes you are currently using to make sense of data. We will explore R and the RStudio development environment, installing and using R packages, basic and essential data structures and data types, plotting graphics, manipulating data frames and how to connect R and SQL Server.
https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472
Big Brains meetup hosted by BloomReach, 2015-06-04
Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself.
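The influence analysis described above can be miniaturized as a reply-graph leaderboard via in-degree; GraphX computes analogous measures at scale. The reply edges below are hypothetical:

```python
from collections import Counter

# (author, replied_to_author) edges mined from a mailing-list archive.
replies = [("ann", "bob"), ("cat", "bob"), ("bob", "ann"), ("dan", "bob")]

# In-degree (replies received) is a simple proxy for influence.
received = Counter(dst for _, dst in replies)
leaderboard = received.most_common()
print(leaderboard)  # [('bob', 3), ('ann', 1)]
```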
The next-generation enterprise-class architecture – Massimo Brignoli – Data Driven Innovation
The rise of data lakes: companies today are drowning in data, and the classic data warehouse struggles to process it, both in volume and in variety. Many have started looking at architectures called data lakes, with Hadoop as the reference technology. But is this solution right for everything? Come learn how to operationalize data lakes to build modern data management architectures.
What Is SAS | SAS Tutorial For Beginners | SAS Training | SAS Programming | E... – Edureka!
This Edureka "What Is SAS" tutorial will help you get started with SAS. This tutorial will also introduce you to Data Analytics and SAS Programming concepts. Below are the topics covered in this tutorial:
1. Data Analytics
2. Data Analytical Tools
3. Why SAS?
4. What Is SAS?
5. SAS Framework
6. SAS Programming Concepts
7. SAS Applications
ASHviz - Data visualization research experiments using ASH data – John Beresniewicz
RMOUG Training Days 2020 abstract:
The Active Session History (ASH) mechanism is a rich source of fine-grained data about database activity, and is the lynchpin for many database performance management features in the Diagnostic and Tuning packs. Many interesting stories about happenings in the database are buried in ASH waiting to be revealed, and data visualization is key to sifting these out from the high dimensionality and volume of ASH data. The session will cover a number of data visualization experiments conducted using a single ASH dump with an emphasis on the iterative process of discovering useful data visualizations.
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide – Databricks
The traditional approach to insurance pricing involves fitting a generalized linear model (GLM) to data collected on historical claims payments and premiums received. The explosive growth in data availability and increasing competitiveness in the marketplace are challenging actuaries to find new insights in their data and make predictions with more granularity, improved speed and efficiency, and with tighter integration among business units to support strategic decisions.
In this session we will share our experience implementing deep hierarchical neural networks using TensorFlow and PySpark on Databricks. We will discuss the benefits of the ML Runtime, our experience using the goofys mount, our process for hyperparameter tuning, specific considerations for the large dataset size and extreme volatility present in insurance data, among other topics.
Authors: Bryn Clark, Krish Rajaram
Apache CarbonData+Spark to realize data convergence and Unified high performa... – Tech Triveni
Challenges in Data Analytics:
Different application scenarios need different storage solutions: HBase is ideal for point-query scenarios but unsuitable for multi-dimensional queries. MPP is suitable for data warehouse scenarios, but engine and data are coupled together, which hampers scalability. OLAP stores used in BI applications perform best for aggregate queries, but full-scan queries perform sub-optimally; moreover, they are not suitable for real-time analysis. These distinct systems lead to low resource sharing and need separate pipelines for data and application management.
Large corporations have to master vast amounts of heterogeneous data in order to stay competitive. While existing approaches have attempted to consolidate and manage the data by forcing it into a single shared data model, data lakes have recently emerged that instead provide a central storage point for holding all data sets in their original form.
In this talk, we present eccenca CorporateMemory, which extends the data lake paradigm with a semantic integration layer for managing diverse, but semantically enriched data. eccenca CorporateMemory builds an extensible knowledge graph that employs RDF vocabularies for transforming and linking multiple datasets in order to generate an integrated semantic understanding of the data.
Robert Isele | Head of Data Integration Unit at eccenca GmbH
Presentation at Semantics 2016 in Leipzig in the context of the results of the LEDS project
Learn how graph technologies can be applied to real-world use cases, using medical, network security, and financial data. By combining graph models and machine learning techniques, we can discover relationships, classify information, and identify patterns and anomalies in data. We can answer questions such as “How did other investigators approach similar cases?” and “Do these symptoms seem similar to ones we’ve seen in other diseases?” Presented by Sungpack Hong, Research Director, Oracle Labs.
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion StoicaSpark Summit
A long-standing grand challenge in computing is to enable machines to act autonomously and intelligently: to rapidly and repeatedly take appropriate actions based on information in the world around them. To address this challenge, at UC Berkeley we are starting a new five year effort that focuses on the development of data-intensive systems that provide Real-Time Intelligence with Secure Execution (RISE). Following in the footsteps of AMPLab, RISELab is an interdisciplinary effort bringing together researchers across AI, robotics, security, and data systems. In this talk I’ll present our research vision and then discuss some of the applications that will be enabled by RISE technologies.
RISELab:Enabling Intelligent Real-Time DecisionsJen Aman
Spark Summit East Keynote by Ion Stoica
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
Element Fleet has the largest benchmark database in our industry and we needed a robust and linearly scalable platform to turn this data into actionable insights for our customers. The platform needed to support advanced analytics, streaming data sets, and traditional business intelligence use cases.
In this presentation, we will discuss how we built a single, unified platform for both Advanced Analytics and traditional Business Intelligence using Cassandra on DSE. With Cassandra as our foundation, we are able to plug in the appropriate technology to meet varied use cases. The platform we've built supports real-time streaming (Spark Streaming/Kafka), batch and streaming analytics (PySpark, Spark Streaming), and traditional BI/data warehousing (C*/FiloDB). In this talk, we are going to explore the entire tech stack and the challenges we faced trying to support the above use cases. We will specifically discuss how we ingest and analyze IoT data (vehicle telematics) in real-time and batch, combine data from multiple data sources into a single data model, and support standardized and ad-hoc reporting requirements.
About the Speaker
Jim Peregord Vice President - Analytics, Business Intelligence, Data Management, Element Corp.
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Clustrix
Do you have a high-value, high throughput application running on AWS? Are you moving part or all of your infrastructure to AWS? Do you have a high-transaction workload that is only expected to grow as your company grows? Choosing the right database for your move to AWS can make you a hero or a goat. Be a hero!
Databases are the mission-critical lifeline of most businesses. For years MySQL has been the easy choice -- but the popularity of the cloud and new products like Aurora, RDS MySQL and ClustrixDB have given customers choices and options that can help them work smarter and more efficiently.
Enterprise Strategy Group (ESG) presents their findings from a recent performance benchmark test configured for high-transaction, low-latency workloads running on AWS.
In this webinar, you will learn:
How high-transaction, high-value database workloads perform when run on three popular database solutions running on AWS.
How key metrics like transactions per second (tps) and database response time (latency) can affect performance and customer satisfaction.
How the ability to scale both database reads and writes is the key to unlocking performance on AWS
Insights into Real World Data Management ChallengesDataWorks Summit
Data is your most valuable business asset, and it's also your biggest challenge. This challenge and opportunity means we continually face significant roadblocks toward becoming a data-driven organisation. From the management of data and the proliferation of open source frameworks to limited industry skills and mounting time and cost pressures, our challenge in data is big.
We all want and need a "fit for purpose" approach to the management of data, especially Big Data, and overcoming the ongoing challenges around the '3Vs' means we get to focus on the most important V: 'Value'. Come along and join the discussion on how Oracle Big Data Cloud provides value in the management of data and supports your move toward becoming a data-driven organisation.
Speaker
Noble Raveendran, Principal Consultant, Oracle
[Sirius Day Eindhoven 2018] ASML's MDE Going SiriusObeo
Talk done by Wilbert Alberts (ASML) at Sirius Day Eindhoven:
ASML is the world's leading provider of lithography systems for the semiconductor industry. Such systems are controlled by more than 20 million lines of code. To improve the efficiency and quality of its software development process, ASML is using, amongst others, model-driven-engineering and associated tools and techniques.
Recently, subsystems are being developed according to an architecture pattern that separates Data, Control and Algorithms (DCA). To support this pattern, the ASML software architecture group is working towards a SW development environment (ASOME). This environment consists of a set of modeling languages and associated editors that allow specification of (sub)systems according to this DCA pattern. Furthermore, it contains model-to-model transformations to (COTS) analysis tools (e.g. model checkers) and model-to-text transformations to generate (parts of) the implementation.
In this presentation, I will briefly introduce ASML and the kind of (software) systems that we develop. Some aspects of the DCA architectural pattern, the languages that we are developing, and the associated Sirius-based editors will be presented. For the Data part, a DSL and editor have been developed allowing the definition of various kinds of datatypes, from which various kinds of repositories can be generated supporting clone-based or reference-based data, modifiable and read-only entities, etc. In order to support the Control aspect, a language and editor have been defined that allow specification of interfaces and their realization based on state machines. A system editor allows decomposition of a system into subsystems while allowing delegation of incoming requests to internal parts. The editors are mostly Sirius-based graphical editors, where the created models are persisted textually using Xtext.
The presentation will focus on sharing some of our experiences with both the development and deployment of products based on Sirius technology. Building the ASOME environment imposes many challenges and I would like to conclude with some that specifically target the development of the front ends of this environment.
Data Warehousing, Data Mining, Data Marts, Data Cube, OLAP Operations, Introduction to Common Messaging System, Web Tier Deployment, Application Servers & Clustered Deployment, IBM Notes and IBM Domino
From Pipelines to Refineries: scaling big data applications with Tim HunterDatabricks
Big data tools are challenging to combine into a larger application: ironically, big data applications themselves do not tend to scale very well. These issues of integration and data management are only magnified by increasingly large volumes of data. Apache Spark provides strong building blocks for batch processes, streams and ad-hoc interactive analysis. However, users face challenges when putting together a single coherent pipeline that could involve hundreds of transformation steps, especially when confronted by the need of rapid iterations. This talk explores these issues through the lens of functional programming. It presents an experimental framework that provides full-pipeline guarantees by introducing more laziness to Apache Spark. This framework allows transformations to be seamlessly composed and alleviates common issues, thanks to whole program checks, auto-caching, and aggressive computation parallelization and reuse.
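The experimental framework from this talk is not shown here, but its core idea, composing transformations lazily and auto-caching results so a repeated pipeline is computed only once, can be sketched in toy form. All names below are invented for illustration and are not the talk's actual API:

```python
from functools import reduce

class LazyPipeline:
    """Toy illustration of lazy, composable transformations with
    auto-caching: nothing executes until run(), and each distinct
    (input, steps) pair is computed only once."""
    _cache = {}

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def map(self, fn):
        # Composition just records the step; no work happens yet.
        return LazyPipeline(self.steps + (fn,))

    def run(self, data):
        key = (tuple(data), self.steps)
        if key not in LazyPipeline._cache:  # auto-caching on first run
            LazyPipeline._cache[key] = reduce(
                lambda acc, fn: [fn(x) for x in acc], self.steps, list(data))
        return LazyPipeline._cache[key]

p = LazyPipeline().map(lambda x: x + 1).map(lambda x: x * 2)
print(p.run([1, 2, 3]))  # [4, 6, 8]
```

Spark's own RDDs and DataFrames are already lazy in this sense; the talk's contribution is extending that laziness across the whole pipeline to enable whole-program checks and computation reuse.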
This introduction to graph databases is specifically designed for Enterprise Architects who need to map business requirements to architectural components like graph databases. It explains how and why graphs matter for Enterprise Architecture and reviews the architectural differences between relational and graph models.
Similar to AnDSummit2020 Session Pattern Analysis Data Model
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Adjusting primitives for graph: SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms like PageRank commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
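As a quick illustration of the CSR layout these notes refer to (the experiments themselves target C++/OpenMP/CUDA; this Python sketch shows only the data structure):

```python
def edges_to_csr(num_vertices, edges):
    """Build a CSR (offsets, targets) pair from a directed edge list.
    targets[offsets[v]:offsets[v+1]] are the out-neighbors of vertex v."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    # Prefix-sum the degrees to get per-vertex offsets.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    # Scatter edge targets into their vertex's slot.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next write position per vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets

offsets, targets = edges_to_csr(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
print(offsets)  # [0, 2, 3, 3, 4]
print(targets)  # [1, 2, 2, 0]
```

Two flat arrays like this give contiguous, cache-friendly neighbor scans, which is why CSR is the usual starting point for the map/reduce primitive benchmarks listed below.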
Multiply with different modes (map)
1. Performance of sequential vs OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs OpenMP-based vector element sum.
2. Performance of memcpy-based vs in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).