My presentation about building predictive analytics and machine learning solutions, presented using a number of real-world projects that I've worked on over the past couple of years.
Predictive analytics: Mining gold and creating valuable products
1. Predictive Analytics in Oracle: Mining the Gold & Creating Valuable Products
Brendan Tierney
www.oralytics.com
t: @brendantierney
e: brendan.tierney@oralytics.com
2.
§ Data Warehousing since 1997
§ Data Mining since 1998
§ Analytics since 1993
3. Big Data – Example Applications
Not all of these are using Hadoop or require Hadoop or ….. !
4.
How to approach a Data Science Project
“Find me something interesting in my data” is a question from hell. Analytics should be guided by business goals.
Focus hard on Business Question (and the relevant variables) that captures the essence of the question.
Before you can measure something you really need to lay down a very concrete definition of what you’re measuring.
5.
Be Specific in Problem Statement
Columns: Poorly Defined / Better / Data Mining Technique
Predict employees that leave
• Based on past employees that voluntarily left:
• Create New Attribute EmplTurnover → 0/1
Predict customers that churn
• Based on past customers that have churned:
• Create New Attribute Churn → YES/NO
Target “best” customers
• Recency, Frequency, Monetary (RFM) Analysis
• Specific Dollar Amount over Time Window:
• Who has spent $500+ in most recent 18 months
How can I make more $$?
• What helps me sell soft drinks & coffee?
Which customers are likely to buy?
• How much is each customer likely to spend?
Who are my “best customers”?
• What descriptive “rules” describe “best customers”?
How can I combat fraud?
• Which transactions are the most anomalous?
• Then roll-up to physician, claimant, employee, etc.
How are you going to measure the results? What are the evaluation metrics?
How are you going to use models & results?
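A minimal sketch of what "being specific" looks like once it reaches the data: the vague goals become a derived 0/1 target attribute and an explicit spend-over-time-window rule. The records, field layout and function names below are hypothetical and chosen only for illustration; the thresholds ($500+, 18 months) are the ones quoted on the slide.

```python
from datetime import date

# Hypothetical purchase history: (customer_id, purchase_date, amount_usd).
purchases = [
    ("C1", date(2016, 3, 10), 320.0),
    ("C1", date(2016, 11, 2), 260.0),
    ("C2", date(2014, 1, 15), 900.0),
    ("C3", date(2016, 9, 30), 120.0),
]

# Hypothetical employee records: (employee_id, leave_reason or None if still employed).
employees = [("E1", None), ("E2", "voluntary"), ("E3", "redundancy")]


def months_between(earlier, later):
    """Whole calendar months from earlier to later."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def best_customers(rows, as_of, min_spend=500.0, window_months=18):
    """Customers whose total spend inside the time window meets the threshold."""
    totals = {}
    for customer_id, purchased_on, amount in rows:
        if 0 <= months_between(purchased_on, as_of) < window_months:
            totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return sorted(c for c, total in totals.items() if total >= min_spend)


def turnover_flag(leave_reason):
    """EmplTurnover target attribute: 1 for a voluntary leaver, 0 otherwise."""
    return 1 if leave_reason == "voluntary" else 0


as_of = date(2016, 12, 31)
best = best_customers(purchases, as_of)
targets = {emp_id: turnover_flag(reason) for emp_id, reason in employees}
```

The point of the sketch is that every choice (what counts as "voluntary", which date the window is measured from, whether the threshold is inclusive) has to be written down before any model is built.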
6.
I’ve got all this data; can you “mine” it and find useful insights?
7.
Define the Question
Why is this Topic Important (any sub topics/areas)
What has been done before
What are the evaluation measures?
What is the relevant data (what data is accessible and what is not)
Define techniques to use.
How are you going to use the results. Out of lab and into Architecture
How Frequently are you going to revisit/update
Predictive Analytics Requirements Gathering (6-10 day exercise)
9.
Define what data is relevant to the Question/Specific Problem
– What data is easily available now
– What data is not easily available now
– What data do you not have, not captured etc.
What is the relevant data (what data is accessible and what is not)
Your Data
https://hbr.org/2016/11/you-dont-need-big-data-you-need-the-right-data
10.
All of the following are Real projects
§ on Real data
§ using Real products
§ on Real business problems
§ Are full cycle implementations & in production
Most Data Science / Predictive Analytics stories you hear about are very limited
§ Many only exist on paper, in a test lab/environment, on a presentation, etc
11.
Fraud Detection
§ I’m not allowed to talk about what I did
§ But
12.
Insurance Fraud
Insurers discovered a total of 118,500 false claims were made, equivalent to 2,279 a week.
§ Using OAA to assess each Claim as it is received
– Identify possibility of it being a Claim
– Identify possible Claim Amount
– Measure of Risk Exposure: Used to manage work flow and priority
§ Works in conjunction with other Fraud prevention measures
§ Supports Claim Risk Exposure measures
– Various regulatory, group and share holder requirements on Risk Exposure
13.
Retail Banking Fraud
§ OAA being used to monitor retail banking transactions
– Identify unusual patterns in transactions
– Identify unusual patterns on accounts
– Identify unusual patterns between branches
– Identify unusual Staff behavior
§ Working with existing Fraud Detection methods to give
– Additional insights
– Near real-time monitoring
– Working within their existing Information Architecture
§ Near Real-time Fraud Prevention measures
– Previous/Current Fraud investigation is next day or next week
– Now can identify intra-day, Fraud teams act quicker, etc
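To give a feel for what "identify unusual patterns in transactions" and rolling the result up to accounts can mean, here is a small sketch. It is not the OAA approach used on the project (which is not described in the slides); it swaps in a simple median/MAD outlier score in plain Python. The transactions, the account ids and the cut-off of 5.0 are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median

# Hypothetical transactions: (account_id, amount). Not data from the project.
transactions = [
    ("A1", 20.0), ("A1", 25.0), ("A2", 22.0), ("A2", 24.0),
    ("A3", 21.0), ("A3", 950.0), ("A1", 23.0),
]


def anomaly_scores(amounts):
    """Distance of each amount from the median, in units of the median absolute deviation."""
    centre = median(amounts)
    mad = median([abs(a - centre) for a in amounts])
    if mad == 0:
        # All amounts (or most of them) are identical: nothing stands out.
        return [0.0] * len(amounts)
    return [abs(a - centre) / mad for a in amounts]


THRESHOLD = 5.0  # assumed cut-off; in practice this would be tuned with the fraud team

amounts = [amount for _, amount in transactions]
scores = anomaly_scores(amounts)

# Roll the flagged transactions up to account level.
flagged_by_account = dict(Counter(
    account
    for (account, _), score in zip(transactions, scores)
    if score > THRESHOLD
))
```

Scoring each transaction first and aggregating afterwards keeps the option open to roll up by branch or staff member using the same scores.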
14.
An Post
§ An Post has made innovative use of Oracle’s business intelligence and data warehousing systems to deliver efficiencies across a range of areas, including HR, mail processing and quality of service.
§ Oracle’s business intelligence suite foundation Edition, which provides same-day view of cash flow through an easy-to-use dashboard
• Near Real-time Fraud Prevention measures
• Previous/Current Fraud investigation is next day or next week
• Now can identify intra-day, Fraud teams act quicker, etc
• Identify unusual patterns in transactions
• Identify unusual patterns on accounts
• Identify unusual patterns between branches
• Identify unusual Staff behavior
15.
Higher Education
Example: Higher Education Student Retention
Funding Model of Universities in the UK
How can we maximise our Student Retention and increase our funding
Can we manage our Student selection process better?
16.
OBIEE Dashboard
[Dashboard screenshot; callouts: E-Learner, Female, Science “unknown”, Poor data quality]
17.
What about the Money ££££
What was the original problem?
We want to reduce the number of students who withdraw early from their courses (student churn) and increase our funding (revenue)
Did we achieve this?
Typical student churn of 2,300 per year x £10K = (£23,000,000)
82% success = £18,860,000 of potential revenue gain
Implemented using a mixture of
– Provide better/additional student support
– Be more selective with making offers
– Restructure Courses
– Look at how courses are advertised and entry requirements
You can imagine how much this would save across all the 133 Universities in UK > £1b
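The arithmetic behind these figures, written out as a short sketch. The inputs are the numbers quoted on the slide; the variable names are mine, and the last line is only a sanity check of the ">£1b" remark (it works out the average gain per university that would be needed, roughly £7.5 million, and does not assume every university matches this project).

```python
# Figures quoted on the slide.
students_lost_per_year = 2_300
funding_per_student_gbp = 10_000
success_rate_percent = 82
uk_universities = 133

# Funding lost each year to early withdrawals.
revenue_at_risk_gbp = students_lost_per_year * funding_per_student_gbp

# Share of that funding recovered at the quoted success rate.
potential_gain_gbp = revenue_at_risk_gbp * success_rate_percent // 100

# Average gain each university would need for the UK-wide total to reach £1b.
gain_needed_per_university_gbp = 1_000_000_000 / uk_universities

print(f"At risk: £{revenue_at_risk_gbp:,}  Potential gain: £{potential_gain_gbp:,}")
```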
18.
HCM Analytics
§ An Oxford Economics report (Feb 2014) estimates the average cost of replacing an employee is £30,614:
– The “cost of lost output” whilst replacement employees get up to speed
– The “logistical cost” of recruiting and absorbing a new worker
§ Average employee turnover rate in the UK is approx 15%
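Putting the two figures together gives a feel for the size of the problem. A rough worked example, assuming a hypothetical organisation of 1,000 employees (the headcount is my assumption, not a number from the report):

```python
# Figures quoted on the slide.
cost_per_leaver_gbp = 30_614
turnover_rate_percent = 15

# Hypothetical headcount, chosen only for illustration.
headcount = 1_000

leavers_per_year = headcount * turnover_rate_percent // 100
annual_turnover_cost_gbp = leavers_per_year * cost_per_leaver_gbp

print(f"{leavers_per_year} leavers x £{cost_per_leaver_gbp:,} = £{annual_turnover_cost_gbp:,}")
```

Under these assumptions the organisation loses 150 people a year at a cost of about £4.6 million, which is the budget any retention model is competing for.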
19.
HCM Analytics
§ Oracle HCM
20.
HCM Analytics
§ Major World-Wide Financial Institution
§ >300K employees
§ Employee Churn. It is all about the money? or promotions? Right?
§ Not what we discovered
– Employee engagement, training, support, regulatory requirements, staff requirements, etc.
– 93% accuracy
– 68% of these had monetary reward indicators
Sometimes we discover trends that are not expected.
Un-biased trends
Can be difficult to accept
21.
Merchandising Management of Outlets
§ Supply Chain Management
§ Ensuring your products are on the shelves in the outlets
§ Limited Staff to visit outlet: Who should they be targeting?
§ Learn from the Past
§ >85% accuracy
11th Feb, 2015: Out of stock considered supply chain problem. Problem is 'not on shelf'. 'Out back' no good, according to consumers @dunnhumby #BASummit2015
Holland, Belgium, Spain, Eastern Canada
22.
Customer Churn Management
§ Mobile Phone Companies
§ Why is Customer Churn management important?
– It costs a lot more to recruit a new customer than to keep an existing one.
– You do not want to target all possible churners.
– High value customers: How do you determine high value?
– 69%-75% accuracy
§ Social Network Analysis
– How big is your Social Network
– How valuable is your Social Network
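One way to read "you do not want to target all possible churners" is to rank customers by what the business stands to lose rather than by churn likelihood alone. The sketch below assumes a churn model has already produced a probability per customer; the customers, probabilities, values and the retention budget are all hypothetical, and expected loss (probability x value) is just one possible definition of "high value".

```python
# Hypothetical scored customers: churn probability from some model, plus annual value.
customers = [
    {"id": "C1", "churn_probability": 0.80, "annual_value": 100.0},
    {"id": "C2", "churn_probability": 0.30, "annual_value": 2000.0},
    {"id": "C3", "churn_probability": 0.75, "annual_value": 900.0},
    {"id": "C4", "churn_probability": 0.05, "annual_value": 5000.0},
]


def expected_loss(customer):
    """Value at risk: chance of churning times what the customer is worth per year."""
    return customer["churn_probability"] * customer["annual_value"]


def retention_targets(rows, budget):
    """Ids of the `budget` customers with the highest expected loss."""
    ranked = sorted(rows, key=expected_loss, reverse=True)
    return [row["id"] for row in ranked[:budget]]


# With capacity to contact two customers, the most likely churner (C1) is not chosen.
targets = retention_targets(customers, budget=2)
```

Here C1 is the most likely to leave but is worth little, so the two contacts go to C3 and C2 instead.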
23.
Customer Sentiment
§ Tracking customer Sentiment – Call Centre & Customer retention
– Part of Customer Churn management
– Combined with other Predictive Analytics methods
– Ensemble Data Mining/Predictive Analytics
§ Can we predict what timeframe they might churn?
– Is this Big Data?
• Most of this processing is done on a Laptop/Desktop
26.
Is Advanced Analytics for you?
§ If you have Data then YES
§ You don’t need to have Big Data to do Advanced Analytics
§ You don’t need to hire PhDs or Data Scientists
§ You can do Advanced Analytics on the data you have.
§ Do you have any historical data?
§ Use what data you have available
– As new data becomes available you can add these in
27.
Everyone only talks up to this point
Nobody talks about Deployment
Or what happens after Deployment
28.
Create Valuable Products
✓
✗
29.
brendan.tierney@oralytics.com
@brendantierney
www.oralytics.com
ie.linkedin.com/in/brendantierney
30.
Word Cloud of the Oracle Advanced Analytics web-pages
http://www.oralytics.com/2015/01/creating-word-cloud-of-oracle-oaa.html