AnalyticOps - Chicago PAW 2016
1. Best Practices for Deploying Analytic Models into Operations
Robert L. Grossman
Open Data Group and University of Chicago
Predictive Analytics World Chicago, June 21, 2016
rgrossman.com @bobgrossman
2. Life Cycle of an Analytic Model
• Select analytic problem & approach.
• Get and clean the data; exploratory data analysis.
• Build model in dev/modeling environment.
• Deploy model in operational systems with scoring application.
• Monitor performance and employ champion-challenger methodology.
• Retire model and deploy improved model.
• Scale up deployment.
[Diagram: a cycle spanning a modeling environment (analytic modeling) and a deployment environment (analytic operations), with performance data flowing back from deployment to modeling.]
3. Differences Between the Modeling and Deployment Environments
• Typically, modelers use specialized languages such as SAS, SPSS or R.
• Usually, developers responsible for products and services use languages such as Java, JavaScript, Python, C++, etc.
• This can result in significant effort and significant delays moving the model from the modeling environment to the deployment environment.
4. Joe, IT: "Would you mind writing all your models in Java?"
Alice, Data Scientist: "I write all my models in R, why don't you do the same?"
Bob, Data Scientist: "I write all my models in scikit-learn, why don't you do the same?"
5. Ways to Deploy Models into Products/Services/Operations
• Push code.
• Embed a model into a product or service.
• Export and import tables of scores.
• Export and import tables of parameters.
• Have the product/service interact with the model as a web or message service.
• Import the models into a database.
How quickly can the model be updated?
• Model parameters?
• New features?
• New pre- & post-processing?
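As a minimal sketch of the "export and import tables of parameters" approach (the field names and coefficient values below are invented for illustration, not from the talk), the modeling environment reduces a fitted model to a parameter table, and the deployment environment reads the table and scores records against it:

```python
import json
import math

# Modeling environment: reduce the fitted model to a table of parameters.
# (Coefficient values are illustrative, not from a real fit.)
params_out = {"intercept": -1.2, "coefficients": {"age": 0.03, "balance": 0.0004}}
with open("model_params.json", "w") as f:
    json.dump(params_out, f)

# Deployment environment: import the table and score incoming records.
with open("model_params.json") as f:
    params = json.load(f)

def score(record):
    """Logistic score computed from the imported parameter table."""
    z = params["intercept"] + sum(
        params["coefficients"][name] * value for name, value in record.items()
    )
    return 1.0 / (1.0 + math.exp(-z))

print(round(score({"age": 45, "balance": 1500.0}), 3))  # 0.679
```

Because only the table crosses the boundary, updating model parameters does not require redeploying the scoring code; adding new features or new pre-/post-processing still would.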
7. What is an Analytic Engine?
• An analytic engine is a component, integrated into products or enterprise IT, that deploys analytic models in operational workflows for products and services.
• A Model Interchange Format is a format that supports the exporting of a model by one application and the importing of a model by another application.
• Model Interchange Formats include the Predictive Model Markup Language (PMML), the Portable Format for Analytics (PFA), and various in-house or custom formats.
• Analytic engines are integrated once, but allow applications to update models as quickly as reading a model interchange format file.
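The "integrate once, update by reading a file" behavior can be sketched with a toy engine that hot-reloads a made-up JSON parameter file (this stands in for a real interchange format such as PMML or PFA; the class and field names are invented):

```python
import json
import os

class AnalyticEngine:
    """Toy analytic engine: integrated into an application once, it
    reloads model parameters whenever the interchange file changes."""

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.model = None
        self.reload_if_changed()

    def reload_if_changed(self):
        mtime = os.path.getmtime(self.path)
        if mtime != self.mtime:  # model updated without redeploying code
            with open(self.path) as f:
                self.model = json.load(f)
            self.mtime = mtime

    def score(self, x):
        self.reload_if_changed()
        # A linear model stands in for whatever the file describes.
        return self.model["intercept"] + self.model["slope"] * x

# The modeling side writes a new interchange file...
with open("model.json", "w") as f:
    json.dump({"intercept": 1.0, "slope": 2.0}, f)

# ...and the engine picks it up on the next score.
engine = AnalyticEngine("model.json")
print(engine.score(3.0))  # 7.0
```

The application integrates the engine once; thereafter, updating the model is a file write, not a code deployment.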
9. PMML Philosophy
• PMML is an XML specification of a model, not an implementation of a model.
• PMML provides a simple means of binding parameters to values for an agreed-upon set of data mining models & transformations, in a safe way.
11. PFA Philosophy
• Define primitives for data transformations, data aggregations, and statistical and analytic models.
• Support composition of data mining primitives (which makes it easy to specify machine learning algorithms and pre-/post-processing of data).
• Be extensible.
• Designed to be "safe" to deploy in enterprise IT operational environments.
• This is a philosophy that is different from, and complementary to, the Predictive Model Markup Language (PMML).
12. PFA Case Study 1
• 20+ person data science group developing models in R, Python, Scikit-learn, MATLAB, ...
• All the data scientists export their models in PFA.
• The company's product imports models in PFA and runs on their customers' data as required.
[Diagram: widget records flow in; the modeling side exports PFA, the product imports PFA, and widget scores flow out.]
13. PFA Functionality
• PFA codes arbitrary mathematical algorithms in a tightly controlled environment.
• PFA has all the standard flow control of a programming language: if/then/else & for/while loops.
• PFA has function calls and function callbacks.
• PFA has algebraic data types.
• PFA is encoded as function calls in JSON: {function: [arg1, arg2, …, argn]}
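The JSON encoding of function calls can be illustrated with a toy evaluator. This is not PFA itself (real PFA has a type system and a large function library); the operator names here are invented for illustration:

```python
import json

# Each expression is {function_name: [args...]}; literals are plain numbers.
OPS = {
    "+": lambda args: args[0] + args[1],
    "*": lambda args: args[0] * args[1],
    "max": lambda args: max(args),
}

def evaluate(expr):
    """Recursively evaluate a JSON-encoded function-call expression."""
    if isinstance(expr, (int, float)):
        return expr
    (fname, args), = expr.items()
    return OPS[fname]([evaluate(a) for a in args])

# {"*": [{"+": [1, 2]}, 4]} encodes (1 + 2) * 4
doc = json.loads('{"*": [{"+": [1, 2]}, 4]}')
print(evaluate(doc))  # 12
```

Because the whole expression is data, it can be shipped, inspected, and sandboxed by the consuming application, which is the point of the encoding.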
14. Benefits of PFA
• PFA is based upon JSON and Avro and integrates easily into modern big data environments.
• PFA allows models to be easily chained and composed.
• PFA allows developers and users of analytic systems to pre-process inputs and to post-process outputs to models.
• PFA is easily integrated with Hadoop, Spark, etc.
• PFA is easily integrated with Kafka, Storm, Akka and other streaming environments.
• PFA can be used to integrate multiple tools and applications within an analytic ecosystem.
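The chaining and pre-/post-processing benefits amount to function composition; a minimal sketch (the three stages below are invented placeholders, not PFA primitives):

```python
def preprocess(record):
    # Pre-process the input, e.g. normalize a raw field.
    return {"x": record["raw_x"] / 100.0}

def model(features):
    # Stand-in for a scored model.
    return 0.5 + 0.3 * features["x"]

def postprocess(score):
    # Post-process the output, e.g. threshold the score into a decision.
    return "accept" if score > 0.6 else "reject"

def compose(*stages):
    """Chain stages so each one's output feeds the next."""
    def pipeline(value):
        for stage in stages:
            value = stage(value)
        return value
    return pipeline

score_record = compose(preprocess, model, postprocess)
print(score_record({"raw_x": 50}))  # accept
```

In PFA the analogous chain is expressed in the interchange document itself, so the pre- and post-processing travel with the model rather than living in application code.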
17. PFA Case Study 2
• Two teams of data scientists develop analytic models for an adversarial analytics project.
• Models developed in Hadoop and exported in PFA every 4 weeks.
• Models updated in client systems every 2 weeks.
• It's not quite this simple, but that's the general idea.
[Diagram: event records flow in and event scores flow out, with PFA exported and imported at each update; the overlapping cadences are weeks 1-4, 5-8, …; weeks 3-6, 7-10, …; and updates at weeks 4, 6, 8, 10, ….]
20. Gaussian Process Model (3 of 5)
input: {type: array, items: double}
output: {type: array, items: double}
cells:
table:
type:
{type: array, items: {type: record, name: GP, fields: [
- {name: x, type: {type: array, items: double}}
- {name: to, type: {type: array, items: double}}
- {name: sigma, type: {type: array, items: double}}]}}
init:
- {x: [ 0, 0], to: [0.01870587, 0.96812508], sigma: [0.2, 0.2]}
- {x: [ 0, 36], to: [0.00242101, 0.95369720], sigma: [0.2, 0.2]}
- {x: [ 0, 72], to: [0.13131668, 0.53822666], sigma: [0.2, 0.2]}
...
- {x: [324, 324], to: [-0.6815587, 0.82271760], sigma: [0.2, 0.2]}
action:
model.reg.gaussianProcess:
- input
- {cell: table}
- null
- {fcn: m.kernel.rbf, fill: {gamma: 2.0}}
The cell holds the type (also Avro) and value (as JSON, truncated) of the Gaussian Process model parameters.
Source: dmg.org/pfa
21. Gaussian Process Model (4 of 5)
input: {type: array, items: double}
output: {type: array, items: double}
cells:
table:
type:
{type: array, items: {type: record, name: GP, fields: [
- {name: x, type: {type: array, items: double}}
- {name: to, type: {type: array, items: double}}
- {name: sigma, type: {type: array, items: double}}]}}
init:
- {x: [ 0, 0], to: [0.01870587, 0.96812508], sigma: [0.2, 0.2]}
- {x: [ 0, 36], to: [0.00242101, 0.95369720], sigma: [0.2, 0.2]}
- {x: [ 0, 72], to: [0.13131668, 0.53822666], sigma: [0.2, 0.2]}
...
- {x: [324, 324], to: [-0.6815587, 0.82271760], sigma: [0.2, 0.2]}
action:
model.reg.gaussianProcess:
- input
- {cell: table}
- null
- {fcn: m.kernel.rbf, fill: {gamma: 2.0}}
Calling method: parameters expressed as JSON.
• input: get interpolation point from input.
• {cell: table}: get parameters from table.
• null: no explicit Kriging weight (universal).
• {fcn: …}: kernel function.
Source: dmg.org/pfa
22. Gaussian Process Model (5 of 5)
• Appears declarative, but this is a function call.
  - The fourth parameter is another function: m.kernel.rbf (radial basis kernel, a.k.a. squared exponential).
  - m.kernel.rbf was intended for SVMs, but is reusable anywhere.
  - One argument (gamma) is preapplied so that it fits the signature for model.reg.gaussianProcess.
• Any kernel function could be used, including user-defined functions written with PFA "code."
• The Gaussian Process could be used anywhere, even as a pre-processing or post-processing step.
model.reg.gaussianProcess:
- input
- {cell: table}
- null
- {fcn: m.kernel.rbf, fill: {gamma: 2.0}}
Source: dmg.org/pfa
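The fill form above preapplies gamma so the kernel matches the two-argument signature the Gaussian Process expects; the same idea in Python is functools.partial. A minimal sketch (the rbf definition below is the standard squared-exponential form, written out for illustration):

```python
import math
from functools import partial

def rbf(x, y, gamma):
    """Radial basis (squared exponential) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Preapply gamma=2.0, analogous to {fcn: m.kernel.rbf, fill: {gamma: 2.0}}:
kernel = partial(rbf, gamma=2.0)

# The resulting two-argument kernel fits anywhere a k(x, y) is expected.
print(kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0
```

Preapplying the hyperparameter keeps the kernel reusable: the same rbf serves SVMs, Gaussian Processes, or any other consumer that expects k(x, y).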
23. Case Study 4: AnalyticOps for the Genomic Data Commons*
• Genomics dataset: 2.5+ PB consisting of 577,878 files about 14,052 cases (patients), in 42 cancer types, across 29 primary sites.
• 2.5+ PB of cancer genomics data + Bionimbus data commons technology running multiple community-developed variant calling pipelines.
• Over 12,000 cores and 10 PB of raw storage in 18+ racks running for months.
*Source: Based in part on "The Genomic Data Commons", the GDC team, ms. in preparation.
24. DevOps
• Virtualization and the requirement for massive scale-out spawned infrastructure automation ("infrastructure as code").
• The requirement for reducing the time to deploy code created tools for continuous integration and testing.
25. ModelDev / AnalyticOps
• Use virtualization/containers, infrastructure automation and scale-out to support large-scale analytics.
• Requirement: reduce the time and cost to do high-quality analytics over large amounts of data.
26. Software Development + Quality Assurance + Operations → DevOps
The goal of DevOps is to establish a culture and an environment where building, testing, releasing, and operating software can happen rapidly, frequently, and more reliably.*
*Adapted from Wikipedia, en.wikipedia.org/wiki/DevOps.
27. Analytic Workflows + Quality Assurance + Analytic Operations → AnalyticOps
The goal of AnalyticOps is to establish a culture and an environment where building, validating, deploying, and running analytic models happen rapidly, frequently, and reliably.
• Software
• Model
• Data
Building the right analytic model. Is the analytic model running right?
Source: Robert L. Grossman, A Quick Introduction to AnalyticOps.
28. Ten Factors Affecting AnalyticOps
• Model quality (confusion matrix)
• Data quality (six dimensions)
• Lack of ground truth
• Software errors
• Monitoring system
• Quality of workflow and scheduling
• Bottlenecks, stragglers, hot spots, etc.
• Analytic configuration problems
• System failures
• Human errors
Source: Based in part on "The Genomic Data Commons", the GDC team, ms. in preparation.
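For the first factor, model quality, a binary confusion matrix is just a tally of (actual, predicted) pairs; a minimal sketch (the labels below are made up for illustration):

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Tally (actual, predicted) label pairs for a binary classifier."""
    return Counter(zip(actual, predicted))

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
print("TP:", cm[(1, 1)], "FP:", cm[(0, 1)],
      "FN:", cm[(1, 0)], "TN:", cm[(0, 0)])
```

Tracking these four counts over time, rather than a single accuracy number, is what lets a monitoring system distinguish a model that has started over-flagging from one that has started missing cases.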
29. Summary
• Deploying analytic models is a core technical competency.
• The Portable Format for Analytics (PFA) is a model interchange format for building analytic models in one environment and deploying them in another.
• PFA is based upon data mining primitives & supports pre-processing, common analytic models, post-processing, & composition of primitives and models.
• It is easy to add your own PFA functions and models.
• There is a reference implementation & compliance tests.
• PFA is being developed by the not-for-profit DMG.
• A discipline of AnalyticOps is emerging that supports more complex analytic flows at greater scale.
30. Questions?
For more information about PFA, see: dmg.org/pfa
rgrossman.com
@BobGrossman