1. Big Data Conference 2013:
Analytics and Applications for Federal Big Data
Data Tactics Corp: A Blended Approach to Big Data Analytics
Richard Heimann, Data Scientist at Data Tactics Corporation
2. Data Tactics Analytics Practice
The Team:
(Nathan D., Shrayes R., David P., Adam VE., Geoffrey B., Rich H.)
Graduates from top universities...
Advanced degrees include: mathematics, computer science, astrophysics, electrical engineering, mechanical engineering, statistics, social sciences.
Base competencies (horizontals): clustering, association rules, regression, naive Bayesian classifier, decision trees, time-series, text analysis.
Going beyond the base (verticals)...
3. [Figure: word cloud of vertical analytic techniques]
Horizontals & Verticals
Clustering || Regression || Decision Trees || Text Analysis
Association Rules || Naive Bayesian Classifier || Time Series Analysis
4. Data Tactics Analytics Practice
Hierarchy of Data Scientists
5. Why Analytics [Business]???
Why are analytics important?
(Business, Analytics, Practical)
"We need to stop reinventing the cloud
and start using it!"
(Dave Boyd)
6. Why Analytics [Analytics]???
Why are analytics important?
(Business, Analytics, Practical)
No Free Lunch (NFL): no algorithm performs better than any other when
performance is averaged uniformly over all possible problems of a
particular type. It follows that algorithms must be designed for a
particular domain or style of problem, and that there is no such thing
as a general-purpose algorithm.
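The NFL statement above can be checked by brute force in one small supervised-learning setting. This is a minimal R sketch, not a proof. It enumerates every binary labeling of a five-point domain, trains on three points, and scores two learners on the two held-out points. The learners (`majority_learner`, `always_one_learner`) and the train/test split are assumptions made up for illustration.

```r
# Enumerate every binary target function on a 5-point domain (2^5 = 32 rows).
n_points <- 5
targets <- as.matrix(expand.grid(rep(list(0:1), n_points)))

train_idx <- 1:3  # points the learner gets to see
test_idx <- 4:5   # held-out (off-training-set) points

# Two deliberately different, hypothetical learners.
majority_learner <- function(train_labels, n_test) {
  # Predict the majority training label everywhere (no ties with 3 labels).
  rep(as.integer(mean(train_labels) > 0.5), n_test)
}
always_one_learner <- function(train_labels, n_test) {
  # Ignore the training data entirely.
  rep(1L, n_test)
}

# Held-out accuracy, averaged uniformly over all 32 target functions.
average_ots_accuracy <- function(learner) {
  acc <- apply(targets, 1, function(f) {
    pred <- learner(f[train_idx], length(test_idx))
    mean(pred == f[test_idx])
  })
  mean(acc)
}

acc_majority <- average_ots_accuracy(majority_learner)
acc_always_one <- average_ots_accuracy(always_one_learner)
print(c(majority = acc_majority, always_one = acc_always_one))
```

Both averages come out to 0.5. Once every labeling is weighted equally, the held-out labels carry no relationship to the training labels, so a learner only gains an edge when the set of plausible problems is narrowed to a domain.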
7. Why Analytics [Practical]???
[Figure: N plotted against t for Academic Publications Scale, Web Scales, and IC Scales]
If this guy doesn’t scale, none of us do.
8. algo to users > algo to data
Development
Deployment
Machine
User
Parallel
Distributed
Objective
Subjective
M/R
HDFS
Valid
Useful
MPP
SOA
Nontrivial
Novel
Accurate
Comprehensible
GPU
9. Shiny
Open-sourced by RStudio in November 2012
Not the first to wrap R in the browser, but perhaps the easiest for R developers
Don’t need to know HTML, CSS, and JavaScript to get started
Reactive programming model
WebSockets for communication
10. server.R
library(shiny)

# Define server logic required to generate and plot a random
# distribution
shinyServer(function(input, output) {

  # Expression that generates a plot of the distribution.
  # renderPlot:
  #
  #  1: Is "reactive" and will therefore be automatically
  #     re-executed when inputs change.
  #  2: Its output type is a plot.

  output$distPlot <- renderPlot({

    # Draw input$obs values from a standard normal and plot a histogram
    dist <- rnorm(input$obs)
    hist(dist)
  })
})
11. ui.R
library(shiny)

# Define UI for application that plots random distributions
shinyUI(pageWithSidebar(

  # Application title:
  headerPanel("My Shiny App!"),

  # Sidebar with a slider input for number of observations:
  sidebarPanel(
    sliderInput("obs",
                "Number of observations:",
                min = 0,
                max = 1000,
                value = 500)
  ),

  # Show a plot of the generated distribution:
  mainPanel(
    plotOutput("distPlot")
  )
))
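The reactive programming model mentioned on the Shiny slide can be made a little more explicit with a reactive expression shared by two outputs. The sketch below is an illustration, not part of the original app. It assumes a recent Shiny release and uses the single-file `shinyApp()` form instead of the separate server.R and ui.R files. The names `draws` and `distSummary` are hypothetical.

```r
library(shiny)

ui <- pageWithSidebar(
  headerPanel("Reactive sketch"),
  sidebarPanel(
    sliderInput("obs", "Number of observations:",
                min = 1, max = 1000, value = 500)
  ),
  mainPanel(
    plotOutput("distPlot"),
    verbatimTextOutput("distSummary")
  )
)

server <- function(input, output) {
  # Reactive expression: recomputed only when input$obs changes.
  # Its cached value is reused by every output that calls draws().
  draws <- reactive({
    rnorm(input$obs)
  })

  # Both outputs depend on draws(), so they always describe the same sample.
  output$distPlot <- renderPlot({
    hist(draws())
  })
  output$distSummary <- renderPrint({
    summary(draws())
  })
}

# Build the app object; call runApp(app) to launch it in a browser.
app <- shinyApp(ui = ui, server = server)
```

Moving the slider invalidates `draws`, which in turn invalidates both outputs. Neither output needs to know about the other.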
12. ui.R
headerPanel()
sidebarPanel()
mainPanel()
13. server.R + ui.R = microscope
adjustable parameters (knobs): 0 < knobs < small k
knobs = lighting, varying objectives, focusing (fine and coarse)
knobs:
fine and coarse filtering:
geography
time
variable of interest
observations of interest
promote significant (objective) patterns
change model parameters
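The filtering knobs listed above can be sketched as arguments to one plain R function, which a server.R could call with values taken from `input`. Everything here is an assumption for illustration: the `events` data frame, its column names, and the `apply_knobs` helper are made up and do not come from the talk.

```r
# Hypothetical observations; columns are invented for illustration.
events <- data.frame(
  region = c("north", "north", "south", "south", "east"),
  date = as.Date(c("2013-01-05", "2013-02-10", "2013-02-15",
                   "2013-03-01", "2013-03-20")),
  volume = c(12, 30, 7, 45, 19),
  stringsAsFactors = FALSE
)

# Each argument is one knob: geography, time, and a threshold on the
# variable of interest. In a Shiny app these would come from input$...
apply_knobs <- function(data, regions, date_range, min_volume = 0) {
  keep <- data$region %in% regions &
    data$date >= date_range[1] &
    data$date <= date_range[2] &
    data$volume >= min_volume
  data[keep, , drop = FALSE]
}

# Coarse setting: two regions, the whole first quarter.
coarse <- apply_knobs(events, c("north", "south"),
                      as.Date(c("2013-01-01", "2013-03-31")))

# Fine setting: one region, one month, with a volume floor.
fine <- apply_knobs(events, "south",
                    as.Date(c("2013-02-01", "2013-02-28")),
                    min_volume = 5)
```

Keeping the number of arguments small mirrors the slide's constraint that the number of knobs stays below some small k.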
14. BDE + Shiny
15. Overlapping Solutions
Multiple models allow more nuanced learning from data.
Convergent results serve as cross-validation.
Points of divergence provide additional insights and allow models to be calibrated further.
Different models can provide answers to different questions, or answers to the same question for different analysts.
Multi-method analysis suits diverse teams with mutable missions.
smooth + rough = data
A new paradigm in which the question “Are there multiple, overlapping ways to solve this problem?” dominates.
[Figure: Latent Spatial Traffic Patterns, with labels 1, 2, 3]
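The ideas of convergence, divergence, and "smooth + rough = data" can be tried on a toy case. The sketch below fits two different models to R's built-in `cars` dataset, which is an assumption chosen for convenience and is not the traffic data from the talk. It then looks at where the two fits agree and where they part ways.

```r
# Built-in dataset: stopping distance (dist) versus speed.
data(cars)

# Model A: one global straight line. Model B: local regression (loess).
fit_linear <- lm(dist ~ speed, data = cars)
fit_local <- loess(dist ~ speed, data = cars)

smooth_linear <- as.numeric(fitted(fit_linear))
smooth_local <- as.numeric(fitted(fit_local))

# The rough is whatever the smooth leaves behind, so smooth + rough = data.
rough_local <- cars$dist - smooth_local

# Convergence: how closely the two smooths track each other overall.
agreement <- cor(smooth_linear, smooth_local)

# Divergence: the observations where the two models disagree the most.
gap <- abs(smooth_linear - smooth_local)
most_divergent <- cars[order(gap, decreasing = TRUE)[1:5], ]

print(agreement)
print(most_divergent)
```

High agreement lends some support to both fits. The rows in `most_divergent` are the natural places to look when deciding which model needs recalibrating.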
16. Overlapping Solutions
Are there multiple, overlapping ways to solve this problem?
[Figure: overlapping Analytic A, Analytic B, and Analytic C, with overlaps labeled A+B and A+B+C]
19. Data Science for Government (DS4G)
About (DS4G):
1: Improve on definitions of analytics.
2: Outline optimal interactions with Data Scientists.
3: Provide a life-cycle for Data Science.
4: Most importantly, share a taxonomy to identify analytical questions one could ask of data (Causal Effects, Classification, Outlier Detection, Big Data and Analytics, Measurement Models, and Text Analysis).
Presented by Data Tactics Analytics Team
Location: TBD
Time: 1Q 2014
Duration: ~ 5 hrs.
Cost: FREE
Audience: Government managers and Data Tactics partners with their
customers.