A Blended Approach to Analytics at Data Tactics Corporation


Slides from Big Data and Analytics for the Federal Government

Published in: Technology, Education

  1. 1. Big Data Conference 2013: Analytics and Applications for Federal Big Data. Data Tactics Corp: A Blended Approach to Big Data Analytics. Richard Heimann, Data Scientist at Data Tactics Corporation
  2. 2. Data Tactics Analytics Practice. The Team: (Nathan D., Shrayes R., David P., Adam VE., Geoffrey B., Rich H.) Graduates from top universities... Advanced degrees include: mathematics, computer science, astrophysics, electrical engineering, mechanical engineering, statistics, social sciences. Base competencies (horizontals): clustering, association rules, regression, naive Bayesian classifier, decision trees, time series, text analysis. Going beyond the base (verticals)...
  3. 3. Horizontals & Verticals [figure: word cloud of vertical techniques; text not recoverable]. Horizontals: Clustering || Regression || Decision Trees || Text Analysis || Association Rules || Naive Bayesian Classifier || Time Series Analysis
  4. 4. Data Tactics Analytics Practice Hierarchy of Data Scientists
  5. 5. Why Analytics [Business]??? Why are analytics important? (Business, Analytics, Practical) "We need to stop reinventing the cloud and start using it!" (Dave Boyd)
  6. 6. Why Analytics [Analytics]??? Why are analytics important? (Business, Analytics, Practical) No Free Lunch (NFL): no algorithm performs better than any other when performance is averaged uniformly over all possible problems of a particular type. It follows that algorithms must be designed for a particular domain or style of problem, and that there is no such thing as a general-purpose algorithm.
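The No Free Lunch statement on slide 6 can be checked by brute force on a toy problem. The sketch below is an illustration only and is not taken from the deck: it assumes three binary features, an arbitrary split into five training inputs and three held-out inputs, and two made-up learners (`always_zero` and `majority`). All of these names and choices are hypothetical.

```r
# All 2^3 = 8 possible inputs built from three binary features.
inputs <- as.matrix(expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1))
n <- nrow(inputs)

# Every possible target function: one row per labeling of the 8 inputs (256 rows).
targets <- as.matrix(expand.grid(rep(list(0:1), n)))

# Assumed split: the learner sees labels for inputs 1-5 and is scored on 6-8.
train_idx <- 1:5
test_idx  <- 6:8

# Two deliberately different (hypothetical) learners.
# Each maps the training labels to predictions for the held-out inputs.
always_zero <- function(y_train) rep(0, length(test_idx))
majority    <- function(y_train) rep(as.integer(mean(y_train) > 0.5), length(test_idx))

# Held-out accuracy averaged uniformly over all possible target functions.
avg_accuracy <- function(learner) {
  acc <- apply(targets, 1, function(f) {
    pred <- learner(f[train_idx])
    mean(pred == f[test_idx])
  })
  mean(acc)
}

nfl_always_zero <- avg_accuracy(always_zero)
nfl_majority    <- avg_accuracy(majority)
print(c(always_zero = nfl_always_zero, majority = nfl_majority))
```

Averaged over all 256 target functions, both learners land at 0.5 on the held-out inputs. A learner only pulls ahead once the set of plausible targets is narrowed to a particular domain, which is the point the slide makes about designing algorithms for a style of problem.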
  7. 7. Why Analytics [Practical]??? Academic Publications Scale; Web Scales; IC Scales (figure axes: N, t). If this guy doesn't scale, none of us do.
  8. 8. algo to users > algo to data. Development | Deployment: Machine | User; Parallel | Distributed; Objective | Subjective; M/R | HDFS; Valid | Useful; MPP | SOA; Nontrivial | Novel; Accurate | Comprehensible; GPU
  9. 9. Shiny: open-sourced by RStudio in November 2012. Not the first to wrap R in the browser, but perhaps the easiest for R developers. No need to know HTML, CSS, and JavaScript to get started. Reactive programming model. WebSockets for communication.
  10. 10. server.R
      library(shiny)

      # Define server logic required to generate and plot a random
      # distribution
      shinyServer(function(input, output) {

        # Expression that generates a plot of the distribution.
        # renderPlot:
        #
        #  1: Is "reactive" and will therefore be automatically
        #     re-executed when inputs change.
        #  2: Its output type is a plot.

        output$distPlot <- renderPlot({

          # Generate an rnorm distribution and plot it
          dist <- rnorm(input$obs)
          hist(dist)
        })
      })
  11. 11. ui.R
      library(shiny)

      # Define UI for application that plots random distributions
      shinyUI(pageWithSidebar(

        # Application title:
        headerPanel("My Shiny App!"),

        # Sidebar with a slider input for number of observations
        # (min is 1 so that hist() always has data to plot):
        sidebarPanel(
          sliderInput("obs",
                      "Number of observations:",
                      min = 1,
                      max = 1000,
                      value = 500)
        ),

        # Show a plot of the generated distribution:
        mainPanel(
          plotOutput("distPlot")
        )
      ))
  12. 12. ui.R headerPanel() sidebarPanel() mainPanel()
  13. 13. server.R + ui.R = microscope. Adjustable parameters (knobs): 0 < knobs < small k. Knobs = lighting, varying objectives, focusing (fine and coarse). Knobs (fine and coarse): filtering by geography, time, variable of interest, and observations of interest; promoting significant (objective) patterns; changing model parameters.
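The "knobs" idea on slide 13 might look like the following in Shiny. This is a minimal sketch under stated assumptions, not the app shown in the talk: the built-in `mtcars` dataset stands in for real data, `filter_obs` is a hypothetical helper, and the three knobs (variable of interest, observations of interest, one model parameter) are illustrative picks. The Shiny wiring is built only when the shiny package is installed.

```r
# Hypothetical helper: keep the rows whose `variable` falls inside `range`.
filter_obs <- function(data, variable, range) {
  keep <- data[[variable]] >= range[1] & data[[variable]] <= range[2]
  data[keep, , drop = FALSE]
}

# Build the app only if shiny is available (assumption: it may not be installed).
if (requireNamespace("shiny", quietly = TRUE)) {
  library(shiny)

  ui <- pageWithSidebar(
    headerPanel("Knobs on a microscope"),
    sidebarPanel(
      # Knob 1: variable of interest
      selectInput("variable", "Variable of interest:",
                  choices = c("mpg", "hp", "wt")),
      # Knob 2: observations of interest (filter on horsepower)
      sliderInput("range", "Horsepower range:",
                  min = 50, max = 350, value = c(50, 350)),
      # Knob 3: a model parameter (suggested number of histogram bins)
      sliderInput("bins", "Histogram bins:", min = 2, max = 20, value = 8)
    ),
    mainPanel(plotOutput("hist"))
  )

  server <- function(input, output) {
    # Reactive expression: re-evaluated whenever the range knob changes.
    selected <- reactive(filter_obs(mtcars, "hp", input$range))

    output$hist <- renderPlot({
      x <- selected()[[input$variable]]
      req(length(x) > 0)  # skip plotting when the filter leaves no rows
      hist(x, breaks = input$bins,
           main = input$variable, xlab = input$variable)
    })
  }

  # Assigning the app object does not launch it.
  app <- shinyApp(ui = ui, server = server)
}
```

In an interactive session, `runApp(app)` would start the app. Keeping the filter in a plain function outside the server keeps the number of knobs small and lets the filtering logic be checked without a browser.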
  14. 14. BDE + Shiny
  15. 15. Overlapping Solutions [figure: Latent Spatial Traffic Patterns, panels 1-3]. Multiple models allow more nuanced learning from data. Convergent results serve as cross-validation. Points of divergence provide additional insights and allow models to be calibrated further. Different models can provide answers to different questions, or answers to the same question for different analysts. A multi-method approach suits diverse teams with mutable missions. smooth + rough = data. A new paradigm in which the question "Are there multiple, overlapping ways to solve this problem?" dominates.
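Slide 15's points about convergence, divergence, and "smooth + rough = data" can be sketched with two models fitted to the same data. Everything specific here is an assumption made for illustration: the built-in `cars` dataset, and a linear regression and a loess smoother as the two overlapping methods. Neither is claimed to be what the Data Tactics team used.

```r
# Two overlapping models for the same question: stopping distance vs. speed.
fit_lm <- lm(dist ~ speed, data = cars)      # global linear trend
fit_lo <- loess(dist ~ speed, data = cars)   # local, flexible smoother

# smooth + rough = data: fitted values plus residuals reproduce the observations.
smooth <- fitted(fit_lm)
rough  <- residuals(fit_lm)
decomposition_ok <- isTRUE(all.equal(unname(smooth + rough), cars$dist))

# Convergence: how closely the two sets of fitted values agree overall.
agreement <- cor(fitted(fit_lm), fitted(fit_lo))

# Divergence: the observations where the two models disagree most,
# which are candidates for a closer look or for recalibrating a model.
divergence <- abs(fitted(fit_lm) - fitted(fit_lo))
most_divergent <- cars[order(divergence, decreasing = TRUE)[1:3], ]

print(decomposition_ok)
print(agreement)
print(most_divergent)
```

High agreement acts as an informal cross-check between the methods, while the rows in `most_divergent` show where one model captures structure the other smooths away.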
  16. 16. Overlapping Solutions. Are there multiple, overlapping ways to solve this problem? [figure: Venn diagram of Analytic A, Analytic B, and Analytic C, with overlaps A+B, A+C, B+C, and A+B+C]
  17. 17. Summary:
      # our blended approach
      dt.philosophy <- lm(analytics ~ bigdata + smalldata + objective +, data=data)
  18. 18. Overlapping Solutions
  19. 19. Data Science for Government (DS4G). About DS4G: 1: Improve on definitions of analytics. 2: Outline optimal interactions with Data Scientists. 3: Provide a life-cycle for Data Science. 4: Most importantly, share a taxonomy to identify analytical questions one could ask of data (Causal Effects, Classification, Outlier Detection, Big Data and Analytics, Measurement Models, & Text Analysis). Presented by Data Tactics Analytics Team. Location: TBD. Time: 1Q 2014. Duration: ~5 hrs. Cost: FREE. Audience: Government managers and Data Tactics partners with their customers.
  20. 20. LUBAP goes wild! 421 attending!
  21. 21. Thank you... Questions? Homepage: Blog: Twitter: @DataTactics Slideshare: Or, me (Rich Heimann):