A Blended Approach to Analytics at Data Tactics Corporation


Slides from Big Data and Analytics for the Federal Government


  • 1. Big Data Conference 2013: Analytics and Applications for Federal Big Data. Data Tactics Corp: A Blended Approach to Big Data Analytics. Richard Heimann, Data Scientist at Data Tactics Corporation
  • 2. Data Tactics Analytics Practice. The Team: (Nathan D., Shrayes R., David P., Adam VE., Geoffrey B., Rich H.) Graduates from top universities... Advanced degrees include: mathematics, computer science, astrophysics, electrical engineering, mechanical engineering, statistics, social sciences. Base competencies (horizontals): clustering, association rules, regression, naive Bayesian classifier, decision trees, time series, text analysis. Going beyond the base (verticals)...
  • 3. Horizontals & Verticals [word cloud of techniques; individual terms not recoverable from the extraction]. Clustering || Regression || Decision Trees || Text Analysis || Association Rules || Naive Bayesian Classifier || Time Series Analysis
  • 4. Data Tactics Analytics Practice Hierarchy of Data Scientists
  • 5. Why Analytics [Business]??? Why are analytics important? (Business, Analytics, Practical) "We need to stop reinventing the cloud and start using it!" (Dave Boyd)
  • 6. Why Analytics [Analytics]??? Why are analytics important? (Business, Analytics, Practical) No Free Lunch (NFL): no algorithm performs better than any other when performance is averaged uniformly over all possible problems of a particular type. Algorithms must therefore be designed for a particular domain or style of problem; there is no such thing as a general-purpose algorithm.
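The No Free Lunch statement on slide 6 can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not a proof and not part of the original slides: it enumerates every possible binary labeling of four points, trains two hypothetical learners (`majority_learner` and `contrarian_learner`, names invented here) on the first two points, and averages their accuracy on the two unseen points.

```r
# Toy illustration of the No Free Lunch idea (a sketch, not a proof).
# Domain: 4 input points; the first 2 are "training", the last 2 are unseen.
n_points  <- 4
train_idx <- 1:2
test_idx  <- 3:4

# Enumerate all 2^4 = 16 possible binary labelings (target functions).
targets <- as.matrix(expand.grid(rep(list(0:1), n_points)))

# Hypothetical learner 1: predict the most common training label (ties -> 1).
majority_learner <- function(train_labels) {
  as.integer(mean(train_labels) >= 0.5)
}

# Hypothetical learner 2: predict the opposite of learner 1.
contrarian_learner <- function(train_labels) {
  1L - majority_learner(train_labels)
}

# Accuracy of a learner on the unseen points of one target function.
ots_accuracy <- function(learner, target) {
  prediction <- learner(target[train_idx])
  mean(target[test_idx] == prediction)
}

# Average uniformly over every possible target function.
avg_majority <- mean(apply(targets, 1, function(target_row) {
  ots_accuracy(majority_learner, target_row)
}))
avg_contrarian <- mean(apply(targets, 1, function(target_row) {
  ots_accuracy(contrarian_learner, target_row)
}))

print(c(majority = avg_majority, contrarian = avg_contrarian))
```

Both averages come out to 0.5: once every labeling is treated as equally likely, the "sensible" learner and its opposite are indistinguishable. An algorithm only gains an edge when the problems it meets are not uniformly distributed, which is the case for designing it around a domain.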
  • 7. Why Analytics [Practical]??? Academic Publications Scale. Web Scales. IC Scales. If this guy doesn’t scale - none of us do.
  • 8. algo to users > algo to data. Development | Deployment; Machine | User; Parallel | Distributed; Objective | Subjective; M/R | HDFS; Valid | Useful; MPP | SOA; Nontrivial | Novel; Accurate | Comprehensible; GPU
  • 9. Shiny. Open sourced by RStudio in November 2012. Not the first to wrap R in the browser, but perhaps the easiest for R developers. Don’t need to know HTML, CSS and JavaScript to get started. Reactive programming model. WebSockets for communication.
  • 10. server.R
      library(shiny)

      # Define server logic required to generate and plot a random
      # distribution
      shinyServer(function(input, output) {

        # Expression that generates a plot of the distribution.
        # renderPlot:
        #
        #  1: Is "reactive" and will therefore be automatically
        #     re-executed when inputs change.
        #  2: Its output type is a plot.

        output$distPlot <- renderPlot({

          # generate an rnorm distribution and plot it
          dist <- rnorm(input$obs)
          hist(dist)
        })
      })
  • 11. ui.R
      library(shiny)

      # Define UI for application that plots random distributions
      shinyUI(pageWithSidebar(

        # Application title:
        headerPanel("My Shiny App!"),

        # Sidebar with a slider input for number of observations:
        sidebarPanel(
          sliderInput("obs",
                      "Number of observations:",
                      min = 1,  # at least 1, so hist() always has data to plot
                      max = 1000,
                      value = 500)
        ),

        # Show a plot of the generated distribution:
        mainPanel(
          plotOutput("distPlot")
        )
      ))
  • 12. ui.R layout: headerPanel(), sidebarPanel(), mainPanel()
  • 13. server.R + ui.R = microscope. Adjustable parameters (knobs): 0 < knobs < small k. Microscope knobs = lighting, varying objectives, focusing (fine and coarse). Analytic knobs, fine and coarse: filtering by geography, time, variable of interest, and observations of interest; promoting significant (objective) patterns; changing model parameters.
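One way to read the "knobs" on slide 13 is as the arguments of a filtering function that a Shiny server could call with its `input$...` values. The sketch below uses base R only. The `events` data frame, its column names, and the `filter_events` helper are all invented for illustration and do not come from the talk.

```r
# Hypothetical toy data: column names and values are invented for illustration.
events <- data.frame(
  region = c("north", "south", "north", "east", "south"),
  day    = as.Date(c("2013-01-05", "2013-01-20", "2013-02-10",
                     "2013-02-15", "2013-03-01")),
  value  = c(10, 25, 7, 40, 18),
  stringsAsFactors = FALSE
)

# Each argument is one "knob": geography, time window, and a value threshold.
filter_events <- function(data, regions, from, to, min_value = 0) {
  keep <- data$region %in% regions &
    data$day >= from & data$day <= to &
    data$value >= min_value
  data[keep, , drop = FALSE]
}

# Turn the knobs: two regions, January through February, values of 8 or more.
subset_north_south <- filter_events(
  events,
  regions   = c("north", "south"),
  from      = as.Date("2013-01-01"),
  to        = as.Date("2013-02-28"),
  min_value = 8
)
print(subset_north_south)
```

Keeping the number of arguments small mirrors the slide's constraint that the number of knobs stays below some small k: each one would map to a single slider, date range, or checkbox group in the UI.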
  • 14. BDE + Shiny
  • 15. Overlapping Solutions [figure: Latent Spatial Traffic Patterns, panels 1-3]. Multiple models allow more nuanced learning from data. Convergent results serve as cross-validation. Points of divergence provide additional insights and allow models to be calibrated further. Different models can provide answers to different questions, or answers to the same question for different analysts. A multi-method approach suits diverse teams with mutable missions. smooth + rough = data. A new paradigm in which the question “Are there multiple, overlapping ways to solve this problem?” dominates.
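The ideas on slide 15 (convergence as a check, divergence as a source of insight, and smooth + rough = data) can be sketched with two models fitted to the same data. This is an illustration only: the built-in `cars` dataset and the choice of `lm` and `loess` are stand-ins assumed here, not the models or data used in the talk.

```r
# Fit two different models to the same data (built-in `cars` dataset).
fit_linear <- lm(dist ~ speed, data = cars)
fit_local  <- loess(dist ~ speed, data = cars)

pred_linear <- unname(predict(fit_linear))
pred_local  <- unname(predict(fit_local))

# Convergence: how strongly the two sets of fitted values agree.
agreement <- cor(pred_linear, pred_local)

# Divergence: observations where the two models disagree the most.
divergence     <- abs(pred_linear - pred_local)
most_divergent <- cars[order(divergence, decreasing = TRUE)[1:3], ]

# smooth + rough = data: fitted values plus residuals recover the response.
smooth        <- pred_linear
rough         <- unname(residuals(fit_linear))
reconstructed <- smooth + rough

print(agreement)
print(most_divergent)
print(isTRUE(all.equal(reconstructed, cars$dist)))
```

High agreement lends some confidence to both fits, while the rows in `most_divergent` mark where a straight line and a local smoother tell different stories and where further calibration might be worthwhile.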
  • 16. Overlapping Solutions. Are there multiple, overlapping ways to solve this problem? [Venn diagram: Analytic A, Analytic B, Analytic C, with overlaps A+B, A+C, B+C, and A+B+C]
  • 17. Summary:
      # our blended approach
      dt.philosophy <- lm(analytics ~ bigdata + smalldata + objective +, data=data)
  • 18. Overlapping Solutions
  • 19. Data Science for Government (DS4G). About (DS4G): 1: Improve on definitions of analytics. 2: Outline optimal interactions with Data Scientists. 3: Provide a life-cycle for Data Science. 4: Most importantly, share a taxonomy to identify analytical questions one could ask of data (Causal Effects, Classification, Outlier Detection, Big Data and Analytics, Measurement Models, & Text Analysis). Presented by Data Tactics Analytics Team. Location: TBD. Time: 1Q 2014. Duration: ~5 hrs. Cost: FREE. Audience: Government managers and Data Tactics partners with their customers.
  • 20. LUBAP goes wild! 421 attending!
  • 21. Thank you... Questions? Homepage: Blog: Twitter: @DataTactics Slideshare: Or, me (Rich Heimann):