How to Identify, Train or Become a Data Scientist



The Briefing Room with Neil Raden and Actian
Live Webcast Sept. 3, 2013

Respected research institutes keep reporting a shortage of data scientists, which is unsurprising for a job title this new. But most business analysts and serious data managers already have at least some of the training the role requires. And any number of curious, diligent professionals can learn to be data scientists if they get access to the right tools and education.

Register for this episode of The Briefing Room to hear veteran analyst Neil Raden of Hired Brains offer insights on how to identify the key characteristics of the data scientist role. He'll then explain how professionals can incrementally improve their data science skills. He'll be briefed by John Santaferraro of Actian, who will showcase his company's DataFlow Engine, which provides unprecedented visual access to highly complex data flows. Coupled with Actian's multiple analytics database technologies, this opens the door to whole new avenues of possible insights.

Published in: Technology, Business


  1. The Briefing Room: How to Identify, Train or Become a Data Scientist
  2. Twitter Tag: #briefr. The Briefing Room: Welcome. Host: Eric Kavanagh
  3. Mission
      • Reveal the essential characteristics of enterprise software, good and bad
      • Provide a forum for detailed analysis of today's innovative technologies
      • Give vendors a chance to explain their products to savvy analysts
      • Allow audience members to pose serious questions... and get answers!
  4. Topics This Month: ANALYTICS; October: DATA PROCESSING; November: DATA DISCOVERY & VISUALIZATION
  5. Analytics
  6. Analyst: Neil Raden. Neil Raden is the founder and Principal Analyst at Hired Brains Research. He is the co-author, with James Taylor, of “Smart (Enough) Systems: How To Deliver Competitive Advantage by Automating Hidden Decisions.” With 30 years of experience, he is a widely published writer, well-known speaker, analyst and consultant, and has personally designed and implemented dozens of large analytical applications in finance, marketing, distribution, logistics, actuarial, intelligence, scientific, statistical and consumer products. As an industry analyst, he has published over 40 white papers and hundreds of articles, blogs and research reports. He welcomes your comments and can be reached at
  7. Actian
      • Actian is a database and software development company.
      • Actian offers the ParAccel DataFlow Engine, a scalable parallel platform that provides visual access to complex data flows.
      • The DataFlow Engine is designed to reduce cluster complexity, manage multiple petabytes of data, and scale with the size and dimensionality of the data.
  8. Guest: John Santaferraro. John Santaferraro is the Vice President of Product Marketing at Actian. Prior to joining Actian, Santaferraro was an independent industry analyst in the business intelligence and analytics market. Before that, he developed and executed a vertical market strategy for Hewlett Packard's BI group, focusing on energy, communications, retail, healthcare and financial services; he was also instrumental in helping establish HP’s new BI business group with a combination of solutions, products and consulting. In 2000, John founded a marketing and sales consulting company, Ferraro Consulting, providing business acceleration strategy for technology companies.
  9. Enabling the Business Scientist. John Santaferraro, Vice President of Marketing, ParAccel Platform Group. September 3, 2013
  10. • What is a “business scientist”?
      • Requirements of a “business scientist”
      • Tools of a “business scientist”
      • Creating a culture of “business science”
      © 2013 Actian Corporation
  11. The “Moneyball” Effect: Analytics Go Mainstream
      • Major League Baseball: hire the best team
      • NSA and Big Data: ???????????????
      • Target and Pregnancy: predicting pregnancies
  12. What is a Data Scientist?
  13. What is a Data Scientist? A data scientist “…incorporates varying elements and builds on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products.” [Figure created by Calvin Andrus, 13 July 2012, depicting a mash-up of the disciplines from which data science is derived.]
  14. What is a Business Scientist? “A business scientist is an expert in the science of business, sitting between the business analyst and the data scientist, pulling together cross-functional expertise from data science, analytics, business applications, business processes, and business strategy.”
  15. Business Science Practice Areas: Sales, Marketing, Supply Chain, Logistics, Finance, Human Resources, Risk, Fraud
  16. Business Science Skillset
      • Understand how analytics work
      • Understand emerging data types
      • Understand business operations & strategy
      • Learn quickly
      • Think outside the box
      • Tell compelling stories
  17. The Tools of the Business Scientist
      • Libraries of analytic functions that run at extreme speed: transformational, statistical, machine learning, clustering, and discovery analytics
      • Visual framework for data discovery, preparation and analytics: drag-and-drop interaction; libraries of data preparation operators; libraries of analytic operators; high-performance, parallel processing on Hadoop (or other file systems)
  18. ParAccel Platform: Unconstrained Analytics [Diagram labels: Business Intelligence and Reporting Tools; Advanced Analytics; Analytic Applications; Machine Data; Operational Data; 3rd Party Info Provider; Streaming Data; Logs; On-Demand Integration Services; Enterprise Data Warehouse; Hadoop; Big Data Apps; Embedded Analytics; In-Database Analytics]
  19. Accelerate Time to Value with Libraries of Analytic Functions
      • Statistical: standard deviation, correlation, covariance, etc.
      • Univariate: gamma distribution, Maxwell distribution, Weibull, etc.
      • Multivariate: normal copula, hypothesis testing, Gumbel copula, etc.
      • Data Mining: K-means, logistic regression, neural networks, etc.
      • Mathematical: trigonometric, permutation/combination, exponential/logarithm, etc.
      • Corporate Finance: present value analysis, stock valuation, asset valuation, etc.
      • Options/Derivatives: risk-neutral valuation (with or without Black-Scholes), Greeks, etc.
      • Portfolio Management: currency/cross-currency derivatives, Merton models, etc.
      • Fixed Income: price and yield, duration, convexity, etc.
      • Time Series Analysis: ARMA/ARIMA models, ARCH/GARCH models, regime switching, etc.
      • 100+ SQL, window, and mathematical functions pre-loaded
      • 500+ advanced analytic functions available for purchase
  20. Business Analyst to Business Scientist: Unconstrained Analytics. Load and go; run ad hoc queries; query any time; query any data; query all data; run any analytics; execute sophisticated analytics; return results quickly; iterate quickly through discovery; share workloads with any platform; support all analysts; run many applications; create analytic services.
  21. ParAccel Dataflow & Hadoop Analytics: a visual framework for high-performance data provisioning, ETL, and analytics on Hadoop (or other file systems) without any knowledge of MapReduce or parallel programming
      • Capabilities: on-demand integration; data and application integration; in-flight preparation; in-Hadoop preparation; dataflow optimizations; Hadoop optimizations; in-Hadoop analytics; non-Hadoop analytics
      • [Diagram labels: Connect, Prepare, Optimize, Analyze; Data, Value; Enterprise, Social, New Data, Applications, DW, www, Mobile, Machine Data; Business Intelligence, Analytics; High-Performance BI, High-Performance Analytics]
  22. ParAccel Dataflow Designer: a single UI for data preparation and advanced analytics
  23. Dataflow Operator Libraries
  24. ParAccel Platform in Action [Diagram labels: HDFS; ParAccel Platform; Read, Prepare, Analyze, Write]
  25. Creating a Culture for Business Science
      • Create educational opportunities
      • Provide incentives for participants
      • Reorganize to support business science
      • Deploy infrastructure to support analytics
  26. Contact me at… 408.373.7500. Visit Actian at…
  27. Perceptions & Questions. Analyst: Neil Raden
  28. Analytic Types and Roles
      Neil Raden, Founder, Hired Brains Research
      Twitter: NeilRaden
      Blog: http://
      Website: http://
      Mail:
      LinkedIn: http://
      Copyright 2013 Neil Raden and Hired Brains Research LLC
  29. No More Managing from Scarcity
  30. Even Big Data Doesn’t Speak for Itself: Not a Crystal Ball
      • Incomplete
      • Behaviors under-represented
      • Anonymizing disasters
      • Selection
      • ML still needs an analyst
  31. Anscombe’s Quartet: four small data sets that share the same summary statistics
      • Mean of x = 9
      • Variance of x = 11
      • Mean of y = 7.50
      • Variance of y = 4.122
      • Correlation between x and y = 0.816
      • Linear regression line: y = 3.00 + 0.500x
  32. Analytic Types (Types of Analysis)
      Type | Descriptive Title | Quantitative Sophistication/Numeracy | Sample Roles
      I | Quantitative R&D | PhD or equivalent | Creation of theory, development of algorithms. Academic/research. Work in business/government for very specialized roles.
      II | Data Scientist or Quantitative Analyst | Advanced math/stat, not necessarily PhD | Internal expert in statistical and mathematical modelling and development, with solid business domain knowledge.
      III | Operational Analytics | Good business domain; background in statistics optional | Running and managing analytical models. Strong skills in and/or project management of analytical systems implementation.
      IV | Business Intelligence/Discovery | Data and numbers oriented, but no special advanced statistical skills | Reporting, dashboards, OLAP and visualization, some design, posterior analysis of results from quantitative methods. Spreadsheets, “business discovery tools”.
  33. Questions
      • How would you describe the difference between a data scientist and a business scientist?
      • What tools are needed to support a business analyst?
      • What’s the career path for a business analyst?
      • Is big data suffering from hype?
  34. Questions (continued)
      • Why do you think people are afraid of math?
      • Should universities prepare people for business science, or should industry?
  36. Upcoming Topics: September: ANALYTICS; October: DATA PROCESSING; November: DATA DISCOVERY & VISUALIZATION
  37. Thank You for Your Attention
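
Slide 19 lists price and yield, duration and convexity among the pre-built fixed-income functions but does not show what they compute. As a rough illustration only, and not Actian's or ParAccel's implementation, the Python sketch below prices a bond from its yield and derives Macaulay and modified duration. The helper name `bond_metrics`, annual coupons, a flat yield, and valuation on a coupon date are all assumptions made for this example.

```python
def bond_metrics(face, coupon_rate, yield_rate, years):
    """Price, Macaulay duration and modified duration of a bond.

    Hypothetical helper for illustration: assumes annual coupons,
    a flat yield, and a valuation date that falls on a coupon date.
    """
    coupon = face * coupon_rate
    price = 0.0
    weighted_time = 0.0
    for t in range(1, years + 1):
        # The final payment includes the return of face value.
        cash_flow = coupon + (face if t == years else 0.0)
        pv = cash_flow / (1 + yield_rate) ** t  # present value of this payment
        price += pv
        weighted_time += t * pv  # payment time weighted by its present value
    macaulay = weighted_time / price  # PV-weighted average time to payment, in years
    modified = macaulay / (1 + yield_rate)  # approx. % price change per unit change in yield
    return price, macaulay, modified


price, mac, mod = bond_metrics(face=1000, coupon_rate=0.05, yield_rate=0.05, years=10)
print(f"price={price:.2f} macaulay={mac:.3f} modified={mod:.3f}")
```

Two sanity checks follow from the definitions: a bond whose coupon rate equals its yield prices at face value, and a zero-coupon bond's Macaulay duration equals its maturity.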
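
Slides 21 to 24 describe ParAccel Dataflow as a visual framework in which read, prepare, analyze and write operators are wired together without MapReduce or parallel programming. The deck does not show the product's operator API. The sketch below is therefore only a generic, single-process Python illustration of the operator-chaining idea, not the product's interface. The operator names, the `run_flow` helper and the `region,amount` record format are hypothetical.

```python
from collections import defaultdict
from functools import reduce


# Each operator takes an iterable of records and returns a new iterable,
# so operators can be chained like nodes in a linear data-flow graph.
def read(lines):
    """Parse hypothetical 'region,amount' text lines into (region, amount) pairs."""
    for line in lines:
        region, amount = line.split(",")
        yield region.strip(), amount.strip()


def prepare(records):
    """Drop records whose amount is missing or non-numeric; convert amounts to float."""
    for region, amount in records:
        try:
            yield region, float(amount)
        except ValueError:
            continue


def analyze(records):
    """Aggregate the total amount per region, sorted by region name."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return sorted(totals.items())


def write(rows):
    """Format aggregated rows as 'region,total' output lines."""
    return [f"{region},{total}" for region, total in rows]


def run_flow(source, *operators):
    """Apply the operators left to right, feeding each one the previous output."""
    return reduce(lambda data, op: op(data), operators, source)


raw = ["east, 10", "west, 5", "east, 2.5", "west, n/a", "north, 7"]
result = run_flow(raw, read, prepare, analyze, write)
print(result)
```

Because each operator only consumes and produces a stream of records, a real dataflow engine could in principle partition and parallelize the stages. This sketch does none of that and simply runs them in sequence.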
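
Slide 31 lists the summary statistics that Anscombe's four data sets share. The Python sketch below recomputes them using only the standard library. The slide does not reproduce the data points, so the values are taken from F. J. Anscombe's 1973 paper "Graphs in Statistical Analysis" and should be treated as a transcription from that source. The four sets have the same means, variances, correlation and regression line but very different shapes when plotted, which is the slide's point: even big data does not speak for itself.

```python
import math

# Anscombe's quartet, transcribed from Anscombe (1973).
# Data sets I-III share the same x values; data set IV has its own.
X_COMMON = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_COMMON, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_COMMON, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_COMMON, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics shown on the slide for one data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # ordinary least-squares slope
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance (n - 1 denominator)
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / math.sqrt(sxx * syy),  # Pearson correlation coefficient
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: mean x={s['mean_x']:.2f} var x={s['var_x']:.2f} "
              f"mean y={s['mean_y']:.2f} var y={s['var_y']:.3f} "
              f"r={s['corr']:.3f} y={s['intercept']:.2f}+{s['slope']:.3f}x")
```

Printing the table alone would suggest four interchangeable data sets. Plotting the points shows one roughly linear cloud, one clean curve, one line with a single outlier, and one vertical stack pulled by a single high-leverage point.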