How to Identify, Train or Become a Data Scientist
 

The Briefing Room with Neil Raden and Actian
Live Webcast Sept. 3, 2013
Visit: www.insideanalysis.com

Respected research institutes keep saying we have a shortage of data scientists, which makes sense because the title is so new. But most business analysts and serious data managers already have at least some of the training necessary to fill this new role, and any number of curious, diligent professionals can learn to be data scientists if they have access to the right tools and education.

Register for this episode of The Briefing Room to hear veteran analyst Neil Raden of Hired Brains offer insights on how to identify the key characteristics of a data scientist role. He'll then explain how professionals can incrementally improve their data science skills. He'll be briefed by John Santaferraro of Actian, who will showcase his company's DataFlow Engine, which provides unprecedented visual access to highly complex data flows. This, coupled with Actian's multiple analytics database technologies, opens the door to whole new avenues of possible insights.

Presentation Transcript

  • The Briefing Room: How to Identify, Train or Become a Data Scientist
  • Twitter Tag: #briefr The Briefing Room. Welcome. Host: Eric Kavanagh (eric.kavanagh@bloorgroup.com)
  • Mission: reveal the essential characteristics of enterprise software, good and bad; provide a forum for detailed analysis of today's innovative technologies; give vendors a chance to explain their product to savvy analysts; allow audience members to pose serious questions... and get answers!
  • Topics: This Month: ANALYTICS; October: DATA PROCESSING; November: DATA DISCOVERY & VISUALIZATION
  • Analytics
  • Analyst: Neil Raden. Neil Raden is the founder and Principal Analyst at Hired Brains Research. He is the co-author, with James Taylor, of “Smart (Enough) Systems: How To Deliver Competitive Advantage by Automating Hidden Decisions.” With 30 years of experience, he is a widely published writer, well-known speaker, analyst and consultant, having personally designed and implemented dozens of large analytical applications in finance, marketing, distribution, logistics, actuarial, intelligence, scientific, statistical and consumer products. As an industry analyst, he has published over 40 white papers and hundreds of articles, blogs and research reports. He welcomes your comments and can be reached at nraden@hiredbrains.com.
  • Actian: Actian is a database and software development company. Actian offers the ParAccel DataFlow Engine, a scalable parallel platform that provides visual access to complex data flows. The DataFlow Engine is designed to reduce cluster complexity, manage multiple petabytes of data, and scale with the size and dimensionality of the data.
  • Guest: John Santaferraro. John Santaferraro is the Vice President of Product Marketing at Actian. Prior to joining Actian, Santaferraro was an independent industry analyst in the business intelligence and analytics market. Before that he developed and executed a vertical market strategy for Hewlett Packard's BI group, focusing on energy, communications, retail, healthcare and financial services; he was also instrumental in helping establish HP’s new BI business group with a combination of solutions, products and consulting. In 2000, John founded a marketing and sales consulting company, Ferraro Consulting, providing business acceleration strategy for technology companies.
  • Enabling the Business Scientist. John Santaferraro, Vice President of Marketing, ParAccel Platform Group. September 3, 2013
  • What is a “business scientist”? Requirements of a “business scientist”. Tools of a “business scientist”. Creating a culture of “business science”. © 2013 Actian Corporation
  • The “Moneyball” Effect § Analytics Go Mainstream •  Major League Baseball §  Hire the best team •  NSA and Big Data §  ??????????????? •  Target and Pregnancy §  Predicting pregnancies 11© 2013 Actian Corporation
  • What is a Data Scientist?
  • What is a Data Scientist? A data scientist “…incorporates varying elements and builds on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products.” (Diagram created by Calvin Andrus, depicting the mash-up of disciplines from which data science is derived, 13 July 2012. Source: http://en.wikipedia.org/wiki/Data_science)
  • What is a Business Scientist? “A business scientist is an expert in the science of business, sitting between the business analyst and the data scientist, pulling together cross-functional expertise from data science, analytics, business applications, business processes, and business strategy.”
  • Business Science Practice Areas: Sales, Marketing, Supply Chain, Logistics, Finance, Human Resources, Risk, Fraud
  • Business Science Skillset: understand how analytics work; understand emerging data types; understand business operations & strategy; learn quickly; think outside the box; tell compelling stories.
  • The Tools of the Business Scientist. Libraries of analytic functions that run at extreme speed: transformational, statistical, machine learning, clustering and discovery analytics. A visual framework for data discovery, preparation and analytics: drag-and-drop interaction; libraries of data preparation operators; libraries of analytic operators; high-performance, parallel processing on Hadoop (or other file systems).
  • ParAccel Platform – Unconstrained Analytics (diagram): Business Intelligence and Reporting Tools; Advanced Analytics; Analytic Applications; Machine Data; Operational Data; 3rd Party Info Provider; Streaming Data; Logs; On-Demand Integration; On-Demand Integration Services; Enterprise Data Warehouse; Hadoop; Big Data Apps; Embedded Analytics; In-Database Analytics
  • Accelerate Time to Value with Libraries of Analytic Functions
    Statistical: standard deviation, correlation, covariance, etc.
    Corporate Finance: present value analysis, stock valuation, asset valuation, etc.
    Univariate: gamma distribution, Maxwell distribution, Weibull, etc.
    Options / Derivatives: risk-neutral valuation (with/without Black-Scholes), Greeks, etc.
    Multivariate: normal copula, hypothesis testing, Gumbel copula, etc.
    Portfolio Management: currency / cross-currency derivatives, Merton models, etc.
    Data Mining: K-Means, logistic regression, neural networks, etc.
    Fixed Income: price and yield, duration, convexity, etc.
    Mathematical: trigonometric, permutation / combination, exponential / logarithm, etc.
    Time Series Analysis: ARMA / ARIMA models, ARCH/GARCH models, regime switching, etc.
    100+ SQL, window, and mathematical functions pre-loaded; 500+ advanced analytics available for purchase.
  • Business Analyst to Business Scientist: Unconstrained Analytics. Load and go; run ad hoc queries; query any time; query any data; query all data; run any analytics; execute sophisticated analytics; return results quickly; iterate quickly through discovery; share workloads with any platform; support all analysts; run many applications; create analytic services.
  • ParAccel Dataflow & Hadoop Analytics: a visual framework for high-performance data provisioning, ETL, and analytics on Hadoop (or other file systems) without any knowledge of MapReduce or parallel programming.
    Capabilities: on-demand integration; data and application integration; in-flight preparation; in-Hadoop preparation; dataflow optimizations; Hadoop optimizations; in-Hadoop analytics; non-Hadoop analytics.
    Diagram: Connect, Prepare, Analyze, Optimize (from DATA to VALUE), linking Business Intelligence, Analytics, Enterprise, Social, New Data Applications, DW, www, Mobile and Machine Data with High-Performance BI and High-Performance Analytics.
  • ParAccel Dataflow – Designer: single UI for data preparation and advanced analytics
  • Dataflow Operator Libraries
  • ParAccel Platform in Action (diagram): HDFS and the ParAccel Platform, connected by Read, Write, Prepare and Analyze steps
  • Creating a Culture for Business Science: create educational opportunities; provide incentives for participants; reorganize to support business science; deploy infrastructure to support analytics.
  • Contact me at… john.santaferraro@actian.com, 408.373.7500. Visit Actian at… www.actian.com
  • Perceptions & Questions. Analyst: Neil Raden
  • Analytic Types and Roles. Neil Raden, Founder, Hired Brains Research. Twitter: NeilRaden. Blog: http://hiredbrains.wordpress.com. Website: http://www.hiredbrains.com. Mail: nraden@hiredbrains.com. LinkedIn: http://www.linkedin.com/in/neilraden. Copyright 2013 Neil Raden and Hired Brains Research LLC
  • No More Managing from Scarcity
  • Even Big Data Doesn’t Speak for Itself. Not a crystal ball: incomplete; behaviors underrepresented; anonymizing disasters; selection; ML still needs an analyst.
  • Anscombe’s Quartet: four data sets that share the same summary statistics. Mean of x = 9; variance of x = 11; mean of y = 7.50; variance of y = 4.122; correlation between x and y = 0.816; linear regression line y = 3.00 + 0.500x.
  • Analytic Types: Types of Analysis
    Type I (Quantitative R&D). Quantitative sophistication/numeracy: PhD or equivalent. Sample roles: creation of theory, development of algorithms; academic/research; work in business/government for very specialized roles.
    Type II (Data Scientist or Quantitative Analyst). Quantitative sophistication/numeracy: advanced math/stat, not necessarily PhD. Sample roles: internal expert in statistical and mathematical modelling and development, with solid business domain knowledge.
    Type III (Operational Analytics). Quantitative sophistication/numeracy: good business domain, background in statistics optional. Sample roles: running and managing analytical models; strong skills in and/or project management of analytical systems implementation.
    Type IV (Business Intelligence/Discovery). Quantitative sophistication/numeracy: data and numbers oriented, but no special advanced statistical skills. Sample roles: reporting, dashboard, OLAP and visualization, some design, posterior analysis of results from quantitative methods; spreadsheets, “business discovery tools”.
  • Questions (Analytic Types): How would you describe the difference between a data scientist and a business scientist? What tools are needed to support a business analyst? What’s the career path for a business analyst? Is big data suffering from hype?
  • Questions (Analytic Types): Why do you think people are afraid of math? Should universities prepare people for business science, or should industry?
  • Upcoming Topics (www.insideanalysis.com): September: ANALYTICS; October: DATA PROCESSING; November: DATA DISCOVERY & VISUALIZATION
  • Thank You for Your Attention
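The tools slide lists clustering analytics, and the analytic-functions slide names K-Means under Data Mining. The slides do not show Actian's or ParAccel's actual API. As a rough illustration of what such a library function computes, here is a minimal pure-Python sketch of Lloyd's k-means. It assumes small in-memory points and, for reproducibility, uses the first k points as the initial centroids. The function name `kmeans` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate an assignment step and a centroid-update step."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Assumption: deterministic seeding (first k points) keeps the sketch reproducible.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    print(kmeans(data, 2))
```

Production libraries typically use smarter seeding, such as k-means++, and spread the assignment step across many workers. The simple seeding here only keeps the example easy to follow.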
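The Fixed Income entries on the analytic-functions slide (price and yield, duration) can be sketched in a few lines. This is a hypothetical illustration, not the library's implementation. It assumes annual coupons, a flat yield to maturity, and the textbook Macaulay and modified duration definitions. All function names are illustrative.

```python
from typing import List


def cash_flows(face: float, coupon_rate: float, years: int) -> List[float]:
    """Annual coupons, with the face value repaid alongside the final coupon."""
    if years < 1:
        raise ValueError("years must be at least 1")
    coupon = face * coupon_rate
    return [coupon] * (years - 1) + [coupon + face]


def bond_price(flows: List[float], ytm: float) -> float:
    """Present value of cash flows at t = 1..n years, discounted at a flat annual yield."""
    return sum(cf / (1 + ytm) ** t for t, cf in enumerate(flows, start=1))


def macaulay_duration(flows: List[float], ytm: float) -> float:
    """Present-value-weighted average time to each cash flow, in years."""
    price = bond_price(flows, ytm)
    weighted = sum(t * cf / (1 + ytm) ** t for t, cf in enumerate(flows, start=1))
    return weighted / price


def modified_duration(flows: List[float], ytm: float) -> float:
    """Approximate % price change per unit change in yield (annual compounding)."""
    return macaulay_duration(flows, ytm) / (1 + ytm)


if __name__ == "__main__":
    flows = cash_flows(face=100.0, coupon_rate=0.05, years=3)
    print(bond_price(flows, 0.05), macaulay_duration(flows, 0.05))
```

For a 3-year, 5% annual-coupon bond discounted at a 5% yield, the price comes out at par (100) and the Macaulay duration at roughly 2.86 years. A higher yield lowers the price, which modified duration approximates to first order.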
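The Dataflow & Hadoop slide promises parallel processing “without any knowledge of MapReduce or parallel programming.” The sketch below shows the kind of work such an engine hides. It is a hypothetical example that uses only the Python standard library, not Hadoop or the Dataflow API. Records are split into partitions, and each partition is aggregated independently (the map side). The partial results are then merged (the reduce side). Worker threads stand in for cluster nodes. They show the structure only, since CPython threads do not run CPU-bound work in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Record = Tuple[str, float]               # (key, value), e.g. (region, sale amount)
Partial = Dict[str, Tuple[int, float]]   # key -> (count, sum) for one partition


def partition(records: List[Record], n: int) -> List[List[Record]]:
    """Round-robin split; stands in for file blocks spread across a cluster."""
    return [records[i::n] for i in range(n)]


def partial_aggregate(part: List[Record]) -> Partial:
    """Map side: count and sum per key within a single partition."""
    counts: Dict[str, int] = defaultdict(int)
    sums: Dict[str, float] = defaultdict(float)
    for key, value in part:
        counts[key] += 1
        sums[key] += value
    return {k: (counts[k], sums[k]) for k in counts}


def merge(partials: List[Partial]) -> Dict[str, float]:
    """Reduce side: combine partial counts and sums, then divide once per key."""
    counts: Dict[str, int] = defaultdict(int)
    sums: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for key, (c, s) in partial.items():
            counts[key] += c
            sums[key] += s
    return {k: sums[k] / counts[k] for k in counts}


def parallel_mean_by_key(records: List[Record], n_partitions: int = 4) -> Dict[str, float]:
    """Mean value per key, aggregated partition by partition in worker threads."""
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_aggregate, partition(records, n_partitions)))
    return merge(partials)


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 4.0), ("east", 20.0), ("west", 6.0), ("east", 30.0)]
    print(parallel_mean_by_key(sales))
```

Each partition returns counts and sums rather than means. Averaging per-partition means gives the wrong answer whenever partitions differ in size.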
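The “ParAccel Platform in Action” diagram chains Read, Prepare, Analyze and Write steps. Below is a minimal sketch of that shape as composable Python generators. It assumes a small CSV of customer amounts. The operator names are hypothetical stand-ins for the Dataflow operators the slides mention, not their actual API.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def read_csv(source: TextIO) -> Iterator[Dict[str, str]]:
    """Read: stream rows from a CSV source as dicts."""
    yield from csv.DictReader(source)


def prepare(rows: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, float]]:
    """Prepare: drop incomplete rows, trim names, cast amounts to float."""
    for row in rows:
        customer = (row.get("customer") or "").strip()
        amount = (row.get("amount") or "").strip()
        if customer and amount:
            yield customer, float(amount)


def analyze(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Analyze: total amount per customer."""
    totals: Dict[str, float] = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def write_csv(totals: Dict[str, float], sink: TextIO) -> None:
    """Write: emit the results as CSV, sorted by customer."""
    writer = csv.writer(sink)
    writer.writerow(["customer", "total"])
    for customer, total in sorted(totals.items()):
        writer.writerow([customer, total])


if __name__ == "__main__":
    raw = "customer,amount\nacme,10.5\nglobex,\nacme,4.5\ninitech,7\n"
    out = io.StringIO()
    # Each step consumes the previous step's output, mirroring the diagram's flow.
    write_csv(analyze(prepare(read_csv(io.StringIO(raw)))), out)
    print(out.getvalue())
```

Because Read and Prepare are generators, rows stream through one at a time instead of being materialized between steps. That is the property a dataflow engine exploits when it pipelines operators.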
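The Anscombe’s Quartet slide makes the case that even big data doesn’t speak for itself. Its four data sets share essentially the same summary statistics yet look completely different when plotted. The sketch below recomputes those statistics from the widely reproduced values of Anscombe’s 1973 data. Variances are sample variances (n − 1 denominator), which is what yields a variance of x of 11.

```python
from math import sqrt

X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by sets I-III

QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, sample variances, Pearson correlation and least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx, "var_x": sxx / (n - 1),
        "mean_y": my, "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope, "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(f"{name:>3}: " + ", ".join(f"{k}={v:.3f}" for k, v in stats.items()))
```

All four rows print nearly identical numbers. Only plotting the points reveals a linear trend, a curve, an outlier-driven line and a single leverage point. An analyst still has to look.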