New professional careers in data
Who am I?
• David Rostcheck
• I’m a consulting data
scientist
• Follow my articles on
LinkedIn
We will talk about 4 things:
• Big Data
• Data Science
• Data Engineering
• Business Intelligence
BIG DATA
What is big data?
Big data is data that is so big that it requires specialized techniques to handle,
like clusters, cloud computing, or graph algorithms
Data may change rapidly,
so big data may also be fast data
big data requires specialized tools to handle
(for example, MapReduce)
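As a rough illustration of the MapReduce idea named on this slide, here is a minimal single-machine sketch in plain Python. It is not how Hadoop or any real framework is implemented; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are assumptions made up for this example. The point is only the shape of the computation: map each record to key/value pairs, group by key, then reduce each group.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    # On a cluster, each document could be mapped on a different machine;
    # here everything runs in one loop for illustration.
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical sample input
counts = word_count(["big data is big", "fast data is data"])
print(counts)
```

Because the map and reduce steps only look at one record or one key at a time, the same logic can in principle be spread over many machines, which is why this pattern suits very large data sets.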
big data tools are in demand
but
keep your perspective
Big Data tools can be complex
It is often easier to solve problems at small scale, then scale
up, if possible
remember:
not all companies use big data
but
all companies use data
DATA SCIENCE
What is data science?
Data science is industrial research
on a company’s own data
What is its goal?
to produce advanced algorithms
that deliver a competitive advantage
data scientists often work with unstructured data
… which can be large
“The qualifications for the job include the
strength to tunnel through mountains of
information and the vision to discern patterns
where others see none”
- Bloomberg Businessweek
Is data science really science?
let’s compare…
               academic science                               data science
Teams          PhDs, graduate students                        PhDs, technologists
Setting        University                                     Company
Publication    Formal (academic publications, conferences)    Less formal (blogs, white papers, open source)
Funding        Public grants                                  Corporate
Goal           Advance human knowledge                        Create competitive advantage
Data science is industrial science
It shares some attributes with academic science,
but has other differences
What kind of work do data scientists do?
data scientists create artificially intelligent systems
these are often called “narrow AI”
examples
• Recommender systems
• Self-driving cars
• AI agents
• Smart energy management
• Medical diagnosis
• Machine vision
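To make the first example above concrete, here is a minimal sketch of one simple kind of recommender: "people who bought this also bought that", based on counting which items appear together. It is a toy under stated assumptions, not a description of any production system; the helper names (`build_cooccurrence`, `recommend`) and the shopping baskets are invented for illustration.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    cooccurrence = {}
    for basket in baskets:
        for item, other in permutations(set(basket), 2):
            cooccurrence.setdefault(item, Counter())[other] += 1
    return cooccurrence


def recommend(cooccurrence, item, k=2):
    """Return up to k items most often seen together with `item`."""
    return [other for other, _ in cooccurrence.get(item, Counter()).most_common(k)]


# Hypothetical purchase history
baskets = [
    ["laptop", "mouse", "keyboard"],
    ["laptop", "mouse"],
    ["laptop", "monitor"],
    ["mouse", "mousepad"],
]

cooccurrence = build_cooccurrence(baskets)
top_for_laptop = recommend(cooccurrence, "laptop", k=1)
print(top_for_laptop)
```

Real recommender systems usually go well beyond raw counts (for example, by normalizing for item popularity or learning latent factors), but the underlying idea of statistically modeling behavior from past data is the same.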
DATA ENGINEERING
What is data engineering?
data engineering is a specialized kind of
software engineering
with additional skills in
handling and processing data
data science vs. data engineering
                       data science                         data engineering
Approach               Scientific (Exploration)             Engineering (Development)
Problems               Unbounded                            Bounded
Path to Solution       Iterative, exploratory, nonlinear    Mostly linear
Education              More is better (PhDs common)         BS and/or self-trained
Presentation skills    Important                            Not as important
Research experience    Important                            Not as important
Programming skills     Not as important                     Important
Data skills            Important                            Important
What kind of special training does a data
engineer need?
Data storage and processing
– structured (SQL)
– unstructured (NoSQL)
– Big Data (Hadoop, Apache Spark/Storm/Flink, cloud)
Data visualization
Machine Learning algorithms and platforms (e.g., Dato)
Predictive APIs (e.g., Watson)
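As a small illustration of the structured vs. unstructured distinction in the list above, the sketch below computes the same per-customer totals twice: once from a fixed-schema table queried with SQL (using Python's built-in `sqlite3` module with an in-memory database), and once from schema-free JSON documents of the kind a document store might hold. The table name, field names, and records are assumptions invented for this example, and plain JSON strings stand in for a real NoSQL database.

```python
import json
import sqlite3

# Structured: a fixed schema, queried declaratively with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)
sql_totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
conn.close()

# Unstructured / document-style: each record is JSON and may carry
# different fields, so the code must pick out what it needs.
raw_documents = [
    '{"customer": "alice", "amount": 30.0, "coupon": "SPRING"}',
    '{"customer": "bob", "amount": 12.5}',
    '{"customer": "alice", "amount": 20.0, "notes": ["gift"]}',
]
doc_totals = {}
for raw in raw_documents:
    doc = json.loads(raw)
    customer = doc["customer"]
    doc_totals[customer] = doc_totals.get(customer, 0.0) + doc["amount"]

print(sql_totals)
print(doc_totals)
```

The SQL version pushes the grouping into the database, while the document version trades that convenience for flexibility in what each record may contain.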
Does a data engineer need more math
than a regular software engineer?
It really helps.
Linear algebra & calculus are important to
understand machine learning
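One way to see where the math comes in: the sketch below fits a straight line to a few points by gradient descent, using only plain Python. The calculus is in the gradients of the mean squared error; the linear algebra is the same computation written over vectors of inputs and errors. The function name `fit_line`, the learning rate, the step count, and the data points are all assumptions chosen for illustration, not a recommended training setup.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every point with the current w and b
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((w*x + b - y)^2) with respect to w and b
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step downhill along the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Many machine learning libraries hide this loop, but knowing what the gradient is and why the step size matters makes it much easier to diagnose a model that fails to converge.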
BUSINESS INTELLIGENCE
Wait – aren’t data science and
business intelligence really the same
thing?
Maybe. Let’s compare…
                 business intelligence (BI)          data science
Data analysis    Yes                                 Yes
Statistics       Yes                                 Yes
Visualization    Yes                                 Yes
Data sources     Usually SQL, often Data Warehouse   Less structured (logs, cloud data, SQL, NoSQL, text)
Tools            Statistics, Visualization           Statistics, Machine Learning, Graph Analysis, NLP
Focus            Present and past                    Future
Approach         Analytic                            Scientific
Goal             Better strategic decisions          Advanced functionality
The two fields are closely related.
In some ways data science is an evolution of
business intelligence.
which industries have the most data-focused jobs?
right now:
Technology
Education
Finance
Consulting
Health Care
(Technology employs over 50% of data workers)
but...
“Technology” companies like
Uber, Amazon, AirBnB
compete in other industries
(transportation,
retail,
hotels)
“Software is eating the world”
– Marc Andreessen, Andreessen Horowitz
which industries will AI change?
Ultimately, all of them.
Incorporating AI is a large business opportunity
data jobs are in demand
• “The hot job of the decade… Data scientists
today are akin to Wall Street ‘quants’ of
the 1980s and 1990s”
- Harvard Business Review
• “18.7% projected growth 2010-2020”
- VentureBeat
• “McKinsey projects […] ‘50 percent to
60 percent gap between supply and requisite
demand’”
- Bloomberg Businessweek
On the other hand…
Some people believe data jobs themselves will be
automated:
“New Teradata Platform Reduces
Demand For Data Scientists”
- Forbes
“Automating the Data Scientist”
- MIT Technology Review
What do we think?
• Yes, advanced tools will automate some data
exploration
• But: research and communication are
fundamental skills and are always in demand
when the world is changing
• Data will continue to explode (Internet of Things)
• We will see more change and faster change
education for data jobs
options include:
academic programs,
boot camps,
and online classes
(Coursera, Udacity)
for data engineering:
– documentation and webinars (self-education)
– focus on data manipulation tools and machine
learning
for data science:
– The more academic science and research expertise,
the better
– Focus on projects that solve unknown problems
– Work with more experienced data scientists
Questions?
Contact: drostcheck@leopardllc.com, twitter: @davidrostcheck
Articles: http://linkedin.com/in/davidrostcheck

Editor's Notes

  • #27 Statistically model human behavior; predict and respond to humans; understand natural language and the natural world; understand subtle patterns in big data.
  • #32 On a large team, Data Science and Data Engineering are separate roles. On a small team, a Data Scientist must do at least some of their own Data Engineering. The roles are new and not strictly defined. Today, one role is often called by the other’s name.
  • #49 - Machine Learning is here to stay