A new beginning in your career
The Future of Data Science:
What does Data Science have in store?
Saif Shaikh
Innovations in Business Solutions Inc. (IIBS)
403-151 City Centre Dr. | Mississauga, ON | L5B 2T4
Tel: (905) 268-0958
E-mail: info@iibs.ca, Website: http://www.iibs.ca
Agenda
Defining data science
Skills required
Choosing R programming
Career opportunities
Q&A
Presenter introduction
Saif Shaikh
• Instructor at IIBS for the Data Scientist with R Programming course
• Consultant involved in the data analytics and modeling fields
• Formerly employed in the medical devices field
• Education: B.S.E.E. (Massachusetts), M.Eng. (McMaster)
Defining data science
• A relatively new multidisciplinary field that uses scientific methods to extract knowledge from data in various forms
• Inexpensive data collection and storage have brought big data (enormous data sets), and inexpensive computational power now makes it practical to apply data science to them
• A data scientist follows the data science process
• Types of data to be analyzed:
  - Structured: Stored according to a data model and organized, such as a relational database or spreadsheet
  - Unstructured: No model or organization, such as raw data including text, images, sound files and video
  - Semi-structured: A combination of the two, such as a smartphone picture where the image data is unstructured but the attached camera information is structured
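The three types of data listed above can be made concrete with a small sketch. It is written in Python only for illustration, and all sample records, field names and helper functions (`query_structured`, `word_counts`, `camera_of`) are hypothetical. It shows that structured data can be queried directly by column, unstructured text must first be given some structure (here, by counting words), and a semi-structured photo record has a queryable metadata part alongside opaque image bytes.

```python
import csv
import io
import re

# Structured: rows that follow a fixed schema, like a spreadsheet or a
# relational table. (Hypothetical sample data.)
structured_csv = """name,department,salary
Ana,Marketing,72000
Ben,Sales,65000
"""

# Unstructured: free text with no schema. (Hypothetical sample.)
unstructured_text = "Great product, fast delivery. Would buy again!"

# Semi-structured: a smartphone photo record. The image bytes are unstructured,
# while the attached camera information is structured key/value metadata.
# (Placeholder bytes and made-up field names.)
semi_structured_photo = {
    "image_bytes": bytes(16),
    "metadata": {"camera": "PhoneCam", "iso": 100, "taken": "2017-09-01"},
}


def query_structured(text: str, department: str) -> list[str]:
    """The schema is known, so rows can be filtered directly by column name."""
    rows = csv.DictReader(io.StringIO(text))
    return [row["name"] for row in rows if row["department"] == department]


def word_counts(text: str) -> dict[str, int]:
    """No schema exists, so some structure is derived first by tokenizing."""
    counts: dict[str, int] = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts


def camera_of(photo: dict) -> str:
    """Read a field from the structured metadata part of the record."""
    return photo["metadata"]["camera"]


print(query_structured(structured_csv, "Sales"))  # ['Ben']
print(word_counts(unstructured_text))
print(camera_of(semi_structured_photo))  # PhoneCam
```

A real pipeline would extract the camera information from the image file's embedded metadata rather than from a hand-built dictionary; the dictionary here only stands in for that step.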
Data science is multidisciplinary
http://blogs.gartner.com/christi-eubanks/three-lessons-crossfit-taught-data-science/
Various disciplines contribute to data science
https://en.wikibooks.org/wiki/Data_Science:_An_Introduction/A_Mash-up_of_Disciplines
The data science process
https://en.wikipedia.org/wiki/Data_science
http://blog.operasolutions.com/bid/384900/what-is-data-science
• Data science subfields:
  - Machine learning: Subfield of artificial intelligence that gives computers access to data so they can learn for themselves. It focuses on designing algorithms that learn from the supplied data and make predictions with it.
  - Natural language processing: Subfield of artificial intelligence that uses computers to understand and derive meaning from human languages without explicit clues.
  - Deep learning: Subfield of machine learning concerned with algorithms, called artificial neural networks, that are inspired by the structure and function of the brain.
  - Data mining: Discovering patterns in data using methods such as machine learning, statistics and database systems to explain a phenomenon.
  - Data visualization: Presentation of data in a graphical format so that patterns, trends and correlations can be noticed easily.
  - Statistical modeling: Subfield of mathematics that uses mathematical equations to find relationships between variables in data.
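As a minimal sketch of statistical modeling, and of the simplest form of learning from data to make predictions, the code below fits a straight line y = a + b·x by ordinary least squares. Least squares is only one common technique, chosen here for illustration; the slides do not name a specific method. The data are made up, the function names `fit_line` and `predict` are hypothetical, and Python is used only for the sketch; in R the same fit would typically be done with `lm(y ~ x)`.

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = a + b*x by ordinary least squares; return (intercept a, slope b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept: float, slope: float, x: float) -> float:
    """Use the fitted equation to predict y for a new x."""
    return intercept + slope * x


# Hypothetical data: e.g. advertising spend (x) vs. sales (y), arbitrary units.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]

a, b = fit_line(xs, ys)
print(f"intercept={a:.2f}, slope={b:.2f}")  # intercept=1.03, slope=1.99
print(f"prediction at x=6: {predict(a, b, 6):.2f}")  # 12.97
```

The fitted equation (approximately y = 1.03 + 1.99x) is the "relationship between variables" the model finds, and using it on a new x is the prediction step.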
• Data science applications:
  - Finance: Fraud detection, risk modeling, trading
  - Communication, media and entertainment: Consumer insights, content recommendation, sentiment analysis, customer acquisition
  - Healthcare and pharmaceutical: Clinical trials, genetic analysis, epidemic forecasting
  - Education: Behavioral classification, teacher effectiveness
  - Manufacturing: Internet of Things, failure detection
  - Retail: Shelf-space optimization, pricing, promotions, upselling
  - Energy and utilities: Smart meter analysis, service quality optimization, outage management and restoration, accident prevention, exploration
  - Agriculture: Climatology insights, field characteristics, weather information
  - Transportation: Fleet vehicle maintenance, self-driving vehicles, logistics optimization
  - Insurance: Fraud detection, call center optimization, risk assessment
Skills required
• A challenging, high-potential field with four prerequisites:
• Programming
  - Analytic software such as R
  - Ability to readily adapt to and learn new software tools and libraries as required
• Quantitative analysis
  - Ability to analyze data, develop algorithms and build models
  - Basic statistics, probability theory and algebra, plus the ability to read mathematical notation
• Communication
  - Ability to present your findings to a non-technical audience such as marketing or sales
  - Ability to tell a story with graphs and presentations
  - Teamwork skills, since you will be working with many departments
• Intuition
  - Understanding the business and its products lets you ask the right questions and find the right answers
  - A curious, problem-solving mindset
Choosing R Programming
• Software environment for statistical computing and graphics
• Free and open-source language
• Excellent tools, such as RStudio, for getting started right away
• Its own tools for producing publication-quality plots and documentation
• Continuous backing by statisticians, scientists, scholars and research institutes
• Publicly available for over 20 years, with a regular release cycle
• A robust packaging system that lets developers and domain experts easily distribute their code; many packages are written by researchers and accompany scientific papers
• An all-in-one environment for data manipulation, visualization, machine learning, reporting and more
• R's popularity in academia matters because it creates a pool of talent that feeds industry, which in turn creates more demand for R talent
• Top-tier companies using R include Facebook, Google, Twitter, Microsoft, Uber, Airbnb, IBM, HP, Ford, Accenture, American Express, Citibank and many more
TIOBE Index, September 2017: based on search engine results
https://www.tiobe.com/tiobe-index/r/
RedMonk Programming Language Rankings, June 2017: usage and discussion
http://redmonk.com/sogrady/2017/06/08/language-rankings-6-17/
Scholarly articles that use data science software
http://r4stats.com/articles/popularity/
Career opportunities
• One of the hottest jobs right now, for three reasons: a shortage of talent, the enormous challenges organizations still face in organizing their data, and a need for data scientists that is no longer restricted to tech giants
• Data science skills apply to a number of occupations
• A Glassdoor report released in January 2017 named data scientist the best job in America (median salary $110,000)
• CareerCast reported in 2017 that data scientist has the best growth potential over the next seven years, as it is the toughest job to fill (median salary $111,267)
Indeed: Data scientist job trends
https://www.indeed.com/jobtrends/q-%22Data-Scientist%22.html
Emsi: Data science-related occupations and their earnings (Q4 2016)
https://www.forbes.com/sites/emsi/2016/11/16/want-to-become-a-data-scientist-where-the-jobs-are-and-what-employers-are-looking-for/2/
Robert Half: Top 10 tech jobs in 2017
https://www.roberthalf.com/sites/default/files/Media_Root/images/rht-pdfs/rht_0916_ig_sg2017-jobstowatch_nam_eng.pdf
• If you require further training, we have a course available at IIBS:
Data Scientist with R Programming (55 hours) – Saif Shaikh
Weekend classes
Mississauga, ON
905-268-0958
info@iibs.ca
Q&A
