The samples were collected by asking respondents to fill in an AIM survey about the tools and techniques data scientists use at work, covering sub-topics such as data visualisation tools, preferred operating systems and programming languages. We took opinions from everyone who practices data science, from professionals with less than two years of experience to CXOs, to get a thorough picture of the working environment in this growing field.
Our survey was met with much enthusiasm, and we got some great insights from it. Some were expected; many were real eye-openers.
SGCI - The Science Gateways Community Institute: International Collaboration ... - Sandra Gesing
Science gateways - also called virtual research environments or virtual labs - allow science and engineering communities to access shared data, software, computing services, instruments, and other resources specific to their disciplines. The US Science Gateways Community Institute (SGCI), opened in August 2016, provides free resources, services, experts, and ideas for creating and sustaining science gateways. It offers five areas of service to the science gateway developer and user communities: the Incubator, Extended Developer Support, the Scientific Software Collaborative, Community Engagement and Exchange, and Workforce Development. While all of these services are available to US-based communities, the Incubator, the Scientific Software Collaborative, and Community Engagement and Exchange also serve international communities. SGCI aims to support the community beyond national borders and to form and deepen collaborations with partner organizations and coalitions related or beneficial to the science gateways community. Research topics are independent of national borders, and researchers worldwide can benefit from each other's results, software, data, and lessons learned, whether via online materials and publications or at international events. The gateway community has benefited from this kind of exchange for years, and one mission of SGCI is to support the international community. This talk presents related work on the benefits of international collaboration in general, and for science gateways in particular. It goes into detail on SGCI's ongoing work at an international scale and on work planned for the near future to foster collaborations despite challenges such as different time zones and long distances between collaborators.
Slides CHASE 2019 Connected Health Conference - Thursday 26 September 2019 - Amélie Gyrard
Paper: IAMHAPPY: Towards An IoT Knowledge-Based Cross-Domain Well-Being Recommendation System for Everyday Happiness
IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE) Conference
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro - Data ScienceTech Institute
Data Science Tech Institute - Big Data and Data Science Conference around Dr Gregory Piatetsky-Shapiro.
Keynote - An overview on Big Data & Data Science Dr Gregory Piatetsky-Shapiro - KDnuggets.com Founder & Editor.
Paris May 23rd & Nice May 26th 2016 @ Data ScienceTech Institute (https://www.datasciencetech.institute/)
Data Science Skills Study 2019 by AIM And Imarticus Learning - Praj H
Our latest Data Science Skills Study 2019, by Analytics India Magazine and Imarticus Learning, takes a deeper look at the key trends in the tools and technologies deployed across sectors and at how companies are staying ahead of the pack. As analytics and machine learning reach deeper into operations, there is significant disruption in the workplace, with data scientists and data analysts dabbling in newer tools.
The presentation is about the career path in the field of Data Science. Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
This document provides an introduction to data science. It defines data science as a multi-disciplinary field that uses scientific methods and processes to extract knowledge and insights from structured and unstructured data. The document discusses the importance and impact of data science on organizations and society. It also outlines common applications of data science and the roles and skills required for a career in data science.
Why is Python becoming the language of choice for Data Analysts? - Rang Technologies
Python is rapidly emerging as the go-to language for data analysts, poised to become the ultimate powerhouse in the realm of data science within the next five years. Renowned as the "anaconda" of programming languages, Python boasts open-source accessibility and an object-oriented structure, seamlessly integrating with a vast array of libraries to simplify engineers' tasks. To know more about Python languages visit here: https://www.rangtech.com/blog/python-language-the-anaconda-in-data-science
An introductory but highly practical talk on starting a data science career and life. It touches on all the main aspects of the path toward becoming a data scientist, also seen through a personal-development perspective. We also discuss the role a data scientist ultimately fulfills, as an individual or as part of a team, in the technology innovation life cycle and the product life cycle.
This document provides an introduction to data science, including definitions of data science, its impact and importance. It discusses how data science affects organizations and provides competitive advantages. Examples of data science applications are given across various domains like banking, healthcare, transportation and more. The document also outlines the road to becoming a data scientist and what skills are required, such as learning to code, mathematics, machine learning techniques and software engineering. In summary, data science uses scientific methods to extract knowledge and insights from data, it benefits society in areas like healthcare, transportation and environment, and becoming a data scientist requires strong coding and analytical skills.
This presentation was provided by Daniella Lowenberg of the California Digital Library during the NISO Virtual Conference, Advancing Altmetrics, held on Wednesday, December 13, 2017.
OpenAIRE and EUDAT services and tools to support FAIR DMP implementation - Research Data Alliance
The document provides an overview of the Open Research Data Pilot, the data management plan, and OpenAIRE tools and services that support implementation of FAIR data management plans. It discusses the aims of the Open Research Data Pilot, which Horizon 2020 projects are required to participate in, and the types of data that must be deposited. It also covers topics like creating a data management plan, selecting a repository, making data FAIR, and OpenAIRE support resources such as briefing papers, webinars, and the Zenodo repository.
FAIR for the future: embracing all things data - ARDC
FAIR for the future: embracing all things data - Natasha Simons, Keith Russell and Liz Stokes, presented at Taylor & Francis Scholarly Summits in Sydney 11 Feb 2019 and Melbourne 14 Feb 2019.
PAARL's 1st Marina G. Dayrit Lecture Series held at UP's Melchor Hall, 5F, Proctor & Gamble Audiovisual Hall, College of Engineering, on 3 March 2017, with Albert Anthony D. Gavino of Smart Communications Inc. as resource speaker on the topic "Using Big Data to Enhance Library Services"
Analytic Transformation | 2013 Loras College Business Analytics Symposium - Cartegraph
The document summarizes key points from a 2013 analytics symposium. It discusses trends in big data discovery, mobility, real-time decisions, and predictive analytics. Big data allows tapping diverse data sets to find unknown relationships and make data-driven decisions. It impacts many industries. Real-time data and decisions are important as over 80% of executives say critical information is delivered too late. Predictive analytics and visualization help add meaning to data. Mobility increases access and analytical collaboration anywhere.
This document provides an overview of data science including its importance, what data scientists do, how the field has emerged, and how to become a data scientist. It notes that by 2018 the US could face shortages of people with data analytics skills. It then discusses how LinkedIn's early growth in 2006 exemplifies the data science process of framing questions, collecting and processing data, exploring patterns, and communicating results. Finally, it outlines the tools used in data science like SQL, analytics software, and machine learning and discusses getting started in the field through education, curiosity, and ongoing learning with mentorship support.
Getting started in Data Science (April 2017, Los Angeles) - Thinkful
The document discusses the rise of data science and the skills needed for data scientists. It defines data science as the intersection of engineering, statistics, and communication. Data scientists analyze large datasets to answer important business questions. The document uses LinkedIn in 2006 as a case study, outlining how a data scientist there framed questions, collected and processed user data, explored patterns, and communicated results to improve the user experience and growth. It highlights tools like SQL, analytics software, and machine learning that data scientists use and stresses the importance of curiosity, technical skills, and strong communication for those interested in the field.
Speaker: Franz Walder, Product Manager, panagenda
Abstract: panagenda reached out to 750+ professionals to share their company’s Domino application strategy. Join this session to find out what was most important to your peers and what challenges they had to overcome to make their project a success. Find out about the critical questions everybody should ask and have answers to throughout their project. Franz Walder presents the exciting results of the survey and explains what role analytics can play when tackling these challenges.
This document provides an overview of data science including its importance, what data scientists do, how the field has emerged, and how to become a data scientist. It discusses how data science can help answer important business questions using LinkedIn in 2006 as a case study. It also outlines the typical data science process of framing questions, collecting and cleaning data, exploring patterns, and communicating results. Finally, it introduces some common data science tools like SQL, analytics software, and machine learning algorithms and discusses options for continuing education in data science.
This document outlines a 10-step framework for developing data science applications. It begins with articulating the business problem and the data questions. Next steps include developing a data acquisition and preparation strategy, exploring and formatting the data, defining the goal, and shortlisting techniques. Later steps evaluate constraints, establish evaluation criteria, fine-tune algorithms, and plan for deployment and monitoring. The document also provides background on the speaker and the organization, which offers data science, quantitative finance, and machine learning programs and consulting using Python, R, and MATLAB on its online sandbox platform.
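A framework like this can be encoded as an ordered checklist so that each stage is tracked explicitly. The sketch below is purely illustrative: the step names paraphrase the summary above, and the executor function is a hypothetical stub.

```python
# Hypothetical encoding of the 10-step framework as an ordered checklist;
# the step names paraphrase the summary above.
STEPS = [
    "articulate the business problem and data questions",
    "develop a data acquisition and preparation strategy",
    "explore and format the data",
    "define the goal",
    "shortlist techniques",
    "evaluate constraints",
    "establish evaluation criteria",
    "fine-tune algorithms",
    "plan deployment",
    "plan monitoring",
]

def run_pipeline(steps, execute):
    """Run each stage in order and record its status."""
    return [(step, execute(step)) for step in steps]

# With a stub executor, every stage simply reports success.
report = run_pipeline(STEPS, execute=lambda step: "done")
```

Keeping the stages as data rather than hard-coded calls makes it easy to reorder, skip, or audit them per project.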
Workshop 1. Architecting Innovative Graph Applications
Join this hands-on workshop for beginners, led by Neo4j experts, and learn to systematically uncover contextual intelligence. Using a real-life dataset, we will build a graph solution step by step, from designing the graph data model to running queries and visualizing the data. The approach is applicable across multiple use cases and industries.
This document discusses why Python is a popular programming language for data science. It notes that Python has a clean syntax, expansive library, and large user base. Additionally, major companies like Google use Python for various applications. The document also provides examples of what businesses use Python for, including building data pipelines, descriptive analytics, machine learning, and data science tasks like clustering and prediction. Finally, it outlines some common tools and processes used in working as a data analyst or scientist, such as cleaning, reshaping, and analyzing data in Python.
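The clean-reshape-analyze loop mentioned at the end of that summary can be sketched in a few lines of pandas. The dataset and column names below are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sales records; the columns are invented for illustration.
raw = pd.DataFrame({
    "region": ["north", "north", "south", "south", None],
    "month": ["jan", "feb", "jan", "feb", "jan"],
    "revenue": [100.0, 120.0, 90.0, None, 50.0],
})

# Clean: drop rows with a missing key, fill missing numeric values.
clean = raw.dropna(subset=["region"]).fillna({"revenue": 0.0})

# Reshape: pivot months into columns, one row per region.
wide = clean.pivot_table(index="region", columns="month",
                         values="revenue", aggfunc="sum")

# Analyze: a simple descriptive statistic per region.
totals = wide.sum(axis=1)
```

Each step is one method call, which is much of what the document means by Python's "clean syntax" for analysts.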
Moving Beyond Batch: Transactional Databases for Real-time Data - VoltDB
Join guest Forrester speaker, Principal Analyst Mike Gualtieri, and Dennis Duckworth, Director of Product Marketing at VoltDB, to learn how enterprises can create a real-time, “origin-zero” data architecture within transactional databases to become a real-time enterprise.
Neo4j GraphDay Seattle - Sept 19 - Connected data imperative - Neo4j
The document outlines an agenda for a Neo4j Graph Day event including sessions on connected data, graphs and artificial intelligence, a lunch break, Neo4j training, and a reception. Key topics include Neo4j in production environments, its role in boosting artificial intelligence, and training opportunities.
Keynote: Graphs in Government - Lance Walter, CMO - Neo4j
This document contains an agenda and presentation slides for a Neo4j Graphs in Government event. The presentation introduces graph databases and Neo4j, discusses how graphs can help solve network-oriented problems, provides examples of graph use cases in various industries, and highlights new features in Neo4j 4.0 like easy management, unlimited scaling, and granular security. Case studies demonstrate how Neo4j has helped organizations like the US Army, MITRE, Adobe, and the German Center for Diabetes Research tackle complex data challenges.
The agenda of the talk is broken into two parts:
1. Query Understanding [30 mins] (Sonu Sharma)
• NLP-based deep learning models for finding the intent of a query in a particular taxonomy/category: description and Jupyter-notebook demonstration [20 mins]:
  o Multi-label/multi-class classification model from scratch in Keras
  o Feature engineering in Spark Scala and pandas
  o Keras Functional API details in TF 2.0
  o The "ImageNet moment" of NLP: recent word embeddings such as ELMo and BERT
  o Understanding deep neural networks such as bidirectional long short-term memory (BiLSTM) and character embeddings for language modeling
• NLP-based deep learning models for query tagging with entities like brand, color, nutrition, product quantity, etc. using named entity recognition [10 mins]:
  o Building a custom model with the TensorFlow Estimator API
  o Traditional word embeddings such as GloVe and fastText
  o Query (text) preprocessing
  o Sequence modeling using conditional random fields (CRF)
  o Saving and restoring heavy models in TF via the SavedModel format
2. Related Searches [20 mins] (Atul Agarwal)
• NLP-based deep learning model for predicting the next search keyword: model description and Jupyter-notebook demonstration [20 mins]:
  o Building a sequence-to-sequence (Seq2Seq) model using long short-term memory (LSTM) layers in Keras
  o Comparing word embeddings (word2vec, fastText, GloVe, etc.) in popular frameworks such as gensim
  o Keras Sequential API details
  o Similarity search based on FAISS (Facebook AI Similarity Search)
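As a concrete illustration of the first agenda item, here is a minimal sketch of a multi-label query classifier built with the Keras functional API in TF 2.x. The vocabulary size, layer widths, and random batch are invented for demonstration; a real model would train on labeled queries.

```python
import numpy as np
import tensorflow as tf

# Hypothetical setup: queries arrive as padded token-id sequences, and
# each query may belong to several of 5 categories (multi-label).
vocab_size, seq_len, n_labels = 1000, 12, 5

inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
x = tf.keras.layers.Embedding(vocab_size, 32)(inputs)
# BiLSTM encoder, as mentioned in the agenda.
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16))(x)
# Sigmoid (not softmax), so each label is predicted independently.
outputs = tf.keras.layers.Dense(n_labels, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
# binary_crossentropy pairs with the per-label sigmoid outputs.
model.compile(optimizer="adam", loss="binary_crossentropy")

# A tiny random batch, just to show the shapes flowing through.
queries = np.random.randint(0, vocab_size, size=(8, seq_len))
probs = model.predict(queries, verbose=0)
```

The sigmoid-plus-binary-crossentropy pairing is what distinguishes the multi-label case from the multi-class (softmax) case covered in the same session.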
This document discusses using a single channel EEG device to recognize emotions from EEG data. It collected data from 10 individuals labeled as stressed or relaxed. It preprocessed the raw EEG data using filters to isolate brain signals. It then used deep learning models including an LSTM network with and without attention to classify emotions. The LSTM with attention achieved 85% accuracy, which was an improvement over the LSTM without attention. Potential applications discussed include using EEG for stress reduction by customizing music, and emotion or word prediction. The document also discusses opportunities for future enhancements such as using convolutional layers or multi-modal networks incorporating additional physiological sensors.
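The attention step that the summary credits with the accuracy gain can be sketched in a few lines of NumPy: score each LSTM hidden state, normalize the scores, and take the weighted sum. The hidden states and attention vector below are random stand-ins, not outputs of a trained model.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical LSTM hidden states for one EEG window: T steps, d units.
T, d = 6, 4
rng = np.random.default_rng(0)
H = rng.normal(size=(T, d))   # stand-in for real LSTM outputs
w = rng.normal(size=(d,))     # learned attention vector (random here)

scores = H @ w                # one relevance score per time step
alpha = softmax(scores)       # attention weights, summing to 1
context = alpha @ H           # weighted sum of hidden states

# A classifier head would score `context` instead of only the last
# hidden state, which is the gain the summary attributes to attention.
```

The weighting lets informative time steps dominate the representation rather than forcing everything through the final LSTM state.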
Similar to Data Science Skills Study 2018 by AIM & Great Learning
Flood & Other Disaster forecasting using Predictive Modelling and Artificial ...Analytics India Magazine
Over 2.3 Billion people are affected due to floods in last 20 years and causing countless death , More than 92,million cattle are lost every year, seven million hectares of land is affected, and damage is over trillions dollars when taken globally in last 5 years. Floods are complicated natural events. It depends on several parameters, so it is very difficult to model analytically. The floods in a catchment depends on the characteristics of the catchment, rainfall and antecedent conditions. So the estimation of the flood peak is a very complex problem. Its due to the lack of Flood Prediction System which can predict the situation accurately. To Overcome this challenge we are building a Flood Prediction System using Predictive modelling. However we have divided our idea into small fragments but enough to be used globally. We have considered most flooded state of India, but can be used widely for all the low lying geographical regions. •The plains of Bihar, adjoining Nepal, are drained by a number of rivers that have their catchments in the steep and geologically nascent Himalayas. Kosi, Gandak,Burhi Gandak, Bagmati, Kamla Balan, Mahananda and Adhwara Group of rivers originates in Nepal, carry high discharge and very high sediment load and drops it down in the plains of Bihar. · About 65% of catchments area of these rivers falls in Nepal/Tibet and only 35% of catchments area lies in Bihar. · Bihar is India’s most flood-prone State, with 76 percent of the population, in the north Bihar living under the recurring threat of flood devastation. About 68800 sq Km out of total geographical area of 94163 sq Km comprising 73.06 percent is flood affected. · According to some historical data, 16.5% of the total flood affected area in India is located in Bihar while 22.1% of the flood affected population in India lives in Bihar. · From 1979 to Present day more than 8,873 Humans & 27,573 animals have lost their life due to flood. 
Some of Tools & Technology which is being used & can be used for Flood Prediction: •IBM. Watson Studio democratizes machine learning and deep learning to accelerate infusion of AI in to drive innovation. •An Intelligent Hydro-informatics Integration Platform for Regional Flood Inundation Warning Systems. •Three-Parameter Muskingum Model Coupled with an Improved Bat Algorithm. · Deep Learning with a Long Short-Term Memory Networks Approach for Rainfall-Runoff Simulation
AI for Enterprises-The Value Paradigm By Venkat Subramanian VP Marketing at B...Analytics India Magazine
AI is here, call it buzz, cause it a bubble, we are smack in the middle of an AI revolution. While there is a strong view building about consumer AI applications, there still seems to be some scepticism about AI for enterprises, primarily due to the lack of clarity and focus on how AI can actually deliver value for enterprises. At BRIDGEi2i, we believe it is important to have a non-fragmented view of the AI ecosystem and a “Value Roadmap” for AI in the enterprise context. As CxOs, it is important to understand where the enterprise is in the transformation journey and define value accordingly. This talk will throw light on how to look at the enterprise AI ecosystem and build the right roadmap for value.
Keep it simple and it works - Simplicity and sticking to fundamentals in the ...Analytics India Magazine
With the buzz around AI and ML there is an increasing tendency for leaders and data scientists to move towards complex problem-solving. Its important to unlearn the tendency to gravitate towards complexity. In this talk we will see why avoiding complexity in ML solutions is a wiser and a quicker way to solve business problems. We can also visit some thumb rules to build pragmatic and useful models. Simplicity and sticking to fundamentals is the next "big" thing in the world of big data.
Feature Based Opinion Mining By Gourab Nath Core Faculty – Data Science at Pr...Analytics India Magazine
Suppose that a customer who has given a high rating about a mobile phone writes the following review about the product: The front camera of the phone is excellent! Truly speaking, this is the best front camera I have experienced so far. From this review, we can understand two things. First, the customer holds a positive opinion about the phone. Secondly, the front camera of the phone is the targeted feature on which the opinions have been expressed in the review. In this workshop, we will be particularly interested in discovering patterns as indicated in the second case. We will discuss a framework that enables us to first discover the targets on which the opinions have been expressed in a review and then determine the polarity of the opinions. This kind of detailed analysis helps us to discover the components or features of the products which the customers have liked or disliked and thus help us to better summarize the information.
Deciphering AI - Unlocking the Black Box of AIML with State-of-the-Art Techno...Analytics India Magazine
Most organizations understand the predictive power and the potential gains from AIML, but AI and ML are still now a black box technology for them. While deep learning and neural networks can provide excellent inputs to businesses, leaders are challenged to use them because of the complete blind faith required to ‘trust’ AI. In this talk we will use the latest technological developments from researchers, the US defense department, and the industry to unbox the black box and provide businesses a clear understanding of the policy levers that they can pull, why, and by how much, to make effective decisions?
Getting your first job in Data Science By Imaad Mohamed Khan Founder-in-Resid...Analytics India Magazine
Getting your first job in Data Science is difficult. You’ve been applying to jobs, but they keep rejecting you. You don’t know what to do and how you could differentiate yourselves amidst the pool of candidates? In this talk, we’ll be going through different tips and techniques you could use to find that elusive Data Science jobs. They’ve worked for me and probably will work for you too!
With enterprises putting digital at the core of their transformation, our annual Data Science & AI Trends Report explores the key strategic shifts enterprises will make to stay intelligent and agile going into 2019. The year was marked by a series of technological advances, including advances in AI, deep learning, machine learning, hybrid cloud architecture, edge computing (with data moving away to edge data centres), robotic process automation, a spurt of virtual assistants, advancements in autonomous tech and IoT.
Everyone is talking about Artificial Intelligence — the new normal, which has entered almost every work process across industries. Enterprises are rethinking and strengthening their AI capabilities, using it as a tool to improve products and services. With AI becoming crucial to enterprise success, upskilling has become the new mantra among Indian IT professionals, who are keen to make an impact in their careers with Machine Learning and AI.
In our annual AI Study with Great Learning, we take a look at key AI trends dominating the Indian AI market-leading companies, professionals, salaries, jobs broken down by cities and how AI’s potential for industry growth has risen over the last few years. In the second half of the study, we cover AI literacy in India through Great Learning’s comprehensive AI/ML programs that are bridging the current skill gap and consequently boosting workforce transitions.
Emerging engineering issues for building large scale AI systems By Srinivas P...Analytics India Magazine
The document discusses an online 6-month certificate program in artificial intelligence and deep learning from Manipal Prolearn. It provides awarding from MAHE, hands-on training using real-world data from different domains, and instruction from industry experts. The program teaches skills for developing end-to-end AI/ML systems and covers topics like data acquisition, modeling, evaluation, and deployment.
Predicting outcome of legal case using machine learning algorithms By Ankita ...Analytics India Magazine
This document summarizes a presentation on predicting the outcomes of legal cases using machine learning models. It discusses extracting data from judgment documents and identifying key features for analysis. Exploratory data analysis was conducted on 202 observations to understand patterns. Logistic regression, KNN, random forest, and support vector machine models were developed. The tuned support vector machine model achieved the highest accuracy of 95% based on 10-fold cross-validation. Overall, support vector machine provided the best performance. The models tended to predict non-guilty outcomes more frequently due to the skewed data. Future work involves developing a mobile application for these predictive capabilities.
Bringing AI into the Enterprise - A Practitioner's view By Piyush Chowhan CIO...Analytics India Magazine
Artificial Intelligence is slowly getting its way into organisations. AI journey can be quite complex and a proper roadmap would be needed to realize the benefits. This presentation will talk about common myth and a high level approach for bringing AI into the enterprise.
www.analyticsindiasummit.com
Explainable deep learning with applications in Healthcare By Sunil Kumar Vupp...Analytics India Magazine
We started relying on the decisions made by deep learning models, however why it works and how it works are still big questions for most of us. We shall try to open that black box of deep learning which is essential to build trust for wide spread adoption. The speaker shall address the importance of feature visualization and localization in deep learning models esp. convolutional neural networks. He shares the results of applying methods such as activation map, deconvolution and Grad-CAM in healthcare.
Getting started with text mining By Mathangi Sri Head of Data Science at Phon...Analytics India Magazine
The workshop will empower you and get started with analyzing text data, discover patterns and what are the best ways to convert unstructured to structured data. We will also build a quick classification model and understand techniques to improve model performance. Towards the end lets quickly do a sentiment analysis on data corpus and discuss the next steps to improve model accuracy. Please come prepared with a working laptop with Jupyter Notebook and Python 2.7. Participants who have a minimum working knowledge of supervised models is encouraged.
“Who Moved My Cheese?” – Sniff the changes and stay relevant as an analytics ...Analytics India Magazine
The “Sexiest job of the 21st century” is often surveyed to be poorly defined, intermittently satisfying and vaguely understood in most board rooms. As success stories are widely publicized, senior business leaders’ expectations from analytics are rising quickly. And the field itself is changing rapidly - with speciality skills becoming self-service in no time. In that context, the talk explores how the various analytics roles across the spectrum are changing. And what it takes for analytics professionals to stay relevant, contribute meaningfully to business results and play a critical role in shaping business strategy.
www.analyticsindiasummit.com
"Route risks using driving data on road segments" By Jayanta Kumar Pal Staff ...Analytics India Magazine
Going out for dinner in Mumbai during an extended stay, or planning for a long road-trip across the wild west of Rajasthan, the first thing one looks at is Maps, that informs the relative distance, estimated time and congestion areas of different routes for the drive. Zendrive built state-of-the-art technologies on its huge cache of driving data from smartphones and OBD, to add a significant dimension to the route mapping of Google, that is safety risk of the route. Essentially the technology is built on millions of drivers zipping through the route or segments thereof. Automobile Insurance expands in UBI- where it has been established that tracking a driver’s behavior behind the wheels (like Hard Brake, Speeding etc) can predict significant differences between their chances of collisions. Looking at the same event data from the road perspective, aggregating the relative event density on road stretches also predict the relative chances of collision on that segment. We have used map matching using GIS techniques, parametric density estimation and rare event modeling using quasi-Poisson GLM to analyze our data, build the models and finally implement the scoring system across the GIS route maps. Key learnings : Relation between dangerous driving events and collisions Route risk as an aggregate of all the drivers (or sample thereof) and their driving risk. The route you take for commute may determine your auto insurance. Outline : Usage Based Insurance : relation between collision rates and dangerous driving. Driving events : aggressive acceleration, hard brake, speeding, phone use, aggressive turns Poisson GLM modeling to predict collision rates using driving data Events on a road segment : map-matching using GIS techniques to split trips along road stretches, and aggregate such events along the spatio-temporal dimension across all drivers. Route risk of the road segment and any route comprising such segments. 
Driving risk along such routes and corresponding collision risks using transfer of the GLM model. Assignment of risks to drivers on their daily route of commute, to be used in UBI.
www.analyticsindiasummit.com
“Who Moved My Cheese?” – Sniff the changes and stay relevant as an analytics ...Analytics India Magazine
By Phani Mitra VP Analytics & Strategy at Dr. Reddy’s
The “Sexiest job of the 21st century” is often surveyed to be poorly defined, intermittently satisfying and vaguely understood in most board rooms. As success stories are widely publicized, senior business leaders’ expectations from analytics are rising quickly. And the field itself is changing rapidly - with speciality skills becoming self-service in no time. In that context, the talk explores how the various analytics roles across the spectrum are changing. And what it takes for analytics professionals to stay relevant, contribute meaningfully to business results and play a critical role in shaping business strategy.
www.analyticsindiasummit.com
The analytics education market in India is exploding with analytics institutes providing a slew of instructional courses that are in line with the industry demand. This study aims to provide an overview of the analytics education landscape in India, the type of learning models offered and how online learning has become an inherent part of the analytics ecosystem in India.
Analytics & Data Science Industry In India: Study 2018 - by AnalytixLabs & AIMAnalytics India Magazine
The data analytics market in India is growing at a fast pace, with companies and startups offering analytics services and products catering to various industries. Different sectors have seen different penetration and adoption of analytics, and so is the revenue generation from these sectors.
The Analytics and Data Science Industry Study 2018 takes into account various trends that analytics industry in India is witnessing, revenue generated through various geographies, analytics market size by sector, across cities etc. It also takes into consideration analytics professionals in India across work experience and education.
This year’s study is brought to you in association with AnalytixLabs, a pioneer and one of the first analytics training institutes in India. The study is a result of extensive primary and secondary research conducted over a duration of two months, where we got in touch with analytics companies and professionals across various industries such as banking, finance, ecommerce, retail, pharma, healthcare and others.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of March 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
06-20-2024-AI Camp Meetup-Unstructured Data and Vector DatabasesTimothy Spann
Tech Talk: Unstructured Data and Vector Databases
Speaker: Tim Spann (Zilliz)
Abstract: In this session, I will discuss the unstructured data and the world of vector databases, we will see how they different from traditional databases. In which cases you need one and in which you probably don’t. I will also go over Similarity Search, where do you get vectors from and an example of a Vector Database Architecture. Wrapping up with an overview of Milvus.
Introduction
Unstructured data, vector databases, traditional databases, similarity search
Vectors
Where, What, How, Why Vectors? We’ll cover a Vector Database Architecture
Introducing Milvus
What drives Milvus' Emergence as the most widely adopted vector database
Hi Unstructured Data Friends!
I hope this video had all the unstructured data processing, AI and Vector Database demo you needed for now. If not, there’s a ton more linked below.
My source code is available here
https://github.com/tspannhw/
Let me know in the comments if you liked what you saw, how I can improve and what should I show next? Thanks, hope to see you soon at a Meetup in Princeton, Philadelphia, New York City or here in the Youtube Matrix.
Get Milvused!
https://milvus.io/
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/141-10June2024.md
For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here
https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups -
https://www.meetup.com/unstructured-data-meetup-new-york/
https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7
https://www.meetup.com/pro/unstructureddata/
https://zilliz.com/community/unstructured-data-meetup
https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
https://www.meetup.com/unstructured-data-meetup-new-york/events/301383476/?slug=unstructured-data-meetup-new-york&eventId=301383476
https://www.aicamp.ai/event/eventdetails/W2024062014
Did you know that drowning is a leading cause of unintentional death among young children? According to recent data, children aged 1-4 years are at the highest risk. Let's raise awareness and take steps to prevent these tragic incidents. Supervision, barriers around pools, and learning CPR can make a difference. Stay safe this summer!
Data Science Skills Study 2018
By AIM & Great Learning

CONTENTS

Introduction
Which language do data scientists prefer for statistical modelling?
Which data science methods are the most popular at work?
Which is the most popular Python general-purpose library?
Which tools do data scientists prefer?
Which dashboard/visualisation tools do data scientists prefer?
Which cloud provider do data scientists prefer?
What kind of learning resources do data scientists use to keep themselves updated?
Where do data scientists find open data?
Which OS do most data scientists use at work?
Preferred development environment
How is code shared at your workplace?
What is the neural network architecture that data scientists use most frequently?
Which big data tool have you used the most?
Which GPUs do data scientists use at work?
Our respondents
Conclusion
INTRODUCTION

Data Science is an emerging field that is now being integrated with industries across all sectors. This year Analytics India Magazine, in association with Great Learning, decided to find out what goes into the making of a good Data Scientist. We spent a lot of time finding out the tools and techniques used by these new technology professionals. From languages to coding practices and GPUs, we garnered interesting and insightful answers from our comprehensive survey.

About the study:

We looked at variations in technology, tools, work experience and educational qualifications in this survey. We took opinions from all those who practice data science, from professionals with less than two years of experience to CXOs, to get a thorough idea of the working environment in this growing field.

Our survey was met with much enthusiasm, and we got some great insights from it. Some of them were expected, and many of them were real eye-openers. Without further ado, let's take a deep dive into the study!
Disclaimer: This document is the result of continued research by Analytics India Magazine and Great Learning. Permission may be required from at least one of the parties for reproduction of the information in this report. All rights are reserved with the aforementioned parties.

This study is the result of extensive primary and secondary research, carried out over a period of one month by Analytics India Magazine, in association with Great Learning. The research methodology included a systematic plan to identify the various factors that influence data scientists to use a particular set of tools and techniques in their professional work. The data was collected by sending survey questions to readers, professionals from the community, students and others, across all major cities in India.
WHICH LANGUAGE DO DATA SCIENTISTS PREFER FOR STATISTICAL MODELLING?

- The favourite language for data scientists today is Python, used by almost 44% of professionals
- A close second is R at 35%, another clear favourite with data scientists due to its versatility
- SQL (6%) and SAS (7%) claim only a minor share of the attention of data scientists

Chart: Python 44%, R 35%, SAS 7%, SQL 6%, Other 5%, Matlab 3%
WHICH DATA SCIENCE METHODS ARE THE MOST POPULAR AT WORK?

In this section, we asked data scientists to pick out their most frequently used statistical methods.

- 72% of data scientists answered that they used Logistic Regression the most at work
- This was followed by Decision Trees at 56% and Neural Networks at 48%

Chart (number of responses): Logistic Regression, Decision Trees, Random Forest, Neural Networks, Bayesian Techniques, Ensemble Methods, SVMs, GBM, CNNs, RNNs, Evolutionary Approaches, Markov Logic, HMMs
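Logistic Regression's popularity at work is easy to understand: the model is little more than a weighted sum passed through a sigmoid, and even its training loop fits in a few lines. Below is a minimal from-scratch sketch in pure Python on hypothetical toy data; a real project would typically reach for a library such as scikit-learn instead.

```python
import math

def sigmoid(z):
    # Squash a raw score into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=1000):
    # Single-feature logistic regression fitted by gradient descent on log-loss
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # (p - y) is the gradient of the log-loss w.r.t. the raw score
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Hypothetical toy data: larger x tends to mean class 1
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)

def predict(x):
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

print([predict(x) for x in [0.8, 3.8]])  # [0, 1]
```

The same one-liner idea, `sigmoid(w . x + b)`, underlies the multi-feature version; libraries mostly add regularisation and faster solvers.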
WHICH IS THE MOST POPULAR PYTHON GENERAL-PURPOSE LIBRARY?

Python has one of the largest programming communities in the world, and there are plenty of libraries a data scientist can use to analyse large amounts of data. Here are our readers' favourites:

- Pandas emerged as the clear choice for most data scientists, at almost 41%
- NumPy was the second favourite at 24%
- Sklearn and Matplotlib followed at 17% and 14% respectively

Chart: Pandas 41%, NumPy 24%, Sklearn 17%, Matplotlib 14%, Other 4%
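Pandas' appeal comes from how little code a typical tabular workflow needs. A minimal sketch, assuming pandas is installed; the mini-dataset and column names below are invented purely for illustration:

```python
import pandas as pd

# Hypothetical mini-dataset of survey responses
df = pd.DataFrame({
    "language": ["Python", "R", "Python", "SAS", "Python", "R"],
    "experience_years": [1, 4, 7, 3, 2, 6],
})

# Share of respondents per language, as percentages
share = df["language"].value_counts(normalize=True) * 100
print(share["Python"])  # 50.0

# Mean experience per language
mean_exp = df.groupby("language")["experience_years"].mean()
print(mean_exp["R"])  # 5.0
```

The same two idioms, `value_counts` and `groupby`/aggregate, cover a large share of everyday descriptive analysis, which goes some way to explaining the 41% figure above.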
WHICH TOOLS DO DATA SCIENTISTS PREFER?

With a plethora of data analytics tools available online, we asked data scientists whether they were willing to use open-source tools at work. The answer was a resounding yes.

- Almost 89% of data scientists said that they preferred to work with open-source tools
- Only 8% said that they liked to work with custom-made tools, tweaked and personalised for their particular projects

Chart: Open Source 89%, Custom Made 8%, Paid 3%
WHICH DASHBOARD/VISUALISATION TOOLS DO DATA SCIENTISTS PREFER?

Data visualisation can be tricky for many data scientists. Crunching numbers is one thing, but telling a story with numbers is a whole different deal. When we asked our readers about this, they had one clear winner:

- More than half the respondents, 51%, said that they preferred Tableau as their dashboard or visualisation tool

Chart: Tableau 51%, Microsoft BI 12%, Others 12%, IBM Analytics 11%, Qlikview 8%, SAP Analytics 6%
WHICH CLOUD PROVIDER DO DATA SCIENTISTS PREFER?

Information flow is a part of data science. While data usage and storage are important, security and privacy of the data are also key to the job.

- Amazon Web Services is the clear winner here, with over 45% of the votes
- Google Cloud is the second favourite with almost 34% of the votes

Chart: AWS 45%, Google Cloud 34%, Microsoft Azure 18%, Others 3%
WHAT KIND OF LEARNING RESOURCES DO DATA SCIENTISTS USE TO KEEP THEMSELVES UPDATED?

With ever-changing technology, it is vital for data scientists to keep themselves updated. And they seem to have found interesting ways to do so!

- 76% of our readers said that they liked watching tutorials and videos on YouTube
- Almost 54% of data scientists said that they like learning the old-school way, through books and e-books
- 46% of respondents also look at MOOCs as a way to upskill themselves

Chart (number of responses): YouTube videos, books/e-books, MOOCs, courses, online communities, Kaggle, podcasts, Arxiv, social media, conferences, tutoring
WHERE DO DATA SCIENTISTS FIND OPEN DATA?

Finding open data is not that hard, but getting clean open data is often a trying experience. No data scientist wants to waste time cleaning it. There were four clearly popular options here:

- 27% of respondents use GitHub
- 22% use university websites and the data uploaded by them for research
- 20% use data publicly uploaded on official government websites
- 15% source their data through web scraping

Chart: GitHub 27%, University websites 22%, Public govt. data 20%, Scraping 15%, NGOs 7%, Reddit 5%, Others 4%
WHICH OS DO MOST DATA SCIENTISTS USE AT WORK?

The operating system data scientists use plays a crucial role; compatibility with their tools and ease of use are two key factors. For this question, the respondents had a clear liking for one OS:

- Almost 69% of data scientists use Windows
- 24% prefer Linux
- Only 7% prefer macOS

Chart: Windows 69%, Linux 24%, macOS 7%
PREFERRED DEVELOPMENT ENVIRONMENT

An integrated development environment (IDE) is very important for setting up and streamlining data science processes. Our respondents chose the following options from the tools presented to them:

- Almost 38% prefer RStudio
- Close to 37% like using Notebook

Chart: RStudio 38%, Notebook 37%, PyCharm 14%, IDLE 6%, Others 5%
HOW IS CODE SHARED AT YOUR WORKPLACE?

As we said earlier, privacy, operational efficiency and security are of paramount importance in any organisation that deals with data. Here is what we found:

- Over 45% of respondents use Git to share code at work
- 28% said that their organisations use cloud-based programs to share code
- 24% of our readers shared code over non-cloud-based programs

Chart: Git 45%, Cloud-based programs 28%, Non-cloud-based 24%, Others 3%
WHAT IS THE NEURAL NETWORK ARCHITECTURE THAT DATA SCIENTISTS USE MOST FREQUENTLY?

Neural networks are a crucial part of programming as well as data science. We got a clear picture that data scientists, as well as their organisations, use a variety of architectures. According to our study, the convolutional neural network (CNN) was the most frequently used architecture, at 33%.

Chart: CNN 33%, Feedforward NN 25%, RNN 20%, Modular NN 14%, Radial Basis NN 5%, GAN 3%
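The "convolutional" part of a CNN is a small, well-defined operation: slide a kernel over the input and take a weighted sum at each position. A minimal pure-Python sketch of a single "valid" 2D convolution (no padding, stride 1), using a made-up 3x3 input, just to illustrate the operation that real frameworks implement at scale:

```python
def conv2d(image, kernel):
    # 'Valid' 2D cross-correlation (what deep-learning conv layers compute):
    # no padding, stride 1, output shrinks by kernel size minus one
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out

# Hypothetical 3x3 "image" and a 2x2 difference-style kernel
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```

A CNN stacks many such kernels (with learned weights) plus nonlinearities and pooling; the sliding-window structure is what makes it effective on images and other grid-shaped data.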
WHICH BIG DATA TOOL HAVE YOU USED THE MOST?
From open-source tools to paid or customised ones, professionals prefer different tools depending on the project or the organisation they work for. Data scientists in our survey rated their most-favoured big data tools in the following order:
• 52% of the users said they used Hadoop the most
• Almost 22% of data scientists used NoSQL
Hadoop is the favourite big data tool of 52% of respondents.
• Hadoop: 52%
• NoSQL: 22%
• Paid/Customised: 12%
• Hive: 10%
• Polybase: 3%
• Presto: 1%
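Hadoop's popularity rests on the MapReduce programming model it popularized: a map step emits key-value pairs, and a reduce step aggregates them by key. A toy word-count sketch of that pattern in plain Python (an illustration of the idea only, not Hadoop's actual API):

```python
from collections import defaultdict

def map_phase(docs):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in docs:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data tools", "big data at scale"]
print(reduce_phase(map_phase(docs)))
# -> {'big': 2, 'data': 2, 'tools': 1, 'at': 1, 'scale': 1}
```

In a real Hadoop cluster the map and reduce phases run in parallel across many machines, with the framework shuffling pairs so that all counts for one key reach the same reducer.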
WHICH GPUs DO DATA SCIENTISTS USE AT WORK?
Over 19% of our respondents said that they preferred using the Nvidia GeForce GTX 8 Series for intensive data work. The GTX 8 Series is a mid-range GPU, multipurpose and flexible.
34% use low-end GPU models for intensive data work.
WHICH GPU DO YOU USE AT WORK?
• Lower-end models: 34%
• GTX 8 Series: 19%
• GTX 10 Series: 16%
• High-end models: 12%
• GTX 9 Series: 8%
• Tesla K Series: 7%
• Tesla P Series: 4%
OUR RESPONDENTS' PROFILE:

WHICH INDUSTRY DO MOST DATA SCIENTISTS BELONG TO?
• IT: 38%
• Others: 24%
• BFSI: 10%
• Manufacturing: 9%
• Healthcare: 8%
• Ecommerce: 5%
• Retail: 3%
• Customer Service: 3%
37.5% of respondents are from an IT background.

WORK EXPERIENCE:
• 0-2 years: 48.6%
• 2-5 years: 22.1%
• 5-10 years: 10.6%
• 10-15 years: 9.9%
• 15 years and more: 8.8%

HIGHEST FORMAL EDUCATION:
• PG/Master's: 49%
• Graduation: 28%
• Undergraduate: 20%
• PhD: 3%

CITY OF WORK AND RESIDENCE:
• Other: 31%
• Bengaluru: 27%
• Mumbai: 13%
• Hyderabad: 12%
• Delhi/NCR: 10%
• Chennai: 7%
CONCLUSION
As the Analytics industry grows at a 33.5% CAGR, more professionals are expected to segue into the Data Science and Analytics sector. We realised that apart from hard work and dedication, tools and skillsets also play a key role in the success of data scientists. One of the eye-opening inferences was that Python is still the all-time favourite programming language in the Analytics and Data Science sector. The most popular data visualisation tool used in the industry right now is Tableau. Another interesting finding was that professionals were aware of the importance of upskilling themselves and were willing to do so. Most working professionals like to keep themselves updated by watching videos and reading books. Overall, the study paints a positive picture of the Indian Analytics and Data Science sector.
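For context, CAGR compounds the growth rate year over year, so a 33.5% CAGR roughly doubles the industry in about two and a half years. The arithmetic can be checked in a few lines of Python (the base value of 100 is hypothetical; only the 33.5% rate comes from the study):

```python
def grow(value, rate, years):
    """Project a value forward at a compound annual growth rate (CAGR)."""
    return value * (1 + rate) ** years

# Hypothetical base of 100 units growing at the study's 33.5% CAGR:
for year in range(4):
    print(year, round(grow(100, 0.335, year), 1))
```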
RESEARCH METHODOLOGY
The samples were collected by asking respondents to fill in a survey created by Analytics India Magazine about what tools and techniques data scientists use at work. This included various sub-topics such as data visualisation tools, preferred operating systems and programming languages, among others. We took opinions from all those who practice data science, from professionals with less than two years of experience to CXOs, to get a thorough idea of the working environment in this growing field.
ABOUT ANALYTICS INDIA MAGAZINE
Founded in 2012, Analytics India Magazine has since been dedicated to passionately championing and promoting the analytics ecosystem in India. It chronicles technological progress in analytics, artificial intelligence, data science and big data by highlighting the innovations, the players in the field and the challenges shaping the future, through the promotion and discussion of ideas by smart, ardent, action-oriented individuals who want to change the world.
Analytics India Magazine has been a pre-eminent source of news, information and analysis for the Indian analytics ecosystem, covering opinions, analysis and insights on key breakthroughs and developments in data-driven technologies, as well as highlighting how they are being leveraged for future impact. With a dedicated editorial staff and a network of more than 250 expert contributors, AIM's stories are targeted at futurists, AI researchers, data science entrepreneurs, analytics aficionados and technophiles.
ABOUT GREAT LEARNING
Great Learning is an ed-tech company that offers programs in career-critical competencies such as Analytics, Data Science, Machine Learning, Artificial Intelligence, Cloud Computing and Deep Learning. Our programs are taken by thousands of professionals every year who build competencies in these emerging areas to secure and grow their careers.
We are on a mission to make professionals proficient and future-ready. We believe learning a new skill is tough and high-quality education has to be rigorous. In addition to all our programs being extremely comprehensive, a core part of the learning experience is the learning assistance provided to candidates. We use technology, content and a wide network of industry experts (Great Learning Gurus) to help candidates learn in the most impactful manner, whether through our unique blended model of classroom sessions and online content, or online content with personalized weekend mentorship sessions.
Impact
• Great Learning is among the top 5 ed-tech startups in India in terms of revenue and scale
• Over 5,000 professionals have taken Great Learning programs and we have delivered 3.5+ million hours of learning
• We have a network of 500+ Great Learning Gurus, all of whom are industry experts engaged in teaching, guiding and mentoring our candidates through our programs
• Our Analytics program has been ranked as India's no. 1 program for 2015, 2016, 2017 & 2018
• We have learning centres established in 6 cities: Bangalore, Chennai, Hyderabad, Gurgaon, Pune and Mumbai
Great Learning Programs At A Glance
1. PGP BABI:
The Great Learning PG Program in Business Analytics & Business Intelligence is a 12-month program that builds candidates' analytical and management capabilities through a structured learning framework, preparing them for business and functional roles in the Analytics industry.
PGP-BABI is offered in two formats:
• A blended format with weekend classroom sessions and online learning
• Online content with personalized weekend mentorship sessions
The classroom sessions are assisted by online webinars, discussions and assignments that keep your learning continuous and cumulative.
2. BACP (Online):
The Great Learning Business Analytics Certificate Program is India's first mentorship-driven online program, running for 6 months. Students attend interactive sessions with program mentors in small cohorts. Learning sessions are supported by industry interactions, webinars and hands-on projects.
3. PGP DSE:
The Great Learning PG Program in Data Science and Engineering is a 5-month program for early-career professionals looking to expedite their move into roles such as Business Analyst, Data Analyst, Data Engineer and Analytics Engineer by learning relevant data science techniques, tools and technologies, with hands-on application through industry case studies. The program is offered in a boot-camp format with 16 weeks of classroom sessions and 4 weeks of project sessions.
4. PGP AIML:
The Great Learning PG Program in Artificial Intelligence & Machine Learning is designed to develop competence in AI and ML for future-oriented working professionals. PGP-AIML is a 12-month program offered in two formats:
• A blended format with weekend classroom sessions and online learning
• Online content with personalized weekend mentorship sessions
Both formats are designed to suit the tight schedules of busy working professionals. Learning sessions are complemented by hackathons, labs and 12 projects, including a Capstone project. All project submissions are made on GitHub, ensuring that learners can showcase their entire body of work upon completion of the program.
5. PGP CC:
The PG Program in Cloud Computing is a 6-month online program that includes online and live virtual classes. IT professionals not already working with Cloud technologies will gain a solid foundation, while those with some Cloud experience will gain a more structured and hands-on understanding of Cloud technologies, including issues such as migration, deployment, integration, platform choice, architecture and TCO. The program will help you become proficient in working with a range of Cloud environments.
6. PGP-ML:
Great Learning's PG Program in Machine Learning is a 7-month comprehensive program (available in both classroom and online formats) that gives learners a solid grounding in Machine Learning technologies and methodologies. The program is co-created by Great Lakes faculty and industry professionals and includes video lectures, well-defined projects and class assignments to give learners a jump-start in this buzzing field. Learning sessions are complemented by hackathons, 8 hands-on projects and 1 Capstone project.
7. DLCP:
The Deep Learning Certificate Program is a 3-month structured online program with hands-on projects and learning support, all designed to help one become proficient in Deep Learning. Students learn through a combination of world-class online content, industry sessions and a series of projects.
For more information on programs, visit www.greatlearning.in.