Data Scientist
By: Professor Lili Saghafi
Montreal 2019
proflilisaghafi@gmail.com
@Lili_PLS
Data Scientist
• There are ten areas of Data Science that
are key parts of a project, and you need
to master them to be able to work as a
Data Scientist in most large organizations.
Who Is a Data Scientist?
1-Data Engineering
• Data Engineering – In any Data Science
project, the most important element is
the data.
• You need to understand which data to use, how
to organize it, and so on.
• This preparation of the data is done by
a Data Engineer in a Data Science team.
• It is a superset of Data Warehousing and
Business Intelligence that also brings in the
concepts of big data.
Data Engineering
• Building and maintaining a data warehouse
is a key skill that a Data Engineer must
have.
• They prepare structured and
unstructured data for use by the
Analytics team in model building.
• They build pipelines that extract data
from multiple sources and then
transform it to make it usable.
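A pipeline of this kind can be sketched in a few lines of Python. This is only a minimal illustration, assuming two made-up sources (`CSV_SOURCE` and `API_SOURCE`) and an in-memory SQLite table standing in for the warehouse. All names and data are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a sales system.
CSV_SOURCE = """order_id,customer,amount
1,alice,10.50
2,BOB,20.00
3,alice,
"""

# Hypothetical source 2: records returned by an application API.
API_SOURCE = [
    {"order_id": 4, "customer": " Carol ", "amount": "7.25"},
]


def extract():
    """Pull raw rows from both sources into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(API_SOURCE)
    return rows


def transform(rows):
    """Normalize names, convert types, and drop rows with no amount."""
    clean = []
    for row in rows:
        amount = str(row["amount"]).strip()
        if not amount:
            continue  # unusable record: the amount is missing
        clean.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return clean


def load(rows, conn):
    """Write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

The order with the missing amount is dropped during the transform step, so only usable rows reach the table.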
Data Engineering
• Python, SQL, Scala, Hadoop, Spark, etc., are
some of the skills that a Data Engineer has.
They should also understand the concept of
ETL (extract, transform, load). Data lakes built
on Hadoop are one of the key
areas of work for a Data Engineer.
• NoSQL databases are commonly used as part of
the data workflows.
• Lambda architecture allows both batch and
real-time processing.
• Some of the job roles available in the data
engineering domain are Database Developer,
Data Engineer, etc.
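The idea behind Lambda architecture can be illustrated with a toy example: a batch layer precomputes a view over historical data, a speed layer keeps a running view of new events, and a query merges the two. The sketch below assumes simple page-view counts. The event data and function names are hypothetical, and a real system would use distributed tools rather than in-memory counters.

```python
from collections import Counter

# Batch layer: a view precomputed from the full historical dataset.
historical_events = ["home", "cart", "home", "checkout", "home"]
batch_view = Counter(historical_events)

# Speed layer: a running view over events that arrived after the batch run.
realtime_view = Counter()


def on_new_event(page):
    """Update the real-time view as each new event arrives."""
    realtime_view[page] += 1


def query(page):
    """Serving layer: merge the batch view with the real-time view."""
    return batch_view[page] + realtime_view[page]


on_new_event("home")
on_new_event("cart")
print(query("home"), query("cart"), query("checkout"))
```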
2-Data Mining
• Data Mining – It is the process of extracting
insights from data using certain
methodologies so that the business can make smart
decisions.
• It uncovers previously unknown patterns
and relationships in the data.
• Through data mining, one can transform the
data into meaningful structures that fit
the needs of the business. The application of
data mining depends on the industry.
Data Mining
• In finance, for example, it is used in risk or
fraud analytics. In manufacturing, product
safety and quality issues can be
analyzed with accurate mining.
• Some of the techniques in data mining are
Path Analysis, Forecasting, Clustering,
and so on.
• Business Analyst and Statistician are some of
the related jobs in the data mining space.
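As an illustration of Clustering, here is a minimal one-dimensional k-means sketch in plain Python. The transaction amounts and starting centers are made up, and a real project would more likely use a library implementation. The point is only to show how unlabeled values fall into groups.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center
    to the mean of its group, and repeat."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical transaction amounts: mostly small, a few very large.
amounts = [12, 15, 14, 13, 980, 1010, 995]
centers, clusters = kmeans_1d(amounts, centers=[0, 500])
print(centers)
print(clusters)
```

In a fraud setting, the small second group of unusually large amounts would be the one worth a closer look.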
3-Cloud Computing
• Cloud Computing – Many companies these
days are migrating their infrastructure from on-premises
systems to the cloud because of the ready-made
availability of resources and the huge
computational power that is not always available
on a local system. Cloud computing generally refers to
the implementation of platforms for distributed
computing. The system requirements are
analyzed to ensure seamless integration with
existing applications. Cloud Architect and Platform
Engineer are some of the jobs related to it.
4-Database Management
• Database Management – Rapidly changing data
makes it imperative for companies to
track their data accurately on a regular basis. This
detailed data can empower the business to make timely
strategic decisions and maintain a systematic workflow.
The collected data is used to generate reports and is
made available to management in the form of
relational databases. The database management
system maintains the links among the data and also allows
new updates. The structured format of
databases helps management look up data
efficiently. Data Specialist and Database
Administrator are some of the jobs in this area.
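The points above (linked data, updates, and reports) can be seen in a small relational example. This sketch uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the links
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
    """
)

# An update: order 2 is corrected from 50.0 to 60.0.
conn.execute("UPDATE orders SET amount = 60.0 WHERE id = 2")

# The link is enforced: an order for a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (4, 99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A report for management: total order value per customer.
report = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(report, rejected)
```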
5-Business Intelligence
• Business Intelligence – The area of business
intelligence refers to finding patterns in the historical data of
a business.
• Business Intelligence analysts find the trends for a
data scientist to build predictive models upon. It is about
answering not-so-obvious questions. Business
Intelligence answers the ‘what’ of a business.
• Business Intelligence is about creating dashboards and
drawing insights from the data.
• For a BI analyst, it is important to learn data handling
and master tools like Tableau, Power BI, SQL, and
so on. Additionally, proficiency in Excel is a must in
business intelligence.
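The ‘what’ question can be made concrete with a small aggregation over historical records, of the kind that would feed a dashboard. The sales figures below are invented for illustration. In practice this step is usually done in SQL or a BI tool rather than by hand in Python.

```python
from collections import defaultdict

# Hypothetical historical sales: (month, region, revenue).
sales = [
    ("2019-01", "East", 120.0),
    ("2019-01", "West", 80.0),
    ("2019-02", "East", 150.0),
    ("2019-02", "West", 70.0),
]

# What happened? Total revenue per month.
revenue_by_month = defaultdict(float)
for month, _region, amount in sales:
    revenue_by_month[month] += amount

# The trend: change in revenue from each month to the next.
months = sorted(revenue_by_month)
trend = {
    later: revenue_by_month[later] - revenue_by_month[earlier]
    for earlier, later in zip(months, months[1:])
}
print(dict(revenue_by_month))
print(trend)
```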
6-Machine Learning
• Machine Learning – Machine Learning is the
state-of-the-art methodology for making predictions from data
and helping the business make better decisions. Once the
data is curated by the Data Engineer and analyzed by a
Business Intelligence Analyst, it is provided to a Machine
Learning Engineer to build predictive models based on
the use case at hand.
• The field of machine learning is categorized into
supervised, unsupervised, and reinforcement learning.
• The dataset is labeled in supervised learning, unlike in
unsupervised learning. To build a model, it is first trained
on data so that it can identify the patterns and learn from
them to make predictions on unseen data. The
performance of the model is measured with the
metric and the KPI, which are decided by the
business beforehand.
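The supervised workflow described above (labeled training data, predictions on unseen data, and a metric) can be sketched with a 1-nearest-neighbor classifier, chosen here only because it is short. For this method "training" simply means storing the labeled examples. The data points, labels, and the choice of accuracy as the metric are all assumptions made for illustration.

```python
def predict(train, point):
    """1-nearest-neighbor: return the label of the closest training example."""
    nearest = min(
        train,
        key=lambda ex: (ex[0][0] - point[0]) ** 2 + (ex[0][1] - point[1]) ** 2,
    )
    return nearest[1]


# Labeled training data: (features, label).
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

# Held-out labeled data the model has not seen, used only for evaluation.
test = [
    ((2.0, 1.5), "low"),
    ((8.5, 9.5), "high"),
    ((6.0, 6.0), "low"),  # a borderline case the model gets wrong
]

correct = sum(predict(train, features) == label for features, label in test)
accuracy = correct / len(test)
print(f"accuracy = {accuracy:.2f}")
```

Two of the three held-out examples are classified correctly. Whether that is good enough depends on the metric and target agreed with the business.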
7-Deep Learning
• Deep Learning – Deep Learning is a branch of
Machine Learning that uses neural networks
to make predictions. Neural networks are loosely
modeled on the brain and often build better predictive
models than traditional ML systems.
Unlike in traditional Machine Learning, no manual feature
selection is required in Deep Learning, but huge
volumes of data and enormous computational
power are needed to run deep learning
frameworks.
• Some of the Deep Learning frameworks are
TensorFlow, Keras, and PyTorch.
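To show what a neural network computes, here is a tiny forward pass in plain Python: one hidden layer with ReLU activations, followed by a linear output. The weights are hand-picked for illustration (they make the network compute XOR) rather than learned from data. A framework such as those listed above would learn them through training.

```python
def relu(x):
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not trained) weights: 2 inputs -> 2 hidden units -> 1 output.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUT_W = [[1.0, -2.0]]
OUT_B = [0.0]


def forward(x1, x2):
    """Run the inputs through the hidden layer and the output layer."""
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUT_W, OUT_B, lambda v: v)[0]


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, forward(a, b))
```

No single linear layer can represent XOR. The hidden layer is what lets the network build its own intermediate features.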
8-Natural Language Processing
• Natural Language Processing – NLP, or Natural
Language Processing, is a specialization in Data Science
that deals with raw text. Natural language or
speech is processed using NLP libraries, and
hidden insights can be extracted from it. NLP
has gained popularity in recent times because of the amount of
unstructured raw text being generated from a
plethora of sources and the unprecedented information
that this natural-language data carries.
• Some applications of Natural Language
Processing are Amazon’s Alexa and Apple’s Siri. Many
companies also use NLP for sentiment analysis,
resume parsing, and so on.
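Sentiment analysis can be sketched in its simplest form as counting positive and negative words in raw text. The word lists below are tiny and hypothetical. Production systems rely on NLP libraries and trained models, so treat this only as an illustration of the idea.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def tokenize(text):
    """Lowercase the raw text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this product, it is great!"))
print(sentiment("Terrible support and poor quality."))
print(sentiment("The package arrived on Tuesday."))
```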
9-Data Visualization
• Data Visualization – Presenting your insights, either
through scripting or with the help of visualization
tools, is essential. Many Data
Science tasks can be solved with accurate data
visualization, as charts and graphs reveal
enough hidden information for the business to make
relevant decisions.
• Often, it is difficult for an organization to build
predictive models, so it relies only on visualizing
the data for its workflow.
• Moreover, one needs to understand which graphs or
charts to use for a particular business, and keep the
visualization simple as well as informative.
10-Domain Expertise
• Domain Expertise – As mentioned earlier,
professionals from different disciplines use
data in their business, and this wide range of
applications makes it imperative for people to
understand the domain in which they apply their
Data Science skills. The domain knowledge
could be operations-related, where you
leverage the tools to improve business
operations focused on financials,
logistics, etc. It could also be sector-specific,
such as Finance, Healthcare, etc.
Conclusion
• Data Science is a broad field with a multitude of skills
and technologies that need to be mastered. It is a lifelong
learning journey, and with the frequent arrival of new
technologies, one has to keep one's skills up to date.
• It can often be challenging to keep up with such
frequent changes. Thus it is advisable to learn all these
skills and to master at least one of them. In a
big corporation, a Data Science team comprises
people assigned to different roles such as data
engineering, modeling, and so on. Thus focusing on one
particular area gives you an edge over others in
finding a role within a Data Science team in an
organization.
• Data Scientist is one of the most sought-after jobs of this
decade, and it is likely to remain so in the years to come.