data & content design
Frieda Brioschi - frieda.brioschi@gmail.com
Emma Tracanella - emma.tracanella@gmail.com
AROUND DATA SCIENCE
LESSON 5 - 2019/20
LET’S START WITH YOUR DATA PROJECT
DESCRIBE YOUR PROJECT
Photo by William Iven on Unsplash
MARGARET HAMILTON
LINKED DATA
AN EXAMPLE OF ONTOLOGY
http://mappings.dbpedia.org/server/ontology/classes/
LINKED DATA / LOD
Linked data is structured data which is interlinked with other data so it becomes
more useful through semantic queries. It builds upon standard Web technologies
but rather than using them to serve web pages only for human readers, it extends
them to share information in a way that can be read automatically by computers.
Part of the vision of linked data is for the Internet to become a global database.
Linked data may also be open data, in which case it is usually described as linked
open data (LOD).
▸ https://en.wikipedia.org/wiki/Linked_data
SCHEMA.ORG
http://schema.org/docs/full.html
GOOGLE KNOWLEDGE GRAPH
https://www.youtube.com/watch?v=mmQl6VGvX-c
WHY LINKED DATA MATTERS
Linked data is a method for publishing structured data using vocabularies like
schema.org that can be connected together and interpreted by machines. Using
linked data, statements encoded in triples can be spread across different
websites.
This enables data from different sources to be connected and queried.
▸ https://wordlift.io/blog/en/entity/linked-data/
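As a minimal sketch of the idea above, the following Python snippet stores statements as (subject, predicate, object) triples published by two hypothetical sites and merges them into one queryable set. The example.org identifier, the `SITE_A`/`SITE_B` names and the `objects_of` helper are assumptions made for illustration; only the schema.org property names come from the real vocabulary.

```python
# Statements as (subject, predicate, object) triples.
# The example.org URI is made up for illustration.
PERSON = "http://example.org/person/margaret-hamilton"

SITE_A = {
    (PERSON, "http://schema.org/name", "Margaret Hamilton"),
    (PERSON, "http://schema.org/jobTitle", "Software engineer"),
}
SITE_B = {
    (PERSON, "http://schema.org/birthDate", "1936-08-17"),
}


def objects_of(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# Both sites use the same identifier for the subject, so their
# statements can be merged and queried as one dataset.
merged = SITE_A | SITE_B
name = objects_of(merged, PERSON, "http://schema.org/name")
born = objects_of(merged, PERSON, "http://schema.org/birthDate")
print(name, born)
```

The shared identifier is what makes the merge meaningful: if each site invented its own name for the same person, the statements could not be connected without an extra mapping step.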
To Making Better Data-Informed Decisions
Data-Informed DECISION MAKING PROCESS
▸ ASK - Formulate a focused question
▸ ACQUIRE - Search for the best available data
▸ ANALYZE - Critically appraise and analyze the data
▸ APPLY - Integrate the data with your professional expertise and be
conscious about your mental models
▸ ANNOUNCE - Decide and communicate
▸ ASSESS - Monitor the outcome
ASK
Turn the business questions into analytical question(s).
ACQUIRE
Find and source all relevant data. Remember to think about the question
systemically and include any interrelated data that could be relevant. This
includes not only internal but also external data and information.
Ensure the sourced data is available, trusted, and in the right form
(extracted, profiled, tagged, cataloged, standardized, treated for
sensitivity, etc.).
ANALYZE
Create a measurement framework to describe your data with KPIs.
Use exploratory analytics to find patterns, trends, and relationships that
may exist but not be obvious, and to start drilling into root causes.
APPLY
Review and orient yourself to the information and data so far and apply your
personal experience to it.
Challenge the data and look for information and data to disprove it.
Review with a cognitively diverse team (or, if you are alone, be aware of
your bias, play devil’s advocate, and reframe).
If applicable, leverage predictive analytics to run simulations or similar
to test potential decisions and solutions.
ANNOUNCE
Announce your decision at the right level to ALL stakeholders (direct,
indirect, upstream, and downstream) by leveraging methodologies like the
‘Rule of 3’ and the ‘Pyramid Principle’ in your storytelling.
ASSESS
Set up a review mechanism to monitor the impacts of the decision after it is
made and acted upon.
Leverage that review mechanism to fail, fix, and learn fast, including
improvements to data, measurement frameworks, accountability, decisions, and
anything else relevant.
To learn more about Data-Informed Decision Making and explore our free courses and resources, visit
qlik.com/GetDataLiterate.
CLASSICAL DATA MINING
Photo by ev on Unsplash
CONTEXT
You don’t have to be a fancy statistician to do data mining, but you do
have to know something about what the data signifies and how the
business works.
Only when you understand the data and the problem that you need to
solve can data-mining processes help you to discover useful
information and put it to use.
NINE LAWS OF DATA MINING - 1
Pioneering data miner Thomas Khabaza developed his “Nine Laws of Data Mining”
to guide new data miners as they get down to work.
▸ 1. “Business Goals Law”

Business objectives are the origin of every data mining solution.

A data miner is someone who discovers useful information from data to support
specific business goals. Data mining isn’t defined by the tool you use.
▸ 2. “Business Knowledge Law”

Business Knowledge is central to every step of the data mining process.

You don’t have to be a fancy statistician to do data mining, but you do have to
know something about what the data signifies and how the business works.
NINE LAWS OF DATA MINING - 2
▸ 3. “Data Preparation Law”

Data preparation is more than half of every data mining process.

Pretty much every data miner will spend more time on data preparation than on
analysis.
▸ 4. “No Free Lunch for the Data Miner”

The right model for a given application can only be discovered by experiment.

In data mining, models are selected through trial and error.
▸ 5. “Patterns”

There are always patterns in the data.

As a data miner, you explore data in search of useful patterns. Understanding patterns
in the data enables you to influence what happens in the future.
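The “No Free Lunch” law above, that models are selected through trial and error, can be sketched as follows. The sales figures, the two candidate predictors and all function names are hypothetical; the point is only that each candidate is tried on held-out data and the one with the lowest error is kept.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly sales; the last three months are held out for testing.
history = [100, 104, 108, 112, 116, 120]
train, test = history[:3], history[3:]


def predict_mean(train, steps):
    """Candidate 1: always predict the training average."""
    avg = sum(train) / len(train)
    return [avg] * steps


def predict_trend(train, steps):
    """Candidate 2: extend the average step between training points."""
    step = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + step * (i + 1) for i in range(steps)]


# Try every candidate on the held-out data and keep the best one.
candidates = {"mean": predict_mean, "trend": predict_trend}
errors = {
    name: mean_absolute_error(test, model(train, len(test)))
    for name, model in candidates.items()
}
best = min(errors, key=errors.get)
print(errors, best)
```

On a different dataset the ranking could easily flip, which is exactly why the choice is made by experiment rather than in advance.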
NINE LAWS OF DATA MINING - 3
▸ 6. “Insight Law”

Data mining amplifies perception in the business domain.

Data mining methods enable you to understand your business better than you
could have done without them.
▸ 7. “Prediction Law”

Prediction increases information locally by generalization.

Data mining helps us use what we know to make better predictions (or
estimates) of things we don’t know.
NINE LAWS OF DATA MINING - 4
▸ 8. “Value Law”

The value of data mining results is not determined by the accuracy or stability
of predictive models.

Your model must produce good predictions, consistently. That’s it.
▸ 9. “Law of Change”

All patterns are subject to change.

Any model that gives you great predictions today may be useless tomorrow.
PHASES OF THE DATA MINING PROCESS
The Cross-Industry Standard Process for
Data Mining (CRISP-DM) is the dominant
data-mining process framework. It’s an
open standard; anyone may use it.
BUSINESS UNDERSTANDING
Get a clear understanding of the problem you’re out to solve, how it impacts your
organization, and your goals for addressing it.
Tasks in this phase include:
▸ Identifying your business goals
▸ Assessing your situation
▸ Defining your data mining goals
▸ Producing your project plan
DATA UNDERSTANDING
Review the data that you have, document it, and identify data management and
data quality issues.
Tasks in this phase include:
▸ Gathering data
▸ Describing
▸ Exploring
▸ Verifying quality
DATA PREPARATION
Get your data ready to use for modeling.
Tasks in this phase include:
▸ Selecting data
▸ Cleaning data
▸ Constructing
▸ Integrating
▸ Formatting
MODELING
Use mathematical techniques to identify patterns within your data.
Tasks in this phase include:
▸ Selecting techniques
▸ Designing tests
▸ Building models
▸ Assessing models
EVALUATION
Review the patterns you have discovered and assess their potential for business
use.
Tasks in this phase include:
▸ Evaluating results
▸ Reviewing the process
▸ Determining the next steps
DEPLOYMENT
Put your discoveries to work in everyday business. 
Tasks in this phase include:
▸ Planning deployment (your methods for integrating data mining discoveries
into use)
▸ Reporting final results
▸ Reviewing final results
CLASSICAL DATA AGGREGATION
Photo by ev on Unsplash
DATA AGGREGATION
Data aggregation is the process where raw data is gathered and expressed in a summary
form for statistical analysis.
For example, raw data can be aggregated over a given time period to provide statistics. After
the data is aggregated and written to a view or report, you can analyze the aggregated data
to gain insights about particular resources or resource groups.
There are two types of data aggregation:
▸ Time aggregation - All data points for a single resource over a specified time period.
▸ Spatial aggregation - All data points for a group of resources over a specified time
period.
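The two aggregation types above can be illustrated with a short Python sketch. The readings, the resource names and the two helper functions are hypothetical, and the mean is used here as just one possible summary.

```python
from statistics import mean

# Hypothetical raw readings: (resource, hour, value).
READINGS = [
    ("server-a", 0, 10.0),
    ("server-a", 1, 20.0),
    ("server-b", 0, 30.0),
    ("server-b", 1, 40.0),
]


def time_aggregate(readings, resource, start, end):
    """Average all data points of ONE resource within [start, end]."""
    return mean(v for r, t, v in readings if r == resource and start <= t <= end)


def spatial_aggregate(readings, group, start, end):
    """Average all data points of a GROUP of resources within [start, end]."""
    return mean(v for r, t, v in readings if r in group and start <= t <= end)


time_avg = time_aggregate(READINGS, "server-a", 0, 1)
group_avg = spatial_aggregate(READINGS, {"server-a", "server-b"}, 0, 1)
print(time_avg, group_avg)
```

The only difference between the two functions is the filter on the resource: a single resource for time aggregation, a set of resources for spatial aggregation.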
SUMMARY STATISTICS
When data is aggregated, groups of observations are replaced with summary statistics based on those observations.
Summary statistics are used to communicate the largest amount of information as simply as possible.
▸ Mean
▸ Count
▸ Maximum
▸ Median
▸ Minimum
▸ Mode
▸ Range
▸ Sum
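As a small sketch, the statistics listed above can be computed for a group of observations with Python's standard library. The observation values are made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical group of observations to be replaced by a summary.
observations = [2, 4, 4, 5, 10]

summary = {
    "count": len(observations),
    "sum": sum(observations),
    "mean": mean(observations),
    "median": median(observations),
    "mode": mode(observations),
    "minimum": min(observations),
    "maximum": max(observations),
    "range": max(observations) - min(observations),
}

for statistic, value in summary.items():
    print(f"{statistic}: {value}")
```

Note how the single large value (10) pulls the mean above the median, which is one reason to report more than one summary statistic.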
TABLES
Tables are the format in which most numerical data are initially stored and analysed and
are likely to be the means you use to organise data collected during experiments and
dissertation research.
Tables are an effective way of presenting data:
• when you wish to show how a single category of information varies when
measured at different points (in time or space).
• when the dataset contains relatively few numbers.
• when the precise value is crucial to your argument and a graph would not convey
BAR CHARTS
Bar charts are one of the most commonly
used types of graph and are used to display
and compare the number, frequency or other
measure for different discrete categories or
groups.
The bars can be drawn either vertically or
horizontally depending upon the number of
categories and length or complexity of the
category labels.
HISTOGRAMS
Histograms are a special form of bar chart
where the data represent continuous rather
than discrete categories. Since a
continuous category may have a large
number of possible values the data are
often grouped to reduce the number of data
points.
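The grouping step described above can be sketched in a few lines of Python. The height values, the 10 cm bin width and the `bin_counts` helper are assumptions chosen for illustration; the text chart at the end stands in for the bars of a histogram.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count how many values fall into each interval [k*w, (k+1)*w)."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical continuous measurements (heights in cm).
heights_cm = [151.2, 158.9, 160.0, 163.4, 167.8, 171.5, 172.0, 179.9]
histogram = bin_counts(heights_cm, 10)

# Each row is one bin; the number of '#' marks is the bar height.
for lower, count in histogram.items():
    print(f"{lower}-{lower + 10}: {'#' * count}")
```

Changing the bin width changes the shape of the chart, so it is worth trying a few widths before settling on one.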
PIE CHARTS
Pie charts are a visual way of displaying how
the total data are distributed between different
categories. Pie charts should only be used for
displaying nominal data. They are generally
best for showing information grouped into a
small number of categories and are a
graphical way of displaying data that might
otherwise be presented as a simple table.
Pie chart of populations of English native speakers
LINE GRAPHS
Line graphs are usually used to show time
series data – that is, how one or more
variables vary over a continuous period of
time. Line graphs are particularly useful for
identifying patterns and trends in the data
such as seasonal effects, large changes and
turning points. As well as time series data,
line graphs can also be appropriate for
displaying data that are measured over other
continuous variables such as distance.
SCATTER PLOT
Scatter plots are used to show the
relationship between pairs of quantitative
measurements made for the same object or
individual. By analysing the pattern of dots
that make up a scatter plot it is possible to
identify whether there is any systematic or
causal relationship between the two
measurements.
▸ https://www2.le.ac.uk/offices/ld/resources/numerical-data/numerical-data
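One common way to put a number on the pattern seen in a scatter plot is the Pearson correlation coefficient. The sketch below computes it by hand for hypothetical paired measurements (hours studied and test scores); the data and the `pearson_r` name are assumptions. A value near 1 or -1 indicates a strong linear association, which by itself does not show that one measurement causes the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical pairs of measurements made for the same individuals.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson_r(hours, scores)
print(round(r, 3))
```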
WHAT IS DATA SCIENCE
Photo by ev on Unsplash
DEFINITION
Data Science is a blend of various tools, algorithms, and machine learning
principles with the goal to discover hidden patterns from the raw data and solve
analytically complicated problems.
APPLICATION OF DATA SCIENCE
EXPLAINING VS PREDICTING
By 2020 more than 80 % of the data
will be unstructured. This data is
generated from different sources like
financial logs, text files, multimedia
forms, sensors, and instruments.
▸ https://databasetown.com/introduction-to-data-science-a-beginners-guide/#What_is_Data_Science
The Data Scientist can handle raw data using the latest technologies and
techniques, perform the necessary analysis, and present the acquired
knowledge to colleagues in an informative way.
The Data Analyst works with R, Python and SQL; the role combines technical
and analytical knowledge.
The Data Architect integrates, centralizes, protects and maintains data
sources.
The Statistician can be seen as the pioneer of the data science field. It is
often the statistician who extracts the information from the data and
transforms it into actionable insights.
The Database Administrator ensures that the database is accessible to every
stakeholder in the organization and takes the necessary measures to keep
the stored data safe.
The Business Analyst is probably the least technical profile, but has a deep
understanding of the various business processes that are in place. The
Business Analyst often acts as the go-between for the business people and
the technicians.
The Data and Analytics Manager steers the direction of the data science
team. The role combines strong, specialized skills in a range of
technologies (SQL, R, SAS, …) with the interpersonal skills required to
manage a team.
SOME EXAMPLES
PHOTO BY JAREDD CRAIG ON UNSPLASH
THE NY TIMES
https://www.nytimes.com/interactive/2019/11/02/us/politics/trump-twitter-disinformation.html
