data & content design
Frieda Brioschi - frieda.brioschi@gmail.com
Emma Tracanella - emma.tracanella@gmail.com
AROUND DATA SCIENCE
LESSON 5 - 2020
THE COURSE
1. Introduction. What are data and information, why they matter
2. How to collect and organize data
3. Information classification
4. Data lingo
5. Around Data Science
6. Computer & Humans: how we perceive information
7. Visual communication of numerical data
8. Visual communication of non numerical data
9. Content type and effectiveness
10. Storytelling with data
11. Tools for analysis and data visualization
12. Artificial Intelligence demythologized
LET’S START WITH YOUR DATA PROJECT
DESCRIBE YOUR PROJECT
Photo by William Iven on Unsplash
A COUPLE OF DIGRESSIONS
▸ storage issues
▸ http://blog.odsi.co.uk/wp-content/uploads/2013/08/History-of-computer-data-storage.png.jpg
▸ the rise of data centers
▸ computational power
▸ the Internet
MARGARET HAMILTON
CLOUD DATA CENTERS (4,563 IN 2019)
https://www.digitalic.it/tecnologia/data-center-cloud-numeri-e-diffusione-nel-mondo-litalia-tra-i-paesi-europei-che-ne-ospita-di-piu
WHAT IS BIG DATA?
Photo by ev on Unsplash
DEFINITION
The term “big data” refers to data that is so large, fast or complex that it’s difficult or impossible to
process using traditional methods. The concept of big data gained momentum in the early 2000s
when industry analyst Doug Laney articulated the definition of big data as the three V’s:
▸ Volume: Organizations collect data from a variety of sources, including business transactions,
smart (IoT) devices, industrial equipment, videos, social media and more. In the past, storing it
would have been a problem.
▸ Velocity: With the growth of the Internet of Things, data streams into businesses at an
unprecedented speed and must be handled in a timely manner, often in near-real time.
▸ Variety: Data comes in all types of formats, from structured, numeric data in traditional
databases to unstructured text documents, emails, video, audio, stock ticker data and financial
transactions.
(ACCORDING TO SAS)
https://www.visualcapitalist.com/big-data-keeps-getting-bigger/
CORRELATION
When two sets of data are strongly linked together we say they have a High Correlation.
▸ Correlation is Positive when the values increase together, and
▸ Correlation is Negative when one value decreases as the other increases
The correlation coefficient takes a value between -1 and 1:
▸ 1 is a perfect positive correlation
▸ 0 is no correlation (the values don't seem linked at all)
▸ -1 is a perfect negative correlation
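The slide does not say which measure of correlation it has in mind. A minimal sketch, assuming the common Pearson coefficient, shows how values of 1, 0 and -1 come about. The function name pearson_r and the sample lists are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical sample data, not taken from the lesson.
hours = [1, 2, 3, 4, 5]
up = [2, 4, 6, 8, 10]      # increases together with hours
down = [10, 8, 6, 4, 2]    # decreases as hours increases
flat = [2, 1, 3, 1, 2]     # no linear link with hours

print(pearson_r(hours, up))
print(pearson_r(hours, down))
print(pearson_r(hours, flat))
```

With this data the three calls give a perfect positive, a perfect negative, and (up to floating-point rounding) a zero correlation.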
CORRELATION
Correlation is one of the most widely used statistical concepts.
Since the term "correlation" refers to a mutual relationship or association between
quantities, why is it a useful metric?
▸ Correlation can help in predicting one quantity from another
▸ Correlation can (but often does not) indicate the presence of a causal
relationship
▸ Correlation is used as a basic quantity and foundation for many other
modeling techniques
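One way to see the first and third points together is a least-squares line, where the slope equals the correlation coefficient multiplied by the ratio of the two standard deviations. The sketch below assumes a simple linear relationship. The names fit_line and predict and the sample numbers are hypothetical, not part of the lesson.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by least squares, via the correlation r."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    # Pearson r: average product of deviations divided by both spreads.
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)
    slope = r * sy / sx
    intercept = my - slope * mx
    return slope, intercept, r


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Hypothetical measurements that are strongly, but not perfectly, linked.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r = fit_line(xs, ys)
print(round(r, 3), round(slope, 2), round(intercept, 2))
print(round(predict(6, slope, intercept), 2))
```

The closer r is to 1 or -1, the more reliable such a prediction tends to be. A prediction of this kind says nothing about whether one quantity causes the other.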
https://thenextweb.com/growth-quarters/2020/01/30/digital-trends-2020-every-single-stat-you-need-to-know-about-the-internet/
LINKED DATA
AN EXAMPLE OF ONTOLOGY
http://mappings.dbpedia.org/server/ontology/classes/
LINKED DATA / LOD
Linked data is structured data which is interlinked with other data so it becomes
more useful through semantic queries. It builds upon standard Web technologies,
but rather than using them to serve web pages only for human readers, it extends
them to share information in a way that can be read automatically by computers.
Part of the vision of linked data is for the Internet to become a global database.
Linked data may also be open data, in which case it is usually described as linked
open data (LOD).
▸ https://en.wikipedia.org/wiki/Linked_data
SCHEMA.ORG
http://schema.org/docs/full.html
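As an illustration of what a schema.org description can look like, the sketch below builds a small JSON-LD document in Python. Using the Course and Person types to describe this course is an assumption made for the example, not something the slides prescribe. The names are taken from the title slide.

```python
import json

# A hypothetical schema.org description of this course, expressed as JSON-LD.
# "@context" tells a consumer that the keys below are schema.org terms.
course = {
    "@context": "https://schema.org",
    "@type": "Course",
    "name": "data & content design",
    "author": [
        {"@type": "Person", "name": "Frieda Brioschi"},
        {"@type": "Person", "name": "Emma Tracanella"},
    ],
}

# Serialize to the text that would be embedded in a web page.
json_ld = json.dumps(course, indent=2, ensure_ascii=False)
print(json_ld)
```

On a web page this text would typically be placed inside a script element of type application/ld+json, so that machines can read it while human readers see the normal page.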
GOOGLE KNOWLEDGE GRAPH
https://www.youtube.com/watch?v=mmQl6VGvX-c
WHY LINKED DATA MATTERS
Linked data is a method for publishing structured data using vocabularies like
schema.org that can be connected together and interpreted by machines. Using
linked data, statements encoded in triples can be spread across different
websites.
This enables data from different sources to be connected and queried.
▸ https://wordlift.io/blog/en/entity/linked-data/
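A minimal sketch of the idea, in plain Python rather than a real RDF library: each statement is a (subject, predicate, object) triple, two hypothetical websites each publish part of the picture, and a query only succeeds once the two sets are merged. All example.org and example.com identifiers are invented for illustration. The two predicates are the schema.org properties name and worksFor.

```python
NAME = "http://schema.org/name"
WORKS_FOR = "http://schema.org/worksFor"

ALICE = "http://example.org/people/alice"   # hypothetical identifier
ACME = "http://example.com/org/acme"        # hypothetical identifier

# Triples published by a hypothetical personal website.
site_a = {
    (ALICE, NAME, "Alice"),
    (ALICE, WORKS_FOR, ACME),
}

# Triples published by a hypothetical company website.
site_b = {
    (ACME, NAME, "Example Corp"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def employer_names(graph, person):
    """Follow worksFor links, then look up the name of each organization."""
    names = []
    for _, _, org in match(graph, s=person, p=WORKS_FOR):
        for _, _, name in match(graph, s=org, p=NAME):
            names.append(name)
    return names


# Because both sites use the same identifier for the organization,
# merging the data is just a set union.
graph = site_a | site_b

print(employer_names(site_a, ALICE))  # the first site alone cannot answer
print(employer_names(graph, ALICE))   # the merged data can
```

The shared identifier is what connects the two sources. In practice this kind of query is usually written in a dedicated query language such as SPARQL, not hand-coded.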
Data-Informed Decision Making
To Making Better Data-Informed Decisions
DATA-INFORMED DECISION MAKING PROCESS
▸ ASK: Formulate a focused question
▸ ACQUIRE: Search for the best available data
▸ ANALYZE: Critically appraise and analyze the data
▸ APPLY: Integrate the data with your professional expertise and be conscious about your mental models
▸ ANNOUNCE: Decide and communicate
▸ ASSESS: Monitor the outcome
ASK
Turn the business questions into analytical question(s).
ACQUIRE
Find and source all relevant data. Remember to think
about the question systemically and include any
interrelated data that could be relevant. This includes
not only internal but external data and information too.
Ensure the sourced data is available, trusted, and in
the right form (extracted, profiled, tagged, cataloged,
standardized, treated for sensitivity, etc.).
ANALYZE
Create a measurement framework
to describe your data with KPIs.
Use exploratory analytics to find patterns,
trends, and relationships that may not be
obvious, then start drilling into root causes.
APPLY
Review and orient yourself to the
information and data so far, and apply your
personal experience to it.
Challenge the data and look for information
and data to disprove it.
Review with a cognitively diverse team (or, if
you are alone, be aware of your bias,
play devil’s advocate, and reframe).
If applicable, leverage predictive analytics
to run simulations or similar to test
potential decisions and solutions.
ANNOUNCE
Announce your decision at the right level to ALL stakeholders
(direct, indirect, upstream, and downstream) by leveraging methodologies
like the ‘Rule of 3’ and the ‘Pyramid Principle’ in your storytelling.
© 2019 QlikTech International AB. All rights reserved. Qlik®, Qlik Sense®, QlikView®, QlikTech®, Qlik Cloud®, Qlik DataMarket®, Qlik Analytics Platform®, Qlik NPrinting®, Qlik
Connectors®, Qlik GeoAnalytics®, Qlik Core®, Associative Difference®, Qlik Data Catalyst™, Qlik Associative Big Data Index™ and the QlikTech logos are trademarks of QlikTech
International AB which have been registered in multiple countries. Other marks and logos mentioned herein are trademarks or registered trademarks of their respective owners.
ASSESS
Set up a review mechanism to monitor the
impacts of the decision after it is made and
acted upon.
Leverage that review mechanism to
fail, fix, and learn fast, including improvements to
data, measurement frameworks, accountability,
decisions, and anything else relevant.
To learn more about Data-Informed Decision Making and explore our free courses and resources, visit
qlik.com/GetDataLiterate.
