2. About me
• Education
• NCU (MIS), NCCU (CS)
• Experiences
• Telecom big data Innovation
• Retail Media Network (RMN)
• Customer Data Platform (CDP)
• Know-your-customer (KYC)
• Digital Transformation
• Research
• Data Ops (ML Ops)
• Business Data Analysis, AI
7. Get your Orange 3 ready
• Open source machine learning and data visualization
• Version: 3.36.2
• https://orangedatamining.com/
8. Storytelling with Data (SWD)
• Always remember Data Comparison!
• Focus on simplicity and ease of interpretation
• The takeaways!
https://www.storytellingwithdata.com
11. Allow the labels to be written in a single,
easily readable line
12. Rainbow palette, overly distracting!
• If the goal is to observe the "fluctuation of commercials across
categories over the five years", we could better achieve that by
iterating to a different graph type.
• On the other hand, if we're meant simply to compare the overall
category trends, "toning down the color" usage might be beneficial.
13. Color in only the year with the highest
number of commercials in each category
This results in a visually chaotic chart!
[Figure labels: 2023, 2022, Over-Time]
14. Change over time calls for a line graph
An overly complex visualization with numerous overlapping data series
15. In order of total number of commercials
across all five years of data
16. With bar charts instead of line graphs, we can
intentionally emphasize that aspect of our data
The number of commercial advertisers in each category, in each year, is a countable quantity.
17. The area-graph small multiples chart
A visualization like this suits social media.
It maintains visual interest while facilitating more straightforward
comparisons across categories over several years.
18. A combination of line graphs with descriptive
captions to convey these insights more clearly
21. Conclusion
• There is no singularly correct approach to data visualization.
• The key is to consider the audience's needs, the context of the
presentation, and the intended message.
• Visualizing data is as much an art as it is a science, requiring
experimentation, iteration, and feedback, rather than adherence to a
strict set of rules.
• It is all about communication!
https://www.storytellingwithdata.com/blog
22. What is data visualization?
• Data visualization is the graphical representation of information and
data.
• By using visual elements like charts, graphs, and maps, it provides
a way to see and understand trends, outliers, and patterns in data.
https://www.tableau.com/learn/articles/data-visualization#advantages-disadvantages
24. The Pyramid of Data Needs
Source: The Pyramid of Data Needs (and why it matters for your career), by Hugh Williams, Medium
26. Static chart
• There are generally three steps in drawing a chart:
• Observe the data, determine the relationship, and select the chart.
• Identify what type of data it is and what content you want to express:
• Category
• Numeric
• Text
• Datetime
• After clarifying the content to be expressed, you can choose which chart to
use to express it.
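The selection step can be sketched as a small lookup from data types to a chart. This is only an illustration: the rule table and the name `suggest_chart` are assumptions made for this sketch, not part of Orange, and a real choice also depends on the message you want to convey.

```python
from typing import Optional

# Hypothetical rule table: (x type, y type) -> chart type discussed in this deck.
CHART_RULES = {
    ("datetime", "numeric"): "line chart",    # a value changing over time
    ("category", "numeric"): "bar chart",     # comparing categories
    ("numeric", None): "histogram",           # distribution of one variable
    ("numeric", "numeric"): "scatter plot",   # relationship between two variables
}

def suggest_chart(x_type: str, y_type: Optional[str] = None) -> str:
    """Return a chart suggestion, or a fallback when no rule matches."""
    return CHART_RULES.get((x_type, y_type), "table (no rule matched)")

print(suggest_chart("datetime", "numeric"))
print(suggest_chart("numeric"))
```

Text columns fall through to the fallback on purpose, since they usually need to be turned into categories or counts before charting.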
27. Pie chart
• You must have some kind of whole
amount that is divided into a number
of distinct parts.
• Your primary objective in a pie chart
should be to compare each group’s
contribution to the whole.
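A pie chart draws each part as its share of the whole, so the underlying calculation is a simple normalization. The sketch below uses made-up counts and a hypothetical helper name, `shares`.

```python
def shares(parts):
    """Convert absolute amounts into fractions of the whole."""
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}

# Made-up counts, for illustration only.
counts = {"A": 50, "B": 30, "C": 20}
result = shares(counts)

for name, fraction in result.items():
    print(f"{name}: {fraction:.0%}")
```

If the fractions do not add up to a meaningful whole, a pie chart is probably the wrong choice.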
28. Line chart
• Line charts provide the clearest
graphical representation of time-
related variables and are the
preferred mode for representing
trends or variables over time.
29. Histogram chart
• It is used to summarize discrete
or continuous data that are
measured on an interval scale.
• It is often used to illustrate the
major features of the distribution
of the data in a convenient form.
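The binning behind a histogram can be sketched in plain Python. The values, the bin width, and the helper name `histogram_counts` are assumptions chosen for illustration; plotting tools normally pick the bins for you.

```python
def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each fixed-width, non-overlapping bin.

    The result maps the left edge of each non-empty bin to its count.
    """
    counts = {}
    for value in values:
        index = int((value - start) // bin_width)
        left_edge = start + index * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

# Made-up ages, for illustration only.
ages = [22, 38, 26, 35, 54, 2, 27, 14]
bins = histogram_counts(ages, bin_width=10)
print(bins)
```

Empty bins are simply absent from the result; a plotting library would still draw them as gaps of height zero.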
30. Bar chart
• It shows data values as
bars so that multiple
data sets can be compared
side by side.
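Counting rows per category gives the values that a bar chart displays. The sketch below uses `collections.Counter` on a made-up list of category labels and sorts the bars by height, which is a common way to make the comparison easier to read.

```python
from collections import Counter

# Made-up category labels, for illustration only.
embarked = ["S", "C", "S", "Q", "S", "C"]

counts = Counter(embarked)
# most_common() returns (category, count) pairs, tallest bar first.
ordered = counts.most_common()
print(ordered)
```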
31. Differences between histogram and bar chart
Comparison term | Bar chart | Histogram
Usage | To compare different categories of data. | To display the distribution of a variable.
Type of variable | Categorical variables | Numeric variables
Rendering | Each data point is rendered as a separate bar. | The data points are grouped and rendered based on the bin value. The entire range of data values is divided into a series of non-overlapping intervals.
Space between bars | Can have space. | No space.
Reordering bars | Can be reordered. | Cannot be reordered.
32. Scatter Plot
• It uses dots to
represent values for
two different numeric
variables and lets you
observe relationships
between them.
Pearson Correlation
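The Pearson correlation coefficient summarizes the linear relationship that a scatter plot shows: values near +1 indicate a positive correlation, values near -1 a negative one, and values near 0 no linear correlation. A minimal sketch in plain Python, with the helper name `pearson` and the sample numbers chosen only for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly increasing together
print(pearson([1, 2, 3], [3, 2, 1]))         # perfectly opposite directions
```

The function divides by zero if either variable is constant, so a real analysis should check for that case first.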
33. Box plot
• Q1: The first quartile (25%) position.
• Q3: The third quartile (75%) position.
• Interquartile range (IQR): IQR = Q3 − Q1.
• Lower and upper 1.5*IQR whiskers:
These represent the limits and
boundaries for the outliers.
• Outliers: Defined as observations that
fall below Q1 − 1.5 IQR or above Q3 +
1.5 IQR.
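The quantities above can be computed with Python's `statistics.quantiles`. This is a sketch under assumptions: quartile conventions differ between tools, so the "inclusive" method used here may not match the numbers a particular plotting widget reports, and the data and the name `box_plot_summary` are made up.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return quartiles, IQR, the 1.5*IQR limits, and the outliers beyond them."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "q3": q3, "iqr": iqr,
            "lower": lower, "upper": upper, "outliers": outliers}

# Made-up values with one obvious extreme point.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = box_plot_summary(values)
print(summary["outliers"])
```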
39. Modify your output file path
• Check each
Python widget and
change the old
path to a path that
exists on your machine.
40. Dataset description (titanic.csv)
• 12 columns in total.
• A training dataset for
predicting whether a passenger
survived the Titanic
disaster.
41. Data Summary
• Load titanic.csv
• Data description
• Look at the Name, Type, Role, and
Values columns in the table.
• Change the configuration
of the columns as needed.
42. Data Summary
• Missing values
• Use the Feature
Statistics widget.
• What are the missing-value
ratios?
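Outside Orange, the missing ratio of a column is the number of empty cells divided by the number of rows. The sketch below works on rows read as dictionaries (for example from `csv.DictReader`); the rows and the helper name `missing_ratio` are made up for illustration and do not reproduce the real titanic.csv values.

```python
def missing_ratio(rows, column):
    """Fraction of rows whose value in `column` is absent or empty."""
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)

# Made-up rows; empty strings stand for missing cells in a CSV file.
rows = [
    {"Age": "22", "Cabin": ""},
    {"Age": "", "Cabin": "C85"},
    {"Age": "26", "Cabin": ""},
    {"Age": "35", "Cabin": ""},
]
ratios = {column: missing_ratio(rows, column) for column in ("Age", "Cabin")}
print(ratios)
```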
48. Scatter plot
• Use the Scatter Plot widget.
• It is used to observe the degree
of correlation between
features:
• positive correlation
• negative correlation
• no correlation
49. Box plot
• Use the Box Plot widget.
• Compare multiple
features with each other.
50. Pivot Table
• Use the Pivot Table widget.
• It summarizes the data
of a more extensive
table into a table of
statistics.
• The statistics can include
sums, averages, counts,
etc.
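What a pivot table computes can be sketched as grouping rows by two keys and aggregating a value per cell. The helper `pivot` and the rows below are hypothetical; the same function produces counts or averages depending on the aggregation passed in.

```python
from collections import defaultdict
from statistics import mean

def pivot(rows, row_key, col_key, value_key, aggregate):
    """Group rows by (row_key, col_key) and aggregate value_key in each cell."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[row_key], row[col_key])].append(row[value_key])
    return {cell: aggregate(values) for cell, values in cells.items()}

# Made-up rows, for illustration only.
rows = [
    {"Sex": "female", "Pclass": 1, "Fare": 80.0},
    {"Sex": "female", "Pclass": 1, "Fare": 100.0},
    {"Sex": "male", "Pclass": 3, "Fare": 8.0},
    {"Sex": "male", "Pclass": 3, "Fare": 7.0},
    {"Sex": "male", "Pclass": 1, "Fare": 50.0},
]
counts = pivot(rows, "Sex", "Pclass", "Fare", len)
averages = pivot(rows, "Sex", "Pclass", "Fare", mean)
print(counts)
print(averages)
```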
51. 1. Show me the top 10 data rows
• Hint: Use Data Sampler widget
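For comparison, the same "first 10 rows" view can be produced in plain Python, for example inside a script. The sketch below reads from an in-memory stand-in with made-up rows so that it runs on its own; with the real file you would pass an opened titanic.csv instead. The helper name `head` is an assumption.

```python
import csv
import io
from itertools import islice

def head(csv_file, n=10):
    """Return the first n data rows of an open CSV file as dictionaries."""
    return list(islice(csv.DictReader(csv_file), n))

# Made-up stand-in for the CSV file: a header plus 12 rows.
sample = io.StringIO(
    "PassengerId,Survived\n" + "".join(f"{i},{i % 2}\n" for i in range(1, 13))
)
top10 = head(sample)
for row in top10:
    print(row)
```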
52. 2. Show me dataset info
• How many Rows?
• How many Features?
• And any other information like this!
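The basic dataset info can also be checked with the standard `csv` module. This sketch counts columns rather than Orange features (Orange may treat some columns as targets or metas), and it uses a made-up in-memory sample instead of the real file; `dataset_info` is a hypothetical helper name.

```python
import csv
import io

def dataset_info(csv_file):
    """Return the number of data rows, the number of columns, and the column names."""
    reader = csv.reader(csv_file)
    header = next(reader)              # the first line holds the column names
    n_rows = sum(1 for _ in reader)    # every remaining line is one data row
    return {"rows": n_rows, "columns": len(header), "names": header}

# Made-up stand-in for the CSV file.
sample = io.StringIO("PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n3,1,3\n")
info = dataset_info(sample)
print(info)
```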
54. 4. Survival Conclusion
• For the features SEX, PCLASS, SIBSP,
PARCH, and EMBARKED:
• Women had a higher chance of survival
than men.
• First-class passengers had a higher
chance of survival.
• Passengers with siblings or spouses aboard had a
higher chance of survival.
• Passengers with children or parents aboard
had a higher chance of survival.
• Embarking at port S (Southampton) is
associated with a lower cabin class and a
lower chance of survival.
56. 6. Look at survival rate by SEX and PCLASS
• Women in first class had a survival rate as high as 96.8%. In contrast,
men in third class had only a 13.54% chance of survival.
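A survival rate per group is the number of survivors divided by the number of passengers in that group. The sketch below shows the calculation on a few made-up rows, so its output does not reproduce the percentages above; the helper name `survival_rate` is an assumption, and the column names follow the usual titanic.csv header.

```python
from collections import defaultdict

def survival_rate(rows, keys):
    """Share of survivors within each combination of the given key columns."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group] += 1
        survivors[group] += row["Survived"]   # 1 = survived, 0 = did not
    return {group: survivors[group] / totals[group] for group in totals}

# Made-up rows, for illustration only.
rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]
rates = survival_rate(rows, ["Sex", "Pclass"])
print(rates)
```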
57. 7. Look at survival rate by SEX, AGE and
PCLASS
• In this disaster, women in
first or second class had a 90%
chance of survival regardless of age.
• On the other hand, a man in
third class and older than 18 had
only a 13.36% chance of survival.
• To summarize, in a disaster scenario,
girls and women have a higher chance
of survival compared to boys and men.
• Additionally, the higher the class (such
as first class), the higher the chances
of survival.
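Adding age to the breakdown requires turning a numeric age into a band first. The sketch below uses a cutoff of 18, keeps passengers with a missing age in a separate "unknown" band, and runs on made-up rows, so it does not reproduce the percentages above. The helper names and band labels are assumptions.

```python
from collections import defaultdict

def age_band(age, cutoff=18):
    """Label an age relative to the cutoff; missing ages get their own band."""
    if age is None:
        return "unknown"
    return "over 18" if age > cutoff else "18 or under"

def survival_by_group(rows):
    """Survival rate per (Sex, Pclass, age band) combination."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        group = (row["Sex"], row["Pclass"], age_band(row["Age"]))
        totals[group] += 1
        survivors[group] += row["Survived"]
    return {group: survivors[group] / totals[group] for group in totals}

# Made-up rows, for illustration only.
rows = [
    {"Sex": "female", "Pclass": 1, "Age": 30, "Survived": 1},
    {"Sex": "female", "Pclass": 2, "Age": 8, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Age": 25, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Age": 40, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Age": None, "Survived": 0},
]
rates = survival_by_group(rows)
print(rates)
```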
58. 8. The price paid in each class
• Try plotting Fare against Pclass
to visualize the data.
• In every class someone boarded
for free, while others spent over
500 pounds for a first-class
ticket. It's quite an interesting
observation!
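The same comparison can be checked numerically by taking the lowest and highest fare per class. The rows below are made up for illustration and are not the real titanic.csv values; `fare_range` is a hypothetical helper name.

```python
def fare_range(rows):
    """Return {Pclass: (lowest fare, highest fare)} for the given rows."""
    ranges = {}
    for row in rows:
        pclass, fare = row["Pclass"], row["Fare"]
        low, high = ranges.get(pclass, (fare, fare))
        ranges[pclass] = (min(low, fare), max(high, fare))
    return dict(sorted(ranges.items()))

# Made-up rows, for illustration only.
rows = [
    {"Pclass": 1, "Fare": 0.0},
    {"Pclass": 1, "Fare": 500.0},
    {"Pclass": 2, "Fare": 0.0},
    {"Pclass": 2, "Fare": 13.0},
    {"Pclass": 3, "Fare": 8.0},
    {"Pclass": 3, "Fare": 0.0},
]
ranges = fare_range(rows)
print(ranges)
```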
59. 9. Visualizing data and express your thoughts
• Using what was taught today and referencing
Story_telling_with_data.pdf, please visualize and analyze this data
(20240320_HW.csv) with the theme of sales.
• Based on your observations, explain the relationship between sales
and these variables.