2. About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data / ML / AIoT / AI columnist
5. Basic functions
• Data pre-processing
• Set the ID column as primary key
• Join the salary column on primary key (Using VLOOKUP)
• Convert Height to Height_cm
=IFERROR(VLOOKUP(A2,salary!$A$2:$D$452,4,FALSE),0)
=MID(G2,2,1)*30.48 + MID(G2,4,1)*2.54
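For readers who want to check the two formulas outside Excel, the same logic can be sketched in Python. This is only an illustration: the salary table, the IDs, and the `6-2` height format are hypothetical, since the slide does not show the actual cell contents. The sketch uses 1 ft = 30.48 cm and 1 in = 2.54 cm, and it reads whole numbers rather than single characters, so it is slightly more lenient than the MID-based formula.

```python
import re

CM_PER_FOOT = 30.48
CM_PER_INCH = 2.54

def lookup_salary(player_id, salary_table, default=0):
    """Mimic IFERROR(VLOOKUP(..., FALSE), 0): exact match, else a default."""
    return salary_table.get(player_id, default)

def height_to_cm(height_text):
    """Convert a feet/inches string to centimetres.

    Takes the first two integers found in the text as feet and inches.
    """
    feet, inches = (int(n) for n in re.findall(r"\d+", height_text)[:2])
    return feet * CM_PER_FOOT + inches * CM_PER_INCH

# Hypothetical lookup table: ID -> salary
salary_table = {101: 5_000_000, 102: 1_200_000}

print(lookup_salary(101, salary_table))   # ID present in the table
print(lookup_salary(999, salary_table))   # ID missing, falls back to 0
print(round(height_to_cm("6-2"), 2))      # 6 ft 2 in expressed in cm
```

Returning 0 for a missing ID mirrors the IFERROR wrapper, which is why rows with a zero salary appear later and need to be cleaned up.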
6. Basic functions
• Freeze the first row
• Conditional Formatting
• Age: highlight values below the average Age in RED
• Salary: highlight values above the average Salary in GREEN
• Position: highlight cells where Position is C in YELLOW
• Sort the data to investigate it
• Sort by Salary or Age
• Sort by cell color
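The three formatting rules and the sort can be expressed as plain comparisons, which may help when checking what Excel highlights. The sketch below is an assumption-laden illustration: the rows and column values are made up, and `highlight` is a hypothetical helper, not an Excel feature.

```python
from statistics import mean

def highlight(rows):
    """Return a colour label per cell, following the three rules above."""
    avg_age = mean(r["Age"] for r in rows)
    avg_salary = mean(r["Salary"] for r in rows)
    return [
        {
            "Age": "RED" if r["Age"] < avg_age else None,
            "Salary": "GREEN" if r["Salary"] > avg_salary else None,
            "Position": "YELLOW" if r["Position"] == "C" else None,
        }
        for r in rows
    ]

# Hypothetical sample rows
rows = [
    {"Age": 22, "Salary": 900, "Position": "C"},
    {"Age": 30, "Salary": 2500, "Position": "PG"},
    {"Age": 35, "Salary": 1400, "Position": "SF"},
]

colours = highlight(rows)

# Sorting by Salary, largest first, for data investigation
by_salary = sorted(rows, key=lambda r: r["Salary"], reverse=True)

for row, colour in zip(rows, colours):
    print(row, colour)
```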
7. Basic functions
• Delete the rows where Salary equals zero
• Delete them manually
• Build a pivot table with Team, Position, Age, Salary
• Summarize Age by Average
• Sort by Salary
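What the pivot table computes can be sketched as a group-and-average step. The rows below are hypothetical, and `pivot_average` is an illustrative helper rather than anything Excel exposes; the sketch assumes zero-salary rows are dropped first, as in the cleaning step above.

```python
from collections import defaultdict
from statistics import mean

def pivot_average(rows, keys, value):
    """Group rows by the given key columns and average the value column."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[value])
    return {group: mean(values) for group, values in groups.items()}

# Hypothetical sample rows
rows = [
    {"Team": "A", "Position": "C", "Age": 30, "Salary": 100},
    {"Team": "A", "Position": "C", "Age": 20, "Salary": 300},
    {"Team": "B", "Position": "PG", "Age": 25, "Salary": 0},
    {"Team": "B", "Position": "PG", "Age": 27, "Salary": 200},
]

# Drop rows whose Salary is zero (the lookup fallback value)
cleaned = [r for r in rows if r["Salary"] != 0]

avg_age = pivot_average(cleaned, ("Team", "Position"), "Age")
print(avg_age)
```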
8. Basic functions
• Binning by Age
• Find out the Max, Min, Ave, Range, Scale, Bin_interval, Dev
• Define the bins for Age
• Count the frequency of each bin
• First select the whole output range, then enter the formula in the first cell
• Press Ctrl + Shift + Enter to enter it as a { } array formula
=FREQUENCY(F$2:F$449,P$3:P$8)
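To make the counting rule explicit, here is a minimal Python sketch of how FREQUENCY assigns values to bins: each bin value is an inclusive upper bound, and one extra count is returned for values above the last bin. The ages and bin edges are hypothetical, and the sketch assumes the bins are sorted in ascending order.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per bin, treating each bin value as an inclusive upper bound.

    Returns len(bins) + 1 counts; the last one holds values above the final bin.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # Index of the first bin whose upper bound is >= value
        counts[bisect_left(bins, value)] += 1
    return counts

# Hypothetical ages and bin upper bounds
ages = [19, 22, 25, 25, 30, 31, 38, 41]
bins = [20, 25, 30, 35, 40]

print(frequency(ages, bins))  # [1, 3, 1, 1, 1, 1]
```

The extra trailing count is the reason the selected output range in Excel is usually one cell longer than the list of bins.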
9. Basic functions
• Find the blank rate of the College column
• Select the College column, which contains blank cells
• Click the Find & Select button, then Go To Special
• Choose the Blanks option
• Type the word BLANK in the formula bar and press Ctrl + Enter to fill all selected cells
• Extract the distinct College values using Advanced Filter
• Count the frequency of each distinct College and compute its ratio
• Sort by the College ratio column
=COUNTIF(H$2:H$449,M2)
=N2/SUM(N$2:N$118)
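The fill, count, and ratio steps can be sketched together in Python as a cross-check on the COUNTIF and ratio formulas. The college names are hypothetical, and `college_ratios` is an illustrative helper; it assumes empty or whitespace-only cells are the ones to label BLANK.

```python
from collections import Counter

def college_ratios(colleges):
    """Label blanks, count each distinct value, and return ratios sorted descending."""
    filled = [c if c and c.strip() else "BLANK" for c in colleges]
    counts = Counter(filled)
    total = sum(counts.values())
    table = [(name, n, n / total) for name, n in counts.items()]
    return sorted(table, key=lambda row: row[2], reverse=True)

# Hypothetical College column with empty cells
colleges = ["Duke", "", "Duke", None, "Kentucky",
            "Duke", "  ", "Duke", "Kentucky", "UCLA"]

table = college_ratios(colleges)
for name, count, ratio in table:
    print(f"{name:10s} {count:3d} {ratio:.2f}")
```

The ratio of the BLANK row is the blank rate of the column.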
10. Static chart
• There are generally three steps in drawing a chart:
• Observe the data, determine the relationship, and select the chart.
• Identify what type of data it is and what you want to express.
• Once the content to be expressed is clear, choose the chart that expresses it best.
11. Pie chart
• You must have some kind of whole amount that is divided into a number of distinct parts.
• Your primary objective in a pie chart should be to compare each group’s contribution to the whole.
12. Line chart
• Line charts provide the clearest graphical representation of time-related variables and are the preferred mode for representing trends or variables over time.
13. Histogram chart
• It is used to summarize discrete or continuous data that are measured on an interval scale.
• It is often used to illustrate the major features of the distribution of the data in a convenient form.
14. Bar chart
• It shows data values as bars so that multiple data sets can be compared side by side.
15. Differences between histogram and bar chart
Comparison terms | Bar chart | Histogram
Usage | To compare different categories of data. | To display the distribution of a variable.
Type of variable | Categorical variables | Numeric variables
Rendering | Each data point is rendered as a separate bar. | The data points are grouped and rendered based on the bin value. The entire range of data values is divided into a series of non-overlapping intervals.
Space between bars | Can have space. | No space.
Reordering bars | Can be reordered. | Cannot be reordered.
16. Scatter Plot
• It uses dots to represent values for two different numeric variables, making it possible to observe the relationship between them.
17. Bubble chart
• Bubble charts are typically used to compare categorized circles and show the relationships between them through their position and relative size.
• The overall picture of a bubble chart can be used to look for patterns/correlations.
18. Box plot
• Q1: The first quartile (25%) position.
• Q3: The third quartile (75%) position.
• Interquartile range (IQR): Q3 − Q1.
• Lower and upper whiskers (1.5 × IQR): these mark the boundaries beyond which observations are treated as outliers.
• Outliers: Defined as observations that fall below Q1 − 1.5 IQR or above Q3 + 1.5 IQR.
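The box plot quantities can be computed directly, as in the sketch below. The values are hypothetical, and the quartile method is an assumption: tools differ in how they interpolate quartiles, and this sketch uses the "inclusive" method from Python's `statistics` module, so the fences may differ slightly from another tool's.

```python
from statistics import quantiles

def box_plot_stats(values):
    """Compute Q1, Q3, the IQR, the 1.5 * IQR fences, and the outliers."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }

# Hypothetical values with one obvious outlier
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

stats = box_plot_stats(values)
print(stats)
```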
19. Tree map
• It displays different data in different color blocks, and the size of each block lets you compare the values. The larger the area of a block, the larger the value of the data.
20. Homework
Find a sample data source, pre-process it in Excel, and visualize it.