A retail company wants to analyze sales data from different branches in the first quarter. Creating a pivot table is an efficient way to summarize this data compared to manually calculating totals. The steps include selecting the data table, choosing where to place the pivot table, and dragging fields to the rows, columns, and values areas. This summarizes the data in a report without changing the original data. Pivot tables allow reorganizing data in a new way to gain insights.
2. Dear fellow learners and researchers,
I want to express my deep passion for sharing knowledge and helping people find the information
they need. I believe that knowledge should be accessible to everyone, and by sharing it, we can
contribute to the advancement of our society.
However, I also want to emphasize the importance of creating your own research. While my work
may serve as a source of inspiration or a starting point for your own research, I urge you to
conduct your own exploration and draw your own conclusions. This not only ensures the
originality and authenticity of your work but also allows for the discovery of new ideas and
perspectives.
Let us all strive to create and share knowledge in an ethical and responsible manner.
By doing so, we can make a positive impact on our communities and the world.
Thank you for your attention, and I wish you all the best in your research and learning endeavors.
And remember Share is Care.
Sincerely,
Lamees El-Ghazoly.
3. Data Visualization vs. Data Mining
Part 1: https://www.slideshare.net/lameesmahmoud1/data-and-information-visualization-part-1part-1pptx
4. Data Mining and Data Visualization are two
complementary approaches to the analysis and
interpretation of data.
While they are distinct techniques, they
are often used together to gain valuable insights
from Data.
5. A large retail chain wanted to identify patterns in customer behavior to increase sales and customer satisfaction. The company had a vast amount of customer data, including purchasing histories, demographic information, and store location data. To analyze this data, the company used data mining techniques to identify patterns and relationships in the data.
The data mining analysis revealed that customers who purchased certain products were more likely to purchase other related products. For example, customers who purchased diapers were also likely to purchase baby food and formula.
To communicate these insights to the company's stakeholders, data visualization was used to create a series of interactive dashboards. These dashboards provided a visual representation of the relationships between different products, as well as the demographics of customers who purchased them.
The dashboards allowed the stakeholders to easily explore the data and gain a deeper understanding of customer behavior. For example, they were able to see that customers in certain age groups were more likely to purchase certain products, allowing them to tailor marketing campaigns to specific demographics.
By using data mining to identify patterns in the data and data visualization to communicate these insights, the retail chain was able to increase sales and customer satisfaction by tailoring its marketing and product offerings to better meet the needs and preferences of its customers.
6. Data Mining
Data Mining is the process of Discovering Patterns, Trends, and Insights in large Datasets using
various Computational Techniques such as Statistical Analysis, Machine Learning, and Artificial
Intelligence.
It involves Analyzing and Extracting useful information from large volumes of data, which can
then be used for various purposes, such as Making Informed Business Decisions, Improving
Processes, Identifying Opportunities, and Predicting Future Outcomes.
Also known as Knowledge Discovery in Data (KDD)
7. Data Mining processes include Sequence Analysis, Classification, Path Analysis, Clustering, and Forecasting.
Data Mining is the practice of
Automatically searching large stores of
data to discover patterns and trends
that go beyond simple analysis.
Data mining uses Sophisticated
Mathematical Algorithms to Segment
the Data and Evaluate the probability of
future events.
8. Data Mining has four stages: Data Sources, Data Gathering or Data Exploring, Data Modeling, and Deploying the Data Models.
9. 10 Data Mining Techniques
Outlier Detection:
In some cases, you cannot interpret a data set just by understanding the underlying trend. You must also be able to spot anomalies, or outliers, in the data.
For example, if your buyers are almost entirely male but there is a significant spike in female purchasers during one strange week in July, you will want to investigate the spike and figure out what drove it, so you can either reproduce it or better understand your audience.
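One simple way to flag such a spike is a z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. The sketch below is only an illustration; the weekly counts, the function name `find_outliers`, and the threshold of 2 are assumptions, not part of the slides.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical weekly counts of female purchasers; index 5 is the "strange week".
weekly_female_buyers = [10, 12, 11, 9, 10, 45, 11, 10]
outliers = find_outliers(weekly_female_buyers)
print(outliers)
```

A z-score rule is sensitive to the outlier itself, since the spike inflates the standard deviation, so median-based rules may be a better fit for small samples.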
10. Associations:
Association is related to tracking patterns, but it is specific to variables that are dependently linked. Here you look for particular events or attributes that are closely correlated with another event or attribute.
For example, when your customers purchase a particular item, they often also purchase a second, related item. This is what drives the "People also bought" suggestions on online platforms.
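An association such as "customers who buy item A also buy item B" is usually quantified with support, confidence, and lift. The following sketch computes the three measures for a single rule; the baskets and the helper name `rule_metrics` are hypothetical and chosen only for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    has_a = sum(1 for t in transactions if antecedent in t)
    has_c = sum(1 for t in transactions if consequent in t)
    has_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = has_both / n                               # share of baskets with both items
    confidence = has_both / has_a if has_a else 0.0      # P(consequent | antecedent)
    lift = confidence / (has_c / n) if has_c else 0.0    # above 1 means positive association
    return support, confidence, lift

# Hypothetical shopping baskets.
baskets = [
    {"diapers", "baby food", "formula"},
    {"diapers", "baby food"},
    {"diapers", "milk"},
    {"bread", "milk"},
    {"baby food", "bread"},
]
support, confidence, lift = rule_metrics(baskets, "diapers", "baby food")
print(support, confidence, lift)
```

A lift above 1 suggests the two items appear together more often than they would if purchases were independent.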
11. Clustering is a common method used in the psychological, social, and physical sciences to identify subgroups or profiles of individuals within the larger population who share similar patterns on a set of variables.
Clustering algorithms employ unsupervised learning to find natural groups in an unlabeled dataset.
Traditional methods of clustering (e.g., K-means) attempt to place each individual case into a cluster with other observations with which it shares a similar score pattern.
Fuzzy clustering is considered soft clustering, in which each element has a probability of belonging to each cluster. In other words, each element has a set of membership coefficients corresponding to the degree of being in a given cluster.
This is different from k-means and k-medoids clustering, where each object is assigned to exactly one cluster. K-means and k-medoids clustering are known as hard or non-fuzzy clustering.
12. For example, you might group your customers into different clusters based on how much disposable income they have or how often they shop at your store.
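Hard clustering of this kind can be sketched with a one-dimensional k-means: each customer is assigned to exactly one cluster, the one with the nearest center. The income figures, the hand-picked starting centers, and the function names are assumptions made for this illustration.

```python
def assign(values, centers):
    """Put each value into the cluster of its nearest center (hard assignment)."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters

def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values and recomputing cluster means."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = assign(values, centers)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else old for c, old in zip(clusters, centers)]
    return centers, assign(values, centers)

# Hypothetical disposable income per customer (in thousands).
incomes = [12, 15, 14, 48, 52, 50, 95, 100, 98]
centers, clusters = kmeans_1d(incomes, centers=[10, 50, 100])
print(centers)
print(clusters)
```

In a fuzzy variant, each customer would instead receive a membership coefficient for every cluster rather than a single assignment.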
13. Classifications
This analysis is used to retrieve important and relevant information about data and metadata. This method of data mining assists in the classification of data into various groups. It is a more complex data mining technique that requires you to collect various attributes into distinguishable categories, which you can then use to draw further conclusions or serve some function.
For example, if you analyze the financial history or purchase records of each borrower, you might classify them as "low," "medium," or "high" risk. You could then use these classes to learn more about these customers.
14. Regression is used primarily for forecasting and modeling: it estimates the likely value of a particular variable given the presence of other variables.
For example, a certain amount may be predicted based on other factors such as availability, market demand, and competition. The main goal of regression is to help you identify the relationship between two (or more) variables in a given data set.
15. Data Warehousing
Without Data Warehousing, Data Mining is incomplete.
Data warehousing is a method used to store vast volumes of organized data safely. Storing data is not only a matter of preservation but also of data maintenance and security. Large-scale businesses require data warehousing to store their data safely.
16. Visualization
Data visualization presents data through graphs, charts, and digital images. This allows businesses to quantify and track their growth.
You may also compare your growth to that of your rivals and assess your position in the market.
Data visualization enables companies to make informed decisions because it gives them a simple, well-defined representation of the data.
17. Factor analysis: Determine which variables are combined to generate a given factor.
For example, in many psychiatric data sets the factor of interest cannot be measured directly, but one can measure other quantities (such as test scores) that reflect it.
Statistical Techniques
As the name suggests, statistical techniques compute measures such as the mean, mode, and median of the data to help predict future trends.
For businesses, statistical analysis is instrumental because it informs the decisions that drive future profits. Using statistics, companies can make strategic choices, measure their ROI, and formulate a marketing plan that takes potential trends in the data into account.
Discriminant analysis:
Predict a categorical response variable, commonly used in
social science.
Attempts to determine several discriminant functions (linear
combinations of the independent variables) that discriminate
among the groups defined by the response variable.
18. Tracking Patterns
One of the most important strategies in data mining is finding trends in the data sets. This usually means recognizing an aberration in the data that happens at regular intervals, or the rise and fall of a particular variable over time. For example, you may notice that sales of a specific product tend to increase shortly before the holidays, or that hot weather brings more customers to your website.
Sequential Patterns
This technique applies when the order of events in the data is known. Sequential analyses are useful for businesses because they can track selling trends. They may also help organizations learn about the sequence of activities recorded in their databases.
• Mining Sequence Data
  o Mining Time Series
  o Mining Symbolic Sequences
  o Mining Biological Sequences
• Mining Graphs and Networks
20. Data Mining
A Healthcare provider wants to identify patients who are at a high risk of developing a
particular disease.
They use Data Mining techniques to analyze Patient Records, including Demographic
Data, Medical Histories, and test results. By using Machine Learning Algorithms to
identify patterns in the data, they can identify patients who are at a High Risk of
Developing the Disease and take proactive measures to prevent it.
S.No | Type of disease | Data mining tool | Technique | Algorithm | Traditional method | Accuracy level (%) for DM application
1 | Tuberculosis | WEKA | Naïve Bayes Classifier | KNN | Probability, Statistics | 78
2 | Heart Disease | ODND, NCC2 | Classification | Naive | Probability | 60
3 | Kidney Dialysis | RST | Classification | Decision Making | Statistics | 76
4 | Diabetes Mellitus | ANN | Classification | C4.5 Algorithm | Neural Network | 82
5 | Blood Bank Sector | WEKA | Classification | J48 | - | 90
6 | Dengue | SPSS Modeler | - | C5.0 | Statistics | 80
7 | Hepatitis C | SNP | Information Gain | Decision rule | - | 74
21. Data Mining
The bar graph built from the table above shows the accuracy level, in percent, for each health care problem, as illustrated in the given figure.
The graph makes it possible to compare the predicted accuracy of the various data mining applications.
22. Data Visualization
• Data Visualization is the process of displaying extracted data in different graphical or visual formats such as statistical representations, pie charts, bar graphs, graphical images, etc.
• Data Visualization involves processing, analyzing, and communicating the data.
• Data Visualization gives a clear view of the data and makes it easier for the human brain to take in and remember large chunks of data at a glance.
• Data Visualization has seven stages: acquiring, parsing, filtering, mining, representing, refining, and interacting.
• Data Visualization facilitates complex data analysis by converting numerical data into meaningful 3D pictures and other graphical images.
• The applications of Data Visualization include sonar measurements, satellite photos, computer simulations, surveys, etc.
23. Data Visualization originated from statistics and the sciences. It gives a clear picture at a glance: a picture is worth a thousand words.
A main application of Data Visualization is geographical information systems, where important geographical information is represented as visual images that convey complex information as simply as possible.
Data Visualization has applications in many fields, such as retail, government, medicine and healthcare, transportation, telecommunication, insurance, capital markets, and asset management.
Many visualization techniques have been developed over the past decades to support the exploration of large data sets.
The infographic shows, through flows, the geographical movements of migrant masses.
24. Data Visualization
A company wants to analyze its sales data to identify trends and patterns. They use a
line chart to visualize the sales data over time, with different colors representing
different product lines. By looking at the chart, they can see which product lines are
performing well and which ones need improvement.
25. Data Visualization vs. Data Mining
Basis for Comparison | Data Mining | Data Visualization
Definition | Searches and produces relevant results from large data chunks. | Gives a simple overview of complex data.
Preference | This has different applications and is preferred for web search engines. | They are preferred for data forecasting and predictions.
Area | Comes under data science. | Comes under the area of data science.
Platform | It is operated with web software systems or applications. | Supports and works better in complex data analyses and applications.
Generality | New technology but underdeveloped. | More useful in real-time data forecasting.
Algorithm | Many algorithms exist in using data mining. | No need to use any algorithms.
Integration | It runs on any web-enabled platform or with any applications. | Irrespective of hardware or software, it provides visual information.
26. When working with Categorical Data and Numeric Data together, there are a few important
considerations to keep in mind:
Choosing the right visualization: When presenting Categorical Data and Numeric Data
together, it's important to choose a visualization that effectively communicates the
relationships between the Data.
For example, a scatter plot may be a good choice for visualizing the relationship between two
numeric variables, while a stacked bar chart may be more appropriate for comparing the
frequency of different categories in a categorical variable.
Categorical Data with Numeric Data
27. Scaling: When working with Numeric and Categorical Data together, it's important to ensure that the data is scaled appropriately.
For example, if one variable has a much larger range of values than the other, it may be necessary to rescale the data so that both variables remain visible in the visualization.
There are different methods to rescale data, such as Standard Scaling (Standardization), Normalization, Percentile Transformation, and more. These transformations can be carried out in code, for example in R.
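The three rescaling methods named on this slide can be sketched in a few lines. The slide mentions R; the sketch here uses Python with only the standard library, and the function names and the sample values are assumptions for illustration. It assumes the input has at least two distinct values, otherwise the divisions would fail.

```python
from statistics import mean, pstdev

def standardize(values):
    """Standard scaling: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max_normalize(values):
    """Normalization: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def percentile_rank(values):
    """Percentile transformation: share of observations at or below each value."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]

# Hypothetical numeric column with one much larger value.
prices = [10, 20, 30, 40, 100]
z_scores = standardize(prices)
scaled = min_max_normalize(prices)
ranks = percentile_rank(prices)
print(z_scores, scaled, ranks)
```

The percentile transformation is the least affected by the large value, because it depends only on the ordering of the observations.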
28. Statistical Analysis: When Analyzing Categorical Data and Numeric Data together, it's
important to use appropriate statistical techniques to identify relationships and patterns in
the Data.
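For example, a one-way ANOVA (or a t-test for two groups) may be used to determine whether a numeric variable differs significantly across the groups of a categorical variable, while a chi-squared test checks for an association between two categorical variables.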
Interpretation: When interpreting the results of an analysis that includes both categorical
data and numeric data, it's important to consider the context of the data and the
relationships between the variables.
For example, a strong correlation between two variables may not necessarily imply a causal
relationship, and it may be important to consider other factors that may be influencing the
relationship.
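As an illustration of relating a categorical variable to a numeric one, the sketch below computes the F statistic of a one-way ANOVA by hand: the variation between group means divided by the variation within groups. The payment categories, the spend values, and the function name are hypothetical; turning the statistic into a p-value would additionally require the F distribution, which is left out here.

```python
from collections import defaultdict

def one_way_anova_f(categories, values):
    """F statistic: between-group mean square over within-group mean square."""
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    n = len(values)
    k = len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical data: payment method (categorical) and amount spent (numeric).
payment = ["cash"] * 3 + ["card"] * 3 + ["online"] * 3
spend = [1, 2, 3, 2, 3, 4, 5, 6, 7]
f_stat = one_way_anova_f(payment, spend)
print(round(f_stat, 2))
```

A large F value indicates that group means differ by more than the spread inside the groups would suggest, but as the slide notes, that alone does not establish a causal relationship.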
31. Suppose you have an Excel sheet for your hypothetical retail company, which has many branches, and you need to know how your business did in the first quarter. That is a little bit difficult!
You would have to copy and paste each of the first quarter sales numbers into another spreadsheet, or another part of this spreadsheet, and then write a formula to calculate the total.
It's just a lot of work and effort. Or you can simply use a Pivot Table.
32. A Quick Definition of a Pivot Table
A Pivot Table is an Excel tool that allows you to reorganize and summarize selected columns and rows of data in a spreadsheet. It not only reorganizes and summarizes the data, it also produces a report that is going to be helpful to you.
One important thing to recognize about Pivot Tables: when you create one, it does not change any data in the spreadsheet. Everything stays intact; the Pivot Table just helps you look at the data in a new way.
So, let's get started. Here are the first things to consider when you're about to create a Pivot Table.
33. Tip 1# Data should be listed vertically, with column titles
Tip 2# Make sure there are no blank rows in your data
34. Tip 3# Avoid having extra "data" in your spreadsheet, such as hidden notes
Tip 4# Format your data as a table
36. Mechanism
All you have to do is go up to Insert and choose PivotTable. Right away Excel asks for some information about the Pivot Table. The first question is whether the data is a Table or a Range, or whether you would like to use an External Data Source.
In our example, you will use a Table.
Next, choose where you want the Pivot Table report to be placed: somewhere in the existing worksheet or in a new worksheet.
37. Mechanism
A panel opens on the right: this is the PivotTable Fields pane. It lists the column headings (column titles) that you typed in the original spreadsheet.
Down below are four areas: Filters, Columns, Rows, and Values.
What goes where depends on the purpose of your Pivot Table: what do you want to show in rows, and which value do you care about in this report?
38. Mechanism
In this example:
Add to Columns: Customer City, Customer State, Delivery Postcode
Add to Rows: Payment Method, Customer Name
Add to Values: Final Price
39. Mechanism
Your Pivot Table report is now arranged the way you want it.
You can see the total for each payment method, and you can even create a Pivot Chart.
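What a Pivot Table does with the Rows, Columns, and Values areas can be sketched outside Excel as a grouped sum. The order records, field names, and the helper `pivot_sum` below are hypothetical; the sketch only mirrors the idea of summing Final Price per payment method and state, and it leaves the source rows untouched, just as a Pivot Table leaves the spreadsheet intact.

```python
from collections import defaultdict

# Hypothetical order records standing in for the spreadsheet rows.
orders = [
    {"payment_method": "Card", "customer_state": "State A", "final_price": 120.0},
    {"payment_method": "Card", "customer_state": "State B", "final_price": 80.0},
    {"payment_method": "Cash", "customer_state": "State A", "final_price": 50.0},
    {"payment_method": "Card", "customer_state": "State A", "final_price": 30.0},
    {"payment_method": "Cash", "customer_state": "State B", "final_price": 20.0},
]

def pivot_sum(rows, row_field, col_field, value_field):
    """Sum value_field for every (row_field, col_field) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[row_field]][r[col_field]] += r[value_field]
    return {key: dict(cols) for key, cols in table.items()}

report = pivot_sum(orders, "payment_method", "customer_state", "final_price")
# Row totals, comparable to the total shown for each payment method.
totals = {method: sum(cols.values()) for method, cols in report.items()}
print(report)
print(totals)
```

Swapping the row and column field names in the call reorganizes the same data in a different way, which is the "pivot" in Pivot Table.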
41. Data visualization, like any other form of communication, can be manipulated or misrepresented
to deceive or mislead the audience.
What is misrepresentation of data?
Data representation is the visual depiction of useful information. However, it is even more
important to represent the insights correctly. Any misrepresentation of data will lead to errors of
judgment.
For example, using a red color for a bar that represents a positive value can convey a negative message.
The results could be catastrophic in the worst cases. Even when it is not the worst case, it could be an embarrassment at the workplace.
How can Data Visualization lie?
Different reasons for misleading visualization of data
Data is misrepresented if it meets one or more of the following criteria:
Unethical manipulation of data in analysis phase
Unethical manipulation of data in visualization phase
Inconsistency errors
Incompetency errors
42. Unethical manipulation of data in Analysis Phase
Data can be wrongly manipulated in the
analysis phase itself. Sadly, it is far more
common than we may imagine. Such
manipulation stems from the need to
force a particular ideology, perspective, or
result. One may selectively collect data.
Further, one may selectively filter the data
for analysis. Also, there are times when
someone may hide an unsupported
hypothesis. At times, people may even
fabricate data to show the results. All of
these are considered to be unethical
practices.
Unethical manipulation of data in
Visualization Phase
The second layer of misrepresentation
comes from the visualization phase. In
this phase, the analyst already has the
result. They may purposefully
manipulate what and how the insights
are presented. Again, they may keep
the unfavorable findings to
themselves. Alternatively, there are
more technical ways to misrepresent
data. We shall discuss these in detail
later. The way we prepare the charts
and graphics has a very strong impact
on what story is being conveyed.
43. Misleading Data Visualization Examples
This exercise is for analysis only and not meant to criticize any outlets. Charts are for education
and not copyrighted by Management Weekly.
Gun deaths vs ‘stand your ground’ campaign
What is wrong with this chart?
The Y-axis is flipped on this chart,
showing more gun-related deaths as
you move down instead of up. This goes
against the standard way of reading
charts and can be misleading to some
viewers.
How do people interpret this chart?
This graph shows gun-related deaths in Florida over time. At first glance, it appears that the number of deaths decreased after 2005, possibly due to the "stand your ground" law aiming to reduce gun deaths.
44. Correct representation
The number of gun deaths actually increased from about 500 to 800 after the law was passed. A correctly visualized graph would show this trend in the traditional form.
Accurate Data Visualizations require following conventions. Altered charts and selective
storytelling can promote biased viewpoints.
The chart uses a conventional trendline to make data interpretation less prone to errors. But,
correlation doesn't mean causation. More investigation is needed to confirm the impact of
gun laws on deaths.
45. Biggest covid worries
What is wrong with this chart?
A pie chart shows the proportion of each component as a percentage. All parts should add up to 1, or 100%. For instance, around 48% of people may be concerned about getting the virus.
How do people interpret this chart?
This chart wrongly depicts the percentage of each of these components. If you want to represent any data in the form of a pie chart, you should always do so by finding each proportion in terms of the whole. If your data has overlap, as in this case, then you must represent it differently.
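A quick sanity check before drawing a pie chart is to confirm that the shares add up to 100%. The sketch below does that; apart from the 48% figure mentioned on the slide, the survey numbers, category names, and the function `pie_chart_is_valid` are assumptions for illustration.

```python
def pie_chart_is_valid(shares_percent, tolerance=0.5):
    """A pie chart only makes sense if the shares sum to (about) 100%."""
    return abs(sum(shares_percent.values()) - 100.0) <= tolerance

# Hypothetical multi-select survey: respondents could name several worries,
# so the categories overlap and the percentages exceed 100 in total.
overlapping_worries = {"Getting the virus": 48, "Economy": 37, "Family health": 45}

# Hypothetical single-choice survey: every respondent picked exactly one option.
exclusive_choices = {"Option A": 50, "Option B": 30, "Option C": 20}

print(pie_chart_is_valid(overlapping_worries))
print(pie_chart_is_valid(exclusive_choices))
```

When the check fails because respondents could pick several answers, a bar chart with one bar per category avoids implying that the parts form a whole.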
46. Correct representation
There are various ways to accurately represent this data. One option is to use a Venn diagram
to show the different categories and any potential overlap between them. Another option is to
create a bar chart. Both methods are effective in conveying the information.
47. Here are some ways in which data visualization can be used to lie:
Distorting the scale: By manipulating the scales on an axis, the visualization can be
made to appear more or less significant than it actually is.
For example, a bar graph that starts at a value greater than zero can make differences
between bars seem larger than they actually are.
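The size of this distortion can be worked out directly: a bar's drawn height is its value minus the axis baseline, so raising the baseline inflates the ratio between bars. The amounts and the function name below are hypothetical and only illustrate the arithmetic.

```python
def drawn_height_ratio(smaller, larger, baseline=0.0):
    """How many times taller the larger bar is drawn, given the axis baseline."""
    return (larger - baseline) / (smaller - baseline)

# Hypothetical amounts raised in two periods: a real difference of 10%.
first, second = 100.0, 110.0

honest = drawn_height_ratio(first, second)                   # axis starts at 0
truncated = drawn_height_ratio(first, second, baseline=90)   # axis starts at 90

print(honest, truncated)
```

With the axis starting at 90, the second bar is drawn twice as tall as the first although the underlying value is only 10% larger.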
[Figure: "Money Raised" chart]
48. Cherry-picking data: By selectively choosing which data to include or exclude, the visualization can be made
to support a particular conclusion. This is known as "cherry-picking" and can be used to misrepresent the
overall trend or pattern of the data.
For Example: Emerging markets are a very volatile asset, and depending on when you look at them they
can be all over the map. Let’s look at the timeframes that each source used:
https://portfoliocharts.com/2016/03/29/the-avoidable-mistake-of-cherry-picking-data/
As you can see, each average return is
completely accurate, but no one average
return tells the full story.
The source with the most data has the
most representative long-term number,
but it hides the fact that emerging
markets grew massively in the decade
between 1983-1993 and have done pretty
poorly for the last 20 years.
The shortest source includes all data since
its index fund was founded, but excludes
the remarkable run that drove people to
start it in the first place.
So which number should you use for your
own decision making? That’s where the
cherry picking comes in.
49. Another example: the table below shows the number of "leads" generated over the course of 10 weeks. NOTE: Assume week 8 is simply an anomaly. There were no extra marketing efforts made, just one great, random week.
Week | Leads Generated
Week 1 | 20
Week 2 | 20
Week 3 | 30
Week 4 | 10
Week 5 | 10
Week 6 | 10
Week 7 | 10
Week 8 | 80
Week 9 | 10
Week 10 | 10
Two very different graphical representations of this data illustrate how formatting can be manipulated to create a large misrepresentation of data.
This is a bar graph showing the number of leads generated per week:
Logically, this tells us that after a decent showing in weeks 1-3, and excluding an abnormal week 8, we've seen a downtrend in leads generated from 20-30 leads per week to 10.
The next slide illustrates the exact same data set using different formatting:
50. Here we're looking at a linear trend line of the above data. You'll notice that the actual data line has been removed and the Y-Axis has been limited to show the maxi…
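The effect of the single anomalous week on a linear trend line can be checked numerically with an ordinary least-squares slope. The sketch below uses the ten weekly lead counts from the table; the function name `least_squares_slope` is an assumption, and plain least squares is one common way to fit such a line, not necessarily the one used for the original chart.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

weeks = list(range(1, 11))
leads = [20, 20, 30, 10, 10, 10, 10, 80, 10, 10]

slope_all = least_squares_slope(weeks, leads)

# Same fit with the anomalous week 8 left out.
kept = [(w, l) for w, l in zip(weeks, leads) if w != 8]
slope_without_week8 = least_squares_slope([w for w, _ in kept], [l for _, l in kept])

print(slope_all, slope_without_week8)
```

With week 8 included the fitted slope is positive, suggesting growth; without it the slope is negative, matching the downtrend visible in the bar graph.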
51. Omitting context: By omitting important context or background information, the visualization can
be made to appear more significant or less significant than it actually is.
A visualization of crime rates that does not account for changes in population over time can be
misleading.
For example, if the population of a city increases over time but the number of crimes remains constant, a chart of raw crime counts looks flat even though the crime rate per resident has actually decreased.
This is why it is important to use per capita rates when comparing crime rates over time or between different locations.
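A per capita rate is simply the count divided by the population, usually scaled to a round base such as 100,000 residents. The city figures and the function name in this sketch are hypothetical and serve only to show the calculation.

```python
def rate_per_100k(crimes, population):
    """Crimes per 100,000 residents."""
    return crimes * 100_000 / population

# Hypothetical city: the crime count is constant while the population grows.
city = {
    2010: {"crimes": 5000, "population": 500_000},
    2020: {"crimes": 5000, "population": 625_000},
}

rates = {year: rate_per_100k(d["crimes"], d["population"]) for year, d in city.items()}
print(rates)
```

The raw counts are identical in both years, yet the per capita rate falls from 1,000 to 800 per 100,000 residents, which is the context a count-only chart would omit.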
52. Overgeneralizing: By presenting data in a way that overgeneralizes or oversimplifies
complex phenomena, the visualization can be used to support a particular conclusion or
narrative.
For example, using a single data point to represent the entire population can be misleading.
Overall, Data Visualization can be used to lie or mislead if the designer intentionally distorts
the data, omits important context, or misrepresents the data in a way that supports a
particular agenda or point of view. It is important to critically evaluate visualizations and to
verify the accuracy and validity of the data presented.
53. How to avoid data misrepresentation?
Unethical manipulation of data in analysis phase
Problem statement: Have you defined your problem clearly, with required variables?
Data Collection: Correct and right source; Random sampling; Correct representation
Data Analysis: Pre-determined criteria; Avoid p-value hacking; Uniform methodology
Unethical manipulation of data in visualization phase
Type of visualization: Use a visualization that enables correct inference
Visualization methodology: Unclutter the data with only top variables; Use standard and meaningful scales for axes and data; Don't hide unfavorable findings
54. Incompetency errors
Problem statement: Use labels for axes and titles for the visualization; Use pie charts sparingly (may be appropriate for percentage data); Normalize data that has a lot of variance
Data Analysis: Data that has random fluctuations must be averaged to eliminate these variations.
Inconsistency errors
Visualization technique: Use a consistent and commonly used scale; Use the same scale for different charts having similar data
Data representation: Refrain from using too many variables; Never represent unrelated variables as related ones
56. Enhances Understanding
Data visualization can help people
understand complex Data by presenting it in
a more intuitive and accessible way. It
can reveal patterns, trends, and
relationships that may not be apparent in
Raw Data.
Improves Decision-making
Data visualization can help decision-makers
make more informed decisions by providing a
clear and concise representation of Data. It
enables them to quickly identify trends and
patterns, and make informed decisions based on
the insights gained from the Data.
Data Visualization plays a critical role in helping people to understand, analyze, and
communicate complex Data and information. It is an essential tool for Decision-
making, problem-solving, and exploring Data in a meaningful and impactful way.
Increases Engagement
Data visualization can make Data
more engaging and interesting by
presenting it in an interactive and
visually appealing way. This can
help to increase engagement and
encourage people to explore the
Data further.
Facilitates Communication
Data visualization can be used to
communicate complex data and
information to a wider audience.
It can help to simplify complex
concepts and ideas, making them
easier to understand
and communicate.
Enables Exploration
Data visualization can enable
people to explore Data in a more
interactive way. By providing tools
for filtering, sorting, and drilling
down into the Data, it can help
people to uncover insights and
gain a deeper understanding of
the Data.
57. • Data visualization is the graphical representation of information and data in a visual format (charts, graphs, and maps).
• Data visualization tools provide an accessible way to see and understand trends, patterns, and outliers in data.
• Data visualization tools and technologies are essential to analyzing massive amounts of information and making data-driven decisions.
• Pictures have been used to understand data for centuries. General types of Data Visualization are Charts, Tables, Graphs, Maps, and Dashboards.