Muhammad Abdullah Bhutto
1. TOPIC:
Why Is Continuous Data “Better” than Categorical or Discrete Data?
Briefly describe the Advantages of Continuous Over Discrete Data.
SUBMITTED TO: SIR BAHAUDIN BALOCH
SUBMITTED BY: MUHAMMAD ABDULLAH BHUTTO
REGISTRATION NUMBER: 2K18-PT-134
FACULTY: CROP PROTECTION, SINDH AGRICULTURE UNIVERSITY,
TANDOJAM
DATE: 10/06/2020
2. TOPIC: Why Is Continuous Data “Better” than
Categorical or Discrete Data? Briefly describe the
Advantages of Continuous Over Discrete Data.
EARLIER, I WROTE ABOUT THE DIFFERENT
TYPES OF DATA STATISTICIANS TYPICALLY
ENCOUNTER. IN THIS POST, WE'RE GOING TO
LOOK AT WHY, WHEN GIVEN A CHOICE IN THE
MATTER, WE PREFER TO ANALYZE
CONTINUOUS DATA RATHER THAN
CATEGORICAL/ATTRIBUTE OR DISCRETE DATA.
3. AS A REMINDER
When we assign something to a group or give it a name, we have
created attribute or categorical data. If we count something, like
defects, we have gathered discrete data. And if we can measure
something to a (theoretically) infinite degree, we have continuous
data.
4. A statistical software package like Minitab is extremely
powerful and can tell us many valuable things, as long as
we're able to feed it good numbers. Without numbers, we have
neither analyses nor graphs. Even categorical or attribute data
needs to be converted into numeric form by counting before
we can analyze it.
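As a minimal sketch of that counting step, the Python snippet below turns a list of labels into counts and proportions. The labels are made-up illustration data, not results from the cereal-box example.

```python
from collections import Counter

# Hypothetical inspection results: one label per cereal box.
labels = ["good", "bad", "good", "good", "bad", "good", "good", "good"]

# Counting turns the labels into numbers that can be charted or analyzed.
counts = Counter(labels)
total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}

print(counts)
print(proportions)
```

The counts are what a bar chart or pie chart would display; the labels themselves carry no numeric information until they are tallied.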
5. What Makes Numeric Data Discrete or Continuous?
At this point, you may be thinking, "Wait a minute, we can't really
measure anything infinitely, so isn't measurement data actually
discrete, too?" That's a fair question.
If you're a strict literalist, the answer is "yes": when we measure a
property that's continuous, like height or distance, we are de
facto making a discrete assessment. When we collect a lot of those
discrete measurements, it's the amount of detail they contain that
will dictate whether we can treat the collection as discrete or
continuous.
6. I like to think of it as a question of scale. Say I want to measure the weight of 16-ounce cereal
boxes coming off a production line, and I want to be sure that the weight of each box is at
least 16 ounces, but no more than 1/2 ounce over that.
With a scale calibrated to whole pounds, all I can do is put every box into one of three
categories: less than a pound, 1 pound, or more than a pound.
With a scale that can distinguish ounces, I will be able to measure with a bit more precision just
how close to a pound the individual boxes are. I'm getting nearer to continuous data, but there
are still only 16 possible readings within each pound.
But if I measure with a scale capable of distinguishing 1/1000th of an ounce, I will have quite a
wide scale, a continuum, of potential values between pounds. The individual boxes could have
any value between 0.000 and 1.999 pounds. The scale of these measurements is fine enough to
be analyzed with powerful statistical tools made for continuous data.
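The effect of scale resolution can be sketched in a few lines of Python. The weights below are hypothetical, and the helper `distinct_readings` is an illustrative name; it simply rounds each weight to the nearest step the scale can display and reports how many different readings remain.

```python
# Hypothetical box weights in ounces, as a very fine scale might report them.
weights_oz = [15.812, 16.004, 16.231, 16.478, 17.352, 15.967, 14.649, 16.618]

def distinct_readings(weights, resolution_oz):
    """Round each weight to the scale's resolution and return the distinct readings."""
    readings = {round(round(w / resolution_oz) * resolution_oz, 3) for w in weights}
    return sorted(readings)

# Compare a whole-pound scale (16 oz steps), a whole-ounce scale, and a 1/1000 oz scale.
for label, resolution_oz in [("whole pounds", 16), ("whole ounces", 1), ("1/1000 oz", 0.001)]:
    readings = distinct_readings(weights_oz, resolution_oz)
    print(f"{label}: {len(readings)} distinct readings")
```

The coarser the scale, the more boxes collapse onto the same reading, which is why the same eight boxes look identical on one scale and all different on another.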
7. What Can I Do with Continuous Data that I Can't Do with Discrete?
Not all data points are equally valuable, and you can glean a lot more insight
from 100 points of continuous data than you can from 100 points of attribute or
count data. How does this finer degree of detail affect what we can learn from a
set of data? It's easy to see.
Let's start with the simplest kind of data, attribute data that rates the weight of
a cereal box as good or bad. For 100 boxes of cereal, any that are under 1 pound
are classified as bad, so each box can have one of only two values.
We can create a bar chart or a pie chart to visualize this data, and that's about it:
8. If we bump up the precision of our scale to differentiate between boxes that are over and under 1
pound, we can put each box of cereal into one of three categories. Here's what that looks like in a
pie chart:
9.
10. This gives us a little bit more insight—we now see that we are
overfilling more boxes than we are underfilling—but there is still a
very limited amount of information we can extract from the data.
If we measure each box to the nearest ounce, we open the door to
using methods for continuous data, and get a still better picture of
what's going on. We can see that, on average, the boxes weigh 1
pound. But there's high variability, with a standard deviation of 0.9.
There's also a wide range in our data, with observed values from 12
to 20 ounces:
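As a rough illustration, summary statistics of this kind can be computed with Python's standard `statistics` module. The weights below are hypothetical values rounded to the nearest ounce; they are not the data behind the figures quoted above.

```python
from statistics import mean, stdev

# Hypothetical box weights, measured to the nearest ounce.
weights_oz = [16, 15, 17, 16, 16, 14, 18, 16, 15, 17]

average = mean(weights_oz)        # central tendency
spread = stdev(weights_oz)        # sample standard deviation
lowest, highest = min(weights_oz), max(weights_oz)

print(f"mean = {average} oz, standard deviation = {spread:.2f} oz")
print(f"range = {lowest} to {highest} oz")
```

None of these quantities can be computed from good/bad labels alone, which is the practical gain from the finer measurement.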
11.
12. If I measure the boxes with a scale capable of differentiating
thousandths of an ounce, more options for analysis open up.
For example, now that the data are fine enough to distinguish
half-ounces (and then some), I can perform a capability
analysis to see if my process is even capable of consistently
delivering boxes that fall between 16 and 16.5 ounces. I'll use
the Assistant in Minitab to do it, selecting Assistant >
Capability Analysis.
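For readers without Minitab, the core arithmetic behind a capability index can be sketched in Python. This is a simplified illustration, not a reproduction of the Assistant's analysis: it uses hypothetical weights, assumes roughly normal data, and uses the overall sample standard deviation, so the indices are labeled Pp and Ppk. Only the specification limits (16 and 16.5 ounces) come from the example above.

```python
from statistics import mean, stdev

# Specification limits from the cereal-box example, in ounces.
LSL, USL = 16.0, 16.5

# Hypothetical weights from a fine-resolution scale.
weights_oz = [16.21, 16.35, 16.08, 16.47, 16.26, 16.15, 16.52, 16.30, 15.96, 16.41]

mu = mean(weights_oz)
sigma = stdev(weights_oz)  # overall (long-term) standard deviation

# Pp compares the width of the spec window to the process spread.
pp = (USL - LSL) / (6 * sigma)
# Ppk also penalizes a process that is not centered between the limits.
ppk = min(USL - mu, mu - LSL) / (3 * sigma)

print(f"Pp = {pp:.2f}, Ppk = {ppk:.2f}")
```

Values below 1 indicate that the process spread is wider than the specification window, so some boxes will fall outside 16 to 16.5 ounces however the process is centered.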
13.
14. The analysis has revealed that my process isn't
capable of meeting specifications. Looks like I have
some work to do...but the Assistant also gives me an
I-MR control chart, which reveals where and when
my process is going out of spec, so I can start looking
for root causes.
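The limits on an I-MR (individuals and moving range) chart can be sketched as follows. The data are hypothetical, and the constants 2.66 and 3.267 are the standard control-chart factors for moving ranges of two consecutive points. Note that these control limits are computed from the data themselves and are separate from the 16 to 16.5 ounce specification limits.

```python
from statistics import mean

# Hypothetical weights in time order, in ounces.
weights_oz = [16.20, 16.25, 16.22, 16.28, 16.24, 16.21, 16.95, 16.26, 16.23, 16.27]

# Moving range: absolute difference between consecutive measurements.
moving_ranges = [abs(b - a) for a, b in zip(weights_oz, weights_oz[1:])]

center = mean(weights_oz)
mr_bar = mean(moving_ranges)

# Individuals chart limits: center line +/- 2.66 * average moving range.
i_ucl = center + 2.66 * mr_bar
i_lcl = center - 2.66 * mr_bar
# Moving range chart upper limit (the lower limit is 0).
mr_ucl = 3.267 * mr_bar

# Indices of measurements outside the individuals-chart limits.
out_of_control = [i for i, w in enumerate(weights_oz) if w > i_ucl or w < i_lcl]

print(f"I chart: LCL = {i_lcl:.3f}, center = {center:.3f}, UCL = {i_ucl:.3f}")
print(f"MR chart: UCL = {mr_ucl:.3f}")
print("Out-of-control points at positions:", out_of_control)
```

Because the chart keeps the measurements in time order, a flagged position points to when something changed, which is a starting place for a root-cause search.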
15.
16. If I were only looking at attribute data, I might think my
process was just fine. Continuous data has allowed me
to see that I can make the process better, and given me a
rough idea where to start. By making changes and
collecting additional continuous data, I'll be able to
conduct hypothesis tests, analyze sources of variation,
and more.
17. Does this mean discrete data is no good at all? Of
course not—we are concerned with many things that
can't be measured effectively except through discrete
data, such as opinions and demographics. But when you
can get it, continuous data is the better option. The table
below lays out the reasons why.
Some Final Advantages of Continuous Over Discrete Data
18. Continuous Data | Discrete Data
Inferences can be made with few data points; valid analysis can be performed with small samples. | More data points (a larger sample) are needed to make an equivalent inference.
Smaller samples are usually less expensive to gather. | Larger samples are usually more expensive to gather.
High sensitivity (how close to or far from a target) | Low sensitivity (good/bad, pass/fail)
A variety of analysis options that can offer insight into the sources of variation | Limited options for analysis, with little indication of sources of variation
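The sample-size row of the table can be illustrated with a rough calculation. The sketch below rests on several assumptions that are not in the original: two groups are compared, the measurement is normally distributed with a standard deviation of 1, the process mean shifts by half a standard deviation, the pass/fail cutoff sits at the original mean, and the usual normal-approximation formulas are used with a two-sided 5% significance level and 80% power.

```python
from math import ceil
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

alpha, power = 0.05, 0.80
z_total = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)

sigma = 1.0   # assumed standard deviation of the measurement
shift = 0.5   # assumed shift in the mean that we want to detect

# Continuous data: per-group sample size for comparing two means.
n_continuous = ceil(2 * z_total**2 * sigma**2 / shift**2)

# Discrete data: the same measurements reduced to pass/fail at a cutoff of 0.
p_before = 1 - std_normal.cdf(0.0)             # share above the cutoff before the shift
p_after = 1 - std_normal.cdf(-shift / sigma)   # share above the cutoff after the shift

# Per-group sample size for comparing two proportions.
variance_sum = p_before * (1 - p_before) + p_after * (1 - p_after)
n_discrete = ceil(z_total**2 * variance_sum / (p_after - p_before) ** 2)

print(f"continuous: {n_continuous} per group, pass/fail: {n_discrete} per group")
```

Under these assumptions the pass/fail version needs noticeably more observations to detect the same shift, because reducing each measurement to a label discards how far it was from the cutoff.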
19. I hope this very basic overview has effectively
illustrated why you should opt for continuous data
over discrete data whenever you can get it.