Data Types

Dr. Carlos Rodríguez Contreras
UNAM

Statistical Science
Descriptive statistics
– Collecting, presenting, and describing data
Inferential statistics
– Drawing conclusions and/or making decisions concerning a population based only on sample data

Descriptive Statistics
• Collect data
e.g., Survey, Observation, Experiments
• Present data
e.g., Charts and graphs
• Characterize data
e.g., Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
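As a small illustration of characterizing data, the sample mean can be computed in R directly from its definition and compared with the built-in mean function. The values in x are hypothetical and chosen only for the example.

```r
# Hypothetical sample of n = 5 observations (illustrative values only)
x <- c(4, 8, 15, 16, 23)

n <- length(x)          # sample size
x_bar <- sum(x) / n     # sample mean from the definition: sum of x_i over n

x_bar                   # 13.2
mean(x)                 # the built-in function gives the same value
```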

Data Sources
Primary (Data Collection)
– Observation
– Experimentation
– Survey
Secondary (Data Compilation)
– Print or Electronic

Data Types
Data
• Qualitative (Categorical)
Examples: Marital Status, Political Party, Eye Color (Defined categories)
• Quantitative (Numerical)
– Discrete
Examples: Number of Children, Defects per hour (Counted items)
– Continuous
Examples: Weight, Voltage (Measured characteristics)

Data Types
• Time Series Data
– Ordered data values observed over time.
• Cross Section Data
– Data values observed at a fixed point in time.

Data Types
Sales (in £1000’s)
          2013  2014  2015  2016
London     435   460   475   490
York       320   345   375   395
Bristol    405   390   410   395
Kent       260   270   285   280
Time Series Data: each row (one city observed over the years)
Cross Section Data: each column (all cities observed in a single year)
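The sales table can be sketched in R as a matrix, which makes the two views easy to see: a row gives a time series for one city, and a column gives a cross section for one year. The object names (sales, london_series, cross_2015) are assumptions made for this example.

```r
# Sales (in £1000's) from the table, stored row by row
sales <- matrix(
  c(435, 460, 475, 490,
    320, 345, 375, 395,
    405, 390, 410, 395,
    260, 270, 285, 280),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("London", "York", "Bristol", "Kent"),
                  c("2013", "2014", "2015", "2016"))
)

london_series <- sales["London", ]  # time series: one city over the years
cross_2015 <- sales[, "2015"]       # cross section: all cities in 2015

london_series
cross_2015
```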

Data Measurement Levels
• Ratio/Interval Data
– Highest Level, Complete Analysis
– Measurements
• Ordinal Data
– Higher Level, Mid-level Analysis
– Rankings, Ordered Categories
• Nominal Data
– Lowest Level, Basic Analysis
– Categorical Codes, ID Numbers, Category Names

Nominal scales
• A nominal scale of measurement only indicates the category of a variable that a case falls into: it expresses qualitative differences but not quantitative differences, and as such data at this level are often referred to as qualitative data.
• A nominal scale only allows us to say that one case may be different from another
• No ‘natural’ order to the arrangement of categories
• Often identifiable by the presence of an ‘Other’ category

Ordinal scales
• Consider that we operationalise age so that we measure its variation by recording whether someone is:
young (18 years or less),
middle aged (19-60 years), or
old (over 60 years)
• We can say one case may be different to another in terms of age, and
• We can say one case may have more or less age than another, but
• We cannot say how much more age one case may have as compared to another
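One possible way to represent this ordinal version of age in R is an ordered factor built with cut. The ages below are hypothetical, and the variable names are assumptions made for the sketch.

```r
# Hypothetical ages in whole years
age <- c(12, 18, 19, 45, 60, 61, 80)

# Intervals are closed on the right: (-Inf, 18], (18, 60], (60, Inf)
age_group <- cut(age,
                 breaks = c(-Inf, 18, 60, Inf),
                 labels = c("young", "middle aged", "old"),
                 ordered_result = TRUE)

age_group
age_group[1] < age_group[4]   # TRUE: "young" ranks below "middle aged"

# Subtracting ordered factors is not meaningful: R returns NA with a warning,
# which mirrors the point that ordinal data cannot say "how much more".
```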

Ordinal scales (cont.)
• An ordinal level of measurement, in addition to the function of classification, allows cases to be ordered by degree according to measurements of the variable.
• But we cannot quantify the amount of difference – there is no unit of measurement like years or dollars.
• Ordinal scales are particularly common when measuring attitude or satisfaction in opinion surveys.
• Yes/No responses are often ordinal, e.g. “Do you enjoy statistics (Yes/No)?”
– we can say that someone who answers ‘Yes’ has more enjoyment of statistics than someone who responds ‘No’, but
– we can’t say how much more enjoyment of statistics they have.

Interval/ratio scales
• The key characteristic of an interval/ratio scale is that it has units measuring intervals of equal distance between values on the scale.
• Consider the variable ‘age’. This can be defined operationally as ‘age in whole years at last birthday’.
• Having defined age this way our measurements of people’s age will allow us to say:
– one case may be different to another in terms of age, and
– one case may have more or less age than another, and
– how much more age one case may have as compared to another.
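The three statements about age can be checked in R when age is stored in whole years. This is only a sketch; the two cases and their ages are hypothetical.

```r
# Hypothetical ages in whole years at last birthday
age_years <- c(ana = 25L, ben = 40L)

age_years[["ben"]] != age_years[["ana"]]   # the cases differ
age_years[["ben"]] >  age_years[["ana"]]   # one case has more age
age_years[["ben"]] -  age_years[["ana"]]   # and we can say how much more: 15 years
```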

Types of Data
In all scientific disciplines, we are obliged to understand Stevens’ data classification...

Although Stevens’ taxonomy has permeated all scientific disciplines, we still need to characterize data to match the way digital computers work.

factor data
• When we look at many variables, some may simply record categories used to group the data.
• In R we will use factors to store these variables.
• An example might be the browser a user has used to view a web site, as gleaned from a web site log.
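A minimal sketch of the browser example: the browser names below are hypothetical log entries, stored as a factor so that they can be used for grouping.

```r
# Hypothetical browser names taken from a web site log
browser <- c("Firefox", "Chrome", "Safari", "Chrome", "Firefox", "Chrome")

browser_f <- factor(browser)   # store the categories as a factor

class(browser_f)               # "factor"
table(browser_f)               # number of visits per browser
```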

character data
• Some categorical data are factors, but others are really just identifiers, and are not used for grouping.
• An example might be a user’s IP address. This is basically a unique code identifying a computer, like an address.
• While both factor and character data are “nominal”, we keep the distinction as we will interact with such data in R differently.
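As a sketch of the identifier case, IP addresses can be kept as plain character values. The addresses below are hypothetical and come from ranges reserved for documentation.

```r
# Hypothetical IP addresses (documentation-only ranges)
ip <- c("192.0.2.10", "192.0.2.25", "198.51.100.7")

class(ip)            # "character"
anyDuplicated(ip)    # 0: every value is unique, so grouping by it is pointless
```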

discrete data
• Discrete data comes from measurements where there are essentially only distinct and separate possible values that can be counted.
• For example, the number of visits a person makes to our web site will always be integer data, as will other counting data.
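Counting data of this kind can be stored in R as integers. The visit counts below are hypothetical.

```r
# Hypothetical number of visits per person (the L suffix marks integers)
visits <- c(3L, 1L, 4L, 1L, 5L, 1L, 3L)

class(visits)    # "integer"
table(visits)    # how many people made 1, 3, 4, or 5 visits
```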

continuous data
• Continuous data is that which could conceivably come from a continuum of values.
• The recording of the time in milliseconds of a visit to a web site might be such data.
• A useful distinction is that for discrete data we expect that cases will share values, whereas for continuous data this will be impossible, or at least very unlikely.
• There is no fine line though.
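The "shared values" distinction can be illustrated with two small hypothetical vectors: visit counts (discrete) and visit durations in milliseconds (continuous).

```r
# Hypothetical data: counts share values, finely measured durations do not
visits      <- c(3L, 1L, 4L, 1L, 5L)
duration_ms <- c(1532.7, 845.2, 12044.9, 310.4, 977.1)

anyDuplicated(visits) > 0        # TRUE: two people made exactly 1 visit
anyDuplicated(duration_ms) > 0   # FALSE: no two durations coincide
```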

date and time data
• Time data can be considered continuous or discrete depending on resolution; for computers there are often entirely separate ways to handle date and time data.
• People in finance want millisecond data, but over long time ranges this recording can literally run out of numbers on a computer.
• Astronomers need precise measurements for durations down to leap seconds.
• R has several ways to work with such data that go beyond just storing the values as simple numbers.
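A rough sketch of both points, using only base R. It assumes millisecond counts are held in R's 32-bit integer type, and the date and time values are hypothetical.

```r
# How long until a millisecond counter overflows a 32-bit integer?
int_max <- .Machine$integer.max          # 2147483647
ms_per_day <- 1000 * 60 * 60 * 24
days_until_overflow <- int_max / ms_per_day
days_until_overflow                      # a little under 25 days

# Base R stores dates and times in dedicated classes rather than bare numbers
d <- as.Date("2016-03-01")
stamp <- as.POSIXct("2016-03-01 14:30:00", tz = "UTC")

class(d)        # "Date"
class(stamp)    # "POSIXct" "POSIXt"
d + 30          # date arithmetic in days
stamp + 60      # time arithmetic in seconds
```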

• To organise data, R assigns a class attribute to most R objects and otherwise creates an implicit class for an object.
• The class of an object is used to determine how it should be printed.
• The class function will return the class of an object.
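A few calls to class illustrate this; the objects are arbitrary examples chosen for the sketch.

```r
class(3.14)      # "numeric"
class(2L)        # "integer"
class("a")       # "character"
class(TRUE)      # "logical"

# A factor carries an explicit class attribute, which controls printing
f <- factor(c("a", "b", "a"))
attr(f, "class")     # "factor"
print(f)             # printed as a factor, with its levels
print(unclass(f))    # without the class: integer codes plus a levels attribute

# A matrix has no class attribute, so class() reports an implicit class
m <- matrix(1:4, nrow = 2)
attr(m, "class")     # NULL
class(m)
```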

Numeric data types
• The two main classes for numeric data are numeric and integer, though there are others, e.g. complex. Most of the time numbers are numeric.
• To make an integer value, we need to work a bit: we can preallocate space for an integer data set of length n with integer(n); we can use the suffix L to force a number to be treated as an integer (e.g., 1L); we can coerce numeric values to integer type through the as.integer function.
• Numeric values are stored using floating point representation.
• This format can store much larger integer values than the integer class and has a much wider range of numbers it can represent.
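The three ways of making integers, and the difference in range, can be sketched as follows. The variable names are assumptions made for the example.

```r
class(1)       # "numeric": plain numbers are floating point
class(1L)      # "integer": the L suffix forces an integer

x <- integer(3)               # preallocated integer vector: 0 0 0
y <- as.integer(c(1, 2, 3))   # numeric values coerced to integer

.Machine$integer.max          # largest integer value: 2147483647
big <- 2^31                   # numeric, so values beyond that limit are fine
big > .Machine$integer.max    # TRUE

# By contrast, .Machine$integer.max + 1L overflows: R returns NA with a warning.
```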

Categorical data types
• Character data. Character data is created just by quoting values.
• Quotes can be matching pairs of single or double quotes, though double quotes are preferred and used to display character values.
• Within a quoted value a quote symbol can be used, but it must be escaped by prefixing it with a backslash.
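A short sketch of quoting and escaping; the strings are arbitrary examples.

```r
a <- "double quoted"
b <- 'single quoted'
b                        # displayed with double quotes: "single quoted"

# A double quote inside a double-quoted value is escaped with a backslash
q <- "She said \"hello\""
print(q)                 # print shows the escapes
cat(q, "\n")             # cat shows the text itself: She said "hello"
nchar(q)                 # 16: the backslashes are not part of the value
```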

Categorical data types
• Factors. A factor can be made from a character vector with the factor function.
• The levels of a factor are a list of all possible categories for the data in the factor.
– They need not all be represented in a particular factor, but when we create a factor through factor the default choice is simply the collection of unique values.
• The current levels of a factor are returned by the levels function.
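The default levels, and levels that are not represented in the data, can be sketched like this. The eye colours are hypothetical, and "hazel" is an extra category added only to show an unused level.

```r
# Hypothetical eye colours stored as character values
eye <- c("brown", "blue", "brown", "green")

f <- factor(eye)
levels(f)        # default: the sorted unique values, "blue" "brown" "green"

# Levels may include categories that do not occur in the data
g <- factor(eye, levels = c("blue", "brown", "green", "hazel"))
levels(g)
table(g)         # "hazel" is a valid level with a count of 0
```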

Date and time types
• Working with dates and times is made more convenient using a special data type.
• While R has some built-in features to work with dates and times, the lubridate package simplifies the usage.
• This package introduces the notion of “instants,” “durations,” and “intervals” of time.
• We concern ourselves with some basics, learning how to make and manipulate instants of time.
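A minimal sketch of these ideas, assuming the lubridate package is installed. The timestamp is hypothetical, and only a few of the package's functions are shown.

```r
library(lubridate)

# An instant: a specific moment in time, parsed from text
start <- ymd_hms("2016-03-01 14:30:00", tz = "UTC")
is.instant(start)        # TRUE
year(start)              # 2016
month(start)             # 3

# A duration: an exact length of time, here 36 hours
later <- start + dhours(36)
later

# An interval: the span between two instants
span <- interval(start, later)
int_length(span)         # length in seconds: 129600
```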

Logical data
• R uses TRUE and FALSE to represent Boolean or logical data.
• Logical data is produced by many R functions, for example the “is” functions.
• Most common is the use of the comparison operators (<, <=, ==, !=, >=, >) to produce logical values.
• The operators ! (for not), & (for and), and | (for or) can be used to combine values.
• The functions any, all, and which and the operator %in% are useful for working with logical vectors. The any and all functions answer whether any of the values are TRUE or whether all the values are TRUE.
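The operators and functions listed above can be tried on a small hypothetical vector.

```r
x <- c(2, 7, 4, 9, 1)    # hypothetical values

is.numeric(x)            # an "is" function: TRUE
x > 3                    # FALSE  TRUE  TRUE  TRUE FALSE
x > 3 & x < 8            # FALSE  TRUE  TRUE FALSE FALSE
!(x > 3)                 #  TRUE FALSE FALSE FALSE  TRUE

any(x > 8)               # TRUE: at least one value exceeds 8
all(x > 0)               # TRUE: every value is positive
which(x > 3)             # positions of the TRUE values: 2 3 4
x %in% c(1, 2, 3)        # TRUE FALSE FALSE FALSE  TRUE
```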
