Dr. Carlos Rodríguez Contreras
UNAM
Statistical Science
Descriptive statistics
– Collecting, presenting, and describing
data
Inferential statistics
– Drawing conclusions and/or making
decisions concerning a population based
only on sample data
Descriptive Statistics
• Collect data
e.g., Survey, Observation, Experiments
• Present data
e.g., Charts and graphs
• Characterize data
e.g., Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
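As a small illustration of the sample mean, the sketch below computes it by hand and with R's built-in mean function. The vector x is made-up data, not taken from the slides.

```r
# Hypothetical sample of n = 5 observations (illustrative values only)
x <- c(2, 4, 4, 5, 10)
n <- length(x)

# Sample mean by the formula: sum of the observations divided by n
manual_mean <- sum(x) / n

# The same quantity with the built-in function
builtin_mean <- mean(x)

print(manual_mean)   # 5
print(builtin_mean)  # 5
```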
Data Sources
• Primary (Data Collection)
– Observation
– Experimentation
– Survey
• Secondary (Data Compilation)
– Print or Electronic
Data Types
Data
• Qualitative (Categorical)
Examples: Marital Status, Political Party, Eye Color (Defined categories)
• Quantitative (Numerical)
– Discrete
Examples: Number of Children, Defects per hour (Counted items)
– Continuous
Examples: Weight, Voltage (Measured characteristics)
Data Types
• Time Series Data
– Ordered data values observed over time.
• Cross Section Data
– Data values observed at a fixed point in time.
Data Types
Sales (in £1000’s)
          2013   2014   2015   2016
London     435    460    475    490
York       320    345    375    395
Bristol    405    390    410    395
Kent       260    270    285    280
Time Series Data: each row (one city across 2013 to 2016)
Cross Section Data: each column (all cities in a single year)
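The sales table can be sketched in R as a matrix, which makes the two views concrete: a row is a time series and a column is a cross section. The object names (sales, london_series, cross_2015) are chosen here for illustration only.

```r
# Sales (in £1000's) from the table, stored as a city-by-year matrix
sales <- matrix(
  c(435, 460, 475, 490,
    320, 345, 375, 395,
    405, 390, 410, 395,
    260, 270, 285, 280),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("London", "York", "Bristol", "Kent"),
                  c("2013", "2014", "2015", "2016"))
)

# Time series data: one city observed over time (a row)
london_series <- sales["London", ]

# Cross section data: all cities at one fixed point in time (a column)
cross_2015 <- sales[, "2015"]

print(london_series)
print(cross_2015)
```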
Data Measurement Levels
• Ratio/Interval Data: Measurements. Highest Level (Complete Analysis)
• Ordinal Data: Rankings, Ordered Categories. Higher Level (Mid-level Analysis)
• Nominal Data: Categorical Codes, ID Numbers, Category Names. Lowest Level (Basic Analysis)
Data Measurement Levels
Attributes of NOIR Data Types
Nominal scales
• A nominal scale of measurement only indicates the category of a variable that a case falls into: it expresses qualitative differences but not quantitative differences, and as such data at this level are often referred to as qualitative data.
• A nominal scale only allows us to say that one case may be different from another.
• No ‘natural’ order to the arrangement of categories.
• Often identified by an ‘Other’ category.
Ordinal scales
• Consider that we operationalise age so that we measure its variation by recording whether someone is:
– young (18 years or less),
– middle aged (19-60 years), or
– old (over 60 years)
• We can say one case may be different from another in terms of age, and
• We can say one case may have more or less age than another, but
• We cannot say how much more age one case may have as compared to another.
Ordinal scales (cont.)
• An ordinal level of measurement, in addition to the function of classification, allows cases to be ordered by degree according to measurements of the variable.
• But we cannot quantify the amount of difference: there is no unit of measurement like years or dollars.
• Ordinal scales are particularly common when measuring attitude or satisfaction in opinion surveys.
• Yes/No responses are often ordinal, e.g. “Do you enjoy statistics (Yes/No)?”
– we can say that someone who answers ‘Yes’ has more enjoyment of statistics than someone who responds ‘No’, but
– we can’t say how much more enjoyment of statistics they have.
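The age example can be sketched in R with an ordered factor. The ages vector is invented for illustration, and cut is only one possible way to build the groups; the point is that order comparisons work while arithmetic on the categories does not.

```r
# Hypothetical ages in whole years (illustrative values only)
ages <- c(15, 34, 72, 18, 61)

# Operationalise age as three ordered categories:
# 18 or less, 19 to 60, over 60
age_group <- cut(ages,
                 breaks = c(-Inf, 18, 60, Inf),
                 labels = c("young", "middle aged", "old"),
                 ordered_result = TRUE)

print(age_group)

# Ordinal data supports "more or less" comparisons
age_group[1] < age_group[2]   # TRUE: young ranks below middle aged

# ...but not "how much more": subtraction is not meaningful for
# ordered factors, so R returns NA (with a warning)
suppressWarnings(age_group[3] - age_group[1])   # NA
```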
Interval/ratio scales
• The key characteristic of an interval/ratio scale is that it has units measuring intervals of equal distance between values on the scale.
• Consider the variable ‘age’. This can be defined operationally as ‘age in whole years at last birthday’.
• Having defined age this way, our measurements of people’s age will allow us to say:
– one case may be different from another in terms of age, and
– one case may have more or less age than another, and
– how much more age one case may have as compared to another.
Types of Data
In all scientific disciplines, we are obliged to understand Stevens’ data classification...
Types of Data
Although Stevens’ taxonomy has permeated all scientific disciplines, we still need to characterize data to match the way digital computers work.
factor data
• When we look at many variables, some may simply record categories used to group the data.
• In R we will use factors to store these variables.
• An example might be the browser a user has used to view a web site, as gleaned from a web site log.
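A minimal sketch of the browser example follows. The log entries are made up; factor and table are standard base R functions.

```r
# Hypothetical browser names taken from a web site log
browser <- c("Firefox", "Chrome", "Chrome", "Safari", "Chrome")

# Store the grouping variable as a factor
browser_f <- factor(browser)

# Factors make grouping straightforward, e.g. counting visits per browser
visits_by_browser <- table(browser_f)

print(levels(browser_f))    # "Chrome" "Firefox" "Safari"
print(visits_by_browser)
```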
character data
• Some categorical data are factors, but others are really just identifiers, and are not used for grouping.
• An example might be a user’s IP address. This is basically a unique code identifying a computer, like an address.
• While both factor and character data are “nominal”, we keep the distinction as we will interact with such data in R differently.
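The sketch below keeps identifiers as character data and shows why a factor adds little for them. The addresses come from the ranges reserved for documentation, so they are placeholders rather than real hosts.

```r
# Placeholder IP addresses (documentation ranges, illustrative only)
ip <- c("192.0.2.10", "198.51.100.7", "203.0.113.42")

class(ip)    # "character"

# Character data supports string operations such as counting characters
nchar(ip)    # 10 12 12

# Turning identifiers into a factor gives one level per value,
# so there is nothing useful to group by
ip_f <- factor(ip)
nlevels(ip_f) == length(ip)   # TRUE
```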
discrete data
• Discrete data comes from measurements where there are essentially only distinct and separate possible values that can be counted.
• For example, the number of visits a person makes to our web site will always be integer data, as will other counting data.
continuous data
• Continuous data is that which could conceivably come from a continuum of values.
• The recording of the time in milliseconds of a visit to a web site might be such data.
• A useful distinction is that for discrete data we expect that cases will share values, whereas for continuous data this will be impossible, or at least very unlikely.
• There is no fine line though.
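The "shared values" heuristic can be checked with anyDuplicated. Both vectors are invented for illustration: visit counts as integers and visit durations in milliseconds as numeric values.

```r
# Hypothetical discrete data: number of visits per person
visits <- c(3L, 1L, 3L, 2L, 1L)

# Hypothetical continuous data: visit duration in milliseconds
duration_ms <- c(5312.7, 48.2, 1290.4, 733.9, 5312.8)

# Discrete data: cases are expected to share values
anyDuplicated(visits) > 0        # TRUE

# Continuous data: exact ties are unlikely
anyDuplicated(duration_ms) > 0   # FALSE
```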
date and time data
• Time data can be considered continuous or discrete depending on resolution; on computers there are often entirely separate ways to handle date and time data.
• People in finance want millisecond data, but over long time ranges this recording can literally run out of numbers on a computer.
• Astronomers need precise measurements for durations down to leap seconds.
• R has several ways to work with such data that go beyond just storing the values as simple numbers.
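As a rough sketch of what "beyond simple numbers" looks like in base R, the example below uses the POSIXct and Date classes. The timestamps are arbitrary and the variable names are chosen here for illustration.

```r
# Two arbitrary instants, stored with an explicit time zone
visit_start <- as.POSIXct("2024-03-01 09:15:00", tz = "UTC")
visit_end   <- as.POSIXct("2024-03-01 09:17:30", tz = "UTC")

class(visit_start)   # "POSIXct" "POSIXt"

# Differences between instants carry their own units
elapsed <- difftime(visit_end, visit_start, units = "secs")
as.numeric(elapsed)  # 150

# Calendar dates have their own class and support day arithmetic
d <- as.Date("2024-03-01")
d + 30               # "2024-03-31"
```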
Data types in R
• To organise data, R assigns a class attribute to most R objects and otherwise creates an implicit class for an object.
• The class of an object is used to determine how it should be printed.
• The class function will return the class of an object.
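The short sketch below shows the class function on a few values, and contrasts an explicit class attribute (a factor) with an implicit class (a plain number). The values are arbitrary.

```r
# Implicit classes: no class attribute is stored on the object
class(3.14)           # "numeric"
class("a")            # "character"
class(TRUE)           # "logical"
attr(3.14, "class")   # NULL

# Explicit class: a factor carries a class attribute
f <- factor(c("low", "high", "low"))
class(f)              # "factor"
attr(f, "class")      # "factor"

# The class drives printing: with the class removed,
# the underlying integer codes are shown instead of the labels
print(f)
print(unclass(f))
```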
Numeric data types
• The two main classes for numeric data are numeric and integer, though there are others, e.g. complex. Most of the time numbers are numeric.
• To make an integer value, we need to work a bit: we can preallocate space for an integer data set of length n with integer(n); we can use the suffix L to force a number to be treated as an integer (e.g., 1L); we can coerce numeric values to integer type through the as.integer function.
• Numeric values are stored using floating point representation.
• This format can store much larger integer values and has a much wider range of numbers it can represent.
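The three ways of making integers, and the wider range of numeric values, can be sketched as follows. The specific numbers are arbitrary; .Machine$integer.max is the largest value R's integer type can hold.

```r
class(1)     # "numeric"
class(1L)    # "integer" (suffix L)

z <- integer(3)        # preallocated integer vector: 0 0 0
k <- as.integer(3.9)   # coercion truncates toward zero: 3

# The largest representable integer
.Machine$integer.max   # 2147483647

# Numeric (floating point) values can go well beyond that
big <- .Machine$integer.max + 1   # adding a numeric 1 gives a numeric result
class(big)             # "numeric"

# Floating point is approximate, so compare with a tolerance
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```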
Categorical data types
• Character data. Character data is created just by quoting values.
• Quotes can be matching pairs of single or double quotes, though double quotes are preferred and used to display character values.
• Within a quoted value a quote symbol can be used, but it must be escaped by prefixing it with a backslash.
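A brief sketch of quoting and escaping follows; the strings are arbitrary examples.

```r
s1 <- "double quotes"
s2 <- 'single quotes'

# Both quote styles create the same kind of value
identical('a', "a")   # TRUE

# A quote symbol inside the value is escaped with a backslash
s3 <- "She said \"hello\""

print(s3)       # displayed with double quotes and the escapes shown
cat(s3, "\n")   # She said "hello"

# The backslashes are not part of the stored value
nchar(s3)       # 16
```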
Categorical data types
• Factors. A factor can be made from a character vector with the factor function.
• The levels of a factor are a list of all possible categories for the data in the factor.
• They need not all be represented in a particular factor, but when we create a factor through factor the default choice is simply the collection of unique values.
• The current levels of a factor are returned by the levels function.
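The difference between default levels and explicitly supplied levels can be sketched like this. The party names are placeholders for illustration.

```r
# Hypothetical survey responses
party <- c("Green", "Labour", "Green")

# Default: levels are the unique values that occur
f_default <- factor(party)
levels(f_default)   # "Green" "Labour"

# Explicit levels may include categories that are not represented
f_full <- factor(party, levels = c("Conservative", "Green", "Labour"))
levels(f_full)      # "Conservative" "Green" "Labour"

# Unrepresented levels still appear in summaries, with a count of zero
table(f_full)
```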
Date and time types
• Working with dates and times is made more convenient using a special data type.
• While R has some built-in features to work with dates and times, the lubridate package simplifies the usage.
• This package introduces the notion of “instants,” “durations,” and “intervals” of time.
• We concern ourselves with some basics, learning how to make and manipulate instants of time.
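A minimal sketch of making and manipulating an instant with lubridate is given below. It assumes the lubridate package is installed, so the code is guarded and does nothing otherwise; the timestamp is arbitrary.

```r
# Assumes the lubridate package is available; skipped otherwise
if (requireNamespace("lubridate", quietly = TRUE)) {
  # Make an instant from a string
  t0 <- lubridate::ymd_hms("2024-03-01 09:15:00", tz = "UTC")

  # Manipulate it: move 30 calendar days forward
  t1 <- t0 + lubridate::days(30)

  # Extract components of the new instant
  yr <- lubridate::year(t1)
  mo <- lubridate::month(t1)

  print(t1)
}
```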
Logical data
• R uses TRUE and FALSE to represent Boolean or logical data.
• Logical data is produced by many R functions, for example the “is” functions.
• Most common is the use of the comparison operators (<, <=, ==, !=, >=, >) to produce logical values.
• The operators ! (for not), & (for and), and | (for or) can be used to combine values.
• The functions any, all, which, and %in% are useful for working with logical vectors. The any and all functions answer whether any of the values are TRUE or whether all the values are TRUE.
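The operators and helper functions listed above can be sketched on a small, made-up vector:

```r
x <- c(4, 9, 15, 21)   # illustrative values only

is.numeric(x)          # TRUE (an "is" function)

big <- x > 10          # FALSE FALSE  TRUE  TRUE
odd <- x %% 2 == 1     # FALSE  TRUE  TRUE  TRUE

big & odd              # FALSE FALSE  TRUE  TRUE
big | odd              # FALSE  TRUE  TRUE  TRUE
!big                   #  TRUE  TRUE FALSE FALSE

any(big)               # TRUE
all(big)               # FALSE
which(big)             # 3 4 (positions of the TRUE values)

x %in% c(9, 21)        # FALSE  TRUE FALSE  TRUE
```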
Data Types
