Unit 3: Data Quality and Preprocessing
Mr. V. H. Kondekar
E&TC Dept., WIT, Solapur
Course Outcomes
• ET424.1 Discuss challenges in big data analytics and describe fundamental techniques and principles for data analytics.
• ET424.2 Identify, organize and operate on data sets to compute statistics for data analysis.
• ET424.3 Select and implement appropriate data visualizations to clearly communicate analytic insights.
• ET424.4 Apply different preprocessing techniques for data quality enhancement.
• ET424.5 Use tools and techniques to apply different algorithms and methodologies.
Data Quality?
Depending on the scale type of the data, different data quality and preprocessing techniques can be used.
Several factors can generate data sets that are noisy, inconsistent, or contain duplicate records:
• the nature of the application domain,
• human error,
• the integration of different data sets (say, from different devices),
• the methodology used to collect the data.
Why Preprocessing?
• When such data are used by algorithms that learn from data (machine learning algorithms), the analysis problem can look more complex than it really is if the data are not preprocessed.
• This increases the time required to induce hypotheses or models and can result in models that do not capture the true patterns present in the data set.
• Eliminating, or even just reducing, these problems can improve the quality of the knowledge extracted by data analysis processes.
What affects data quality?
Data quality is important and can be affected by internal and external factors.
• Internal factors are linked to the measurement process and to the attributes chosen for collecting information.
• External factors are related to faults in the data collection process; they can involve the absence of values for some attributes and the voluntary or involuntary introduction of errors into others.
What are the main problems affecting data quality?
The main problems affecting data quality are associated with:
• missing values
• inconsistency
• redundancy
• noise
• outliers
Missing Values?
There are several causes of missing values, among them:
• attribute values recorded only some time after the start of data collection, so that early records have no value;
• the value of an attribute being unknown at the time of collection;
• distraction, misunderstanding or refusal at the time of collection;
• an attribute not being required for particular objects;
• non-existence of a value;
• a fault in the data collection device;
• the cost or difficulty of assigning a class label to an object in classification problems.
How to deal with missing values?
Ignore missing values:
– Use, for each object, only the attributes that have values, without paying attention to missing values. This does not require any change to the modeling algorithm, but the distance function should ignore the values of attributes with at least one missing value.
– Modify the learning algorithm so that it accepts and works with missing values.
Remove objects: use only those objects that have values for all attributes.
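The two strategies above can be sketched in plain Python. This is a minimal sketch under stated assumptions: each object is a list of numbers with `None` marking a missing value, the distance is Euclidean, and "attributes with at least one missing value" is read pairwise (an attribute is skipped when either of the two compared objects lacks it). The names `partial_euclidean` and `complete_cases` are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one object; None marks a missing value (assumption)


def partial_euclidean(a: Row, b: Row) -> float:
    """'Ignore missing values': Euclidean distance over attributes observed in BOTH objects."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("the two objects share no observed attribute")
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs))


def complete_cases(rows: List[Row]) -> List[Row]:
    """'Remove objects': keep only objects with a value for every attribute."""
    return [r for r in rows if all(v is not None for v in r)]


data: List[Row] = [
    [1.0, 2.0, 3.0],
    [4.0, None, 7.0],
    [0.0, 2.0, None],
]

print(partial_euclidean(data[0], data[1]))  # 5.0 (uses attributes 0 and 2 only)
print(complete_cases(data))                 # [[1.0, 2.0, 3.0]]
```

A distance computed over fewer attributes tends to be smaller; some implementations rescale it by the fraction of attributes actually used, which this sketch does not do. Removing objects is simpler but, as the example shows, can discard most of a small data set.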
How to estimate values?
Several methods can be used:
• Fill with a location value: the mean (quantitative attributes) or the median (quantitative and ordinal attributes), and the mode for nominal attributes. The mean is the average of the values, the median is the value that is greater than half of the values and lower than the remaining half, and the mode is the value that appears most often in the attribute.
• For classification tasks, the previous method can be applied using only instances from the same class to compute the location statistic. In other words, to fill the value of attribute at for instance i, which belongs to class C1, we use only the instances of class C1 that do not have a missing value in attribute at.
• A learning algorithm can be used as a prediction model that provides a replacement value for a missing value of a particular attribute. The algorithm uses all the other attributes as predictors and the attribute to be filled as the target.
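The three filling strategies can be sketched with Python's standard `statistics` module. This is an illustrative sketch, not the slides' own method: all function names are hypothetical, `None` marks a missing value, and for the third strategy 1-nearest-neighbour is used as the "learning algorithm" only because it is short; any regression or classification model could take its place.

```python
from statistics import mean, median, mode


def fill_with_location(values, kind):
    """Replace each None with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    stats = {"mean": mean, "median": median, "mode": mode}
    if kind not in stats:
        raise ValueError(f"unknown location statistic: {kind}")
    fill = stats[kind](observed)  # raises StatisticsError if nothing is observed
    return [fill if v is None else v for v in values]


def fill_by_class(values, labels, kind):
    """Like fill_with_location, but each statistic uses only same-class instances."""
    result = list(values)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        filled = fill_with_location([values[i] for i in idx], kind)
        for i, v in zip(idx, filled):
            result[i] = v
    return result


def _sq_dist(a, b, skip):
    """Squared Euclidean distance, ignoring the attribute at index `skip`."""
    return sum((x - y) ** 2 for j, (x, y) in enumerate(zip(a, b)) if j != skip)


def fill_by_nearest_neighbour(rows, target):
    """Predict a missing `target` attribute from the other attributes (1-NN).

    Assumes every attribute other than `target` is numeric and present.
    """
    known = [r for r in rows if r[target] is not None]
    out = []
    for r in rows:
        if r[target] is not None:
            out.append(list(r))
            continue
        nearest = min(known, key=lambda o: _sq_dist(r, o, target))
        out.append([nearest[target] if j == target else v for j, v in enumerate(r)])
    return out


print(fill_with_location([25, None, 35, 30, None], "mean"))       # both gaps become 30
print(fill_with_location(["red", None, "blue", "red"], "mode"))  # gap becomes 'red'
print(fill_by_class([150, None, 160, 180, None, 190],
                    ["C1", "C1", "C1", "C2", "C2", "C2"], "mean"))  # gaps: 155 and 185
print(fill_by_nearest_neighbour([[1.0, 10.0], [2.0, 20.0], [8.0, 80.0], [7.5, None]], 1))
```

In the class-conditional example the overall mean (170) would fill both gaps identically, whereas the per-class means (155 for C1, 185 for C2) keep each filled value close to its own class. For ordinal attributes stored as category labels, `statistics.median_low` avoids averaging two middle categories.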
Redundant Data?
Inconsistent Data?
Noisy Data?
Outlier Data?
How to detect Outliers?
A simple yet effective method for detecting outliers in quantitative attributes is based on the interquartile range.
Let Q1 and Q3 be the first and third quartiles, respectively.
The interquartile range is given by IQ = Q3 − Q1.
Values below Q1 − 1.5 × IQ or above Q3 + 1.5 × IQ are considered too far from the central values to be reasonable, and are treated as outliers.
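The interquartile-range rule above can be sketched with `statistics.quantiles` from the Python standard library. This is a minimal sketch; the function names `iqr_bounds` and `iqr_outliers` and the sample readings are hypothetical. Quartile conventions differ between tools (`statistics.quantiles` defaults to its "exclusive" method), so Q1, Q3 and therefore the fences may differ slightly from those reported by other software.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def iqr_bounds(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return the fences (Q1 - k*IQ, Q3 + k*IQ); k = 1.5 matches the rule above."""
    q1, _q2, q3 = quantiles(values, n=4)  # first, second, third quartiles
    iq = q3 - q1
    return q1 - k * iq, q3 + k * iq


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Values lying below the lower fence or above the upper fence."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100]
print(iqr_bounds(readings))    # (5.0, 19.0)
print(iqr_outliers(readings))  # [100]
```

Here Q1 = 10.25 and Q3 = 13.75, so IQ = 3.5 and the fences are 10.25 − 5.25 = 5.0 and 13.75 + 5.25 = 19.0; only the reading 100 lies outside them.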
Classification: Nominal, Ordinal, Interval, Ratio
A data element can be measured on one of four scales:
• Nominal: a scale of measurement where levels are distinct but do not vary in magnitude.
• Ordinal: a scale of measurement where levels vary in order of magnitude, but equal intervals between levels cannot be assumed.
• Interval: the interval level of measurement has distinct levels, ordering in magnitude, and equal intervals. Equal intervals are obtained if equivalent differences between measurements represent the same amount of difference in the property being measured.
• Ratio: the ratio level of measurement has distinct levels, ordering in magnitude, equal intervals, and an absolute zero. A measurement has an absolute zero when a measurement of zero represents the absence of the property being measured. For example, temperature in degrees Celsius is on an interval scale (0 °C does not mean "no temperature"), whereas weight in kilograms is on a ratio scale.
Contrasting Nominal, Ordinal, Interval and Ratio

Scale has levels that are:   Nominal   Ordinal   Interval   Ratio
Distinctive                  X         X         X          X
Ordered                                X         X          X
Equally spaced                                   X          X
Has an absolute zero                                        X

Nominal and ordinal scales are qualitative (categorical); interval and ratio scales are quantitative (numerical).
Converting to a Different Scale Type?
Converting Nominal to Relative
Since the nominal scale does not assume any order between its values, nominal values should be converted to relative or binary values in a way that keeps this information, that is, without introducing an artificial order.
The most common conversion is called “1-of-n”, also known as canonical or one-attribute-per-value conversion, which transforms the n values of a nominal attribute into n binary attributes. A binary attribute has only two values, 0 or 1.
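The 1-of-n conversion described above can be sketched in a few lines of Python. This is a minimal sketch: the function name `one_of_n` is hypothetical, and sorting the categories is an assumption made only to obtain a deterministic column order.

```python
from typing import Dict, List, Sequence


def one_of_n(values: Sequence[str]) -> Dict[str, List[int]]:
    """1-of-n conversion: one binary (0/1) attribute per distinct nominal value."""
    categories = sorted(set(values))  # sorted only for a stable column order
    return {c: [1 if v == c else 0 for v in values] for c in categories}


colour = ["red", "green", "blue", "green"]
for name, column in one_of_n(colour).items():
    print(name, column)
# blue [0, 0, 1, 0]
# green [0, 1, 0, 1]
# red [1, 0, 0, 0]
```

Each object has exactly one 1 across the n new attributes, so no ordering between "blue", "green" and "red" is implied. For tabular data, `pandas.get_dummies` and scikit-learn's `OneHotEncoder` provide this conversion.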