International Association of Scientific Innovation and Research (IASIR)
(An Association Unifying the Sciences, Engineering, and Applied Research)
International Journal of Software and Web Sciences (IJSWS)
www.iasir.net
IJSWS 14-423; © 2014, IJSWS All Rights Reserved Page 32
ISSN (Print): 2279-0063
ISSN (Online): 2279-0071
Normalization of Data in Data Mining
Dr. Himani Goyal*1, Sandeep.D*2, Venu.R*3, Raghavendra Pokuri*4, Sandeep Kathula*5, Naveen Battula*6
*1 Dean, Dept. of Electronics and Communications; *2, *3, *4, *5, *6 Student
MLR Institute of Technology, Dundigal, Hyderabad-43 Telangana, India
_________________________________________________________________________________________
Abstract: In today’s competitive world that thrives on the thirst for profits through excellence, obtaining greater
efficiency by optimum utilization of resources and better decision-making through analytical data mining
methods has become the backbone of every industry. This highlights the importance of the powerful tools and
concepts of Data Mining and Warehousing, which when applied effectively can revolutionize the face of any
industry. Normalization techniques have become pivotal for identifying patterns and
maintaining the consistency of a database.
Keywords: Data Normalization, Min-Max, Decimal Scaling, Zero-Score.
__________________________________________________________________________________________
I. Introduction
Normalization is the process of scaling attribute values so that they fall within a small, specified range. In the
database sense, it also simplifies a complex schema: a sequence of rules is applied to test individual relations so
that the database can be normalized to any degree. This process is based on the concept of normal forms. A
relational schema may be in 1NF, 2NF, 3NF, or Boyce-Codd Normal Form (BCNF). If the relational schema is
not in the required normal form, it has to be transformed into the desired normal form. Normalization can thus
be used as a data transformation technique. The data normalization techniques considered here are as follows:
II. Min-Max Normalization
This technique performs a linear transformation on the original data and preserves the relationships among the
original values. Let R be an attribute of a given relational schema, and assume that the values of R range from
MP (minimum) to XP (maximum). A value d of attribute R is mapped to d' in the new range [nMP, nXP] using
the equation:
$$d' = \frac{(d - \mathit{MP})(\mathit{nXP} - \mathit{nMP})}{\mathit{XP} - \mathit{MP}} + \mathit{nMP}$$
A program implementing this technique reports an "out-of-bound" error if a later input value falls outside the
original data range [MP, XP].
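The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function name `min_max_normalize` and its parameter names are assumptions introduced here, and the out-of-bound case is modelled as a `ValueError`.

```python
def min_max_normalize(d, old_min, old_max, new_min=0.0, new_max=1.0):
    """Map d from [old_min, old_max] (MP, XP) to [new_min, new_max] (nMP, nXP)."""
    if old_max == old_min:
        # The formula divides by (XP - MP), so a constant attribute cannot be scaled.
        raise ValueError("old_min and old_max must differ")
    if not old_min <= d <= old_max:
        # Assumed way of signalling the "out-of-bound" case described in the text.
        raise ValueError("out-of-bound: value lies outside the original range")
    return (d - old_min) * (new_max - new_min) / (old_max - old_min) + new_min


# Illustrative call: a value of 35 on an attribute that ranges from 13 to 70.
print(round(min_max_normalize(35, 13, 70), 3))  # 0.386
```

Raising an exception is only one possible design; clipping the value to the nearest bound is a common alternative when later inputs may legitimately exceed the original range.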
III. Zero-Score (Z-Score) Normalization
This method is generally used when the actual range of a particular attribute is unknown. It also gives usable
results when the minimum and maximum values are outliers that would otherwise dominate min-max
normalization. Normalization is performed using the arithmetic mean and the standard deviation of the
attribute. A value d is transformed into d' using the equation:
$$d' = \frac{d - \mathit{PA}}{\sigma_P}$$
where PA is the arithmetic mean of attribute P and σP is the standard deviation of attribute P.
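A minimal Python sketch of this transformation follows. The function names are assumptions made for illustration. The text does not say whether the sample or the population standard deviation is meant; the sketch assumes the sample standard deviation (`statistics.stdev`), and `statistics.pstdev` could be substituted for the population version.

```python
import statistics


def z_score(d, mean, std):
    """Transform a single value d given the attribute mean (PA) and standard deviation."""
    return (d - mean) / std


def z_score_normalize(values):
    """Normalize a whole attribute; assumes at least two distinct values."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    return [z_score(v, mean, std) for v in values]


print(z_score_normalize([10, 20, 30]))      # [-1.0, 0.0, 1.0]
print(round(z_score(35, 29.96, 12.94), 3))  # 0.389
```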
IV. Normalization using Decimal Scaling
The data value of attribute P is normalized by moving the position of its decimal point. The number of places
moved depends on the maximum absolute value of P. A value d is transformed using the equation
$$d' = \frac{d}{10^{Z}}$$
where Z is the smallest integer such that max(|d'|) < 1.
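The choice of Z can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names are hypothetical, and Z is restricted to non-negative integers, so values already below 1 in absolute value are left unchanged.

```python
def decimal_scaling_exponent(values):
    """Return the smallest non-negative integer Z with max(|v|) / 10**Z < 1."""
    max_abs = max(abs(v) for v in values)
    z = 0
    while max_abs / 10 ** z >= 1:
        z += 1
    return z


def decimal_scale(values):
    """Divide every value by 10**Z so that all results lie strictly inside (-1, 1)."""
    z = decimal_scaling_exponent(values)
    return [v / 10 ** z for v in values]


print(decimal_scaling_exponent([13, 35, 70]))  # 2
print(decimal_scale([13, 35, 70]))             # [0.13, 0.35, 0.7]
```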
[Figure: the three normalization techniques: Min-Max Normalization, Zero-Score Normalization, and Normalization Using Decimal Scaling]
Himani Goyal et al., International Journal of Software and Web Sciences, 10(1), September-November, 2014, pp. 32-33
V. Elimination of Outliers
Outliers are common in real data, and their presence complicates computations, so it is advisable to eliminate
them: detect the outliers from box plots and refine the data by removing them. One legitimate reason to remove
outliers is to prevent the distortion of the central tendency of the data. Suppose that the data for analysis
includes the attribute age. The age values for the data tuples
in increasing order are 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
Using min-max normalization to transform the value 35 for age into the range [0.0, 1.0]:
(MP) min = 13 and (XP) max = 70, and the target range is [nMP, nXP] = [0.0, 1.0].
Transforming the value 35:
$$d' = \frac{(d - \mathit{MP})(\mathit{nXP} - \mathit{nMP})}{\mathit{XP} - \mathit{MP}} + \mathit{nMP} = \frac{(35 - 13)(1.0 - 0.0)}{70 - 13} + 0.0 = \frac{22}{57} \approx 0.386$$
Hence d' ≈ 0.386, which lies well within the target range. The arithmetic mean is PA = 29.96 years and the
standard deviation is σP = 12.94 years. Thus, using z-score normalization:
$$d' = \frac{d - \mathit{PA}}{\sigma_P} = \frac{35 - 29.96}{12.94} = \frac{5.04}{12.94} \approx 0.389$$
For this particular value, the min-max result is close to the z-score, although the two techniques generally
produce values on different scales. Further, the value can be transformed using decimal scaling:
$$d' = \frac{d}{10^{Z}} = \frac{35}{10^{2}} = 0.35$$
Taking the mean of the above three values gives d' ≈ 0.375.
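The box-plot screening recommended at the start of this section can be sketched in Python. The text does not give a numeric rule, so this sketch assumes the common box-plot convention of flagging values more than 1.5 times the interquartile range beyond the quartiles. The function name and the short sample of ages are illustrative assumptions, and quartile conventions differ between tools, so the fences may vary slightly elsewhere.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the k * IQR box-plot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Hypothetical sample of ages, used only to exercise the function.
sample = [13, 15, 16, 19, 20, 21, 22, 25, 30, 33, 35, 70]
kept, outliers = iqr_outliers(sample)
print(outliers)  # [70]
```

Whether a flagged value should actually be dropped remains a judgment call; the rule only identifies candidates.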
VI. Application
Normalization is extensively used in the following applications:
(i) Neural network classification algorithms such as back-propagation, where normalizing the input values
speeds up the learning phase.
(ii) Distance-based methods such as k-nearest neighbor classification, where normalization prevents attributes
with larger ranges from outweighing attributes with smaller ranges.
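The effect described in item (ii) can be seen on a small example. The following Python sketch uses hypothetical (age, income) records and helper names introduced only for illustration. It finds the nearest neighbour of the first record by Euclidean distance, before and after min-max scaling each attribute to [0, 1].

```python
import math


def min_max_columns(rows):
    """Scale every attribute (column) to [0, 1]; assumes no column is constant."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def nearest(query_index, rows):
    """Index of the row closest to rows[query_index] by Euclidean distance."""
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(rows[query_index], rows[i]))


# Hypothetical records: (age in years, income in currency units).
people = [(25, 50000), (60, 50500), (27, 52000), (40, 60000)]

print(nearest(0, people))                   # 1: income dominates, age gap of 35 is ignored
print(nearest(0, min_max_columns(people)))  # 2: both attributes now contribute
```

On the raw values the income attribute, with its much larger range, decides the neighbour almost alone. After scaling, the record that is close in both age and income becomes the nearest one.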
VII. Conclusion
Normalized relations do not contain repeating groups. Hence update, deletion, and insertion anomalies,
redundancy errors, and database inconsistency can be avoided. Further, simpler results can be obtained, which
helps in maintaining database integrity efficiently. Business enterprises can thus enhance their data analytics
through the predictive behavior of the normalized data.
Acknowledgments
We are deeply grateful to Prof. Kamakshi Prasad of JNTU-Hyderabad for assisting us in this work. The
values and beliefs that our professors have instilled in us have been a source of constant inspiration. F.A likes to
extend special thanks to his parents for their amazing insight and guidance. The unflinching support of family
members through thick and thin has helped us in reaching where we are today. S.A would like to thank the
students of JNTU for their constant motivation. T.A extends his warm regards to his friends for their excellent
ideologies and ideations which have been the constant sources of enlightenment.
About the Authors
First Author: Raghavendra Pokuri is a final year Computer Science Engineering student pursuing his Bachelors in Technology from JNTU-Hyderabad. His fields of interest include research on the captivating subjects of Data Mining and Data Warehousing, ad hoc sensor networks and extensive C and Java programming.
Second Author: Sandeep Kathula is a final year Computer Science Engineering student pursuing his Bachelors in Technology from the esteemed college of JNTU-Hyderabad. His fields of interest include extensive research on Information Retrieval systems, Data Mining and Warehousing, SQL programming and Web programming.
Third Author: Naveen Battula is a final year Computer Science Engineering student pursuing his Bachelors in Technology from JNTU-Hyderabad. His fields of interest include substantial research on Database Transactions and Concurrency Control, Storage and Indexing algorithms, Schema Refinement and Relational Calculus.

UiPath Test Automation using UiPath Test Suite series, part 3
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 

Ijsws14 423 (1)-paper-17-normalization of data in (1)

International Association of Scientific Innovation and Research (IASIR)
(An Association Unifying the Sciences, Engineering, and Applied Research)
International Journal of Software and Web Sciences (IJSWS)
www.iasir.net
IJSWS 14-423; © 2014, IJSWS All Rights Reserved Page 32
ISSN (Print): 2279-0063
ISSN (Online): 2279-0071

Normalization of Data in Data Mining
Dr. Himani Goyal*1, Sandeep.D*2, Venu.R*3, Raghavendra Pokuri*4, Sandeep Kathula*5, Naveen Battula*6
*1 Dean, Dept. of Electronics and Communications; *2,*3,*4,*5,*6 Student
MLR Institute of Technology, Dundigal, Hyderabad-43, Telangana, India

Abstract: In today's competitive world, which thrives on the pursuit of profit through excellence, achieving greater efficiency through optimal use of resources and better decision-making through analytical data mining methods has become the backbone of every industry. This highlights the importance of the tools and concepts of Data Mining and Warehousing, which, when applied effectively, can transform any industry. Normalization techniques play a pivotal role in identifying patterns and maintaining the consistency of a database.

Keywords: Data Normalization, Min-Max, Decimal Scaling, Zero-Score.

I. Introduction
Normalization is the process of scaling attribute values so that they fall within a smaller, specified range. It transforms a complex database into a simpler one. Normalization applies a sequence of rules to test individual relations so that the database can be normalized to any degree. The process is based on the concept of normal forms. A relational schema may be in 1NF, 2NF, 3NF, or Boyce-Codd Normal Form.
If the relational schema is not in the required normal form, it has to be transformed into one of the desired normal forms. Normalization can thus be used as a data transformation technique. The various data normalization techniques are as follows:

II. Min-Max Normalization
This technique performs a linear transformation on the original data set and preserves the relationships among the original values. Let P be an attribute of a given relational schema, and let its values range from MP (the minimum) to XP (the maximum). A value d of attribute P is mapped to d' in the new range [nMP, nXP] using the equation:

d' = (d - MP)(nXP - nMP)/(XP - MP) + nMP

A program that implements this mapping reports an "out-of-bound" error if an input value falls outside the original data range [MP, XP].

III. Zero-Score (Z-Score) Normalization
This method is generally used when the actual minimum and maximum of an attribute are unknown. It also gives feasible results when the minimum and maximum values are outliers. Normalization is performed using the arithmetic mean and the standard deviation, so a value d is transformed into d' using the equation:

d' = (d - PA)/σP

where PA is the arithmetic mean of attribute P and σP is the standard deviation of attribute P.

IV. Normalization using Decimal Scaling
The value of attribute P is normalized by moving its decimal point. The number of places moved depends on the maximum absolute value of P. A value d is transformed using the equation:

d' = d/10^z

where z is the smallest integer such that max(|d'|) < 1.

V. Elimination of Outliers
Outliers are common in real data, and their presence complicates computations, so eliminating them is often worthwhile. Detect the outliers from box plots and refine the data by removing them.
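The box-plot step can be sketched in Python. This is a minimal sketch, not the authors' procedure: it assumes the conventional box-plot whisker rule (a value is an outlier if it lies more than 1.5 times the interquartile range beyond the first or third quartile), which the paper does not specify. The function name `split_outliers` and the parameter `k` are hypothetical, and the sample values are the age list from the paper's worked example.

```python
from statistics import quantiles


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the box-plot whisker rule.

    Assumption: whiskers extend k * IQR beyond the quartiles (k = 1.5 is
    the usual box-plot convention; the paper does not state a value).
    """
    # quantiles() sorts the data and returns the three quartile cut points.
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers


# Age values taken from the worked example in the paper.
ages = [13, 15, 16, 16, 19, 20, 21, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70]

kept, outliers = split_outliers(ages)
print("outliers:", outliers)
print("kept:", len(kept), "values")
```

Under this rule only the largest age falls outside the whiskers, so removing it leaves the remaining values and their central tendency less distorted.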
One legitimate reason to remove outliers is to prevent distortion of the central tendency of the data.

[Figure: normalization techniques: Min-Max Normalization, Zero-Score Normalization, Normalization Using Decimal Scaling]

Himani Goyal et al., International Journal of Software and Web Sciences, 10(1), September-November, 2014, pp. 32-33

Suppose that the data for analysis includes the attribute age. The age values for the data tuples, in increasing order, are 13, 15, 16, 16, 19, 20, 21, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70.

Using min-max normalization to transform the value 35 for age into the range [0.0, 1.0]: MP (min) = 13 and XP (max) = 70, and the new range is [nMP, nXP] = [0.0, 1.0]. The value 35 is transformed as

d' = (d - MP)(nXP - nMP)/(XP - MP) + nMP = (35 - 13)(1.0 - 0.0)/(70 - 13) + 0.0 = 22(1.0)/57 ≈ 0.386

Hence d' ≈ 0.386, which lies well within the target range [0.0, 1.0].

The arithmetic mean is PA = 29.96 and the standard deviation is σP = 12.94 years. Using z-score normalization,

d' = (d - PA)/σP = (35 - 29.96)/12.94 = 5.04/12.94 ≈ 0.389

The value obtained using min-max normalization is thus approximately the same as the score obtained through z-score normalization. Further, the value d can be transformed using decimal scaling as

d' = d/10^z = 35/10^2 = 0.35

Taking the mean of the above three values gives d' ≈ 0.375.

VI. Application
Normalization is extensively used in the following applications:
(i) Neural network classification algorithms such as back-propagation, where normalizing the input values helps speed up the learning phase.
(ii) Distance-based methods such as k-nearest neighbor classification, where normalization prevents attributes with large ranges from outweighing attributes with smaller ranges.

VII. Conclusion
Normalized relation tables do not contain repeating groups. Hence update anomalies, deletion anomalies, insertion anomalies, redundancy errors, and database inconsistency can be avoided. Further, simplified results can be obtained, which helps in maintaining database integrity efficiently.
Business enterprises can thus enhance their data analytics through the predictive behavior of the normalized data.

Acknowledgments
We are deeply grateful to Prof. Kamakshi Prasad of JNTU-Hyderabad for assisting us in this work. The values and beliefs that our professors have instilled in us have been a constant source of inspiration. F.A. would like to extend special thanks to his parents for their insight and guidance. The unflinching support of our family members through thick and thin has helped us reach where we are today. S.A. would like to thank the students of JNTU for their constant motivation. T.A. extends his warm regards to his friends, whose ideas have been a constant source of enlightenment.

About the Authors
First Author: Raghavendra Pokuri is a final-year Computer Science Engineering student pursuing his Bachelor's in Technology from JNTU-Hyderabad. His fields of interest include research on Data Mining and Data Warehousing, ad hoc sensor networks, and C and Java programming.
Second Author: Sandeep Kathula is a final-year Computer Science Engineering student pursuing his Bachelor's in Technology from JNTU-Hyderabad. His fields of interest include research on information retrieval systems, Data Mining and Warehousing, SQL programming, and Web programming.
Third Author: Naveen Battula is a final-year Computer Science Engineering student pursuing his Bachelor's in Technology from JNTU-Hyderabad. His fields of interest include research on database transactions and concurrency control, storage and indexing algorithms, schema refinement, and relational calculus.
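As a closing illustration, the three transformations of Sections II to IV can be sketched in Python and applied to the value 35 from the age example. This is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the mean 29.96 and standard deviation 12.94 are taken as stated in the paper instead of being recomputed, and z in decimal scaling is assumed to be the smallest non-negative integer that brings the largest absolute value below 1.

```python
def min_max(d, mp, xp, n_mp=0.0, n_xp=1.0):
    """Map d from the original range [mp, xp] to the new range [n_mp, n_xp]."""
    if not mp <= d <= xp:
        # Mirrors the "out-of-bound" error described in Section II.
        raise ValueError("out-of-bound: value lies outside [MP, XP]")
    return (d - mp) * (n_xp - n_mp) / (xp - mp) + n_mp


def z_score(d, pa, sigma_p):
    """Centre d on the mean pa and scale by the standard deviation sigma_p."""
    return (d - pa) / sigma_p


def decimal_scaling(d, max_abs):
    """Divide d by 10**z, with z chosen from the largest absolute value.

    Assumption: z is the smallest non-negative integer such that
    max_abs / 10**z < 1.
    """
    z = 0
    while max_abs / 10 ** z >= 1:
        z += 1
    return d / 10 ** z


# Values from the worked example: d = 35, MP = 13, XP = 70,
# PA = 29.96 and sigma_P = 12.94 as stated in the paper.
d = 35
print("min-max:        ", round(min_max(d, 13, 70), 3))
print("z-score:        ", round(z_score(d, 29.96, 12.94), 3))
print("decimal scaling:", decimal_scaling(d, 70))
```

Passing the mean and standard deviation as arguments keeps the sketch tied to the figures quoted in the text; computing them from a data set first would be an equally reasonable design.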