SlideShare a Scribd company logo
1 of 12
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
Rule-Based Method for Entity Resolution
ABSTRACT:
 Entity resolution (ER) is to identify records referring to the same real-
world entity. Traditional ER approaches identify records based on pair wise
similarity comparisons, which assumes that records referring to the same entity
are more similar to each other than otherwise. However, this assumption does not
always hold in practice and similarity comparisons do not work well when such
assumption breaks.
 We propose a new class of rules which could describe the complex
matching conditions between records and entities. Based on this class of rules,
we present the rule-based entity resolution problem and develop an on-line
approach for ER. In this framework, by applying rules to each record, we identify
which entity the record refers to.
 Additionally, we propose an effective and efficient rule discovery
algorithm. We experimentally evaluated our rule-based ER algorithm on real data
sets. The experimental results show that both our rule discovery algorithm and
rule-based ER algorithm can achieve high performance.
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
Existing System
 These rules can be discovered from existing high quality data such as
master data or manually identified data. Inspired by the swoosh method,
each cluster is then merged into a composite record via a merge function.
Finally a traditional ER method, denoted by T-ER, can be applied to
identify the new data set. Moreover, in order to identify more records, the
current ER result can be used as the training data to discover new ER-
rules. The training data can also be obtained by using techniques, such as
relevant feedback, crowd sourcing and knowledge extraction from the web.
Therefore, with the accumulated information, ER-rules for more entities
can be discovered.
 Invalid rule. A rule r is invalid if there exist records that match LHS(r) but
do not refer to RHS(r) . Invalid rules might be discovered when the
information of entities is not comprehensive. Forexample, supposethe
training data set involves the records. The rule r: (name ¼“wei
wang”)^(coa 2“zhang”)) e1 can be generated. For o31, it matches LHS(r)
but does not refer to e1. Therefore, r is an invalid rule.
 Incomplete rule set. An ER-rule set R of entity set E is incomplete if there
are records referring to entities in E that are not covered by R. Both the
incomprehensive information of entities and continuous changes of entity
features would cause a rule set become incomplete. To solve these
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
problems, we develop some methods to identify candidate invalid rules and
candidate useless rules and discover new effective ER-rules.
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
PROPOSED SYSTEM
1. The syntax and semantics of the rules for ER are designed, and the
independence, consistency, completeness and validity of the rules are defined and
analyzed.
2. An efficient rule discovery algorithm based on training data is proposed
and analyzed.
3. An efficient rule-based algorithm for solving entity resolution problem is
proposed and analyzed.
4. A rule maintaining method is proposed when entity information is
changed.
5. Experiments are performed on real data to verify the effectiveness and
efficiency of the proposed algorithms.
RULES FOR ENTITY RESOLUTION
A rule system for entity resolution, called ER rule, is defined. We can see
that each rule consists of two clauses.
1. The If clause includes constraints on attributes of records, such as
“including zhang in coauthors”.
2. The real world entity referred by the records that satisfy the first clause of
the rule, such as “refers to entity e1”. Thus, we use A) B to express the rules “8o,
If o satisfies A Then o refers to B” for ER. We denote the left-hand side and the
right-hand side of a rule r as LHS(r) and RHS(r) respectively.
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
RULE-BASED ENTITY RESOLUTION
The algorithm of entity resolution by leveraging ER-rules. We first define
the rule-based ER problem. Next we develop an online algorithm for rule based
ER problem. Finally, we describe how to incorporate this algorithm into a
generalized ER framework. (Rule-based ER). Rule-based ER takes U and RE as
input, and outputs U. U is a data set, RE is an ER-rule set of entity set E ¼ fe1; . .
. ; emg, U ¼ fU1; . . . ; Umg is a partition of records where each group Uj(1 _ j _
m) is a subset of U which are determined to refer to the entity ej and [1_j_mUj is
a subset of U. Our rule-based ER algorithm R-ER scans records one by one and
determines the entity for each record. The determination process can be divided
into the following steps. First, we find all the rules satisfied by o (FINDRULES).
Second, for each entity e to which o might refer, we compute the confidence that
o refers to e according to the rules of e that are satisfied by o (COMPCONF).
Third, we select the entity e with the largest confidence to which o might refer,
and if this confidence is larger than a confidence threshold, it is determined that o
refers to e (SELENTITY). These procedures are described as follows.
PAIR WISE ER
Most works on ER focus on record matching, which involves comparing
record pairs and identifying whether they match. A major part of work on record
matching focuses on similarity functions. To capture string variations, proposed a
transformation-based framework for record matching. Some machine-learning
based approaches can identify matching strings which are syntactically far apart.
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
Similarity based on record relationships are also proposed to solve the people
identification problem.
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
MODULE DESCRIPTION
Modules:
The system consists of modules and threat modules.
 Books and Authors
 Staff details
 View
 Entity View
 Entity with Rules resolution
Module Explanations:
Books andAuthors:
The records are inserted in this module with duplicate and non duplicate
records, with the entity rules for the each values of the records
Staff Details
The records are inserted in this module with duplicate and non duplicate
records, with the entity rules for the each values of the records of Staff
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
View
This module describes the view without the entity values so that it can be
viewed with duplicate records ofthe table.
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
Entity View
The Entire table is viewed with the entity of the records of the table
Entity with Rules Resolution
In this module it is described such that the entity records can be viewed
with the rules for resolution of the records
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
SYSTEM CONFIGURATION
SOFTWARE REQUIREMENTS:
 Operating System : Windows 7
 Technology : Java7 and J2EE
 Web Technologies : Html, JavaScript, CSS
 IDE : Eclipse Juno
 Web Server : Tomcat
 Database : My SQL
 Java Version : J2SDK1.5
HARDWARE REQUIREMENTS:
 Hardware : Pentium Dual Core
 Speed : 2.80 GHz
 RAM : 1GB
 Hard Disk : 20 GB
 Floppy Drive : 1.44 MB
 Key Board : Standard Windows Keyboard
Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore
www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
 Mouse : Two or Three Button Mouse
 Monitor : SVGA

More Related Content

What's hot (6)

Addressing mode
Addressing mode Addressing mode
Addressing mode
 
Hw fdb(2)
Hw fdb(2)Hw fdb(2)
Hw fdb(2)
 
ER Model ppt
ER Model pptER Model ppt
ER Model ppt
 
Iisrt zz mamatha
Iisrt zz mamathaIisrt zz mamatha
Iisrt zz mamatha
 
OVERALL PERFORMANCE EVALUATION OF ENGINEERING STUDENTS USING FUZZY LOGIC
OVERALL PERFORMANCE EVALUATION OF ENGINEERING STUDENTS USING FUZZY LOGICOVERALL PERFORMANCE EVALUATION OF ENGINEERING STUDENTS USING FUZZY LOGIC
OVERALL PERFORMANCE EVALUATION OF ENGINEERING STUDENTS USING FUZZY LOGIC
 
POPSI
POPSIPOPSI
POPSI
 

Viewers also liked

Bien doi khi hau49
Bien doi khi hau49Bien doi khi hau49
Bien doi khi hau49
Phi Phi
 
Recomendation Letter
Recomendation LetterRecomendation Letter
Recomendation Letter
Mike Kharrat
 
SAKAMURI DILLI BABU_Resume
SAKAMURI DILLI BABU_ResumeSAKAMURI DILLI BABU_Resume
SAKAMURI DILLI BABU_Resume
DILLI BABU
 
A_Deep_Dive_into_Meaningful_Use_Stage_2
A_Deep_Dive_into_Meaningful_Use_Stage_2A_Deep_Dive_into_Meaningful_Use_Stage_2
A_Deep_Dive_into_Meaningful_Use_Stage_2
laurenstill
 
PORTFOLIO MARIANA T. DIB
PORTFOLIO MARIANA T. DIBPORTFOLIO MARIANA T. DIB
PORTFOLIO MARIANA T. DIB
Mariana Dib
 

Viewers also liked (16)

Queensland’s Intelligent Transport Systems (ITS) Pilot Projects
Queensland’s Intelligent Transport Systems (ITS) Pilot ProjectsQueensland’s Intelligent Transport Systems (ITS) Pilot Projects
Queensland’s Intelligent Transport Systems (ITS) Pilot Projects
 
Intelligent Transport Systems in Hong Kong
Intelligent Transport Systems in Hong KongIntelligent Transport Systems in Hong Kong
Intelligent Transport Systems in Hong Kong
 
Welltech Energy
Welltech EnergyWelltech Energy
Welltech Energy
 
Z a k i
Z a k iZ a k i
Z a k i
 
1
11
1
 
Bien doi khi hau49
Bien doi khi hau49Bien doi khi hau49
Bien doi khi hau49
 
Recomendation Letter
Recomendation LetterRecomendation Letter
Recomendation Letter
 
SAKAMURI DILLI BABU_Resume
SAKAMURI DILLI BABU_ResumeSAKAMURI DILLI BABU_Resume
SAKAMURI DILLI BABU_Resume
 
Presentation_NEW.PPTX
Presentation_NEW.PPTXPresentation_NEW.PPTX
Presentation_NEW.PPTX
 
Learning With Sequential Art
Learning With Sequential ArtLearning With Sequential Art
Learning With Sequential Art
 
A_Deep_Dive_into_Meaningful_Use_Stage_2
A_Deep_Dive_into_Meaningful_Use_Stage_2A_Deep_Dive_into_Meaningful_Use_Stage_2
A_Deep_Dive_into_Meaningful_Use_Stage_2
 
Curso de Serigrafia - Mestre Jonas
Curso de Serigrafia - Mestre JonasCurso de Serigrafia - Mestre Jonas
Curso de Serigrafia - Mestre Jonas
 
PORTFOLIO MARIANA T. DIB
PORTFOLIO MARIANA T. DIBPORTFOLIO MARIANA T. DIB
PORTFOLIO MARIANA T. DIB
 
South Yorkshire Intelligent Transport System
South Yorkshire Intelligent Transport SystemSouth Yorkshire Intelligent Transport System
South Yorkshire Intelligent Transport System
 
Democratic Politics Chapter 4 Grade 10 CBSE [ Gender, Religion and Caste ]
Democratic Politics Chapter 4 Grade 10 CBSE [ Gender, Religion and Caste ]Democratic Politics Chapter 4 Grade 10 CBSE [ Gender, Religion and Caste ]
Democratic Politics Chapter 4 Grade 10 CBSE [ Gender, Religion and Caste ]
 
Intelligent Transport System simplified | Logica
Intelligent Transport System simplified | LogicaIntelligent Transport System simplified | Logica
Intelligent Transport System simplified | Logica
 

Similar to Rule based method-for entity resolution

Top 40 Data Science Interview Questions and Answers 2022.pdf
Top 40 Data Science Interview Questions and Answers 2022.pdfTop 40 Data Science Interview Questions and Answers 2022.pdf
Top 40 Data Science Interview Questions and Answers 2022.pdf
Suraj Kumar
 

Similar to Rule based method-for entity resolution (20)

Rule based method for entity resolution
Rule based method for entity resolutionRule based method for entity resolution
Rule based method for entity resolution
 
RULE-BASED METHOD FOR ENTITY RESOLUTION - IEEE PROJECTS IN PONDICHERRY,BULK I...
RULE-BASED METHOD FOR ENTITY RESOLUTION - IEEE PROJECTS IN PONDICHERRY,BULK I...RULE-BASED METHOD FOR ENTITY RESOLUTION - IEEE PROJECTS IN PONDICHERRY,BULK I...
RULE-BASED METHOD FOR ENTITY RESOLUTION - IEEE PROJECTS IN PONDICHERRY,BULK I...
 
A Performance Based Transposition algorithm for Frequent Itemsets Generation
A Performance Based Transposition algorithm for Frequent Itemsets GenerationA Performance Based Transposition algorithm for Frequent Itemsets Generation
A Performance Based Transposition algorithm for Frequent Itemsets Generation
 
B017441015
B017441015B017441015
B017441015
 
Rule based method for entity resolution
Rule based method for entity resolutionRule based method for entity resolution
Rule based method for entity resolution
 
Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning
Learning Collaborative Agents with Rule Guidance for Knowledge Graph ReasoningLearning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning
Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning
 
Effectiveness of ERules in Generating Non Redundant Rule Sets in Pharmacy Dat...
Effectiveness of ERules in Generating Non Redundant Rule Sets in Pharmacy Dat...Effectiveness of ERules in Generating Non Redundant Rule Sets in Pharmacy Dat...
Effectiveness of ERules in Generating Non Redundant Rule Sets in Pharmacy Dat...
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...
 
A03730108
A03730108A03730108
A03730108
 
Optimization of Mining Association Rule from XML Documents
Optimization of Mining Association Rule from XML DocumentsOptimization of Mining Association Rule from XML Documents
Optimization of Mining Association Rule from XML Documents
 
Hybrid approach for generating non overlapped substring using genetic algorithm
Hybrid approach for generating non overlapped substring using genetic algorithmHybrid approach for generating non overlapped substring using genetic algorithm
Hybrid approach for generating non overlapped substring using genetic algorithm
 
Top 40 Data Science Interview Questions and Answers 2022.pdf
Top 40 Data Science Interview Questions and Answers 2022.pdfTop 40 Data Science Interview Questions and Answers 2022.pdf
Top 40 Data Science Interview Questions and Answers 2022.pdf
 
Conceptual Similarity Measurement Algorithm For Domain Specific Ontology
Conceptual Similarity Measurement Algorithm For Domain Specific OntologyConceptual Similarity Measurement Algorithm For Domain Specific Ontology
Conceptual Similarity Measurement Algorithm For Domain Specific Ontology
 
Generating Non-redundant Multilevel Association Rules Using Min-max Exact Rules
Generating Non-redundant Multilevel Association Rules Using Min-max Exact Rules Generating Non-redundant Multilevel Association Rules Using Min-max Exact Rules
Generating Non-redundant Multilevel Association Rules Using Min-max Exact Rules
 
D017422528
D017422528D017422528
D017422528
 
Conceptual similarity measurement algorithm for domain specific ontology[
Conceptual similarity measurement algorithm for domain specific ontology[Conceptual similarity measurement algorithm for domain specific ontology[
Conceptual similarity measurement algorithm for domain specific ontology[
 
A boolean modeling for improving
A boolean modeling for improvingA boolean modeling for improving
A boolean modeling for improving
 
Hw fdb(2)
Hw fdb(2)Hw fdb(2)
Hw fdb(2)
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 

More from Pvrtechnologies Nellore

Pre encoded multipliers based on non-redundant radix-4 signed-digit encoding
Pre encoded multipliers based on non-redundant radix-4 signed-digit encodingPre encoded multipliers based on non-redundant radix-4 signed-digit encoding
Pre encoded multipliers based on non-redundant radix-4 signed-digit encoding
Pvrtechnologies Nellore
 

More from Pvrtechnologies Nellore (20)

A High Throughput List Decoder Architecture for Polar Codes
A High Throughput List Decoder Architecture for Polar CodesA High Throughput List Decoder Architecture for Polar Codes
A High Throughput List Decoder Architecture for Polar Codes
 
Performance/Power Space Exploration for Binary64 Division Units
Performance/Power Space Exploration for Binary64 Division UnitsPerformance/Power Space Exploration for Binary64 Division Units
Performance/Power Space Exploration for Binary64 Division Units
 
Hybrid LUT/Multiplexer FPGA Logic Architectures
Hybrid LUT/Multiplexer FPGA Logic ArchitecturesHybrid LUT/Multiplexer FPGA Logic Architectures
Hybrid LUT/Multiplexer FPGA Logic Architectures
 
Input-Based Dynamic Reconfiguration of Approximate Arithmetic Units for Video...
Input-Based Dynamic Reconfiguration of Approximate Arithmetic Units for Video...Input-Based Dynamic Reconfiguration of Approximate Arithmetic Units for Video...
Input-Based Dynamic Reconfiguration of Approximate Arithmetic Units for Video...
 
2016 2017 ieee matlab project titles
2016 2017 ieee matlab project titles2016 2017 ieee matlab project titles
2016 2017 ieee matlab project titles
 
2016 2017 ieee vlsi project titles
2016   2017 ieee vlsi project titles2016   2017 ieee vlsi project titles
2016 2017 ieee vlsi project titles
 
2016 2017 ieee ece embedded- project titles
2016   2017 ieee ece  embedded- project titles2016   2017 ieee ece  embedded- project titles
2016 2017 ieee ece embedded- project titles
 
A High-Speed FPGA Implementation of an RSD-Based ECC Processor
A High-Speed FPGA Implementation of an RSD-Based ECC ProcessorA High-Speed FPGA Implementation of an RSD-Based ECC Processor
A High-Speed FPGA Implementation of an RSD-Based ECC Processor
 
6On Efficient Retiming of Fixed-Point Circuits
6On Efficient Retiming of Fixed-Point Circuits6On Efficient Retiming of Fixed-Point Circuits
6On Efficient Retiming of Fixed-Point Circuits
 
Pre encoded multipliers based on non-redundant radix-4 signed-digit encoding
Pre encoded multipliers based on non-redundant radix-4 signed-digit encodingPre encoded multipliers based on non-redundant radix-4 signed-digit encoding
Pre encoded multipliers based on non-redundant radix-4 signed-digit encoding
 
Quality of-protection-driven data forwarding for intermittently connected wir...
Quality of-protection-driven data forwarding for intermittently connected wir...Quality of-protection-driven data forwarding for intermittently connected wir...
Quality of-protection-driven data forwarding for intermittently connected wir...
 
11.online library management system
11.online library management system11.online library management system
11.online library management system
 
06.e voting system
06.e voting system06.e voting system
06.e voting system
 
New web based projects list
New web based projects listNew web based projects list
New web based projects list
 
Power controlled medium access control
Power controlled medium access controlPower controlled medium access control
Power controlled medium access control
 
IEEE PROJECTS LIST
IEEE PROJECTS LIST IEEE PROJECTS LIST
IEEE PROJECTS LIST
 
Control cloud-data-access-privilege-and-anonymity-with-fully-anonymous-attrib...
Control cloud-data-access-privilege-and-anonymity-with-fully-anonymous-attrib...Control cloud-data-access-privilege-and-anonymity-with-fully-anonymous-attrib...
Control cloud-data-access-privilege-and-anonymity-with-fully-anonymous-attrib...
 
Control cloud data access privilege and anonymity with fully anonymous attrib...
Control cloud data access privilege and anonymity with fully anonymous attrib...Control cloud data access privilege and anonymity with fully anonymous attrib...
Control cloud data access privilege and anonymity with fully anonymous attrib...
 
Cloud keybank privacy and owner authorization
Cloud keybank  privacy and owner authorizationCloud keybank  privacy and owner authorization
Cloud keybank privacy and owner authorization
 
Circuit ciphertext policy attribute-based hybrid encryption with verifiable
Circuit ciphertext policy attribute-based hybrid encryption with verifiableCircuit ciphertext policy attribute-based hybrid encryption with verifiable
Circuit ciphertext policy attribute-based hybrid encryption with verifiable
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Rule based method-for entity resolution

  • 1. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457 Rule-Based Method for Entity Resolution ABSTRACT:  Entity resolution (ER) is to identify records referring to the same real- world entity. Traditional ER approaches identify records based on pair wise similarity comparisons, which assumes that records referring to the same entity are more similar to each other than otherwise. However, this assumption does not always hold in practice and similarity comparisons do not work well when such assumption breaks.  We propose a new class of rules which could describe the complex matching conditions between records and entities. Based on this class of rules, we present the rule-based entity resolution problem and develop an on-line approach for ER. In this framework, by applying rules to each record, we identify which entity the record refers to.  Additionally, we propose an effective and efficient rule discovery algorithm. We experimentally evaluated our rule-based ER algorithm on real data sets. The experimental results show that both our rule discovery algorithm and rule-based ER algorithm can achieve high performance.
  • 2. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457 Existing System  These rules can be discovered from existing high quality data such as master data or manually identified data. Inspired by the swoosh method, each cluster is then merged into a composite record via a merge function. Finally a traditional ER method, denoted by T-ER, can be applied to identify the new data set. Moreover, in order to identify more records, the current ER result can be used as the training data to discover new ER- rules. The training data can also be obtained by using techniques, such as relevant feedback, crowd sourcing and knowledge extraction from the web. Therefore, with the accumulated information, ER-rules for more entities can be discovered.  Invalid rule. A rule r is invalid if there exist records that match LHS(r) but do not refer to RHS(r) . Invalid rules might be discovered when the information of entities is not comprehensive. Forexample, supposethe training data set involves the records. The rule r: (name ¼“wei wang”)^(coa 2“zhang”)) e1 can be generated. For o31, it matches LHS(r) but does not refer to e1. Therefore, r is an invalid rule.  Incomplete rule set. An ER-rule set R of entity set E is incomplete if there are records referring to entities in E that are not covered by R. Both the incomprehensive information of entities and continuous changes of entity features would cause a rule set become incomplete. To solve these
  • 3. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457 problems, we develop some methods to identify candidate invalid rules and candidate useless rules and discover new effective ER-rules.
  • 4. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457 PROPOSED SYSTEM 1. The syntax and semantics of the rules for ER are designed, and the independence, consistency, completeness and validity of the rules are defined and analyzed. 2. An efficient rule discovery algorithm based on training data is proposed and analyzed. 3. An efficient rule-based algorithm for solving entity resolution problem is proposed and analyzed. 4. A rule maintaining method is proposed when entity information is changed. 5. Experiments are performed on real data to verify the effectiveness and efficiency of the proposed algorithms. RULES FOR ENTITY RESOLUTION A rule system for entity resolution, called ER rule, is defined. We can see that each rule consists of two clauses. 1. The If clause includes constraints on attributes of records, such as “including zhang in coauthors”. 2. The real world entity referred by the records that satisfy the first clause of the rule, such as “refers to entity e1”. Thus, we use A) B to express the rules “8o, If o satisfies A Then o refers to B” for ER. We denote the left-hand side and the right-hand side of a rule r as LHS(r) and RHS(r) respectively.
  • 5. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457
  • 6. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457 RULE-BASED ENTITY RESOLUTION The algorithm of entity resolution by leveraging ER-rules. We first define the rule-based ER problem. Next we develop an online algorithm for rule based ER problem. Finally, we describe how to incorporate this algorithm into a generalized ER framework. (Rule-based ER). Rule-based ER takes U and RE as input, and outputs U. U is a data set, RE is an ER-rule set of entity set E ¼ fe1; . . . ; emg, U ¼ fU1; . . . ; Umg is a partition of records where each group Uj(1 _ j _ m) is a subset of U which are determined to refer to the entity ej and [1_j_mUj is a subset of U. Our rule-based ER algorithm R-ER scans records one by one and determines the entity for each record. The determination process can be divided into the following steps. First, we find all the rules satisfied by o (FINDRULES). Second, for each entity e to which o might refer, we compute the confidence that o refers to e according to the rules of e that are satisfied by o (COMPCONF). Third, we select the entity e with the largest confidence to which o might refer, and if this confidence is larger than a confidence threshold, it is determined that o refers to e (SELENTITY). These procedures are described as follows. PAIR WISE ER Most works on ER focus on record matching, which involves comparing record pairs and identifying whether they match. A major part of work on record matching focuses on similarity functions. To capture string variations, proposed a transformation-based framework for record matching. Some machine-learning based approaches can identify matching strings which are syntactically far apart.
  • 7. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457 Similarity based on record relationships are also proposed to solve the people identification problem.
  • 8. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457 MODULE DESCRIPTION Modules: The system consists of modules and threat modules.  Books and Authors  Staff details  View  Entity View  Entity with Rules resolution Module Explanations: Books andAuthors: The records are inserted in this module with duplicate and non duplicate records, with the entity rules for the each values of the records Staff Details The records are inserted in this module with duplicate and non duplicate records, with the entity rules for the each values of the records of Staff
  • 9. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457 View This module describes the view without the entity values so that it can be viewed with duplicate records ofthe table.
  • 10. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457 Entity View The Entire table is viewed with the entity of the records of the table Entity with Rules Resolution In this module it is described such that the entity records can be viewed with the rules for resolution of the records
  • 11. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457 SYSTEM CONFIGURATION SOFTWARE REQUIREMENTS:  Operating System : Windows 7  Technology : Java7 and J2EE  Web Technologies : Html, JavaScript, CSS  IDE : Eclipse Juno  Web Server : Tomcat  Database : My SQL  Java Version : J2SDK1.5 HARDWARE REQUIREMENTS:  Hardware : Pentium Dual Core  Speed : 2.80 GHz  RAM : 1GB  Hard Disk : 20 GB  Floppy Drive : 1.44 MB  Key Board : Standard Windows Keyboard
  • 12. Head office: 3nd floor, Krishna Reddy Buildings, OPP: ICICI ATM, Ramalingapuram, Nellore www.pvrtechnology.com, E-Mail: pvrieeeprojects@gmail.com, Ph: 81432 71457  Mouse : Two or Three Button Mouse  Monitor : SVGA