GFilter: A General Gram Filter for
String Similarity Search
ABSTRACT
• Numerous applications such as data integration, protein detection,
and article copy detection share a similar core problem: given a
string as the query, how to efficiently find all the similar answers
from a large-scale string collection.
• Many existing methods adopt a prefix-filter-based framework to
solve this problem, and a number of recent works aim to use
advanced filters to improve the overall search performance.
ABSTRACT
• In this paper we propose a gram-based framework to achieve
near-maximum filter performance. The main idea is to judiciously
choose high-quality grams as the prefix of the query according to
their estimated ability to filter candidates.
• As this selection process is proven to be an NP-hard problem, we give
a cost model to measure the filter ability of grams and develop
efficient heuristic algorithms to find high-quality grams.
• Extensive experiments on real datasets demonstrate the superiority
of the proposed framework in comparison with state-of-the-art
approaches.
EXISTING SYSTEM
• Existing literature usually adopts a gram-based filter-and-verification
framework for string similarity search.
• After the query string is decomposed into grams, an efficient and
critical type of filter uses the grams and the inverted index (from
grams to strings) of the string collection to generate candidates.
• Other advanced filters can also be applied after this elementary
filter. Then, in the verification step, the similarity function is
evaluated for the surviving candidates to produce the final results.
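The filter-and-verification pipeline described on this slide can be illustrated with a minimal sketch. It assumes edit distance as the similarity function and a threshold `tau`, neither of which the slides specify, and the function names (`qgrams`, `build_index`, `search`) are hypothetical. The filter relies on a standard fact: one edit destroys at most `q` grams, so when the query has more than `q * tau` grams, every true answer shares at least one gram with it.

```python
from collections import defaultdict


def qgrams(s, q=2):
    """Return the overlapping q-grams of s (empty if s is shorter than q)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(strings, q=2):
    """Inverted index: gram -> set of ids of the strings containing it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g].add(sid)
    return index


def edit_distance(a, b):
    """Levenshtein distance by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def search(query, strings, index, tau, q=2):
    """Return all strings within edit distance tau of query, in id order."""
    grams = qgrams(query, q)
    if len(grams) > q * tau:
        # Filter step: keep only strings sharing at least one gram with the query.
        candidates = set()
        for g in grams:
            candidates |= index.get(g, set())
    else:
        # Too few grams for the filter to be safe: fall back to a full scan.
        candidates = set(range(len(strings)))
    # Verification step: evaluate the similarity function on the survivors.
    return [strings[i] for i in sorted(candidates)
            if edit_distance(query, strings[i]) <= tau]


strings = ["string", "strong", "spring", "table"]
index = build_index(strings)
print(search("strinf", strings, index, tau=1))
```

In this toy collection "table" shares no gram with the query, so it is discarded by the filter and never reaches the more expensive verification step.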
PROPOSED SYSTEM
• We develop a general gram filter. This generalization makes it
possible to select an optimized combination of grams from the q-gram
set of a query.
• Theoretical analysis is conducted on the complexity of the problem,
which is NP-hard. We present a choose-and-extend framework to
efficiently find high-quality grams during query processing.
Different strategies can be developed under this framework.
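As a rough illustration of gram selection, the sketch below picks the query grams with the shortest inverted lists. This is a simple greedy stand-in, not the choose-and-extend algorithm from the paper, whose details the slides do not give. The cost model (inverted-list length), the edit-distance threshold `tau`, and all function names are assumptions. Selecting `q * tau + 1` gram occurrences is safe because `tau` edits can destroy at most `q * tau` of them.

```python
from collections import defaultdict


def qgrams(s, q=2):
    """Return the overlapping q-grams of s."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(strings, q=2):
    """Inverted index: gram -> set of ids of the strings containing it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g].add(sid)
    return index


def select_grams(query, index, tau, q=2):
    """Pick q*tau + 1 gram occurrences with the shortest inverted lists.

    Returns None when the query has too few grams for the filter to be safe.
    """
    grams = qgrams(query, q)
    k = q * tau + 1
    if len(grams) < k:
        return None
    # Assumed cost model: a gram's cost is the length of its inverted list.
    return sorted(grams, key=lambda g: len(index.get(g, ())))[:k]


def candidates(query, strings, index, tau, q=2):
    """Union of the inverted lists of the selected grams."""
    chosen = select_grams(query, index, tau, q)
    if chosen is None:
        return set(range(len(strings)))  # no safe prefix: keep every string
    result = set()
    for g in chosen:
        result |= index.get(g, set())
    return result


strings = ["string", "strong", "spring", "table", "trip"]
index = build_index(strings)
print(select_grams("strinf", index, tau=1))
print(sorted(candidates("strinf", strings, index, tau=1)))
```

Using all five grams of the query would also pull in "trip" through the frequent grams "tr" and "ri". Restricting the prefix to three rare grams leaves it out while still retaining every string within edit distance 1.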
PROPOSED SYSTEM
The Main Modules Identified in this project are:
• Admin Module
• Master Data Module
• Members Registration Module
• Authorization Module
• Publishers Module
• Users Module
• MIS Reports Module
ADMIN MODULE
• The Admin Module is responsible for maintaining the application:
creating the necessary Master Data for the Application, granting
Access Permissions to the Application Users, and generating MIS
(Management Information System) reports.
MASTER DATA MODULE
• The Master Data Module is accessed by the Application Admin User
only. This Module is responsible for creating the necessary Master
Data for the Application.
• The Master Data identified in this Module is the set of Categories
under which Publishers publish their content. These categories also
allow various MIS reports to be generated according to managerial
requirements for analysis.
MEMBERS REGISTRATION MODULE
• This Module is available online to the general public on the Home
Page itself. The general public is categorized into two Member
Groups: Publishers and Users.
• Through the Registration Module, members of both groups register
by providing the necessary details about themselves.
• Once they complete registration, they must be authorized by the
Application Admin User. Until then they cannot access the
Application. Once a Member is authorized by the Admin User, they
can enter the Application and perform the tasks assigned to them.
AUTHORIZATION MODULE
• This Module plays a vital role in the application. The Authorization
Module is accessed by the Application Admin User only.
• This module lets the Admin User grant Access Permissions to the
Enrolled Members of the Application. The Authorization Module
comprises two parts: one grants Authorization Permissions to the
Publishers and the other grants Access Permissions to the Users.
• Once the Enrolled Members are authorized by the Admin, they
immediately gain access to the application resources and can start
their defined tasks in the application.
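The register-then-authorize flow described in the Members Registration and Authorization modules can be sketched as below. The project itself targets ASP.NET and C#; Python is used here only for illustration, and the `Role`, `Member`, `register`, `authorize`, and `can_access` names are hypothetical. The rule that admins cannot self-register is an assumption drawn from the statement that the public registers only as Publishers or Users.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    PUBLISHER = "publisher"
    USER = "user"


@dataclass
class Member:
    name: str
    role: Role
    authorized: bool = False


def register(name, role):
    """Self-registration: new publishers and users start unauthorized."""
    if role is Role.ADMIN:
        # Assumption: only Publishers and Users register through the home page.
        raise ValueError("admins cannot self-register")
    return Member(name, role)


def authorize(admin, member):
    """Only the admin user may grant access permissions."""
    if admin.role is not Role.ADMIN:
        raise PermissionError("only the admin user can authorize members")
    member.authorized = True


def can_access(member):
    """Admins always have access; other members need authorization first."""
    return member.role is Role.ADMIN or member.authorized


admin = Member("root", Role.ADMIN, authorized=True)
publisher = register("alice", Role.PUBLISHER)
print(can_access(publisher))   # before authorization
authorize(admin, publisher)
print(can_access(publisher))   # after authorization
```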
PUBLISHERS MODULE
• The Publishers Module is the key area of the Application. Publishers
are authorized by the Admin User. Once they have been granted
Access Permissions to the Application, Publishers can perform the
following tasks.
• Publish their content under any available Category by uploading
flat files
• Get the list of content they have already published
• Get the list of new queries raised by the Users
• Send replies to the User queries
• Two further facilities are provided to the publishers:
– Changing their current password
– Changing their user profile.
USERS MODULE
Like Publishers, Users also have to be authorized by the Admin User
to get access to the Application Resources. Once a User is
authorized, they can log in to the Application and perform the
regular tasks assigned to them. The User can carry out the
following functions:
• Get the list of queries they have already sent
• Compose queries in any available category and submit them to the
Publisher.
USERS MODULE
• While queries are being submitted, the application applies the same
mechanism that the Publishers Module uses when Publishers upload
content.
• When a user composes and sends a query to the publisher, the
Application compares the newly composed query with the queries
already sent and available in the Application's Data Bank.
• If the new query is found in the Data Bank, the Application searches
for the corresponding answer. If an answer is available, it is
displayed immediately.
• Otherwise the User has to wait for the Publisher's reply. Once the
Publisher answers the queries, the User can get the list of queries
together with the replies from the Publishers.
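The Data Bank lookup described above can be sketched as follows. The slides do not say how a new query is compared with stored ones, so this sketch assumes case-insensitive matching under an edit-distance threshold `tau`, in line with the similarity-search theme of the deck. The `submit_query` function and the dictionary-based Data Bank are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def submit_query(bank, text, tau=2):
    """Return a stored answer for a similar earlier query, else None.

    bank maps query text to its answer, or to None while unanswered.
    """
    for earlier, answer in bank.items():
        similar = edit_distance(text.lower(), earlier.lower()) <= tau
        if similar and answer is not None:
            return answer  # an answer already exists: show it immediately
    # No answered match: record the query so a publisher can reply later.
    bank.setdefault(text, None)
    return None


bank = {"What is a q-gram?": "A substring of length q."}
print(submit_query(bank, "what is a qgram?"))
print(submit_query(bank, "How are reports generated?"))
```

The second call finds no similar answered query, so the new query is stored as pending and the user waits for a publisher's reply.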
MIS REPORT
The MIS Module provides various managerial reports to top-level
management for analysis. Indirectly, it improves the quality of the
services provided by the management.
Some of the MIS Reports:
• List of Available Categories
• List of Publishers
• List of Users
• List of Queries not yet replied
• List of Category wise Queries
• List of Uploaded Contents
HARDWARE REQUIREMENTS
• System : Pentium IV 3.5 GHz.
• Hard Disk : 40 GB.
• Monitor : 15-inch VGA Colour.
• RAM : 1 GB.
SOFTWARE REQUIREMENTS:
• Operating system : Windows 7/8.
• Coding Language : ASP.NET, C#.NET
• Tool : Visual Studio 2013
• Database : SQL Server 2014
THANKING YOU
