User-friendly Patent Search Paradigm

  • In this use case diagram, the user who wants to search for patents logs in to the patent search page, and the patent search takes place after validation. The actors are the user and the patent database. The user first logs in to the patent search page and then types the search keyword. The patent database is then queried to find the relevant patent partitions.
  • This class diagram shows the classes of our project. The login class verifies the username and password. The patent search class gets the keyword from the user. The patent partition class finds the partitions stored in the patent database. The query process class aggregates the answers from the database, and the results, ordered by the ranking method, are shown to the user.
     
  • The object diagram shows the objects used in our project. The login object handles login to the patent search page. The topic-based search object retrieves patents on related topics. The patent partition object gets the partitioned patents from the patent database. The query processing object processes the answers from the database and ranks the results.
  • The state diagram shows every state in the execution of our project. First, the user logs in to the patent search page. Then the user types a keyword for the patent being searched for, and the query is matched against the patent partitions already stored in the patent database. Finally, the processed query shows the top answers to the user.
  • The activity diagram shows the workflow of our patent search project. First, the user logs in to our page. Then the user types the keyword for the patent being searched for. The patent partitions are used to retrieve the matching patents from the database. Query processing processes the answers from the database and ranks the results.
  • The sequence diagram shows each step of our project in order. First, the user logs in and types the patent being searched for into our search page. The keyword is corrected if it contains any error. The corresponding patents are searched for in the patent partitions stored in the patent database. The matching answers are then processed, and the top answers are shown to the user.
  • The collaboration diagram shows the communication between the objects used in our project. The user communicates with the login page to enter the patent search page. After logging in, the user types the patent keyword. The keyword is then checked against the patent partitions, which are obtained from the patent database. The relevant patents are processed by the query processing method, and finally the top patent answers are shown to the user.
  • The component diagram shows the components used in our project. The user component represents the user, and the login component handles user login. The patent search page component searches for patents by keyword. The patent partition component represents the partitions stored in the database. The query process component produces the corresponding answer for the user.
  • In this architecture, the user enters the patent search page through the login; a new user first registers with the page. The user enters the patent search keyword, which is corrected if it contains any error, and the keyword is expanded with terms related to the search term. Next, the patent is searched for in the patent partitions stored in the patent database. The results are grouped, and the top answers for the searched patent are displayed to the user.
  • Level 0 of the data flow diagram shows the initial process of execution. First, the user logs in with an existing login ID; otherwise, the user has to register with our page. After logging in, the user sees the patent search interface for finding patents using a keyword related to the patent, and then types the search keyword into the search box.
     
  • The user types the patent keyword to search for, and the keyword is corrected if it contains any error. The keyword is then searched using topic-based search, which finds the topic-related partition in the database. Using the query expansion technique, the typed word is expanded with related keywords.
     
  • The patent keyword is searched for in the corresponding partitions of the patent database, using the indexes stored in the patent database. The query keyword is then processed against the database, and the results found are ranked using the ranking method to give the top answers to the user.
  • This ER diagram shows the database used in the project. For user login, we store login details; when searching for patents, the corresponding patent partition details are retrieved using the patent index. Everything related to these is stored in the patent database, whose tables handle all these entities.

Transcript

  • 1. A User-friendly Patent Search Paradigm
  • 2. INTRODUCTION Patents play a very important role in intellectual property protection. Because patent search helps patent examiners find previously published relevant patents and validate or invalidate new patent applications, it has become increasingly popular and has recently attracted much attention from both industrial and academic communities. For example, there are many online systems that support patent search, such as Google patent search, Derwent Innovations Index (DII), and USPTO. As most patent-search users have limited knowledge about the underlying patents, they have to use a try-and-see approach, repeatedly issuing queries and checking answers, which is a very tedious process.
  • 3. ABSTRACT As most patent-search users have limited knowledge about the underlying patents, they have to use a try-and-see approach, repeatedly issuing queries and checking answers, which is a very tedious process. To overcome this, our proposed system introduces an efficient patent search paradigm that helps users find relevant patents more easily and improves the user search experience. To overcome the typing-error problem of existing systems, our project introduces an error correction technique. Overall, it proposes three effective techniques, error correction, topic-based query suggestion, and query expansion, to improve the usability of patent search. To improve efficiency, the patents are divided into small partitions based on their topics and classes. Given a query, the system finds the highly relevant partitions and answers the query in each of them. Finally, it combines the answers from each partition to generate the top answers to the patent-search query.
  • 4. SCOPE OF THE PROJECT: In this project, we improve search efficiency, provide more suggestions to help users check patents, and correct errors in the search keywords using query correction methods.
  • 5. LITERATURE SURVEY: Title: Improving Retrievability of Patents in Prior-Art Search Authors: S. Bashir and A. Rauber Year: 2010 Description Prior-art search is an important task in patent retrieval. The success of this task relies upon the selection of relevant search queries. Typically terms for prior-art queries are extracted from the claim fields of query patents. However, due to the complex technical structure of patents, and presence of terms mismatch and vague terms, selecting relevant terms for queries is a difficult task. When evaluating the retrievability coverage of prior-art queries generated from query patents, a large bias toward a subset of the collection is observed. A large number of patents either have a very low retrievability score or cannot be discovered via any query. To increase the retrievability of patents, in this paper we expand prior-art queries generated from query patents using query expansion with pseudo relevance feedback. Missing terms from query patents are discovered from feedback patents, and better patents for relevance feedback are identified using a novel approach for checking their similarity with query patents. We specifically focus on how to automatically select better terms from query patents based on their proximity distribution with prior-art queries that are used as features for computing similarity. Our results show that the coverage of prior-art queries can be increased significantly by incorporating relevant query terms using query expansion.
  • 6. Title: Latent Dirichlet Allocation Authors: D. M. Blei, A. Y. Ng, and M. I. Jordan Year: 2003 Description We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
  • 7. Title: Suggesting Topic-Based Query Terms as You Type Authors: J. Fan, H. Wu, G. Li, and L. Zhou Year: 2010 Description Query term suggestion that interactively expands the queries is an indispensable technique to help users formulate high-quality queries and has attracted much attention in the community of web search. Existing methods usually suggest terms based on statistics in documents as well as query logs and external dictionaries, and they neglect the fact that the topic information is very crucial because it helps retrieve topically relevant documents. To give users gratification, we propose a novel term suggestion method: as the user types in queries letter by letter, we suggest the terms that are topically coherent with the query and could retrieve relevant documents instantly. For effectively suggesting highly relevant terms, we propose a generative model by incorporating the topical coherence of terms. The model learns the topics from the underlying documents based on Latent Dirichlet Allocation (LDA). For achieving the goal of instant query suggestion, we use a trie structure to index and access terms. We devise an efficient top-k algorithm to suggest terms as users type in queries. Experimental results show that our approach not only improves the effectiveness of term suggestion, but also achieves better efficiency and scalability.
  • 8. Title: Ranking structured documents: a large margin based approach for patent prior art search Authors: Y. Guo and C. P. Gomes Year: 2009 Description We propose an approach for automatically ranking structured documents applied to patent prior art search. Our model, SVM Patent Ranking (SVMPR) incorporates margin constraints that directly capture the specificities of patent citation ranking. Our approach combines patent domain knowledge features with meta-score features from several different general Information Retrieval methods. The training algorithm is an extension of the Pegasos algorithm with performance guarantees, effectively handling hundreds of thousands of patent-pair judgments in a high dimensional feature space. Experiments on a homogeneous essential wireless patent dataset show that SVMPR performs on average 30%-40% better than many other state-of-the-art general-purpose Information Retrieval methods in terms of the NDCG measure at different cut-off positions.
  • 9. Title: Efficient interactive fuzzy keyword search Authors: S. Ji, G. Li, C. Li, and J. Feng Year: 2009 Description Traditional information systems return answers after a user submits a complete query. Users often feel "left in the dark" when they have limited knowledge about the underlying data, and have to use a try-and-see approach for finding information. A recent trend of supporting autocomplete in these systems is a first step towards solving this problem. In this paper, we study a new information-access paradigm, called "interactive, fuzzy search," in which the system searches the underlying data "on the fly" as the user types in query keywords. It extends autocomplete interfaces by (1) allowing keywords to appear in multiple attributes (in an arbitrary order) of the underlying data; and (2) finding relevant records that have keywords matching query keywords approximately. This framework allows users to explore data as they type, even in the presence of minor errors. We study research challenges in this framework for large amounts of data. Since each keystroke of the user could invoke a query on the backend, we need efficient algorithms to process each query within milliseconds. We develop various incremental-search algorithms using previously computed and cached results in order to achieve an interactive speed. We have deployed several real prototypes using these techniques. One of them has been deployed to support interactive search on the UC Irvine people directory, which has been used regularly and well received by users due to its friendly interface and high efficiency.
  • 10. Title: Efficient Merging and Filtering Algorithms for Approximate String Searches Authors: C. Li, J. Lu, and Y. Lu Year: 2008 Description We study the following problem: how to efficiently find in a collection of strings those similar to a given query string? Various similarity functions can be used, such as edit distance, Jaccard similarity, and cosine similarity. This problem is of great interest to a variety of applications that need a high real-time performance, such as data cleaning, query relaxation, and spellchecking. Several algorithms have been proposed based on the idea of merging inverted lists of grams generated from the strings. In this paper we make two contributions. First, we develop several algorithms that can greatly improve the performance of existing algorithms. Second, we study how to integrate existing filtering techniques with these algorithms, and show that they should be used together judiciously, since the way to do the integration can greatly affect the performance. We have conducted experiments on several real data sets to evaluate the proposed techniques.
  • 11. Title: Supporting Search-As-You-Type Using SQL in Databases Authors: G. Li, J. Feng, and C. Li Year: 2011 Description A search-as-you-type system computes answers on-the-fly as a user types in a keyword query letter by letter. We study how to support search-as-you-type on data residing in a relational DBMS. We focus on how to support this type of search using the native database language, SQL. A main challenge is how to leverage existing database functionalities to meet the high-performance requirement to achieve an interactive speed. We study how to use auxiliary indexes stored as tables to increase search performance. We present solutions for both single-keyword queries and multi-keyword queries, and develop novel techniques for fuzzy search using SQL by allowing mismatches between query keywords and answers. We present techniques to answer first-N queries and discuss how to support updates efficiently. Experiments on large, real data sets show that our techniques enable DBMS systems on a commodity computer to support search-as-you-type on tables with millions of records.
  • 12. Title: Efficient fuzzy full-text type-ahead search Authors: G. Li, S. Ji, C. Li, and J. Feng Year: 2011 Description Traditional information systems return answers after a user submits a complete query. Users often feel "left in the dark" when they have limited knowledge about the underlying data and have to use a try-and-see approach for finding information. A recent trend of supporting autocomplete in these systems is a first step toward solving this problem. In this paper, we study a new information-access paradigm, called "type-ahead search" in which the system searches the underlying data "on the fly" as the user types in query keywords. It extends autocomplete interfaces by allowing keywords to appear at different places in the underlying data. This framework allows users to explore data as they type, even in the presence of minor errors. We study research challenges in this framework for large amounts of data. Since each keystroke of the user could invoke a query on the backend, we need efficient algorithms to process each query within milliseconds. We develop various incremental-search algorithms for both single-keyword queries and multi-keyword queries, using previously computed and cached results in order to achieve a high interactive speed. We develop novel techniques to support fuzzy search by allowing mismatches between query keywords and answers. We have deployed several real prototypes using these techniques. One of them has been deployed to support type-ahead search on the UC Irvine people directory, which has been used regularly and well received by users due to its friendly interface and high efficiency.
  • 13. Title: EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data Authors: G. Li, B. C. Ooi, J. Feng, J. Wang, and L. Zhou Year: 2008 Description Conventional keyword search engines are restricted to a given data model and cannot easily adapt to unstructured, semi-structured or structured data. In this paper, we propose an efficient and adaptive keyword search method, called EASE, for indexing and querying large collections of heterogeneous data. To achieve high efficiency in processing keyword queries, we first model unstructured, semi-structured and structured data as graphs, and then summarize the graphs and construct graph indices instead of using traditional inverted indices. We propose an extended inverted index to facilitate keyword-based search, and present a novel ranking mechanism for enhancing search effectiveness. We have conducted an extensive experimental study using real datasets, and the results show that EASE achieves both high search efficiency and high accuracy, and outperforms the existing approaches significantly.
  • 14. Title: Simple vs. sophisticated approaches for patent prior-art search Authors: W. Magdy, P. Lopez, and G. J. F. Jones Year: 2011 Description Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state-of-the-art in patent prior-art search. The first approach uses simple and straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques which try to model the steps taken by a patent examiner in patent search. Experiments show that the retrieval effectiveness using both techniques is statistically indistinguishable when patent applications contain some initial citations. However, the advanced search technique is statistically better when no initial citations are provided. Our findings suggest that less time and effort can be exerted by applying simple IR approaches when initial citations are provided.
  • 15. Modules:
    1. Login page
    2. Client search through query
       2.1 Automatic error correction
       2.2 Topic based query suggestion
       2.3 Query expansion
    3. Ranking
    4. Patent partition selection
    5. Query processing
  • 16. Module Description 1. Login page: Before a client session is created, the login page checks the user's credentials: we receive the username and password from the user and check in the database whether the user has the credentials to send requests to the server. New users can also be added through user registration, which collects the user's name, gender, username, password, address, email ID, and phone number. 2. Client Search through query: In this module, we first design the page for getting the user's query; the logic is written in a Java file, and a JSP page passes the user's query request to the semantic storage.
  • 17. 2.1 Automatic Error correction: We use a trie structure for efficient keyword correction and completion. We consider the prefix of the query word: if the prefix is not similar enough to a trie node, the keywords under that node need not be considered. 2.2 Topic based query suggestion: The topic-based model estimates the probability of the next query keyword. A keyword in the patents that is more topically coherent with the previously typed query keywords gets a higher score. 2.3 Query expansion: For query expansion, we use a search engine to suggest relevant keywords, and we also use relevant keywords mined from the query log.
  • 18. 3. Ranking: In this module, we rank the answers obtained for the search query by their probability of being the most relevant patent. 4. Patent Partition selection: In this module, we select the partitions relevant to the patent search using two measures, topic relevancy and keyword relevancy, and use them together to find the top relevant partitions. 5. Query Processing: This module finds the top answers for the search by combining the rankings from the selected partitions.
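The trie-based pruning described in 2.1 can be sketched in Java, the language of the project's back end. This is a minimal illustration under stated assumptions, not the project's code: the class FuzzyTrie, its methods, and the threshold tau (a small edit-distance bound) are hypothetical. Each trie node is reached with one row of the edit-distance table between the typed prefix and the path to that node. When the last entry of the row is within tau, every keyword below the node is returned as a completion; when every entry exceeds tau, the whole subtree is skipped, because deeper rows can never have smaller entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of error-tolerant keyword completion over a trie.
class FuzzyTrie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>(); // sorted for deterministic output
        String word; // the keyword ending at this node, or null
    }

    private final Node root = new Node();

    void insert(String keyword) {
        Node node = root;
        for (char c : keyword.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.word = keyword;
    }

    // Returns keywords that have a prefix within edit distance tau of the typed text.
    List<String> complete(String typed, int tau) {
        List<String> results = new ArrayList<>();
        int[] row = new int[typed.length() + 1];
        for (int i = 0; i < row.length; i++) row[i] = i; // distance from the empty path
        search(root, row, typed, tau, results);
        return results;
    }

    // row[i] = edit distance between typed[0..i) and the path from the root to node.
    private void search(Node node, int[] row, String typed, int tau, List<String> out) {
        int n = typed.length();
        if (row[n] <= tau) { // this path is similar to the whole typed prefix
            collect(node, out);
            return;
        }
        int min = Integer.MAX_VALUE;
        for (int v : row) min = Math.min(min, v);
        if (min > tau) return; // row minima never decrease with depth: prune the subtree
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            char c = e.getKey();
            int[] next = new int[n + 1];
            next[0] = row[0] + 1;
            for (int i = 1; i <= n; i++) {
                int cost = typed.charAt(i - 1) == c ? 0 : 1;
                next[i] = Math.min(Math.min(next[i - 1] + 1, row[i] + 1), row[i - 1] + cost);
            }
            search(e.getValue(), next, typed, tau, out);
        }
    }

    // Adds every keyword stored in the subtree rooted at node.
    private void collect(Node node, List<String> out) {
        if (node.word != null) out.add(node.word);
        for (Node child : node.children.values()) collect(child, out);
    }
}
```

For example, with the keywords patent, pattern, paint and process inserted, complete("pst", 1) returns [patent, pattern], because the trie path "pat" is one substitution away from the typed prefix, while the paths through "pai" and "pr" are pruned.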
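Modules 3 to 5 can be sketched together as follows. Everything here is an assumption for illustration: the Partition class, the per-keyword patent weights in its index, the query's topic weights, and the mixing weight alpha are hypothetical stand-ins for whatever the project stores in the patent database. Each partition is scored by a weighted sum of topic relevancy (overlap between the query topics and the partition topics) and keyword relevancy (the fraction of query keywords the partition contains). Only the top m partitions are searched, and a min-heap of size k merges the per-patent scores into the final top answers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class PartitionSearch {
    // One topic/class-based partition of the patent collection (hypothetical structure).
    static final class Partition {
        final Map<String, Double> topicWeights;       // topic -> weight of the topic in this partition
        final Map<String, Map<String, Double>> index; // keyword -> (patent id -> term weight)

        Partition(Map<String, Double> topicWeights, Map<String, Map<String, Double>> index) {
            this.topicWeights = topicWeights;
            this.index = index;
        }
    }

    // Combined relevance: alpha * topic relevancy + (1 - alpha) * keyword relevancy.
    static double partitionScore(Partition p, List<String> keywords,
                                 Map<String, Double> queryTopics, double alpha) {
        double topicRel = 0;
        for (Map.Entry<String, Double> e : queryTopics.entrySet())
            topicRel += e.getValue() * p.topicWeights.getOrDefault(e.getKey(), 0.0);
        long matched = keywords.stream().filter(p.index::containsKey).count();
        double keywordRel = keywords.isEmpty() ? 0 : (double) matched / keywords.size();
        return alpha * topicRel + (1 - alpha) * keywordRel;
    }

    static List<String> topAnswers(List<Partition> partitions, List<String> keywords,
                                   Map<String, Double> queryTopics, double alpha, int m, int k) {
        // 1. Partition selection: keep the m partitions with the highest combined relevance.
        List<Partition> selected = new ArrayList<>(partitions);
        selected.sort(Comparator.comparingDouble(
                (Partition p) -> partitionScore(p, keywords, queryTopics, alpha)).reversed());
        selected = selected.subList(0, Math.min(m, selected.size()));

        // 2. Query processing inside each selected partition: sum keyword weights per patent.
        Map<String, Double> scores = new HashMap<>();
        for (Partition p : selected)
            for (String kw : keywords)
                p.index.getOrDefault(kw, Map.of())
                       .forEach((patent, w) -> scores.merge(patent, w, Double::sum));

        // 3. Ranking and merging: a min-heap bounded at k keeps the global top-k answers.
        PriorityQueue<Map.Entry<String, Double>> heap =
                new PriorityQueue<>(Map.Entry.comparingByValue());
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) heap.poll(); // drop the current lowest score
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(0, heap.poll().getKey()); // highest score first
        return result;
    }
}
```

The bounded heap keeps the merge step at O(n log k) for n candidate patents, and only the postings of the selected partitions are ever read.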
  • 19. Module Diagram: 1. Login page (diagram elements: User, Login Page, Database, Patent search page)
  • 20. 2. Client Search through query
  • 21. 2.1 Automatic Error correction (diagram elements: User typing query, Error-corrected query)
  • 22. 2.2 Topic based query suggestion
  • 23. 2.3 Query expansion
  • 24. 3. Ranking
  • 25. 4. Patent Partition selection
  • 26. 5. Query Processing
  • 27. GIVEN INPUT AND EXPECTED OUTPUT
    1. Login page. Input: user name and password. Output: the application is transferred to the patent search engine.
    2. Client search through query. Input: the patent keyword to search for. Output: the query is shown in the search box.
    2.1 Automatic error correction. Input: the patent keyword to search for. Output: the error-corrected patent keyword.
    2.2 Topic based query suggestion. Input: the patent keyword to search for. Output: topic-related suggestions.
    2.3 Query expansion. Input: the patent keyword to search for. Output: the query keyword in a relevant expanded form.
  • 28. 3. Ranking. Input: the patent keyword to search for. Output: patents selected using the ranking.
    4. Patent partition selection. Input: the patent keyword to search for. Output: partitions searched by topic and by keyword.
    5. Query processing. Input: the patent keyword to search for. Output: aggregated and ranked top answers.
  • 29. SYSTEM REQUIREMENTS
    HARDWARE
    Processor: Pentium IV 2.6 GHz, Intel Core 2 Duo
    RAM: 512 MB DDR RAM
    Monitor: 15” color
    Hard disk: 40 GB
    CD drive: LG 52X
    SOFTWARE
    Front end: JSP
    Back end: MS SQL Server 2000/2005
    Operating system: Windows XP/7
    IDE: NetBeans, Eclipse
  • 30. TECHNIQUE USED 1. Automatic Error Correction 2. Topic-based Query Suggestion 3. Query Expansion
  • 31. Automatic Error Correction Query keywords typed by users may contain typos, and traditional methods then return no answer because they cannot find answers containing the query keywords. This is clearly not user-friendly. It is better to correct the typos, recommend similar keywords, and return the answers for those similar keywords. To quantify the similarity between keywords, existing methods usually adopt edit distance: the minimum number of single-character edit operations (insertion, deletion, and substitution) needed to transform the first keyword into the second. For example, the edit distance between “patent” and “paitant” is 2. Two keywords are said to be similar if their edit distance is within a given threshold. Several recent studies on efficient error correction use a filter-and-refine framework to find keywords similar to a query keyword: a filter step first finds a subset of keywords that may be similar to the query keyword, and a verification step then removes the false positives to obtain the final similar keywords. Although these methods can efficiently suggest similar keywords for complete keywords, they cannot support a prefix keyword that the user is still typing. To address this problem, we can use a trie structure for efficient keyword correction and completion, so that even when users type in only a partial keyword, we can still efficiently suggest relevant, accurate keywords. The basic idea is that if a prefix is not similar enough to a trie node, we do not need to consider the keywords under that node; this observation lets us suggest similar keywords efficiently.
  • 32. Topic based Query Suggestion We devise a novel model for effectively suggesting keywords as users type in queries letter by letter. The basic idea is to use a topic model to estimate the probability of the next query keyword: intuitively, a keyword in the patents that is more topically coherent with the previously typed query keywords should obtain a higher score. Specifically, we estimate two probabilities: the probability of a keyword conditioned on topics, and the probability of sampling a keyword from a patent. Both are used to compute the score of each keyword. An LDA model can be used to learn the keyword distribution over each topic from the underlying patents. LDA is a soft-clustering technique: it allows a keyword to appear in multiple topics and takes into account the degree to which a keyword belongs to each topic. The keyword distribution over a set of patents is learned using a language model, which captures the properties of the patents and predicts the likelihood of sampling a specific keyword. By combining the two probabilities, the topic-based method suggests relevant keywords.
  • 33. Query Expansion In many cases, users do not understand the underlying data precisely, so they may type in ambiguous or inaccurate keywords. In addition, the same concept may have different representations. To address this, we can use WordNet to expand a keyword: if the query keyword is indexed by WordNet, we can easily get its relevant keywords using an inverted list structure. However, WordNet is manually constructed and covers only common words, so if the query keywords are not in WordNet, we cannot recommend relevant keywords this way. We have two solutions to this problem. The first is to use search engines, since most search engines suggest relevant keywords as users type in queries: we can issue the patent query to a search engine such as Google and get the relevant keywords it suggests. The second is to mine relevant keywords from query logs, using click-through data to find correlated queries. If users click the same returned result (patent) for two queries, the two queries are potentially relevant, and we use the number of times users clicked on the same patent to measure their relevance. If the co-occurrence count of a keyword pair is larger than a given threshold, the two keywords are considered relevant and are used for query expansion.
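The edit-distance definition and the filter-and-refine idea above can be illustrated with a short Java sketch. The class name EditDistance is hypothetical, and the length filter shown (two keywords whose lengths differ by more than tau cannot be within edit distance tau) is just one simple filter chosen for illustration; it is not the filter used in the cited studies.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class EditDistance {
    // Levenshtein distance computed with the standard two-row dynamic program.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // "" -> b[0..j) needs j insertions
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // a[0..i) -> "" needs i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Filter-and-refine: a cheap length filter discards hopeless candidates,
    // then the exact edit distance verifies the remaining ones.
    static List<String> similarKeywords(String query, Collection<String> dictionary, int tau) {
        List<String> result = new ArrayList<>();
        for (String keyword : dictionary) {
            if (Math.abs(keyword.length() - query.length()) > tau) continue; // filter step
            if (distance(query, keyword) <= tau) result.add(keyword);       // verification step
        }
        return result;
    }
}
```

distance("patent", "paitant") returns 2, matching the example in the text, so with a threshold of 2 the two keywords count as similar.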
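One way to combine the two probabilities is sketched below, in the form score(w) = lambda · Σ_z P(w | z) P(z | q) + (1 − lambda) · P(w | patents). The sketch assumes that an LDA model has already produced P(w | z) for each keyword and topic, that a unigram language model supplies P(w | patents), and that the two are mixed linearly with a weight lambda; the linear interpolation, the uniform topic prior, and the class name TopicSuggester are illustrative assumptions, not the method's exact formulation. The query's topic distribution P(z | q) is estimated from the previously typed keywords, and only keywords starting with the prefix currently being typed are ranked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class TopicSuggester {
    private final Map<String, double[]> wordGivenTopic; // keyword -> P(w | z) per topic (assumed LDA output)
    private final Map<String, Double> languageModel;    // keyword -> P(w | patents) (assumed unigram model)
    private final int numTopics;
    private final double lambda;                        // assumed mixing weight between the two scores

    TopicSuggester(Map<String, double[]> wordGivenTopic, Map<String, Double> languageModel,
                   int numTopics, double lambda) {
        this.wordGivenTopic = wordGivenTopic;
        this.languageModel = languageModel;
        this.numTopics = numTopics;
        this.lambda = lambda;
    }

    // P(z | previous keywords) with a uniform topic prior, treating keywords as independent.
    double[] queryTopics(List<String> previous) {
        double[] p = new double[numTopics];
        Arrays.fill(p, 1.0);
        for (String q : previous) {
            double[] pw = wordGivenTopic.get(q);
            if (pw == null) continue; // unknown keywords carry no topic evidence
            for (int z = 0; z < numTopics; z++) p[z] *= pw[z];
        }
        double sum = Arrays.stream(p).sum();
        for (int z = 0; z < numTopics; z++) p[z] = sum > 0 ? p[z] / sum : 1.0 / numTopics;
        return p;
    }

    // Topical coherence with the query mixed with the keyword's overall likelihood.
    double score(String w, double[] topics) {
        double[] pw = wordGivenTopic.getOrDefault(w, new double[numTopics]);
        double topical = 0;
        for (int z = 0; z < numTopics; z++) topical += pw[z] * topics[z];
        return lambda * topical + (1 - lambda) * languageModel.getOrDefault(w, 0.0);
    }

    // Top-n completions of the prefix being typed, best first (ties broken alphabetically).
    List<String> suggest(List<String> previous, String prefix, int n) {
        double[] topics = queryTopics(previous);
        List<String> candidates = new ArrayList<>();
        for (String w : languageModel.keySet()) if (w.startsWith(prefix)) candidates.add(w);
        candidates.sort(Comparator.comparingDouble((String w) -> score(w, topics)).reversed()
                .thenComparing(Comparator.naturalOrder()));
        return candidates.subList(0, Math.min(n, candidates.size()));
    }
}
```

With a wireless-oriented previous keyword such as antenna, a topically coherent keyword like polarization can outrank keywords that are more frequent overall; with no previous keywords, the topic distribution is uniform and the ranking leans on overall frequency. For long queries, the product in queryTopics should be computed in log space to avoid underflow.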
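The click-through mining step can be sketched as follows. The sketch assumes the query log is available as (query, clicked patent) pairs and measures the relevance of two queries as the number of distinct patents clicked from both; the class name ClickThroughExpander, the log format, and this exact counting rule are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ClickThroughExpander {
    // query -> (other query -> number of distinct patents clicked from both)
    private final Map<String, Map<String, Integer>> coClicks = new HashMap<>();

    // Each row of clickLog is {query, clickedPatentId}; this format is an assumption.
    ClickThroughExpander(String[][] clickLog) {
        Map<String, Set<String>> queriesByPatent = new HashMap<>();
        for (String[] click : clickLog) {
            queriesByPatent.computeIfAbsent(click[1], k -> new HashSet<>()).add(click[0]);
        }
        // Two queries whose users clicked the same patent are potentially relevant.
        for (Set<String> queries : queriesByPatent.values()) {
            for (String a : queries) {
                for (String b : queries) {
                    if (!a.equals(b)) {
                        coClicks.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                    }
                }
            }
        }
    }

    int relevance(String a, String b) {
        return coClicks.getOrDefault(a, Map.of()).getOrDefault(b, 0);
    }

    // Queries whose co-click count with the given query is larger than the threshold.
    List<String> expand(String query, int threshold) {
        List<String> related = new ArrayList<>();
        coClicks.getOrDefault(query, Map.of()).forEach((other, count) -> {
            if (count > threshold) related.add(other);
        });
        Collections.sort(related);
        return related;
    }
}
```

For instance, if the queries wireless antenna and radio aerial both led to clicks on the same two patents, their relevance is 2, and with a threshold of 1 each is returned as an expansion of the other.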
  • 34. SYSTEM DESIGN: USE CASE DIAGRAM (diagram elements: Login, User, Patent search, Ok, Patent Partitions, QueryProcess, Patent DB, Top answer)
  • 35. CLASS DIAGRAM
  • 36. OBJECT DIAGRAM
  • 37. STATE DIAGRAM (states: User Login, Enters Keyword, Error correction, Topic search, Ok/Verified, Expansion, Partition selection, Query processing, Top answers)
  • 38. ACTIVITY DIAGRAM
  • 39. SEQUENCE DIAGRAM
  • 40. COLLABORATION DIAGRAM
  • 41. SYSTEM ARCHITECTURE
  • 42. DATA FLOW DIAGRAM LEVEL 1
  • 43. DATA FLOW DIAGRAM LEVEL 2
  • 44. E-R Diagram
  • 45. FUTURE ENHANCEMENT In the future, our proposed patent search paradigm will be implemented by connecting to a large number of databases. This will increase the efficiency and searchability of patents while keeping the approach user-friendly. Advantages: 1. Keyword error correction 2. Partition-based patent search 3. High search efficiency 4. Query suggestion and expansion. Applications: 1. Google patent search 2. Derwent Innovations Index (DII) 3. USPTO
  • 46. CONCLUSION In this paper, we proposed a new patent-search paradigm. We developed three effective techniques, error correction, topic-based query suggestion, and query expansion, to make patent search more user-friendly and improve the user search experience. Error correction provides users with accurate keywords and corrects typing errors. Topic-based query suggestion suggests topically coherent keywords as users type in query keywords. Query expansion suggests synonyms and other keywords that belong to the same concept as the query keywords. We proposed a partition-based method to improve search performance. Experimental results show that our method achieves high efficiency and quality.
  • 47. REFERENCES
    [1] L. Azzopardi, W. Vanderbauwhede, and H. Joho. Search system requirements of patent analysts. In SIGIR, pages 775–776, 2010.
    [2] S. Bashir and A. Rauber. Improving retrievability of patents in prior art search. In ECIR, pages 457–470, 2010.
    [3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
    [4] J. Fan, H. Wu, G. Li, and L. Zhou. Suggesting topic-based query terms as you type. In APWeb, pages 61–67, 2010.
    [5] Y. Guo and C. P. Gomes. Ranking structured documents: A large margin based approach for patent prior art search. In IJCAI, pages 1058–1064, 2009.
    [6] S. Ji, G. Li, C. Li, and J. Feng. Efficient interactive fuzzy keyword search. In WWW, pages 371–380, 2009.
  • 48. [7] L. S. Larkey. A patent search and classification system. In ACM DL, pages 179–187, 1999.
    [8] C. Li, J. Lu, and Y. Lu. Efficient merging and filtering algorithms for approximate string searches. In ICDE, pages 257–266, 2008.
    [9] G. Li, J. Feng, and C. Li. Supporting search-as-you-type using SQL in databases. IEEE TKDE, 2011.
    [10] G. Li, S. Ji, C. Li, and J. Feng. Efficient fuzzy full-text type-ahead search. VLDB J., 20(4):617–640, 2011.
    [11] G. Li, B. C. Ooi, J. Feng, J. Wang, and L. Zhou. EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In SIGMOD Conference, pages 903–914, 2008.
    [12] W. Magdy, P. Lopez, and G. J. F. Jones. Simple vs. sophisticated approaches for patent prior-art search. In ECIR, pages 725–728, 2011.