Mining sequential patterns matching over high utility data sets
 

    Document Transcript

    Base Paper Title: Record Matching over Query Results From Multiple Web Databases
    Modified Title: Mining sequential patterns matching over high utility data sets.
    Ambit lick Solutions Mail Id : Ambitlick@gmail.com , Ambitlicksolutions@gmail.Com

    Abstract: Record matching, which identifies the records that represent the same real-world entity, is an important step for data integration. Most state-of-the-art record matching methods are supervised, which requires the user to provide training data. These methods are not applicable for the Web database scenario, where the records to match are query results dynamically generated on the fly. Such records are query-dependent, and a prelearned method using training examples from previous query results may fail on the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised, online record matching method, UDD, which, for a given query, can effectively identify duplicates from the query result records of multiple Web databases. After removal of the same-source duplicates, the “presumed” nonduplicate records from the same source can be used as training examples, alleviating the burden of users having to manually label training examples. Starting from the nonduplicate set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Experimental results show that UDD works well for the Web database scenario where existing supervised methods do not apply.

    Existing System:
    • Relational database systems
    • All Web databases (unknown users can easily destroy the database)

    Proposed System:
    • False-data detection can reveal the actions taken when unauthorized users attempt to access computer systems or authorized users attempt to misuse their privileges.
    • Association rule mining
    • An algorithm based on sequential pattern mining using the same data collected by the databases.
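
To make the sequential pattern mining item more concrete, here is a minimal Java sketch of a pattern-growth (PrefixSpan-style) miner. It is an illustration under simplifying assumptions, not the project's implementation: each sequence is a list of single items rather than item sets, support is an absolute count, and the names `PrefixSpanSketch`, `mine`, and `grow` are hypothetical. It uses Java 8 collection methods, which is newer than the JDK the document lists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not taken from the source document.
class PrefixSpanSketch {

    /** Returns each sequential pattern contained in at least minSup sequences, with its support count. */
    static Map<List<String>, Integer> mine(List<List<String>> db, int minSup) {
        Map<List<String>, Integer> out = new LinkedHashMap<>();
        grow(new ArrayList<>(), db, minSup, out);
        return out;
    }

    private static void grow(List<String> prefix, List<List<String>> projected,
                             int minSup, Map<List<String>, Integer> out) {
        // Count every item at most once per projected sequence.
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> seq : projected) {
            for (String item : new TreeSet<>(seq)) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < minSup) {
                continue; // infrequent extension: prune this branch
            }
            List<String> pattern = new ArrayList<>(prefix);
            pattern.add(e.getKey());
            out.put(pattern, e.getValue());

            // Project the database: keep what follows the first occurrence of the item.
            List<List<String>> next = new ArrayList<>();
            for (List<String> seq : projected) {
                int i = seq.indexOf(e.getKey());
                if (i >= 0 && i + 1 < seq.size()) {
                    next.add(seq.subList(i + 1, seq.size()));
                }
            }
            grow(pattern, next, minSup, out);
        }
    }
}
```

Calling `PrefixSpanSketch.mine(db, 2)` on the four sequences (a, b, c), (a, c), (b, c), (a, b) yields the single items plus the two-item patterns (a, b), (a, c), and (b, c); the pattern (a, b, c) occurs in only one sequence and is pruned.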

    Our Proposed Work apart from Base Paper:
    Sequential pattern mining
      a. Apriori-like methods (GSP)
      b. Pattern-growth methods (FreeSpan, PrefixSpan)

    Hardware Specification
    Processor Type : Pentium III
    Speed : 1.6 GHz
    RAM : 128 MB
    Hard Disk : 8 GB

    Software Specification
    Operating System : Linux / Windows
    Programming Package : Java
    Tools : Eclipse, Weka data mining tools
    Database : MySQL
    SDK : JDK 1.5.0

    Algorithm:
    • Association rule mining
      o Find large item sets for a given minsup, and
      o Compute rules for a given minconf based on the item sets obtained before.
    • Sequential pattern mining
    • UDD algorithm
    • Component weight assignment algorithm

    Modules:
    1. Analysis and design of data sets / items
    2. Data preprocessing
    3. Sequential pattern mining
    4. Record matching with Web database
    5. Performance analysis
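
The two association rule mining steps listed under Algorithm (find large item sets for a given minsup, then compute rules for a given minconf) can be sketched in Java as follows. This is a hedged illustration, not the project's code: it uses a simple level-wise (Apriori-style) search, treats minsup and minconf as fractions between 0 and 1, emits only rules with a single-item consequent, and the names `AssociationRuleSketch`, `support`, `frequentItemSets`, and `rules` are hypothetical.

```java
import java.util.*;

// Hypothetical illustration class; not taken from the source document.
class AssociationRuleSketch {

    /** Fraction of transactions that contain every item in the given set. */
    static double support(List<Set<String>> txns, Set<String> items) {
        int hits = 0;
        for (Set<String> t : txns) {
            if (t.containsAll(items)) hits++;
        }
        return (double) hits / txns.size();
    }

    /** Step 1: level-wise search for item sets whose support is at least minSup. */
    static Map<Set<String>, Double> frequentItemSets(List<Set<String>> txns, double minSup) {
        Map<Set<String>, Double> frequent = new LinkedHashMap<>();
        Set<Set<String>> level = new LinkedHashSet<>();
        for (Set<String> t : txns) {
            for (String item : t) level.add(new TreeSet<>(Collections.singleton(item)));
        }
        while (!level.isEmpty()) {
            List<Set<String>> kept = new ArrayList<>();
            for (Set<String> candidate : level) {
                double s = support(txns, candidate);
                if (s >= minSup) {
                    frequent.put(candidate, s);
                    kept.add(candidate);
                }
            }
            // Join pairs of frequent k-item sets into (k+1)-item candidates.
            level = new LinkedHashSet<>();
            for (int a = 0; a < kept.size(); a++) {
                for (int b = a + 1; b < kept.size(); b++) {
                    Set<String> union = new TreeSet<>(kept.get(a));
                    union.addAll(kept.get(b));
                    if (union.size() == kept.get(a).size() + 1) level.add(union);
                }
            }
        }
        return frequent;
    }

    /** Step 2: rules X -> y with confidence support(X and y) / support(X) of at least minConf. */
    static List<String> rules(Map<Set<String>, Double> frequent, double minConf) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Set<String>, Double> e : frequent.entrySet()) {
            if (e.getKey().size() < 2) continue;
            for (String y : e.getKey()) {
                Set<String> x = new TreeSet<>(e.getKey());
                x.remove(y);
                // Every subset of a frequent set is frequent, so x is present in the map.
                double conf = e.getValue() / frequent.get(x);
                if (conf >= minConf) out.add(x + " -> " + y);
            }
        }
        return out;
    }
}
```

The candidate join here skips the usual subset-pruning check for brevity; infrequent candidates are still rejected when their support is counted, so the result is the same at the cost of some extra counting. In practice the Weka tools named in the software specification would likely be used for this step instead of hand-written code.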