Result Page Analysis (Cheng Wang)

Presentation Transcript

  • Cheng Wang
  • A list of results decorated with
      - Side bars
      - Branding banners
      - Advertisement
      - Merchant information
      - Search forms
      - Navigation part
  • Data Area Identification
    Record Segmentation
    Data Alignment
  • Visual information
      - ViDE, VIPER
    Ontology
      - ODE
    HTML page based
      - FiVaTech
    Regular expression
      - EXALG, DELA
  • Weifeng Su, Jiying Wang, Frederick H. Lochovsky. 2009.
    1. Domain ontology construction
      - Query interface
      - Query result pages
    2. Data extraction using the ontology
      - Identify the data area
      - Segment records
      - Align data values
  • Multiple Query Result Page
      - PADE
  • 1. Match query interface elements to data values.
      - title=“%orientalism%”
    2. Search for voluntary labels in table headers.
    3. Search for voluntary labels encoded together with data values.
      - ISBN No: 0814756654
      - ISBN No: 0789204592
    4. Data value formats
      - 18/09/2008 : 20080918
      - 03/18/98 : 19980318
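Steps 3 and 4 above can be illustrated with a minimal sketch. It assumes a label is separated from its value by a colon and that dates arrive in one of a few slash-separated layouts; the function names and the list of candidate formats are hypothetical and are not taken from the slides.

```python
from datetime import datetime

# Hypothetical list of layouts to try, in order of preference.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%m/%d/%Y", "%d/%m/%y", "%m/%d/%y")


def normalize_date(text):
    """Return the date as YYYYMMDD, or None if no candidate layout fits."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y%m%d")
        except ValueError:
            continue  # this layout does not fit; try the next one
    return None


def split_label(text):
    """Split 'Label: value' into (label, value); label is None if absent."""
    label, sep, value = text.partition(":")
    if not sep:
        return None, text.strip()
    return label.strip(), value.strip()


print(normalize_date("18/09/2008"))
print(normalize_date("03/18/98"))
print(split_label("ISBN No: 0814756654"))
```

An ambiguous value such as 03/04/98 resolves to whichever candidate layout matches first, so a real system would need extra evidence (for example, other values in the same column) to pick the layout.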
  • 1. Value level matching
      - Data value similarity
    2. Label level matching
      - Label co-occurrence
    3. Label-value matching
      - Check assigned label
      - Assign a suitable label for columns
      - Matching conflict resolution
  • 1. Matching is unique → create attribute
    2. Matching is 1:1 → alias
      - Category : Subject
    3. Matching is 1:n → n+1 attributes
      - Author: {Last Name, First Name}
    4. Matching is n:m → n:1 + 1:m
  • One result page → one data area
    Maximum Entropy Model
      - Maximum Correlation Subtree Identification
  • 1 result
    Several results (CABABABAD)
      - Find continuous repeated patterns
      - Visual gap
  • Each data value is assigned a label
      - Maximum Entropy Model
      - Match with ontology
    Label → Column
  • Wei Liu, Xiaofeng Meng and Weiyi Meng. 2009.
    ViDRE: Data Record Extractor
    ViDIE: Data Item Extractor
    New measure: revision
  • 1. Build a Visual Block tree
    2. Extract data records
      - Noise block filtering
      - Blocks clustering
      - Regroup blocks
    3. Partition data records into data items and alignment
  • Mandatory data items
    Optional data items
    Static data items
  • Simple one-pass clustering algorithm
      - Take the first block from the list and use it to form a cluster.
      - For each remaining block, compute its similarity to the existing clusters.
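A one-pass clustering loop of this kind can be sketched as follows. The slide does not say how a block is compared with a cluster or what happens when nothing is similar enough, so this sketch assumes the average pairwise similarity is used and that a block below the threshold starts a new cluster; the similarity function, the threshold, and the numeric "blocks" are all hypothetical.

```python
def one_pass_cluster(blocks, similarity, threshold):
    """Assign each block to its most similar cluster in a single pass."""
    clusters = []
    for block in blocks:
        best_index, best_score = None, threshold
        for index, cluster in enumerate(clusters):
            # Assumption: cluster similarity is the mean over its members.
            score = sum(similarity(block, member) for member in cluster) / len(cluster)
            if score >= best_score:
                best_index, best_score = index, score
        if best_index is None:
            clusters.append([block])  # nothing similar enough: new cluster
        else:
            clusters[best_index].append(block)
    return clusters


def toy_similarity(a, b):
    """Toy similarity on numbers standing in for visual block features."""
    return 1.0 / (1.0 + abs(a - b))


print(one_pass_cluster([10, 11, 50, 52, 12], toy_similarity, threshold=0.3))
# [[10, 11, 12], [50, 52]]
```

Because every block is visited once, the result depends on the order of the input list, which is the usual trade-off of one-pass clustering.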
  • ViDE assumes
      1. Blocks in the same cluster all come from different data records.
      2. The cluster with the maximum number n of blocks may contain the mandatory value of the data records.
  • Step 1: Rearrange the blocks in each cluster.
    Step 2: Use a cluster with n blocks as the seed. Initialize n groups, each containing one seed block.
    Step 3: For every block (in all clusters), determine which group it belongs to.
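The three regrouping steps can be sketched as below. The slides do not state how a block is matched to a group, so this sketch assumes each block joins the seed block that is vertically nearest to it; the tuple layout (block id, top y-coordinate) and the function name are hypothetical.

```python
def regroup(clusters):
    """Regroup clustered blocks into one group per data record.

    clusters: list of clusters, each a list of (block_id, top_y) tuples.
    """
    # Step 1: rearrange the blocks in each cluster by page position.
    ordered = [sorted(cluster, key=lambda block: block[1]) for cluster in clusters]

    # Step 2: the largest cluster is the seed; each seed block opens a group.
    seed = max(ordered, key=len)
    groups = [[seed_block] for seed_block in seed]

    # Step 3: place every other block in a group.
    # Assumption: the group whose seed block is vertically closest wins.
    for cluster in ordered:
        if cluster is seed:
            continue
        for block in cluster:
            nearest = min(range(len(seed)), key=lambda i: abs(seed[i][1] - block[1]))
            groups[nearest].append(block)
    return groups


clusters = [
    [("title2", 200), ("title1", 100)],
    [("price1", 120)],
    [("img1", 90), ("img2", 190)],
]
for group in regroup(clusters):
    print(group)
```

In this toy input the title cluster acts as the seed, and the single price block shows how an optional item ends up in only one of the groups.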
  • WDBt: total number of web databases processed
    WDBc: number of web databases whose precision and recall are both 100%
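The slide defines the two counts but not the formula that combines them. A small sketch, assuming revision is the share of web databases that are not perfectly extracted, that is (WDBt - WDBc) / WDBt; the function name and the sample numbers are made up for illustration.

```python
def revision(wdb_total, wdb_correct):
    """Fraction of web databases that still need manual revision.

    Assumption: revision = (WDBt - WDBc) / WDBt.
    """
    if wdb_total <= 0:
        raise ValueError("wdb_total must be positive")
    if not 0 <= wdb_correct <= wdb_total:
        raise ValueError("wdb_correct must lie between 0 and wdb_total")
    return (wdb_total - wdb_correct) / wdb_total


# Illustrative numbers only: 100 databases processed, 80 extracted perfectly.
print(revision(100, 80))  # 0.2
```

Under this reading, a lower revision value means fewer sites require manual correction after extraction.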
  • [Figure: tree with Root at the top, the Data Area (LCA) below it, and Records divided by Record Separators]
  • Real-estate domain
    60 agents’ websites
      - MRP: 95.0%
      - ERP: 90.0%
  • [Figure: tree with Root, Data Area, and Records 1, 2 and 3, each split into Part A and Part B]
  • DIADEM 0.1:
      - Construct real-estate result page ontology
      - Ontological record segmentation
          (More features)
      - Data labeling and data alignment
    After:
      - Add visual information