HMM-based Artificial Designer for Search Interface Segmentation

Transcript

  • 1. HMM-based Artificial Designer for Search Interface Segmentation
Ritu Khare, Yuan An, Il-Yeol Song

ACCESSING THE DEEP WEB
Deep Web: data that exist on the Web but are not returned by search engines through traditional crawling and indexing.
Accessing deep Web contents: the primary way to access these data, manually filling out HTML forms on search interfaces, is not scalable. Hence, more sophisticated solutions, such as designing meta-search engines or creating dynamic page repositories, are required. A prerequisite to these solutions is an understanding of the search interfaces. Interface segmentation is an important part of the problem of search interface understanding.

HMM: ARTIFICIAL DESIGNER
An HMM (Hidden Markov Model) can act like a human designer who has the ability to design an interface using acquired knowledge and to determine (decode) the segment boundaries and semantic labels of components. The designing process is similar to statistically choosing one component from a bag of components (a superset of all possible components) and placing it on the interface while keeping the semantic role of the component (attribute-name, operand, or operator) in mind. See Figure 2.
Fig 2. Simulating a human designer using HMMs (DESIGNING: from knowledge of semantic labels and a bag of components to a search interface; DECODING: from the search interface back to segments and tagged components, via the 2-layered HMM).

INTERFACE SEGMENTATION
Fig 1. Segmented interface, with segments marked by dotted lines (example fields: "Marker Range: between, e.g., D19Mit32 and Tbx10"; "cM Position: between, e.g., 10.0 and 40.0").
While a user is naturally trained to perform segmentation, a machine is unable to "see" a segment for the following reasons:
1. Components that are visually close to each other might be located far apart in the HTML source code.
2. A machine does not implicitly have any search experience that can be leveraged to identify a segment's boundary.
Research question: how can we make a machine learn to segment an interface?

2-LAYERED HMM APPROACH
The problem of decoding is two-fold: 1) segmentation, and 2) assignment of semantic labels to components. Hence, a 2-layered HMM is employed, as shown in Figure 3. The first layer, the T-HMM, tags each component with the appropriate semantic label (attribute-name, operator, or operand). The second layer, the S-HMM, segments the interface into logical attributes.
Fig 3. 2-layered HMM architecture (HTML-coded training interfaces; manually tagged sequences train the T-HMM; manually segmented interfaces train the S-HMM; the output is segmented and tagged interfaces).

EXPERIMENTATION
Data set: 200 interfaces from the biology domain. Parsing: DOM trees of components. Training: maximum likelihood method. Testing: Viterbi algorithm.

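The EXPERIMENTATION box names the two algorithmic steps: maximum-likelihood training on manually tagged sequences and Viterbi decoding at test time. The sketch below illustrates how the first-layer T-HMM could implement them; it is not the authors' code, and the observation format, label set, smoothing, and function names are illustrative assumptions.

# A minimal sketch of maximum-likelihood training and Viterbi decoding for a
# discrete T-HMM, assuming each interface is parsed into a sequence of
# (observation, semantic-label) pairs. Observation encoding, smoothing, and
# all names are hypothetical.
from collections import defaultdict
import math

T_STATES = ["attribute-name", "operator", "operand", "text/trivial"]

def train_mle(tagged_sequences, states, smoothing=1e-3):
    """Estimate start, transition, and emission probabilities by
    relative-frequency counting over tagged training sequences."""
    start = defaultdict(float)
    trans = defaultdict(lambda: defaultdict(float))
    emit = defaultdict(lambda: defaultdict(float))
    for seq in tagged_sequences:                  # seq = [(observation, state), ...]
        prev = None
        for obs, state in seq:
            emit[state][obs] += 1.0
            if prev is None:
                start[state] += 1.0
            else:
                trans[prev][state] += 1.0
            prev = state
    vocab = {o for s in emit for o in emit[s]}
    def norm(counts, keys):
        total = sum(counts.get(k, 0.0) + smoothing for k in keys)
        return {k: (counts.get(k, 0.0) + smoothing) / total for k in keys}
    return (norm(start, states),
            {s: norm(trans[s], states) for s in states},
            {s: norm(emit[s], vocab) for s in states})

def viterbi(observations, states, start, trans, emit, floor=1e-9):
    """Return the most likely hidden-state sequence for one interface."""
    scores = [{s: math.log(start[s]) + math.log(emit[s].get(observations[0], floor))
               for s in states}]
    back = []
    for obs in observations[1:]:
        row, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: scores[-1][p] + math.log(trans[p][s]))
            row[s] = (scores[-1][best] + math.log(trans[best][s])
                      + math.log(emit[s].get(obs, floor)))
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Hypothetical usage, with observations being coarse component descriptors
# (e.g., "label:gene-symbol", "textbox", "dropdown"):
# t_start, t_trans, t_emit = train_mle(tagged_training_interfaces, T_STATES)
# labels = viterbi(one_test_interface, T_STATES, t_start, t_trans, t_emit)
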
RESULTS
Fig 4. Learnt topology of semantic labels: learned transition probabilities among the states Attribute-name, Operator, Operand, and Text/Trivial.

Semantic label / segment       Accuracy
Segment (logical attribute)    86.05 %
Operator                       85.10 %
Operand                        98.60 %
Attribute-name                 90.11 %

CONTRIBUTIONS
1. This approach outperforms LEX, a contemporary heuristic-based method, achieving a 10% improvement in segmentation accuracy.
2. This is the first work to apply HMMs to deep Web search interfaces. HMMs helped incorporate the first-hand knowledge of the designer to perform interface understanding.

FUTURE WORK
1. To recover the schema of deep Web databases by extracting finer details, such as the data types and constraints of logical attributes.
2. To test this approach on interfaces from other domains, given the diverse domain distribution of the deep Web.
3. To investigate the use of the Baum-Welch training algorithm to increase the degree of automation.

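The poster describes the second layer, the S-HMM, only at the level of "segments the interface into logical attributes". The sketch below is therefore a hypothetical formulation: it assumes the S-HMM observes the semantic labels produced by the T-HMM and that its hidden states simply mark whether a component begins or continues a logical attribute. It reuses train_mle() and viterbi() from the previous sketch.

# Hypothetical second layer (S-HMM): hidden states mark segment boundaries,
# observations are the T-HMM's output labels. The BEGIN/CONTINUE state space
# is an assumption made for illustration, not taken from the poster.
S_STATES = ["BEGIN", "CONTINUE"]

def decode_segments(labels, states, start, trans, emit):
    """Decode BEGIN/CONTINUE states over a label sequence and split the
    components into logical attributes at each BEGIN."""
    boundary_states = viterbi(labels, states, start, trans, emit)
    segments, current = [], []
    for label, state in zip(labels, boundary_states):
        if state == "BEGIN" and current:
            segments.append(current)
            current = []
        current.append(label)
    if current:
        segments.append(current)
    return segments

# Hypothetical usage, with segment_training_data holding (label, BEGIN/CONTINUE)
# pairs derived from the manually segmented training interfaces:
# s_start, s_trans, s_emit = train_mle(segment_training_data, S_STATES)
# logical_attributes = decode_segments(t_hmm_labels, S_STATES, s_start, s_trans, s_emit)

A richer state space (for example, one boundary state per semantic label) would likely be closer to the actual S-HMM, but the boundary-splitting idea is the same.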
