Sukanta Sinha, Rana Dattagupta, Debajyoti Mukhopadhyay / International Journal of
             Engineering Research and Applications (IJERA)    ISSN: 2248-9622
                www.ijera.com Vol. 2, Issue 4, July-August 2012, pp.877-880


    Identify Web-page Content meaning using Knowledge based
                 System for Dual Meaning Words
    Sukanta Sinha (1, 4), Rana Dattagupta (2), Debajyoti Mukhopadhyay (3, 4)
    (1) TATA Consultancy Services, Victoria Park, Kolkata 700091, India
    (2) Computer Sc. Dept., Jadavpur University, Kolkata 700032, India
    (3) Dept. of Information Technology, Maharashtra Institute of Technology, Pune 411038, India
    (4) WIDiCoReL, Green Tower C- 9/1, Golf Green, Kolkata 700095, India


Abstract
The meaning of Web-page content plays a big role when a search engine
produces a search result. In most cases the Web-page meaning is stored in
the title or meta-tag area, but those meanings do not always match the
Web-page content. To overcome this situation, we need to go through the
Web-page content to identify the Web-page meaning. When the Web-page
content holds dual meaning words, it is really difficult to identify the
meaning of the Web-page. In this paper, we introduce a new design and
development mechanism for identifying the meaning of Web-page content that
holds dual meaning words.

Keywords – Dual meaning word, Knowledge based system, Search engine,
Web-page content, Web resources

1. Introduction
A Web search engine is a tool that produces search results based on the
user-given query. The World Wide Web (WWW) is a huge reservoir of
Web-pages. A search engine crawler crawls Web-pages from the WWW and
creates a database of Web resources for the search engine [1, 2].

In the present era of the Internet, the WWW is an accumulated and
interactive medium for accessing an enormous conglomeration of information
[3]. The information in Web-page content consists of diverse data types
such as structured data, semi-structured data, unstructured Web data, etc.
[4]. In a few cases we also found that dual meaning words exist in the
Web-page content. Identifying the meaning of Web-page content which holds
dual meaning words is a challenging task.

A dual meaning word is a word which has two meanings; for example, 'bank'
represents 'financial institute' as well as 'river side'. We need to
identify the meaning based on the full sentence. In our approach, we have
mainly focused on identifying the meaning of only those Web-pages which
hold dual meaning words in their content. To identify the meaning, we have
created a knowledge based system by collecting various types of data
patterns.

Our paper is not intended to provide a complete survey of techniques.
According to our knowledge, we have applied these techniques to a few
examples. Nowadays, research on search engines is carried out in
universities, open laboratories, and many dot-com companies.
Unfortunately, many of the techniques used by dot-coms, and especially the
resulting performance, are kept private behind company walls, or are
disclosed in patents that can be comprehended and appreciated only by
lawyers. Therefore, we believe that the overview of problems and
techniques that we present here can be useful.

This paper discusses the problem area in Section 2. Section 3 discusses
the XML schema. Section 4 depicts the proposed approach. Section 5 shows
some experimental analyses. Finally, Section 6 concludes the paper.

2. The Problem Area
Web-page content meaning identification is an essential part of a search
engine for producing relevant search results. In most cases we can get the
Web-page content meaning from the title or meta-tag area of that Web-page,
but these do not always match the actual Web-page content. Moreover, in
the few cases where the Web-page content holds dual meaning words, it is
really difficult to identify the meaning of the Web-page content.

In general, our main goal is to identify the meaning of Web-page content
which holds dual meaning words. The following examples illustrate the
difficulty of identifying the meaning of Web-page content, which can be
overcome by using our proposed system.

Example 1: John is looking for a bank to open a savings account; on the
other hand, Alex is looking for a bank of the river for a get together.
Here, the two occurrences of 'bank' have different meanings, one for
financial institute and the other for river side. If the two sentences
exist in different Web-page
content, then the meaning of each Web-page content needs to be retrieved
based on its own content.

Example 2: Peter found a bank which is located on the bank of the river.
This single sentence represents a financial institute as well as a river
side, so either meaning is valid for the sentence. In our approach, we
assume that one Web-page has only one meaning. Hence, in this type of
situation we assign one of the meanings based on our programming logic.

3. XML Schema
An XML Schema describes the structure of an XML document [5, 6]. The XML
Schema language is also referred to as XML Schema Definition (XSD). The
purpose of an XML Schema is to define the legal building blocks of an XML
document. An XML Schema defines the elements and attributes that can
appear in a document [7, 8]. It also defines data types and default and
fixed values for elements and attributes. One of the greatest strengths of
XML Schemas is the support for data types. XML Schemas are also extensible
because they are written in XML.

An XML Schema holds simple and complex elements [9, 10, 11]. A simple
element is an XML element that contains only text. It cannot contain any
other elements or attributes. A complex element is an XML element that
contains other elements and/or attributes. There are four kinds of complex
elements: empty elements, elements that contain only other elements,
elements that contain only text (together with attributes), and elements
that contain both other elements and text. The <schema> element is the
root element of every XML Schema. The <schema> element may contain some
attributes [12, 13, 14].

4. Proposed Approach
In our approach, we have proposed a mechanism which identifies the meaning
of Web-page content for those Web-pages which hold dual meaning words in
their content. Section 4.1 gives an overview of creating the knowledge
based system and Section 4.2 depicts our algorithm.

4.1. Knowledge Based System Generation
To create a knowledge based system, we have collected dual meaning words
from various sources such as the Internet, dictionaries, etc. For each
dual meaning word, we have created one XML which is linked with the XSD
given in Fig. 1. The considered XSD holds both simple and complex types of
elements.

The 'dualMeaningWordName' attribute holds the dual meaning word name.
'keywords' is a complex element which holds various sets of key words,
classified based on their meaning. 'keyword' is also a complex type
element, which holds similar types of key elements with their meaning.
'names' is a complex type element which holds the key element names that
represent the same meaning. 'name' and 'meaning' are simple type elements
that hold the key values and their meaning. Each XML holds a 'dmw_id'; we
maintain each dual meaning word with a corresponding 'dmw_id'. Key words
are taken from the sentence that holds the dual meaning word. For example,
"John is looking for a bank to open a savings account" and "Alex is
looking for a bank of the river for a get together" hold the key words
'account' and 'river'. The meanings of all key words are taken care of
while designing the XML. In Fig. 2 we have shown a part of an XML for
'bank'.

Figure 1. A sample XSD

Figure 2. A part of an XML (for bank)

4.2. Algorithm
To identify Web-page content meaning, we use the algorithm given below.
This algorithm mainly focuses on identifying the meaning of Web-pages
which hold dual meaning words in their
Web-page content. In our approach, we have used a knowledge based system
for identifying the meaning of dual meaning words. The knowledge based
system stores the information in XML form.

Input  : Web-page content
Output : Meaning of the Web-page content

1. Extract dual meaning words from the Web-page content.
2. Get the count of dual meaning words in the Web-page content.
3. if count = 0 then
     set isDualMeaningFlag := False and exit
4. if count = 1 then
   a) set isDualMeaningFlag := True
   b) Extract the key words in the dual meaning word sentence.
   c) Based on the key words, traverse the XML (knowledge based
      system) for the dual meaning word.
   d) Retrieve the meaning of each key word and store it in a
      temporary table.
   e) Go to step 6.
5. if count > 1 then
   a) set isDualMeaningFlag := True
   b) Select the dual meaning word that occurs most often in the
      Web-page content.
   c) If multiple dual meaning words have the same number of
      occurrences, select the dual meaning word which appears first
      in the Web-page content.
   d) Extract the key words in the dual meaning word sentence.
   e) Based on the key words, traverse the XML (knowledge based
      system) for the dual meaning word.
   f) Retrieve the meaning of each key word and store it in a
      temporary table.
   g) Go to step 6.
6. Choose the meaning from the temporary table whose count is maximum.
7. end

5. Experimental Analysis
In this section, we give some experimental study and discuss how to set up
our system. Section 5.1 explains our experimental procedure, and Section
5.2 shows the experimental results of our system.

5.1. Experimental Procedure
The performance of our system depends on various parameters, and those
parameters need to be set up before running our system. The considered
parameters are the Web-page repository, the knowledge based system (i.e.,
the dual meaning word XML with proper meanings), the XML schema, etc.
Initially, we created the knowledge based system with the help of the
Internet and dictionaries. Then we tuned the knowledge based system
through our experiments. In our experiment, we take a Web-page from our
repository, pass it through our system, and check the database for the
meaning of that Web-page. If the Web-page holds dual meaning words, then
the meaning is identified; otherwise 'isDualMeaningFlag' is updated to
false.

5.2. Experimental Results
It is very difficult to compare our system with any existing system.
Nevertheless, we have produced some data to measure the performance of our
proposed system. As a part of the experimental results, we have produced
the statistics given in Table 1.

Table 1. Performance Report of Our System

No. of Web-pages   No. of Web-pages   Correct Meaning   No. of XML   Correct Meaning
Taken /            holding Dual       Identified in     Modified     Identified after
Repository Size    Meaning Words      1st Run                        XML Modification
1000               30                 22                6            28
2000               50                 43                5            47
3000               80                 71                6            76
4000               110                99                9            104
5000               140                127               10           134

6. Conclusion
Web-page content meaning identification is a very difficult job for any
system. The human brain can do it easily, but a person would need to go
through each and every Web-page, which is practically impossible. We found
that approximately 30% - 40% of Web-pages represent a unique meaning; out
of those 30% - 40%, approximately 8% - 10% of Web-pages hold dual meaning
words. Hence, we are concentrating on creating the meaning XML for those
8% - 10% of Web-pages. We found that approximately 95% of the Web-pages
holding dual meaning words had their content meaning identified
successfully (after XML modification; see Table 1). Our approach is highly
scalable. Suppose we encounter a new pattern and want to support that
pattern; then we just introduce the meaning XML and the system will work.
We have tested our system on the sub-set of Web-pages shown in the
experimental results section. In this paper, we have mainly focused on our
approach, which will work for large volumes of data.

REFERENCES
[1] C. H. Yu and S. J. Lin, Parallel Crawling and Capturing for On-Line
    Auction, Lecture Notes
    in Computer Science, Springer-Verlag, Berlin, Heidelberg, 5075, 2008,
    455-466.
[2] D. Mukhopadhyay, A. Biswas, S. Sinha, A New Approach to Design Domain
    Specific Ontology Based Web Crawler, 10th International Conference on
    Information Technology, ICIT 2007 Proceedings, Rourkela, India, IEEE
    Computer Society Press, California, USA, December 17-20, 2007, 289-291.
[3] W. Willinger, R. Govindan, S. Jamin, V. Paxson and S. Shenker, Scaling
    phenomena in the Internet, In Proceedings of the National Academy of
    Sciences, 1999, suppl. 1, 2573–2580.
[4] J. J. Rehmeyer, Mapping a medusa: The Internet spreads its tentacles,
    Science News, 171, June 2007, 387-388.
[5] M. Murata, D. Lee, M. Mani and K. Kawaguchi, Taxonomy of XML Schema
    Languages using Formal Language Theory, In ACM Trans. on Internet
    Technology (TOIT), 5(4), November 2005, 1-45.
[6] I. Stuart, XML Schema, a brief introduction (Internet archived by
    WayBack Machine, October 26, 2001).
[7] D. Lee and W. W. Chu, Comparative Analysis of Six XML Schema Languages,
    In ACM SIGMOD Record, 29(3), September 2000, 76-87.
[8] C. Binstock, D. Peterson, M. Smith, M. Wooding and C. Dix, The XML
    Schema Complete Reference (Published by Addison-Wesley, 2002).
[9] J. Hegewald, F. Naumann and M. Weis, XStruct: efficient schema
    extraction from multiple and large XML documents, The 22nd
    International Conference on Data Engineering, ICDE Workshops, IEEE
    Computer Society, Atlanta, GA, April 3-8, 2006, 81-91.
[10] G. J. Bex, W. Martens, F. Neven and T. Schwentick, Expressiveness of
     XSDs: from practice to theory, there and back again, In Proceedings of
     the 14th international World Wide Web Conference, Chiba, Japan, 2005,
     712–721.
[11] G. J. Bex, F. Neven, T. Schwentick and K. Tuyls, Inference of concise
     DTDs from XML data, Proceedings of the 32nd International Conference
     on Very Large Data Bases (VLDB), Seoul, Korea, September 12-15, 2006.
[12] G. J. Bex, F. Neven and J. V. Bussche, DTDs versus XML Schema: a
     practical study, 7th International Workshop on the Web and Databases,
     WebDB 2004 Proceedings, Maison de la Chimie, Paris, France, June
     17-18, 2004, 79–84.
[13] B. Chidlovskii, Schema extraction from xml: A grammatical inference
     approach, In Proceedings of the International Workshop on Knowledge
     Representation Meets Databases (KRDB), 2001.
[14] Y. Papakonstantinou and V. Vianu, DTD Inference for Views of XML Data,
     In the Proceedings of 19th ACM Symposium on Principles of Database
     Systems (PODS), Dallas, Texas, USA, 2000, 35-46.
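
Illustrative sketch for Section 4.1. The figure showing the XML for 'bank' is not reproduced in this text, so the following is only a guess at what one knowledge-base entry might look like and how it could be read into a key word table. The element and attribute names 'dualMeaningWordName', 'keywords', 'keyword', 'names', 'name', 'meaning' and 'dmw_id' come from Section 4.1; the root tag `dualMeaningWord`, the placement of 'dmw_id' as an attribute, the key word 'savings', and the helper name `load_entry` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical knowledge-base entry for 'bank'. Only the element/attribute
# names listed in Section 4.1 are from the paper; the root tag, the position
# of dmw_id, and the key word 'savings' are assumptions.
BANK_XML = """
<dualMeaningWord dmw_id="1" dualMeaningWordName="bank">
  <keywords>
    <keyword>
      <names>
        <name>account</name>
        <name>savings</name>
      </names>
      <meaning>financial institute</meaning>
    </keyword>
    <keyword>
      <names>
        <name>river</name>
      </names>
      <meaning>river side</meaning>
    </keyword>
  </keywords>
</dualMeaningWord>
"""


def load_entry(xml_text):
    """Return (dual meaning word, {key word: meaning}) for one XML entry."""
    root = ET.fromstring(xml_text.strip())
    word = root.get("dualMeaningWordName")
    key_to_meaning = {}
    # Each <keyword> groups the key words that share one meaning.
    for keyword in root.iter("keyword"):
        meaning = keyword.findtext("meaning").strip()
        for name in keyword.iter("name"):
            key_to_meaning[name.text.strip().lower()] = meaning
    return word, key_to_meaning


word, table = load_entry(BANK_XML)
print(word, table)
```

Flattening each entry into a key word to meaning table keeps the later lookup step a simple dictionary access; supporting a new pattern then amounts to adding another `<keyword>` group or another XML entry.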

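
Illustrative sketch for Section 4.2. The steps of the algorithm can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the knowledge base is replaced by a small in-memory dictionary (`KNOWLEDGE_BASE`, hypothetical), sentences are split naively on punctuation, and steps 4 and 5 are merged because selecting the most frequent dual meaning word is trivial when only one is present. The function name `identify_meaning` is also an assumption.

```python
import re
from collections import Counter

# Hypothetical in-memory stand-in for the XML knowledge base:
# dual meaning word -> {key word: meaning}. Entries are illustrative only.
KNOWLEDGE_BASE = {
    "bank": {
        "account": "financial institute",
        "savings": "financial institute",
        "river": "river side",
    },
}


def identify_meaning(content, kb=KNOWLEDGE_BASE):
    """Return (isDualMeaningFlag, meaning or None) for a Web-page's text."""
    text = content.lower()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z]+", text)

    # Steps 1-2: find the dual meaning words and count their occurrences.
    occurrences = Counter(t for t in tokens if t in kb)
    if not occurrences:  # Step 3: no dual meaning word on the page
        return False, None

    # Steps 4-5: take the most frequent word; ties go to the first one seen.
    best = max(occurrences.values())
    word = next(t for t in tokens if occurrences.get(t) == best)

    # Key word lookup: count the meanings of key words found in sentences
    # that contain the selected word (the "temporary table").
    temp_table = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence)
        if word in words:
            for w in words:
                if w in kb[word]:
                    temp_table[kb[word][w]] += 1

    # Step 6: choose the meaning with the maximum count, if any was found.
    if not temp_table:
        return True, None
    return True, temp_table.most_common(1)[0][0]
```

With the sample dictionary above, the first sentence of Example 1 resolves to 'financial institute' and the second to 'river side', while a page without any dual meaning word returns the flag as false.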

More Related Content

Viewers also liked

Af25175180
Af25175180Af25175180
Af25175180
IJERA Editor
 
Fi24989993
Fi24989993Fi24989993
Fi24989993
IJERA Editor
 
Ey24943946
Ey24943946Ey24943946
Ey24943946
IJERA Editor
 
Bk25371374
Bk25371374Bk25371374
Bk25371374
IJERA Editor
 
Fg24980983
Fg24980983Fg24980983
Fg24980983
IJERA Editor
 
A25001003
A25001003A25001003
A25001003
IJERA Editor
 
Eu24916923
Eu24916923Eu24916923
Eu24916923
IJERA Editor
 
F24034040
F24034040F24034040
F24034040
IJERA Editor
 
Fa3110171022
Fa3110171022Fa3110171022
Fa3110171022
IJERA Editor
 
Cm31588593
Cm31588593Cm31588593
Cm31588593
IJERA Editor
 
O sangue
O sangueO sangue
O sangue
biologiakaio
 
Double page spread
Double page spreadDouble page spread
Double page spreadDan Hall
 
Quero a delicia de poder fotografar as coisas
Quero a delicia de poder fotografar as coisasQuero a delicia de poder fotografar as coisas
Quero a delicia de poder fotografar as coisas
Lucas Fonseca
 
3 houredpmanualpart2email-120131064823-phpapp01 (1)
3 houredpmanualpart2email-120131064823-phpapp01 (1)3 houredpmanualpart2email-120131064823-phpapp01 (1)
3 houredpmanualpart2email-120131064823-phpapp01 (1)
Nila Libranda
 
Bolibar
BolibarBolibar
Bolibar
bolitotuflai
 
Agenda politica economica version final
Agenda politica economica version finalAgenda politica economica version final
Agenda politica economica version final
Robert Gallegos
 
Apresentação 1 t12
Apresentação 1 t12Apresentação 1 t12
Apresentação 1 t12
TriunfoRi
 
CurriculumVitae
CurriculumVitaeCurriculumVitae
CurriculumVitae
sibahle mdluli
 
Amil linha blue_individual_pf
Amil linha blue_individual_pfAmil linha blue_individual_pf
Amil linha blue_individual_pf
easysaude
 
Presenta tic
Presenta ticPresenta tic
Presenta tic
MARIA
 

Viewers also liked (20)

Af25175180
Af25175180Af25175180
Af25175180
 
Fi24989993
Fi24989993Fi24989993
Fi24989993
 
Ey24943946
Ey24943946Ey24943946
Ey24943946
 
Bk25371374
Bk25371374Bk25371374
Bk25371374
 
Fg24980983
Fg24980983Fg24980983
Fg24980983
 
A25001003
A25001003A25001003
A25001003
 
Eu24916923
Eu24916923Eu24916923
Eu24916923
 
F24034040
F24034040F24034040
F24034040
 
Fa3110171022
Fa3110171022Fa3110171022
Fa3110171022
 
Cm31588593
Cm31588593Cm31588593
Cm31588593
 
O sangue
O sangueO sangue
O sangue
 
Double page spread
Double page spreadDouble page spread
Double page spread
 
Quero a delicia de poder fotografar as coisas
Quero a delicia de poder fotografar as coisasQuero a delicia de poder fotografar as coisas
Quero a delicia de poder fotografar as coisas
 
3 houredpmanualpart2email-120131064823-phpapp01 (1)
3 houredpmanualpart2email-120131064823-phpapp01 (1)3 houredpmanualpart2email-120131064823-phpapp01 (1)
3 houredpmanualpart2email-120131064823-phpapp01 (1)
 
Bolibar
BolibarBolibar
Bolibar
 
Agenda politica economica version final
Agenda politica economica version finalAgenda politica economica version final
Agenda politica economica version final
 
Apresentação 1 t12
Apresentação 1 t12Apresentação 1 t12
Apresentação 1 t12
 
CurriculumVitae
CurriculumVitaeCurriculumVitae
CurriculumVitae
 
Amil linha blue_individual_pf
Amil linha blue_individual_pfAmil linha blue_individual_pf
Amil linha blue_individual_pf
 
Presenta tic
Presenta ticPresenta tic
Presenta tic
 

Similar to En24877880

Semantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic WebSemantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic Web
Editor IJCATR
 
Topic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep WebpagesTopic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep Webpages
csandit
 
Topic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep WebpagesTopic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep Webpages
csandit
 
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
dannyijwest
 
Metadata: Towards Machine-Enabled Intelligence
Metadata: Towards Machine-Enabled IntelligenceMetadata: Towards Machine-Enabled Intelligence
Metadata: Towards Machine-Enabled Intelligence
dannyijwest
 
Metadata: Towards Machine-Enabled Intelligence
Metadata: Towards Machine-Enabled Intelligence               Metadata: Towards Machine-Enabled Intelligence
Metadata: Towards Machine-Enabled Intelligence
dannyijwest
 
An imperative focus on semantic
An imperative focus on semanticAn imperative focus on semantic
An imperative focus on semantic
ijasa
 
Toward The Semantic Deep Web
Toward The Semantic Deep WebToward The Semantic Deep Web
Toward The Semantic Deep Web
Samiul Hoque
 
Semantic Query Optimisation with Ontology Simulation
Semantic Query Optimisation with Ontology SimulationSemantic Query Optimisation with Ontology Simulation
Semantic Query Optimisation with Ontology Simulation
dannyijwest
 
Semantic Web Nature
Semantic Web NatureSemantic Web Nature
Semantic Web Nature
Constantin Stan
 
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
IOSR Journals
 
B-BabelNet: Business-Specific Lexical Database for Improving Semantic Analysi...
B-BabelNet: Business-Specific Lexical Database for Improving Semantic Analysi...B-BabelNet: Business-Specific Lexical Database for Improving Semantic Analysi...
B-BabelNet: Business-Specific Lexical Database for Improving Semantic Analysi...
TELKOMNIKA JOURNAL
 
Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling Technique
Carmen Sanborn
 
Semantic web
Semantic webSemantic web
Semantic web
Hon Lasisi H
 
A Survey on Various Web Technologies
A Survey on Various Web TechnologiesA Survey on Various Web Technologies
A Survey on Various Web Technologies
ijsrd.com
 
Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020
Editor IJARCET
 
Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020
Editor IJARCET
 
An Implementation of a New Framework for Automatic Generation of Ontology and...
An Implementation of a New Framework for Automatic Generation of Ontology and...An Implementation of a New Framework for Automatic Generation of Ontology and...
An Implementation of a New Framework for Automatic Generation of Ontology and...
IJCSIS Research Publications
 
DM110 - Week 10 - Semantic Web / Web 3.0
DM110 - Week 10 - Semantic Web / Web 3.0DM110 - Week 10 - Semantic Web / Web 3.0
DM110 - Week 10 - Semantic Web / Web 3.0
John Breslin
 
Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...
Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...
Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...
Artificial Intelligence Institute at UofSC
 

Similar to En24877880 (20)

Semantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic WebSemantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic Web
 
Topic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep WebpagesTopic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep Webpages
 
Topic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep WebpagesTopic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep Webpages
 
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
 
Metadata: Towards Machine-Enabled Intelligence
Metadata: Towards Machine-Enabled IntelligenceMetadata: Towards Machine-Enabled Intelligence
Metadata: Towards Machine-Enabled Intelligence
 
Metadata: Towards Machine-Enabled Intelligence
Metadata: Towards Machine-Enabled Intelligence               Metadata: Towards Machine-Enabled Intelligence
Metadata: Towards Machine-Enabled Intelligence
 
An imperative focus on semantic
An imperative focus on semanticAn imperative focus on semantic
An imperative focus on semantic
 
Toward The Semantic Deep Web
Toward The Semantic Deep WebToward The Semantic Deep Web
Toward The Semantic Deep Web
 
Semantic Query Optimisation with Ontology Simulation
Semantic Query Optimisation with Ontology SimulationSemantic Query Optimisation with Ontology Simulation
Semantic Query Optimisation with Ontology Simulation
 
Semantic Web Nature
Semantic Web NatureSemantic Web Nature
Semantic Web Nature
 
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...
 
B-BabelNet: Business-Specific Lexical Database for Improving Semantic Analysi...
B-BabelNet: Business-Specific Lexical Database for Improving Semantic Analysi...B-BabelNet: Business-Specific Lexical Database for Improving Semantic Analysi...
B-BabelNet: Business-Specific Lexical Database for Improving Semantic Analysi...
 
Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling Technique
 
Semantic web
Semantic webSemantic web
Semantic web
 
A Survey on Various Web Technologies
A Survey on Various Web TechnologiesA Survey on Various Web Technologies
A Survey on Various Web Technologies
 
Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020
 
Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020
 
An Implementation of a New Framework for Automatic Generation of Ontology and...
An Implementation of a New Framework for Automatic Generation of Ontology and...An Implementation of a New Framework for Automatic Generation of Ontology and...
An Implementation of a New Framework for Automatic Generation of Ontology and...
 
DM110 - Week 10 - Semantic Web / Web 3.0
DM110 - Week 10 - Semantic Web / Web 3.0DM110 - Week 10 - Semantic Web / Web 3.0
DM110 - Week 10 - Semantic Web / Web 3.0
 
Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...
Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...
Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...
 

En24877880

Sukanta Sinha, Rana Dattagupta, Debajyoti Mukhopadhyay / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 4, July-August 2012, pp. 877-880

Identify Web-page Content meaning using Knowledge based System for Dual Meaning Words

Sukanta Sinha (1, 4), Rana Dattagupta (2), Debajyoti Mukhopadhyay (3, 4)
1 (TATA Consultancy Services, Victoria Park, Kolkata 700091, India)
2 (Computer Sc. Dept., Jadavpur University, Kolkata 700032, India)
3 (Dept. of Information Technology, Maharashtra Institute of Technology, Pune 411038, India)
4 (WIDiCoReL, Green Tower C-9/1, Golf Green, Kolkata 700095, India)

Abstract
The meaning of Web-page content plays a big role when a search engine produces a search result. In most cases the Web-page meaning is stored in the title or meta-tag area, but those meanings do not always match the Web-page content. To overcome this situation we need to go through the Web-page content to identify the Web-page meaning. In cases where the Web-page content holds dual meaning words, it is really difficult to identify the meaning of the Web-page. In this paper, we introduce a new design and development mechanism for identifying the meaning of Web-page content that holds dual meaning words.

Keywords: Dual meaning word, Knowledge based system, Search engine, Web-page content, Web resources

1. Introduction
A Web search engine is a tool that produces search results based on a user-given query. The World Wide Web (WWW) is a huge reservoir of Web-pages. A search engine crawler crawls the Web-pages from the WWW and creates a database of Web resources for the search engine [1, 2].

In the present era of the Internet, the WWW is an accumulated and interactive medium for accessing an enormous conglomeration of information [3]. The information in Web-page content consists of diverse data types such as structured data, semi-structured data and unstructured Web data [4]. In a few cases we also found that dual meaning words exist in the Web-page content. Identifying the meaning of Web-page content that holds dual meaning words is a challenging task.

A dual meaning word is a word that has two meanings; for example, 'bank' represents 'financial institute' as well as 'river side'. We need to identify the meaning based on the full sentence. In our approach, we have mainly focused on identifying the meaning of Web-page content that holds only dual meaning words. To identify the meaning, we have created a knowledge based system by collecting various types of data patterns.

Our paper is not intended to provide a complete survey of techniques. To the best of our knowledge, we have applied these techniques to a few examples. Nowadays, research on search engines is carried out in universities, open laboratories and many dot-com companies. Unfortunately, many of the techniques used by dot-coms, and especially the resulting performance, are kept private behind company walls, or are disclosed in patents that can be comprehended and appreciated by lawyers. Therefore, we believe that the overview of problems and techniques presented here can be useful.

This paper surveys the problem area in section 2. Section 3 discusses the XML schema. Section 4 depicts the proposed approach. Section 5 shows some experimental analyses. Finally, section 6 concludes the paper.

2. The Problem Area
Web-page content meaning identification is an essential part of a search engine for producing relevant search results. In most cases we can get the Web-page content meaning from the title or meta-tag area of that Web-page, but these do not always match the actual Web-page content. On the other hand, in the few cases where the Web-page content holds dual meaning words, it is really difficult to identify the meaning of the Web-page content.

In general, our main goal is to identify the meaning of Web-page content that holds dual meaning words. The following example illustrates the difficulty of identifying the meaning of a Web-page content, which can be overcome by using our proposed system.

Example 1: John is looking for a bank to open a savings account; on the other hand, Alex is looking for a bank of the river for a get together. Here, the two occurrences of 'bank' represent different meanings, one for financial institutes and the other for river side. If both sentences exist in different Web-page
content, then the meaning of the Web-page content needs to be retrieved based on that content.

Example 2: Peter found a bank which is located on the bank of the river. This is a single sentence which represents financial institutions as well as river side. Here, either of the meanings is valid for the sentence. In our approach, we assume that one Web-page has only one meaning. Hence, for this type of situation we assign one of the meanings based on our programming logic.

3. XML Schema
An XML Schema describes the structure of an XML document [5, 6]. The XML Schema language is also referred to as XML Schema Definition (XSD). The purpose of an XML Schema is to define the legal building blocks of an XML document. An XML Schema defines the elements and attributes that can appear in a document [7, 8]. It also expresses data types, and default and fixed values for elements and attributes. One of the greatest strengths of XML Schemas is their support for data types. XML Schemas are also extensible because they are written in XML. An XML Schema holds simple and complex elements [9, 10, 11].

A simple element is an XML element that contains only text. It cannot contain any other elements or attributes. A complex element is an XML element that contains other elements and/or attributes. There are four kinds of complex elements: empty elements, elements that contain only other elements, elements that contain only text, and elements that contain both other elements and text. The <schema> element is the root element of every XML Schema. The <schema> element may contain some attributes [12, 13, 14].

4. Proposed Approach
In our approach, we propose a mechanism which identifies the meaning of Web-page content for those Web-pages that hold dual meaning words. Section 4.1 gives an overview of creating the knowledge based system and section 4.2 depicts our algorithm.

4.1. Knowledge Based System Generation
To create the knowledge based system we have collected dual meaning words from various sources such as the internet and dictionaries. For each dual meaning word, we have created one XML which is linked with the XSD given in Fig. 1. The considered XSD holds both simple and complex types of elements. The 'dualMeaningWordName' attribute holds the dual meaning word name. 'keywords' is a complex element which holds various sets of keywords, classified based on their meaning. 'keyword' is also a complex type element, which holds similar types of key elements with their meaning. 'names' is a complex type element which holds the key element names that represent the same meaning. 'name' and 'meaning' are simple type elements that hold key values and their meaning. Each XML holds a 'dmw_id'. We maintain each dual meaning word with a corresponding 'dmw_id'. Key words are taken from the sentence that holds the dual meaning word. For example, "John is looking for a bank to open a savings account" and "Alex is looking for a bank of the river for a get together" hold the key words 'account' and 'river'. All key word meanings are taken care of while designing the XML. Fig. 2 shows a part of the XML for 'bank'.

Figure 1. A sample XSD

Figure 2. A part of an XML (for bank)

4.2. Algorithm
To identify the Web-page content meaning we use the algorithm given below. This algorithm mainly focuses on identifying the meaning of Web-page content that holds dual meaning words in its
Web-page content. In our approach, we have used a knowledge based system for identifying the meaning of dual meaning words. The knowledge based system stores the information in XML form.

Input  : Web-page content
Output : Meaning of the Web-page content

1. Extract dual meaning words from the Web-page content.
2. Get the count of dual meaning words in the Web-page content.
3. If count = 0 then set isDualMeaningFlag := False and exit.
4. If count = 1 then
   a) Set isDualMeaningFlag := True.
   b) Extract key words in the dual meaning word sentence.
   c) Based on the key word, traverse the XML (knowledge based system) for the dual meaning word.
   d) Retrieve the meaning of that key and store it in a temporary table.
   e) Go to step 6.
5. If count > 1 then
   a) Set isDualMeaningFlag := True.
   b) Select the dual meaning word that occurs most often in the Web-page content.
   c) If multiple dual meaning words have the same number of occurrences, select the dual meaning word which appears first in the Web-page content.
   d) Extract key words in the dual meaning word sentence.
   e) Based on the key word, traverse the XML (knowledge based system) for the dual meaning word.
   f) Retrieve the meaning of that key and store it in a temporary table.
   g) Go to step 6.
6. Choose the meaning from the temporary table whose count is the maximum.
7. End.

5. Experimental Analysis
In this section, we present an experimental study and discuss how to set up our system. Section 5.1 explains our experimental procedure, and section 5.2 shows the experimental results of our system.

5.1. Experimental Procedure
The performance of our system depends on various parameters, and those parameters need to be set up before running our system. The considered parameters are the Web-page repository, the knowledge based system (i.e., the dual meaning word XML with proper meanings), the XML schema, etc. Initially, we created the knowledge based system with the help of the internet and dictionaries. Then we tuned the knowledge based system through our experiments. In our experiment, we take a Web-page from our repository, pass it through our system and check the database for the meaning of that Web-page. If the Web-page holds dual meaning words then the meaning is identified; otherwise 'isDualMeaningFlag' is updated to false.

5.2. Experimental Results
It is very difficult to compare our system with any existing system. Nevertheless, we have produced some data to measure the performance of our proposed system. As a part of the experimental results, we have produced the statistics given in Table 1.

Table 1. Performance Report of Our System

No. of Web-pages    No. of Web-pages    Correct Meaning    No. of XML    Correct Meaning
Taken /             that hold Dual      Identified in      Modified      Identified after
Repository Size     Meaning Words       1st Run                          XML Modification
1000                30                  22                 6             28
2000                50                  43                 5             47
3000                80                  71                 6             76
4000                110                 99                 9             104
5000                140                 127                10            134

6. Conclusion
Web-page content meaning identification is a very difficult job for any system. The human brain can find the meaning easily, but a person would need to go through each and every Web-page content, which is practically impossible. We found that approximately 30%-40% of Web-pages represent a unique meaning; out of those 30%-40%, approximately 8%-10% of Web-pages hold dual meaning words. Hence, we are concentrating on creating the meaning XML for those 8%-10% of Web-pages. We found that approximately 95% of cases were successful in identifying the meaning of Web-page content that holds dual meaning words. Our approach is highly scalable. Suppose we encounter a new pattern and want to support it; then we just introduce the meaning XML and the system will work. We have tested our system by taking the sub-set of Web-pages shown in the experimental results section. In this paper, we have mainly focused on our approach, which will work for large volumes of data.

REFERENCES
[1] C. H. Yu and S. J. Lin, Parallel Crawling and Capturing for On-Line Auction, Lecture Notes
in Computer Science, Springer-Verlag, Berlin, Heidelberg, 5075, 2008, 455-466.
[2] D. Mukhopadhyay, A. Biswas, S. Sinha, A New Approach to Design Domain Specific Ontology Based Web Crawler, 10th International Conference on Information Technology, ICIT 2007 Proceedings, Rourkela, India, IEEE Computer Society Press, California, USA, December 17-20, 2007, 289-291.
[3] W. Willinger, R. Govindan, S. Jamin, V. Paxson and S. Shenker, Scaling phenomena in the Internet, In Proceedings of the National Academy of Sciences, 1999, suppl. 1, 2573-2580.
[4] J. J. Rehmeyer, Mapping a medusa: The Internet spreads its tentacles, Science News, 171, June 2007, 387-388.
[5] M. Murata, D. Lee, M. Mani and K. Kawaguchi, Taxonomy of XML Schema Languages using Formal Language Theory, In ACM Trans. on Internet Technology (TOIT), 5(4), November 2005, 1-45.
[6] I. Stuart, XML Schema, a brief introduction (Internet archived by WayBack Machine, October 26, 2001).
[7] D. Lee and W. W. Chu, Comparative Analysis of Six XML Schema Languages, In ACM SIGMOD Record, 29(3), September 2000, 76-87.
[8] C. Binstock, D. Peterson, M. Smith, M. Wooding and C. Dix, The XML Schema Complete Reference (Published by Addison-Wesley, 2002).
[9] J. Hegewald, F. Naumann and M. Weis, XStruct: efficient schema extraction from multiple and large XML documents, The 22nd International Conference on Data Engineering, ICDE Workshops, IEEE Computer Society, Atlanta, GA, April 3-8, 2006, 81-91.
[10] G. J. Bex, W. Martens, F. Neven and T. Schwentick, Expressiveness of XSDs: from practice to theory, there and back again, In Proceedings of the 14th international World Wide Web Conference, Chiba, Japan, 2005, 712-721.
[11] G. J. Bex, F. Neven, T. Schwentick and K. Tuyls, Inference of concise DTDs from XML data, Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), Seoul, Korea, September 12-15, 2006.
[12] G. J. Bex, F. Neven and J. V. Bussche, DTDs versus XML Schema: a practical study, 7th International Workshop on the Web and Databases, WebDB 2004 Proceedings, Maison de la Chimie, Paris, France, June 17-18, 2004, 79-84.
[13] B. Chidlovskii, Schema extraction from xml: A grammatical inference approach, In Proceedings of the International Workshop on Knowledge Representation Meets Databases (KRDB), 2001.
[14] Y. Papakonstantinou and V. Vianu, DTD Inference for Views of XML Data, In the Proceedings of 19th ACM Symposium on Principles of Database Systems (PODS), Dallas, Texas, USA, 2000, 35-46.
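
Illustrative sketch of Sections 4.1 and 4.2. The paper's Fig. 1 and Fig. 2 are not reproduced in this text, so the XML below is only an assumed layout built from the element names mentioned in Section 4.1 ('dualMeaningWordName', 'dmw_id', 'keywords', 'keyword', 'names', 'name', 'meaning'). The key words other than 'account' and 'river', the function names load_knowledge_base and identify_meaning, and the simple regular-expression tokenizer are all hypothetical choices made for this example; it is not the authors' implementation. Steps 4 and 5 of the algorithm are handled by one code path, since a single dual meaning word is just the case where the most frequent word is the only one.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed knowledge-base entry for 'bank' (layout is a guess, see lead-in).
BANK_XML = """
<dualMeaningWord dualMeaningWordName="bank" dmw_id="1">
  <keywords>
    <keyword>
      <names><name>account</name><name>savings</name></names>
      <meaning>financial institute</meaning>
    </keyword>
    <keyword>
      <names><name>river</name></names>
      <meaning>river side</meaning>
    </keyword>
  </keywords>
</dualMeaningWord>
"""


def load_knowledge_base(xml_documents):
    """Map each dual meaning word to a {key word: meaning} dictionary."""
    kb = {}
    for doc in xml_documents:
        root = ET.fromstring(doc)
        word = root.get("dualMeaningWordName").lower()
        key_to_meaning = {}
        for keyword in root.iter("keyword"):
            meaning = keyword.findtext("meaning").strip()
            for name in keyword.iter("name"):
                key_to_meaning[name.text.strip().lower()] = meaning
        kb[word] = key_to_meaning
    return kb


def identify_meaning(page_text, kb):
    """Return (isDualMeaningFlag, meaning) for the given Web-page text."""
    # Split the page into sentences, then each sentence into lowercase words.
    sentences = [re.findall(r"[a-z]+", s.lower())
                 for s in re.split(r"[.!?]+", page_text)]

    # Steps 1-2: find dual meaning words, count them, remember first position.
    counts, first_seen, position = Counter(), {}, 0
    for tokens in sentences:
        for token in tokens:
            if token in kb:
                counts[token] += 1
                first_seen.setdefault(token, position)
            position += 1

    # Step 3: no dual meaning word on the page.
    if not counts:
        return False, None

    # Steps 4-5: most frequent word; ties go to the earliest occurrence.
    target = max(counts, key=lambda w: (counts[w], -first_seen[w]))

    # Collect meanings of key words found in sentences holding the target word.
    temp_table = Counter()
    for tokens in sentences:
        if target in tokens:
            for token in tokens:
                if token in kb[target]:
                    temp_table[kb[target][token]] += 1

    # Step 6: choose the meaning with the highest count, if any was found.
    meaning = temp_table.most_common(1)[0][0] if temp_table else None
    return True, meaning


kb = load_knowledge_base([BANK_XML])
print(identify_meaning("John is looking for a bank to open a savings account.", kb))
print(identify_meaning("Alex is looking for a bank of the river for a get together.", kb))
```

Under these assumptions, supporting a new dual meaning word only requires passing one more XML document to load_knowledge_base, which mirrors the scalability argument made in the conclusion.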