Building an Intelligent Web:
            Theory and Practice
                 Pawan Lingras
              Saint Mary’s University
                Rajendra Akerkar
  American University of Armenia and SIBER, India
Discipline
  Computer Science
    Research: Complete Book
    Graduate:
      Information Retrieval: Chapters 1, 2, 3, 7 and 8
      Web Mining: Chapters 4 - 8
  Mathematics and Statistics
    Research: Chapters 1 – 8 excluding shaded portion related to implementation.
    Graduate: Chapters 2, 4 – 8 excluding shaded portion related to implementation.
  Management
    Chapters 1 – 8 excluding shaded portion related to mathematics and implementation.
Information Retrieval
1. Create a list of words
2. Remove stop words
3. Stem words
4. Calculate frequency of each stemmed word

Figure 2.1 Transforming text document to a weighted list of keywords
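
The four steps of Figure 2.1 can be sketched in a few lines of Python. This is only an illustration: the stop-word list and the suffix-stripping `naive_stem` function are assumptions made for the example (a real system would use a fuller stop list and a proper stemmer such as Porter's), and the sample sentence is made up.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the book.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on"}

def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def keyword_frequencies(text):
    words = re.findall(r"[a-z]+", text.lower())        # 1. create a list of words
    words = [w for w in words if w not in STOP_WORDS]  # 2. remove stop words
    stems = [naive_stem(w) for w in words]             # 3. stem words
    return Counter(stems)                              # 4. frequency of each stem

freqs = keyword_frequencies("Mining the web: web miners mined the logs and the links")
```

Here "mining" and "mined" collapse to the same stem, which is the point of step 3: the frequency counts describe concepts rather than surface forms.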
Data Mining has emerged as one of the most exciting and dynamic
fields in computing science. The driving force for data mining is
the presence of petabyte-scale online archives that potentially
contain valuable bits of information hidden in them. Commercial
enterprises have been quick to recognize the value of this
concept; consequently, within the span of a few years, the
software market itself for data mining is expected to be in excess
of $10 billion. Data mining refers to a family of techniques used
to detect interesting nuggets of relationships/knowledge in data.
While the theoretical underpinnings of the field have been around
for quite some time (in the form of pattern recognition,
statistics, data analysis and machine learning), the practice and
use of these techniques have been largely ad-hoc. With the
availability of large databases to store, manage and assimilate
data, the new thrust of data mining lies at the intersection of
database systems, artificial intelligence and algorithms that
efficiently analyze data. The distributed nature of several
databases, their size and the high complexity of many techniques
present interesting computational challenges.
Figure 2.43 Relationship between precision and recall
(horizontal axis: Recall; vertical axis: Precision; both run from 0 to 1)
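
A precision/recall curve of this kind can be traced by cutting a ranked result list at every position and measuring both quantities at each cut. The sketch below assumes a made-up ranking and relevance set; the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: four relevant documents and one ranked answer list.
relevant = {"d1", "d2", "d3", "d4"}
ranking = ["d1", "d5", "d2", "d6", "d3", "d7", "d4", "d8"]

# One (precision, recall) point for each cut-off k = 1 .. len(ranking).
curve = [precision_recall(ranking[:k], relevant) for k in range(1, len(ranking) + 1)]
```

In this toy ranking, reaching full recall requires accepting the whole list, at which point precision has dropped to one half.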
Semantic Web
The layer language model
    (Berners-Lee, 2001; Broekstra et al, 2001)
<h1>Student Service Centre</h1>

Welcome to the home page of the Student Service Centre.

The centre is located in the main building of the University.

You may visit us for assistance during working days.

<h2>Office hours</h2>

Mon to Thu 8am - 6pm<br>

Fri 8am - 2pm<p>

But note that centre is not open during the weeks of the

<a href=". . .">State Of Origin</a>.



            Figure 3.2 Example of a Web page of a Student Service Centre
<organization>

     <serviceOffered>Admission</serviceOffered>

     <organizationName>Student Service Centre</organizationName>

     <staff>

        <director>John Roth</director>

        <secretary>Penny Brenner</secretary>

     </staff>

</organization>




            Figure 3.3 Example of a Web page of a Student Service Centre
Figure 3.4 Representing classes and instances (Noy et al., 2001)
root
  college
    lecturer (@name: Edward Bunker)
      course (@title: Algorithms)
      course (@title: Computational Algebra)
    lecturer (@name: Daniela Frost)
      course (@title: Nonlinear Analysis)
    lecturer (@name: Sam Hoofer)
      course (@title: Discrete Structures)
      course (@title: Modern Algebra)
      course (@title: Nonlinear Analysis)
    location: Innsbruck
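
To make the tree concrete, it can be written as an XML document and queried with path expressions. The sketch below uses Python's standard `xml.etree.ElementTree`; the serialization (attributes for `@name` and `@title`, text content for `location`) is an assumption read off the figure, and the three queries are illustrative only, not the numbered queries of the slides.

```python
import xml.etree.ElementTree as ET

# Assumed XML serialization of the college tree from the figure.
COLLEGE_XML = """
<college>
  <lecturer name="Edward Bunker">
    <course title="Algorithms"/>
    <course title="Computational Algebra"/>
  </lecturer>
  <lecturer name="Daniela Frost">
    <course title="Nonlinear Analysis"/>
  </lecturer>
  <lecturer name="Sam Hoofer">
    <course title="Discrete Structures"/>
    <course title="Modern Algebra"/>
    <course title="Nonlinear Analysis"/>
  </lecturer>
  <location>Innsbruck</location>
</college>
"""

college = ET.fromstring(COLLEGE_XML)

# Illustrative query A: names of all lecturers.
lecturer_names = [l.get("name") for l in college.findall("lecturer")]

# Illustrative query B: course titles taught by one lecturer.
hoofer_courses = [
    c.get("title")
    for c in college.findall("lecturer[@name='Sam Hoofer']/course")
]

# Illustrative query C: lecturers who teach a given course.
teaches_nonlinear = [
    l.get("name")
    for l in college.findall("lecturer")
    if l.find("course[@title='Nonlinear Analysis']") is not None
]
```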
Queries 1 and 2
(Figure: the root/college tree with lecturers Edward Bunker, Daniela Frost and Sam Hoofer, and location Innsbruck, as shown above.)
Queries 3 and 4
(Figure: the root/college tree with lecturers Edward Bunker, Daniela Frost and Sam Hoofer, and location Innsbruck, as shown above.)
<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

     xmlns:dc="http://purl.org/dc/elements/1.1/">

  <rdf:Description rdf:about="">

     <dc:title>

             Building an Intelligent Web: Theory and Practice

      </dc:title>

     <dc:creator> Rajendra Akerkar and Pawan Lingras </dc:creator>

  </rdf:Description>

</rdf:RDF>




                             Figure 3.26 Fragment of RDF
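
An RDF description is a set of (subject, predicate, object) statements. As a rough sketch, the fragment in Figure 3.26 can be flattened into such triples with the standard library alone. The helper name `literal_triples` is made up for this example, and it handles only literal-valued properties of `rdf:Description` elements; a full RDF/XML parser covers many more cases.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <dc:title>Building an Intelligent Web: Theory and Practice</dc:title>
    <dc:creator>Rajendra Akerkar and Pawan Lingras</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

def literal_triples(rdf_xml):
    """Return (subject, predicate URI, literal) for each property element."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them into one URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, (prop.text or "").strip()))
    return triples

triples = literal_triples(RDF_XML)
```

The empty `rdf:about=""` subject refers to the document that contains the description.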
An RDF model for automobiles
<?xml version="1.0"?>

<rdf:RDF

  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"

  xmlns:my="http://www.myvehicle.com/vehicle-schema/">



  <rdfs:Class rdf:about="#Vehicle"/>



  <rdfs:Class rdf:about="#Car">

     <rdfs:subClassOf rdf:resource="#Vehicle"/>

  </rdfs:Class>



  <rdf:Property rdf:about="#name">

     <rdfs:domain rdf:resource="#Vehicle"/>

  </rdf:Property>



  <rdf:Description rdf:about="#Ford">

     <rdf:type rdf:resource="#Car"/>

     <my:name>Ford Icon</my:name>

  </rdf:Description>



  <my:Truck rdf:about="#Mitsubishi">

     <my:name>Mitsubishi</my:name>

     <my:carry rdf:resource="#Mitsubishi"/>

  </my:Truck>

</rdf:RDF>




                  Figure 3.29 RDF/XML file for the automobile example
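
What the schema buys is inference: because `Car` is declared a subclass of `Vehicle`, anything typed as a `Car` is also a `Vehicle`. The sketch below applies that single RDFS rule to facts read off Figure 3.29. The short names (without namespaces) and the `infer_types` helper are simplifications introduced for illustration.

```python
# Facts read off Figure 3.29, written as (subject, predicate, object) triples.
TRIPLES = {
    ("Car", "subClassOf", "Vehicle"),
    ("Ford", "type", "Car"),
    ("Mitsubishi", "type", "Truck"),
}

def infer_types(triples):
    """Apply one rule until nothing changes:
    x type C  and  C subClassOf D  =>  x type D."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, p1, c) in list(inferred):
            if p1 != "type":
                continue
            for (c2, p2, d) in list(inferred):
                if p2 == "subClassOf" and c2 == c and (x, "type", d) not in inferred:
                    inferred.add((x, "type", d))
                    changed = True
    return inferred

closure = infer_types(TRIPLES)
```

Under these facts `Ford` becomes a `Vehicle`, while `Mitsubishi` does not, because the figure never declares `Truck` to be a subclass of `Vehicle`.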
<?xml version="1.0"?>

<topicMap id="tmrf"

            xmlns       = 'http://www.topicmaps.org/xtm/1.0/'

            xmlns:xlink = 'http://www.w3.org/1999/xlink'>

<!--

       The map contains information about Technomathematics Research Foundation.

       We can include comment and narrative here…

-->

.... here my topics and my associations go ...

</topicMap>




Figure 3.30 A Topic Map document
(Adapted from http://topicmaps.bond.edu.au/docs/6/1)
Classification and Association
Data Preparation

•   Database Theory
•   SQL
•   Data Transformation
•   http://www.ecn.purdue.edu/KDDCUP/data/
Classification
• Find a rule, a formula, or black box classifier for
  organizing data into classes.
   – Classify clients requesting loans into categories
     based on the likelihood of repayment
   – Classify customers into Big or Moderate Spenders
     based on what they buy
   – Classify the customers into loyal, semi-loyal,
     infrequent based on the products they buy
• The classifier is developed from the data in the
  training set
• The reliability of the classifier is evaluated using
  the test set of data
Classification
• ID3 Algorithm
  – Numerical Illustration
  – Application to a Small E-commerce Dataset
• C4.5 for Experimentation
• Other approaches
  – Neural Networks
  – Fuzzy Classification
  – Rough Set Theory
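
The heart of ID3 is choosing, at each node, the attribute with the largest information gain. A minimal sketch of that selection step follows; the four-row customer table and its attribute names are hypothetical, and a full ID3 implementation would recurse on each resulting subset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy training set: classify customers as big or moderate spenders.
ROWS = [
    {"visits": "many", "region": "east", "spender": "big"},
    {"visits": "many", "region": "west", "spender": "big"},
    {"visits": "few", "region": "east", "spender": "moderate"},
    {"visits": "few", "region": "west", "spender": "moderate"},
]

# ID3 would split the root on the attribute with the highest gain.
best = max(["visits", "region"], key=lambda a: information_gain(ROWS, a, "spender"))
```

In this toy table `visits` separates the classes perfectly (gain of one bit) and `region` tells nothing (gain of zero).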
Association
• Market basket analysis
  – determine which things go together
• Transactions might reveal that
  – customers who buy bananas also buy candles
  – cheese and pickled onions seem to occur frequently
    in a shopping cart
• Information can be used for
  – arranging a physical shop or structuring the Web site
  – running targeted advertising campaigns
Association

• Apriori Algorithm
• Demonstration for an E-commerce
  Application
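
A compact sketch of the Apriori idea: find frequent single items, join them into larger candidates, and prune any candidate that has an infrequent subset before counting it. The baskets and the 50% support threshold below are invented for illustration, and the code computes frequent itemsets only (rule generation is left out).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical shopping baskets.
BASKETS = [
    {"cheese", "pickled onions", "bread"},
    {"cheese", "pickled onions"},
    {"cheese", "bread"},
    {"bananas", "candles"},
]
frequent = apriori(BASKETS, min_support=0.5)
```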
Clustering
• Breaks a large database into different
  subgroups or clusters
• Unlike classification there are no
  predefined classes
• The clusters are put together on the basis
  of similarity to each other
• The data miners determine whether the
  clusters offer any useful insight
Statistical Methods

•   k-means
    – Numerical Example
    – Implementation
      •   Data Preparation
      •   Clustering
•   Other Methods
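
As a small numerical illustration of k-means, the sketch below alternates the two standard steps (assign each point to its nearest centroid, then move each centroid to the mean of its points) until the centroids stop changing. The six 2-D points and the starting centroids are made up for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points; initial centroids are supplied by the caller."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two visibly separate groups of points.
POINTS = [(1, 1), (1, 2), (2, 1), (4, 4), (4, 5), (5, 4)]
centroids, clusters = kmeans(POINTS, centroids=[(1, 1), (5, 4)])
```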
Neural Network Based Approaches


• Kohonen Self Organising Maps
  – Numerical Demonstration
  – Application to Web Data Collection
• Other Neural Network Based Approaches
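
The Kohonen update rule can be shown on a tiny one-dimensional map: for each input, find the closest unit (the winner) and pull it and its grid neighbours toward the input. This is a sketch only; the four starting weight vectors, the linearly decaying learning rate and the fixed neighbourhood radius are assumptions chosen to keep the example short.

```python
def best_unit(x, weights):
    """Index of the unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - xi) ** 2 for w, xi in zip(weights[j], x)),
    )

def train_som(data, weights, epochs=20, rate=0.5, radius=1):
    """1-D Kohonen map: move the winner and its grid neighbours toward each input."""
    weights = [list(w) for w in weights]
    for epoch in range(epochs):
        lr = rate * (1 - epoch / epochs)  # assumed linear decay of the learning rate
        for x in data:
            winner = best_unit(x, weights)
            for j in range(len(weights)):
                if abs(j - winner) <= radius:  # neighbourhood on the unit grid
                    weights[j] = [w + lr * (xi - w) for w, xi in zip(weights[j], x)]
    return weights

# Hypothetical inputs and starting weights for a map with four units in a row.
DATA = [(0.0, 0.0), (1.0, 1.0)]
START = [(0.1, 0.1), (0.4, 0.4), (0.6, 0.6), (0.9, 0.9)]
trained = train_som(DATA, START)
```

After training, the two inputs are won by units at opposite ends of the map, with neighbouring units dragged along toward the same input.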
Clustering of customers
Web Mining
  Web Content Mining
    Web Page Content Mining
    Search Result Mining
  Web Structure Mining
  Web Usage Mining
    General Access Pattern Tracking
    Customized Usage Tracking
Web Usage Mining
High level web usage mining process
       (Srivastava et al., 2000)
Applications of web usage mining
 (Romanko, 2006; Srivastava et al., 2000)
140.14.6.11 - pawan [06/Sep/2001:10:46:07 -0300] "GET /s.htm HTTP/1.0" 200 2267


140.14.7.18 - raj [06/Sep/2001:11:23:53 -0300] "POST /s.cgi HTTP/1.0" 200 499
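
Usage mining starts by turning log lines like these into structured records. The sketch below assumes the lines follow the Common Log Format (host, identity, user, timestamp, request, status, bytes); the regular expression and field names are choices made for this example.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

LINES = [
    '140.14.6.11 - pawan [06/Sep/2001:10:46:07 -0300] "GET /s.htm HTTP/1.0" 200 2267',
    '140.14.7.18 - raj [06/Sep/2001:11:23:53 -0300] "POST /s.cgi HTTP/1.0" 200 499',
]
entries = [parse_line(line) for line in LINES]
```

Once parsed, records can be grouped by user or host into sessions, which is the input that clustering, classification and association mining of usage data work on.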
Clustering exercise
Classification exercise

                  Channel      Recall   Precision
                  Finance      44.3%    98.27%
                  Health       52.3%    89.66%
                  Market       49.1%    83.34%
                  News         44.1%    89.27%
                  Shopping     31.5%    91.31%
                  Specials     60.2%    92.86%
                  Sport        50.0%    91.93%
                  Surveys      21.9%    92.66%
                  Theatre      54.8%    94.63%

Table 6.8 Precision and recall for predicting user’s interest in channels
                           (Baglioni, et al., 2003)
Association exercise


     News Section    Minimum    Maximum    Mean       Standard
                     Requests   Requests   Requests   Deviation
     Science               1         97     2.3034     2.8184
     Culture               1        208     3.7878     5.9742
     Sports                1        318     5.6985    10.8360
     Economics             1        258     3.9335     7.2341
     International         1        208     3.3823     5.5540
     Local Lisbon          1        460     5.6883    11.5650
     Local Port            1        256     7.5984    13.2351
     Politics              1        208     3.3577     5.4101
     Society               1        367     4.2673     7.9853
     Education             1         90     2.6496     3.29090
Table 6.9 Summary statistics of requests to the Publico on-line newspaper
                        (Batista and Silva, 2002)
The association mining showed strong associations between the following pairs:

•   Politics and Society
•   Politics and International News
•   Politics and Sports
•   Society and International News
•   Society and Local Lisbon
•   Society and Sports
•   Society and Culture
•   Sports and International News
Sequence Pattern Analysis of Web Logs
Web Content Mining
Data Collection

•   Web Crawlers
•   Public Domain Web Crawlers
•   An Implementation of a Web Crawler
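
The core loop of a crawler is a breadth-first traversal: fetch a page, extract its links, and queue the ones not yet seen. The sketch below keeps that loop separate from the network by taking a `fetch` function as a parameter; here it is fed from a small in-memory dictionary of made-up pages, whereas a real crawler would fetch over HTTP, resolve relative URLs and respect robots.txt.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML text or None."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

# Hypothetical in-memory "Web" standing in for real HTTP requests.
PAGES = {
    "/index": '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "/b": '<a href="/index">home</a>',
    "/c": "no links here",
}
visited = crawl("/index", PAGES.get)
```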
Architecture of a search engine
        (Romanko, 2006)
Other topics in Web Content Mining
•   Search Engines
    – How to prepare for and set up a search
      engine
    – Types and listings of search engines
      (freeware, remote hosting services,
      commercial)
•   Multimedia Information Retrieval
Web Structure Mining
0/10:    The site or page is probably new.

3/10:    The site is perhaps new, small in size and has very little or no worthwhile
         arriving links. The page gets very little traffic.

5/10:    The site has a fair amount of worthwhile arriving links and traffic volume. The
         site might be larger in size and gets a good amount of steady traffic with some
         return visitors.

8/10:    The site has many arriving links, probably from other high PageRank pages.
         The site perhaps contains a lot of information and has a higher traffic flow and
         return visitor rate.

10/10:   The Web site is large, popular and has an extremely high number of links
         pointing to it.
http://www.iprcom.com/papers/pagerank/
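
Behind such a scale is a link-based score. The following is a minimal sketch of the basic PageRank iteration, not of how the 0 to 10 values above are derived: each page repeatedly shares its current rank among the pages it links to. The four-page link graph, the damping factor of 0.85 and the fixed iteration count are assumptions for illustration, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            targets = outgoing or pages  # a page with no links spreads rank evenly
            share = damping * rank[page] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical link graph: C is linked to by A, B and D.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
```

Page D, which nothing links to, ends with only the baseline share, while C, with the most arriving links, ends highest.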
Index quality for different search engines
         (Henzinger, et al., 1999)
Index quality per page for different search engines

              (Henzinger, et al., 1999)
Page                                               Freq.     Freq.     Rank
                                                   Walk2     Walk1     Walk1

www.microsoft.com/                                  3172      1600        1
www.microsoft.com/windows/ie/default.htm            2064      1045        3
www.netscape.com/                                   1991       876        6
www.microsoft.com/ie/                               1982      1017        4
www.microsoft.com/windows/ie/download/              1915       943        5
www.microsoft.com/windows/ie/download/all.htm       1696       830        7
www.adobe.com/prodindex/acrobat/readstep.html       1634       780        8
home.netscape.com/                                  1581       695       10
www.linkexchange.com/                               1574       763        9
www.yahoo.com/                                      1527      1132        2

     Table 8.2 Most frequently visited pages (Henzinger, et al., 1999)
Site                             Frequency      Frequency         Rank
                                  Walk 2         Walk 1          Walk 1

www.microsoft.com                    32452          16917                1
home.netscape.com                    23329          11084                2
www.adobe.com                        10884           5539                3
www.amazon.com                       10146           5182                4
www.netscape.com                      4862           2307               10
excite.netscape.com                   4714           2372                9
www.real.com                          4494           2777                5
www.lycos.com                         4448           2645                6
www.zdnet.com                         4038           2562                8
www.linkexchange.com                  3738           1940               12
www.yahoo.com                         3461           2595                7

    Table 8.3 Most frequently visited hosts (Henzinger, et al., 1999)

_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 

Building an Intelligent Web: Theory & Practice

  • 1. Building an Intelligent Web: Theory and Practice. Pawan Lingras, Saint Mary’s University; Rajendra Akerkar, American University of Armenia and SIBER, India
  • 2.
  • 3. Suggested reading by discipline. Computer Science: Research, Complete Book; Graduate, Information Retrieval (Chapters 1, 2, 3, 7 and 8) or Web Mining (Chapters 4 - 8). Mathematics and Statistics: Research, Chapters 1 – 8 excluding shaded portion related to implementation; Graduate, Chapters 2, 4 – 8 excluding shaded portion related to implementation. Management: Chapters 1 – 8 excluding shaded portion related to mathematics and implementation.
  • 5.
  • 6. Create a list of words → Remove stop words → Stem words → Calculate frequency of each stemmed word. Figure 2.1 Transforming text document to a weighted list of keywords
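The four steps of Figure 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical one, and `crude_stem` is a naive suffix stripper standing in for a real stemmer, not the method used in the book.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for"}


def crude_stem(word: str) -> str:
    """Strip a few common suffixes (a naive stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def weighted_keywords(text: str) -> Counter:
    words = re.findall(r"[a-z]+", text.lower())        # 1. create a list of words
    words = [w for w in words if w not in STOP_WORDS]  # 2. remove stop words
    stems = [crude_stem(w) for w in words]             # 3. stem words
    return Counter(stems)                              # 4. frequency of each stem


keywords = weighted_keywords("Mining the web: web miners mined the mining logs")
print(keywords.most_common(3))
```

The resulting counter maps each stem to its frequency, which is the "weighted list of keywords" the figure refers to.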
  • 7.
  • 8. Data Mining has emerged as one of the most exciting and dynamic fields in computing science. The driving force for data mining is the presence of petabyte-scale online archives that potentially contain valuable bits of information hidden in them. Commercial enterprises have been quick to recognize the value of this concept; consequently, within the span of a few years, the software market itself for data mining is expected to be in excess of $10 billion. Data mining refers to a family of techniques used to detect interesting nuggets of relationships/knowledge in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad hoc. With the availability of large databases to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that efficiently analyze data. The distributed nature of several databases, their size and the high complexity of many techniques present interesting computational challenges.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16. Figure 2.43 Relationship between precision and recall (plot of Precision against Recall, both axes running from 0 to 1)
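To make the two axes of Figure 2.43 concrete, the sketch below computes precision and recall for growing prefixes of a ranked result list. The document identifiers and the ranking are hypothetical; only the standard definitions (precision = hits / retrieved, recall = hits / relevant) are assumed.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one set of retrieved documents."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgements and ranked output of a search system.
relevant = {"d1", "d2", "d3", "d4"}
ranking = ["d1", "d5", "d2", "d6", "d3", "d4"]

# Retrieving more documents raises recall but tends to lower precision.
curve = [precision_recall(set(ranking[:k]), relevant)
         for k in range(1, len(ranking) + 1)]
for k, (p, r) in enumerate(curve, start=1):
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```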
  • 17.
  • 19. Semantic Web: The layer language model (Berners-Lee, 2001; Broekstra et al., 2001)
  • 20.
      <h1>Student Service Centre</h1>
      Welcome to the home page of the Student Service Centre.
      The centre is located in the main building of the University.
      You may visit us for assistance during working days.
      <h2>Office hours</h2>
      Mon to Thu 8am - 6pm<br>
      Fri 8am - 2pm<p>
      But note that centre is not open during the weeks of the
      <a href=". . .">State Of Origin</a>.
      Figure 3.2 Example of a Web page of a Student Service Centre
  • 21.
      <organization>
        <serviceOffered>Admission</serviceOffered>
        <organizationName>Student Service Centre</organizationName>
        <staff>
          <director>John Roth</director>
          <secretary>Penny Brenner</secretary>
        </staff>
      </organization>
      Figure 3.3 Example of a Web page of a Student Service Centre
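The point of the tagged version in Figure 3.3 is that a program can pick out individual facts without guessing from layout. A minimal sketch using Python's standard `xml.etree.ElementTree` module follows; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The markup of Figure 3.3, reproduced as a string.
ORG_XML = """<organization>
  <serviceOffered>Admission</serviceOffered>
  <organizationName>Student Service Centre</organizationName>
  <staff>
    <director>John Roth</director>
    <secretary>Penny Brenner</secretary>
  </staff>
</organization>"""

org = ET.fromstring(ORG_XML)

# Each fact is addressed by its tag rather than by its position on the page.
name = org.findtext("organizationName")
director = org.findtext("staff/director")
staff_roles = {member.tag: member.text for member in org.find("staff")}

print(name, "| director:", director, "| staff:", staff_roles)
```

The same questions are hard to answer reliably from the HTML of Figure 3.2, where the names would only appear as undifferentiated text.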
  • 22. Figure 3.4 Representing classes and instances (Noy et al., 2001)
  • 23.
  • 24. XML tree (diagram): root → college, with location Innsbruck and three lecturers identified by @name, each with courses identified by @title: Edward Bunker (Algorithms, Computational Algebra); Daniela Frost (Nonlinear Analysis); Sam Hoofer (Discrete Structures, Modern Algebra, Nonlinear Analysis)
  • 25.
  • 26. Queries 1 and 2, shown on the college tree of slide 24 (root → college → lecturer @name, course @title; location Innsbruck)
  • 27. Queries 3 and 4, shown on the college tree of slide 24 (root → college → lecturer @name, course @title; location Innsbruck)
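The text of Queries 1 to 4 is not part of this transcript, so the sketch below does not reproduce them. It only illustrates the kind of path query such a tree supports, using the limited XPath subset of Python's `xml.etree.ElementTree`. The XML encoding (`name` and `title` as attributes, `location` as a child element) and the grouping of courses under lecturers are assumptions read off the diagram.

```python
import xml.etree.ElementTree as ET

# Assumed XML encoding of the college tree; not taken verbatim from the book.
COLLEGE_XML = """<root>
  <college>
    <lecturer name="Edward Bunker">
      <course title="Algorithms"/>
      <course title="Computational Algebra"/>
    </lecturer>
    <lecturer name="Daniela Frost">
      <course title="Nonlinear Analysis"/>
    </lecturer>
    <lecturer name="Sam Hoofer">
      <course title="Discrete Structures"/>
      <course title="Modern Algebra"/>
      <course title="Nonlinear Analysis"/>
    </lecturer>
    <location>Innsbruck</location>
  </college>
</root>"""

root = ET.fromstring(COLLEGE_XML)

# Hypothetical query A: titles of all courses anywhere in the tree.
all_titles = [course.get("title") for course in root.findall(".//course")]

# Hypothetical query B: lecturers who teach "Nonlinear Analysis".
teachers = [
    lecturer.get("name")
    for lecturer in root.iter("lecturer")
    if lecturer.find("course[@title='Nonlinear Analysis']") is not None
]

# Hypothetical query C: the location of the college.
location = root.findtext("college/location")

print(all_titles, teachers, location)
```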
  • 28.
  • 29.
  • 30.
  • 31.
      <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description rdf:about="">
          <dc:title> Building an Intelligent Web: Theory and Practice </dc:title>
          <dc:creator> Rajendra Akerkar and Pawan Lingras </dc:creator>
        </rdf:Description>
      </rdf:RDF>
      Figure 3.26 Fragment of RDF
  • 32. An RDF model for automobiles
  • 33.
      <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
               xmlns:my="http://www.myvehicle.com/vehicle-schema/">
        <rdfs:Class rdf:about="#Vehicle"/>
        <rdfs:Class rdf:about="#Car">
          <rdfs:subClassOf rdf:resource="#Vehicle"/>
        </rdfs:Class>
        <rdf:Property rdf:about="#name">
          <rdfs:domain rdf:resource="#Vehicle"/>
        </rdf:Property>
        <rdf:Description rdf:about="#Ford">
          <rdf:type rdf:resource="#Car"/>
          <my:name>Ford Icon</my:name>
        </rdf:Description>
        <my:Truck rdf:about="#Mitsubishi">
          <my:name>Mitsubishi</my:name>
          <my:carry rdf:resource="#Mitsubishi"/>
        </my:Truck>
      </rdf:RDF>
      Figure 3.29 RDF/XML file for the automobile example
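One consequence of the schema statements in Figure 3.29 is that class membership can be inferred: `#Ford` is typed as a `Car`, and `Car` is a subclass of `Vehicle`, so `#Ford` is also a `Vehicle`. The sketch below illustrates that single RDFS rule over hand-written triples. It is plain Python rather than an RDF library, and the shortened names (`Ford`, `Car`, and so on) are a simplification of the URIs in the figure.

```python
# A few statements of Figure 3.29 written as (subject, predicate, object) triples.
triples = {
    ("Car", "rdfs:subClassOf", "Vehicle"),
    ("Ford", "rdf:type", "Car"),
    ("Mitsubishi", "rdf:type", "Truck"),
}


def infer_types(statements: set) -> set:
    """Apply 'x type C' and 'C subClassOf D' implies 'x type D' until nothing new appears."""
    inferred = set(statements)
    changed = True
    while changed:
        changed = False
        for subject, predicate, cls in list(inferred):
            if predicate != "rdf:type":
                continue
            for sub, predicate2, sup in list(inferred):
                if predicate2 == "rdfs:subClassOf" and sub == cls:
                    new = (subject, "rdf:type", sup)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


closure = infer_types(triples)
print(sorted(closure - triples))  # only the newly inferred statements
```

Note that nothing is inferred for `Mitsubishi`, because the figure never declares `Truck` to be a subclass of `Vehicle`.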
  • 34. <?xml version="1.0"?> <topicMap id="tmrf" xmlns = 'http://www.topicmaps.org/xtm/1.0/' xmlns:xlink = 'http://www.w3.org/1999/xlink'> <!-- The map contains information about Technomathematics Research Foundation. We can include comment and narrative here… --> .... here my topics and my associations go ... </topicMap> Figure 3.30 A Topic Map document (Adopted from http://topicmaps.bond.edu.au/docs/6/1)
  • 36. Data Preparation • Database Theory • SQL • Data Transformation • http://www.ecn.purdue.edu/KDDCUP/data/
  • 37. Classification • Find a rule, a formula, or black box classifier for organizing data into classes. – Classify clients requesting loans into categories based on the likelihood of repayment – Classify customers into Big or Moderate Spenders based on what they buy – Classify the customers into loyal, semi-loyal, infrequent based on the products they buy • The classifier is developed from the data in the training set • The reliability of the classifier is evaluated using the test set of data
  • 38. Classification • ID3 Algorithm – Numerical Illustration – Application to a Small E-commerce Dataset • C4.5 for Experimentation • Other approaches – Neural Networks – Fuzzy Classification – Rough Set Theory
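As a small numerical illustration of ID3, the sketch below shows only its attribute-selection step: computing entropy and information gain and choosing the attribute with the highest gain as the root of the tree. The four-row dataset, its attribute names and its class labels are hypothetical and are not the e-commerce dataset used in the book.

```python
from collections import Counter
from math import log2


def entropy(labels: list) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows: list, attribute: str, target: str) -> float:
    """Reduction in entropy of `target` obtained by splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data: does a customer turn out to be a big or moderate spender?
rows = [
    {"visits": "many", "member": "yes", "spender": "big"},
    {"visits": "many", "member": "no", "spender": "big"},
    {"visits": "few", "member": "yes", "spender": "moderate"},
    {"visits": "few", "member": "no", "spender": "moderate"},
]

gains = {a: information_gain(rows, a, "spender") for a in ("visits", "member")}
best = max(gains, key=gains.get)
print(gains, "-> split on", best)
```

ID3 would then repeat the same selection recursively on each subset until the subsets are pure or no attributes remain.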
  • 39. Association • Market basket analysis – determine which things go together • Transactions might reveal that – customers who buy banana also buy candles – cheese and pickled onions seem to occur frequently in a shopping cart • Information can be used for – arranging a physical shop or structuring the Web site – for targeted advertising campaign
  • 40. Association • Apriori Algorithm • Demonstration for an E-commerce Application
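The level-wise idea behind Apriori can be sketched briefly: an itemset is only considered if all of its smaller subsets are already known to be frequent. The baskets below are hypothetical (they reuse the item names from the market basket slide), and the function is a compact teaching version rather than an optimised implementation.

```python
from itertools import combinations


def apriori(transactions: list, min_support: float) -> dict:
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset: frozenset) -> float:
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"banana", "candles"},
    {"banana", "candles", "cheese"},
    {"cheese", "pickled onions"},
    {"cheese", "pickled onions", "banana"},
]
frequent = apriori(baskets, min_support=0.5)

# Confidence of the rule banana -> candles = support(both) / support(banana).
pair = frozenset({"banana", "candles"})
confidence = frequent[pair] / frequent[frozenset({"banana"})]
print(len(frequent), "frequent itemsets; confidence(banana -> candles) =", round(confidence, 3))
```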
  • 42. Clustering • Breaks a large database into different subgroups or clusters • Unlike classification there are no predefined classes • The clusters are put together on the basis of similarity to each other • The data miners determine whether the clusters offer any useful insight
  • 43. [Scatter plot; both axes run from 0 to 5]
  • 44. Statistical Methods • k-means – Numerical Example – Implementation • Data Preparation • Clustering • Other Methods
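A small numerical example of k-means is sketched below. The six two-dimensional points and the two starting centroids are hypothetical; they are chosen to lie in a 0 to 5 range so that the two groups are easy to see. Starting centroids are passed in explicitly so the run is deterministic.

```python
def kmeans(points: list, centroids: list, iterations: int = 10):
    """Plain k-means on 2-D points: assign to nearest centroid, then recompute means."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data: two visually separate groups.
points = [(1, 1), (1, 2), (2, 1), (4, 4), (4, 5), (5, 4)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (5, 5)])
print(centroids)
print(clusters)
```

In practice the number of clusters k and the starting centroids have to be chosen by the analyst, and different starting points can lead to different clusterings.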
  • 45. Neural Network Based Approaches • Kohonen Self Organising Maps – Numerical Demonstration – Application to Web Data Collection • Other Neural Network Based Approaches
  • 47. Web Mining • Web Content Mining – Web Page Content Mining – Search Result Mining • Web Structure Mining • Web Usage Mining – General Access Pattern Tracking – Customized Usage Tracking
  • 49. High level web usage mining process (Srivastava et al., 2000)
  • 50. Applications of web usage mining (Romanko, 2006; Srivastava et al., 2000)
  • 51. 140.14.6.11 - pawan [06/Sep/2001:10:46:07 -0300] "GET /s.htm HTTP/1.0" 200 2267 140.14.7.18 - raj [06/Sep/2001:11:23:53 -0300] "POST /s.cgi HTTP/1.0" 200 499
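Web usage mining starts by turning log lines like the two above into structured records. The sketch below parses them with a regular expression; the field names (`host`, `ident`, `user`, and so on) are labels chosen here for illustration, and the pattern is only assumed to fit entries laid out like these two examples.

```python
import re
from datetime import datetime

# One named group per field of the log entries shown on the slide.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line: str):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


log_lines = [
    '140.14.6.11 - pawan [06/Sep/2001:10:46:07 -0300] "GET /s.htm HTTP/1.0" 200 2267',
    '140.14.7.18 - raj [06/Sep/2001:11:23:53 -0300] "POST /s.cgi HTTP/1.0" 200 499',
]
entries = [parse_line(line) for line in log_lines]
for e in entries:
    print(e["user"], e["method"], e["url"], e["status"], e["bytes"], e["time"].isoformat())
```

Once in this form, the records can be grouped by host or user into sessions before any pattern mining is applied.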
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.
  • 68.
  • 69.
  • 70. Classification exercise
      Channel     Recall    Precision
      Finance     44.3%     98.27%
      Health      52.3%     89.66%
      Market      49.1%     83.34%
      News        44.1%     89.27%
      Shopping    31.5%     91.31%
      Specials    60.2%     92.86%
      Sport       50.0%     91.93%
      Surveys     21.9%     92.66%
      Theatre     54.8%     94.63%
      Table 6.8 Precision and recall for predicting user’s interest in channels (Baglioni, et al., 2003)
  • 71. Association exercise
      News Section     Minimum Requests   Maximum Requests   Mean Requests   Standard Deviation
      Science          1                  97                 2.3034          2.8184
      Culture          1                  208                3.7878          5.9742
      Sports           1                  318                5.6985          10.8360
      Economics        1                  258                3.9335          7.2341
      International    1                  208                3.3823          5.5540
      Local Lisbon     1                  460                5.6883          11.5650
      Local Port       1                  256                7.5984          13.2351
      Politics         1                  208                3.3577          5.4101
      Society          1                  367                4.2673          7.9853
      Education        1                  90                 2.6496          3.29090
      Table 6.9 Summary statistics of requests to the Publico on-line newspaper (Batista and Silva, 2002)
  • 72. The association mining showed strong associations between the following pairs: • Politics and Society • Politics and International News • Politics and Sports • Society and International News • Society and Local Lisbon • Society and Sports • Society and Culture • Sports and International News
  • 73. Sequence Pattern Analysis of Web Logs
  • 74.
  • 75.
  • 76.
  • 78. Data Collection • Web Crawlers • Public Domain Web Crawlers • An Implementation of a Web Crawler
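The core loop of a web crawler (fetch a page, extract its links, queue the unseen ones) can be sketched without any network access. In the example below, `FAKE_WEB` is a hypothetical in-memory stand-in for HTTP fetching, and the crawler is a simplified breadth-first illustration, not the implementation described in the book. A real crawler would also resolve relative URLs, respect robots.txt and rate-limit its requests.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical pages standing in for documents fetched over HTTP.
FAKE_WEB = {
    "/index.html": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "/a.html": '<a href="/b.html">B</a>',
    "/b.html": '<a href="/index.html">home</a> <a href="/missing.html">?</a>',
}


def crawl(start: str, fetch, max_pages: int = 100) -> list:
    """Breadth-first crawl; `fetch(url)` returns HTML text or None on failure."""
    visited, seen, queue = [], {start}, deque([start])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:          # unreachable page: skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("/index.html", FAKE_WEB.get)
print(order)
```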
  • 79. Architecture of a search engine (Romanko, 2006)
  • 80.
  • 81.
  • 82.
  • 83. Other topics in Web Content Mining • Search Engines – How to prepare for and setup a search engine – Types and listings of search engines (freeware, remote hosting services, commercial) • Multimedia Information Retrieval
  • 85. 0/10: The site or page is probably new. 3/10: The site is perhaps new, small in size and has very little or no worthwhile arriving links. The page gets very little traffic. 5/10: The site has a fair amount of worthwhile arriving links and traffic volume. The site might be larger in size and gets a good amount of steady traffic with some return visitors. 8/10: The site has many arriving links, probably from other high PageRank pages. The site perhaps contains a lot of information and has a higher traffic flow and return visitor rate. 10/10: The Web site is large, popular and has an extremely high number of links pointing to it.
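The scores above summarise how many links, and from how important pages, point to a site. A minimal sketch of the underlying PageRank computation by power iteration follows. The four-page link graph is hypothetical, the damping factor of 0.85 is the commonly quoted value, and the mapping from raw PageRank to a 0 to 10 score is not given on the slide, so it is not attempted here.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page shares its rank equally among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: C has the most incoming links, D has none.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page C, which receives links from three pages, ends up with the highest rank, while D, which nobody links to, keeps only the baseline share.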
  • 87.
  • 88.
  • 89.
  • 90.
  • 91. Index quality for different search engines (Henzinger, et al., 1999)
  • 92. Index quality per page for different search engines (Henzinger, et al., 1999)
  • 93.
      Page                                              Freq. Walk 2   Freq. Walk 1   Rank Walk 1
      www.microsoft.com/                                3172           1600           1
      www.microsoft.com/windows/ie/default.htm          2064           1045           3
      www.netscape.com/                                 1991           876            6
      www.microsoft.com/ie/                             1982           1017           4
      www.microsoft.com/windows/ie/download/            1915           943            5
      www.microsoft.com/windows/ie/download/all.htm     1696           830            7
      www.adobe.com/prodindex/acrobat/readstep.html     1634           780            8
      home.netscape.com/                                1581           695            10
      www.linkexchange.com/                             1574           763            9
      www.yahoo.com/                                    1527           1132           2
      Table 8.2 Most frequently visited pages (Henzinger, et al., 1999)
  • 94.
      Site                    Frequency Walk 2   Frequency Walk 1   Rank Walk 1
      www.microsoft.com       32452              16917              1
      home.netscape.com       23329              11084              2
      www.adobe.com           10884              5539               3
      www.amazon.com          10146              5182               4
      www.netscape.com        4862               2307               10
      excite.netscape.com     4714               2372               9
      www.real.com            4494               2777               5
      www.lycos.com           4448               2645               6
      www.zdnet.com           4038               2562               8
      www.linkexchange.com    3738               1940               12
      www.yahoo.com           3461               2595               7
      Table 8.3 Most frequently visited hosts (Henzinger, et al., 1999)