Evolving Web, Evolving Search
                   Yuan Tian & Tianqi Chen
    Apex Data & Knowledge Management Lab
               Shanghai Jiao Tong University
Agenda

   • Introduction to SJTU
   • Introduction to Apex Lab
   • Research
   • Demo
Shanghai Jiao Tong University

   • Location
   • History
   • Student
   • Campus
Apex Lab
              Director Professor
                  Yong Yu

              Associate Professor
                  Guirong Xue
Apex Lab

   • Research
       • Web Search
       • Social Web
       • Semantic Search
       • Machine Learning
       • Image Search
Apex Lab

   • Project Partners
Apex Lab

   • Ph.D. Students
       • Haofen Wang
       • Jing Lu
       • Jia Chen
       • Guangcan Liu
       • Xian Wu
       • Yunbo Cao
       • Ruihua Song
   • 35 Master Students
Research

   • Traditional Web
   • Social Web
   • Semantic Web
   • Machine Learning
Search on Traditional Web

   • Focus on how to
       • improve search relevance? rank pages?
       • integrate mining technologies into search?
       • search finer grained objects instead of documents?

   • Search Applications
       • General search engine
       • Vertical search engine
       • Meta search engine
Expert Search
Expert Search (introduction)

   • Treat web page as bag of words
   • Queries are not fully understood
Expert Search (motivation)

Searching for Experts:
• An increasingly important information need

•   PM searches for Dev
•   Patient searches for Doctor
•   Student searches for Professor
•   ……

• Not only in Enterprise
• But also on WWW
Query -> Ranked List of Experts

Evidence: an expert and a query co-occur in a document under
a certain relation constraint
The Emergence of Web 2.0
   • Web gets social

              Web 1.0             ->         Web 2.0
              Publishing          ->        Participation
          Personal Websites       ->          Blogging
    Content Management Systems    ->           Wikis
           Britannica Online      ->         Wikipedia
        Directories (taxonomy)    -> Tagging ("folksonomy")

       • Lower the barrier for contribution.
       • More people are involved. They are less professional.
Search on Web 2.0

   • Focus on how to
       • make use of user-contributed data?
       • search on new social media?
Deegle

   • (WWW 2006, WWW 2007, SIGIR 2008)
[Screenshot panels: Related facets, Search results, Related tags, Related users]
Emotion Analysis on the Blog

   • A blog can be a source of news, but also
     a stage for expressing emotion

   • Enhance blog search for different users
Blog Search

Informative article
 • News that is similar to the news on traditional
   news websites
 • Technical descriptions, e.g. programming
   techniques
 • Commonsense knowledge
 • Objective comments on the events in the world
Affective article
 • Diaries about personal affairs
 • Self-feelings or self-emotions descriptions
Two types of blog
Intent-driven blog search (WWW 2007)
      Informative                               Snippets
        Sense
1        1.00      The catalogue of IBM certification: DB2 Database
                   Administrator DB2 Application Developer MQSeries
                   Engineer VisualAge For Java …
2       -0.94      Crazy Me! I have hesitated between Acer and smuggled
                   IBM for one week. I wouldn’t have taken into account
                   the price, quality or service if I had enough money …
3        1.00      Selling IBM laptop, t22p3-900, , dvd S3/, independent
                   accelerating display card. 3550 YUAN. (Post fee not
                   included). Please contact 30316255. We guarantee the
                   quality. This product is only sold within Tianjing city ...
4       -0.35      I got a laptop from my friend this week. Although
                   outdated, it is still a classical one in IBM enthusiast’s
                   mind. There are many second hand IBM laptops in the
                   market. Although I have sold many IBM laptops …
5       -0.53      Doctor said that I should make more preparations
                   mentally. You have stayed with me for three years,
                   leaving without any words. Do you feel fair for me? Do
                   you remember the moments we were together? You are
                   heartless, I hate you! ...
      Informative                               Snippets
        Sense
1        1.00      The catalogue of IBM certification: DB2 Database
                   Administrator DB2 Application Developer MQSeries
                   Engineer VisualAge For Java …
2        1.00      Biao Lin is a military talent. Stalin called him “the gifted
                   general”. Americans called him “the unbeaten general”.
                   Chiang Kai-shek called him “devil of war”. Biao Lin is a
                   special person in modern history …
3        0.99      Microsoft’s hotmail can only be registered with suffix
                   “@hotmail.com” by default. You can register @msn.com by
                   visiting…
4        0.95      Yi Shang is still sending the file to me. I will practice it later. 1.
                   Start up Instance (db2inst1) db2start; 2. Stop Instance
                   (db2inst1) db2stop …
5        0.84      Name: Lei Zhang. Student number: 5030309959. Class
                   number: 007. The analysis and review about the tendency of
                   Jilin Chemical Industry’s stock in 2005. Date, Increasing and
                   Decreasing ranges, Open Price, Close Price, Amount of
                   deals …
6        0.01      Recently I like reading the Buddhist Scripture. I can learn
                   philosophies in it. It makes me comfortable. It is from ...
7       -0.11      It’s out of my mind when I first saw it. The water seemed to be
                   exuding from the building. There was much water on the floor
                   of education building. Water was all around us, anywhere you
                   can touch had water. …
8       -0.51      I read an article about the last emperor Po-yee today. I have
                   watched “The Last Emperor” before, which realistically
                   described his life without losing artistry. His love impressed
                   me. As an emperor, he can’t choose the one he loved …
9       -0.53      She is 164 in height with white skin, black hair and long limp
                   leg. I like the girl who has long hair and likes sport and
                   dancing. I like sweet girls. …
10      -0.94      I have many things to do at the end of this semester. There are
                   five final examinations, Discrete Mathematics,
                   Communication Theory, Architecture of Computer, Algorithm
                   and Law. I know little about them. OMG! Only four weeks are
                   left. There are also two projects, Compiler and Operation
                   System. Compiler can be easily completed but Operation
                   System …
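One way to read the two tables is that an "informative sense" score in [-1, 1] is attached to each result and then used to reorder the list according to the searcher's intent. The sketch below assumes exactly that and nothing more: the scores are the ones shown for the five IBM snippets, while the function `rerank` and the `intent` parameter are hypothetical names, and how the scores are produced is not covered here.

```python
# (original rank, informative sense) pairs copied from the first table.
results = [
    {"rank": 1, "score": 1.00},
    {"rank": 2, "score": -0.94},
    {"rank": 3, "score": 1.00},
    {"rank": 4, "score": -0.35},
    {"rank": 5, "score": -0.53},
]

def rerank(results, intent="informative"):
    """Order results by informative sense for an informative intent,
    or by the opposite order for an affective intent (assumed policy)."""
    sign = 1 if intent == "informative" else -1
    # sorted() is stable, so ties keep their original relative order.
    return sorted(results, key=lambda r: -sign * r["score"])

informative_first = [r["rank"] for r in rerank(results, "informative")]
affective_first = [r["rank"] for r in rerank(results, "affective")]
print(informative_first)
print(affective_first)
```

With an informative intent the certification catalogue and the sales post move to the top, and with an affective intent the diary-like snippets do, which matches the ordering seen in the second table.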
Our Vision of Semantic Web Search
                       • It covers most of the
                       important topics in SW
                       • A lot of tools are built in
                       each layer

                       • 10+ top papers
                       (WWW’09, SIGMOD’09,
                       SIGMOD’08, VLDB’07,
                       ICDE’09, ISWC’07, etc)
Knowledge Engineering Layer
   • Ontology Engineering
       • Orient: Integrating Ontology Engineering into Industry
         Tooling Environment (ISWC 2004)
   • Ontology Learning & Population
       • EachWiki: Facilitating Semantics Reuse for Wikipedia
         Authoring (ISWC/ASWC 2007)
       • PORE: Semi-supervised Positive Only Relation Extraction
         from Wikipedia (ISWC/ASWC 2007)
       • HS Explorer: Unsupervised Hierarchical Semantics Explorer
         for Social Annotations (ISWC/ASWC 2007)
       • Catriple: Extracting Triples from Wikipedia Categories
         (ASWC 2008)
Indexing and Search Layer
   • Ontology Query Engine based on DBMS
       • SOR: A Practical System for OWL Ontology Storage,
         Reasoning and Search (VLDB 2007, SIGMOD 2008)
   • Annotation-based Semantic Search Engine (DB + IR)
       • CE2: Towards Large Scale Annotation-based Semantic
         Search (CIKM 2008)
   • An Extension to IR index for Relational Search
       • Semplore: An IR Approach to Scalable Hybrid Query of
         Semantic Web Data (ISWC/ASWC 2007, ASWC 2008,
         WWW 2009, JWS)
   • Pattern-based RDF Store
SOR

   • Semantic Object
     Repository
   • Based on IBM DB2
   • Supports T-Box
     reasoning
Semplore

   • Extension to
     traditional IR engine
   • Ranking is
     considered
CE^2

   • Search over
     semantically annotated
     corpus
   • Combination of DB and
     IR search engines
Pattern-based RDF store

   • Learning to materialize join results
   • Efficient retrieval of pattern matches
   • Reasonable extra space -> Significant
     performance increase (on some datasets)
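The space-for-speed trade-off named on this slide can be sketched as follows. This is an illustration under stated assumptions, not the store described in the talk: "learning" is replaced by simply picking the most frequent patterns in a hypothetical query log, patterns are restricted to two-hop joins, and the names `PatternStore`, `join_two_hop`, and `budget` are invented for the example.

```python
from collections import Counter, defaultdict

def join_two_hop(triples, p1, p2):
    """Answer the pattern ?x p1 ?y . ?y p2 ?z with a hash join on ?y."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        if p == p2:
            by_subject[s].append(o)
    return [(s, o, z) for s, p, o in triples if p == p1 for z in by_subject[o]]

class PatternStore:
    def __init__(self, triples, query_log, budget=1):
        self.triples = triples
        # Stand-in for "learning": materialize the `budget` most frequent patterns.
        frequent = [pattern for pattern, _ in Counter(query_log).most_common(budget)]
        self.materialized = {pat: join_two_hop(triples, *pat) for pat in frequent}

    def query(self, p1, p2):
        # Materialized patterns are a dictionary lookup; others are joined on demand.
        hit = self.materialized.get((p1, p2))
        return hit if hit is not None else join_two_hop(self.triples, p1, p2)

triples = [
    ("alice", "worksAt", "sjtu"),
    ("sjtu", "locatedIn", "shanghai"),
    ("bob", "worksAt", "sjtu"),
    ("alice", "knows", "bob"),
]
log = [("worksAt", "locatedIn")] * 3 + [("knows", "worksAt")]
store = PatternStore(triples, log, budget=1)
print(store.query("worksAt", "locatedIn"))
```

The `budget` parameter plays the role of the "reasonable extra space": a larger budget stores more join results and speeds up more queries, at the cost of memory and of keeping the stored results in sync with the data.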
Query Interface and User Interaction Layer
   • Keyword Interface for Semantic Search
       • Q2Semantic: Lightweight Ontology based Keyword
         Interpretation for Semantic Search (ESWC 2008, ICDE
         2009)
   • Natural Language Interface for Semantic Search
       • PANTO: A Portable Natural Language Interface to
         Ontologies (ESWC 2007)
   • Snippet Generation
       • Snippet Generation for Semantic Web Search Engines
         (ASWC 2008)
   • Ontology Presentation
       • ZoomRDF: Semantic-driven Fisheye Zooming for RDF Data
         (WWW 2010)
Q2Semantic

   • Structured queries vs.
     keyword queries
   • Structural data
RDF Snippet

   • Representation of
     search results
   • How will you know
     which answers are
     most relevant?
ZoomRDF
How to make them as a whole?
   • We focused on Semantic Web search
       • Closed corpus / one single data source involved
       • Only scales to millions of triples
       • Uncertainty is not fully considered or used

   • We need Semantic Web search, however
       • More than 11 million data sources (Web heterogeneity)
       • More than 2 billion triples (Scalability)
       • Uncertainty everywhere
   • Thus, we carefully consider the following topics
       • Pay as you go for semantic data integration
       • Semantic search engine towards billion triples        [callout: Missing]
       • User-friendly query interface for Semantic Web        [callout: Let’s Forget]
Hermes (2nd place Billion Triple Challenge,
   SIGMOD 2009, JWS)
1. Integrate and index data sources
   • Schema-level Mapping, Data-level Mapping
   • Graph Data Processing: Data Graph Summarization, Element Label
     Extraction, Graph Element Scoring, Mapping Discovery, Indexing
2. Understand user’s need (input keywords, select a query)
   • Example keywords: “Article Stanford Turing Award”
   • Keyword Translation: Keyword Mapping, Top-k Query Graph Search
3. Search and refine (refine or navigate)
   • Distributed Query Processing: Query Graph Decomposition, Query
     Planning & Optimization, Local Query Processing, Result
     Combination & Ranking
   • Example output: Results (Rudi Studer, Semantic Web, ...),
     Suggestions (Affiliations, ...)
Internal Indices / Graph Indices: Keyword Index, Schema Index,
Structure Index, Mapping Index
Heterogeneous Transfer Learning

   • Machine Learning Team
   • APEXLAB
   • Shanghai Jiao Tong University
Machine Learning Team in APEX

   • Focus on machine learning and its application in
     Web mining and IR.
       • Transfer learning
       • Advertising Techniques in Web
       • Short text classification & clustering
       • Multilingual search result integration
Outline

   • Introduction to heterogeneous transfer learning
   • Cross media: Text -> Image
       • Clustering
       • Classification
   • Cross language: English -> Chinese
   • Application: Visual Contextual Advertising
Traditional machine learning

   • Training data and test data follow the same
     distribution.

[Figure: training data (news) and test data]
Transfer learning

   • Transfer learning: distributions are not
     identical.

[Figure: training data (news) and test data]
Heterogeneous Transfer Learning

   • Learning across different feature spaces.

     A fixed-wing aircraft, typically
     called an airplane, aeroplane or
     simply plane, is an aircraft
     capable of flight using forward
     motion that generates lift as
     the wing moves through the
     air…

     An automobile, motor car or
     car is a wheeled motor vehicle
     used for transporting
     passengers, which also carries
     its own engine or motor...

[Figure: training data (text) and test data]
Related Areas of Heterogeneous Learning

Multiple Domain Data
   • Feature Space among Domains: Heterogeneous
       • Instance Correspondences among Domains
           • Each instance in one domain has its correspondences
             in other domains -> Multi-view Learning
           • There are few or no instance correspondences
             among domains -> Heterogeneous Transfer Learning
   • Feature Space among Domains: Homogeneous
       • Data Distribution among Domains
           • Different -> Transfer Learning across Different Distributions
           • Same -> Traditional Machine Learning

Source Domain: “Apple is a fruit that can be found …”, “Banana is the common name for…”
Target Domain
Text to Images
    [Dai et al. NIPS 2008] [Lin et al. APWeb 2010]

   • Mining and learning from multimedia data is
     becoming increasingly important

   • Limited by scarce labeled image data, can we
     use abundant text data on the Web?

       • Our answer is YES
Objective

[Figure: two Learning blocks, each mapping Input to Output, linked by translating learning]
Basic Ideas
Exploiting co-occurrence data as a bridge between text and image
Data Sets

   • Documents from ODP
   • Images from Caltech-256
Experimental Result
Approach 2: Naïve Bayes Way
[Lin et al. APWeb 2010]

[Figure labels: P(w|c), P(v|w), P(v|w)P(w|c)]
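The factors on this slide suggest chaining a text model P(w|c) with a text-to-image co-occurrence model P(v|w). The sketch below assumes the chain is completed by summing over words, P(v|c) = sum over w of P(v|w) P(w|c), followed by an ordinary Naïve Bayes decision; that marginalization, the toy probabilities, and all names (`bridge`, `classify`, `v_sky`, `v_asphalt`) are assumptions for illustration and are not taken from the cited paper.

```python
import math

def bridge(p_v_given_w, p_w_given_c):
    """Compute P(v|c) = sum_w P(v|w) * P(w|c) for every class c and visual word v."""
    visual_words = sorted({v for dist in p_v_given_w.values() for v in dist})
    p_v_given_c = {}
    for c, word_dist in p_w_given_c.items():
        p_v_given_c[c] = {
            v: sum(p_w * p_v_given_w[w].get(v, 0.0) for w, p_w in word_dist.items())
            for v in visual_words
        }
    return p_v_given_c

def classify(visual_counts, p_v_given_c, prior):
    """Naive Bayes decision over visual-word counts (no smoothing in this toy)."""
    def log_posterior(c):
        return math.log(prior[c]) + sum(
            n * math.log(p_v_given_c[c][v]) for v, n in visual_counts.items()
        )
    return max(prior, key=log_posterior)

# Toy distributions (hypothetical values; each inner dict sums to 1).
p_w_given_c = {
    "plane": {"wing": 0.7, "engine": 0.2, "road": 0.1},
    "car": {"wing": 0.1, "engine": 0.3, "road": 0.6},
}
p_v_given_w = {
    "wing": {"v_sky": 0.8, "v_asphalt": 0.2},
    "engine": {"v_sky": 0.5, "v_asphalt": 0.5},
    "road": {"v_sky": 0.1, "v_asphalt": 0.9},
}
p_v_given_c = bridge(p_v_given_w, p_w_given_c)
prior = {"plane": 0.5, "car": 0.5}
label = classify({"v_sky": 3}, p_v_given_c, prior)
print(p_v_given_c["plane"]["v_sky"], label)
```

No labeled images are needed to obtain P(v|c) here: the class knowledge comes from text, and the co-occurrence table carries it into the visual feature space.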
Text-aided Image Classification
(TAIC)
Experiments: TAIC

   • Data sets: 9 binary classification data sets and 5
     six-class classification data sets
       • Image data from Caltech-256 and Fifteen scene
       • Auxiliary text data from Open Directory Project

   • Baseline methods
       • Base classifiers: Naïve Bayes (NBC) and Support
         vector machine (SVM)
Evaluation 1: Classification

                      Heterogeneous TL    No-Heterogeneous TL

 Average Error Rate        0.318                 0.334

                      4.8% error reduction
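The 4.8% figure is consistent with a relative reduction computed from the two average error rates in the table. A short check, with the function name `relative_error_reduction` chosen only for this example:

```python
def relative_error_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved method."""
    return (baseline - improved) / baseline

# Average error rates from the table above.
reduction = relative_error_reduction(baseline=0.334, improved=0.318)
formatted = f"{reduction:.1%}"
print(formatted)  # 4.8%
```

In absolute terms the gap is 0.016, that is 1.6 percentage points of error rate.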
Text-aided Image Clustering
      [Yang et al. ACL 2009]
   • Image clustering is an effective method for
     improving the accessibility of image search results

          Apple = [image] OR [image]

   • But traditional clustering methods do not work
     well with a small amount of data
   • We consider using annotated images in the social
     Web to help image clustering
Annotated PLSA Model for Clustering
   Leveraging the auxiliary text data by using the
    topics Z as a bridge
[Figure: topics connecting words from the auxiliary data (from Flickr) with image features]
Making the transfer…
    Log-likelihood objective function
        Two parts: image features and auxiliary text
         features
                Image feature to image instance correlation: A
                Word feature to image feature correlation: B

\[
\mathcal{L} = \sum_j \left[ \lambda \sum_i \frac{A_{ij}}{\sum_{j'} A_{ij'}} \log P(f_j \mid v_i) + (1 - \lambda) \sum_l \frac{B_{lj}}{\sum_{j'} B_{lj'}} \log P(f_j \mid w_l) \right]
\]

        λ: trade-off parameter; the denominators are normalization terms;
        the two log terms are the likelihoods of the two parts
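The objective on this slide combines an image part and an auxiliary-text part, each a normalized, count-weighted log-likelihood. The following is a minimal sketch of how it could be evaluated, not the authors' implementation. It assumes A and B are count matrices stored as lists of rows (images by features, words by features), that the model probabilities P(f|v) and P(f|w) are already available and strictly positive wherever a count is non-zero, and that `lam` stands for the trade-off parameter. All names are hypothetical.

```python
import math


def weighted_log_likelihood(counts, probs):
    """Sum over rows r and features j of counts[r][j] / sum(counts[r]) * log(probs[r][j])."""
    total = 0.0
    for count_row, prob_row in zip(counts, probs):
        row_sum = sum(count_row)  # normalization term for this row
        for count, prob in zip(count_row, prob_row):
            if count > 0:  # zero counts contribute nothing
                total += (count / row_sum) * math.log(prob)
    return total


def aplsa_objective(A, B, p_f_given_v, p_f_given_w, lam):
    """Trade-off between the image part (A) and the auxiliary-text part (B)."""
    image_part = weighted_log_likelihood(A, p_f_given_v)
    text_part = weighted_log_likelihood(B, p_f_given_w)
    return lam * image_part + (1.0 - lam) * text_part


# Toy inputs, invented for illustration only.
A = [[1, 1], [2, 0]]            # image instance x image feature counts
B = [[1, 3]]                    # word x image feature counts
p_f_given_v = [[0.5, 0.5], [0.5, 0.5]]
p_f_given_w = [[0.25, 0.75]]
value = aplsa_objective(A, B, p_f_given_v, p_f_given_w, lam=0.5)
print(value)
```

Setting `lam` to 1 ignores the auxiliary text entirely, while smaller values give the annotated data more influence on the learned topics.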
Experiment Setup

   Data sets:
       Generated from Caltech-256 and 15-scene corpora
   Baseline methods
       Baseline clustering methods: KMeans, PLSA and STC
       Strategies:
           clustering on target image data only
           combined: clustering target image data and annotated image
            data together and evaluate result for target image data




Experimental Result
[Figure: bar chart of entropy (axis 0 to 2) for KM_Separate, KM_Combine, PLSA_Separate, PLSA_Combine, STC, and aPLSA]

                                 Heterogeneous TL          No-Heterogeneous TL
       Average Entropy                  0.741                          0.786

 5.7% entropy reduction
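The percentages quoted on the two evaluation slides are consistent with relative reductions against the baseline without heterogeneous transfer learning. The slides do not give the formula, so that reading is an assumption, and the helper name below is hypothetical.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of a lower-is-better metric relative to the baseline."""
    return (baseline - improved) / baseline


# Figures taken from the classification and clustering evaluation slides.
error_cut = relative_reduction(baseline=0.334, improved=0.318)
entropy_cut = relative_reduction(baseline=0.786, improved=0.741)
print(f"{error_cut:.1%} error reduction")      # 4.8% error reduction
print(f"{entropy_cut:.1%} entropy reduction")  # 5.7% entropy reduction
```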
Clustering Results
on Caltech256 [Griffin et al. TR 2007]
[Figure: example clusters, including the categories frog, kayak, jesus-christ, bear, and watch]
Outline

   Introduction to heterogeneous transfer learning
   Cross media: Text → Image
       Clustering
       Classification
   Cross language: English → Chinese
   Application: Visual Contextual Advertising
Cross-language Classification
[Ling et al. WWW 2008]
[Figure: Text Classification. A classifier learns from labeled Chinese Web pages and classifies unlabeled Chinese Web pages]
Cross-language Classification
    Much labeled data in English, but little in
     Chinese.

Labeled Data      English                           Chinese
News              Reuters-21578                     ?
Newsgroups        20 Newsgroups                     ?
Web pages         Open Directory Project (> 1M)     Very few ODP data (< 20k, ~1%)
Cross-language Classification
[Figure: Cross-language Classification. A classifier learns from labeled English Web pages and classifies unlabeled Chinese Web pages]
Cross-language Classification
   Information Bottleneck
   X : signals to be encoded (Web pages)
   X̃ : codewords (class labels)
   Y : features related to X (terms)
Cross-language Classification

   Optimization
[Figure: optimization objective; surviving labels: "minimize", "Information betw…" (truncated), and "Minimize this distance"]
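The information-bottleneck setup (Web pages X, class labels X̃ as codewords, terms Y) can be illustrated with a small numeric sketch. It assumes the standard hard-assignment reading, in which the quantity to minimize is the mutual information lost when X is replaced by X̃, that is I(X;Y) - I(X̃;Y). The slide text is truncated, so this is an assumption and not necessarily the authors' exact objective. Function names and the toy distribution are hypothetical.

```python
import math


def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as a list of rows."""
    total = sum(sum(row) for row in joint)
    row_marginal = [sum(row) / total for row in joint]
    col_marginal = [sum(row[j] for row in joint) / total for j in range(len(joint[0]))]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, mass in enumerate(row):
            p = mass / total
            if p > 0:  # zero cells contribute nothing
                mi += p * math.log2(p / (row_marginal[i] * col_marginal[j]))
    return mi


def information_loss(joint_xy, assignment, n_codewords):
    """I(X;Y) - I(X~;Y) when page x is encoded as codeword assignment[x]."""
    merged = [[0.0] * len(joint_xy[0]) for _ in range(n_codewords)]
    for x, codeword in enumerate(assignment):
        for j, mass in enumerate(joint_xy[x]):
            merged[codeword][j] += mass
    return mutual_information(joint_xy) - mutual_information(merged)


# Toy joint distribution over 4 pages and 2 terms, invented for illustration.
joint_xy = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]
good = information_loss(joint_xy, [0, 0, 1, 1], n_codewords=2)  # groups pages by term
bad = information_loss(joint_xy, [0, 1, 0, 1], n_codewords=2)   # mixes the two groups
print(good, bad)
```

An encoding that groups pages sharing the same terms loses no information about Y, whereas one that mixes them loses all of it, which is why the loss can serve as a criterion for assigning class labels.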
Cross-language Classification

   Performance
Outline

   Introduction to heterogeneous transfer learning
   Cross media: Text → Image
       Clustering
       Classification
   Cross language: English → Chinese
   Application: Visual Contextual Advertising
Application: Visual Contextual
Advertising
[Chen et al. AAAI 2010]
   Previous research focused on advertising for text
    Web pages.
   With the booming of multimedia data, we need
    to recommend advertisements for these data
   Difficulty: the image and the text are in different feature
    spaces
   Use the co-occurrence data to bridge these two
    feature spaces
Figure illustration of Visual Contextual Advertising
Visual Contextual Advertising
[Equations not recoverable; surviving text: "We assume that there is …", "(based on the independence assumption)", and "Where …"]
Experimental Results

   Co-occurrence data from Flickr.
   Test images from Flickr and the Fifteen scene data
    set
   Advertisements are crawled from the MSN search
    engine with queries chosen from the AOL query log.
Experimental Result
Thank you

   For more details of APEXLAB
       http://apex.sjtu.edu.cn/apex_wiki/FrontPage
   Our works
       http://apex.sjtu.edu.cn/apex_wiki/Papers

SJTU Apex Lab Research on Evolving Web and Search

  • 1. Evolving Web, g Evolving Search Yuan Tian & Tianqi Chen Apex Data & Knowledge Management Lab Shanghai Jiao Tong University
  • 2. Agenda g  Introduction to SJTU  Introduction to Apex Lab p  Research  Demo
  • 3. Agenda g  Introduction to SJTU  Introduction to Apex Lab p  Research  Demo
  • 4. Shanghai Jiao Tong University g J g y  Location  History y  Student  Campus
  • 5. Agenda g  Introduction to SJTU  Introduction to Apex Lab p  Research  Demo
  • 6. Apex Lab p  Director Professor  Yong Yu  Associate Professor  Guirong X G i Xue
  • 7. Apex Lab p  Research  Web Search  Social Web  Semantic Search  Machine Learning  Image Search
  • 8. Apex Lab p  Project Partners
  • 9. Apex Lab p  Ph.D. Students  Haofen Wang  Jing Lu  Jia Chen  Guangcan Liu  Xian Wu  Yunbo Cao  g Ruihua Song  35 Master Students
  • 10. Agenda g  Introduction to SJTU  Introduction to Apex Lab p  Research  Demo
  • 11. Research  Traditional Web  Social Web  Semantic Web  Machine Learning
  • 12. Research  Traditional Web  Social Web  Semantic Web  Machine Learning
  • 13. Search on Traditional Web  Focus on how to  improve search relevance? rank pages?  integrate mining technologies into search?  search finer grained objects instead of documents?  Search Applications pp  General search engine  Vertical search engine g  Meta search engine
  • 15. Expert Search ( p (introduction) )  Treat web page as bag of words  Q Queries are not fully understood y
  • 16. Expert Search ( p (motivation) ) Searching for Experts: • A more and more important information need • PM search for Dev • Patient P ti t search f D t h for Doctor • Student search for Professor • …… • Not only in Enterprise • But also on WWW
  • 17. Query Ranked List of Experts An Evidence: an expert and a query co-occur in a document under certain relation constraint
  • 18. Research  Traditional Web  Social Web  Semantic Web  Machine Learning
  • 19. The Emergence of Web 2.0 g  Web gets social Web 1.0 -> Web 2.0 Publishing -> Participation Personal Websites -> Blogging Content Management Systems -> Wikis Britannica Online -> > Wikipedia Directories (taxonomy) -> Tagging ("folksonomy")  Lower the barrier for contribution.  More people are involved. They are less professional.
  • 20. Search on Web 2.0  Focus on how to  elaborate user involved data?  search on new social media
  • 21. Deegle g  (WWW 2006, WWW 2007, SIGIR 2008)
  • 22. Related facets Search results Related tags Relatd users
  • 23. Emotion Analysis on the Blog y g  Blog can be the resource of the news, but also be the stage for representing the emotion  Enhancing the blog search for different user
  • 24. Blog Search g I f Informative article ti ti l News that is similar to the news on traditional news websites b it Technical descriptions, e.g. programming techniques. techniques Commonsense knowledge Objective comments on the events in the world Affective article Diaries about personall affairs b ff Self-feelings or self-emotions descriptions
  • 25. Two types of blog yp g
  • 26. Intent-driven blog search (WWW 2007) Informati Snippets ve Sense 1 1.00 1 00 The catalogue of IBM certification: DB2 Database Administrator DB2 Application Developer MQSeries Engineer VisualAge For Java … 2 -0.94 Crazy Me! I have hesitated between Acer and smuggled IBM for one week. I wouldn’t have taken into account the price, quality or service if I had enough money … 3 1.00 Selling IBM laptop, t22p3-900, , dvd S3/, g p p, p , , independent accelerating display card. 3550 YUAN. (Post fee not included) .Please contact 30316255. We guarantee the quality. This product is only sold within Tianjing city ... 4 -0.35 I got a laptop from my friend this week. Although outdated, it is still a classical one in IBM enthusiast’s mind. There are many second hand IBM laptops in the market. Al h k Although I h h have sold many IBM ld laptops … 5 -0.53 Doctor said that I should make more preparations mentally. You have stayed with me for three years, leaving without y g any words. Do you feel fair for me? Do you remember the moments we were together? You are heartless, I hate you! ...
  • 27. Informative Snippets
        Rank | Informative sense | Snippet
        1 | 1.00 | The catalogue of IBM certification: DB2 Database Administrator, DB2 Application Developer, MQSeries Engineer, VisualAge For Java …
        2 | 1.00 | Biao Lin is a military talent. Stalin called him “the gifted general”. Americans called him “the unbeaten general”. Chiang Kai-shek called him “devil of war”. Biao Lin is a special person in modern history …
        3 | 0.99 | Microsoft’s hotmail can only be registered with suffix “@hotmail.com” by default. You can register @msn.com by visiting…
        4 | 0.95 | Yi Shang is still sending the file to me. I will practice it later. 1. Start up Instance (db2inst1) db2start; 2. Stop Instance (db2inst1) db2stop …
        5 | 0.84 | Name: Lei Zhang. Student number: 5030309959. Class number: 007. The analysis and review about the tendency of Jilin Chemical Industry’ stock in 2005. Date, Increasing and Decreasing ranges, Open Price, Close Price, Amount of deals …
        6 | 0.01 | Recently I like reading the Buddhist Scripture. I can learn philosophies in it. It makes me comfortable. It is from ...
        7 | -0.11 | It’s out of my mind when I first saw it. The water seemed to be exuding from the building. There was much water on the floor of education building. Water was all around us, anywhere you can touch had water. …
        8 | -0.51 | I read an article about the last emperor Po-yee today. I have watched “The Last Emperor” before, which realistically described his life without losing artistry. His love impressed me. As an emperor, he can’t choose the one he loved …
        9 | -0.53 | She is 164 in height with white skin, black hair and long limp leg. I like the girl who has long hair and likes sport and dancing. I like sweet girls. …
        10 | -0.94 | I have many things to do at the end of this semester. There are five final examinations, Discrete Mathematics, Communication Theory, Architecture of Computer, Algorithm and Law. I know little about them. OMG! Only four weeks are left. There are also two projects, Compiler and Operation System. Compiler can be easily completed but Operation System …
  • 28. Research
        Traditional Web
        Social Web
        Semantic Web
        Machine Learning
  • 29. Our Vision of Semantic Web Search
        It covers most of the important topics in SW
        A lot of tools are built in each layer
        10+ top papers (WWW’09, SIGMOD’09, SIGMOD’08, VLDB’07, ICDE’09, ISWC’07, etc.)
  • 30. Knowledge Engineering Layer
        Ontology Engineering
            Orient: Integrating Ontology Engineering into Industry Tooling Environment (ISWC 2004)
        Ontology Learning & Population
            EachWiki: Facilitating Semantics Reuse for Wikipedia Authoring (ISWC/ASWC 2007)
            PORE: Semi-supervised Positive Only Relation Extraction from Wikipedia (ISWC/ASWC 2007)
            HS Explorer: Unsupervised Hierarchical Semantics Explorer for Social Annotations (ISWC/ASWC 2007)
            Catriple: Extracting Triples from Wikipedia Categories (ASWC 2008)
  • 31.
  • 32. Indexing and Search Layer
        Ontology Query Engine based on DBMS
            SOR: A Practical System for OWL Ontology Storage, Reasoning and Search (VLDB 2007, SIGMOD 2008)
        Annotation-based Semantic Search Engine (DB + IR)
            CE2: Towards Large Scale Annotation-based Semantic Search (CIKM 2008)
        An Extension to IR index for Relational Search
            Semplore: An IR Approach to Scalable Hybrid Query of Semantic Web Data (ISWC/ASWC 2007, ASWC 2008, WWW 2009, JWS)
        Pattern-based RDF Store
  • 33. SOR
        Semantic Object Repository
        Based on IBM DB2
        Supports T-Box reasoning
  • 34. Semplore
        Extension to traditional IR engine
        Ranking is considered
  • 35. CE^2
        Search over semantically annotated corpus
        Combination of DB and IR search engines
  • 36. Pattern-based RDF store
        Learning to materialize join results
        Efficient retrieval of pattern matches
        Reasonable extra space -> Significant performance increase (on some datasets)
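The space-for-time trade-off on this slide can be illustrated with a toy sketch. It assumes a two-hop graph pattern whose join result is precomputed once and then served from a cache. The triples, the predicate names and the `PatternStore` class are all hypothetical, and the "learning" step that decides which patterns are worth materializing is not modeled here.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all names are hypothetical.
TRIPLES = [
    ("alice", "worksAt", "sjtu"),
    ("bob", "worksAt", "sjtu"),
    ("carol", "worksAt", "mit"),
    ("sjtu", "locatedIn", "shanghai"),
    ("mit", "locatedIn", "cambridge"),
]


def match_pattern(triples, p1, p2):
    """Evaluate the two-hop pattern (?x p1 ?y) (?y p2 ?z) with a hash join."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        if p == p2:
            by_subject[s].append(o)
    return sorted(
        (s, o, z)
        for s, p, o in triples
        if p == p1
        for z in by_subject.get(o, [])
    )


class PatternStore:
    """Caches join results for selected patterns, trading space for time."""

    def __init__(self, triples):
        self.triples = triples
        self.materialized = {}

    def materialize(self, p1, p2):
        # Pay the join cost once and keep the result table.
        self.materialized[(p1, p2)] = match_pattern(self.triples, p1, p2)

    def query(self, p1, p2):
        # Serve from the materialized table when available, else join on the fly.
        cached = self.materialized.get((p1, p2))
        if cached is not None:
            return cached
        return match_pattern(self.triples, p1, p2)


store = PatternStore(TRIPLES)
store.materialize("worksAt", "locatedIn")
rows = store.query("worksAt", "locatedIn")
```

Queries on a materialized pattern become a dictionary lookup, while other patterns still fall back to the join, which matches the "on some datasets" caveat: the gain depends on how often the cached patterns recur.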
  • 37. Query Interface and User Interaction Layer
        Keyword Interface for Semantic Search
            Q2Semantic: Lightweight Ontology based Keyword Interpretation for Semantic Search (ESWC 2008, ICDE 2009)
        Natural Language Interface for Semantic Search
            PANTO: A Portable Natural Language Interface to Ontologies (ESWC 2007)
        Snippet Generation
            Snippet Generation for Semantic Web Search Engines (ASWC 2008)
        Ontology Presentation
            ZoomRDF: Semantic-driven Fisheye Zooming for RDF Data (WWW 2010)
  • 38. Q2Semantic
        Structured queries vs. keyword queries
        Structural data
  • 39. RDF Snippet
        Representation of search results
        How will you know which answers are most relevant?
  • 41. Research
        Traditional Web
        Social Web
        Semantic Web
        Machine Learning
  • 42. Agenda
        Introduction to SJTU
        Introduction to Apex Lab
        Research
        Demo
  • 43. How to make them work as a whole?
        We focused on Semantic Web search
            Closed corpus / one single data source involved
            Scales only to millions of triples
            Uncertainty is not fully considered or used
        However, we need Semantic Web search that handles
            More than 11 million data sources (Web heterogeneity)
            More than 2 billion triples (scalability)
            Uncertainty everywhere
        Thus, we carefully consider the following topics
            Pay as you go for semantic data integration
            Semantic search engine towards billion triples
            User-friendly query interface for Semantic Web
        [Slide callouts: “Missing”, “Let’s Forget”]
  • 44. Hermes (2nd place Billion Triple Challenge, SIGMOD 2009, JWS)
        1. Integrate and index data sources
        2. Understand user’s need
        3. Search and refine
        [Figure: Hermes architecture. User interaction: (1) input keywords, e.g. “Article Stanford Turing Award”; (2) select a query; (3) refine or navigate results and suggestions. Component labels: Schema-level Mapping, Data-level Mapping, Graph Data Processing, Keyword Translation, Distributed Query Processing, Data Graph Summarization, Query Graph Decomposition, Result Combination & Ranking, Keyword Mapping, Element Label Extraction, Top-k Query Graph Search, Query Planning & Optimization, Local Query Processing, Graph Element Scoring, Mapping Discovery, Indexing. Index labels: Internal Indices, Graph Indices, Keyword Index, Schema Index, Structure Index, Mapping Index]
  • 45. Heterogeneous Transfer Learning
        Machine Learning Team
        APEXLAB
        Shanghai Jiao Tong University
  • 46. Machine Learning Team in APEX
        Focus on machine learning and its application in Web mining and IR.
            Transfer learning
            Advertising techniques on the Web
            Short text classification & clustering
            Multilingual search result integration
  • 47. Outline
        Introduction to heterogeneous transfer learning
        Cross media: Text → Image
            Clustering
            Classification
        Cross language: English → Chinese
        Application: Visual Contextual Advertising
  • 49. Traditional machine learning
        Training data and test data follow the same distribution.
        [Figure: training data and test data; “news” appears as a data label]
  • 50. Transfer learning
        Transfer learning: the training and test distributions are not identical.
        [Figure: training data and test data; “news” appears as a data label]
  • 51. Heterogeneous Transfer Learning
        Learning across different feature spaces.
        Training data: text documents, for example
            “A fixed-wing aircraft, typically called an airplane, aeroplane or simply plane, is an aircraft capable of flight using forward motion that generates lift as the wing moves through the air…”
            “An automobile, motor car or car is a wheeled motor vehicle used for transporting passengers, which also carries its own engine or motor...”
        Test data: [figure]
  • 52. Related Areas of Heterogeneous Learning
        Multiple Domain Data
            Feature space among domains: Heterogeneous
                Instance correspondences among domains
                    Each instance in one domain has its correspondence in other domains: Multi-view Learning
                    There are few or no instance correspondences among domains: Heterogeneous Transfer Learning
            Feature space among domains: Homogeneous
                Data distribution among domains
                    Different: Transfer Learning across Different Distributions
                    Same: Traditional Machine Learning
        [Figure: source domain with text such as “Apple is a fruit that can be found …” and “Banana is the common name for …”; target domain]
  • 57. Outline
        Introduction to heterogeneous transfer learning
        Cross media: Text → Image
            Classification
            Clustering
        Cross language: English → Chinese
        Application: Visual Contextual Advertising
  • 58. Text to Images [Dai et al. NIPS 2008] [Lin et al. APWeb 2010]
        Mining and learning from multimedia data is becoming increasingly important
        Limited by scarce labeled image data, can we use the abundant text data on the Web?
        Our answer is YES
  • 59. Objective
        [Figure: two learning tasks, each mapping Input to Output, connected by “translating learning”; example text input “Elephants are massive …”]
  • 60. Basic Ideas
        Exploiting co-occurrence data as a bridge between text and image
  • 61. Data Sets
        Documents from ODP
        Images from Caltech-256
  • 63. Approach 2: Naïve Bayes Way [Lin et al. APWeb 2010]
        [Figure labels: P(w|c), P(v|w), and their product P(v|w)P(w|c)]
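One plausible reading of the product P(v|w)P(w|c) on this slide is that text words w act as a bridge: summing the product over w gives a class-conditional distribution over visual words v, which a Naïve Bayes classifier can then apply to images. The sketch below works under that assumption only. The toy probabilities, the class names and the helper functions are hypothetical and do not come from the slides or from the cited paper.

```python
import math

# Hypothetical toy distributions; none of these numbers come from the slides.
# P(w|c): word distribution per class, e.g. estimated from labeled text.
P_W_GIVEN_C = {
    "animal": {"fur": 0.7, "wheel": 0.3},
    "vehicle": {"fur": 0.1, "wheel": 0.9},
}
# P(v|w): visual-word distribution per text word, e.g. from co-occurrence data.
P_V_GIVEN_W = {
    "fur": {"v_texture": 0.8, "v_circle": 0.2},
    "wheel": {"v_texture": 0.25, "v_circle": 0.75},
}
PRIOR = {"animal": 0.5, "vehicle": 0.5}


def p_v_given_c(v, c):
    """Bridge text to images: P(v|c) = sum over w of P(v|w) * P(w|c)."""
    return sum(P_V_GIVEN_W[w][v] * p for w, p in P_W_GIVEN_C[c].items())


def classify(visual_words):
    """Naive Bayes over the visual words of one image, in log space."""
    scores = {
        c: math.log(PRIOR[c]) + sum(math.log(p_v_given_c(v, c)) for v in visual_words)
        for c in PRIOR
    }
    return max(scores, key=scores.get)
```

With these toy numbers, `classify(["v_circle", "v_circle"])` picks `"vehicle"` even though no labeled image was used: the class knowledge comes entirely from the text side and the co-occurrence table.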
  • 65. Experiments: TAIC
        Data sets: 9 binary classification data sets and 5 six-class classification data sets
            Image data from Caltech-256 and Fifteen scene
            Auxiliary text data from Open Directory Project
        Baseline methods
            Base classifiers: Naïve Bayes (NBC) and Support vector machine (SVM)
  • 66. Evaluation 1: Classification
        Method | Average error rate
        Heterogeneous TL | 0.318
        No-Heterogeneous TL | 0.334
        4.8% error reduction
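The reduction figure on this slide is consistent with a relative reduction measured against the non-heterogeneous baseline. A short check, where the helper name `relative_reduction` is an illustrative assumption:

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Average error rates from the slide: 0.334 without and 0.318 with heterogeneous TL.
error_reduction = relative_reduction(0.334, 0.318)
print(f"{error_reduction:.1%}")  # prints 4.8%
```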
  • 67. Outline
        Introduction to heterogeneous transfer learning
        Cross media: Text → Image
            Classification
            Clustering
        Cross language: English → Chinese
        Application: Visual Contextual Advertising
  • 68. Text-aided Image Clustering [Yang et al. ACL 2009]
        Image clustering is an effective method for increasing the accessibility of image search results
        [Figure: Apple = (image) OR (image)]
        But traditional clustering methods do not work well with a small amount of data
        We consider using annotated images from the social Web to help image clustering
  • 69. Annotated PLSA Model for Clustering
        Leveraging the auxiliary text data by using the topics Z as a bridge
        [Figure: model linking words from auxiliary data (from Flickr), topics, and image features of the image data]
  • 70. Making the transfer…
        Log-likelihood objective function
        Two parts: image features and auxiliary text features
            Image feature to image instance correlation: A
            Word feature to image feature correlation: B
        $$\mathcal{L} = \sum_j \left[ \lambda \sum_i \frac{A_{ij}}{\sum_{j'} A_{ij'}} \log P(f_j \mid v_i) + (1-\lambda) \sum_l \frac{B_{lj}}{\sum_{j'} B_{lj'}} \log P(f_j \mid w_l) \right]$$
        Callouts: λ is the trade-off parameter, the denominators are normalization terms, and the two log terms are likelihoods.
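The objective on this slide can be read as a λ-weighted sum of two row-normalized log-likelihood terms, one over the image-instance matrix A and one over the word matrix B. The sketch below only evaluates that sum for given conditional probability tables. The function name and argument layout are assumptions, and the model fitting that would produce P(f|v) and P(f|w) through the topics Z is not shown.

```python
import math


def aplsa_objective(A, B, p_f_given_v, p_f_given_w, lam):
    """Evaluate the lambda-weighted, row-normalized log-likelihood.

    A[i][j]: co-occurrence of image instance i and image feature j.
    B[l][j]: co-occurrence of word l and image feature j.
    p_f_given_v[i][j] = P(f_j | v_i); p_f_given_w[l][j] = P(f_j | w_l).
    lam: trade-off parameter in [0, 1].
    """

    def weighted_loglik(counts, probs):
        total = 0.0
        for row_counts, row_probs in zip(counts, probs):
            row_sum = sum(row_counts)  # normalization over j'
            if row_sum == 0:
                continue  # an empty row contributes nothing
            for count, prob in zip(row_counts, row_probs):
                if count > 0:
                    total += (count / row_sum) * math.log(prob)
        return total

    image_term = weighted_loglik(A, p_f_given_v)
    text_term = weighted_loglik(B, p_f_given_w)
    return lam * image_term + (1 - lam) * text_term


# Hypothetical toy inputs: one image instance, one word, two image features.
A = [[2, 2]]
B = [[1, 3]]
P_FV = [[0.5, 0.5]]
P_FW = [[0.25, 0.75]]
value = aplsa_objective(A, B, P_FV, P_FW, lam=0.5)
```

Setting `lam=1.0` ignores the auxiliary text entirely, and `lam=0.0` ignores the target images, so the trade-off parameter controls how much the annotations influence the clustering.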
  • 71. Experiment Setup
        Data sets
            Generated from Caltech-256 and 15-scene corpora
        Baseline methods
            Baseline clustering methods: KMeans, PLSA and STC
            Strategies
                clustering on target image data only
                combined: clustering target image data and annotated image data together, then evaluating the result on the target image data
  • 72. Experimental Result
        [Figure: bar chart of entropy (y-axis, 0 to 2) for KM_Separate, KM_Combine, PLSA_Separate, PLSA_Combine, STC and aPLSA]
        Method | Average entropy
        Heterogeneous TL | 0.741
        No-Heterogeneous TL | 0.786
        5.7% entropy reduction
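The slides do not state the exact entropy formula. A common definition for clustering evaluation, assumed here, is the size-weighted average of the label entropy inside each cluster, where lower is better and 0 means every cluster is pure. The function name and the toy labels are illustrative.

```python
import math
from collections import Counter


def clustering_entropy(cluster_ids, true_labels):
    """Size-weighted average of per-cluster label entropy, in bits."""
    clusters = {}
    for cid, label in zip(cluster_ids, true_labels):
        clusters.setdefault(cid, []).append(label)

    n = len(true_labels)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        counts = Counter(members)
        # Entropy of the true-label distribution inside this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total


pure = clustering_entropy([0, 0, 1, 1], ["frog", "frog", "kayak", "kayak"])
mixed = clustering_entropy([0, 0, 1, 1], ["frog", "kayak", "frog", "kayak"])
```

Here `pure` is 0 and `mixed` is 1 bit, the two extremes for two balanced classes.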
  • 73. Clustering Results on Caltech256 [Griffin et al. TR 2007]
        [Figure: example clusters labeled frog, kayak, jesus-christ, bear, watch]
  • 74. Outline
        Introduction to heterogeneous transfer learning
        Cross media: Text → Image
            Clustering
            Classification
        Cross language: English → Chinese
        Application: Visual Contextual Advertising
  • 75. Cross-language Classification [Ling et al. WWW 2008]
        [Figure: Text Classification. Labeled Chinese Web pages → (learn) → Classifier → (classify) → Unlabeled Chinese Web pages]
  • 76. Cross-language Classification
        Much labelled data in English, but little in Chinese.
        Labeled data | English | Chinese
        News | Reuters-21578 | ?
        Newsgroups | 20 Newsgroups | ?
        Web pages | Open Directory Project data (> 1M) | Very few ODP data (< 20k, ~1%)
  • 77. Cross-language Classification
        [Figure: Cross-language Classification. Labeled English Web pages → (learn) → Classifier → (classify) → Unlabeled Chinese Web pages]
  • 78. Cross-language Classification
        Information Bottleneck
            X: signals to be encoded (Web pages)
            X̃: codewords (class labels)
            Y: features related to X (terms)
  • 79. Cross-language Classification
        Optimization
        [Equation figure: a minimization over two mutual-information terms, each labeled “Information betw…”; callout: “Minimize this distance”]
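In the standard information bottleneck setting, compressing X into codewords X̃ loses the amount I(X;Y) − I(X̃;Y) of information about Y. The sketch below assumes this is the distance being minimized on the slide, which the extracted text does not confirm. The toy joint distribution, the page and term names, and the function names are hypothetical.

```python
import math


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def compress(joint_xy, assignment):
    """Joint distribution of (codeword, y) after mapping each x to its codeword."""
    out = {}
    for (x, y), p in joint_xy.items():
        key = (assignment[x], y)
        out[key] = out.get(key, 0.0) + p
    return out


def information_loss(joint_xy, assignment):
    """I(X;Y) - I(X_tilde;Y): information about Y lost by the encoding."""
    return mutual_information(joint_xy) - mutual_information(compress(joint_xy, assignment))


# Hypothetical toy data: four Web pages (X) and the single term (Y) each contains.
JOINT = {
    ("page1", "sport"): 0.25,
    ("page2", "sport"): 0.25,
    ("page3", "finance"): 0.25,
    ("page4", "finance"): 0.25,
}
good = information_loss(JOINT, {"page1": "A", "page2": "A", "page3": "B", "page4": "B"})
bad = information_loss(JOINT, {"page1": "A", "page3": "A", "page2": "B", "page4": "B"})
```

The labeling that groups pages sharing the same term loses nothing (`good` is 0 bits), while the labeling that mixes them loses the full 1 bit, so minimizing the loss favors class labels that preserve what the terms say about the pages.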
  • 80. Cross-language Classification
        Performance
  • 81. Outline
        Introduction to heterogeneous transfer learning
        Cross media: Text → Image
            Clustering
            Classification
        Cross language: English → Chinese
        Application: Visual Contextual Advertising
  • 82. Application: Visual Contextual Advertising [Chen et al. AAAI 2010]
        Previous research focused on advertising for text Web pages.
        With the boom in multimedia data, we need to recommend advertisements for these data
        Difficulty: image and text lie in different feature spaces
        Use the co-occurrence data to bridge these two feature spaces
  • 83. Figure illustration of Visual Contextual Advertising
  • 84. Visual Contextual Advertising
        [Equations; surviving annotations: “(based on the independence assumption)”, “We assume that there is …”, “Where …”]
  • 85. Experimental Results
        Co-occurrence data from Flickr.
        Test images from Flickr and the Fifteen scene data set
        Advertisements are crawled from the MSN search engine with queries chosen from the AOL query log.
  • 88. Thank you
        For more details of APEXLAB
            http://apex.sjtu.edu.cn/apex_wiki/FrontPage
        Our works
            http://apex.sjtu.edu.cn/apex_wiki/Papers