Exploring Content
with Semantic Transformations
using Collaborative Knowledge Bases
Yegin Genc
Prof. Jeffrey V. Nickerson
OBJECTIVE

Understanding text automatically to support
search-driven exploratory activities.
EXPLORATORY SEARCH (Marchionini, G., 2006)

LOOKUP
• Fact retrieval
• Known item search
• Navigation

LEARN
• Knowledge acquisition
• Comprehension/interpretation
• Comparison

INVESTIGATE
• Accretion
• Analysis
• Exclusion/Negation
EXPLORATORY SEARCH
ILL-STRUCTURED PROBLEM
• No single right approach
• Problem definitions change as new
information is gathered
Foreign minorities, Germany
Text: “Foreign Minorities Germany”
Exploratory Search Task

Given a journal abstract, rank other abstracts
based on their relevancy to the seed abstract.

Evaluation is based on relevancy and diversity.
Seed Document → n-grams (1 to 3) → Candidates → Concepts
(Concepts: candidates that match a Wikipedia page title
and are connected through the ontology)

[Diagram: three matrices related by a matrix product]
DOCUMENT–WORD: D (D x W), Tf-idf(D)
CONCEPT–WORD: K (W x K), Tf-idf(K)
DOCUMENT–CONCEPT: Θ (D x K)
Ranking: Argsort(row.sum(Θ))

D: Documents
K: Concepts
W: Words
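
The candidate step on this slide (n-grams of length 1 to 3 matched against Wikipedia page titles) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the tokenizer, and the tiny `titles` list are hypothetical, and the ontology-based filtering mentioned on the slide is not modeled here.

```python
import re

def ngrams(tokens, n_max=3):
    """Yield every word n-gram of length 1..n_max as a space-joined string."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def candidate_concepts(text, page_titles, n_max=3):
    """Return n-grams of `text` that exactly match a page title (case-insensitive).

    Assumption: a real system would look titles up in a Wikipedia dump or index;
    here `page_titles` is just an in-memory list.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    titles = {t.lower() for t in page_titles}
    matches = []
    for gram in ngrams(tokens, n_max):
        if gram in titles and gram not in matches:
            matches.append(gram)  # keep first-seen order, drop repeats
    return matches

# Hypothetical title list and a sentence adapted from the seed abstract.
titles = ["Data abstraction", "Dynamic dispatch", "Static analysis", "Solar System"]
text = "A static analysis is given for languages with dynamic dispatch and data abstraction."
print(candidate_concepts(text, titles))
```

Exact title matching keeps the sketch short; a fuller version might also follow redirects or normalize plurals before matching.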
EXTRACTING CONCEPT NETWORK
“Representation independence formally characterizes the
encapsulation provided by language constructs for data
abstraction and justifies reasoning by simulation.
Representation independence has been shown for a
variety of languages and constructs but not for shared
references to mutable state; indeed it fails in general for
such languages. This article formulates representation
independence for classes, in an imperative, object-oriented
language with pointers, subclassing and dynamic
dispatch, class oriented visibility control, recursive types
and methods, and a simple form of module. An instance
of a class is considered to implement an abstraction using
private fields and so-called representation objects.
Encapsulation of representation objects is expressed by a
restriction, called confinement, on aliasing.
Representation independence is proved for programs
satisfying the confinement condition. A static analysis is
given for confinement that accepts common designs such
as the observer and factory patterns. The formalization
takes into account not only the usual interface between a
client and a class that provides an abstraction but also the
interface (often called “protected”) between the class
and its subclasses.”
WIKIPEDIA PAGES AS CONCEPTS
Solar System
“The Solar System consists of the Sun and the astronomical objects
gravitationally bound in orbit around it, all of which formed from the
collapse of a giant molecular cloud approximately 4.6 billion years ago…”
(http://en.wikipedia.org/wiki/Solar_System)

Word Stem    Occ.    Freq.
abstract     53      0.056
program      44      0.046
langu        33      0.035
spec         16      0.017
comput       12      0.013
conceiv      12      0.013
dat          12      0.013

\begin{displaymath}
\beta_k = p(W_i \mid k) = \frac{\{W_i \in k\}}{\sum_{i}^{N} \{W_i \in k\}}
\end{displaymath}

βk: Per-concept word distribution
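
As a rough illustration of the per-concept word distribution, the sketch below reads {W_i ∈ k} as the number of times stem W_i occurs in the Wikipedia page for concept k, and divides by the total count over all stems on that page. That reading is an assumption based on the Occ./Freq. columns of the table; the function name and the toy stem list are hypothetical, and stemming itself is assumed to have been done already.

```python
from collections import Counter

def word_distribution(stems):
    """Map each stem to its relative frequency within one concept page.

    `stems` is the already-stemmed token list of a single page (assumption).
    """
    counts = Counter(stems)
    total = sum(counts.values())
    return {stem: count / total for stem, count in counts.items()}

# Toy page with four stemmed tokens, not real Wikipedia data.
beta_k = word_distribution(["abstract", "abstract", "abstract", "program"])
print(beta_k)
```

The resulting values sum to 1 for each concept, which is what lets β_k be treated as a probability distribution over words.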
RANKING DOCUMENTS

[Diagram: DOCUMENT–WORD matrix D (D x W), DOCUMENT–CONCEPT matrix Θ (D x K),
and CONCEPT–WORD matrix K (W x K), related by a matrix product]

D: Documents
K: Concepts
W: Words
SORT DOCUMENTS

[Diagram: DOCUMENT–WORD matrix D (D x W), DOCUMENT–CONCEPT matrix Θ (D x K),
and CONCEPT–WORD matrix K (W x K), related by a matrix product]

D: Documents
K: Concepts
W: Words
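
The ranking and sorting slides can be illustrated with a small sketch. The slides do not spell out the direction of the computation, so this assumes the one reading that matches the stated shapes without a transpose: Θ (D x K) is obtained by multiplying D (D x W) by K (W x K), and documents are then ordered by the row sums of Θ, as in the "Argsort(row.sum(Θ))" label. Sorting in descending order, the function names, and the toy weights are all assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rank_documents(doc_word, word_concept):
    """Return (theta, order): document-concept weights and document indices by score.

    doc_word is D x W and word_concept is W x K; the score of a document is
    the sum of its row in theta, highest first (assumed ordering).
    """
    theta = matmul(doc_word, word_concept)
    scores = [sum(row) for row in theta]
    order = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    return theta, order

# Toy weights: 3 documents x 3 words, and 3 words x 2 concepts.
doc_word = [[1, 0, 2], [0, 1, 0], [3, 1, 1]]
word_concept = [[0.5, 0.0], [0.0, 1.0], [0.5, 0.2]]
theta, order = rank_documents(doc_word, word_concept)
print(order)
```

In practice both matrices would hold tf-idf weights and be sparse, so a library such as NumPy or SciPy would be a more natural fit than nested lists.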
EXPERIMENT
Given a journal abstract, rank other abstracts based on
their relevancy to the seed abstract.

• Data: 619 abstracts of the Journal of the ACM
(JACM) and their references.
• Task: Select top-k (k = 5, 10, 15, and 20) relevant
abstracts.
• Observe: Relevancy (measured by LSA vector
similarity) and diversity (measured through the
coverage of the references).
MAXIMAL MARGINAL RELEVANCE
• a measure to increase the diversity of documents
retrieved by an IR system

- Similarity to query: BM25 (Xapian [1])
- Similarity to results: LSA similarity (Gensim [2])

[1] http://xapian.org
[2] http://radimrehurek.com/gensim/
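
A greedy selection loop following the MMR formula quoted in the editor's notes might look like the sketch below. It takes precomputed similarities as plain lists instead of calling Xapian or Gensim, so the inputs, the function name, and the value λ = 0.5 are illustrative assumptions and not the settings used in the experiment.

```python
def mmr_select(candidates, sim_query, sim_docs, k, lam=0.5):
    """Greedily pick up to k documents by Maximal Marginal Relevance.

    sim_query[d]   : similarity of document d to the query (Sim1)
    sim_docs[d][s] : similarity between documents d and s (Sim2)
    lam            : trade-off between relevance (1.0) and novelty (0.0)
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(d):
            # Penalty is the highest similarity to anything already selected.
            redundancy = max((sim_docs[d][s] for s in selected), default=0.0)
            return lam * sim_query[d] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: documents 0 and 1 are near-duplicates, document 2 is different.
sim_query = [0.9, 0.85, 0.5]
sim_docs = [[1.0, 0.95, 0.1],
            [0.95, 1.0, 0.1],
            [0.1, 0.1, 1.0]]
print(mmr_select([0, 1, 2], sim_query, sim_docs, k=2))
```

With λ = 1.0 the loop reduces to plain relevance ranking, which makes the effect of the diversity penalty easy to compare on the same inputs.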
MMR RESULTS
WIKI-BASED MODEL VS MMR
CONCLUDING REMARKS
• Our Wiki-based technique provides high
diversity with low relevancy loss.
• Semantics embedded in concept networks
extracted from Wikipedia can improve
exploratory search tasks.

Editor's Notes

  1. However, the majority of inquiries go beyond simple fact checks.
  2. Searches involving the cognitive processing and interpretation of new knowledge. Searches requiring critical assessment before being integrated into knowledge bases. Search-driven exploration activities. Exploratory search relies on other information/cognitive behaviors: sense-making, organizing and analyzing search results, and decision making.
  3. p.24: Ill-structured problems of this kind 1) begin with a lack of information necessary to develop a solution or even precisely define the problem, 2) have no single right approach for solution, 3) have problem definitions that change as new information is gathered, and 4) have no identifiable ‘correct’ solution [3].
  4. It’s hard for search systems to identify concepts and their relationships.
  5. Concepts are characterized as distributions over observed words in Wikipedia pages. Use posterior expectations / approximate posterior inference: Gibbs sampling, variational inference.
  6. Ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences. Ontologies can be used to model concepts and their interrelationships (Lanzenberger et al., 2010). In this sense, ontologies represent the relevant aspects of context. To effectively comprehend cross-lingual corpora, tools that can explore the dependencies between language and context are needed.
  7. Ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences. Ontologies can be used to model concepts and their interrelationships (Lanzenberger et al., 2010). In this sense, ontologies represent the relevant aspects of context. To effectively comprehend cross-lingual corpora, tools that can explore the dependencies between language and context are needed.
  8. Concepts are characterized as distributions over observed words in Wikipedia pages. Each topic is a distribution over words.
  9. Today, most user searches are of an exploratory nature, in the sense that users are interested in retrieving pieces of information that cover many aspects of their information needs.
  10. The principle is similar to TF-IDF, where query terms are weighted based on frequency in a document (tf) and across the corpus (idf). In addition, the ratio of the document length to the average document length is taken into account in K, and BM25 is parameterized for further optimization. We used Xapian’s implementation of BM25 with default parameters.
      Maximal Marginal Relevance (MMR): One approach to diversifying search results is to optimize them on two criteria: similarity to the query (relevance) and dissimilarity to the other relevant documents (novelty). Maximal Marginal Relevance (Carbonell and Goldstein, 1998), for example, works on this principle: the similarity of a document to a query is adjusted based on its similarity to the other documents that are more similar to the query.
      \begin{displaymath}
      MMR = \underset{D_i \in R \setminus S}{\arg\max} \left[ \lambda \, Sim_1(D_i, Q) - (1 - \lambda) \max_{D_j \in S} Sim_2(D_i, D_j) \right]
      \end{displaymath}
  11. can arguably be lessened, because the semantics strips away extraneous context while at the same time providing better diversity within the universe of relevant documents
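
Note 10 describes BM25 in words: term frequency, inverse document frequency, and a length normalization term K. The sketch below is a textbook Okapi BM25 scorer written for illustration only. It is not Xapian's implementation, the smoothed idf variant is one common choice among several, and k1 = 1.2 and b = 0.75 are conventional illustrative values that are not claimed to match Xapian's defaults.

```python
import math

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    scores = []
    for doc in docs:
        # K folds the document-length ratio into the tf saturation term.
        K = k1 * ((1 - b) + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            df = sum(1 for d in docs if term in d)
            if df == 0:
                continue  # term never occurs in the corpus
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            tf = doc.count(term)
            score += idf * tf * (k1 + 1) / (tf + K)
        scores.append(score)
    return scores

# Toy corpus of three tokenized documents.
docs = [["concept", "network", "wikipedia"],
        ["search", "ranking"],
        ["concept", "search", "search", "diversity"]]
scores = bm25_scores(["concept", "network"], docs)
print(scores)
```

Scores like these could serve as the query-similarity input (Sim1) of an MMR re-ranker, with a separate document-to-document similarity supplying Sim2.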