Searching Political Data by Strategy
 Roberto Cornacchia
    Jaap Kamps
    Wouter Alink
 Arjen P. de Vries
 info@spinque.com
Search by Strategy
 An iterative 2-stage search process
   Express domain knowledge as high-level
    search strategies
   Generate search engine from the strategy
     A dynamic REST API
     UI controls for unspecified parameters
 Separate search strategy definition (the
  how) from actual searching and browsing
  of data collections (the what)
https://devel.spinque.com/ExPoSeApp-20130116/?config=demo#dashboard/demo04:/p/topic/Mokken
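The demo URLs in these slides appear to pass each unspecified strategy parameter as a /p/<name>/<value> path segment. A minimal Python sketch of that pattern, assuming it holds in general; the helper name parameter_path is hypothetical and is not part of any Spinque API:

```python
from urllib.parse import quote


def parameter_path(params):
    """Encode strategy parameters as /p/<name>/<value> segments.

    Assumption: this mirrors the pattern visible in the demo URLs only;
    the real REST API may encode parameters differently.
    """
    return "".join(
        f"/p/{quote(str(name), safe='')}/{quote(str(value), safe='')}"
        for name, value in params.items()
    )


# Parameters taken from the demo URLs shown in the slides.
print(parameter_path({"topic": "mokken", "emphasis_stemming": 0}))
```

Values are percent-encoded so that a topic containing a slash or a space cannot be mistaken for an extra path segment.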
Search by Strategy captures:
 Arbitrary retrieval unit types (not just
  documents)
   E.g., expert finding, entity search
 “Semantic” search
   The building blocks operate on scored triples
 Semi-structured search
   Data objects may be structured in hierarchies
 Exploratory search
   Use facets as preferences
Exposé
 Searching the parliamentary proceedings
  of the Dutch parliament
   Complete transcripts of everything said in
    parliament
   Organized by parliamentary session
   Detailing who says what, in what role and
    context
Exposé
 Original data is PDF, transformed into
  XML by award-winning project Political
  Mashup
   http://politicalmashup.nl/
In Politics…
 The essence is not only what is said, but also
  by whom, to whom, and why
 Concrete example:
   Wilders says “knettergek” in parliament (in
    2007) – is this remarkable?
“Knettergek” case
The word “knettergek” has been used many
 times in parliament…

… but never to address a member of the
 government
Varying result types
 Utterances
 Person / Party / …
Flexibility
 Concrete case:
   Maarten: “I cannot find Prof. Mokken, who I
    know has been spoken about in parliament
    multiple times!”
Flexibility
 Default indexing uses stemming and
  normalization
 But… searching for people’s names (and,
  for that matter, much other domain-specific
  terminology) can be negatively
  affected by stemming
     “Mokken” is stemmed to “mok”, leading us to the
      geographic locations “Mook” and “De Mok”, but not
      to the famous professor!
https://devel.spinque.com/ExPoSeApp-20130116/?config=demo#dashboard/demo05:/p/topic/mokken/p/emphasis_stemming/0
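To illustrate why stemming hurts name search, here is a minimal Python sketch. The stemmer is a toy (it only strips a trailing "en" and undoubles the final consonant), not the stemmer used in Exposé, and the two documents are invented for the example:

```python
import re
from collections import defaultdict


def toy_stem(token):
    """Toy stemmer (assumption, not the real one): drop a trailing 'en'
    and undouble a final double consonant, so 'mokken' becomes 'mok'."""
    if token.endswith("en") and len(token) > 4:
        token = token[:-2]
        if len(token) >= 2 and token[-1] == token[-2] and token[-1] not in "aeiou":
            token = token[:-1]
    return token


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_index(docs, stem):
    """Inverted index from (optionally stemmed) token to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[toy_stem(token) if stem else token].add(doc_id)
    return index


def search(index, query, stem):
    """Return ids of documents containing every query token."""
    hits = None
    for token in tokenize(query):
        ids = index.get(toy_stem(token) if stem else token, set())
        hits = ids if hits is None else hits & ids
    return sorted(hits or [])


# Invented example documents.
DOCS = {
    "u1": "Professor Mokken heeft hierover geschreven",
    "u2": "De marinebasis De Mok op Texel",
}

print(search(build_index(DOCS, stem=True), "Mokken", stem=True))
print(search(build_index(DOCS, stem=False), "Mokken", stem=False))
```

With stemming on, the query also matches the document about “De Mok”; with stemming off, only the document naming the professor is returned. Making stemming a parameter of the strategy lets the searcher choose per query.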
Joins to the rescue!
 Which house speakers from the Rotterdam harbour say what about “Amsterdam”?
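A question like this can be read as a join between two scored searches: one over biographies to find persons, one over utterances, joined on the speaker. The Python sketch below is only an illustration under stated assumptions: the data is invented, text_score is a stand-in for a real ranking function, and multiplying the two scores is one possible way to combine them, not necessarily what the system does.

```python
def text_score(text, query):
    """Fraction of query terms present in the text (stand-in ranker)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


# Invented data: (biography text, person it describes)
BIOGRAPHIES = [
    ("worked for years in the rotterdam harbour", "person_a"),
    ("former teacher from utrecht", "person_b"),
]

# Invented data: (speaker, utterance text)
UTTERANCES = [
    ("person_a", "amsterdam should invest in rail"),
    ("person_a", "the budget is too tight"),
    ("person_b", "amsterdam needs more housing"),
]


def join_search(bio_query, utterance_query):
    # Step 1: score persons through the biographies that describe them.
    person_scores = {}
    for text, person in BIOGRAPHIES:
        score = text_score(text, bio_query)
        if score > 0:
            person_scores[person] = max(score, person_scores.get(person, 0.0))

    # Step 2: score utterances and join them with the persons on speaker.
    results = []
    for speaker, text in UTTERANCES:
        score = text_score(text, utterance_query)
        if score > 0 and speaker in person_scores:
            results.append((speaker, text, person_scores[speaker] * score))
    return sorted(results, key=lambda row: row[2], reverse=True)


for row in join_search("rotterdam harbour", "amsterdam"):
    print(row)
```

Only utterances that mention the query term and come from a speaker whose biography matches survive the join, which is what a plain keyword search over utterances alone cannot express.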
Semantic Search
 (Diagram labels: biographies, describes, person, utterance)
Advantages
 Define and execute custom-built search
  strategies
   Specialized to the task, or even to the search
    at hand
 Search multiple data sources at once
 Explore and refine results interactively
 “Search provenance”
   Complete transparency on how search results
    were obtained
Position Statement
 Search professionals think in terms of
  search strategies already
 Let them design their own strategies, and
  thereby tailor their search engines
 So they learn to trust what we claim to be
  effective information retrieval
  techniques!
