Spinque @ Search Engines Amsterdam (SEA)
http://www.meetup.com/SEA-Search-Engines-Amsterdam/events/216345662/
Spinque is a spin-off company from CWI that builds on research into the integration of Databases and Information Retrieval. We build tailor-made search engines over connected datasets. With the Spinque technology we compose a search engine out of building blocks and compile this “search strategy” into an efficient query program. In the talk we explain and demonstrate the Search by Strategy approach. In addition, we discuss our current developments and challenges in searching Linked Data.
Bio: Michiel Hildebrand received his PhD from University of Amsterdam (at CWI) in 2010 for his research on access to Linked Data. He worked as a researcher at VU University and CWI. In 2014 he joined Spinque to apply the company’s search by strategy approach to Linked Data.
Searching Linked Data with Spinque
1. Searching Linked Data with Spinque
Michiel Hildebrand, Wouter Alink, Roberto Cornacchia, Arjen de Vries
Search Engines Amsterdam, January 30 2015
2. background
concept
product
Information Retrieval and DB integration
Cornacchia et al. Flexible and efficient IR using Array Databases. VLDB’08 Journal
Mühleisen et al. Column Stores as an IR Prototyping Tool. ECIR’14 & SIGIR’14
Search by Strategy
Alink et al. Searching CLEF-IP by strategy. CLEF’09
PatOlympics, 2010 and 2011
Tailored access to connected datasets
Koninklijke Bibliotheek, Wageningen Universiteit, Beeld&Geluid, Elsevier, Heineken, ...
3. Heterogeneous Data
Hang Li et al. A new approach to intranet search based on information extraction. CIKM’05
Complex information needs
SQL
CSV
XML
HTML
OAI
JSON
4. Heterogeneous University Data
Financial administration (ERP)
Contract administration (CMS)
Contract documents (CMS attachments)
Publication database (Institutional Repository)
Publication documents (Institutional Repository PDFs)
Employee database (address lists, ERP+CMS)
Companies (CMS + ERP + document mentions)
Subsidy database (CMS)
Departments (address lists, CMS)
Web addresses (extracted from documents)
Topic (assigned to publications)
Research programmes (dependent on funding scheme)
Complex information needs
What funding schemes are the primary source of income?
Can we move to Europe when Dutch funding dries up?
Who has active relations with partner X?
“Valorisation”; new national funding requirements
What industry sectors do we depend upon?
How many projects in smart cities? Green energy? Cloud computing? Etc.
How are strategic decisions implemented?
Has objective “move from Telecom toward ICT” been achieved, and how does it develop over time?
6. Project by topic
Search in attachments of projects
Search for project contracts (by metadata)
Traverse from attachments to projects & combine results
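The building blocks on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Spinque's implementation: the toy data, the `keyword_score` stand-in ranking function, and the weighted-sum combination are all hypothetical.

```python
# Hypothetical toy data; none of these names or values come from Spinque.
ATTACHMENTS = {
    "a1": {"project": "p1", "text": "smart cities sensor network proposal"},
    "a2": {"project": "p2", "text": "green energy wind turbine contract"},
    "a3": {"project": "p1", "text": "budget annex"},
}
PROJECTS = {
    "p1": {"title": "Urban sensing"},
    "p2": {"title": "Green energy storage"},
    "p3": {"title": "Smart cities mobility"},
}

def keyword_score(query, text):
    """Fraction of query terms found in text (a stand-in for a real ranking model)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)

def project_by_topic(query, weight_attachments=0.5):
    # Block 1: rank attachments by their full text.
    att_scores = {a: keyword_score(query, d["text"]) for a, d in ATTACHMENTS.items()}
    # Block 2: rank projects by their metadata (here: the title only).
    meta_scores = {p: keyword_score(query, d["title"]) for p, d in PROJECTS.items()}
    # Block 3: traverse attachment -> project, keeping the best attachment per project.
    via_att = {}
    for a, score in att_scores.items():
        p = ATTACHMENTS[a]["project"]
        via_att[p] = max(via_att.get(p, 0.0), score)
    # Block 4: combine both rankings with a weighted sum (an assumed mixing rule).
    combined = {
        p: weight_attachments * via_att.get(p, 0.0)
        + (1 - weight_attachments) * meta_scores[p]
        for p in PROJECTS
    }
    ranked = [(p, s) for p, s in combined.items() if s > 0]
    return sorted(ranked, key=lambda x: (-x[1], x[0]))
```

Calling `project_by_topic("smart cities")` on this toy data finds `p1` through one of its attachments and `p3` through its title, which is the point of combining the two branches.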
7. Topic expert
Search objects about topic
Expand with neighbours in and out
Return related persons
Ranked by tf-idf on relations
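A rough sketch of the steps above, assuming a tiny in-memory graph. The slide does not define "tf-idf on relations"; the reading used here (tf = number of links between a person and matching objects, idf = a log factor that down-weights persons linked to many objects) is an assumption, as are all names and data.

```python
import math

# Hypothetical toy graph of (subject, relation, object) edges.
EDGES = [
    ("alice", "author_of", "pub1"),
    ("alice", "author_of", "pub2"),
    ("bob", "author_of", "pub2"),
    ("bob", "author_of", "pub3"),
    ("bob", "author_of", "pub4"),
    ("proj1", "managed_by", "alice"),
]
TEXT = {
    "pub1": "linked data search",
    "pub2": "search over linked data",
    "pub3": "array databases",
    "pub4": "column stores",
    "proj1": "linked data access",
}
PERSONS = {"alice", "bob"}

def topic_experts(query):
    terms = set(query.lower().split())
    # Step 1: objects about the topic (all query terms occur in the text).
    hits = {o for o, t in TEXT.items() if terms <= set(t.lower().split())}
    # Step 2: expand with neighbours, following each edge in both directions.
    tf = {}      # links between a person and matching objects
    degree = {}  # links between a person and any object
    for s, _, o in EDGES:
        for person, other in ((s, o), (o, s)):
            if person in PERSONS:
                degree[person] = degree.get(person, 0) + 1
                if other in hits:
                    tf[person] = tf.get(person, 0) + 1
    # Step 3: return persons, ranked by the assumed tf-idf-style score.
    n = len(TEXT)
    scores = {p: tf[p] * math.log(1 + n / degree[p]) for p in tf}
    return sorted(scores.items(), key=lambda x: (-x[1], x[0]))
```

With this data, `topic_experts("linked data")` ranks `alice` above `bob`, because three of her links lead to matching objects against one of his.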
8. Norbert Fuhr, Thomas Rölleke. A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems (1994)
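The idea behind a probabilistic relational algebra can be illustrated with a simplified sketch: every tuple carries a probability, a join multiplies probabilities, and a projection merges duplicate tuples as a disjunction. The sketch assumes all events are independent, and the relation layout and function names are hypothetical rather than taken from the paper.

```python
def join(r, s, key_r, key_s):
    """Join two probabilistic relations; probabilities multiply (independence assumed)."""
    return [
        (tr + ts, pr * ps)
        for tr, pr in r
        for ts, ps in s
        if tr[key_r] == ts[key_s]
    ]

def project(r, cols):
    """Project onto cols; duplicates merge as a disjunction of independent events."""
    miss = {}  # probability that none of the merged tuples holds
    for t, p in r:
        k = tuple(t[c] for c in cols)
        miss[k] = miss.get(k, 1.0) * (1.0 - p)
    return {k: 1.0 - m for k, m in miss.items()}

# Hypothetical relations: (term, doc) with a relevance probability,
# and (doc, author) as certain facts.
term_doc = [(("search", "d1"), 0.8), (("search", "d2"), 0.5)]
doc_author = [(("d1", "alice"), 1.0), (("d2", "alice"), 1.0), (("d2", "bob"), 1.0)]

# Authors of documents about "search", ranked by probability.
authors = project(join(term_doc, doc_author, 1, 0), [3])
```

Here `alice` ends up with a probability of about 0.9 (two supporting documents) and `bob` with 0.5, so the result of a structured query is itself a ranking.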
10. Search by Strategy
(visual) modelling of search processes
Rank. Everything. Always.
all-round probabilistic search
Many strategies, one data model
many search engines, one index
16. API Builder for Open Data?
Supporting (search) application developers
Gregory Grefenstette. Search-based applications. 2010
Jamie Callan. Search Engine Support For Software Applications. CIKM 2010 Keynote
Who builds search strategies?
Developers are not IR specialists
Domain specialists neither
How to handle schema-mess?
in a heterogeneous dataspace
18. Happy alignments are all alike,
every unhappy alignment is unhappy in its own way
Jacco van Ossenbruggen 2012 (improvisation on Anna Karenina, Leo Tolstoy 1877)
19. Alignment strategies
Interactive vocabulary alignment, Jacco van Ossenbruggen, Michiel Hildebrand, Victor de Boer, TPDL 2011
Coming soon
Spinque Alignment Service
Beeld&Geluid, Naturalis, Rijksdienst Cultureel Erfgoed (RCE)
First part: history of Spinque and overview of Spinque technology
Second part: apply Spinque technology to search Linked Data
Background: Spinque started as a spin-off company from CWI. Research on Information Retrieval on top of Databases. For people not familiar with this work: one of the interesting advantages is that it creates a flexible environment where you can intuitively combine keyword search with structured queries. Implement XML search, Graph search, Feature-based search??? This flexibility is one of the key features of Spinque.
Concept: Spinque started in the context of a project to search patents. Finding patents is a complex task performed by domain specialists. Patents are structured documents, linked to references and external documents. Specialists know which parts to search and how. Spinque provided the environment to express a complex search “strategy”: not as a query, but as the algorithm. Spinque won the PatOlympics twice, in 2010 and 2011.
Evolved into a technology and toolset to make connected data accessible. Commercial and research projects to build search engines for ...
An example with data from a University.
Universities have many data sources, in various data formats: XML files, CSV files from a database dump, APIs, crawl data from the Web.
They have information needs that span these sources.
The project COMSODE is an SME-driven RTD project aimed at progressing the capabilities in the field of Open Data re-use. Our concept is an answer to barriers still present in this young area: data published by various Open Data catalogues are poorly integrated; quality assessment and cleansing are seldom addressed. Costs of Open Data consumption are high and Open Data usage is still poor. COMSODE tries to change the game.
Towards the Future
One more future topic.
Links between datasets are not always there?