SlideShare a Scribd company logo
www.ict.tuwien.ac.at
Institute of
Computer Technology
Extracting Data from the Deep Web with Global-as-View
Mediators Using Rule-Enriched Semantic Annotations
Harold Boley
harold.boley[at]unb.ca
University of New Brunswick
Faculty of Computer Science
Fredericton, NB, Canada
Benjamin Dönz
doenz[at]ict.tuwien.ac.at
Vienna University of Technology
Institute of Computer Technology
Vienna, Austria
www.ict.tuwien.ac.at
Institute of
Computer Technology
The „Deep Web“ – What is it?
2
 Data hidden behind search forms and interfaces
 Estimated 400-500 times more information than the indexable
World Wide Web
 77% of the content classified as structured information
 Template based – so understanding how to extract one result
allows to extract them all
 Examples:
• Web shops
• Classified advertising
• Miscellaneous databases
www.ict.tuwien.ac.at
Institute of
Computer Technology
Accessing the Deep Web
3
www.ict.tuwien.ac.at
Institute of
Computer Technology
Our Approach: Upper-Right Quadrant
4
 Processing of queries using a query forwarding approach
• SPARQL queries as input
• Query transformation and forwarding via mediators
• Global-as-View mapping of local sources
 Web form interaction and information extraction
• Extraction process based on an extensible model
• Semantic annotations for mapping real-world Web pages to the model
• Feature-based rules for creating annotations
www.ict.tuwien.ac.at
Institute of
Computer Technology
Model Overview
5
 Query interface for submitting conjunctive queries
www.ict.tuwien.ac.at
Institute of
Computer Technology
Model Overview
6
 Query interface for submitting conjunctive queries
 Result list: all valid records, but only key attribute/value pairs
www.ict.tuwien.ac.at
Institute of
Computer Technology
Model Overview
7
 Query interface for submitting conjunctive queries
 Result list: all valid records, but only key attribute/value pairs
 Result detail: all attribute/value pairs, but only one record
www.ict.tuwien.ac.at
Institute of
Computer Technology
Query Process
8
www.ict.tuwien.ac.at
Institute of
Computer Technology
Walkthrough (1)
9
SELECT ? townname ?offername ?description ?realestaterent ?realestaterooms ?realestatefloorSpace
WHERE {?object rdf:type realestate:RealEstateOffer;
realestate:townname ?townname;
realestate:offername ?offername;
realestate:description ?description;
realestate:rent ?realestaterent;
realestate:rooms ?realestaterooms;
realestate:floorSpace ?realestatefloorSpace}.
FILTER (?realestaterent>=800 && ?realestaterent<=1200 &&
(?realestaterooms>=3 || ?realestatefloorSpace >= 80)) }
 Query:
“Return details of real estate offers with a rent between 800€ and 1200€ and
either at least 3 rooms or 80m²”
 SPARQL:
www.ict.tuwien.ac.at
Institute of
Computer Technology
Walkthrough (2)
10
 Transformation:
• Parse Query
• Transform filter to Disjunctive Normal Form and split into subqueries
• Unfold to include relevant sources via Global-as-View mappings
www.ict.tuwien.ac.at
Institute of
Computer Technology
Walkthrough (2)
11
 Transformation:
• Parse Query
• Transform filter to Disjunctive Normal Form and split into subqueries
• Unfold to include relevant sources via Global-as-View mappings
 Result
SELECT FROM <http://derStandard.at>: greaterOrEqual(realestate:rent,800) ∧
lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:rooms,3)
UNION
SELECT FROM <http://at.immolive24.com>: greaterOrEqual(realestate:rent,800) ∧
lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:rooms,3)
UNION
SELECT FROM <http://derStandard.at>: greaterOrEqual(realestate:rent,800) ∧
lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:floorSpace,100)
UNION
SELECT FROM <http://at.immolive24.com>: greaterOrEqual(realestate:rent,800) ∧
lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:floorSpace,100)
www.ict.tuwien.ac.at
Institute of
Computer Technology
Walkthrough (3)
12
 Features used for identification elements
 “Properties” of the HTML tags
 For example id, class, value, tag path, associated text, …
www.ict.tuwien.ac.at
Institute of
Computer Technology
Walkthrough (4)
13
 Object-centered Datalog rules
 Conditions: conjunction of req. features
 Conclusions: concepts of the model
 Single efficient evaluation pass
www.ict.tuwien.ac.at
Institute of
Computer Technology
Walkthrough (5)
14
www.ict.tuwien.ac.at
Institute of
Computer Technology
Walkthrough (6)
15
www.ict.tuwien.ac.at
Institute of
Computer Technology
Walkthrough (7)
16
SELECT ? townname ?offername ?description ?realestaterent ?realestaterooms ?realestatefloorSpace
WHERE {?object rdf:type realestate:RealEstateOffer;
realestate:townname ?townname;
realestate:offername ?offername;
realestate:description ?description;
realestate:rent ?realestaterent;
realestate:rooms ?realestaterooms;
realestate:floorSpace ?realestatefloorSpace}.
FILTER (?realestaterent>=800 && ?realestaterent<=1200 &&
(?realestaterooms>=3 || ?realestatefloorSpace >= 80)) }
 SPARQL
 Output
www.ict.tuwien.ac.at
Institute of
Computer Technology
Live Presentation of Example Use Cases
17
Deep Web Mediator and examples available online:
http://semann.bdoenz.com/default.aspx
#1 Plain list: Extract the average rent per town from a single site.
#2 Search and result list: Extract test results on cars of the brand Audi from a single site and return brand, model and the test
conclusion.
#3 Search, list and detail page: Extract real estate offers from a single site and return details for offers with 3 or more rooms and a
rent of 800€ to 1200€.
#4 Disjunctive query: Extract used car offers from a single site and return details of all offers for cars of the brand “Audi” that are
priced under 12.500€ if the construction year is after 2011 or under 15.000€ if the construction year is after 2012.
#5 Union: Extract used car offers from all available sites and return details of offers for cars of the brand “Audi” that are priced
under 12.500€ and have a construction year after 2011.
#6 Disjunctive union: Extract used car offers from all available sites and return details of all offers for cars of the brand “Audi” that
are priced under 12.500€ if the construction year is after 2011 or under 15.000€ if the construction year is after 2012.
#7 Relations between sources: Extract average rents and real estate offers from all available sites and return those that are
located in a specific town and have a lower rent/m² than the average for that town.
#8 Deep Web and local databases: Extract real estate offers from all available sites and add the type of town and population from a
local dataset.
#9 Deep Web and external databases: Extract real estate offers from all available sites and add a description of the town and the
population from an external SPARQL endpoint (dbPedia).
www.ict.tuwien.ac.at
Institute of
Computer Technology
Conclusion
18
 Processing of queries using a query forwarding approach
• SPARQL queries as input
• Query transformation and forwarding via mediators
• Global-as-View mapping of local sources
 Web form interaction and information extraction
• Extraction process based on an extensible model
• Semantic annotations for mapping real-world Web pages to the model
• Feature-based rules for creating annotations
www.ict.tuwien.ac.at
Institute of
Computer Technology
Extracting Data from the Deep Web with Global-as-View
Mediators Using Rule-Enriched Semantic Annotations
Harold Boley
harold.boley[at]unb.ca
University of New Brunswick
Faculty of Computer Science
Fredericton, NB, Canada
Benjamin Dönz
Doenz[at]ict.tuwien.ac.at
Vienna University of Technology
Institute of Computer Technology
Vienna, Austria
www.ict.tuwien.ac.at
Institute of
Computer Technology
Use case #3
20
This example is situated in the domain of real estate, and asks for the name of the offer, a description, the
number of rooms, the floor space and the rent of all offers with 3 or more rooms and a rent in the range of
800€ and 1200€ from a specific real estate site. To process this query, the mediator accesses the query
interface of the site, sets the parameters in the fields of the web form and triggers the search function to
submit the query. The returned result lists are iterated to extract the values from the list itself, but also from
subpages by following the the corresponding link for each record. All extracted facts are collected in a
database and presented to the user in a tabular style including a link to the page where the offer was
found.
SELECT ?source ?realestatetownname ?realestateoffername ?realestatedescription ?
realestaterent ?realestaterooms ?realestatefloorSpace
FROM <http://derstandard.at/anzeiger/immoweb/Immobilien-suche.aspx>
WHERE {?object rdf:type <http://semannot.bdoenz.com/realestate#RealEstateOffer>.
OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#townname> ?realestatetownname}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#offername> ?realestateoffername}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#description> ?realestatedescription}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rent> ?realestaterent}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rooms> ?realestaterooms}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#floorSpace> ?realestatefloorSpace}.
FILTER (?realestaterent>=800 &&
?realestaterent<=1200 &&
?realestaterooms>=3
)
}
www.ict.tuwien.ac.at
Institute of
Computer Technology
Use case #3
21
www.ict.tuwien.ac.at
Institute of
Computer Technology
Use case #6
22
This example is situated in the domain of used cars and is the combination of the previous two examples:
The query requests the name, model, color, construction year, mileage and price of offers from a single
site, where the brand of the car is “Audi” and that are priced under 12.500€ if the construction year is after
2010, or under 15.000€ if the construction year is after 2011. This query is split into two conjunctive
subqueries and submit to all three available sites returning the union of a total of 6 queries.
SELECT DISTINCT ?source ?carsbrand ?carsmodel ?carsoffername ?carscolor ?carsmileage ?
carsconstructionYear ?carsofferprice
WHERE {?object rdf:type <http://semannot.bdoenz.com/cars#UsedCarOffer>.
OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#brand> ?carsbrand}.
?object <http://semannot.bdoenz.com/cars#brand> "Audi".
OPTIONAL {?object <http://semannot.bdoenz.com/cars#model> ?carsmodel}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#offername> ?carsoffername}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#color> ?carscolor}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#mileage> ?carsmileage}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#constructionYear> ?carsconstructionYear}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#offerprice> ?carsofferprice}.
FILTER (
(?carsconstructionYear>=2010 && ?carsofferprice<=12500)
||
(?carsconstructionYear>=2011 && ?carsofferprice<=15000)
)
}
www.ict.tuwien.ac.at
Institute of
Computer Technology
Use case #6
23
www.ict.tuwien.ac.at
Institute of
Computer Technology
Use case #7
24
This example is situated in the domain of real estate, and asks for the name of the offer, a description, the
number of rooms, the floor space and the rent of all offers with 3 or more rooms in the town of
"Klosterneuburg" where the rent is lower than the average rent per square meter for that town. No specific
site is referenced in the query, the mediator therefore includes all sites with real estate offers and also a s
site containig average rents. Each of these are accessed in turn collecting the intermediate results in a
database before applying the filter and returning the results. Intermediate results are only available after
the average rents have been extracted and can be compared to the offers in the defined manner. Note that
this type of query cannot be generated by the query wizard, but is entered directly as SPARQL
SELECT ?source ?realestatetownname ?realestateoffername ?realestaterent ?realestaterooms ?
realestatefloorSpace
WHERE {?object rdf:type <http://semannot.bdoenz.com/realestate#RealEstateOffer>.
OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#townname> ?realestatetownname}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#offername> ?realestateoffername}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rent> ?realestaterent}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rooms> ?realestaterooms}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#floorSpace> ?realestatefloorSpace}.
?community rdf:type <http://semannot.bdoenz.com/realestate#Community>.
?community <http://semannot.bdoenz.com/realestate#averageRent> ?avrent.
?community <http://semannot.bdoenz.com/realestate#communityname> ?cname.
FILTER (REGEX(?realestatetownname,"Klosterneuburg","i") &&
?realestaterent>=800 && ?realestaterent<=1200 &&
?realestaterooms>=3 && REGEX(?cname,"Klosterneuburg","i") &&
?realestaterent/?realestatefloorSpace <(?avrent))}
www.ict.tuwien.ac.at
Institute of
Computer Technology
Use case #7
25

More Related Content

Viewers also liked

Terreno En Chriqui Con Mapak
Terreno En Chriqui Con MapakTerreno En Chriqui Con Mapak
Terreno En Chriqui Con Mapak
Dalimagen
 
Hna Crescencia Perez
Hna Crescencia PerezHna Crescencia Perez
Hna Crescencia Perezclaegfmh
 
Brooklin Prime Offices (11) 7853-9660 RENATA GABAN - LANÇAMENTO
Brooklin Prime Offices (11) 7853-9660 RENATA GABAN - LANÇAMENTOBrooklin Prime Offices (11) 7853-9660 RENATA GABAN - LANÇAMENTO
Brooklin Prime Offices (11) 7853-9660 RENATA GABAN - LANÇAMENTO
ÁggapBrasil
 
NET-Metrix-Audit Juni 2009
NET-Metrix-Audit Juni 2009NET-Metrix-Audit Juni 2009
NET-Metrix-Audit Juni 2009danieltschudi
 
Governanca patricipacao social e dialogo federativo
Governanca patricipacao social e dialogo federativoGovernanca patricipacao social e dialogo federativo
Governanca patricipacao social e dialogo federativo
Cogepp CEPAM
 
FUNDAEMPRESARIAL
FUNDAEMPRESARIALFUNDAEMPRESARIAL
FUNDAEMPRESARIAL
Martha Lopez
 
Cloud contractadministrationb
Cloud contractadministrationbCloud contractadministrationb
Cloud contractadministrationb
IPSA
 
GDNÄ 2012: Prof. Heinz Gerhäuser über die "Faszination MP3"
GDNÄ 2012: Prof. Heinz Gerhäuser über die "Faszination MP3"GDNÄ 2012: Prof. Heinz Gerhäuser über die "Faszination MP3"
GDNÄ 2012: Prof. Heinz Gerhäuser über die "Faszination MP3"
GDNÄ - Die Wissensgesellschaft
 
Masies de Sant Joan Despi
Masies de Sant Joan DespiMasies de Sant Joan Despi
Masies de Sant Joan Despi
Seniorlab25
 
EasyVista Company presentation
EasyVista Company presentationEasyVista Company presentation
EasyVista Company presentation
EasyVista
 
Guia de los derechos de los trabajadores
Guia de los derechos de los trabajadoresGuia de los derechos de los trabajadores
Guia de los derechos de los trabajadores
Miguel Simón
 
EETT clinica cruz blanca
EETT clinica cruz blancaEETT clinica cruz blanca
EETT clinica cruz blanca
construline
 
The Essential Global Guide to Liqueurs-Presentation
The Essential Global Guide to Liqueurs-Presentation The Essential Global Guide to Liqueurs-Presentation
The Essential Global Guide to Liqueurs-Presentation
Tales of the Cocktail
 
Benito Pérez Galdós – presentación por David Arroyo
Benito Pérez Galdós – presentación por David ArroyoBenito Pérez Galdós – presentación por David Arroyo
Benito Pérez Galdós – presentación por David Arroyo
DAVIDSTREAMS.com
 

Viewers also liked (17)

Terreno En Chriqui Con Mapak
Terreno En Chriqui Con MapakTerreno En Chriqui Con Mapak
Terreno En Chriqui Con Mapak
 
Hna Crescencia Perez
Hna Crescencia PerezHna Crescencia Perez
Hna Crescencia Perez
 
JBM-HH Bulletin 6-10
JBM-HH Bulletin 6-10JBM-HH Bulletin 6-10
JBM-HH Bulletin 6-10
 
Brooklin Prime Offices (11) 7853-9660 RENATA GABAN - LANÇAMENTO
Brooklin Prime Offices (11) 7853-9660 RENATA GABAN - LANÇAMENTOBrooklin Prime Offices (11) 7853-9660 RENATA GABAN - LANÇAMENTO
Brooklin Prime Offices (11) 7853-9660 RENATA GABAN - LANÇAMENTO
 
NET-Metrix-Audit Juni 2009
NET-Metrix-Audit Juni 2009NET-Metrix-Audit Juni 2009
NET-Metrix-Audit Juni 2009
 
Governanca patricipacao social e dialogo federativo
Governanca patricipacao social e dialogo federativoGovernanca patricipacao social e dialogo federativo
Governanca patricipacao social e dialogo federativo
 
FUNDAEMPRESARIAL
FUNDAEMPRESARIALFUNDAEMPRESARIAL
FUNDAEMPRESARIAL
 
Cloud contractadministrationb
Cloud contractadministrationbCloud contractadministrationb
Cloud contractadministrationb
 
GDNÄ 2012: Prof. Heinz Gerhäuser über die "Faszination MP3"
GDNÄ 2012: Prof. Heinz Gerhäuser über die "Faszination MP3"GDNÄ 2012: Prof. Heinz Gerhäuser über die "Faszination MP3"
GDNÄ 2012: Prof. Heinz Gerhäuser über die "Faszination MP3"
 
Masies de Sant Joan Despi
Masies de Sant Joan DespiMasies de Sant Joan Despi
Masies de Sant Joan Despi
 
EasyVista Company presentation
EasyVista Company presentationEasyVista Company presentation
EasyVista Company presentation
 
Guia de los derechos de los trabajadores
Guia de los derechos de los trabajadoresGuia de los derechos de los trabajadores
Guia de los derechos de los trabajadores
 
EETT clinica cruz blanca
EETT clinica cruz blancaEETT clinica cruz blanca
EETT clinica cruz blanca
 
The Essential Global Guide to Liqueurs-Presentation
The Essential Global Guide to Liqueurs-Presentation The Essential Global Guide to Liqueurs-Presentation
The Essential Global Guide to Liqueurs-Presentation
 
Que Es La User Experience
Que Es La User ExperienceQue Es La User Experience
Que Es La User Experience
 
Benito Pérez Galdós – presentación por David Arroyo
Benito Pérez Galdós – presentación por David ArroyoBenito Pérez Galdós – presentación por David Arroyo
Benito Pérez Galdós – presentación por David Arroyo
 
11018 ftp
11018 ftp11018 ftp
11018 ftp
 

Similar to RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Mediators Using Rule-Enriched Semantic Annotations

Smart Cities and Intelligent Buildings.pptx
Smart Cities and Intelligent Buildings.pptxSmart Cities and Intelligent Buildings.pptx
Smart Cities and Intelligent Buildings.pptx
ReetaDutta1
 
Presentation-Smart-Cities-International-Virtual-Symposium-2021.pptx
Presentation-Smart-Cities-International-Virtual-Symposium-2021.pptxPresentation-Smart-Cities-International-Virtual-Symposium-2021.pptx
Presentation-Smart-Cities-International-Virtual-Symposium-2021.pptx
SharanabasappaDegoan
 
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
Justin Hayward
 
Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017
Boris Adryan
 
TIER TIER RESEARCH RESEARCH
TIER TIER RESEARCH RESEARCHTIER TIER RESEARCH RESEARCH
TIER TIER RESEARCH RESEARCHwebhostingguy
 
Smart Energy-Vincenzo Croce.pptx
Smart Energy-Vincenzo Croce.pptxSmart Energy-Vincenzo Croce.pptx
Smart Energy-Vincenzo Croce.pptx
FIWARE
 
Fog Lifter Summary from CES
Fog Lifter Summary from CESFog Lifter Summary from CES
Fog Lifter Summary from CES
billwzel
 
Apidays x api3 9th dec
Apidays x api3   9th decApidays x api3   9th dec
Apidays x api3 9th dec
BenCarvill1
 
Build vs. Buy: Internet Datacenter
Build vs. Buy: Internet DatacenterBuild vs. Buy: Internet Datacenter
Build vs. Buy: Internet Datacenterwebhostingguy
 
Build vs. Buy: Internet Datacenter
Build vs. Buy: Internet DatacenterBuild vs. Buy: Internet Datacenter
Build vs. Buy: Internet Datacenterwebhostingguy
 
Build vs. Buy: Internet Datacenter
Build vs. Buy: Internet DatacenterBuild vs. Buy: Internet Datacenter
Build vs. Buy: Internet Datacenterwebhostingguy
 
Build vs. Buy: Internet Datacenter
Build vs. Buy: Internet DatacenterBuild vs. Buy: Internet Datacenter
Build vs. Buy: Internet Datacenterwebhostingguy
 
E mine by V.DINESH KUMAR KSRCT
E mine by V.DINESH KUMAR KSRCTE mine by V.DINESH KUMAR KSRCT
E mine by V.DINESH KUMAR KSRCT
dinesh2vasu
 
Bem2034
Bem2034Bem2034
Bem2034
Lisa Harris
 
React Native e IoT - Un progetto complesso
React Native e IoT - Un progetto complessoReact Native e IoT - Un progetto complesso
React Native e IoT - Un progetto complesso
Commit University
 
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service SolutionsHow Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
MongoDB
 
DC4Cities: an innovative approach for efficient and environmentally sustainab...
DC4Cities: an innovative approach for efficient and environmentally sustainab...DC4Cities: an innovative approach for efficient and environmentally sustainab...
DC4Cities: an innovative approach for efficient and environmentally sustainab...
DC4Cities Project: Environmentally Sustainable Data Centre for Smart Cities
 
WSO-LINK: Algorithm to Eliminate Web Structure Outliers in Web Pages
WSO-LINK: Algorithm to Eliminate Web Structure Outliers in Web PagesWSO-LINK: Algorithm to Eliminate Web Structure Outliers in Web Pages
WSO-LINK: Algorithm to Eliminate Web Structure Outliers in Web Pages
IOSR Journals
 
Supporting a Cloud Platform with Streams of Factory Shop Floor Data in the C...
Supporting a Cloud Platform with Streams of  Factory Shop Floor Data in the C...Supporting a Cloud Platform with Streams of  Factory Shop Floor Data in the C...
Supporting a Cloud Platform with Streams of Factory Shop Floor Data in the C...
FAST-Lab. Factory Automation Systems and Technologies Laboratory, Tampere University of Technology
 
FIWARE Overview (University Cairo 20Aug2017)
FIWARE Overview (University Cairo 20Aug2017)FIWARE Overview (University Cairo 20Aug2017)
FIWARE Overview (University Cairo 20Aug2017)
FIWARE
 

Similar to RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Mediators Using Rule-Enriched Semantic Annotations (20)

Smart Cities and Intelligent Buildings.pptx
Smart Cities and Intelligent Buildings.pptxSmart Cities and Intelligent Buildings.pptx
Smart Cities and Intelligent Buildings.pptx
 
Presentation-Smart-Cities-International-Virtual-Symposium-2021.pptx
Presentation-Smart-Cities-International-Virtual-Symposium-2021.pptxPresentation-Smart-Cities-International-Virtual-Symposium-2021.pptx
Presentation-Smart-Cities-International-Virtual-Symposium-2021.pptx
 
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
 
Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017
 
TIER TIER RESEARCH RESEARCH
TIER TIER RESEARCH RESEARCHTIER TIER RESEARCH RESEARCH
TIER TIER RESEARCH RESEARCH
 
Smart Energy-Vincenzo Croce.pptx
Smart Energy-Vincenzo Croce.pptxSmart Energy-Vincenzo Croce.pptx
Smart Energy-Vincenzo Croce.pptx
 
Fog Lifter Summary from CES
Fog Lifter Summary from CESFog Lifter Summary from CES
Fog Lifter Summary from CES
 
Apidays x api3 9th dec
Apidays x api3   9th decApidays x api3   9th dec
Apidays x api3 9th dec
 
Build vs. Buy: Internet Datacenter
Build vs. Buy: Internet DatacenterBuild vs. Buy: Internet Datacenter
Build vs. Buy: Internet Datacenter
 
Build vs. Buy: Internet Datacenter
Build vs. Buy: Internet DatacenterBuild vs. Buy: Internet Datacenter
Build vs. Buy: Internet Datacenter
 
Build vs. Buy: Internet Datacenter
Build vs. Buy: Internet DatacenterBuild vs. Buy: Internet Datacenter
Build vs. Buy: Internet Datacenter
 
Build vs. Buy: Internet Datacenter
Build vs. Buy: Internet DatacenterBuild vs. Buy: Internet Datacenter
Build vs. Buy: Internet Datacenter
 
E mine by V.DINESH KUMAR KSRCT
E mine by V.DINESH KUMAR KSRCTE mine by V.DINESH KUMAR KSRCT
E mine by V.DINESH KUMAR KSRCT
 
Bem2034
Bem2034Bem2034
Bem2034
 
React Native e IoT - Un progetto complesso
React Native e IoT - Un progetto complessoReact Native e IoT - Un progetto complesso
React Native e IoT - Un progetto complesso
 
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service SolutionsHow Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
 
DC4Cities: an innovative approach for efficient and environmentally sustainab...
DC4Cities: an innovative approach for efficient and environmentally sustainab...DC4Cities: an innovative approach for efficient and environmentally sustainab...
DC4Cities: an innovative approach for efficient and environmentally sustainab...
 
WSO-LINK: Algorithm to Eliminate Web Structure Outliers in Web Pages
WSO-LINK: Algorithm to Eliminate Web Structure Outliers in Web PagesWSO-LINK: Algorithm to Eliminate Web Structure Outliers in Web Pages
WSO-LINK: Algorithm to Eliminate Web Structure Outliers in Web Pages
 
Supporting a Cloud Platform with Streams of Factory Shop Floor Data in the C...
Supporting a Cloud Platform with Streams of  Factory Shop Floor Data in the C...Supporting a Cloud Platform with Streams of  Factory Shop Floor Data in the C...
Supporting a Cloud Platform with Streams of Factory Shop Floor Data in the C...
 
FIWARE Overview (University Cairo 20Aug2017)
FIWARE Overview (University Cairo 20Aug2017)FIWARE Overview (University Cairo 20Aug2017)
FIWARE Overview (University Cairo 20Aug2017)
 

Recently uploaded

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 

Recently uploaded (20)

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Mediators Using Rule-Enriched Semantic Annotations

  • 1. www.ict.tuwien.ac.at Institute of Computer Technology Extracting Data from the Deep Web with Global-as-View Mediators Using Rule-Enriched Semantic Annotations Harold Boley harold.boley[at]unb.ca University of New Brunswick Faculty of Computer Science Fredericton, NB, Canada Benjamin Dönz doenz[at]ict.tuwien.ac.at Vienna University of Technology Institute of Computer Technology Vienna, Austria
  • 2. www.ict.tuwien.ac.at Institute of Computer Technology The „Deep Web“ – What is it? 2  Data hidden behind search forms and interfaces  Estimated 400-500 times more information than the indexable World Wide Web  77% of the content classified as structured information  Template based – so understanding how to extract one result allows to extract them all  Examples: • Web shops • Classified advertising • Miscellaneous databases
  • 4. www.ict.tuwien.ac.at Institute of Computer Technology Our Approach: Upper-Right Quadrant 4  Processing of queries using a query forwarding approach • SPARQL queries as input • Query transformation and forwarding via mediators • Global-as-View mapping of local sources  Web form interaction and information extraction • Extraction process based on an extensible model • Semantic annotations for mapping real-world Web pages to the model • Feature-based rules for creating annotations
  • 5. www.ict.tuwien.ac.at Institute of Computer Technology Model Overview 5  Query interface for submitting conjunctive queries
  • 6. www.ict.tuwien.ac.at Institute of Computer Technology Model Overview 6  Query interface for submitting conjunctive queries  Result list: all valid records, but only key attribute/value pairs
  • 7. www.ict.tuwien.ac.at Institute of Computer Technology Model Overview 7  Query interface for submitting conjunctive queries  Result list: all valid records, but only key attribute/value pairs  Result detail: all attribute/value pairs, but only one record
  • 9. www.ict.tuwien.ac.at Institute of Computer Technology Walkthrough (1) 9 SELECT ? townname ?offername ?description ?realestaterent ?realestaterooms ?realestatefloorSpace WHERE {?object rdf:type realestate:RealEstateOffer; realestate:townname ?townname; realestate:offername ?offername; realestate:description ?description; realestate:rent ?realestaterent; realestate:rooms ?realestaterooms; realestate:floorSpace ?realestatefloorSpace}. FILTER (?realestaterent>=800 && ?realestaterent<=1200 && (?realestaterooms>=3 || ?realestatefloorSpace >= 80)) }  Query: “Return details of real estate offers with a rent between 800€ and 1200€ and either at least 3 rooms or 80m²”  SPARQL:
  • 10. www.ict.tuwien.ac.at Institute of Computer Technology Walkthrough (2) 10  Transformation: • Parse Query • Transform filter to Disjunctive Normal Form and split into subqueries • Unfold to include relevant sources via Global-as-View mappings
  • 11. www.ict.tuwien.ac.at Institute of Computer Technology Walkthrough (2) 11  Transformation: • Parse Query • Transform filter to Disjunctive Normal Form and split into subqueries • Unfold to include relevant sources via Global-as-View mappings  Result SELECT FROM <http://derStandard.at>: greaterOrEqual(realestate:rent,800) ∧ lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:rooms,3) UNION SELECT FROM <http://at.immolive24.com>: greaterOrEqual(realestate:rent,800) ∧ lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:rooms,3) UNION SELECT FROM <http://derStandard.at>: greaterOrEqual(realestate:rent,800) ∧ lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:floorSpace,100) UNION SELECT FROM <http://at.immolive24.com>: greaterOrEqual(realestate:rent,800) ∧ lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:floorSpace,100)
  • 12. www.ict.tuwien.ac.at Institute of Computer Technology Walkthrough (3) 12  Features used for identification elements  “Properties” of the HTML tags  For example id, class, value, tag path, associated text, …
  • 13. www.ict.tuwien.ac.at Institute of Computer Technology Walkthrough (4) 13  Object-centered Datalog rules  Conditions: conjunction of req. features  Conclusions: concepts of the model  Single efficient evaluation pass
  • 16. www.ict.tuwien.ac.at Institute of Computer Technology Walkthrough (7) 16 SELECT ? townname ?offername ?description ?realestaterent ?realestaterooms ?realestatefloorSpace WHERE {?object rdf:type realestate:RealEstateOffer; realestate:townname ?townname; realestate:offername ?offername; realestate:description ?description; realestate:rent ?realestaterent; realestate:rooms ?realestaterooms; realestate:floorSpace ?realestatefloorSpace}. FILTER (?realestaterent>=800 && ?realestaterent<=1200 && (?realestaterooms>=3 || ?realestatefloorSpace >= 80)) }  SPARQL  Output
  • 17. www.ict.tuwien.ac.at Institute of Computer Technology Live Presentation of Example Use Cases 17 Deep Web Mediator and examples available online: http://semann.bdoenz.com/default.aspx #1 Plain list: Extract the average rent per town from a single site. #2 Search and result list: Extract test results on cars of the brand Audi from a single site and return brand, model and the test conclusion. #3 Search, list and detail page: Extract real estate offers from a single site and return details for offers with 3 or more rooms and a rent of 800€ to 1200€. #4 Disjunctive query: Extract used car offers from a single site and return details of all offers for cars of the brand “Audi” that are priced under 12.500€ if the construction year is after 2011 or under 15.000€ if the construction year is after 2012. #5 Union: Extract used car offers from all available sites and return details of offers for cars of the brand “Audi” that are priced under 12.500€ and have a construction year after 2011. #6 Disjunctive union: Extract used car offers from all available sites and return details of all offers for cars of the brand “Audi” that are priced under 12.500€ if the construction year is after 2011 or under 15.000€ if the construction year is after 2012. #7 Relations between sources: Extract average rents and real estate offers from all available sites and return those that are located in a specific town and have a lower rent/m² than the average for that town. #8 Deep Web and local databases: Extract real estate offers from all available sites and add the type of town and population from a local dataset. #9 Deep Web and external databases: Extract real estate offers from all available sites and add a description of the town and the population from an external SPARQL endpoint (dbPedia).
  • 18. www.ict.tuwien.ac.at Institute of Computer Technology Conclusion 18  Processing of queries using a query forwarding approach • SPARQL queries as input • Query transformation and forwarding via mediators • Global-as-View mapping of local sources  Web form interaction and information extraction • Extraction process based on an extensible model • Semantic annotations for mapping real-world Web pages to the model • Feature-based rules for creating annotations
  • 19. www.ict.tuwien.ac.at Institute of Computer Technology Extracting Data from the Deep Web with Global-as-View Mediators Using Rule-Enriched Semantic Annotations Harold Boley harold.boley[at]unb.ca University of New Brunswick Faculty of Computer Science Fredericton, NB, Canada Benjamin Dönz Doenz[at]ict.tuwien.ac.at Vienna University of Technology Institute of Computer Technology Vienna, Austria
  • 20. www.ict.tuwien.ac.at Institute of Computer Technology Use case #3 20 This example is situated in the domain of real estate, and asks for the name of the offer, a description, the number of rooms, the floor space and the rent of all offers with 3 or more rooms and a rent in the range of 800€ and 1200€ from a specific real estate site. To process this query, the mediator accesses the query interface of the site, sets the parameters in the fields of the web form and triggers the search function to submit the query. The returned result lists are iterated to extract the values from the list itself, but also from subpages by following the the corresponding link for each record. All extracted facts are collected in a database and presented to the user in a tabular style including a link to the page where the offer was found. SELECT ?source ?realestatetownname ?realestateoffername ?realestatedescription ? realestaterent ?realestaterooms ?realestatefloorSpace FROM <http://derstandard.at/anzeiger/immoweb/Immobilien-suche.aspx> WHERE {?object rdf:type <http://semannot.bdoenz.com/realestate#RealEstateOffer>. OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#townname> ?realestatetownname}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#offername> ?realestateoffername}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#description> ?realestatedescription}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rent> ?realestaterent}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rooms> ?realestaterooms}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#floorSpace> ?realestatefloorSpace}. FILTER (?realestaterent>=800 && ?realestaterent<=1200 && ?realestaterooms>=3 ) }
  • 22. www.ict.tuwien.ac.at Institute of Computer Technology Use case #6 22 This example is situated in the domain of used cars and is the combination of the previous two examples: The query requests the name, model, color, construction year, mileage and price of offers from a single site, where the brand of the car is “Audi” and that are priced under 12.500€ if the construction year is after 2010, or under 15.000€ if the construction year is after 2011. This query is split into two conjunctive subqueries and submit to all three available sites returning the union of a total of 6 queries. SELECT DISTINCT ?source ?carsbrand ?carsmodel ?carsoffername ?carscolor ?carsmileage ? carsconstructionYear ?carsofferprice WHERE {?object rdf:type <http://semannot.bdoenz.com/cars#UsedCarOffer>. OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}. OPTIONAL {?object <http://semannot.bdoenz.com/cars#brand> ?carsbrand}. ?object <http://semannot.bdoenz.com/cars#brand> "Audi". OPTIONAL {?object <http://semannot.bdoenz.com/cars#model> ?carsmodel}. OPTIONAL {?object <http://semannot.bdoenz.com/cars#offername> ?carsoffername}. OPTIONAL {?object <http://semannot.bdoenz.com/cars#color> ?carscolor}. OPTIONAL {?object <http://semannot.bdoenz.com/cars#mileage> ?carsmileage}. OPTIONAL {?object <http://semannot.bdoenz.com/cars#constructionYear> ?carsconstructionYear}. OPTIONAL {?object <http://semannot.bdoenz.com/cars#offerprice> ?carsofferprice}. FILTER ( (?carsconstructionYear>=2010 && ?carsofferprice<=12500) || (?carsconstructionYear>=2011 && ?carsofferprice<=15000) ) }
  • 24. www.ict.tuwien.ac.at Institute of Computer Technology Use case #7 24 This example is situated in the domain of real estate, and asks for the name of the offer, a description, the number of rooms, the floor space and the rent of all offers with 3 or more rooms in the town of "Klosterneuburg" where the rent is lower than the average rent per square meter for that town. No specific site is referenced in the query, the mediator therefore includes all sites with real estate offers and also a s site containig average rents. Each of these are accessed in turn collecting the intermediate results in a database before applying the filter and returning the results. Intermediate results are only available after the average rents have been extracted and can be compared to the offers in the defined manner. Note that this type of query cannot be generated by the query wizard, but is entered directly as SPARQL SELECT ?source ?realestatetownname ?realestateoffername ?realestaterent ?realestaterooms ? realestatefloorSpace WHERE {?object rdf:type <http://semannot.bdoenz.com/realestate#RealEstateOffer>. OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#townname> ?realestatetownname}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#offername> ?realestateoffername}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rent> ?realestaterent}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rooms> ?realestaterooms}. OPTIONAL {?object <http://semannot.bdoenz.com/realestate#floorSpace> ?realestatefloorSpace}. ?community rdf:type <http://semannot.bdoenz.com/realestate#Community>. ?community <http://semannot.bdoenz.com/realestate#averageRent> ?avrent. ?community <http://semannot.bdoenz.com/realestate#communityname> ?cname. FILTER (REGEX(?realestatetownname,"Klosterneuburg","i") && ?realestaterent>=800 && ?realestaterent<=1200 && ?realestaterooms>=3 && REGEX(?cname,"Klosterneuburg","i") && ?realestaterent/?realestatefloorSpace <(?avrent))}