Linked Open data: CNR
 

    Linked Open data: CNR Presentation Transcript

    • data.cnr.it and the Semantic Scout 1
      • CNR Semantic Technology Lab, ISTC - SI
      • Aldo Gangemi, Alberto Salvati, Enrico Daga, Gianluca Troiani
      • Thanks to Claudio Baldassarre (UN-FAO) and Alfio Gliozzo (IBM-Watson)
      • http://stlab.istc.cnr.it http://data.cnr.it http://bit.ly/semanticscout
    • data.cnr.it 2
    • Enhanced SPARQL endpoint 3
    • Ontologies 4
    • Sample class from ontology 5
    • The Semantic Scout 6
      • A framework for search, presentation, and analysis of entities and their associated knowledge
      • Employs SW, LOD, NLP, IR
      • Scientific work goes back to 2006, first presented at ISWC2007
      • An evolving prototype for requirements of the EU IP IKS: semantic search, hybrid IR/SW identity management, automatic document classification (against DBpedia)
      • 2009 requirements from the technology transfer office of CNR for the NetwOrK initiative
    • The CNR 7
      • CNR is the largest research institution in Italy
        – about 8000 permanent researchers (+14000)
        – 7 departments focused on the main scientific research areas
        – 108 institutes spread all over Italy
          • Subdivided into research units, labs, etc.
    • The CNR data sources 8
      • [Diagram: data spread over several databases (DB) and a file system; only partly available as open data]
      • Organizational data: departments, institutes, central admin
      • Activity-related data: frameworks, programmes, workpackages, publications, administration documentation
      • Personnel-related data: curricula, permanent employees, other research employees
      • Financial data: accounting, contracts, invoicing, externally funded projects
    • The CNR tasks 9
      • Strategic objective: matching the research demand to the research supply
      • Requirements
        – Semantic interoperability between heterogeneous data sources
        – Expert finding based on competence
        – Monitoring funding and evolution of different research areas and units
        – Browsing and reporting capabilities
    • Architecture 10
    • 11
    • Methods for data conversion, extraction, inference, integration, linking, publishing, and searching 12
    • Figures 13
      • CNR Ontology: 28 modules, 120 classes, 300 relations, 1200 axioms
      • CNR Data: >200K entities, ≈3M facts (about 2M inferred or extracted), ≈240 datasets
    • Sources and lifting 14
      • Situation usually not as clean as using a unique CMS for most organizational tasks
      • DB (e.g. SQL Server) + a lot of textual records + HTML Web Site + textual corpus + linked open data
      • DB + interaction schemata (XML templates and HTML scraping, needed because of schemata degradation and user perspective evolution)
    • Ontology design 15
      • Starting from XML templates as module/pattern drafts
      • Reengineering XML and scraped templates
      • Reengineering DB schemata (system engineer involved)
      • Obtained modular, pattern-based, task-based ontology
      • Textual DB records with identity: precondition for hybridizing IR and SW (see later)
      • Alignments to FOAF, SIOC, SKOS, WordNet ontologies
      • Used patterns: situation, place, transitive reduction
    • The CNR ontology 16
    • Data design 17
      • Triplifiers based on SQL rules (automatic scripting on JDBC drivers not enough because of legacy degradation of physical schemata)
        – Cf. also: Semion reengineering tool
      • Inferences: OWL (Pellet, HermiT), SPARQL CONSTRUCT
      • Extraction tool: Semiosearch, categorizer over Wikipedia categories
        – Next: deep parsing approach (facts, relations, entities)
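The idea of a triplifier driven by hand-written SQL rules can be sketched roughly as follows. This is only an illustration under stated assumptions: the `institute` table, its columns, the `http://example.org/cnr/` namespace, and the `partOf` property are all hypothetical and are not taken from the CNR databases or ontology.

```python
import sqlite3

# Hypothetical namespace; NOT the real data.cnr.it URI scheme.
BASE = "http://example.org/cnr/"

# One hand-written rule per source table: a SQL query plus a function
# that maps each result row to the triples it should produce.
RULES = [
    {
        "sql": "SELECT id, name, dept_id FROM institute",
        "triples": lambda r: [
            (f"{BASE}institute/{r[0]}", "rdf:type", f"{BASE}ontology/Institute"),
            (f"{BASE}institute/{r[0]}", "rdfs:label", f'"{r[1]}"'),
            (f"{BASE}institute/{r[0]}", f"{BASE}ontology/partOf",
             f"{BASE}department/{r[2]}"),
        ],
    },
]


def triplify(conn, rules):
    """Run every rule's query and collect the triples generated per row."""
    triples = []
    for rule in rules:
        for row in conn.execute(rule["sql"]):
            triples.extend(rule["triples"](row))
    return triples


# Toy in-memory database standing in for a legacy source (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE institute (id INTEGER, name TEXT, dept_id INTEGER)")
conn.execute("INSERT INTO institute VALUES (1, 'Example Institute', 7)")

triples = triplify(conn, RULES)
for s, p, o in triples:
    print(s, p, o)
```

Writing the rule by hand, rather than deriving it from the physical schema, is what lets such a mapping cope with degraded legacy schemata, at the cost of maintaining one rule per source.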
    • Publishing and hybridizing 18
      • Publishing OWL-RDF datasets
        – linked data approach (persistent URIs, triple stores for RDF dataset management, linking to common vocabularies: FOAF, DBpedia, Geonames, Bibo, ...)
        – OWL ontologies for dataset generation, querying, inference (new enriched datasets)
      • Subgraph extraction through SNA
      • Virtual semantic corpus
        – IRW to distinguish information and non-information resources
        – SPARQL rules to generate virtual texts associated with entities
      • Indexing
        – Lucene+LSA indexing of semantic corpus
        – “Semantic” Lucene extension to produce tight coupling of virtual texts with entities
        – Multilinguality
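A rough sketch of the "virtual text" idea: for each entity, concatenate the literals reachable from it into a pseudo-document, then index that document so a keyword hit leads back to the entity. The triples and the `ex:` property names below are invented for illustration, and a plain inverted index is swapped in for the Lucene+LSA indexing named on the slide.

```python
from collections import defaultdict

# Hypothetical toy triples; the ex: identifiers are illustrative only.
triples = [
    ("ex:person1", "ex:name", "Jane Doe"),
    ("ex:person1", "ex:worksOn", "ex:project1"),
    ("ex:project1", "ex:title", "Reputation in social networks"),
]


def label_of(node, triples):
    """Return a name or title literal for the node, or the node itself."""
    for s, p, o in triples:
        if s == node and p in ("ex:name", "ex:title"):
            return o
    return node


def virtual_text(entity, triples):
    """Build a pseudo-document from the entity's outgoing statements."""
    parts = [label_of(o, triples) for s, p, o in triples if s == entity]
    return " ".join(parts)


def build_index(entities, triples):
    """Plain inverted index: token -> set of entities whose text has it."""
    index = defaultdict(set)
    for entity in entities:
        for token in virtual_text(entity, triples).lower().split():
            index[token].add(entity)
    return index


index = build_index(["ex:person1", "ex:project1"], triples)
print(virtual_text("ex:person1", triples))
print(sorted(index["reputation"]))
```

In this toy setup the person is found by the keyword "reputation" only because the project title was folded into the person's virtual text, which is the kind of coupling between text retrieval and graph structure the slide describes.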
    • Consuming 19
      • SPARQL endpoint, with interface enhancement
      • Keyword-based search
        – Semantic browsing with SPARQL-based AJAX DHTML, RDF relation browser, or XML-based relation browser
      • Category-based search
        – Keyword-based result focusing
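As an illustration of keyword-based access through a SPARQL endpoint, the sketch below builds a request URL following the standard SPARQL protocol (a GET request with a `query` parameter). The endpoint path `/sparql` and the use of `rdfs:label` are assumptions; the slides name only http://data.cnr.it. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint path; the slides only give http://data.cnr.it.
ENDPOINT = "http://data.cnr.it/sparql"


def keyword_query(keyword, limit=10):
    """Build a SPARQL query matching entities whose label contains keyword."""
    # Escape backslashes and double quotes so the keyword cannot break
    # out of the SPARQL string literal.
    safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?entity ?label WHERE {\n"
        "  ?entity rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{safe}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(endpoint, query):
    """Encode the query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = request_url(ENDPOINT, keyword_query("reputation"))
print(url)
```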
    • 20
    • 21
    • http://bit.ly/semanticscout 22
    • Expert finding: Task-based testing 23
      • It is based on the ability to materialize on demand a contextual network of relevant information.
      • It is performed with a combination of tools in the toolkit to:
        – Identify the main topics of research
        – Recursively search the CNR data cloud
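Materializing a contextual network from an entry point can be pictured as a bounded graph traversal. The sketch below uses an iterative breadth-first search in place of a literal recursion; the graph, node identifiers, and depth limit are all hypothetical and only stand in for the linked CNR data.

```python
from collections import deque

# Hypothetical link structure: a programme links to projects and people,
# which in turn link to further resources.
GRAPH = {
    "programme:P1": ["project:A", "person:alice"],
    "project:A": ["person:bob", "person:alice"],
    "person:alice": ["publication:X"],
    "person:bob": [],
    "publication:X": ["person:carol"],
    "person:carol": [],
}


def contextual_network(graph, entry_point, max_depth):
    """Return {node: depth} for nodes within max_depth links of the entry."""
    depth = {entry_point: 0}
    queue = deque([entry_point])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(node, []):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


network = contextual_network(GRAPH, "programme:P1", max_depth=2)
people = sorted(n for n in network if n.startswith("person:"))
print(people)
```

With a depth limit of 2, `person:carol` is left out because she is three links away from the entry point; raising the limit widens the network at the cost of less focused results.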
    • Identifying the main topics of research: project description 24
      • “Reputation is a social knowledge, on which a number of social decisions are accomplished. Regulating society from the morning of mankind becomes more crucial with the pace of development of ICT technologies, dramatically enlarging the range of interaction and generating new types of aggregation. Despite its critical role, reputation generation, transmission and use are unclear. The project aims to an interdisciplinary theory of reputation and to modeling the interplay between direct evaluations and meta-evaluations in three types of decisions, epistemic (whether to form a given evaluation), strategic (whether and how interact with target), and memetic (whether and which evaluation to transmit).”
        – Project About: Social Knowledge for e-Governance.
        – Topics can be manually annotated, or automatically induced, e.g.: ethics, sociology, collaboration, social network, reputation
    • Identifying the main topics of research: text categorization 25
      • Query: “ethics, sociology, collaboration, social network, reputation”
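    • Search the CNR data cloud: identify an entry point 26
      • “Commessa” (programme): “Il Circuito dell’Integrazione: Mente, Relazioni e Reti Sociali. Simulazione Sociale e Strumenti di Governance” (“The Integration Circuit: Mind, Relations and Social Networks. Social Simulation and Governance Tools”)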
    • Search the CNR data cloud: identify key people 27
      • Ing. Jordi Sabater: Cognitive Science;
      • Dott. Mario Paolucci: Sociology, Psychology;
      • Gennaro di Tosto: Artificial Intelligence;
      • Walter Quattrociocchi: Interdisciplinary Fields;
      • Giuseppe Castaldi: Ethics;
      • Aldo Gangemi: Semantic Web, Knowledge representation.
    • Expert Finding: Results 28
      • The description of the “eRep project” was adopted as a gold standard to evaluate the results when testing the Semantic Scout.
      • 6 out of 10 CNR researchers were correctly retrieved, along with a project member affiliated with another institution.
        – Project Coordinator: Dott. Mario Paolucci
        – External Member: Jordi Sabater Mir
    • Functional evaluation of Semantic Scout (example) 29
      • Expert finding accuracy
        – All 6 retrieved people scored among the first 10 in the results from the search engine.
      • Benefit of integrated data cloud
        – The user judged an “activity” to be relevant to his goal and used it as an entry point to the CNR network of resources.
    • Functional evaluation of Semantic Scout 30
      • Accessibility and Interaction
        – Multiple user interfaces give users a level of interaction adapted to each specific type of required information
      • Completeness of retrieval
        – 4 people were not included in our result set.
        – Antonietta Di Salvatore: scored below the first 10 people in the list. (+1)
        – Giulia Andrighetto: was not listed among the people relevant to the query, but belongs to the social network of Dr. Rosaria Conte. (+1)
        – Marco Capenni and Stefano Picascia: have a technician profile, hence they are neither reported among the people relevant to the search query, nor belong to the network of any of the other researchers.
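The completeness figures above amount to a recall measurement against the gold standard. A minimal sketch of the arithmetic, using only the counts reported on the slides (10 CNR project members, 6 of them retrieved); the function name is illustrative:

```python
def recall(retrieved_relevant, total_relevant):
    """Fraction of the gold-standard items that were actually retrieved."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    return retrieved_relevant / total_relevant


cnr_members = 10   # CNR researchers in the gold standard (from the slides)
retrieved = 6      # CNR researchers correctly retrieved (from the slides)

missed = cnr_members - retrieved
score = recall(retrieved, cnr_members)
print(missed, score)
```

This yields 4 missed people and a recall of 0.6 for CNR members, matching the "4 people" listed as not included in the result set.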
    • Ongoing work 31
      • More data linking (e.g. DBLP, Georeferencing)
      • Synchronization with data sources
      • More interaction paradigms
      • Privacy issues interlaced with hierarchical and idiosyncratic practices
    • Conclusions 32
      • Hybridizing several semantic and retrieval technologies provides added value to a research organization
      • Scalability works for CNR figures
      • Interaction is a core selling point
      • Try it at http://bit.ly/semanticscout
      • @data_cnr_it, @semanticscout, @aldogangemi