LOD2: State of Play WP1: Requirements, Design & LOD2 Stack Prototype

Presentation of WP1 (State of Play: Requirements, Design & LOD2 Stack Prototype) by Helmut Nagy (Semantic Web Company) at the LOD2 plenary meeting in Paris.

Slide notes:
  • WP 7 use case scenarios:
    UCS 7.1 - Content Acquisition: covers all aspects of identifying, selecting, collecting and approving relevant content for further processing.
    UCS 7.2 - Content Enrichment and Composition: covers the transformation of data formats and the integration of data, the precise enrichment of available documents with Linked (Meta)Data (i.e. structured tagging), and the composition of new documents or content products using Linked Data principles.
    UCS 7.3 - Contextualisation and Cloud-Publishing: covers the engineering of domain-specific knowledge models (thesauri) and the dereferencing of their concepts against LOD-cloud sources; all aspects of publishing data to the LOD cloud are also handled here.
    UCS 7.4 - Enterprise Applications and Customer Products: covers all end-user-specific applications that use Linked Data to support internal workflows (enterprise applications) or to power services or new products for third parties (customer products).
    UCS 7.5 - Service Innovation: covers any additional aspects of service innovation based on Linked Data technology; it collects ideas for further product and service diversification based on Linked Data principles.
  • WP 8 use case scenarios:
    UCS 8.1 - Data Acquisition: combines two acquisition processes: internal data collected from inside the enterprise information system boundary, and external data coming from the web.
      Internal content acquisition: deploys various enterprise components to collect and extract content from the different ERP applications of the enterprise IT system; specific interactions are required depending on the formats and protocols of each application. For this use case, acquisition focuses on all systems that provide data on a company's human resources, ranging from basic file systems hosting Excel sheets with employee information to complex modules such as HR ERP systems.
      External content acquisition: fetches data from the web. The targeted data is extracted from sources of three typologies: structured sources, where data is usually retrieved via APIs and/or SQL/SPARQL-like queries (the LOD cloud is an example: data is served by API requests such as REST, by SPARQL queries against SPARQL endpoints, or by directly fetching structured files such as CSV); semi-structured sources, where data can be obtained using extraction rules such as XPath queries from websites with a uniform presentation structure (e-commerce or news sites, for example); and unstructured sources, which include most web pages, where data has to be extracted from free text, media content, etc. In the formal hiring use case, several external sources can be targeted, starting with job-opportunity sites such as http://monster.com; gathering and refreshing candidate profiles from sites like LinkedIn would also provide interesting input to the application.
    UCS 8.2 - Content LODification and Integration: includes the set of tools and processes that edit, filter, clean, transform, enrich and interlink the acquired content. Additional knowledge, such as ontologies and taxonomies, may be required to formalise and guide this process. Clustering and classification techniques are further steps that can increase the efficiency of integration, by clustering data into logical units or relating it directly to concepts in order to better organise the data and prune abnormal patterns.
    UCS 8.3 - Service To Consumer (S2C): prepares the mashed data from the previous process to be finalised, i.e. bundled, published and ready to be consumed by end users or third-tier applications. Finalisation includes: an accessibility and security strategy that defines and grants the rights to view, use and consume the data; bundling policies in which various export and presentation formats are offered for different uses; and a search service to browse the data using queries. The service functionality is the most important feature of the process described here: the benefit of mixing and mashing data is mainly measured by the quality of the service provided to end users. This service relies first on an interface offering widgets for visualising the different mashed data; in the hiring use case, widgets will provide the ability to browse the taxonomies used, together with the corresponding candidate profiles, the job opportunities and the matching candidates. In addition to this interface, a service for exporting the mashed data will be provided.
    UCS 8.4 - Monetization and Sales: defines the business-exploitation strategy for the final content and is discussed for reasons of completeness (like UCS 7.5 from the Media & Publishing use case - see above). This implies setting up quality and support structures; it is beyond the scope of the project but will be dealt with from a theoretical perspective.
  • WP 9 use case scenarios:
    UCS 9.1 - Data Harvesting from External Sources: covers all aspects of collecting data from several catalogues and data portals in Europe (regional, national, private).
    UCS 9.2 - Upload of Datasets: covers all aspects of uploading individual datasets, or bulk-uploading datasets, directly into publicdata.eu.
    UCS 9.3 - Data Curation, Bundling, Rating and Commenting: covers all aspects of the enrichment and maintenance of datasets, supporting crowd-sourcing mechanisms to enable a flourishing community of data re-users, curators and publishers.
    UCS 9.4 - Search and Browse Data: covers all aspects of establishing easy-to-use search and browse mechanisms, making it easier for people to find the datasets they are looking for as well as datasets they might be interested in.
    UCS 9.5 - Download and Interfaces: covers all aspects of making data available for consumption in several formats and via several interfaces.
    UCS 9.6 - Portal & Future Services: covers establishing general data-portal services (help etc.) and creating new services on top of the existing infrastructure.
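UCS 8.1 above lists SPARQL endpoints and REST APIs among the structured external sources. As a minimal sketch (the endpoint URL and query below are placeholders, not part of the LOD2 stack), a SPARQL SELECT request can be assembled with nothing but the Python standard library:

```python
from urllib.parse import urlencode

def sparql_request_url(endpoint: str, query: str) -> str:
    """Build a GET request URL for a SPARQL SELECT query.

    `endpoint` can be any SPARQL endpoint; results are requested
    as JSON via the `format` parameter many endpoints accept.
    """
    params = urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"

# Hypothetical endpoint; any LOD-cloud SPARQL endpoint would do.
url = sparql_request_url(
    "http://example.org/sparql",
    "SELECT ?s WHERE { ?s a ?type } LIMIT 10",
)
```

Per the SPARQL 1.1 Protocol the query itself travels in the `query` parameter; `format=json` is a widely supported, though non-standard, way to ask for JSON results.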

    1. Creating Knowledge out of Interlinked Data - WP1: Requirements, Design & LOD2 Stack Prototype. Paris, 24.-25. March 2011. Helmut Nagy, Semantic Web Company, Vienna. LOD2 Presentation . 02.09.2010 . http://lod2.eu
    2. WP Overview
    3. WP 1 Overview
    4. WP 1 Task Overview & Deliverables
       1.1 Common Requirements Specification (deliverable: Report; lead: SWC; due: M6)
           - Prose use case description from the end-user PoV: problem description & requested solution(s)
           - Role models: extrapolating relevant roles for each use case
           - Documentation of the technical state of the art for each use case (tool analysis, technical interdependencies, APIs, ...)
           - Analysis of available datasets & metadata assets (structure, formats, volume, IPR, ...)
       1.2 State of the Art Analysis (deliverable: Report; lead: SWC; due: M4)
           - Industrial & academic publication review, standards review, standards white spots, ...
       1.3 Architecture & System Design (deliverable: Report; lead: Tenforce; due: M6)
           - Technical requirements for & interdependencies of system architecture components
           - Coverage of all functional & non-functional requirements
       1.4 Early LOD2 Stack Prototype (deliverable: Software; lead: Tenforce; due: M12)
           - ... to be specified ...
    5. WP 1 Task Use Cases
       - UC1: LOD2 for Media and Publishing (WKD, WP7)
       - UC2: LOD2 for Enterprise Data Webs (EXALEAD, WP8)
       - UC3: GovData.eu - Publishing Governmental Information as Linked Data (OKFN, WP9)
    6. WP 7: Media & Publishing Use Case - Short Description
       "The application of Linked Data principles shall support the information management in lawyer-specific workflows. WKD clients - like attorneys - shall be supported in their daily workflows currently managed by AnNoText. Along these workflows the knowledge worker has to make decisions and take actions to collect, enrich and manage a diverse set of contents."
    7. WP 7: Media & Publishing Use Case Scenarios
    8. WP 8: Enterprise Use Case - Short Description
       "The Formal Hiring use case shows that semantic technology and linked open data can support hiring processes without having to do all the configurations that link services, projects or products to resources manually. Once the target system has identified valid candidates for a specific purpose, the legal hiring process itself starts. This process is still knowledge intensive, but it can to some degree be standardized and therefore Semantic Web technologies and linked open data can support it."
    9. WP 8: Enterprise Use Case Scenarios
    10. WP 9: OGD Use Case - Short Description
        "Information about European public datasets is currently scattered across many different data catalogues, portals and websites in many different languages, implemented using many different technologies. The kinds of information stored about public datasets may vary from country to country, and from registry to registry. publicdata.eu will harvest and federate this information to enable users to search, query, process, cache and perform other automated tasks on the data from a single place. This helps to solve the "discoverability problem" of finding interesting data across many different government websites, at many different levels of government, and across the many governments in Europe. In addition to providing access to official information about datasets from public bodies, publicdata.eu will capture (proposed) edits, annotations, comments and uploads from the broader community of public data users. In this way, publicdata.eu will harness the social aspect of working with data to create opportunities for mass collaboration."
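The harvest-and-federate step described in the quote (pooling dataset descriptions from many catalogues into one searchable registry) can be sketched as follows; the record fields and catalogue names are illustrative assumptions, not the actual publicdata.eu schema:

```python
def federate(catalogues):
    """Merge dataset records harvested from several catalogues into
    one registry, de-duplicating on the dataset identifier
    (the first catalogue seen wins)."""
    registry = {}
    for catalogue, records in catalogues.items():
        for rec in records:
            registry.setdefault(rec["id"], {**rec, "source": catalogue})
    return registry

# Illustrative records from two hypothetical national catalogues;
# "ds-1" appears in both to show de-duplication.
harvested = {
    "data.gov.uk": [{"id": "ds-1", "title": "Spending 2010"}],
    "data.gouv.fr": [{"id": "ds-2", "title": "Budget 2010"},
                     {"id": "ds-1", "title": "Spending 2010"}],
}
registry = federate(harvested)
```

A real harvester would also track provenance per field and offer the merged record back to the source (cf. the synchronisation requirement REQ 05), but the one-registry-keyed-by-identifier shape is the core of federation.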
    11. WP 9: OGD Use Case Scenarios
    12. WP 1: Consolidated Feature Requests
    13. WP 1: Consolidated Feature Requests - Data Acquisition
        (x marks: one per requesting work package out of WP7/WP8/WP9)
        REQ 01 Identify data sources (x x) - Monitoring for identifying new/relevant internal and external sources.
        REQ 02 Identify data (x x) - Identify relevant data within sources.
        REQ 03 Consume/Harvest data (x x x) - Provide mechanisms to grab, extract, import and store data from relevant sources. Data may be datasets to enrich content, datasets usable as metadata for describing content, or metadata sets describing datasets in other sources.
        REQ 04 Upload data (x) - Provide interfaces for adding relevant data: upload of relevant datasets and adding of metadata describing the datasets.
        REQ 05 Synchronise data (x) - Provide mechanisms for the synchronisation of data and sources: monitoring changes to datasets or metadata in external sources, offering changes in datasets or metadata to external sources, and bi-directional synchronisation of changes.
        REQ 06 Store acquired data (x x x) - Provide storage functionality: persistent storage of datasets and metadata, including versioning of changes.
    14. WP 1: Consolidated Feature Requests - Editing
        (x marks: one per requesting work package out of WP7/WP8/WP9)
        REQ 07 Integrate data (x x x) - Provide mechanisms for conversion/mapping to different specific formats; enable mapping of different metadata schemes.
        REQ 08 Display data (x x x) - Show existing and new data available in the repository for technical manipulation; display available metadata for datasets, and the datasets themselves.
        REQ 09 Analyse data (x) - Provide mechanisms for analysing newly added data and its values (inconsistencies, validation, syntax errors).
        REQ 10 Edit/Update data (x x x) - Provide mechanisms for editing and converting data and for merging new data with existing data; edit available metadata for datasets.
        REQ 06 Store data (x x x) - Store new or updated data to the repository.
        REQ 05 Synchronise data (x x x) - Provide mechanisms for the synchronisation of edited data; offer changes in metadata/datasets back to the original data source.
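REQ 07 asks for mapping between different metadata schemes. A toy illustration of such a field-level mapping follows; the source keys and the Dublin-Core-style target keys are invented for the example:

```python
# Hypothetical mapping from a source catalogue's field names onto
# Dublin-Core-style target keys (REQ 07: integrate data).
FIELD_MAP = {
    "name": "dc:title",
    "desc": "dc:description",
    "owner": "dc:publisher",
}

def map_metadata(record: dict, field_map: dict = FIELD_MAP) -> dict:
    """Rename known fields; pass unmapped fields through untouched."""
    return {field_map.get(k, k): v for k, v in record.items()}

mapped = map_metadata({"name": "Spending 2010",
                       "owner": "HM Treasury",
                       "rows": 1200})
```

Real scheme integration also has to reconcile value formats and vocabularies, not just key names, but a declarative mapping table like `FIELD_MAP` is the usual starting point.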
    15. WP 1: Consolidated Feature Requests - Compositing & Bundling
        (x marks: one per requesting work package out of WP7/WP8/WP9)
        REQ 11 Search for data (x x x) - Provide search functionality for editorial curation of new datasets: advanced search mechanisms for metadata and datasets (moderated search, faceted search, ...).
        REQ 08 Display data (x x x) - Provide mechanisms for displaying available datasets/documents and related metadata: display of search results and details (metadata, datasets).
        REQ 12 Recommend data (x x x) - Provide functionality for data recommendations based on semantic document analysis: recommendation of datasets based on the search query, selected datasets and the personal profile.
        REQ 10 Add data (x x x) - Add new data to a dataset/document or metadata set; edit available metadata for datasets.
        REQ 14 Comment data (x x x) - Provide mechanisms for commenting on datasets according to quality- and content-related criteria.
        REQ 15 Rate data (x x x) - Provide mechanisms for rating the quality of a dataset based on different quality criteria (relevance, popularity, etc.).
        REQ 16 Tag data (x x x) - Provide mechanisms for tagging datasets for the user's specific purposes.
        REQ 17 Link/Align data (x x x) - Provide mechanisms for detecting and creating semantically sound connections between and within datasets; create bundles of related datasets.
        REQ 06 Store data (x x x) - Store new or updated data to the repository.
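REQ 17 asks for detecting semantically sound connections between datasets. Real alignment tools use far richer evidence than this, but the idea can be illustrated with plain string similarity; the labels and the threshold below are arbitrary choices for the example:

```python
from difflib import SequenceMatcher

def align(labels_a, labels_b, threshold=0.8):
    """Propose candidate links between two label sets whenever
    case-insensitive string similarity reaches `threshold`
    (a crude stand-in for semantic matching)."""
    links = []
    for a in labels_a:
        for b in labels_b:
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                links.append((a, b))
    return links

links = align(["Vienna", "Paris"], ["vienna", "Berlin"])
```

In practice the proposed links would be shown to a curator for confirmation before being stored, which is exactly the editorial-curation loop the surrounding requests (REQ 11, REQ 14, REQ 15) describe.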
    16. WP 1: Consolidated Feature Requests - Data Interfacing
        (x marks: one per requesting work package out of WP7/WP8/WP9)
        REQ 18 Publish linked data (x x) - Provide mechanisms to export data (e.g. thesauri) to the LOD cloud; provide the metadata schema and metadata as linked data.
        REQ 19 Access data (x x x) - Provide mechanisms to access or download data: SPARQL endpoint, APIs, and download of datasets and metadata.
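REQ 18's "provide metadata as linked data" can be illustrated by serialising a dataset description in the N-Triples format; the dataset URI is hypothetical, and the predicate is a Dublin Core term chosen for the example:

```python
def to_ntriples(subject: str, properties: dict) -> str:
    """Serialise one resource's literal-valued properties as
    N-Triples lines: <subject> <predicate> "literal" ."""
    lines = []
    for predicate, value in properties.items():
        lines.append(f'<{subject}> <{predicate}> "{value}" .')
    return "\n".join(lines)

nt = to_ntriples(
    "http://example.org/dataset/ds-1",  # hypothetical dataset URI
    {"http://purl.org/dc/terms/title": "Spending 2010"},
)
```

A production exporter would escape literals and support typed and language-tagged values as the N-Triples grammar requires, but the triple shape above is the whole idea.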
    17. WP 1: Consolidated Feature Requests - Services
        (x marks: one per requesting work package out of WP7/WP8/WP9)
        REQ 20 Create data models (x x x) - Provide mechanisms to create taxonomies, thesauri or ontologies for establishing a metadata structure; establish consistent, flexible metadata management.
        REQ 21 Quality assessment (x) - Provide mechanisms to check the semantic consistency, validity and representational quality of datasets/documents and metadata (e.g. thesauri).
        REQ 22 LOD monitoring (x x) - Provide mechanisms to monitor usage of and changes to data: watch lists/monitoring for datasets, bundles and searches.
        REQ 23 Version tracking (x x) - Provide mechanisms to track changes (versions) and usage (history) of datasets: logging of changes to metadata and datasets, and versioning of metadata and datasets.
        REQ 24 Visualise data (x x x) - Provide mechanisms for visualising data models and data structures.
        REQ 06 Store data (x x x) - Store data to the repository (new data, changes, etc.).
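REQ 23 (version tracking) can be sketched as an append-only log of metadata states, so that any earlier version of a dataset description stays retrievable; the class and field names are invented for the example:

```python
from datetime import datetime, timezone

class VersionLog:
    """Append-only version log for a dataset's metadata (cf. REQ 23).
    Each update stores a copy of the new state plus a timestamp;
    earlier versions remain retrievable by their 1-based number."""

    def __init__(self):
        self.versions = []

    def update(self, metadata: dict) -> int:
        self.versions.append(
            {"at": datetime.now(timezone.utc).isoformat(),
             "data": dict(metadata)}  # copy, so later edits don't leak in
        )
        return len(self.versions)  # new version number, 1-based

    def get(self, version: int) -> dict:
        return self.versions[version - 1]["data"]

log = VersionLog()
log.update({"title": "Spending 2010"})
log.update({"title": "Spending 2010 (revised)"})
```

Storing whole snapshots keeps the sketch simple; a real store would record diffs or named graphs per version, but the retrieval contract (version number in, historical state out) is the same.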
    18. WP 1: D1.1 Common Requirements
        Can be found at:
        - http://svn.aksw.org/lod2/D1.1
        - https://grips.punkt.at/pages/viewpageattachments.action?pageId=21891742
        Contact:
        - Tassilo Pellegrini (t.pellegrini@semantic-web.at)
        - Helmut Nagy (h.nagy@semantic-web.at)
    19. Thank you for your attention!
