Tomas Knap | RDF Data Processing and Integration Tasks in UnifiedViews: Use Cases & Lessons Learned

http://2016.semantics.cc/tomas-knap-0


  1. Tomas Knap, Semantic Web Company. RDF Data Processing and Integration Tasks in UnifiedViews: Use Cases & Lessons Learned
  2. Agenda ▸ UnifiedViews ▹ Introduction of the Tool ▸ UnifiedViews Use Cases ▹ 3 Use Cases ▹ Benefits/Lessons Learned
  3. UnifiedViews Introduction of the Tool
  4. UnifiedViews Motivation ▸ Maintaining RDF data processing tasks is challenging ▹ Different tools ▹ Different configurations ▹ Tens of data processing tasks sharing parts of the data processing ▸ Debugging
  5. UnifiedViews Approach ▸ UnifiedViews is an ETL tool for RDF data processing ▹ Allows users to manage RDF data processing tasks ▹ Natively supports the RDF data format
  6. UnifiedViews Approach ▸ Standard maintenance interface ▹ Define, execute, monitor, schedule, and share data processing tasks ▹ Predefined and customizable building blocks (plugins) to set up the individual data processing tasks ▸ Debugging features ▸ Simplified documentation ▹ Visualizations of the prepared tasks ■ Plugins ■ Data flow
  7. UnifiedViews Pipeline
  8. UnifiedViews Core Components ▸ Web administration interface ▹ Define and maintain pipelines ▹ Validate, execute, monitor pipelines ▹ Possibility to schedule pipelines ■ Notifications ▹ Possibility to debug pipelines ▹ Possibility to share pipelines and plugins ▹ Define and maintain plugins ▹ Multi-user environment, SSO support ▸ Robust engine running the tasks ▸ API to work with tasks, executions, scheduled events
  9. UnifiedViews Core Plugins ▸ Set of Core plugins available ▹ Extractors ■ Obtaining external sources (CSV, DBF, XLS, XML files, RDF data, or relational tables) ▹ Transformers ■ Transforming them between various formats (e.g. CSV files to RDF data, relational tables to RDF data) ■ Executing typical transformations such as SPARQL Update queries, or XSL transformations ▹ Loaders ■ Loading the transformed and curated data to external systems, repositories ▸ 35+ plugins
  10. UnifiedViews Custom Plugins ▸ Easy way to extend UnifiedViews with your own plugins ▹ Guide for creating new plugins ▹ Tutorials
  11. UnifiedViews Team
  12. PoolParty Semantic Integrator and UnifiedViews ▸ UnifiedViews is part of PoolParty Semantic Integrator ▸ A semantic technology suite ▹ Organize and maintain company knowledge ▹ Annotate documents with concepts from the knowledge base ▹ Provide focused search on top of the annotated document space ▸ https://www.poolparty.biz/ ▹ Or visit the PoolParty booth
  13. UnifiedViews Availability ▸ Available under an open source license (GPL + LGPL v3) ▹ Commercial license also available as part of PoolParty Semantic Integrator ▸ Hosted on GitHub ▹ https://github.com/UnifiedViews ▸ Latest release (June 2016): ▹ UnifiedViews Core 2.3.1 ▸ http://unifiedviews.eu
  14. UnifiedViews Use Cases Overview
  15. 3 Use Cases 1. Aligned Project ▹ Extraction/Annotation of data from Atlassian Confluence/JIRA 2. Boehringer Ingelheim ▹ Publication tracker 3. World Bank ▹ Annotation of World Bank docs ▹ Integration with MarkLogic
  16. Use Case 1: Aligned Project
  17. About ▸ Aligned project: ▹ H2020, http://aligned-project.eu/ ▸ One of the goals: ▹ Integrate outputs from commercial tools such as Atlassian Confluence and JIRA to bring a data-centric approach to governance of software and data engineering
  18. UnifiedViews Use Case ▸ UnifiedViews pipeline ▹ Extracting data from Atlassian Confluence and JIRA ▹ Annotating textual content with a taxonomy maintained in PoolParty ▹ Loading everything to a remote triple store
  19. UnifiedViews Pipeline
  20. Benefits, Lessons Learned ▸ Predefined plugins which may be used out of the box ▹ No heavy programming ▸ Easy pipeline management via user interface ▸ Further support when preparing the pipeline ▹ Pipeline validation ▹ Pipeline debugging
  21. Use Case 2: Boehringer Ingelheim Publication Tracker
  22. About ▸ Boehringer Ingelheim wanted a better overview of worldwide research activities ▸ Extract and annotate articles published on PubMed ▹ http://www.ncbi.nlm.nih.gov/pubmed ▸ Linking of unstructured and structured / internal and external information
  23. UnifiedViews Use Case
  24. Demo https://workingontologist.poolparty.biz/GraphSearch/
  25. Benefits, Lessons Learned ▸ Pipelines in UnifiedViews may be easily ▹ scheduled ▹ extended in the future ▸ Detailed information about the pipeline executions is available ▹ Events, logs ▸ Maintenance simplified
  26. Benefits, Lessons Learned ▸ Missing ▹ Long-running pipelines ■ Tighter integration of UnifiedViews and PoolParty Semantic Integrator ▹ Loops, conditional execution of plugins
  27. Use Case 3: Annotation of World Bank Documents, Integration with MarkLogic
  28. About ▸ Goal: Search over annotated World Bank documents ▹ World Bank topical taxonomy ▹ Geo taxonomy ▸ Demo: ▹ http://marklogic-demo.poolparty.biz
  29. UnifiedViews Use Case ▸ UnifiedViews pipeline to annotate portions of the World Bank documents ▹ Country & region information annotated with Geo taxonomy ▹ Full text, topics annotated with World Bank topical taxonomy
  30. UnifiedViews Pipeline
  31. Benefits, Lessons Learned ▸ Easy pipeline management via user interface ▹ Easy pipeline configuration ▸ Reusing already existing plugins ▹ Pipeline prepared quickly
  32. Summary: Lessons learned
  33. Summary ▸ UnifiedViews ▹ UnifiedViews and PoolParty Semantic Integrator ▸ UnifiedViews Use Cases ▹ Conversion of sources to RDF data ▹ Annotation of sources ▹ Enrichment of the data ▹ Publication of the curated data to the target store ▸ UnifiedViews 2.0 in 5 mins
  34. Summarized Lessons Learned ▸ Easy pipeline management via user interface ▸ Predefined plugins which may be used out of the box ▹ No heavy programming ▹ Simplified pipeline creation ▸ Further support when preparing the pipeline ▹ Pipeline validation ▹ Pipeline debugging ▸ Pipeline scheduling
  35. Contact: Tomas Knap, PhD, Technical Consultant, Researcher, Semantic Web Company ▸ t.knap@semantic-web.at ▸ https://www.semantic-web.at/ ▸ https://twitter.com/semwebcompany © Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/
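
The Core Plugins slide splits a pipeline into extractors, transformers, and loaders. As a rough illustration of that split only, the sketch below chains three plain Python functions that turn a CSV source into N-Triples statements. It does not use the UnifiedViews plugin API or any of its real plugins. The `http://example.org/` namespace, the function names, the column names, and the in-memory `set` standing in for a triple store are all assumptions made for the example, and it assumes the ID column contains IRI-safe values.

```python
import csv
import io

# Hypothetical namespace for the example; not taken from the slides.
BASE = "http://example.org/"


def extract(csv_text):
    """Extractor stage: read rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def transform(rows, id_column):
    """Transformer stage: emit one N-Triples statement per non-empty cell."""
    triples = []
    for row in rows:
        # Assumes the ID value is safe to embed in an IRI without encoding.
        subject = f"<{BASE}resource/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def load(triples, store):
    """Loader stage: write triples to the target; a set stands in for a triple store."""
    store.update(triples)
    return len(store)


def run_pipeline(csv_text, id_column, store):
    """Chain the stages in the order extractor -> transformer -> loader."""
    return load(transform(extract(csv_text), id_column), store)


if __name__ == "__main__":
    source = "id,title,author\n1,UnifiedViews,Knap\n2,PoolParty,\n"
    store = set()
    run_pipeline(source, "id", store)
    for triple in sorted(store):
        print(triple)
```

Keeping each stage as a separate function mirrors the idea of reusable building blocks: the same `transform` step could be fed by a different extractor, or its output sent to a different loader, without changing the other stages.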