Text Mining in PoolParty Semantic Suite

Slides of the AIMS (http://aims.fao.org/) webinar of 21 September 2017 of Martin Kaltenböck and Timea Turdean (Semantic Web Company) about: Text Mining in PoolParty Semantic Suite (https://www.poolparty.biz)

Published in: Data & Analytics

  1. 1. Martin Kaltenböck CFO, Semantic Web Company Timea Turdean Technical Consultant, SWC POOLPARTY SEMANTIC SUITE AIMS Webinar 21st Sept 2017 1
  2. 2. Agenda ▸ Introduction to Semantic Web Company (SWC) ▸ Introduction to PoolParty Semantic Suite ▸ Using PoolParty for Text & Data Mining ▹ Text mining for continuous knowledge graph modelling ▹ Entity linking and data integration ▹ Classification and semantic annotation / tagging ▸ DEMO(s) of the text mining capabilities of PoolParty ▸ Customer Success Stories ▹ REEEP ClimateTagger ▹ healthdirect Australia ▹ CTCN Semantic Search ▹ EIP Water Matchmaking ▸ Q&A Session
  3. 3. INTRODUCTION Semantic Web Company & PoolParty Semantic Suite 3
  4. 4. INTRODUCING SEMANTIC WEB COMPANY Semantic Web Company (SWC) ▸ Founded in 2004 ▸ Based in Vienna ▸ Privately held ▸ 40+ employees, experts in text mining & linked data ▸ ~15-20% revenue growth per year ▸ EUR 2.5 million in R&D funding ▸ SWC named to KMWorld’s 2017 ‘100 Companies That Matter in Knowledge Management’ ▸ Organising the SEMANTiCS conference series for 13 years ▸ https://www.semantic-web.com
  5. 5. INTRODUCING POOLPARTY PoolParty Semantic Suite ▸ First release in 2009 ▸ Current version 6.0 ▸ W3C standards compliant ▸ Over 200 installations worldwide ▸ 50% of revenue is reinvested into PoolParty development ▸ Available on-premises or as a cloud service ▸ KMWorld listed PoolParty as Trend-Setting Product 2015, 2016 and 2017 ▸ https://www.poolparty.biz/
  6. 6. SELECTED CUSTOMER REFERENCES AND PARTNERS Customer References ● Credit Suisse ● Boehringer Ingelheim ● Roche ● adidas ● The Pokémon Company ● Canadian Broadcasting Corporation ● Harvard Business School ● Wolters Kluwer ● Talend ● HealthStream ● TC Media ● TechTarget ● Seek ● Alliander N.V. ● Pearson - Always Learning ● Education Services Australia ● American Physical Society ● Healthdirect Australia ● World Bank Group ● Inter-American Development Bank ● Renewable Energy Partnership ● Wood Mackenzie ● Oxford University Press ● International Atomic Energy Agency ● Norwegian Directorate of Immigration ● Ministry of Finance (AT) ● Council of the E.U. ● Australian National Data Service Partners ● Accenture ● EPAM Systems ● Enterprise Knowledge ● Mekon Intelligent Content Solutions ● B-S-S Business Software Solutions ● MarkLogic ● Wolters Kluwer ● Digirati ● Quark Map labels ● SWC headquarters ● US East ● US West ● AUS/NZL ● UK
  7. 7. MAKE USE OF POOLPARTY SEMANTIC SUITE OVERVIEW 7
  8. 8. TECHNICAL CORE COMPONENTS Sample text: “Bain Capital is a venture capital company based in Boston, MA. Since inception it has invested in hundreds of companies including AMC Entertainment, Brookstone, and Burger King. The company was co-founded by Mitt Romney.” ▸ Taxonomy & Ontology Server ▸ Entity Extractor & Text Mining ▸ Data Integration & Data Linking ▸ Unstructured Data ▸ Semi-structured Data ▸ Structured Data ▸ UnifiedViews ▸ PoolParty GraphSearch ▸ Identify new candidate concepts to be included in a controlled vocabulary ▸ Controlled vocabularies as a basis for highly precise entity extraction ▸ Entity Extractor informs all incoming data streams about their semantics and links them ▸ Schema mapping based on ontologies ▸ RDF Graph Database
  9. 9. PoolParty Semantic Suite System Architecture Overview 9
  10. 10. 360-degree views over various content repositories 10
  11. 11. ‘Elevator Pitch’ ▸ Built as a ‘Semantic Middleware’ ▸ Outstanding user-friendliness ▸ Fully standards-compliant ▸ Highly precise entity extraction ▸ Comprehensive API ▸ Excellent maintainability of extraction models ▸ Integrated with leading search engines & graph databases ▸ Integrated with leading content management platforms ▸ Product configuration options for growing requirements ▸ Highly experienced partners / service team
  12. 12. Product Overview All products are available as cloud services or for on-premises installation > PoolParty Feature & Price Matrix Products ▸ PoolParty Basic Server ▸ PoolParty Advanced Server ▸ PoolParty Enterprise Server ▸ PoolParty Semantic Integrator Features ▸ SKOS Taxonomy Management ▸ Multiple Projects ▸ Taxonomy REST API ▸ Import/Export (incl. Excel) ▸ Rollback and History ▸ Ontologies and Custom Schemes ▸ Quality Management & Reports ▸ Advanced Corpus Management ▸ Vocabulary Mapping, Linked Data Mapping ▸ Linked Data Enrichment, Frontend, and SPARQL endpoint ▸ Entity Extractor ▸ Extractor API ▸ Auto Populate project from DBpedia ▸ Export to Remote Repository ▸ Workflow Management ▸ SKOS-XL (optional) ▸ Integration with Graph databases ▸ Integration with Search engines ▸ Data linking & mapping ▸ Data transformation pipelines with UnifiedViews ▸ Graph Search Server
  13. 13. HOW DOES THIS WORK Taking a look under the hood 13
  14. 14. BASIC PRINCIPLES Benefiting from the Semantic Web in a Nutshell 14
  15. 15. Four-layered Content Architecture 15
  16. 16. Metadata and semantic data 16 The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round.
  17. 17. Metadata and semantic data The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round. Peggy Guggenheim Peggy Guggenheim Collection Venice Canale Grande http://my.com/resource/328832 skos:prefLabel http://my.com/docs/45367 skos:prefLabel http://my.com/docs/52345 skos:prefLabel http://my.com/resource/328832 skos:prefLabel
  18. 18. Metadata and semantic data The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round. Peggy Guggenheim Peggy Guggenheim Collection Venice museum Canale Grande skos:prefLabel http://my.com/docs/45367 skos:prefLabel http://my.com/docs/52345 skos:prefLabel skos:prefLabel http://my.com/resource/62545 skos:prefLabel http://www.mycom.com/images/90546089 image has landmark named after http://my.com/resource/328832 http://my.com/resource/328832 hosted in hosted in has
  19. 19. Metadata and semantic data The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round. Peggy Guggenheim Collection dct:title Mike Miller Michael Miller skos:prefLabel skos:altLabel dct:creator http://my.com/docs/328832 http://my.com/people/32 schema:Article rdf:type http://my.com/img/99.jpg schema:image skos:subject Peggy Guggenheim Collection Venice museum skos:prefLabel skos:subject skos:altLabel skos:broader skos:prefLabel schema:image Canale Grande skos:prefLabel
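The annotations on this slide can be read as subject-predicate-object statements. The following is a minimal sketch in plain Python, not PoolParty code. The prefix URIs are the standard SKOS, Dublin Core Terms, schema.org and RDF namespaces. Which label belongs to which resource is an assumption, because the flattened slide does not preserve the arrows, and `expand`, `to_ntriples` and `objects` are hypothetical helper names.

```python
# Namespace prefixes used on the slide, mapped to their standard URIs.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dct": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

DOC = "http://my.com/docs/328832"
AUTHOR = "http://my.com/people/32"

# (subject, predicate, object) tuples; literals carry their own double quotes.
# ASSUMPTION: the pairing of labels and resources is guessed from the slide.
triples = [
    (DOC, "rdf:type", "schema:Article"),
    (DOC, "dct:title", '"Peggy Guggenheim Collection"'),
    (DOC, "dct:creator", AUTHOR),
    (DOC, "schema:image", "http://my.com/img/99.jpg"),
    (AUTHOR, "skos:prefLabel", '"Michael Miller"'),
    (AUTHOR, "skos:altLabel", '"Mike Miller"'),
]


def expand(term):
    """Expand a prefixed name such as 'dct:title' to a full URI."""
    if term.startswith('"') or term.startswith("http"):
        return term  # literal or already a full URI
    prefix, local = term.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(statements):
    """Serialize the statements as N-Triples-style lines."""
    lines = []
    for s, p, o in statements:
        obj = o if o.startswith('"') else f"<{expand(o)}>"
        lines.append(f"<{expand(s)}> <{expand(p)}> {obj} .")
    return "\n".join(lines)


def objects(statements, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(objects(triples, AUTHOR, "skos:prefLabel"))
```

Keeping each label as a separate statement about a URI is what lets a document, its author and its image be linked and queried independently of the wording used in the text.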
  20. 20. Resolving Language Problems “While most people can deal with linguistic features as synonyms, homographs, polyhierarchies, and even with far more peculiar characteristics of natural languages, machines often struggle with automatic sense-making because of the lack of a semantic knowledge model that can be used programmatically.”
  21. 21. Knowledge Graph Text Mining for knowledge graph development 21
  22. 22. PoolParty Extractor Uses several components of a knowledge model: ▸ Taxonomies based on the SKOS standard ▸ Ontologies based on RDF Schema or OWL ▸ Word form dictionaries ▸ Blacklists and stop word lists ▸ Disambiguation settings ▸ Domain-specific reference document corpus ▸ Statistical language model 22
  23. 23. PoolParty’s SKOS editor 23 The Audi Q3 is a compact crossover SUV made by Audi. It is based on the PQ35 platform of Volkswagen. A5 platform A series
  24. 24. PoolParty’s ontology and custom schema management 24 Taxonomy Ontology Ontology 1 from library Ontology 2 (imported) Ontology 3 (custom-made) Custom Schema
  25. 25. ‘Setting the rules’ for text mining & entity extraction via thesaurus Proper use of a funduscope requires a bit of practice and familiarity with the functions of your device. Diagnostic Equipment Ophthalmoscope
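The funduscope example can be sketched as a lookup of surface forms in a thesaurus. This is a simplified illustration in plain Python, not the PoolParty Extractor or its API. It assumes that "funduscope" is stored as an alternative label of the concept Ophthalmoscope and that Diagnostic Equipment is its broader concept; the `example.org` URIs and all function names are hypothetical.

```python
import re

# Toy thesaurus: concept URI -> labels and broader concept.
# ASSUMPTION: URIs, the synonym link and the broader link are illustrative only.
THESAURUS = {
    "http://example.org/concept/ophthalmoscope": {
        "prefLabel": "Ophthalmoscope",
        "altLabels": ["funduscope"],
        "broader": "http://example.org/concept/diagnostic-equipment",
    },
    "http://example.org/concept/diagnostic-equipment": {
        "prefLabel": "Diagnostic Equipment",
        "altLabels": [],
        "broader": None,
    },
}


def build_label_index(thesaurus):
    """Map every lower-cased preferred or alternative label to its concept URI."""
    index = {}
    for uri, concept in thesaurus.items():
        for label in [concept["prefLabel"], *concept["altLabels"]]:
            index[label.lower()] = uri
    return index


def extract_concepts(text, thesaurus):
    """Return URIs of concepts whose labels occur as whole words in the text."""
    lowered = text.lower()
    found = []
    for label, uri in build_label_index(thesaurus).items():
        if re.search(r"\b" + re.escape(label) + r"\b", lowered) and uri not in found:
            found.append(uri)
    return found


def broader_chain(uri, thesaurus):
    """Follow broader links upwards and return the preferred labels on the way."""
    chain = []
    parent = thesaurus[uri]["broader"]
    while parent is not None:
        chain.append(thesaurus[parent]["prefLabel"])
        parent = thesaurus[parent]["broader"]
    return chain


if __name__ == "__main__":
    sentence = ("Proper use of a funduscope requires a bit of practice and "
                "familiarity with the functions of your device.")
    for concept_uri in extract_concepts(sentence, THESAURUS):
        print(THESAURUS[concept_uri]["prefLabel"], broader_chain(concept_uri, THESAURUS))
```

The sentence never mentions "ophthalmoscope", yet the synonym entry lets the text be tagged with that concept and, through the broader link, found under Diagnostic Equipment as well.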
  26. 26. Disambiguation settings 26
  27. 27. Disambiguation settings 27
  28. 28. Corpus analysis results in a network of concepts and terms “I need support to continuously extend our taxonomy / controlled vocabulary!” Reference Corpus - Websites - PDF, Word, … - Abstracts from DBpedia - RSS Feeds Diagram: skos:Concept nodes and Term 1 through Term 8 Results - Relevant terms and phrases - Relevancy of concepts - Co-occurrence between concepts and terms - Co-occurrence between terms and terms
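Co-occurrence between terms can be sketched as counting how often two terms appear in the same document of the reference corpus. The following plain-Python example is a rough approximation under stated assumptions, not PoolParty's corpus analysis: the tiny stop word list, the three sample documents and the function names are all made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# ASSUMPTION: a deliberately tiny stop word list for the example.
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to", "for"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def cooccurrence(documents):
    """Count, for each pair of terms, the number of documents containing both."""
    pairs = Counter()
    for doc in documents:
        terms = sorted(set(tokenize(doc)))  # sorted so each pair has one key
        pairs.update(combinations(terms, 2))
    return pairs


def related_terms(term, pairs, top_n=3):
    """Rank the terms that co-occur most often with the given term."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == term:
            scores[b] += count
        elif b == term:
            scores[a] += count
    return scores.most_common(top_n)


if __name__ == "__main__":
    corpus = [
        "Solar energy reduces emissions",
        "Solar panels convert energy",
        "Wind energy and solar energy",
    ]
    pairs = cooccurrence(corpus)
    print(related_terms("solar", pairs, top_n=1))
```

Terms that rank highly next to an existing concept label are natural candidates for a taxonomist to review as new concepts or alternative labels.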
  29. 29. Semantic Annotation Classification and Semantic Annotation / Tagging 29
  30. 30. Entity Extraction based on Knowledge Graphs 30
  31. 31. PoolParty as a supervised learning system Roles and components: Content Manager, Integrator, Taxonomist/Ontologist, Thesaurus Server, Extractor, PowerTagging, Index, Reference Corpus, CMS Relations shown: uses API, is user of, is basis of, annotates, enriches, extends, analyzes
  32. 32. Data Integration Mapping and Linking of Data 32
  33. 33. PoolParty Semantic Integrator - at a glance https://youtu.be/l_LppfS3wxk 33 Deep Data Analytics Semantic Search Semantic Integrator Unstructured Data Structured Data ETL / Monitoring / Scheduling
  34. 34. PoolParty Semantic Integrator High-level architecture 34
  35. 35. DEMO(s) … let’s see how it works in action
  36. 36. PoolParty Thesaurus Manager ● SKOS editor ● Ontology and custom scheme manager PoolParty PowerTagging for Drupal (backend) ● Automated Tagging ● Manual Tagging ● Configuration of modules PoolParty GraphSearch for Drupal (frontend) ● Semantic Search ● Explore Trends & Sentiments ● Facets and Similarity 36 DEMOS
  37. 37. Drupal and PoolParty at a Glance 37 PoolParty Drupal Integration Demo: http://drupal.poolparty.biz/
  38. 38. USE CASES Success Stories about Text Mining and Linked Data using PoolParty Semantic Suite 38
  39. 39. Use Cases: Text Mining & Linked Data ▸ Climate Tagger (PDF): Streamline and catalogue data and information resources ▸ healthdirect Australia (PDF): Semantic search based on the Australian Health Thesaurus ▸ CTCN Semantic Search: Integrating thousands of documents from several sources on climate technology ▸ European Innovation Partnership (EIP) on Water: Online marketplace including semantic matchmaking
  40. 40. Climate Tagger Helps organizations in the climate and development arenas catalogue, categorize, contextualize, and connect data and information resources. Climate Tagger is backed by the expansive Climate Compatible Development Thesaurus. http://www.climatetagger.net
  41. 41. How does it work 41
  42. 42. EIP Water Matchmaking Controlled vocabularies enable accurate matchmaking between supply and demand for water innovation in Europe. Matchmaking is based upon the EIP Water Innovation Thesaurus (GEMET based). http://www.eip-water.eu
  43. 43. CTCN Semantic Search Helps organisations in the climate technology field to explore and find relevant content from thousands of Drupal nodes and several sources using PoolParty, PowerTagging and s0nr webmining. CTCN is backed by the CTCN Climate Technology Thesaurus. https://www.ctc-n.org/semantic-search
  44. 44. healthdirect Australia Integrated views and semantic search over more than 100 trusted sources. Harmonization of various metadata systems through the use of a central vocabulary hub: Australian Health Thesaurus. http://www.healthdirect.gov.au
  45. 45. SUMMARY WHY TAXONOMISTS AND INFORMATION ARCHITECTS LIKE POOLPARTY Different project stakeholders expect specific qualities from a semantic technology platform: “I am a taxonomist. I need a tool that provides convenient functionalities and intuitive user interfaces for my daily work.” “I am an information architect. Enterprise metadata management deserves scalable technologies, which provide semantic services on top of rich APIs based on standards.”
  46. 46. PoolParty Academy Get certified! 46 https://www.poolparty.biz/academy/
  47. 47. GET STARTED 47 Get your test account at www.poolparty.biz
  48. 48. CONNECT Timea Turdean Technical Consultant, SWC ▸ timea.turdean@semantic-web.com ▸ https://www.linkedin.com/in/timeaturdean/ ▸ https://twitter.com/poolparty_team Martin Kaltenböck CFO, Semantic Web Company ▸ m.kaltenboeck@semantic-web.at ▸ https://www.linkedin.com/in/martinkaltenboeck ▸ https://twitter.com/semwebcompany ▸ https://blog.semantic-web.at/ © Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/
