scopeKM: Text analysis with Triples

Unstructured content represents more than 80% of an enterprise's information assets. A triple is a way of encoding information about objects that enables computers to access, mesh, and take action on information. Triples make claims about objects and may be published in knowledge bases accessed by parties that have no particular knowledge of each other. Based on an award-winning natural language processing pipeline that analyzes the content, Luxid® extracts information about the organization's entities of interest and their relationships to derive precise and relevant triples in 20 languages.

Published in: Business


  1. scopeKM Knowledge Management. Text analysis with Triples: Get added value. Enabling insights that are critical to competitiveness.
  2. Understanding the meaning of naturally spoken and written text

     Unstructured content is a potential treasure trove of business information, but are organizations exploiting it? Luxid® extracts business information from unstructured content, structures it into triples and feeds them into the MarkLogic triple store, where they can be queried, visualized and analyzed for insights that are critical to competitiveness.

     What's a triple?

     A triple is a way of encoding information about objects. It is a key part of RDF (Resource Description Framework), a W3C standard that enables computers to access, mesh, and take action on information that is distributed across the Web. RDF triples take the shape of statements that link a subject to an object via a predicate. In each of the following three triples, the predicate is the middle term:

     Leonardo_DiCaprio stars_in Titanic
     James_Cameron directed Titanic
     Titanic launched_in 1997

     Triples typically make claims about world objects (also called resources or entities), such as the actors, directors and movies in the example above, and may be published in knowledge bases accessed by parties that have no particular knowledge of each other. To ensure robust operation, RDF triples therefore unambiguously identify each of the entities they refer to with a Uniform Resource Identifier (URI), and identify predicates by reference to a vocabulary (or ontology) published alongside the knowledge base.

     How can triples be used?

     Triples can be queried, navigated, visualized and analyzed in the context of any task that has at its core the exploitation of knowledge, whether proprietary to the organization or available from a third party. Recurring use cases that leverage triples include:

     Linked Open Data

     DBpedia is an open data initiative that involves the public sharing of knowledge housed in a queryable triple store. Similar queryable information repositories include GeoNames (geographical features), data.gov (US federal, state, and local data) and legislation.gov.uk (UK statutory law). In the life sciences, UniProt and DrugBank are similar initiatives that offer information about proteins and drugs.

     Commercial Information Products

     Triple stores can likewise be exploited for commercial publications. In portals, they enable new added-value information and analytics features alongside more traditional content-driven offerings.
  3. BBC coverage of the 2012 Olympics is a well-known example based on this approach, used to report key information about countries, teams, players, and disciplines. Knowledge bases that rely on queryable triple stores are also a growing product category. They enable the seamless integration of structured information into end-user workflow applications and analytics tools.

     Enterprise Linked Data

     Triple stores may also house proprietary information about any entity present in an organization's world view: other organizations (suppliers, competitors, employees, notable individuals), products (parts, accessories, options), objects of research (molecules, diseases, investments), etc. Here again, such information can then be queried, explored, visualized or analyzed to answer questions such as the following:

       • What business relationships exist between a potential partner and my competitors?
       • How do our clinical results compare to publicly available information about side effects caused by molecules with comparable modes of action?
       • Which experts are most closely involved in our area of investigation yet most remote from our teams?

     Provided relevant information is also available from third-party triple stores (commercial or open), it can be analyzed jointly with proprietary triples, enabling insights that would not be available otherwise.

     Semantically enriched triple store
  4. How do you create triples?

     Unstructured content represents on average more than 80% of an organization's information assets, a potential treasure trove of business insights. Thanks to Luxid®, a complementary application to MarkLogic that is accessed via Web Services, a company can now extract business information from that content and feed it as triples into the triple store.

     Based on an award-winning natural language processing pipeline that analyzes the content, Luxid® extracts information about the organization's entities of interest and their relationships to derive precise and relevant triples in 20 languages. Aligned with the taxonomy or ontology, these triples then become natively accessible to any application leveraging the MarkLogic triple store, in particular for querying, visualization and analytics purposes.

     Platform overview and key components

       • Robust and scalable platform based on a UIMA/XML architecture
       • Extracts RDF triples based on entities, relationships, sentiments, topics or terminology mentioned in text
       • Categorizes documents and performs corpus clustering
       • Extraction engines based on syntax, statistics, taxonomy, machine learning and domain-specific rules
       • 20 languages supported
       • Each Skill Cartridge focuses on recurring areas of interest: people and location names, information about companies and their relationships, categorization of news, biology, economy, security, etc.
       • The Studio enables the customization of existing Skill Cartridges as well as the creation of new ones
       • Import of the taxonomy/thesaurus into a Skill Cartridge
       • Exploit morpho-syntactic reasoning, statistical models, machine learning and/or domain-specific rules
       • Measure, track and optimize extraction rules
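
The subject-predicate-object model from slide 2, and the idea of analyzing triples from several sources together from slide 3, can be illustrated with a minimal Python sketch. It is not how Luxid® or the MarkLogic triple store work internally. The `http://example.org/` namespace, the `Triple` type, the `uri` and `match` helpers and the extra Kate_Winslet triple are all assumptions introduced for illustration, and a plain Python set stands in for a triple store.

```python
from typing import Iterable, Iterator, NamedTuple, Optional

# Hypothetical namespace, used only for illustration; a real knowledge base
# would publish its own URIs and vocabulary.
EX = "http://example.org/"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def uri(local_name: str) -> str:
    """Build a full URI in the hypothetical example namespace."""
    return EX + local_name


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if predicate is not None and t.predicate != predicate:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


# The three example triples from slide 2, with entities and predicates as URIs.
movie_triples = {
    Triple(uri("Leonardo_DiCaprio"), uri("stars_in"), uri("Titanic")),
    Triple(uri("James_Cameron"), uri("directed"), uri("Titanic")),
    Triple(uri("Titanic"), uri("launched_in"), "1997"),  # plain literal value
}

# An extra triple (not from the slides) standing in for a second,
# independently published source.
other_source = {
    Triple(uri("Kate_Winslet"), uri("stars_in"), uri("Titanic")),
}

# Both sources identify Titanic by the same URI, so a set union combines them.
combined = movie_triples | other_source

# Pattern query: who stars in Titanic, according to the combined data?
cast = sorted(
    t.subject
    for t in match(combined, predicate=uri("stars_in"), obj=uri("Titanic"))
)
print(cast)
```

Because every entity is named by a URI rather than a bare string, two parties that have never coordinated can still publish statements about the same object, which is what makes the union in the last step meaningful. A real deployment would use a standard query language such as SPARQL against the triple store instead of the hand-written `match` helper.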
