Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Unstructured Documents into Structured Information in the Enterprise Context

http://2016.semantics.cc/adam-bartusiak

  1. Semantic Processing for the Conversion of Unstructured Documents into Structured Information in the Enterprise Context
     The NXTM research project
     Adam Bartusiak M.Sc., University of Applied Sciences Zittau/Görlitz
     SEMANTiCS’16 - 13.09.2016
     Enterprise Application Development Group, University of Applied Sciences Zittau/Görlitz
     The NXTM Project: Development of a technology for live analysis of data streams with regard to semantics and cross-linked data structures (Adam Bartusiak M.Sc., University of Applied Sciences Zittau/Görlitz, January 7, 2015)
  2. Agenda
     • Motivation
     • The NXTM Project
     • Data analysis
     • Search Engine
     • Representation Layer
     • Use case
  3. Motivation
     • unstructured data overload (80-90% of digital data)
     • unstructured data is intended mainly for human consumption
     • it holds useful knowledge that can be utilized for:
       • trend analytics
       • decision support
       • problem solving
       • discovering new facts and relations
     • it can improve knowledge management within the enterprise
     • it helps SMEs gain a sustainable competitive advantage in the market
  7. The NXTM Project
     • cooperation project between HSZG and an IT company from Dresden
     • lifetime: January 2015 - October 2016
     Goal: improving SMEs’ processes for extracting valuable business information from unstructured data (UD)
     • extraction of structured data from unstructured data from multiple sources:
       • emails and text messages
       • MS Office and PDF documents
       • XML and HTML files
     • dynamic recognition and representation of linked information in documents
     • flexible and intuitive graphical user interface enabling easy access to the analyzed data
  13. Data analysis
      Architecture (diagram components): Data Input Interface; NXTM Data and Text Analysis Engine (Metadata Analysis, Text Extraction, Segmentation, Morphology, Semantic Analysis, Similarity Analysis); Linked Open Data Knowledge Integrator; DB Mapper; Clustering Engine; Data Persistence Layer
      1. import of documents as Java objects from the input pipeline
      2. language identification, MIME type and metadata analysis
      3. natural-language (NL) processing in chained analysis engines, annotating semantic information
      4. similarity calculation and document clustering
      5. storing extracted data in the DB, updating the search index
      6. mapping annotated entities to LOD resources
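
The chained analysis engines in the steps above can be sketched as a list of functions applied in order to one document. This is only an illustration: the slides describe Java objects and do not show the real engine interfaces, so `Document`, `analyse_metadata`, `segment`, `annotate_entities` and `run_pipeline` are hypothetical names, and the upper-case heuristic is a stand-in for real semantic analysis.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical container for one imported document (not the project's class)."""
    doc_id: int
    raw_text: str
    annotations: dict = field(default_factory=dict)


def analyse_metadata(doc: Document) -> Document:
    # Stand-in for step 2: only records the text length;
    # real language and MIME-type detection is not shown here.
    doc.annotations["length"] = len(doc.raw_text)
    return doc


def segment(doc: Document) -> Document:
    # Stand-in for segmentation: naive whitespace tokenisation.
    doc.annotations["tokens"] = doc.raw_text.split()
    return doc


def annotate_entities(doc: Document) -> Document:
    # Stand-in for semantic analysis: treat fully upper-case tokens
    # (after stripping trailing punctuation) as entity mentions.
    tokens = doc.annotations.get("tokens", [])
    cleaned = (t.strip(".,") for t in tokens)
    doc.annotations["entities"] = sorted({t for t in cleaned if t.isupper()})
    return doc


def run_pipeline(doc: Document, engines: List[Callable[[Document], Document]]) -> Document:
    # Each engine receives the output of the previous one (the "chain").
    for engine in engines:
        doc = engine(doc)
    return doc


PIPELINE = [analyse_metadata, segment, annotate_entities]
result = run_pipeline(Document(1, "Lorem ipsum NY dolor sit amet, NY."), PIPELINE)
print(result.annotations["entities"])
```

Keeping every engine to the same signature means a further step, such as similarity analysis, could be appended to `PIPELINE` without changing `run_pipeline`.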
  14. Data analysis
      NXTM Item
      • ID
      • Type (DOC, ENT)
      • Attribute []
      • …
      Attribute
      • Predicate
      • Value (NXTM_Item_ID; String)
      • Provenance (NXTM_Item_ID)
      • Confidence
      • Access policy
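
The NXTM Item and Attribute structure listed above could be modelled roughly as follows. This is a minimal sketch, not the project's schema: the field types, the default values, the `values_of` helper and the confidence numbers are assumptions made for illustration, and the fields elided by "…" on the slide are deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union


class ItemType(Enum):
    DOC = "DOC"  # a document
    ENT = "ENT"  # an entity


@dataclass
class Attribute:
    predicate: str
    value: Union[int, str]            # an NXTM_Item_ID (int here) or a plain string
    provenance: Optional[int] = None  # NXTM_Item_ID of the item the value came from
    confidence: float = 1.0           # assumed range 0..1
    access_policy: str = "public"     # assumed representation of the access policy


@dataclass
class NXTMItem:
    item_id: int
    item_type: ItemType
    attributes: List[Attribute] = field(default_factory=list)

    def values_of(self, predicate: str) -> list:
        # Hypothetical helper: all values stored under one predicate.
        return [a.value for a in self.attributes if a.predicate == predicate]


# Entity #301 from the use case: a PLACE named "NY", seen in documents #1 and #2.
# The confidence values are invented for the example.
place = NXTMItem(301, ItemType.ENT, [
    Attribute("type", "PLACE"),
    Attribute("name", "NY", provenance=1, confidence=0.9),
    Attribute("name", "NY", provenance=2, confidence=0.8),
])
sources = sorted({a.provenance for a in place.attributes if a.provenance is not None})
print(sources)
```

Storing provenance on each attribute rather than on the item keeps track of which document contributed which mention.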
  15. Search Engine
      Diagram components: Search query…; NXTM Search Layer; Semantic Search Machine; Clustering Engine; Data Persistence Layer; Results…
      Field     Value
      ID        NXTM_Item_ID
      Content   LuceneAnalyzer
      Semantic  SIREnAnalyzer
      • querying the DB directly to retrieve the analysed data is an inefficient way of searching for information
      • a semantic search machine can effectively search for hierarchical data
      • the search engine is still a subject of research
  16. Representation Layer
      • search results are represented as an interactive graph with nodes and edges
      • real-time browsing of the graph enables the user to discover other relevant sources of information and their dependencies
      • d3js.org JavaScript library
      Diagram components: Standalone Frontend; Plugins & Apps; NXTM Representation Layer
      Example node details:
      • Document: Abstract: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor… Updated: 03.01.2003
      • Entity: Type: Person; Name: John Smith; Author of: XYZ; Title: XYZ
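
A graph of nodes and edges like the one described above has to reach the browser in some serialised form. The sketch below builds a `nodes`/`links` JSON payload, a shape often used with D3 force layouts. Whether the project used this shape is an assumption, and `to_graph` and its field names are hypothetical.

```python
import json


def to_graph(items, edges):
    """Build a nodes/links structure from (id, type) pairs and (source, target, distance) edges."""
    nodes = [{"id": item_id, "type": item_type} for item_id, item_type in items]
    known = {node["id"] for node in nodes}
    # Drop edges that point at items missing from the result set,
    # so the frontend never receives a dangling link.
    links = [
        {"source": s, "target": t, "distance": d}
        for s, t, d in edges
        if s in known and t in known
    ]
    return {"nodes": nodes, "links": links}


# Illustrative data: two documents and one entity; the distances are invented.
graph = to_graph(
    [(1, "DOC"), (2, "DOC"), (301, "ENT")],
    [(1, 2, 0.4), (1, 301, 0.7), (2, 301, 0.6), (2, 999, 0.1)],
)
payload = json.dumps(graph)
print(payload)
```

On the D3 side, links that reference nodes by `id` are typically resolved with `d3.forceLink().id(d => d.id)`.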
  17. Use case
      Documents #1 and #2: placeholder text (Lorem ipsum …) containing mentions of "NY"
      Metadata
      • createdIn NY
      Entity
      • ID #301
      • type PLACE
      • name NY (#1)
      • name NY (#2)
      Items stored by the NXTM System in the NXTM DB:
      NXTM Item
      • ID #1
      • Type DOC
      • Attribute [] (Metadata)
      NXTM Item
      • ID #2
      • Type DOC
      • Attribute [] (Metadata)
      NXTM Item
      • ID #301
      • Type ENT
      • Attribute [] (Metadata)
  18. Use case cont.
      Query: New York
      NXTM Results
      ResultItem
      • NXTM_ITEM_ID #1
      • Score
      • Attribute []
      ResultItem
      • NXTM_ITEM_ID #2
      • Score
      • Attribute []
      ResultItem
      • NXTM_ITEM_ID #301
      • Score
      • Attribute []
      Result Triples
      Source; Target; Distance
      ResultItem#1; ResultItem#2; DOC-DOC
      ResultItem#1; ResultItem#3; DOC-ENT
      ResultItem#2; ResultItem#3; DOC-ENT
      • DOC-DOC -> f(TF*IDF Similarity, Lucene score)
      • DOC-ENT -> f(Confidence score, Lucene score)
      Graph nodes: ENT #301, DOC #1, DOC #2
      Example node labels: Adam Bartusiak …, Person, DOC#45, Metadata, Keywords
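
The result triples above can be sketched as a pairing of result items, with an edge value computed per pair. The slides do not define f, so the product used in `combine` is purely an assumption, as is taking the lower of the two Lucene scores for a pair. `build_triples` and all input numbers are hypothetical.

```python
from itertools import combinations


def combine(a: float, b: float) -> float:
    # Placeholder for the unspecified f(...): product of two scores in [0, 1].
    return a * b


def build_triples(results, doc_similarity, mention_confidence):
    """Return (source, target, kind, value) tuples for every linked pair of results."""
    triples = []
    for (id_a, a), (id_b, b) in combinations(sorted(results.items()), 2):
        kinds = (a["type"], b["type"])
        lucene = min(a["score"], b["score"])  # assumption: use the weaker match
        if kinds == ("DOC", "DOC"):
            sim = doc_similarity.get((id_a, id_b))  # precomputed TF*IDF similarity
            if sim is not None:
                triples.append((id_a, id_b, "DOC-DOC", combine(sim, lucene)))
        elif "DOC" in kinds and "ENT" in kinds:
            doc_id, ent_id = (id_a, id_b) if a["type"] == "DOC" else (id_b, id_a)
            conf = mention_confidence.get((doc_id, ent_id))
            if conf is not None:
                triples.append((doc_id, ent_id, "DOC-ENT", combine(conf, lucene)))
    return triples


# Invented scores for the three result items of the use case.
results = {
    1: {"type": "DOC", "score": 0.8},
    2: {"type": "DOC", "score": 0.5},
    301: {"type": "ENT", "score": 1.0},
}
triples = build_triples(
    results,
    doc_similarity={(1, 2): 0.5},
    mention_confidence={(1, 301): 0.5, (2, 301): 1.0},
)
for triple in triples:
    print(triple)
```

Any other monotone combination, such as a weighted sum, would fit the same structure; only `combine` would change.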
  19. Questions
      Partners/Cooperations
      a.bartusiak@hszg.de | ead.hszg.de
