
Text Analysis with SAP HANA

Slide deck of the talk on Text Analysis with SAP HANA, given at SAP Inside Track Munich 2015 (sitMUC)



  1. .consulting .solutions .partnership | Text Analysis with SAP HANA
  2. Text Analysis with SAP HANA
     October 2015 | Text Analysis with SAP HANA - SAP Inside Track Munich
       1. Motivation
       2. Text Analysis with SAP HANA
       3. Enhancement Options - Dictionaries and Rules
  4. Why do we need Text Analysis?
     • According to Merrill Lynch, 80-90% of all potentially usable business information may originate in unstructured form (Structure, Models and Meaning: Is "unstructured" data merely unmodeled?, Intelligent Enterprise, March 1, 2005).
     • The data might originate from:
       ◦ Social networks
       ◦ "Letters" from customers
       ◦ ...
     • What is the problem with unstructured data? It is unstructured!
       ◦ Not organized
       ◦ No pre-defined data model
       ◦ No metadata, or a mix of data and metadata
       ◦ We have a lot of information that is relevant for the business, but we cannot access it
  5. How can we solve that issue?
     • Text Analysis: extracting high-quality information from texts
     • Typical process of a text analysis:
       ◦ Parsing of the text
       ◦ Adding features such as linguistic information
       ◦ Entity recognition: is it an organization, a person, or a place? This includes domain facts such as requests.
       ◦ Sentiment analysis: what attitudinal information is "hidden" in the text?
       ◦ Insertion of the information into the database in a structured manner
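
The process steps on this slide can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the lexicons `ENTITIES` and `SENTIMENT`, the table name `ta_result`, and the use of SQLite are all hypothetical stand-ins, and the code has nothing to do with how SAP HANA implements text analysis internally.

```python
import re
import sqlite3

# Hypothetical toy lexicons; real text analysis relies on far richer linguistic resources.
ENTITIES = {"sap": "ORGANIZATION", "munich": "LOCALITY"}
SENTIMENT = {"love": "POSITIVE", "hate": "NEGATIVE"}


def analyze(doc_id, text):
    """Return (doc_id, position, token, type) rows for every recognized token."""
    rows = []
    # Step 1: parse the text into tokens.
    tokens = re.findall(r"\w+", text)
    for position, token in enumerate(tokens, start=1):
        # Step 2: lower-casing is a crude stand-in for adding linguistic features.
        key = token.lower()
        # Steps 3 and 4: entity recognition and sentiment lookup.
        kind = ENTITIES.get(key) or SENTIMENT.get(key)
        if kind:
            rows.append((doc_id, position, token, kind))
    return rows


def store(conn, rows):
    """Step 5: insert the extracted information into a table in structured form."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ta_result "
        "(doc_id INTEGER, position INTEGER, token TEXT, type TEXT)"
    )
    conn.executemany("INSERT INTO ta_result VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
store(conn, analyze(1, "I love the SAP event in Munich"))
result = conn.execute(
    "SELECT token, type FROM ta_result ORDER BY position"
).fetchall()
print(result)
```

Once the text has been turned into rows, it can be queried like any other structured data.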
  7. What does this have to do with SAP HANA? [Figure © SAP SE]
  8. Fulltext Index - Basics
     • Starting point: a database table containing the text (column types such as TEXT, NVARCHAR, BLOB, ...)
     • Create a fulltext index including options (see system view SYS.FULLTEXT_INDEXES)
  9. Entity Extraction
     • To get valuable information out of the data, SAP delivers several configurations
     • These configurations focus on entity and fact extraction under specific aspects
     • Types of extraction:
       ◦ EXTRACTION_CORE
       ◦ EXTRACTION_CORE_ENTERPRISE
       ◦ EXTRACTION_CORE_PUBLIC_SECTOR
       ◦ EXTRACTION_CORE_VOICEOFCUSTOMER
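
To make the two slides above concrete, here is a small Python helper that assembles a `CREATE FULLTEXT INDEX` statement with one of the delivered configurations. It is a sketch, not the speaker's demo: the table `TWEETS`, the column `TEXT`, and the index name `TWEETS_IDX` are hypothetical, and the statement should be checked against the SAP HANA SQL reference for your revision before use. With text analysis switched on, the extraction results are expected in a generated table named `$TA_<index name>`.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_fulltext_index(index, table, column, configuration="EXTRACTION_CORE"):
    """Build a CREATE FULLTEXT INDEX statement with text analysis enabled."""
    config = configuration.replace("'", "''")  # escape single quotes in the literal
    return (
        f"CREATE FULLTEXT INDEX {quote_ident(index)} "
        f"ON {quote_ident(table)}({quote_ident(column)}) "
        f"CONFIGURATION '{config}' TEXT ANALYSIS ON"
    )


# Query for inspecting existing fulltext indexes and their options.
LIST_INDEXES = 'SELECT * FROM "SYS"."FULLTEXT_INDEXES"'

# Hypothetical table, column, and index names, for illustration only.
statement = create_fulltext_index(
    "TWEETS_IDX", "TWEETS", "TEXT", "EXTRACTION_CORE_VOICEOFCUSTOMER"
)
print(statement)
```

The generated string can be run from any SQL console or passed to a database client.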
  12. Custom Dictionary
      • In several use cases you need to enhance the dictionary to cover your business domain
      • Structure of a dictionary [Figure © SAP SE]
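
Since the dictionary structure on this slide was shown only as a figure, the following Python sketch generates a dictionary XML document. The element names (`dictionary`, `entity_category`, `entity_name`, `variant`) and the namespace are written from memory of the Extraction Customization Guide and should be treated as assumptions to verify there; the category `CROSSFIT_TERM` and its entries are purely hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; verify against the Extraction Customization Guide.
NS = "http://www.sap.com/ta/4.0"


def build_dictionary(category, entries):
    """Build dictionary XML from {standard_form: [variant, ...]} for one category."""
    ET.register_namespace("", NS)  # serialize NS as the default namespace
    root = ET.Element(f"{{{NS}}}dictionary")
    cat = ET.SubElement(root, f"{{{NS}}}entity_category", name=category)
    for standard_form, variants in entries.items():
        entity = ET.SubElement(
            cat, f"{{{NS}}}entity_name", standard_form=standard_form
        )
        for variant in variants:
            ET.SubElement(entity, f"{{{NS}}}variant", name=variant)
    return ET.tostring(root, encoding="unicode")


# Hypothetical domain vocabulary, for illustration only.
xml_text = build_dictionary(
    "CROSSFIT_TERM",
    {"CrossFit box": ["box", "CrossFit gym"]},
)
print(xml_text)
```

Each standard form groups the spellings that should be normalized to it, so a token matching any variant is reported under the given category.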
  13. Text Analysis with HANA: Workflow of Enhancement
      1. Find the extraction configuration that fits your use case best
      2. Copy the configuration into the target folder
      3. Create a new custom dictionary
      4. Reference the dictionary in your configuration copy
      5. Recreate the fulltext index using your custom configuration
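
The last step of this workflow amounts to dropping the existing index and creating it again with the copied configuration. The Python sketch below only assembles the two SQL statements. All object names are hypothetical, and the `package::NAME` form used for the custom configuration is an assumption about how repository objects are referenced; check the Text Analysis Developer Guide for the exact form.

```python
def q(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def recreate_index_statements(index, table, column, configuration):
    """Return the DROP and CREATE statements that rebuild a fulltext index."""
    config = configuration.replace("'", "''")  # escape single quotes in the literal
    return [
        f"DROP FULLTEXT INDEX {q(index)}",
        f"CREATE FULLTEXT INDEX {q(index)} ON {q(table)}({q(column)}) "
        f"CONFIGURATION '{config}' TEXT ANALYSIS ON",
    ]


# Hypothetical names; the custom configuration is the copy made in step 2.
statements = recreate_index_statements(
    "POSTS_IDX", "POSTS", "TEXT", "crossfit.ta::CROSSFIT_CONFIG"
)
for sql in statements:
    print(sql)
```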
  15. Text Analysis with HANA: What's next?
      • Assume that we are in an "industry"-specific context or mining for "slang"-like facts and entities
      • Sports are a good example of this!
      • We use the example of CrossFit® ... as there are some funny facts to extract
      • Question: how can we extract complex entities from a text?
      • Examples:
        ◦ Did somebody attend a CrossFit training?
        ◦ Does somebody want to join a CrossFit box?
  16. Text Analysis with HANA: Text Analysis Extraction Rules
      • Extraction rules (CGUL rules): a pattern-based language for pattern matching that uses character-based or token-based regular expressions, combined with linguistic attributes, to define custom entity types
      • Goal of the rule sets:
        ◦ Extract complex facts based on relations between entities and predicates
        ◦ Identify entities in domain-specific language and capture facts expressed in new, popular "slang"
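
The idea behind such rules, matching a pattern over tokens while also looking at linguistic attributes, can be illustrated in plain Python. This is explicitly not CGUL syntax: the `Token` class, the hand-assigned stems and part-of-speech tags, and the function `find_crossfit_facts` are all hypothetical and only mimic what a rule for the CrossFit example might express.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    stem: str  # linguistic attribute: base form of the word
    pos: str   # linguistic attribute: part-of-speech tag


TRIGGER_STEMS = {"join", "attend"}


def find_crossfit_facts(tokens, max_gap=3):
    """Find a verb with a trigger stem followed by 'CrossFit' within max_gap tokens."""
    facts = []
    for i, token in enumerate(tokens):
        if token.pos != "V" or token.stem not in TRIGGER_STEMS:
            continue
        # Look ahead at most max_gap tokens for the entity.
        for j in range(i + 1, min(i + 1 + max_gap, len(tokens))):
            if tokens[j].text.lower() == "crossfit":
                span = " ".join(t.text for t in tokens[i : j + 1])
                facts.append((token.stem, span))
                break
    return facts


# Hand-tagged tokens for "She wants to join a CrossFit box".
sentence = [
    Token("She", "she", "PRON"),
    Token("wants", "want", "V"),
    Token("to", "to", "PART"),
    Token("join", "join", "V"),
    Token("a", "a", "DET"),
    Token("CrossFit", "crossfit", "NOUN"),
    Token("box", "box", "NOUN"),
]
facts = find_crossfit_facts(sentence)
print(facts)
```

Matching on the stem rather than the surface form is what lets a single pattern cover "join", "joins", and "joined", which is the kind of benefit the linguistic attributes bring to extraction rules.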
  17. Text Analysis with HANA: Text Analysis Extraction Rules
      [Diagram labels: Extraction Rule; Regular Expressions; Tokens; Luck; Dictionaries]
  19. Text Analysis with HANA: "Lessons Learned"
      • Text Analysis on SAP HANA is extremely powerful
      • Besides the delivered content, you have a lot of options to adapt the text analysis to extract the entities and facts that you need
      • This also means there are a lot of options that you can set the wrong way
      • Since SP09, rules get compiled upon activation (no separate compilation necessary)
      • The documentation is mostly OK but has room for improvement when it comes to extraction rules
      • Creating custom dictionaries and text rules is cumbersome; finding an error (e.g. a typo) is hell
        ◦ No support in the IDE
        ◦ You can usually activate all objects and create the index ... but the index remains empty
  20. Q&A
  21. .consulting .solutions .partnership
      Dr. Christian Lechner, Principal IT Consultant
      +49 (0) 171 7617190
      christian.lechner@msg-systems.com
      http://scn.sap.com/people/christian.lechner
      @lechnerc77
  22. Text Analysis with HANA: Resources
      • SAP HANA Search Developer Guide (Fulltext Index Options): help.sap.com -> Search Developer Guide
      • SAP HANA Text Analysis Developer Guide: help.sap.com -> TA Developer Guide
      • SAP HANA Text Analysis Language Reference Guide: help.sap.com -> TA Language Reference Guide
      • SAP HANA Text Analysis Extraction Customization Guide: help.sap.com -> TA Extraction Customization Guide
      • YouTube playlist of SAP HANA Academy: Text Analysis and Search
