HANA SPS07 Text Analysis


What's New? SAP HANA SPS 07 - Text Analysis

Published in: Technology, Business

  1. What's New? SAP HANA SPS 07 Text Analysis (Delta from SPS 06 to SPS 07). SAP HANA Product Management, November 2013
  2. Agenda
     New or Improved Text Analysis Features
       • Custom dictionaries
       • Custom configurations
       • Indexing throughput
     Improved Language Coverage
       • Social Media extraction for Japanese & Simplified Chinese
       • Numerical extraction for Simplified Chinese
       • Core extraction for Russian
       • Voice of Customer for Simplified Chinese
     Related Topics
       • Fulltext search
       • Fuzzy search
     © 2013 SAP AG. All rights reserved. Public
  3. New or Improved Text Analysis Features
  4. New Custom Dictionary Support
     You can now specify your own entity types and names to be used with text analysis, which may be critical for particular industries or data domains.
       • A single custom dictionary may support all languages or a single language
       • Custom dictionaries reside in the HANA repository and benefit from its life cycle management
     Steps
       1. Choose the project to contain the new dictionary in the Development perspective of SAP HANA Studio.
       2. Enter or select a parent folder and enter the dictionary file name in the Wizard. Your text analysis dictionary file is created locally and opens as an empty file in the text editor.
       3. Enter your text analysis dictionary specification into the new file and save it locally.
       4. Commit your new dictionary. The dictionary is now synchronized to the repository as a design time object and the icon shows the dictionary is committed.
       5. Activate once you have finished editing your dictionary. The dictionary is created in the repository as a runtime object and the icon shows the dictionary is activated. This allows you and others to use the dictionary.
     If you haven’t done so previously, you will need to create a custom text analysis configuration as well…
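The dictionary specification entered in step 3 is an XML file. The slides do not show its schema, so the following is only a minimal sketch: the element names (`dictionary`, `entity_category`, `entity_name`, `variant`), the `standard_form` attribute, and the namespace URI are assumptions to be checked against the SAP HANA text analysis documentation, and the `HANA_PRODUCT` entity type is purely hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for text analysis dictionaries; verify against SAP documentation.
TA_NAMESPACE = "http://www.sap.com/ta/4.0"


def build_dictionary(entities):
    """Serialize {entity_type: {standard_form: [variant, ...]}} as dictionary XML.

    The element and attribute names used here are assumptions, not taken
    from the slides.
    """
    root = ET.Element("dictionary", xmlns=TA_NAMESPACE)
    for entity_type, names in entities.items():
        category = ET.SubElement(root, "entity_category", name=entity_type)
        for standard_form, variants in names.items():
            entry = ET.SubElement(category, "entity_name", standard_form=standard_form)
            for variant in variants:
                ET.SubElement(entry, "variant", name=variant)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical custom entity type with one entry and two spelling variants.
    print(build_dictionary({"HANA_PRODUCT": {"SAP HANA": ["HANA", "HANA DB"]}}))
```

The generated text would then be pasted into the new dictionary file, saved, committed, and activated as described in steps 3 to 5.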
  5. New Custom Configuration Support
     You can now customize the features and options used for text analysis rather than using the predefined configurations:
       • LINGANALYSIS_BASIC
       • LINGANALYSIS_STEMS
       • LINGANALYSIS_FULL
       • EXTRACTION_CORE
       • EXTRACTION_CORE_VOICEOFCUSTOMER
     Custom configurations allow you to suppress the default output and incorporate custom dictionaries. You can either:
       • Create a new XML configuration file within SAP HANA Studio
       • Copy one of the predefined configurations and modify it
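A configuration, predefined or custom, is typically selected when a full-text index is created. The sketch below builds such a statement as a string; the `CREATE FULLTEXT INDEX ... CONFIGURATION ... TEXT ANALYSIS ON` form is standard SAP HANA SQL but does not appear in the slides, and the index, table, and column names as well as the `package::name` pattern assumed for custom configurations are illustrative only.

```python
import re

# Predefined configurations listed on the slide.
PREDEFINED_CONFIGURATIONS = (
    "LINGANALYSIS_BASIC",
    "LINGANALYSIS_STEMS",
    "LINGANALYSIS_FULL",
    "EXTRACTION_CORE",
    "EXTRACTION_CORE_VOICEOFCUSTOMER",
)

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# Assumption: custom configurations are referenced as "package.path::Name".
_CUSTOM_CONFIGURATION = re.compile(r"^[A-Za-z0-9_.]+::[A-Za-z0-9_]+$")


def create_index_sql(index, table, column, configuration):
    """Return a CREATE FULLTEXT INDEX statement that enables text analysis."""
    for identifier in (index, table, column):
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unexpected identifier: {identifier!r}")
    if (configuration not in PREDEFINED_CONFIGURATIONS
            and not _CUSTOM_CONFIGURATION.match(configuration)):
        raise ValueError(f"unknown configuration: {configuration!r}")
    return (
        f'CREATE FULLTEXT INDEX "{index}" ON "{table}" ("{column}") '
        f"CONFIGURATION '{configuration}' TEXT ANALYSIS ON"
    )


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(create_index_sql("REVIEWS_IDX", "REVIEWS", "COMMENT_TEXT", "EXTRACTION_CORE"))
```

Validating the names before formatting keeps the helper from producing malformed or injectable SQL; a custom configuration would be passed in place of `EXTRACTION_CORE`.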
  6. Greater Indexing Throughput
     Improved scalability of the highlighted preprocessing steps:
       • File filtering – converting binary document formats to text/HTML
       • Tokenization – decomposing a word sequence, e.g. “the quick brown fox” -> “the” “quick” “brown” “fox”
       • Stemming – reducing tokens to their linguistic base form, e.g. houses -> house; ran -> run
       • Linguistic analysis – part-of-speech identification, e.g. quick: Adjective; houses: Plural Noun
     Utilizes more threads and efficient data transfers
       • Applies to all text analysis configurations
     30% less time (depending upon hardware configuration)
     50% greater throughput (depending upon hardware configuration)
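To make the tokenization and stemming steps concrete, here is a toy illustration that reproduces the slide's examples. It is not how SAP HANA implements these steps: whitespace splitting and a two-entry lookup table stand in for the real linguistic processing, and all names are hypothetical.

```python
# Toy stand-in for a stemmer: only the two examples from the slide.
STEM_LOOKUP = {"houses": "house", "ran": "run"}


def tokenize(text):
    """Split a word sequence into tokens on whitespace."""
    return text.split()


def stem(token):
    """Reduce a token to its base form if it is in the lookup table."""
    return STEM_LOOKUP.get(token.lower(), token)


if __name__ == "__main__":
    print(tokenize("the quick brown fox"))
    print([stem(token) for token in ["houses", "ran", "fox"]])
```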
  7. Improved Language Coverage
  8. Available Text Analysis Configuration Options
     Language | LINGANALYSIS_BASIC, LINGANALYSIS_STEMS | LINGANALYSIS_FULL | EXTRACTION_CORE | EXTRACTION_CORE_VOICEOFCUSTOMER
     Arabic | ✓ | ✓ | ✓ | X
     Catalan | ✓ | ✓ | X | X
     Chinese (Simplified) | ✓ | ✓ | IMPROVED | IMPROVED
     Chinese (Traditional) | ✓ | ✓ | X | X
     Croatian | ✓ | ✓ | X | X
     Czech | ✓ | ✓ | X | X
     Danish | ✓ | ✓ | X | X
     Dutch | ✓ | ✓ | ✓ | X
     English | ✓ | ✓ | ✓ | ✓
     Farsi | ✓ | ✓ | ✓ | X
     French | ✓ | ✓ | ✓ | ✓
     German | ✓ | ✓ | ✓ | ✓
     Greek | ✓ | X | X | X
     Hebrew | ✓ | X | X | X
     Hungarian | ✓ | X | X | X
     Italian | ✓ | ✓ | ✓ | X
     Japanese | ✓ | ✓ | IMPROVED | X
     Korean | ✓ | ✓ | ✓ | X
     Norwegian (Bokmål) | ✓ | ✓ | X | X
     Norwegian (Nynorsk) | ✓ | ✓ | X | X
     Polish | ✓ | X | X | X
     Portuguese | ✓ | ✓ | ✓ | X
     Romanian | ✓ | X | X | X
     Russian | ✓ | ✓ | IMPROVED | X
     Serbian | ✓ | ✓ | X | X
     Slovak | ✓ | ✓ | X | X
     Slovenian | ✓ | ✓ | X | X
     Spanish | ✓ | ✓ | ✓ | ✓
     Swedish | ✓ | ✓ | X | X
     Thai | ✓ | X | X | X
     Turkish | ✓ | X | X | X
  9. Improved Social Media Extraction for Japanese & Simplified Chinese
     Identifies SOCIAL_MEDIA entities, with their corresponding offsets, with high recall and precision:
       • Tags SOCIAL_MEDIA entities such as IDs (@MyTwitterName) or topics (#MyWeiboKeyword)
       • Distinguishes between SOCIAL_MEDIA entities and emoticons like @__@
       • Distinguishes between SOCIAL_MEDIA entities and emails like myname@domain.com
       • Respects important Weibo and Twitter differences, e.g. #W-TOPIC# vs. #T-TOPIC1 #T-TOPIC2
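The distinctions listed above can be illustrated with a few regular expressions. This is only a rough sketch of the idea, not SAP HANA's extraction logic: the patterns and the labels `EMAIL`, `EMOTICON`, `WEIBO_TOPIC`, `TWITTER_TOPIC`, and `ID` are invented for this example. The order of the alternatives matters, because an email or an emoticon must be ruled out before `@...` is treated as an ID, and a Weibo topic (closed by a second `#`) before a Twitter-style topic.

```python
import re

# Alternatives are tried left to right, so the more specific patterns come first.
SOCIAL_TOKEN = re.compile(
    r"(?P<EMAIL>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|(?P<EMOTICON>@_+@)"
    r"|(?P<WEIBO_TOPIC>#[^#\s]+#)"
    r"|(?P<TWITTER_TOPIC>#\w+)"
    r"|(?P<ID>@\w+)"
)


def extract_social_media(text):
    """Return (label, matched text, character offset) for each match."""
    return [
        (match.lastgroup, match.group(), match.start())
        for match in SOCIAL_TOKEN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "@SAP loves #HANA @__@ mail john.smith@sap.com #微博话题#"
    for label, token, offset in extract_social_media(sample):
        print(offset, label, token)
```

Only the `ID` and topic labels would correspond to SOCIAL_MEDIA entities; emails and emoticons are matched here solely so that they are not mistaken for IDs.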
  10. Improved Numerical Extraction for Simplified Chinese
      Better identifies numerical entities with special characters:
        • CURRENCY – expressions denoting amounts of money: 33.8万元, 港币五千万, 一百四十四亿七千万美元
        • DATE – minimally composed of a number and month name: 7月2日, 十月十七日
        • MEASURE – expressions: 二百五十六公斤, 5.5米
        • TIME – clock times and time expressions: 8时, 3点零5分
  11. Additional Predefined Core Extractions for Russian
      Entity types, with examples:
        • TITLE: President
        • PERSON: Barack Obama
        • PEOPLE: Greeks
        • LANGUAGE: Greek
        • ADDRESS1: 245 First Street Floor 16
        • ADDRESS2: Cambridge, MA 02142
        • LOCALITY: Cambridge
        • REGION@MINOR: Napa County
        • REGION@MAJOR: Connecticut
        • COUNTRY: Brazil
        • CONTINENT: South America
        • GEO_FEATURE: Mount Fuji
        • GEO_AREA: Scandinavia
        • ORGANIZATION@COMMERCIAL: AT&T
        • ORGANIZATION@EDUCATIONAL: University of Washington
        • ORGANIZATION@OTHER: FBI
        • PRODUCT: iPhone
        • TICKER: NYSE:SAP
        • SOCIAL_MEDIA@TWITTER_ID: @SAP
        • SOCIAL_MEDIA@TWITTER_TOPIC: #HANA
        • DATE: 2/14/2011
        • DAY: Monday
        • MONTH: June
        • YEAR: 2011
        • TIME: 3:47pm
        • TIME_PERIOD: 3 days, from 9 to 5pm
        • HOLIDAY: Memorial Day
        • CURRENCY: 17 euros
        • MEASURE: 217 meters
        • PERCENT: 4%
        • PHONE: 617-677-2030
        • URI@EMAIL: john.smith@sap.com
        • URI@IP
        • URI@URL: http://sap.com
      Syntactic Entities:
        • NOUN_GROUP: big umbrella
        • PROP_MISC: Cup o’ Soup
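Extracted entities of these types are usually read back from the result table that text analysis generates next to the full-text index. The sketch below builds such a query; the `$TA_<index name>` table naming and the `TA_TOKEN`, `TA_TYPE`, and `TA_OFFSET` columns reflect standard SAP HANA behaviour rather than anything stated in the slides, so treat them as assumptions, and the index name is hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def entity_query(index_name):
    """Return a parameterized query for entities of one type, ordered by offset.

    Assumes the text analysis results live in a table named "$TA_<index name>".
    """
    if not _IDENTIFIER.match(index_name):
        raise ValueError(f"unexpected index name: {index_name!r}")
    return (
        "SELECT TA_TOKEN, TA_TYPE, TA_OFFSET "
        f'FROM "$TA_{index_name}" '
        "WHERE TA_TYPE = ? ORDER BY TA_OFFSET"
    )


if __name__ == "__main__":
    # The entity type, e.g. "ORGANIZATION@COMMERCIAL", is bound to the "?" placeholder.
    print(entity_query("REVIEWS_IDX"))
```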
  12. Improved Voice of Customer Extraction for Simplified Chinese
      The following major fact types are classified:
        • Sentiments: expression of a customer’s feelings about something
        • Problems: a statement about something which impedes a customer’s work
        • Requests: expression of a customer’s desire for an enhancement/change
        • Profanity: defines a set of pejorative vocabulary
        • Emoticons: expression of someone's feelings about the whole sentence or situation
      Focuses on finer extraction of online reviews and implementing customer feedback
        • Dramatic overall improvement in stances and topics
        • Recall and precision testing results jumped significantly higher
  13. Disclaimer
      This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP’s strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.
  14. Thank you
      Contact information
      Anthony Waite, SAP HANA Product Management
      AskSAPHANA@sap.com
      To get the best overview of what’s new in SAP HANA SPS 07, read this blog.
  15. © 2013 SAP AG. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.