The Next Generation SharePoint: Powered by Text Analytics
Published in: Technology, Education
  • How many hours per week does an average computer user spend searching? What the heck is text analytics: a 101 introduction course… How APIs work and why they are great for both business people and developers.
  • What are your primary applications where text comes into play?

    1. Alyona Medelyan (Pingar), @zelandiya. THE NEXT-GENERATION SHAREPOINT: POWERED BY TEXT ANALYTICS
    2. AGENDA
       • Information tasks
       • Text analytics
       • APIs
       • Demos
       • Conclusions
    3. Information tasks: What do they cost us? How does SharePoint help?
    4. Avg. hours per week (bar chart; category labels not captured in the transcript): 14.5, 13.3, 9.6, 9.5, 8.8, 8.3, 6.8, 6.7, 5.6, 5.6, 4.3, 4.2. = $37K / year / person. Source: IDC, Hidden Cost of Information (2005)
    5. SHAREPOINT SAVES TIME
       • Interact with SP from Outlook
       • Create docs collaboratively
       • Customize search configuration
       • Use sites, sets & libraries
       • Define Managed Metadata
       • Configure forms
       • Design Workflow
    6. Text Analytics: What is it and how does it work? What tasks does it solve?
    7. WHAT IS TEXT ANALYTICS? Related terms around "unstructured data": Linguistics, Search, Statistics, Data Extraction, Text Processing, Document Organization, Machine Learning, Business Intelligence, Natural Language Processing, Opinion Mining, Text Mining
    8. TEXT ANALYTICS SAVES MORE TIME (… automatically)
       • Compose search reports
       • Extract entities
       • Mine opinions & sentiment
       • Cluster search results
       • Redact
       • Summarize
       • Generate metadata
       • Fill databases
       • Profanity check
    9. Text Analytics Software: What companies offer text analytics? What are open source tools like?
    10. TEXT ANALYTICS: GLOBAL PERSPECTIVE
        User adoption grew by 25% in 2010, creating an $835 million market, because:
        • Unstructured data grows (e.g. social) → Text analytics!
        • Text analytics is central to effective information access
        • Many successes in NLP: IBM Watson, Wolfram Alpha
        Full report by Seth Grimes: http://altaplana.com/TA2011
    11. APPLICATIONS OF TEXT ANALYTICS
        • Search & info access: 39%
        • Customer experience management: 39%
        • Brand management: 39%
        • Research: 36%
        • Competitive intelligence: 33%
        • Customer service: 26%
        • E-discovery: 15%
        • Life sciences: 15%
        • Product design: 15%
        • Online commerce: 11%
        • Finance: 10%
        • Other: 9%
        • Content management: 8%
        • Insurance & fraud: 8%
        • Military intelligence: 7%
        • Law enforcement: 6%
        Source: http://altaplana.com/TA2011
    12. SEARCH & INFO ACCESS: METADATA EXTRACTION (Document → Metadata)
        • Easy to extract: file type, name & location, creation & modification date, authors
        • Difficult to extract: keywords, people & companies mentioned, suppliers & addresses mentioned
    13. SEARCH & INFO ACCESS: KEYWORD EXTRACTION
        Pipeline: Document → Candidates → Keywords
        Example document: "Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri."
    14. SEARCH & INFO ACCESS: KEYWORD EXTRACTION (same pipeline and example document as slide 13)
    15. SEARCH & INFO ACCESS: KEYWORD EXTRACTION
        Pipeline: Document → Candidates → Properties → Keywords
        Properties: frequency, position, corpus stats, relatedness
        (same example document as slide 13)
    16. SEARCH & INFO ACCESS: KEYWORD EXTRACTION
        Pipeline: Document → Candidates → Properties → Scoring → Keywords
        Scoring: heuristic scoring, machine learning
        (same example document as slide 13)
    17. SEARCH & INFO ACCESS: NAMES EXTRACTION
        Pipeline: Document → Examples → Properties → Learning → Names
        Example: "If you have any questions regarding this new version of MetaStock, please contact Bella Santuri."
        Stage labels on the slide: NLP, heuristics, text mining; training data (annotations); machine learning
    18. <SEARCH + TEXT ANALYTICS> COMPANIES: Pingar, BasisTech, AlchemyAPI, LanguageComputer, OpenCalais, Extractiv
    19. BRAND & CUSTOMER MANAGEMENT: SENTIMENT ANALYSIS
        Diagram labels: Reviews, Tweets, Surveys; Document; Sentiment Analysis; Visualization, Summary
        Naïve approach: sentiment-words dictionary!
        • Negative: suck, terrible, awful
        • Positive: fantastic, excellent, awesome
        BUT: "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut." No sentiment words!
    20. BRAND & CUSTOMER MANAGEMENT: SENTIMENT ANALYSIS
        Diagram labels: Reviews, Tweets, Surveys; Document; Examples, Properties, Learning; Visualization, Summary
        Stage labels on the slide: training data (annotations); presence, position, part-of-speech, negation; lexicon induction; generalization; machine learning
        Important:
        • Identifying sentiment-bearing sentences
        • Attaching sentiment to a topic!
    21. SENTIMENT ANALYSIS COMPANIES: Attensity, AlchemyAPI, Lexalytics, Saplo, Medallia, SAS
    22. RESEARCH: TEXT SUMMARIZATION
        Example document, labeled by part:
        • Address: Hi All,
        • Announcement: As of today, MetaStock has several new functions.
        • Details: The most important new feature is the ability to display forward heat rate charts.
        • More details: Also, notice that the interface looks different -- this reflects and accommodates the new features.
        • Conclusion: If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.
        Extractive summary: As of today, MetaStock has several new functions.
        Sentence compression: MetaStock has several new functions. The new interface looks different.
        Abstractive summary: MetaStock has new features and a new interface.
    23. TEXT SUMMARIZATION COMPANIES: Lexalytics, Pingar
    24. COMPETITIVE INTELLIGENCE: ENTITY & ENTITY RELATION EXTRACTION. Companies: OpenCalais, Extractiv, Pingar, Evri, AlchemyAPI, Zemanta
    25. FRAUD INVESTIGATION: NORMALIZATION OF DATES & NAMES. Companies: Cicero, BasisTech
    26. OPEN-SOURCE TOOLS
        • NLTK – Apache license, Book, Python & academic datasets, nltk.org
        • LingPipe – Commercial licenses, Tutorials, Coreference & Chinese segment, alias-i.com/lingpipe
        • OpenNLP – Apache license, Parsing, MaxEnt ML, incubator.apache.org/opennlp
        • GATE – restricted GPL, Training courses, Applications & framework, gate.ac.uk
        • Stanford NLP – full GPL, Online docs, Full library, nlp.stanford.edu
    27. APIs: What’s an API and how does it work? What are the advantages of the API model? Which API is the right one for you?
    28. API ACCESS
        • Developer: creates an application
        • API: an interface that ensures communication
        • ENGINE: software engine that solves a specific task
        • SDK: usage examples
        • Protocols: SOAP, REST. A protocol specifies how XML needs to be encoded.
        • A call is an XML message describing the request; it includes API authentication. Calls go via a web service.
    29. REST API ACCESS FROM A BROWSER
        API request: http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&context=Italian+sculptors+and+painters+of+the+renaissance+favored+the+Virgin+Mary+for+inspiration
        API response: (not captured in the transcript)
    30. SOAP API ACCESS FROM VS2010
    31. SOAP API ACCESS IN POWERSHELL. Read complete blog post “Bulk metadata extraction in SharePoint”: http://bit.ly/powershell-migrate
    32. API = EASY INTEGRATION & FLEXIBILITY
        • Integrate into existing architecture via any programming language
        • Improve known flaws in the current system/process
        • Minimize adoption barriers within the company: little or no training required for staff
        • Only pay for the features you need
        • Flexible deployment:
          • Host API on site = Secure data exchange
          • Access the API in the cloud = Save on tech support & hardware
    33. WHICH API IS BEST FOR YOU?
        Task: "I need to take some text and get a list of the important entities/keywords/phrases."
        APIs compared: Y: Term Extractor, OpenCalais, BeliefNetworks, OpenAmplify, AlchemyAPI (2nd), Evri (1st)
        Criteria: API restrictions, supported languages, quality of results, semantic links, synonyms/duplicates
        Blog post on API comparison: faganm.com/blog
    34. HOW TO CHOOSE AN API:
        • Define a specific task
        • Think of what features are important
        • Get prepared:
          • Sign up for API keys
          • Get SDKs
          • Learn libraries
        • Find representative data
        • Build a test framework
        • Compare results
    35. METADATA EXTRACTION IN SHAREPOINT. Demo: Pingar’s add-on for SharePoint 2010, built using a text analytics API
    36. INTEGRATING APIS INTO SCANNING. Video: Using Fuji Xerox SmartConnect and Pingar API to scan documents in batch into SharePoint. http://www.youtube.com/watch?v=kluVp25upag
    37. THE NEXT-GENERATION SHAREPOINT: POWERED BY TEXT ANALYTICS
        • What can be automated?
          • Metadata extraction, Data entry, Opinion mining, Sanitization, Doc approval, Summarization, …
        • How to integrate text analytics into existing SharePoint applications?
          • Easy! Via an API
        • How to find the right text analytics API?
          • Review what’s available; set up an experiment; compare results
    38. Thank you to all of our Sponsors
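
Slides 13 to 16 outline keyword extraction as a pipeline: generate candidate phrases, compute properties for each (frequency, position, corpus stats, relatedness), then score them heuristically or with machine learning. The Python sketch below is only an illustration of that outline, not the method used by Pingar or any other vendor. It uses just two of the four properties (frequency and first position), and the scoring formula, the tiny stopword list, and the name extract_keywords with its parameters are all assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {
    "a", "all", "also", "an", "and", "any", "as", "at", "for", "has", "have",
    "if", "in", "is", "it", "most", "of", "on", "or", "that", "the", "this",
    "to", "you",
}


def extract_keywords(text, top_k=5, max_len=3):
    """Return the top_k candidate phrases ranked by a simple heuristic score."""
    tokens = re.findall(r"[a-z]+", text.lower())
    freq = Counter()   # property 1: how often each candidate occurs
    first_pos = {}     # property 2: token index of the first occurrence

    # Candidates: every 1..max_len word n-gram that neither starts
    # nor ends with a stopword.
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            if len(gram) < n:
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrase = " ".join(gram)
            freq[phrase] += 1
            first_pos.setdefault(phrase, i)

    def score(phrase):
        # Assumed heuristic: frequent phrases that appear early score higher.
        earliness = 1.0 - first_pos[phrase] / len(tokens)
        return freq[phrase] * (1.0 + earliness)

    return sorted(freq, key=score, reverse=True)[:top_k]
```

Calling extract_keywords on the MetaStock message from slide 13 ranks frequent, early phrases first. A frequency-driven heuristic like this also promotes generic words such as "new", which is one reason the slides list corpus stats and machine-learned scoring as further steps.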
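
Slide 19 describes the naïve approach to sentiment analysis, a dictionary of sentiment words, and shows why it falls short. A minimal Python sketch of that naïve approach follows, using exactly the six words from the slide. The function name naive_sentiment and the "neutral" fallback label are assumptions made for illustration.

```python
import re

# Word lists taken from slide 19.
NEGATIVE = {"suck", "terrible", "awful"}
POSITIVE = {"fantastic", "excellent", "awesome"}


def naive_sentiment(text):
    """Label text by counting dictionary hits; 'neutral' when counts are equal."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = ("If you are reading this because it is your darling fragrance, "
          "please wear it at home exclusively, and tape the windows shut.")
print(naive_sentiment("The camera is awesome"))  # a dictionary word is present
print(naive_sentiment(review))                   # no dictionary words at all
```

The fragrance review is clearly negative to a human reader, yet it contains none of the dictionary words, so the sketch cannot label it negative. That gap is what the properties on slide 20 (negation, part-of-speech, lexicon induction) and machine learning are meant to address.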
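
Slide 29 shows a REST call as a plain URL: an endpoint followed by query parameters (appid, query, context). The Python sketch below only rebuilds that URL from its parts with the standard library, to show how such a request is composed. It does not send the request, since the endpoint on the slide may no longer be available. The helper name build_request_url is an assumption introduced here.

```python
from urllib.parse import urlencode

# Endpoint and parameter values copied from slide 29.
BASE_URL = "http://search.yahooapis.com/WebSearchService/V1/webSearch"


def build_request_url(base_url, **params):
    """Append URL-encoded query parameters to a REST endpoint."""
    # urlencode escapes reserved characters and turns spaces into '+'.
    return base_url + "?" + urlencode(params)


url = build_request_url(
    BASE_URL,
    appid="YahooDemo",
    query="madonna",
    context=("Italian sculptors and painters of the renaissance "
             "favored the Virgin Mary for inspiration"),
)
print(url)
```

Pasting a URL like this into a browser is all that "REST API access from a browser" requires. In code, the same string would be handed to an HTTP client instead.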
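
Slide 34 recommends finding representative data, building a test framework, and comparing results. One possible shape for such a framework is sketched below in Python: each API is wrapped in a function that takes text and returns terms, and every wrapper is scored against hand-labeled keywords with precision, recall, and F1. The function names, the dataset format, and the choice of F1 as the comparison metric are all assumptions; the slides do not prescribe a metric.

```python
def precision_recall_f1(predicted, gold):
    """Compare a predicted term list against the hand-labeled (gold) terms."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def compare_extractors(extractors, dataset):
    """Return the mean F1 per extractor.

    extractors: dict mapping a name to a callable(text) -> iterable of terms
                (hypothetical wrappers around the APIs under evaluation).
    dataset:    list of (text, gold_terms) pairs of representative documents.
    """
    results = {}
    for name, extract in extractors.items():
        scores = [precision_recall_f1(extract(text), gold)[2]
                  for text, gold in dataset]
        results[name] = sum(scores) / len(scores) if scores else 0.0
    return results
```

With a wrapper per candidate API and a few dozen labeled documents from your own SharePoint content, compare_extractors gives one number per API to start the comparison. Criteria from slide 33 such as supported languages or API restrictions still need to be checked separately.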