A Primer on Text Mining for Business


Slides from the course on big data by C. Levallois, EMLYON Business School, intended for business students. See the online video that accompanies these slides.

Covers the definition of text mining, the main categories of tools available (such as topic categorization and sentiment analysis), and their use in business.

Published in: Business

  1. MK99 – Big Data: Big data & cross-platform analytics, MOOC lectures, Pr. Clement Levallois
  2. A primer on text mining for business
     • Text mining: computational methods to find interesting information in texts
     • Quasi-synonyms:
       – natural language processing (abbreviated as NLP)
       – computational linguistics (the name of a scientific discipline)
  3. Text… what kinds?
     • Books
     • Tweets
     • Product reviews on Amazon
     • LinkedIn profiles
     • The whole of Wikipedia
     • Free-text answers in the results of a survey
     • Tenders, contracts, laws, …
     • Print and online media
     • Archival material
     • …
  4. What can be done?
     • Sentiment analysis
       – Is the tone of this piece of text positive or negative?
     • Topic modeling / topic detection
       – What is the main theme of this 20-page booklet?
     • Semantic disambiguation
       – “Paris” is mentioned in this text. Is this Paris Hilton or Paris, France?
     • Named Entity Recognition (NER)
       – Automatically find the individuals, organizations and events named in the text, and the relations between them.
     • Semantic enrichment
       – If you search Google for “TV”, results for “television” will also show up
     • Language detection
       – “Ich spreche Deutsch” -> this sentence is written in German
     • Automatic translation
       – See Google Translate
     • Summarizing
       – Shortening a text while keeping its core message intact
     • Spelling correction
       – Well, that’s easy
     • Topic classification
       – Is this email spam or not?
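To make the first item on the "What can be done?" slide concrete, here is a minimal sketch of sentiment analysis by word counting. It is an illustration only: the two word lists and the `sentiment` function are hypothetical names made up for this example, and real tools rely on trained models rather than a hand-picked lexicon.

```python
import re

# Hypothetical, hand-picked word lists, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def sentiment(text):
    """Label a text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Great phone, I love the camera"))      # positive
print(sentiment("Terrible battery and slow delivery"))  # negative
```

A word count like this ignores negation ("not good") and irony, which is one reason practical sentiment tools are considerably more elaborate.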
  5. Amaze me!
     • Demo on sentiment analysis, with a tool by Stanford: http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
     • Demo on semantic disambiguation, with a tool built by a collaborative effort: http://dbpedia-spotlight.github.io/demo/ (click on “annotate”, and also replace the text with one of your own)
  6. What can’t be done yet (but is actively researched)
     • Detection of irony
     • Robust translation
     • Reasoning beyond Q&A
     What makes things harder
     • Non-English texts
     • Slang and colloquial speech forms
     • Real-time processing
  7. Examples of routine operations when working with text (or, how to follow the most basic conversation in computational linguistics)
     • Stemming
       – “liked” and “like” will be reduced to their stem “lik” to facilitate further operations
     • Lemmatizing
       – Grouping “liked”, “like” and “likes” to count them as one basic semantic unit
     • Part-of-Speech tagging (aka POS tagging)
       – Automatically detecting the grammatical function of the terms used in a sentence, to facilitate translation or other tasks
     • “Starting the text analysis with a bag-of-words model”
       – An operation which consists of simply listing and counting all the different words in the text.
     • N-grams
       – The text “I am Dutch” is made of 3 words: I, am, Dutch. But it can also be interesting to look at bigrams in the text: “I am”, “am Dutch”. Or trigrams: “I am Dutch”.
       – When neighboring words are considered together, as we just did, they are called n-grams. This can reveal interesting things about frequent expressions used in the text.
       – A good example of how useful this can be: visit the Ngram Viewer by Google: https://books.google.com/ngrams
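Three of the routine operations on this slide (bag-of-words, n-grams and stemming) can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the function names are invented for this example, the tokenizer is a crude regular expression, and `toy_stem` is a hypothetical suffix-stripping rule, not an established stemming algorithm such as Porter's.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text):
    """List and count every distinct word in the text."""
    return Counter(tokenize(text))

def ngrams(tokens, n):
    """Return every run of n neighboring tokens, in order of appearance."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def toy_stem(word):
    """Strip one common English suffix (toy rule, assumed for illustration)."""
    for suffix in ("ed", "es", "e", "s"):
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

tokens = tokenize("I am Dutch")
print(ngrams(tokens, 2))  # ['i am', 'am dutch']
print(ngrams(tokens, 3))  # ['i am dutch']

counts = bag_of_words("I like it, I really like it")
print(counts["like"])     # 2

print([toy_stem(w) for w in ("liked", "like", "likes")])  # ['lik', 'lik', 'lik']
```

The bag-of-words count discards word order entirely, which is why n-grams are a natural next step: they put back a small amount of context around each word.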
  8. Chief benefit: getting to know individuals better
     • Without text mining, we have access to “external”, “cold” states of the individual
       – Behavior (e.g., clicks), external attributes (address, gender, encyclopedia entry), social networks (but relatively cold ones)
     • With text mining, we have access to “internal”, “hot” states:
       – opinions
       – intentions
       – preferences
       – degree of consensus
       – social networks (who mentions whom: how, in which context)
       – implicit attributes of the speaker
  9. How easy is it?
     • Too easy… the limit is legal and ethical, not technical
       – “Predicting the Political Alignment of Twitter Users” by Conover et al. (2011). http://cnets.indiana.edu/wp-content/uploads/conover_prediction_socialcom_pdfexpress_ok_version.pdf
       – “Political Tendency Identification in Twitter using Sentiment Analysis Techniques” by Pla and Hurtado (2014). http://anthology.aclweb.org/C/C14/C14-1019.pdf
       – “Private traits and attributes are predictable from digital records of human behavior” by Kosinski et al. (2013). http://www.pnas.org/content/110/15/5802.abstract
     (and this gets even more powerful when mixing text mining, network analysis and machine learning)
  10. What use for text mining in a business context?
      1. Client facing
      2. Business management
      3. Business development
  11. 1. Market-facing activities
      • Refined scoring: propensity scores (including churn), scoring of prospects
      • Refined individualization of campaigns
        – ads, email campaigns, coupons, etc.
      • Better community management
        – Getting a clear and precise picture of how customers and prospects perceive, talk about, and engage with your brand / product / industry.
  12. 2. Business management
      • Organizational mapping
        – Getting a view of the organization through text flows.
        – Example: getting a view of the activity of a business school through a map of its scientific publications.
      • HRM
        – Finding talent in niche industries, based on the mining of candidates’ profiles
      • Marketing research
        – refined segmentation + targeting + positioning, measuring customer satisfaction, perceptual mapping.
  13. 3. Business development
      • Developing adjunct services
        – product recommendation systems (e.g., Amazon’s)
        – detection and matching of needs (e.g., detection of complaints / mood changes)
        – product enhancements (e.g., content enrichment through localization/personalization)
      • Developing new products entirely, based on
        – different search engines
        – alert systems / automated systems based on monitoring textual input
        – knowledge databases
        – new forms of content curation / high-value info creation + delivery
  14. Interesting players through their “Data Services” package + many APIs listed on www.programmableweb.com
  15. This slide presentation is part of a course offered by EMLYON Business School (www.em-lyon.com). Contact Clement Levallois (levallois [at] em-lyon.com) for more information.