Using New Technologies to Make Sense of Content Chaos: Text mining and visualization


Published on

KM Chicago (December, 2005)

Published in: Business

  • Bill Gates made the claim this year that the problem is not too much information, but not enough of the right information. Internet growth of 150% from 2000 to 2005; 900 million users worldwide. Fastest growth: Middle East, Latin America, Africa, Asia (Korean blogs; Chinese local news). Everyone’s a publisher: ~20 million blogs (85,000 new per week), written by your clients, your employees, your competitors.
  • How many of you know Gary Price’s List of Lists? That site has been a source of this type of information for years because there’s no easy way to find it otherwise.
  • 1) Microsoft has offered domestic partner benefits to same-sex couples since the early 1990s. It has also long barred discrimination based on sexual orientation. 2) Gay rights bill: Rev. Ken Hutcherson, a Redmond pastor, met with Microsoft in February, threatening to launch a national boycott of Microsoft products if the company didn't take a stand against the bill. 3) MSFT decided not to publicly support the bill. 4) An Apr 22 memo to employees tried to minimize anger. Microsoft then reversed its decision and returned to supporting the Washington state anti-discrimination legislation.
  • Maytag has been one of the most well-respected brand names in washers and dryers in the US for many years, representing reliability and quality. The advertising campaign, used for 25 years, showed a lonely Maytag repairman sitting around the shop with nothing to do because he never got any calls. Talking points:
      • In 1997, Maytag’s new washer, the Neptune, started off strong.
      • In 1999, Maytag’s share price was at USD $63.25.
      • In 2000, messages started to surface that spoke of the quality problems owners were facing.
      • By 2002, the Neptune had lost its luster. More and more postings began to appear on Internet message boards.
      • In late 2004, the story was picked up by the mainstream press. A local CBS affiliate in New York (huge reach) aired a story about the quality issues (mold in the door seal). In the interview, company executives claimed that they had provided a fix to consumers and that there was no issue to resolve. However, in the same interview, Neptune consumers stated that the fix supplied by the company did not correct the issue.
      • More chatter followed on the blogs about the quality problems. This ultimately cost the company $35M in a class action lawsuit.
      • In early summer 2005, published reports state that the company is fighting an uphill battle to stay in business and that Samsung washers and dryers have now replaced Maytag in more than 20% of the retailers.
      • By late July 2005, the company is in the process of being acquired, and the share price is down to $15.79 and drops nearly $2 on news of acquisition talks.

    1. Using New Technologies to Make Sense of Content Chaos: Text mining and visualization. Glenn Fannick, Product Development Manager, 12 December 2005
    2. No longer “information overload” … we’re awash in “content chaos”.
    3. How difficult is it to find…?
        • thought leaders in an industry
        • newly hired CEOs who’ve commented on wifi
        • which of your products are written about most often
        • most mentioned people near Oracle
        • most prolific journalists in an industry
        • how much of your press coverage is negative
    4. Three Causes of Chaos
        • Blogs Mean Everyone’s a Publisher
        • ‘Markets are Conversations’
        • More dynamic news cycles
    5. Everyone’s a Publisher (Cause #1)
        • Feb: Decided to break with long-time public support for anti-discrimination legislation.
        • Apr 21: Local press coverage spurred Microsoft employee bloggers to speak out.
        • May 6: Steve Ballmer reverses Microsoft’s stance.
    6. Markets are Conversations (Cause #2)
        • Savvy consumers are not trusting of corporate marketing.
        • On the Web, people tell each other their opinions about products and companies.
        • The most reliable information comes from peers.
        • Companies must participate in the conversation or risk irrelevance.
    7. Maytag (Cause #2) [Chart: Maytag timeline; axis labels run from 1997 to 2008, with May and October marked]
    8. Shrinking News Cycle (Cause #3)
        • Newspapers continue to wane in influence
            • Radio long ago filled the role of the evening newspaper.
            • Web now fills the role of the morning newspaper.
            • Pushing newspapers into the analysis role formerly filled by the newsweeklies.
        • News is reported 24/7
            • Web editions
            • Citizen journalists
    9. Managing the Chaos
        • People need answers, not documents
        • Trends must be discovered early
        • Going beyond search
    10. People need answers, not documents
        • Articles 1-100 of about 2,343,000
        • Spend more time analyzing, less time looking
        • We must continue to push technology toward a point where it can provide us facts and answers, not headlines and links.
        [Diagram: two workflow stacks. Now: Act, Decide, Analyze, Search/Gather, Identify. Goal: Act, Decide, Analyze, Find/Discover, Identify.]
    11. Trends must be discovered early (Principle #3)
        • Identify the waves before they break on shore.
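
One way to read "identify the waves before they break" is as spike detection on daily mention counts. The sketch below is only an illustration under assumptions not in the slides: the function name, the 7-day window, and the threshold of 3 standard deviations are all hypothetical choices, and the counts are made-up sample data.

```python
from statistics import mean, stdev

def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count is unusually high
    compared with the trailing `window` days (hypothetical rule)."""
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        baseline = mean(history)
        spread = stdev(history) or 1.0  # fall back to 1.0 when history is flat
        if (daily_counts[i] - baseline) / spread >= threshold:
            spikes.append(i)
    return spikes

# Made-up daily mention counts for one topic.
counts = [4, 5, 3, 4, 6, 5, 4, 5, 19, 22]
print(find_spikes(counts))
```

A rule like this flags the first day of a surge; later days of the same surge may not be flagged because the surge itself raises the trailing baseline.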
    12. Using technology to power serendipity
        • Facts gleaned from across an entire day’s news can visually summarize an industry.
        • Extracted entities, phrases and events can direct users to the top newsmakers of the day.
        [Visualization labels: Hurricane Rita, Goldman Sachs, Florida Keys, John Roberts, Oil Prices]
    13. How To Get There
    14. How To Get There: Text Mining
        • Phase 1 | Classification / Taxonomy
            • Metadata tags what an article is about
        • Phase 2 | Entity Extraction
            • Extracting the billions of facts and entities stored in millions of documents
        • Phase 3 | Ontological Search
            • Searching for concepts
    15. text mining – n., a process of extracting information from unstructured text, drawing on practices from information retrieval, data mining, machine learning, computational linguistics and statistics.
    16. Phase 1: Document Classification [Diagram labels: Unstructured Text; Company Codes, Industries, Regions, Subjects; FII; Technology; Editorial Experts; Metadata]
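
As a rough illustration of document-level classification, the sketch below tags an article with metadata codes by matching trigger terms from a small taxonomy. It is not the system shown on the slide: the taxonomy contents, the facet names, and the `classify` function are hypothetical, and a production classifier would combine technology with editorial review rather than rely on substring matching.

```python
# Hypothetical taxonomy: facet -> code -> trigger terms (illustrative only).
TAXONOMY = {
    "industry": {
        "telecom": ["wireless", "carrier", "handset"],
        "appliances": ["washer", "dryer"],
    },
    "region": {
        "usa": ["united states", "u.s."],
        "korea": ["korea", "seoul"],
    },
}

def classify(text):
    """Return {facet: [codes]} for every code with a trigger term in the text."""
    lowered = text.lower()
    tags = {}
    for facet, codes in TAXONOMY.items():
        hits = sorted(
            code for code, terms in codes.items()
            if any(term in lowered for term in terms)
        )
        if hits:
            tags[facet] = hits
    return tags

print(classify("The wireless carrier expanded in Seoul."))
```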
    17. Phase 2: Entity Extraction [Diagram labels: Unstructured Text; Technology; Editorial Experts; document-level metadata (FII): Company Codes, Industries, Regions, Subjects; sentence-level metadata (Entities): Companies, People, Brands, Products, Relationships, Events, Authors]
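
Sentence-level entity extraction can be sketched with a simple gazetteer (a lookup list of known names). This is an assumption-laden toy, not the method behind the slide: the `GAZETTEER` entries, the entity types, and the `extract_entities` function are hypothetical, and real extractors also handle name variants and unknown names.

```python
import re

# Hypothetical gazetteer: known name -> entity type.
GAZETTEER = {
    "Steve Ballmer": "person",
    "Microsoft": "company",
    "Maytag": "company",
    "Neptune": "brand",
}

def extract_entities(text):
    """Return (sentence_index, entity_type, name) for each known name found."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    found = []
    for idx, sentence in enumerate(sentences):
        for name, kind in GAZETTEER.items():
            if re.search(r"\b" + re.escape(name) + r"\b", sentence):
                found.append((idx, kind, name))
    return found

text = "Steve Ballmer reversed Microsoft's stance. Maytag launched the Neptune washer."
for hit in extract_entities(text):
    print(hit)
```

Keeping the sentence index with each entity is what makes later relationship questions possible, such as which people appear in the same sentence as a given company.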
    18. Extracting More Value from Documents
        • Article receives company code for: T-Mobile USA
        • But there are other companies involved
        • And captures news subjects and industry
        • And people and authors
        • And brands and products
        • And quotations
        • And regions
    19. Today’s Search
    20. Ontological Search
        • Articles containing executive appointments
        • List of people and companies found in relationship to executive appointments
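
To make the idea of searching for a concept such as "executive appointment" concrete, here is a minimal sketch that returns the people and companies found in that relationship instead of a list of articles. The regular expression, the `find_appointments` function, and the sample sentences (with invented names) are all hypothetical; a real system would rely on extracted entities and relationships rather than one surface pattern.

```python
import re

# Hypothetical surface pattern for one way an appointment can be phrased.
APPOINTMENT = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+) (?:was|has been) (?:named|appointed) "
    r"(?P<role>[A-Za-z ]+?) (?:of|at) (?P<company>[A-Z][A-Za-z]+)"
)

def find_appointments(articles):
    """Return (person, role, company) tuples for each appointment matched."""
    results = []
    for article in articles:
        for m in APPOINTMENT.finditer(article):
            results.append((m.group("person"), m.group("role"), m.group("company")))
    return results

# Invented sample sentences for illustration.
articles = [
    "Jane Doe was named chief executive of Acme. Shares rose.",
    "John Roe has been appointed president at Globex.",
    "Oil prices climbed on Tuesday.",
]
for row in find_appointments(articles):
    print(row)
```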
    21. Re-Engineering Search Results: Concept Screen
        • Related companies and subjects provide: filtering, navigation and discovery.
        • Previous dates can be navigated.
        • Publications can act as filters.
        • People and phrases can be discovered.
        [Chart: monthly timeline, Dec through Oct]
        People: David Sifry [45], Robert Scoble [30], Sergey Brin [23], John Battelle [19], Mena Trott [1]
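
A "People" panel like the one on this slide can be approximated by counting, across a result set, how many articles mention each extracted person. The sketch assumes each result already carries a `people` list from an earlier extraction step; that field name, the `people_facet` function, and the sample results are hypothetical.

```python
from collections import Counter

def people_facet(results, top_n=5):
    """Return 'Name [count]' strings, most-mentioned first.
    Each person is counted once per article."""
    counts = Counter()
    for article in results:
        counts.update(set(article["people"]))
    return [f"{name} [{n}]" for name, n in counts.most_common(top_n)]

# Made-up search results with pre-extracted people.
results = [
    {"people": ["David Sifry", "Robert Scoble"]},
    {"people": ["David Sifry"]},
    {"people": ["David Sifry", "Robert Scoble", "Mena Trott"]},
]
print(people_facet(results))
```

The same counting approach would work for companies, subjects, or publications, which is what lets a results page offer them as filters.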
    22. Factiva Insight: Reputation Intelligence
    23. Questions? Glenn Fannick, Product Development Manager, +1.609.627.2602, [email_address]