Managers often want text analytics automation to be a "push button" process. Humans need to be closely involved in the process; this presentation explains why, with examples.
2. • GRADUATED US NAVAL ACADEMY
• SERVED ON USS MIDWAY DURING DESERT STORM
• STARTED DEVELOPING SOFTWARE IN 1995
• GRADUATED FROM GEORGETOWN UNIVERSITY LAW CENTER IN 1999
• YES, THIS IS A LONG STORY
• SHAREPOINT, SQL SERVER, AND BI TECHNOLOGY SPECIALIST AT MICROSOFT FROM 2003-2009
• AUTHORED FOUR BOOKS ON MICROSOFT TECHNOLOGIES
• SENIOR SHAREPOINT ARCHITECT AT AARP FROM 2012-2015
• CURRENTLY DECIDING WHAT TO DO NEXT
PHILO JANUS
BSEE, JD
SHAREPOINT ARCHITECT
3. WHY PEOPLE?
• A look at machine learning and which aspects of text analytics still require the human touch
• Lucky for us, and unlucky for the robots, successfully using text analytics at your company still requires human talent, manpower, and the resources of a strong team. Machines aren’t all-knowing, but neither are we; the key is striking the right balance.
4. EXAMPLES
• Movie reviews
• Sentiment analysis
• :-)
• :-(
• “…WHICH IS WHAT I MEANT; - (JOKINGLY, ANYWAY)”
• Email
• Spam
• Sorting
• Fraud detection
• Outliers need human review
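The fraud-detection point, that a machine can clear the routine cases and hand the unusual ones to a person, can be sketched with a robust outlier score. This uses the modified z-score built on the median absolute deviation (MAD), with the commonly cited 0.6745 constant and 3.5 cutoff; that choice of technique, the transaction fields, the sample amounts, and the function names are all assumptions for illustration, not a description of any particular fraud system.

```python
import statistics

def modified_z_scores(values):
    """Outlier score based on the median absolute deviation (MAD), which a few
    extreme values cannot drag around the way they can a mean and std dev."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread around the median, so there is nothing to score against.
        return [0.0] * len(values)
    return [0.6745 * (v - median) / mad for v in values]

def route_for_review(transactions, threshold=3.5):
    """Split transactions into an auto-cleared list and a human-review queue."""
    scores = modified_z_scores([t["amount"] for t in transactions])
    cleared, needs_review = [], []
    for txn, score in zip(transactions, scores):
        if abs(score) > threshold:
            needs_review.append(txn)   # unusual: a person decides
        else:
            cleared.append(txn)        # typical: the machine handles it
    return cleared, needs_review

# Hypothetical card transactions for one customer.
transactions = [
    {"id": "t1", "amount": 42.00},
    {"id": "t2", "amount": 38.50},
    {"id": "t3", "amount": 51.00},
    {"id": "t4", "amount": 45.25},
    {"id": "t5", "amount": 39.99},
    {"id": "t6", "amount": 2500.00},
    {"id": "t7", "amount": 47.00},
]
cleared, needs_review = route_for_review(transactions)
print([t["id"] for t in needs_review])  # ['t6']
```

Scoring a single amount is deliberately narrow; the point is the split itself: the machine clears six transactions and a human looks at one.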
5. PROBLEMS
• Parsing can be adjusted to include “stemming”
• treat house and houses as one term
• But: stately is not related to states
• Synonym lists
• treat movie and film as one term
• But: compare the difference in sentiment between “The end of the world” and “The ends of the earth”
• Stop lists
• Ignore the, in, and with
• But: what about the band “The The”?
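All three trade-offs can be reproduced with a toy preprocessing pipeline. The suffix-stripping stemmer here is deliberately naive (real stemmers such as Porter apply many more rules), the stop list is the one from the slide, and the synonym map is hypothetical; mapping "earth" to "world" is included only to show how a synonym list can erase a difference in sentiment.

```python
import re

# Stop list from the slide; synonym map is a hypothetical example.
STOP_WORDS = {"the", "in", "and", "with"}
SYNONYMS = {"film": "movie", "earth": "world"}

def naive_stem(word):
    """Strip a couple of common suffixes. Far cruder than a real stemmer."""
    for suffix in ("ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop list
    tokens = [naive_stem(t) for t in tokens]             # stemming
    return [SYNONYMS.get(t, t) for t in tokens]          # synonym list

print(preprocess("The houses"), preprocess("house"))  # ['house'] ['house']
print(preprocess("stately"), preprocess("states"))    # ['state'] ['state']
print(preprocess("The end of the world"), preprocess("The ends of the earth"))
# ['end', 'of', 'world'] ['end', 'of', 'world']
print(preprocess("The The"))                          # []
```

Each rule does what it was written to do, and each one also produces a result a human reviewer would reject: "stately" merges with "states", the two phrases collapse into one, and the band name vanishes.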
8. WHO DOES WHAT
• What machines can do, what we need to teach them, and who is best qualified to fill the gap in order to achieve business goals
• Training
• Human ➔ computer ➔ human ➔ computer
• Humans create the rules to drive computer analysis
• Computers run the analysis
• Humans review the result sets to tune the rules
• Computers process the corpus
• Humans review the results
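The human ➔ computer ➔ human ➔ computer cycle can be sketched with simple keyword rules. This is a minimal illustration under stated assumptions: the rule format (label mapped to a set of keywords), the corpus, and the function names are hypothetical, and the "human" step is simulated by editing the rules by hand.

```python
import re

def classify(doc, rules):
    """Return the first label whose keywords appear in the document, or None."""
    words = set(re.findall(r"[a-z]+", doc.lower()))
    for label, keywords in rules.items():
        if words & keywords:
            return label
    return None

def run_pass(corpus, rules):
    """Computer step: label what the rules cover and queue the rest for a human."""
    labeled, review_queue = {}, []
    for doc in corpus:
        label = classify(doc, rules)
        if label is None:
            review_queue.append(doc)
        else:
            labeled[doc] = label
    return labeled, review_queue

corpus = ["Where is my refund?", "The app crashes on login", "Invoice attached"]

# Human step: write the first rules.
rules = {"billing": {"invoice", "refund"}}
# Computer step: run the analysis.
labeled, queue = run_pass(corpus, rules)
print(queue)  # ['The app crashes on login']

# Human step: review the queue and tune the rules.
rules["technical"] = {"crash", "crashes", "login"}
# Computer step: process the corpus again.
labeled, queue = run_pass(corpus, rules)
print(labeled["The app crashes on login"], queue)  # technical []
```

Real projects would also have humans spot-check the documents the rules *did* label, since a rule can fire on the wrong document just as easily as it can miss one.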
9. SOMETIMES YOU JUST NEED A SANITY CHECK
BECAUSE OF A COMPUTER ERROR, THE CATALOGS HAD REACHED THE MEMBERS OF EZIBA'S MAILING LIST WHO SHOWED THE LOWEST LIKELIHOOD TO RESPOND TO THE CATALOG.
After Catalog Blunder, Eziba.com Suspends Business
New York Times, January 24, 2005
"Sadly, our probability estimates were correct," Mr. Sabot said.
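A cheap automated guard of the kind this slide argues for can compare the chosen mailing group against the list as a whole before anything ships. The article excerpt does not say what the actual error was; the reversed sort below is purely a hypothetical bug, and the field names, scores, and functions are illustrative assumptions.

```python
from statistics import mean

def select_recipients(customers, k):
    """Pick the k customers with the HIGHEST predicted response probability."""
    return sorted(customers, key=lambda c: c["p_response"], reverse=True)[:k]

def sanity_check(selected, everyone):
    """Refuse to proceed if the chosen group looks worse than the list as a whole."""
    chosen = mean(c["p_response"] for c in selected)
    overall = mean(c["p_response"] for c in everyone)
    if chosen < overall:
        raise ValueError(
            f"selected group averages {chosen:.3f} vs. {overall:.3f} overall; "
            "check the sort order before mailing"
        )
    return chosen, overall

# Hypothetical model scores.
customers = [
    {"id": "a", "p_response": 0.02},
    {"id": "b", "p_response": 0.31},
    {"id": "c", "p_response": 0.07},
    {"id": "d", "p_response": 0.18},
]
print(sanity_check(select_recipients(customers, 2), customers))

# A hypothetical bug: sorting ascending picks the least likely responders.
backwards = sorted(customers, key=lambda c: c["p_response"])[:2]
try:
    sanity_check(backwards, customers)
except ValueError as err:
    print("blocked:", err)
```

The check does not replace a person looking at the output, but it turns "the model was right, we just used it backwards" into an error message instead of a mailing.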
10. THE NUTS AND BOLTS
• Discuss the real science of machine learning, see how taxonomy and machine learning work hand-in-hand, and recognize how algorithms can achieve greater accuracy and success in text mining
• Tying structured data to insights mined from unstructured data
• Customer comments: scores vs. comments
• In web purchases, connect comments to purchase history and demographic data
• Housing: price and numerical data vs. text (features, comments, description)
• Financial fraud: amounts, addresses, dates, timeframes vs. items purchased
• When do you combine data?
• When designing a business intelligence solution (dashboard, scorecard, etc.)
• Use insights mined from unstructured data to better understand structured data
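The web-purchase case can be sketched as a left join: structured purchase records keyed by customer, enriched with tags pulled from free-text comments. The records, the issue vocabulary, and the function names are hypothetical, and the sketch assumes at most one comment per customer.

```python
import re
from collections import defaultdict

# Hypothetical structured purchase data and free-text comments.
purchases = [
    {"customer_id": 1, "product": "kettle", "amount": 39.99},
    {"customer_id": 2, "product": "kettle", "amount": 39.99},
    {"customer_id": 3, "product": "toaster", "amount": 24.50},
]
comments = [
    {"customer_id": 1, "text": "Arrived late and the lid is broken."},
    {"customer_id": 3, "text": "Works fine, great value."},
]
# Hypothetical issue vocabulary that a human analyst would maintain.
ISSUE_TERMS = {"broken", "late", "leaks", "refund"}

def issue_tags(text):
    return sorted(set(re.findall(r"[a-z]+", text.lower())) & ISSUE_TERMS)

def combine(purchases, comments):
    """Attach comment-derived tags to each purchase (left join on customer_id).
    Assumes one comment per customer; a later comment would overwrite an earlier one."""
    tags_by_customer = {c["customer_id"]: issue_tags(c["text"]) for c in comments}
    return [{**p, "issues": tags_by_customer.get(p["customer_id"], [])} for p in purchases]

def issues_by_product(rows):
    """Count purchases per product whose comments mention at least one issue."""
    counts = defaultdict(int)
    for row in rows:
        if row["issues"]:
            counts[row["product"]] += 1
    return dict(counts)

rows = combine(purchases, comments)
print(issues_by_product(rows))  # {'kettle': 1}
```

The resulting per-product counts are the kind of figure that could sit next to sales totals on a dashboard.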
11. TAXONOMY AND MACHINE LEARNING
• Two directions
• ML to generate a taxonomy
• Using a taxonomy to improve ML
• Be wary of homophones
• Tagging can improve results
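One way a taxonomy can help, and where ambiguous terms bite, is sketched below: a term listed under more than one branch is resolved only when another term in the same text supports one branch, and otherwise goes to a person. The taxonomy, the example sentences, and the resolution rule are illustrative assumptions, not a standard algorithm.

```python
import re

# Hypothetical taxonomy; "jaguar" and "python" each sit under two branches.
TAXONOMY = {
    "Animals": {"jaguar", "python", "zoo"},
    "Automotive": {"jaguar", "sedan", "dealer"},
    "Software": {"python", "compiler", "library"},
}

def tag(text):
    """Return (categories, terms_needing_human_review) for a piece of text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    categories, needs_review = set(), []
    for word in sorted(words):
        senses = {c for c, terms in TAXONOMY.items() if word in terms}
        if len(senses) <= 1:
            categories |= senses
            continue
        # Ambiguous term: keep only senses backed by another term in the same text.
        supported = {c for c in senses if (TAXONOMY[c] & words) - {word}}
        if len(supported) == 1:
            categories |= supported
        else:
            needs_review.append(word)
    return categories, needs_review

print(tag("The jaguar at the zoo"))             # ({'Animals'}, [])
print(tag("Test-drove a jaguar at the dealer"))  # ({'Automotive'}, [])
print(tag("I saw a python"))                     # (set(), ['python'])
```

The richer the taxonomy and the tagging around a term, the more often the context resolves it; whatever remains ambiguous is exactly the work that still needs a human.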
12. CASE STUDY – ODINTEXT AND DISNEY
• Metrics indicated high satisfaction among Hispanic visitors
• Mine text on comment cards to verify results
• Goals
• Identify specifics
• Validate comment sentiment against scores
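The case study does not describe OdinText's method, so the following is only a generic sketch of validating comment sentiment against scores: a tiny lexicon scores each comment card, and cards where a high score meets a negative comment (or a low score meets a positive one) are flagged for review. The lexicon, the 1-5 score scale, the thresholds, and the sample cards are all hypothetical.

```python
import re

# Tiny hypothetical lexicon; real projects use far larger, domain-tuned lists.
POSITIVE = {"great", "fun", "loved", "friendly", "magical"}
NEGATIVE = {"long", "crowded", "expensive", "rude", "hot"}

def comment_sentiment(text):
    """Positive-minus-negative word count for one comment."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def disagreements(cards, high_score=4, low_score=2):
    """IDs of cards whose numeric score and comment text point in opposite directions."""
    flagged = []
    for card in cards:
        sentiment = comment_sentiment(card["comment"])
        if (card["score"] >= high_score and sentiment < 0) or (
            card["score"] <= low_score and sentiment > 0
        ):
            flagged.append(card["id"])
    return flagged

# Hypothetical comment cards, scored 1-5.
cards = [
    {"id": 1, "score": 5, "comment": "Loved it, the staff were friendly"},
    {"id": 2, "score": 5, "comment": "Lines were long and the food was expensive"},
    {"id": 3, "score": 2, "comment": "Too hot and crowded"},
]
print(disagreements(cards))  # [2]
```

A flagged card is not necessarily an error: a visitor can rate the day highly and still complain about lines, which is the kind of specific the case study set out to identify.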
13. FRAMEWORK / HARNESS
• Where does all this stuff go?
• Unstructured content storage
• Structured and semi-structured content
• User assignment
• Analytics hosting
• Documentation
• Output display
17. DECISION MAKING
• The processes of picking the right software, deciding who should be involved in a project, selecting metrics for each stage of analysis, and deciding who will oversee them
• Don’t try to solve every problem with one package
• Rely on trusted advisors
• BUT – be wary of bias
• (If someone tells you their favorite package can do everything, be very skeptical)
18. THE FUTURE. THE MATRIX?
When, if ever, can we expect to need fewer humans, or none at all, in text analytics, and will machines ever fully automate the process? What does that mean for your strategy and your company’s business goals?