THE HUMAN FACTOR
• GRADUATED US NAVAL
ACADEMY
• SERVED ON USS MIDWAY
DURING DESERT STORM
• STARTED DEVELOPING
SOFTWARE IN 1995
• GRADUATED FROM
GEORGETOWN UNIVERSITY LAW
CENTER IN 1999
• YES, THIS IS A LONG STORY
• SHAREPOINT, SQL SERVER, AND
BI TECHNOLOGY SPECIALIST AT
MICROSOFT FROM 2003 – 2009
• AUTHORED FOUR BOOKS ON
MICROSOFT TECHNOLOGIES
• SENIOR SHAREPOINT
ARCHITECT AT AARP FROM
2012-2015
• CURRENTLY DECIDING WHAT
TO DO NEXT
PHILO JANUS
BSEE, JD
SHAREPOINT ARCHITECT
WHY PEOPLE?
• A look at machine learning and which
aspects of text analytics still require
the human touch
• Lucky for us, and unlucky for the
robots, successfully utilizing text
analytics at your company still
requires human talent, manpower and
the resources of a strong team.
Machines aren’t all-knowing, but
neither are we – the key is striking the
right balance.
AARP the company AARP to mean
EXAMPLES
• Movie reviews
• Sentiment analysis
• :-)
• :-(
• “…WHICH IS WHAT I MEANT; - (JOKINGLY, ANYWAY)”
• Email
• Spam
• Sorting
• Fraud detection
• Need human interaction for outliers
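The emoticon examples above can be sketched in a few lines of Python. This is a minimal illustration, not a production sentiment model: the lexicon, the regular expression, and the `classify` function are all assumptions made for this example. Anything the rules cannot decide cleanly is routed to a person.

```python
import re

# Hypothetical emoticon lexicon; a real system would use a far larger one.
POSITIVE = {":-)", ":)"}
NEGATIVE = {":-(", ":("}

# Matches eyes, an optional nose, and a mouth, e.g. ":-)" or ";-(".
EMOTICON = re.compile(r"[:;]-?[()]")

def classify(text):
    """Return 'positive', 'negative', or 'needs_review'."""
    found = EMOTICON.findall(text)
    pos = sum(e in POSITIVE for e in found)
    neg = sum(e in NEGATIVE for e in found)
    unknown = len(found) - pos - neg
    # No emoticon, an unrecognized one, or mixed signals: ask a human.
    if not found or unknown or (pos and neg):
        return "needs_review"
    return "positive" if pos else "negative"
```

A remark such as "which is what I meant; - (jokingly, anyway)" contains punctuation that resembles a frowning emoticon. This sketch declines to score it and flags it for review instead.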
PROBLEMS
• Parsing can be adjusted to include
“stemming”
• treat house and houses as one term
• But: stately is not related to states
• Synonym lists
• treat movie and film as one term
• But: compare the difference in sentiment
between “The end of the world” and “The
ends of the earth”
• Stop lists
• Ignore the, in, and with
• But: what about the band “The The”?
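A small Python sketch shows how the stemming and stop-list pitfalls arise and how a human-maintained exception list can patch them. The stemmer here is deliberately crude and is not any standard algorithm. The exception sets and function names are hypothetical.

```python
STOP_WORDS = {"the", "in", "with"}

# Hypothetical exception lists that a human reviewer would maintain.
STEM_EXCEPTIONS = frozenset({"stately"})
PROTECTED_PHRASES = frozenset({"the the"})

def naive_stem(word, exceptions=frozenset()):
    """Strip a trailing 'ly' or 's' unless the word is a listed exception."""
    if word in exceptions:
        return word
    for suffix in ("ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def remove_stop_words(query, protected=frozenset()):
    """Drop stop words unless the whole query is a protected phrase."""
    text = query.lower()
    if text in protected:
        return [text]  # keep the phrase as a single term
    return [w for w in text.split() if w not in STOP_WORDS]
```

Without exceptions, `naive_stem` maps both "stately" and "states" to "state", and `remove_stop_words("The The")` returns an empty list. Passing the exception sets restores both terms.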
REAL LIFE EXAMPLE
• Search on “band: the the” and get
SOMETIMES ENGLISH IS JUST WEIRD
WHO DOES WHAT
• What machines can do, what we need to teach them, and
who is best qualified to fill the gap in order to achieve
business goals
• Training
• Human ➔ computer ➔ human ➔ computer
• Humans create the rules to drive
computer analysis
• Computers run the analysis
• Humans review the result sets to tune
the rules
• Computers process the corpus
• Humans review the results
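The human, computer, human, computer cycle can be sketched as a loop over a rule set. This is an illustrative toy under stated assumptions: the keyword rules, the sample documents, and `run_analysis` are invented for the example, and real rule engines are far richer.

```python
def run_analysis(docs, rules):
    """Computer step: label each document with the first matching rule."""
    labeled, unmatched = {}, []
    for doc in docs:
        text = doc.lower()
        label = next((lab for kw, lab in rules.items() if kw in text), None)
        if label is None:
            unmatched.append(doc)  # left for a human to review
        else:
            labeled[doc] = label
    return labeled, unmatched

docs = ["Great film", "Awful plot", "Not bad at all"]

# Human step 1: write the initial rules.
rules = {"great": "positive", "awful": "negative"}

# Computer step 1: run the analysis.
first_labeled, first_unmatched = run_analysis(docs, rules)

# Human step 2: review the unmatched documents and tune the rules.
rules = {"not bad": "positive", **rules}

# Computer step 2: process the corpus again with the tuned rules.
second_labeled, second_unmatched = run_analysis(docs, rules)
```

After the first pass, "Not bad at all" is unmatched. After the reviewer adds a rule, the second pass labels every document.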
SOMETIMES YOU JUST NEED A SANITY
CHECK
BECAUSE OF A COMPUTER ERROR, THE
CATALOGS HAD REACHED THE MEMBERS OF
EZIBA'S MAILING LIST WHO SHOWED THE LOWEST
LIKELIHOOD TO RESPOND TO THE CATALOG.
After Catalog Blunder, Eziba.com Suspends Business
New York Times, January 24, 2005
"Sadly, our probability estimates were correct," Mr.
Sabot said.
THE NUTS AND BOLTS
• Discuss the real science of machine learning, see how taxonomy and
machine learning work hand-in-hand, and recognize how tools like
algorithms can achieve greater accuracy and success in text mining
• Tying structured data to insights from mining unstructured data for greater
insight
• Customer comments – scores vs. comments
• In web purchases, connect comments to purchase history and demographic data
• Housing – price & numerical data vs. text – features, comments, description
• Financial fraud – amounts, addresses, dates, timeframes vs. items purchased
• When do you combine data?
• When designing a business intelligence solution (dashboard, scorecard, etc.)
• Mine unstructured data to better understand structured data
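Connecting comments to purchase history amounts to a join on a shared key. The sketch below uses plain Python dictionaries. The records, field names, and `join_comments_to_purchases` are hypothetical and exist only to show the shape of the join.

```python
from collections import defaultdict

# Hypothetical records; field names are illustrative only.
purchases = [
    {"customer_id": 1, "item": "tent", "amount": 120.0},
    {"customer_id": 1, "item": "stove", "amount": 45.0},
    {"customer_id": 2, "item": "boots", "amount": 80.0},
]
comments = [
    {"customer_id": 1, "text": "Tent leaked on the first night"},
    {"customer_id": 3, "text": "Fast shipping"},
]

def join_comments_to_purchases(comments, purchases):
    """Attach each commenter's purchase history to their comment."""
    by_customer = defaultdict(list)
    for p in purchases:
        by_customer[p["customer_id"]].append(p)
    joined = []
    for c in comments:
        history = by_customer.get(c["customer_id"], [])
        joined.append({
            **c,
            "items": [p["item"] for p in history],
            "total_spent": sum(p["amount"] for p in history),
        })
    return joined

joined = join_comments_to_purchases(comments, purchases)
```

Comments from customers with no recorded purchases are kept with an empty history, so a reviewer can decide how to treat them.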
TAXONOMY AND MACHINE
LEARNING
• Two directions
• ML to generate a taxonomy
• Using a taxonomy to improve ML
• Be wary of homophones
• Tagging can improve results
CASE STUDY – ODINTEXT AND
DISNEY
• Metrics indicated high satisfaction from Hispanic visitors
• Mine text on comment cards to verify results
• Goals
• Identify specifics
• Validate comment sentiment against scores
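Validating comment sentiment against scores can be sketched as a simple cross-check. The comment cards, word lists, and thresholds below are made up for illustration. They are not data or methods from the case study.

```python
# Hypothetical word lists; a real analysis would use a trained model.
POSITIVE_WORDS = {"great", "friendly", "loved"}
NEGATIVE_WORDS = {"long", "rude", "expensive"}

def comment_sentiment(text):
    """Count positive minus negative words (each word counted once)."""
    words = set(text.lower().split())
    return len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)

def find_mismatches(cards, high=4, low=2):
    """Return cards whose text sentiment contradicts the numeric score."""
    flagged = []
    for card in cards:
        s = comment_sentiment(card["comment"])
        if (card["score"] >= high and s < 0) or (card["score"] <= low and s > 0):
            flagged.append(card)
    return flagged

# Invented comment cards, scored 1 (worst) to 5 (best).
cards = [
    {"score": 5, "comment": "Loved the parade and friendly staff"},
    {"score": 5, "comment": "Lines were long and food expensive"},
    {"score": 1, "comment": "Rude staff"},
]
mismatches = find_mismatches(cards)
```

Only the second card is flagged: a top score paired with negative wording. Cards like this are the ones a person should read.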
FRAMEWORK / HARNESS
• Where does all this stuff go?
• Unstructured content storage
• Structured and semi-structured content
• User assignment
• Analytics hosting
• Documentation
• Output display
DASHBOARD
SEARCH
USER MANAGEMENT
DECISION MAKING
• The processes of picking the right software,
deciding who should be involved on a
project, selecting metrics for each stage of
analysis and who will oversee them
• Don’t try to solve every problem with one
package
• Rely on trusted advisors
• BUT – be wary of bias
• (If someone tells you their favorite package
can do everything, be very skeptical)
THE FUTURE. THE MATRIX?
When, if ever, can we expect less or no need for humans in
text analytics, and will machines ever fully automate the
process? What does that mean for your strategy and your
company’s business goals?
Text mining: why people need to be part of the process
