Master DMKM Presentation




    Entity Aspect Analysis
        By: Ahmed Kamel
Supervision: Ingmar Weber, Yahoo! Labs Barcelona
             Marta Arias, Universitat Politècnica de Catalunya
  Location: Yahoo! Labs Barcelona
The Web
Opinion Summarization
Entity             Freq     +Freq    -Freq   +Score    -Score     Score
Lionel_Messi       378,076  283,450  94,626  89,386.5  -29,449.3  59,937.2
Cristiano_Ronaldo  312,338  228,480  83,858  72,342.7  -27,883.2  44,459.5

Entity  EFreq       Aspect   EAFreq  +EAFreq  -EAFreq  +Score  -Score  Score
France  11,697,238  economy  2,633   1,452    1,181    469.2   -390.6  78.5
Spain   6,602,450   economy  1,561   620      941      211.7   -312.2  -100.3
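
The columns above appear to relate as Freq = +Freq + -Freq and Score = +Score + -Score (for Lionel_Messi, 283,450 + 94,626 = 378,076 and 89,386.5 - 29,449.3 = 59,937.2; the aspect rows agree up to rounding). A minimal Python sketch of such an aggregation follows. The function name `summarize`, the dictionary keys, the sample mentions, and the choice to skip neutral mentions are all assumptions made for illustration, not the pipeline's actual code.

```python
from collections import defaultdict


def summarize(mentions):
    """Aggregate (entity, sentiment score) pairs into per-entity summary rows."""
    rows = defaultdict(lambda: {"freq": 0, "pos_freq": 0, "neg_freq": 0,
                                "pos_score": 0.0, "neg_score": 0.0})
    for entity, score in mentions:
        if score == 0:
            continue  # assumption: neutral mentions are not counted
        row = rows[entity]
        row["freq"] += 1
        if score > 0:
            row["pos_freq"] += 1
            row["pos_score"] += score
        else:
            row["neg_freq"] += 1
            row["neg_score"] += score
    for row in rows.values():
        # Overall score is the sum of the positive and negative parts.
        row["score"] = row["pos_score"] + row["neg_score"]
    return dict(rows)


# Hypothetical mentions, for illustration only (not data from the slides).
mentions = [("A", 1.0), ("A", -0.5), ("A", 0.25), ("B", -1.0)]
summary = summarize(mentions)
```

Because every field is a running sum or count, rows like these can be updated incrementally as new pages arrive.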
Architecture
Text Extraction

Boilerpipe (text extraction from HTML), then Stanford CoreNLP (sentence splitting)

Barack Hussein Obama was sworn in as the 44th president of the United States on Jan. 20, 2009.

The son of a black man from Kenya and a white woman from Kansas, he is the first African-American to ascend
to the highest office in the land.

He defeated Hillary Rodham Clinton in a lengthy and bitter primary battle before defeating Senator John McCain,
the Arizona Republican, in November 2008.
…
Entity Recognition

Barack Hussein Obama was sworn in as the 44th president of the United States on Jan. 20, 2009.
…

Entity Recognition (Wikification)

 Barack Hussein Obama was sworn in as the 44th president of the United States on Jan. 20, 2009.
 Barack_Obama||0.9727||Barack||0.9868||Barack Hussein Obama||0.9907
 President_of_the_United_States||0.9707||president of the United States||0.9918
 …
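
The `||`-delimited lines above can be read back into a structure with a few lines of Python. This is only a sketch: it assumes the first two fields are the Wikipedia entity and its confidence, and that the remaining fields are (surface form, score) pairs. The slides do not spell out the field meanings, and `parse_entity_line` is a hypothetical helper name.

```python
def parse_entity_line(line):
    """Split one wikifier output line into an entity and its surface forms.

    Assumed layout: entity||score||form||score||form||score...
    """
    fields = line.strip().split("||")
    entity, entity_score = fields[0], float(fields[1])
    rest = fields[2:]
    # Walk the remaining fields two at a time: (surface form, score).
    surface_forms = [(rest[i], float(rest[i + 1]))
                     for i in range(0, len(rest) - 1, 2)]
    return {"entity": entity, "score": entity_score,
            "surface_forms": surface_forms}


parsed = parse_entity_line(
    "Barack_Obama||0.9727||Barack||0.9868||Barack Hussein Obama||0.9907"
)
```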
Aspect Extraction

Barack Hussein Obama was sworn in as the 44th president of the United States on Jan. 20, 2009.
Barack_Obama||0.9727||Barack||0.9868||Barack Hussein Obama||0.9907
President_of_the_United_States||0.9707||president of the United States||0.9918
…

PoS tagging aspect extraction

Barack/NNP Hussein/NNP Obama/NNP was/VBD sworn/VBN in/IN as/IN the/DT 44th/JJ
president/NN of/IN the/DT United/NNP States/NNPS on/IN Jan./NNP 20/CD ,/, 2009/CD ./.
…

Barack_Obama            President_of_the_United_States
Barack Hussein Obama    Barack Hussein Obama
president               president
United States           United States
Jan                     Jan
44th president          44th president
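
The aspect list above can be reproduced from the tagged sentence with a small pattern matcher. The sketch below assumes two patterns, maximal runs of noun tags (`NN*`) and such a run preceded by adjectives (`JJ*`), and strips trailing periods from tokens such as `Jan.`. The function names are hypothetical and the real pipeline may differ.

```python
def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def extract_aspects(tagged):
    """Collect noun runs, plus adjective+noun runs, as candidate aspects."""
    aspects = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1].startswith("NN"):
            # Extend to the end of the maximal run of noun tags.
            j = i
            while j < n and tagged[j][1].startswith("NN"):
                j += 1
            nouns = " ".join(word.rstrip(".") for word, _ in tagged[i:j])
            aspects.append(nouns)
            # Look back for adjectives directly preceding the noun run.
            k = i
            while k > 0 and tagged[k - 1][1].startswith("JJ"):
                k -= 1
            if k < i:
                adjectives = " ".join(word for word, _ in tagged[k:i])
                aspects.append(adjectives + " " + nouns)
            i = j
        else:
            i += 1
    return aspects


sentence = ("Barack/NNP Hussein/NNP Obama/NNP was/VBD sworn/VBN in/IN as/IN "
            "the/DT 44th/JJ president/NN of/IN the/DT United/NNP States/NNPS "
            "on/IN Jan./NNP 20/CD ,/, 2009/CD ./.")
aspects = extract_aspects(parse_tagged(sentence))
```

In the slide, every extracted aspect is then listed under each entity recognized in the same sentence.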
Sentiment Analysis

The iPhone is in general very good, however, its battery life is very bad
…

SentiStrength
distance=3 ("very good"), distance=10 ("very bad")

The iPhone is in general very good[2][+1 booster word], however, its battery life is very bad[-2][-1 booster word]
[sentence: 3,-3] [result: max + and - of any sentence]
very good||3||3
very bad||-3||10
Score = 3/3 + (-3)/10 = 0.7
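
The score on this slide divides each sentiment phrase's strength by its distance and sums the results. A minimal Python sketch follows, assuming each `phrase||strength||distance` line carries the SentiStrength strength and a distance between the phrase and the entity mention. How the distance is measured is not stated on the slide, and the function names are hypothetical.

```python
def parse_phrase_line(line):
    """Split 'phrase||strength||distance' into typed fields."""
    phrase, strength, distance = line.split("||")
    return phrase, int(strength), int(distance)


def entity_score(phrase_lines):
    """Sum strength/distance over all sentiment phrases near the entity."""
    return sum(strength / distance
               for _, strength, distance in map(parse_phrase_line, phrase_lines))


score = entity_score(["very good||3||3", "very bad||-3||10"])
```

With this weighting, the nearby "very good" contributes 1.0 and the more distant "very bad" only -0.3, so the iPhone ends up with a mildly positive score.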
Our work is
• Applying the previous steps to
  – Over 2 billion English pages
  – Wikipedia entities (over 3.5 million entities)
• Mostly using
  – Hadoop
  – Pig
Experiments
• Lack of ground truth
• Correlations to real-world factors
• Three experiments
  – Countries
  – Countries’ economy
  – Grammy award winners
Countries
Top 10 positively mentioned
• Travel
Figure: Costa Rica positive aspects

Top 10 negatively mentioned
• Axis of Evil
• BBC Poll
Figures: Iran negative aspects; Israel negative aspects
Countries’ economy
• Correlation between sentiment scores and
  countries’ nominal GDP
• Normalized scores vs. non-normalized scores
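
One way to run such a comparison is a rank correlation between the sentiment scores and GDP, computed once on raw scores and once on normalized ones. The sketch below assumes Spearman's rank correlation and assumes that "normalized" means score divided by mention frequency. Neither choice is confirmed by the slides, and the helper names are hypothetical.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def normalize(score, frequency):
    """Assumed normalization: sentiment score per mention."""
    return score / frequency


# Economy rows from the Opinion Summarization slide (Score, EAFreq).
france = normalize(78.5, 2633)
spain = normalize(-100.3, 1561)
```

Passing two parallel lists, one of per-country sentiment scores and one of nominal GDP figures, to `spearman` gives a value between -1 and 1.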
Grammy Award Winners

Figure panels: Correlations with Grammy; Inequality of scores
Conclusion
• Analysis
   – Methodology for correlating sentiments with other real-world factors
   – Experiments
• Pipeline
   – Big data
   – Can run as an online, in-production system
• Future work
   – Restricting the analysis to a subset of the Web, e.g., blogs
   – Sentiment scoring scheme (taking the volume problem
     into account)
Thanks
     Merci
Gràcies – Gracias
     Danke
  Teşekkürler

Editor's Notes

  1. Explosive growth
     • User Generated Content (UGC)
     • Question-answering databases
     • Digital video
     • Blogging
     • Social networks
     • Wikis
     • Self-expression and opinionated content
     • Web of Concepts, or entities
     • Google, 2008: over one trillion unique URLs
     • Indexed web: at least 8.47 billion pages
  2. Opinion summaries allow for discovering all kinds of fun facts:
     • Messi vs. Ronaldo
     • France’s economy vs. Spain’s economy
     They also allow for something more interesting: further studies of the relationship between sentiments as discovered on the Web and other real-world factors.
  3. We build a system that
     • Is a simple yet effective approach capable of handling sentiments from all over the Web
     • Generates opinion summaries for entities
     • Generates opinion summaries for entities’ aspects
     The system we are building here “allows for interesting types of analysis”.
  4. The Web is mostly in HTML, so we need to be able to get the text out of it.
     • Boilerpipe is a machine-learned classifier that uses shallow text features (word counts) to extract text from HTML.
     • Stanford CoreNLP allows for sentence splitting on common sentence ends such as full stops, question marks, and exclamation marks.
  5. An in-house proprietary tool uses machine learning to learn a model that is able to infer the topics of a given text. Wikipedia entities allow for rich information about entities.
  6. An aspect is a predefined sequence of PoS tags. We use two main patterns: nouns, and adjectives followed by nouns.
  7. Ranking countries by sentiments
     • Most frequent sentimental aspects
     • Normalized vs. non-normalized scores
     • RANKING
  8. RANKINGS AND CORRELATIONS FOR RANKINGS
  9. Are sentiments associated with Grammy Award winners different from those associated with other musicians? Statistical tests:
     • Correlations with Grammy
     • Inequality of scores
     • Positive score to predict a Grammy winner; Receiver Operating Characteristic (ROC) not shown
  10. Analysis
     • Experiments
       – Countries: really different, in the sense that we picked up a good signal whether we normalize or not.
       – GDP: we unfortunately didn’t get the expected results; frequency tended to outweigh the sentiments. Maybe GDP is not the right criterion to compare against; maybe the unemployment rate would be better, or maybe the volume problem is just inherently there.
       – Grammy: it worked, though without a strong correlation, when restricting frequencies and normalizing.
     • Sentiments vs. volume
     • Big Data
       – If something can go wrong, it will definitely go wrong.
       – We had to choose simple, effective approaches that can scale easily.
     • Online in-production system
       – I imagine it running in parallel with the web crawlers, doing its analysis and updating the summaries.
       – The methods chosen also allow for continuous updates; generating the summaries doesn’t require the whole set of web pages to be present at once.
     • INTERNSHIP STILL GOING ON