The growth of social media has populated the Web with valuable user-generated content that can be exploited for many different and interesting purposes, such as explaining or predicting real-world outcomes through opinion mining. In this context, natural language processing techniques are a key technology for analysing user-generated content. Such content is characterised by casual language, short texts, misspellings, and set phrases, among other characteristics that challenge content analysis. This paper shows the differences in the language used across heterogeneous social media sources by analysing the distribution of part-of-speech categories obtained from a morphological analysis of a sample of texts published in those sources. In addition, we evaluate the performance of three natural language processing techniques (language identification, sentiment analysis, and topic identification), showing the differences in accuracy when these techniques are applied to different types of user-generated content.
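The comparison of sources by part-of-speech distribution can be sketched as follows. This is a minimal illustration, not the paper's pipeline: it assumes the texts have already been tagged (here with Universal POS tags), the `SAMPLES` corpus is a hypothetical toy, and total variation distance is our own choice of measure for contrasting two distributions.

```python
from collections import Counter

# Hypothetical, hand-tagged toy samples (Universal POS tags). A real study would
# obtain the tags from a morphological analyser or POS tagger.
SAMPLES = {
    "twitter": [
        [("omg", "INTJ"), ("luv", "VERB"), ("this", "DET"), ("phone", "NOUN")],
        [("gr8", "ADJ"), ("day", "NOUN"), ("!!!", "PUNCT")],
    ],
    "reviews": [
        [("The", "DET"), ("battery", "NOUN"), ("lasts", "VERB"),
         ("two", "NUM"), ("days", "NOUN"), (".", "PUNCT")],
    ],
}


def pos_distribution(tagged_texts):
    """Relative frequency of each POS tag across a list of tagged texts."""
    counts = Counter(tag for text in tagged_texts for _, tag in text)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two POS distributions (0 = identical, 1 = disjoint)."""
    tags = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tags)


if __name__ == "__main__":
    dists = {src: pos_distribution(texts) for src, texts in SAMPLES.items()}
    for src, dist in dists.items():
        print(src, {t: round(f, 2) for t, f in sorted(dist.items())})
    print("TV distance:", round(total_variation(dists["twitter"], dists["reviews"]), 3))
```

On these toy samples the interjection and adjective tags appear only in the tweets and the numeral only in the review, which is what drives the distance up; with real corpora the same comparison would be run over far larger samples per source.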
The document describes the differences between qualitative and quantitative research. Qualitative research focuses on understanding phenomena holistically and in detail through methods such as observation, while quantitative research uses methods such as surveys and representative samples to measure and analyze numerical data objectively. It also presents different techniques and instruments for each approach.
This document presents statistics on education, health, and gender in Peru for the fourth quarter of 2015. Attendance rates in early childhood education were equal for girls and boys (74%), although girls had higher rates in rural areas. Secondary school attendance rates were higher for adolescent girls than for boys. Women also reported more chronic health problems than men, especially older women. The document provides additional details
The Next Web 2009 Highlights - Ruigrok | NetPanel (Marja Ruigrok)
This document summarizes the key findings of a 2009 report on online behaviors in the Netherlands. It identifies different types of online users and finds that 70% of Dutch people have profiles on online networks. Half of Dutch people have access to mobile internet, though only 40% use it. The document also examines online identity, usability issues, and e-commerce habits. It concludes by noting the full report will provide more detailed analyses and be available later in April 2009.
Neil Walker is the Group CTO for Getupdated Internet Marketing and CTO for Domain Invest (US). He has over 10 years of online marketing experience, including SEO work for over 2,500 UK clients and 5,000 pan-European clients. The document provides tips for analyzing competitors, including using tools to track backlinks, keyword rankings, social mentions, and competitor changes over time to inform strategic decisions.
- The author received feedback from a variety of people on websites, social media, and other media about their LigatureSymbols font project. This included bug reports, feature requests, and praise.
- Responding to feedback quickly allowed the author to fix issues, get more suggestions, and release improved versions. It also led to the font being shared more widely on sites like Facebook and Twitter.
- The author aims to get more feedback to continue creating an even better font project going forward.
1) Ohio University librarians conducted two surveys of students to investigate their technology use patterns and perceptions of emerging library technologies.
2) The surveys found that students spend 11 or more hours online per week on average and nearly all own a computer and cell phone.
3) Younger students tended to use more social media and communication technologies while older students were more likely to use podcasts, wikis, and Flickr.
The document discusses online communications strategies for colleges and universities. It provides additional resources on networking and social media best practices. It also outlines some of the major changes in communications over the past 20 years, including more voices, opinions, and channels to manage. Today, institutions have lost control of their message as individuals can communicate rapidly over electronic and social media channels. It emphasizes the importance of an integrated online communications strategy that considers how everything is now connected.
This document summarizes digital media usage across Asia based on research conducted by Michael Netzley. It begins by noting the diversity within Asia and issues with viewing it through a Western lens. It then provides statistics on internet penetration rates in various Asian countries, showing China and South Korea as leaders. National social networks, search engines, and communication tools are also described as varying by country. Survey results from Singapore are presented showing differences in online behaviors by age. Reasons for going online and issues like internet blocking are also briefly discussed.
The document outlines a social media strategy for Smokey Bear, a nonprofit focused on wildfire prevention. It begins by discussing common challenges nonprofits face with social media and the need for a clear strategy. It then provides templates and considerations for key elements of a strategy, including goals and objectives, audience segmentation, competitive analysis, content planning, tactics by channel, tools, and metrics. The overall purpose is to provide guidance to nonprofits on developing an effective yet organized social media presence.
Event:
Digital Curation Institute Symposium
November 22, 2011
4:30-6:30pm
iSchool, University of Toronto
Abstract:
This presentation reports select findings from two descriptive studies of blogs and bloggers in the areas of history, economics, law, biology, chemistry and physics. The first study focused on scholar bloggers’ preferences for digital preservation, as well as their publishing behaviors and blog characteristics that influence preservation action. Findings are drawn from 153 questionnaires, 24 interviews, and content analysis of 93 blogs. Briefly, questionnaire respondents are generally interested in blog preservation with a strong sense of personal responsibility. Most feel their blogs should be preserved for both personal and public access and use into the indefinite, rather than short-term, future. Over half of questionnaire respondents report saving their blog content, in whole or in part, and many interviewees expressed a sophisticated understanding of issues of digital preservation. However, the findings also indicate that bloggers exhibit behaviors and preferences complicating preservation action, including issues related to rights and use, co-producer dependencies, and content integrity.
The second study, currently ongoing, looks at the public availability of scholar blogs over time, with findings drawn from a sample of 644 blogs. Content analysis is underway on inactive blogs, defined as blogs that are still available but have had no new posts published within three months of coding. Initial analysis of the most recent post published to these inactive blogs shows that some bloggers did signal their blog’s declining activity or, in some cases, its stoppage. However, such indicators are present in only a clear minority of publicly available yet inactive blogs. These preliminary findings have implications for both personal and programmatic preservation approaches, notably for selection and appraisal.
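The inactivity criterion used in the second study can be expressed as a small predicate. This sketch assumes that "three months" means three calendar months before the coding date and that a post dated exactly on that cutoff still counts as recent; both are our interpretations, and `months_before` and `is_inactive` are hypothetical helper names.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Date `months` calendar months before d, clamping the day to the target month's length."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) - months, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_inactive(available: bool, last_post: date, coded_on: date) -> bool:
    """Inactive = still publicly available, but no post within three months of the coding date."""
    return available and last_post < months_before(coded_on, 3)


# Hypothetical example: a blog last updated in June 2011, coded on the symposium date.
print(is_inactive(True, date(2011, 6, 1), date(2011, 11, 22)))
```

Blogs that are no longer available are excluded by the first condition rather than counted as inactive, matching the "available, but with no new posts" wording.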
Lee Rainie, Director of the Pew Research Center’s Internet & American Life Project, will share findings from a new report on e-book lending at libraries. He will also discuss other research about the rise of e-books, their impact on people’s reading habits, and the way that library patrons are hoping to avail themselves of e-book borrowing. Finally, he will explore general reading trends and describe the next steps in the Project’s ongoing research about the evolving role of libraries.
Successful social CRM needs a solid basis. Few companies have succeeded in connecting their consumers' social media activities with the rest of their consumer information. Bisnode Inetract's social value solutions show a truly consumer-centric approach to social CRM.
This document provides insights from Experian Simmons' 2010 Social Networking Report. It finds that 66% of online Americans now use social networking sites, up significantly from 20% in 2007. Users are increasingly addicted to these sites, with 43% visiting multiple times per day. Additionally, 70% of social networkers now keep in touch with family on these sites. The report provides details on user demographics, behaviors, and brand preferences across major social networking sites.
The document summarizes key findings from recent Pew Internet Project reports about changing digital media behaviors and their impact. It discusses how:
1) Only 4% of Facebook users actually derive pleasure from using the site, with most feeling despair upon logging in.
2) Mobile internet access is widespread, with 89% of adults owning phones and 46% owning smartphones.
3) Social media engagement is common, with 59% of adults using sites like Facebook and 16% using Twitter.
4) These trends are changing how knowledge is accessed, shared and influenced as information becomes more pervasive, participatory and networked through various online platforms.
Seattle Interactive Conference - Social and Search (Microsoft)
The document discusses research into how people find and use opinions from various sources when making different types of decisions. It finds that people rely on a diverse set of sources that varies depending on the type of decision. For everyday purchases, people most often consider reviews on retailer sites and online review sites, while for important decisions like choosing a doctor they place more weight on opinions from people they know personally. The factors that influence which opinions people find most helpful can differ depending on whether the decision is related to day-to-day tasks, commercial services, or entertainment.
Increasing Social Media ROI Using Gladwell's Tipping Point Framework (Colleen Carrington)
Inspired by some of the brightest thought-leaders in social media, this deck explores how to increase social media ROI using Gladwell's tipping point framework: the right people, a sticky idea, the right context. It is designed for on-line viewing without having to be presented in person. Enjoy!
Social media gets a lot of buzz these days. And as the number of social platforms and tools continues to grow, so does the perplexity.
Real results from social media require leadership commitment, investment of resources, and an integrated communications strategy. In this fast-paced workshop, you’ll learn from actual case studies of nonprofits that have successfully harnessed the power of social media, and how to measure your organization's return on investment (ROI).
Key Takeaways:
- Trends and opportunities in online behavior
- How nonprofits can leverage mobile and geolocation for cause awareness
- Best practices in social sharing and Search Engine Optimization (SEO)
- Key metrics, tools, and tips to measure and track your social media success
- How to develop a strategic framework for listening, planning, and implementing a social media strategy
ABOUT THE PRESENTER:
Rosie Branstetter, Principal, fiveseed
Rosie has led innovative strategic communications initiatives across numerous industries for more than 10 years. In 2009, she founded fiveseed, a strategic communications agency with global reach built on the philosophy of creating positive change for clients and our community.
Her background includes tenure as a consultant with an advertising agency specializing in higher education, where she was responsible for account management, marketing strategy, brand development and positioning, and market research for top institutions across the U.S.
Today Rosie develops and manages integrated marketing campaigns for forward-thinking companies, nonprofits, and government agencies. And as a recognized expert in strategic marketing, she facilitates workshops and is a frequent guest speaker on topics of branding, social media, and international marketing.
Rosie serves as a board member with the Colorado Chapter of the American Marketing Association, Rotary Club of Five Points Cultural District, and the Denver Young Non-profit Professionals Network; and is actively involved in the Business Marketing Association, Frontier Asset Building, T4T Colorado, and Denver Public Schools (Goodwill Industries).
1. The evolution of media and how people communicate is changing how brands interact with consumers. Consumers are more informed before purchasing and are comfortable sharing opinions online, both positive and negative.
2. Top Italian brands were analyzed based on their social media presence and online conversations. While having social profiles doesn't necessarily impact a brand, the topics discussed can make a difference. Video sharing is popular for sharing ads and runway shows.
3. To benefit from social media, brands need to listen to online conversations, understand their audiences, and produce engaging content at a steady pace. The goal is to build trust so marketing messages are better received rather than seen as annoying advertisements.
Lee Rainie will describe the latest findings of the Pew Internet Project about libraries and the new mix of services they are offering their patrons – and considering offering.
Converseon is a leading provider of social media monitoring and conversation mining services. They presented on measuring the return on investment of social media campaigns. They discussed who participates in online discussions, approaches to social media listening, how listening can inform objectives at different stages of a campaign, and use cases for social media listening across marketing, customer service, and other business functions.
The Social Web. Why Brands Must Listen, Measure and Act v2.0 (Visible Technologies)
The document discusses the rise of social media and user-generated content. It provides statistics showing rapid growth in internet and social media usage. Key points include:
- Billions of people are still not online as internet penetration increases
- The amount of content on YouTube now exceeds what was on the entire web in 2000
- Blogging and social media usage have grown tremendously in the past decade
- Consumers increasingly get news and information through social media, which also influences their purchase decisions
- Both mainstream media and brands are recognizing the importance of participating in social conversations
The Changing World of Libraries: Lee Rainie, Director of the Pew Research Center’s Internet & American Life Project, will discuss the Project’s latest research about how people use technology and how people use libraries. He will discuss the implications of this work for libraries.
The document summarizes metrics on mobile web fragmentation across various dimensions:
1) It shows data on the market share of handset brands, operating systems, and browsers across different regions. Popular brands, operating systems, and browsers are highlighted.
2) Versioning and usage data is provided for dominant operating systems like Android and iOS, showing differences across regions.
3) Browser market share and versions of rendering engines like WebKit are analyzed.
4) Support for HTML5 features like input tags and touch events is quantified, revealing a lack of full support across devices at the time.
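One plausible way to quantify feature support is to weight each device by its share of observed traffic. The device records and feature names below are hypothetical, and the traffic-weighting scheme is an assumption rather than the method used in the summarized document.

```python
# Hypothetical device records: share of observed traffic and the HTML5 features
# each device/browser combination was detected to support.
DEVICES = [
    {"name": "device-a", "traffic_share": 0.40, "features": {"input-date", "touch-events"}},
    {"name": "device-b", "traffic_share": 0.35, "features": {"touch-events"}},
    {"name": "device-c", "traffic_share": 0.25, "features": set()},
]


def weighted_support(devices, feature):
    """Fraction of observed traffic that comes from devices supporting `feature`."""
    total = sum(d["traffic_share"] for d in devices)
    supported = sum(d["traffic_share"] for d in devices if feature in d["features"])
    return supported / total if total else 0.0


for feature in ("input-date", "touch-events"):
    print(feature, round(weighted_support(DEVICES, feature), 2))
```

Weighting by traffic rather than counting device models means a feature missing on a few popular handsets lowers the score more than one missing on many rarely seen models.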
Still Setting the Pace in Social Media: The First Longitudinal Study of Usage... (Elizabeth Lupfer)
This research shows that charitable organizations are still outpacing the business world and academia in their use of social media. In the latest study (2008) a remarkable eighty-nine percent of charitable organizations are using some form of social media including blogs, podcasts, message boards, social networking, video blogging and wikis. A majority (57%) of the organizations are blogging. Forty-five percent of those studied report social media is very important to their fundraising strategy. While these organizations are best known for their non-profit status and their fundraising campaigns, they demonstrate an acute, and still growing, awareness of the importance of Web 2.0 strategies in meeting their objectives.
This study analyzed over 22,000 tweets from 500 journalists on Twitter to understand how they are adapting to the new medium. The researchers found that journalists frequently expressed opinions in their tweets, deviating from traditional norms of impartiality. Non-elite journalists were more likely to share opinions and discuss personal topics, while elite journalists focused more on linking to other sources and avoiding discussions. The study provided insights into how journalists are negotiating professional practices on social media, but was limited by only analyzing US journalists and not capturing intentions through interviews.
(Sept 2011) Considerations for Preserving Blogademia: Scholar Bloggers’ Perce... (Carolyn Hank)
This document discusses a research study on scholar bloggers' perceptions, preferences, and practices regarding blog preservation. The study involved analyzing over 100 blogs by academics, as well as questionnaires and interviews with blogger respondents. Key findings include:
- Most bloggers saw their blog as part of their scholarly record but not as highly important to preserve as traditional scholarly outputs.
- Bloggers felt they had primary responsibility for blog preservation but limited capability. Libraries and archives were seen as having greater capability but less responsibility.
- Common blog features that may impact preservation included dynamic or changing content, dependencies on co-authors, lack of versioning or rights/use information, and limited archiving activity by bloggers.
Leveraging an international infrastructure: Case studies from the Encyclopedia... (Cyndy Parr)
This document summarizes a presentation about leveraging international infrastructure for species descriptions using the Encyclopedia of Life (EOL) as a case study. It describes EOL's efforts to aggregate and curate over 1 million taxon pages from 200 providers. It analyzes the types and languages of content, license restrictions, ratings of providers, and the roles of curators. It also discusses opportunities to improve standards, support quality control, and make content more multilingual and open. Case studies demonstrate how EOL coordinates with other databases to resolve errors. The presentation concludes that EOL has made progress but there is still room to expand coverage and engage more users, content providers, and funders.
Methods and Techniques for Segmentation of Consumers in Social Media (Óscar Muñoz García)
Social media has revolutionised the way consumers relate to each other and to brands. Opinions published in social media can influence purchase decisions as strongly as advertising campaigns. Consequently, marketers are increasing their efforts and investment in obtaining indicators that measure brand health from consumer-generated digital content.
Given the unstructured nature of social media contents, the technology used for processing such contents often implements Artificial Intelligence techniques, such as natural language processing, machine learning and semantic analysis algorithms.
This thesis contributes to the state of the art with a model for structuring and integrating the information posted on social media, and a number of techniques aimed at identifying consumers and segmenting them socio-demographically and psychographically. The consumer identification technique is based on the fingerprint of the devices consumers use to browse the Web and tolerates the frequent changes that occur in such fingerprints. The psychographic profiling techniques infer a consumer's position in the purchase funnel and classify opinions according to a series of marketing attributes. Finally, the socio-demographic profiling techniques obtain consumers' place of residence and gender.
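To illustrate what tolerance to fingerprint changes can mean in practice, the sketch below matches a new fingerprint against known consumers with a weighted similarity score instead of exact equality, and stores the latest fingerprint after each match so gradual drift is followed. The attribute names, weights, 0.8 threshold, and id scheme are hypothetical, and this is not the thesis's actual technique.

```python
from typing import Dict, Optional

# Hypothetical attribute weights: attributes assumed to be stable (fonts, screen,
# timezone) weigh more than ones that change often (the user agent after updates).
WEIGHTS = {
    "user_agent": 1.0,
    "screen": 2.0,
    "timezone": 2.0,
    "language": 1.5,
    "fonts": 3.0,
    "plugins": 1.0,
}


def similarity(a: dict, b: dict) -> float:
    """Weighted share of agreeing attributes; set-valued attributes use Jaccard similarity."""
    score = total = 0.0
    for attr, weight in WEIGHTS.items():
        if attr not in a or attr not in b:
            continue  # skip attributes missing from either fingerprint
        total += weight
        va, vb = a[attr], b[attr]
        if isinstance(va, (set, frozenset)) and isinstance(vb, (set, frozenset)):
            union = va | vb
            score += weight * (len(va & vb) / len(union) if union else 1.0)
        else:
            score += weight * (1.0 if va == vb else 0.0)
    return score / total if total else 0.0


def identify(fp: dict, known: Dict[str, dict], threshold: float = 0.8) -> Optional[str]:
    """Id of the most similar known consumer, or None if no one reaches the threshold."""
    best_id, best_sim = None, threshold
    for consumer_id, ref in known.items():
        sim = similarity(fp, ref)
        if sim >= best_sim:
            best_id, best_sim = consumer_id, sim
    return best_id


def observe(fp: dict, known: Dict[str, dict], threshold: float = 0.8) -> str:
    """Identify (or register) the consumer behind fp and store fp as their latest fingerprint."""
    consumer_id = identify(fp, known, threshold)
    if consumer_id is None:
        consumer_id = f"consumer-{len(known) + 1}"  # hypothetical id scheme
    known[consumer_id] = fp  # keep the newest fingerprint so gradual changes are tracked
    return consumer_id
```

For example, a fingerprint that differs from a stored one only in its user agent (say, after a browser update) scores 9.5/10.5, about 0.90, with these weights and is still attributed to the same consumer, whereas an exact-match scheme would register a new one.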
- Both mainstream media and brands are recognizing the importance of participating in social conversations
The Changing World of Libraries: Lee Rainie, Director of the Pew Research Center’s Internet & American Life Project, will discuss the Project’s latest research about how people use technology and how people use libraries. He will discuss the implications of this work for libraries.
The document summarizes metrics on mobile web fragmentation across various dimensions:
1) It shows data on the market share of handset brands, operating systems, and browsers across different regions. Popular brands, operating systems, and browsers are highlighted.
2) Versioning and usage data is provided for dominant operating systems like Android and iOS, showing differences across regions.
3) Browser market share and versions of rendering engines like WebKit are analyzed.
4) Support for HTML5 features like input tags and touch events is quantified based on metrics, finding lack of full support across devices currently.
Still Setting the Pace in Social Media: The First Longitudinal Study of Usage...Elizabeth Lupfer
This research shows that charitable organizations are still outpacing the business world and academia in their use of social media. In the latest study (2008) a remarkable eighty-nine percent of charitable organizations are using some form of social media including blogs, podcasts, message boards, social networking, video blogging and wikis. A majority (57%) of the organizations are blogging. Forty-five percent of those studied report social media is very important to their fundraising strategy. While these organizations are best known for their non-profit status and their fundraising campaigns, they demonstrate an acute, and still growing, awareness of the importance of Web 2.0 strategies in meeting their objectives.
This study analyzed over 22,000 tweets from 500 journalists on Twitter to understand how they are adapting to the new medium. The researchers found that journalists frequently expressed opinions in their tweets, deviating from traditional norms of impartiality. Non-elite journalists were more likely to share opinions and discuss personal topics, while elite journalists focused more on linking to other sources and avoiding discussions. The study provided insights into how journalists are negotiating professional practices on social media, but was limited by only analyzing US journalists and not capturing intentions through interviews.
(Sept 2011) Considerations for Preserving Blogademia: Scholar Bloggers’ Perce...Carolyn Hank
This document discusses a research study on scholar bloggers' perceptions, preferences, and practices regarding blog preservation. The study involved analyzing over 100 blogs by academics, as well as questionnaires and interviews with blogger respondents. Key findings include:
- Most bloggers saw their blog as part of their scholarly record but not as highly important to preserve as traditional scholarly outputs.
- Bloggers felt they had primary responsibility for blog preservation but limited capability. Libraries and archives were seen as having greater capability but less responsibility.
- Common blog features that may impact preservation included dynamic or changing content, dependencies on co-authors, lack of versioning or rights/use information, and limited archiving activity by bloggers.
Leveraging an international infrastructure: Case studies from the Encyclopeda...Cyndy Parr
This document summarizes a presentation about leveraging international infrastructure for species descriptions using the Encyclopedia of Life (EOL) as a case study. It describes EOL's efforts to aggregate and curate over 1 million taxon pages from 200 providers. It analyzes the types and languages of content, license restrictions, ratings of providers, and the roles of curators. It also discusses opportunities to improve standards, support quality control, and make content more multilingual and open. Case studies demonstrate how EOL coordinates with other databases to resolve errors. The presentation concludes that EOL has made progress but there is still room to expand coverage and engage more users, content providers, and funders.
Comparing user generated content published in different social media sources
1. Comparing user generated content published in different social media sources
Óscar Muñoz-García, Carlos Navarro
@NLP can u tag #user_generated_content ?! via lrec-conf.org
26 May 2012
2. Introduction
The growth of social media has populated the Web with valuable
user-generated content (UGC) that can be exploited for many interesting purposes
E.g., explaining or predicting real-world outcomes through opinion mining
Advertising companies use social media content for market research
By mining users’ interests to target advertising actions
By obtaining customers’ opinions about brands
NLP allows us to automate social media content analysis
However, the text quality of UGC differs depending on the content source (e.g., blogs vs. Twitter)
Such differences challenge existing NLP techniques
Comparing user generated content published in different social media sources ⎢2
3. Introduction
We show how the language used in UGC differs across social media sources
By analysing the distribution of part-of-speech (PoS) categories in different sources
We evaluate the performance of three NLP techniques
Language Identification
Sentiment Analysis
Topic Identification
Social media sources analysed
Blogs (e.g., WordPress and Blogger posts)
Forums
Microblogs (e.g., Twitter)
Social networks (e.g., Facebook, Google+, MySpace, LinkedIn and Xing)
Review Sites (e.g., Ciao and Dooyoo)
Audio-visual content publishing sites (e.g., YouTube and Vimeo)
News publishing sites (i.e., mainstream media)
Other sites
4. Distribution of PoS categories
5. Distribution of PoS categories
Content analysed
A corpus of 10,000 posts extracted from heterogeneous social media sources
  - written in Spanish
  - related to the telecommunications domain
The distribution was obtained using an automatic PoS tagger
Tools used:
  - PoS tagging:
    TreeTagger [Schmid, 1994] with a Spanish parameter file
  - Annotation pipeline:
    GATE [Cunningham et al., 2011]
Categories identified
Main: noun, adjective, adverb, determiner, conjunction, pronoun, verb, …
Secondary: common noun, proper noun, negation adverb, personal pronoun, …
Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of the International Conference on New Methods in Language Processing, Manchester, UK.
Hamish Cunningham, Diana Maynard, Kalina Bontcheva, et al. 2011. Text Processing with GATE (Version 6). University of Sheffield, Department of Computer Science, April.
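The per-source percentages on these slides are produced by counting PoS categories and dividing by each source's token count. The Python sketch below assumes the posts have already been tagged and mapped to coarse categories. The input format, the example tokens and `pos_distribution` are hypothetical and are not part of the authors' TreeTagger/GATE pipeline.

```python
from collections import Counter, defaultdict


def pos_distribution(tagged_posts):
    """Return {source: {category: percentage}} from already-tagged posts.

    tagged_posts: iterable of (source, [(token, category), ...]) pairs,
    where category is a coarse PoS label such as "Noun" or "Verb"
    (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for source, tokens in tagged_posts:
        counts[source].update(category for _, category in tokens)

    distribution = {}
    for source, counter in counts.items():
        total = sum(counter.values())
        # Percentage of this source's tokens that fall into each category
        distribution[source] = {
            category: 100 * n / total for category, n in counter.items()
        }
    return distribution


posts = [
    ("Microblogs", [("Movistar", "Noun"), ("mola", "Verb"), ("!", "Punctuation")]),
    ("News", [("La", "Determiner"), ("operadora", "Noun"),
              ("sube", "Verb"), ("precios", "Noun")]),
]
print(pos_distribution(posts)["News"]["Noun"])  # 50.0
```

Percentages are computed per source, so sources with very different volumes of posts can still be compared directly.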
6. Distribution of PoS categories
Microblogs: determiners and prepositions are used to a lesser extent
Length limit (140 characters)
Posts need to be written more concisely → function-word categories, which carry little meaning on their own, tend to be used less
                   News  Blogs  Video  Reviews  Microblogs  Forums  Social networks  Other
Nouns               31%    30%    29%      23%         34%     22%              27%    33%
Adjectives           9%     8%     6%       8%          9%      7%               8%     6%
Adverbs              2%     3%     3%       5%          4%      4%               4%     3%
Determiners         11%    10%     8%       8%          6%      8%               9%     7%
Conjunctions         6%     8%     7%      10%          6%     10%               9%     7%
Pronouns             2%     3%     5%       6%          5%      6%               4%     4%
Prepositions        15%    15%    12%      13%          8%     12%              13%    11%
Punctuation marks   11%     8%    13%       9%          8%      9%              10%    11%
Verbs               12%    14%    17%      18%         19%     21%              16%    16%
Other particles      1%     1%     1%       1%          1%      1%               1%     1%
7. Distribution of PoS categories
News and blogs present similar distributions
Because of similar writing styles
No limitations on the size of posts
8. Distribution of PoS categories
Nouns
Common and proper nouns present similar distributions for all sources
The PoS tagger fails when proper nouns are written in lower case
  - Especially in Forums and Reviews, where specific products are discussed
  - Solution: use gazetteers
    Improves entity detection
    Domain dependent
Foreign words are less used in news than in other sources because of the style rules of Spanish mainstream media
  - Avoid foreign words, as far as possible, whenever a Spanish word exists
Adjectives
Adjectives of quantity are the most used (47%) in all the channels
  - Cardinals (30%) are more used than ordinals (2%)
Multiplicative, partitive and indefinite quantity adjectives are used more frequently in forums and review sites:
  - Due to quantitative evaluations and comparisons of products
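The gazetteer fix for lower-cased proper nouns can be sketched as a pass that runs after tagging. The sketch assumes tokens arrive as (token, tag) pairs. The coarse tag names and the telecom gazetteer entries are hypothetical examples. A real gazetteer would be domain specific and much larger.

```python
# Hypothetical, domain-dependent gazetteer for the telecommunications domain
GAZETTEER = {"movistar", "vodafone", "iphone"}


def retag_with_gazetteer(tagged_tokens, gazetteer=GAZETTEER, proper_tag="ProperNoun"):
    """Relabel tokens found in the gazetteer as proper nouns.

    tagged_tokens: list of (token, tag) pairs produced by a PoS tagger.
    Lookup is case-insensitive, so lower-cased product names
    such as "iphone" are still recognised.
    """
    return [
        (token, proper_tag if token.lower() in gazetteer else tag)
        for token, tag in tagged_tokens
    ]


tagged = [("mi", "Determiner"), ("iphone", "CommonNoun"),
          ("va", "Verb"), ("lento", "Adjective")]
print(retag_with_gazetteer(tagged)[1])  # ('iphone', 'ProperNoun')
```

The domain dependence noted on the slide shows up directly in this sketch: the correction only works for names that are listed in the gazetteer.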
9. Distribution of PoS categories
Adverbs
There is a correlation between the use of negation adverbs and the size of the posts
  - More used in channels with shorter texts
  - Detecting negations is essential when performing sentiment analysis
Conjunctions
The distribution of coordinating conjunctions is higher in News and Blogs
  - More used in channels with longer texts
  - Coordinating conjunctions are used to identify opinion chunks, as if they were punctuation marks
Pronouns
The distribution of personal pronouns is higher in Microblogs, Reviews, Forums and audio-visual content publishing sites
  - Due to conversations between users, vs. the narrative style of News and Blogs
  - Pronouns make it difficult to identify entities within opinions
    Entities are not explicitly mentioned
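Treating coordinating conjunctions as if they were punctuation marks gives a simple opinion chunker. This is a minimal sketch. The list of Spanish coordinating conjunctions and the regex tokenizer are hypothetical choices and are not the authors' rules.

```python
import re

# Hypothetical boundary sets: Spanish coordinating conjunctions and punctuation
COORDINATING = {"y", "e", "o", "u", "ni", "pero"}
PUNCTUATION = {".", ",", ";", ":", "!", "?", "¡", "¿", "…"}


def opinion_chunks(text):
    """Split text into chunks at punctuation marks and coordinating conjunctions."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    chunks, current = [], []
    for token in tokens:
        if token in PUNCTUATION or token.lower() in COORDINATING:
            if current:  # close the chunk built so far
                chunks.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        chunks.append(" ".join(current))
    return chunks


print(opinion_chunks("La cobertura es buena pero la tarifa es cara."))
# ['La cobertura es buena', 'la tarifa es cara']
```

Each chunk can then be scored separately. This keeps the positive opinion about coverage apart from the negative opinion about price.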
10. Distribution of PoS categories
Punctuation marks
Full stops are less used in News
  - Sentences are longer than in other sources
Commas are less used in Microblogs and audio-visual content sites
Ellipses are more used in Microblogs
  - To denote unfinished sentences
  - In automatically truncated messages
Secondary punctuation marks are less used in Microblogs
  - Difficulty of typing these characters on mobile devices
  - Content length limitation
Verbs
More used in Microblogs and Forums
  - Intentions and actions are expressed more often
Past tenses are less used in Microblogs
  - Immediate experiences
Infinitives are more used in Microblogs
11. Performance of language identification
12. Performance of Language Identification
Content analysed
3,368 tweets
2,768 posts extracted from other social media sources (not Twitter)
Written in Spanish, Portuguese and English
Technique used
Implementation of an existing text categorization algorithm
  - Analysis of the frequency of character n-grams within documents [Cavnar and Trenkle, 1994]
William B. Cavnar and John M. Trenkle. 1994. N-Gram-Based Text Categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pages 161-175.
13. Performance of Language Identification
Language identification method
14. Performance of Language Identification
Evaluation Results
Overall accuracy
• Twitter: 93.02%
• Other sources: 96.76%
Kappa
• Twitter: 0.844
• Other sources: 0.916
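The slides do not state which kappa statistic was used. Assuming Cohen's kappa between the system's labels and the gold-standard labels, accuracy and kappa can be computed together as sketched below (`accuracy_and_kappa` is a hypothetical helper, and the labels are toy data):

```python
from collections import Counter


def accuracy_and_kappa(gold: list[str], predicted: list[str]) -> tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for two equally long label sequences."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, predicted)) / n  # overall accuracy
    gold_freq, pred_freq = Counter(gold), Counter(predicted)
    # Agreement expected by chance if both labelings were independent.
    expected = sum(gold_freq[c] * pred_freq[c] for c in gold_freq) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined when chance agreement is 1
    return observed, (observed - expected) / (1 - expected)


gold = ["es", "es", "pt", "en"]
predicted = ["es", "pt", "pt", "en"]
acc, kappa = accuracy_and_kappa(gold, predicted)
```

Here accuracy is 0.75 and chance agreement is 5/16, so kappa is 7/11 (about 0.636). Kappa discounts agreement that a system would reach by always favouring the most frequent language, which is why it sits well below accuracy.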
Normalizing tweets does not improve performance
Syntactic normalization of Twitter messages [Kaufmann and Kalita, 2010]
1. Delete references to users at the beginning of the tweet
2. Delete “RT @user:” sequences
3. Delete hashtags found at the end of the tweet
4. Delete “#” at the beginning of hashtags
5. Delete URLs
6. Delete “…” followed by a URL
Max Kaufmann and Jugal Kalita. 2010. Syntactic normalization of Twitter messages. In Proceedings of the International Conference on Natural Language Processing (ICON-2010).
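A regex-based sketch of the six normalization steps listed above. The patterns are assumptions, since the slides describe the steps only in words. Step 6 is applied before step 5 because deleting the URL first would leave the "…" behind, and treating an ASCII "..." like "…" is also an assumption.

```python
import re

# Step numbers refer to the list above; step 6 runs before step 5.
_STEPS = [
    (re.compile(r"^\s*(?:@\w+[\s:,]*)+"), ""),          # 1. leading user references
    (re.compile(r"\bRT\s+@\w+:\s*"), ""),               # 2. "RT @user:" sequences
    (re.compile(r"(?:\s*#\w+)+\s*$"), ""),              # 3. trailing hashtags
    (re.compile(r"#(\w+)"), r"\1"),                     # 4. "#" prefix of remaining hashtags
    (re.compile(r"(?:…|\.\.\.)\s*https?://\S+"), ""),   # 6. "…" followed by a URL
    (re.compile(r"https?://\S+"), ""),                  # 5. remaining URLs
]


def normalize_tweet(text: str) -> str:
    """Apply the normalization steps in order, then collapse leftover whitespace."""
    for pattern, replacement in _STEPS:
        text = pattern.sub(replacement, text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize_tweet("RT @bob: Loving the new #phone … http://t.co/abc")` returns `"Loving the new phone"`.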
15. Comparing user generated content published in different social media sources
Performance of sentiment analysis
16. Performance of Sentiment Analysis
Content analysed
1,859 tweets and 1,847 posts extracted from other social media sources (not Twitter), written in Spanish
Technique used
Lexicon-based matching of linguistic expressions
• Each expression is a sequence of (lemma, PoS) pairs
E.g. “Your brand is cool!” matches {(Σ,Noun), (‘be’,Verb), (‘cool’,Adjective)}, where Σ stands for any lemma
Kinds of expressions
• For detecting subjectivity (20 expressions)
Usually include specific verbs
• For detecting the sentiment of opinions (1,480 expressions)
Negative expressions add a value in {-2, -1} to the overall sentiment
Positive expressions add a value in {1, 2} to the overall sentiment
• For reversing sentiment (22 expressions)
Include negations
Multiply the detected sentiment by -1
• For augmenting or reducing sentiment (32 expressions)
Usually include adverbs
Multiply the detected sentiment by 1.5 or 0.75
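A minimal sketch of the lexicon-based scoring described above, assuming the input is already lemmatized and PoS-tagged as (lemma, PoS) pairs. The slides do not say how far reversing and augmenting expressions reach. Here each one multiplies only the next sentiment expression, and the longest matching expression wins at each position. Subjectivity expressions are omitted, and the lexicon entries and their values are hypothetical.

```python
from dataclasses import dataclass

WILDCARD = "Σ"  # matches any lemma, as in the slide's example


@dataclass(frozen=True)
class Expression:
    pattern: tuple[tuple[str, str], ...]  # sequence of (lemma, PoS) pairs
    kind: str                             # "sentiment", "reverse" or "scale"
    value: float


def _matches(pattern, tokens, start):
    window = tokens[start:start + len(pattern)]
    return len(window) == len(pattern) and all(
        pos == t_pos and lemma in (WILDCARD, t_lemma)
        for (lemma, pos), (t_lemma, t_pos) in zip(pattern, window)
    )


def score(tokens: list[tuple[str, str]], lexicon: list[Expression]) -> float:
    total, modifier, i = 0.0, 1.0, 0
    while i < len(tokens):
        hits = [e for e in lexicon if _matches(e.pattern, tokens, i)]
        if not hits:
            i += 1
            continue
        expr = max(hits, key=lambda e: len(e.pattern))  # longest match wins
        if expr.kind == "sentiment":
            total += modifier * expr.value
            modifier = 1.0             # a modifier applies to one expression only
        else:                          # "reverse" (-1) or "scale" (1.5 / 0.75)
            modifier *= expr.value
        i += len(expr.pattern)
    return total


# Hypothetical lexicon entries; the values are illustrative, not the authors'.
LEXICON = [
    Expression(((WILDCARD, "Noun"), ("be", "Verb"), ("cool", "Adjective")), "sentiment", 2),
    Expression((("cool", "Adjective"),), "sentiment", 1),
    Expression((("awful", "Adjective"),), "sentiment", -2),
    Expression((("not", "Adverb"),), "reverse", -1),
    Expression((("very", "Adverb"),), "scale", 1.5),
]
```

With this lexicon, "Your brand is cool" scores 2.0, "brand is not cool" scores -1.0 (negation reverses the single-word match), and "very awful" scores -3.0.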
17. Performance of Sentiment Analysis
Evaluation Results
Overall accuracy
• Twitter: 66.92%
• Other sources: 80.17%
Kappa
• Twitter: 0.198
• Other sources: 0.31
Normalizing tweets does not improve performance
Same syntactic normalization of Twitter messages as for language identification [Kaufmann and Kalita, 2010]
18. Comparing user generated content published in different social media sources
Performance of topic identification
19. Performance of topic identification
Description of the method [Muñoz-García et al., 2011]
Input
PoS Filtering
• “torino”, “art”, “media”, “user”, “cloud”
Topic Recognition
• http://dbpedia.org/resource/Turin
• http://dbpedia.org/resource/Art
• http://dbpedia.org/resource/User_(computing)
Language Filtering
• “Torino”, “arte”, “utente”, “mezzo di comunicazione di massa”, ...
Óscar Muñoz-García, Andrés García-Silva, Óscar Corcho, Manuel de la Higuera Hernández, and Carlos Navarro. 2011. Identifying Topics in Social Media Posts using DBpedia. In Jean-Dominique Meunier, Halid Hrasnica, and Florent Genoux, editors, Proceedings of the NEM Summit 2011, pages 81–86, Torino, Italy. Eurescom, the European Institute for Research and Strategic Studies in Telecommunications GmbH.
20. Performance of topic identification
PoS filtering example
Input
• But a hardware problem is more likely, especially if you use the phone a lot while eating. The Blackberry's tiny trackball could be suffering the same accumulation of gunk and grime that can plague a computer mouse that still uses a rubber ball on the underside to roll around the desk.
PoS filtering
• Blackberry, phone, trackball, computer, problem, grime, hardware, mouse, desk, rubber ball, gunk
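A sketch of a noun-based PoS filter, assuming tokens already tagged with Universal Dependencies tags (`NOUN`, `PROPN`) and a hypothetical list of known compounds such as "rubber ball". It only approximates the step above: the slide's output omits some nouns that a plain noun filter would keep (e.g. "accumulation", "underside"), so the original filter is not fully described here. `pos_filter` and the toy tokens are hypothetical.

```python
NOUN_TAGS = {"NOUN", "PROPN"}  # Universal Dependencies noun tags (an assumption)


def pos_filter(tagged: list[tuple[str, str]],
               compounds: frozenset[str] = frozenset()) -> list[str]:
    """Keep nouns as candidate keywords, joining adjacent noun pairs listed in `compounds`."""
    keywords: list[str] = []
    seen: set[str] = set()
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag not in NOUN_TAGS:
            i += 1
            continue
        candidate, step = word, 1
        if i + 1 < len(tagged) and tagged[i + 1][1] in NOUN_TAGS:
            pair = f"{word} {tagged[i + 1][0]}"
            if pair.lower() in compounds:
                candidate, step = pair, 2
        if candidate.lower() not in seen:  # keep the first occurrence only
            seen.add(candidate.lower())
            keywords.append(candidate)
        i += step
    return keywords


tagged = [("The", "DET"), ("Blackberry", "PROPN"), ("'s", "PART"), ("tiny", "ADJ"),
          ("trackball", "NOUN"), ("and", "CCONJ"), ("a", "DET"), ("rubber", "NOUN"),
          ("ball", "NOUN"), ("on", "ADP"), ("the", "DET"), ("desk", "NOUN")]
```

`pos_filter(tagged, frozenset({"rubber ball"}))` yields `["Blackberry", "trackball", "rubber ball", "desk"]`; without the compound list, "rubber" and "ball" come out separately.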
21. Performance of topic identification
Topic Recognition (Sem4Tags [García-Silva et al., 2010])
PoS filtering
• Blackberry, phone, trackball, computer, problem, grime, hardware, mouse, desk, rubber ball, gunk
Context Selection
• Blackberry, {phone, hardware, trackball, mouse}
• Computer, {hardware, mouse, problem, desk}
• …
Disambiguation
• http://dbpedia.org/resource/BlackBerry
• http://dbpedia.org/resource/Computer
Andrés García-Silva, Oscar Corcho, and Jorge Gracia. 2010. Associating semantics to multilingual tags in folksonomies. In 17th Int. Conference on Knowledge Engineering and Knowledge Management EKAW 2010, Lisbon (Portugal), October
22. Performance of topic identification
Context Selection
For each keyword, we select a set of up to 4 related keywords that help to disambiguate its meaning
Beyond 4 words, additional context adds no further resolving power to disambiguation [Kaplan, 1955]
We compute semantic relatedness (active context) from the co-occurrence of words in web pages [Gracia and Mena, 2009]
Keyword Relatedness Keyword Relatedness
phone 0.347 hardware 0.347
trackball 0.311 mouse 0.311
computer 0.288 desk 0.287
problem 0.246 rubber ball 0.246
grime 0.190 gunk 0.168
Active context selection for the keyword “blackberry”
A. Kaplan. 1955. An experimental study of ambiguity and context. Mechanical Translation, 2:39-46.
Jorge Gracia and Eduardo Mena. 2009. Multiontology semantic disambiguation in unstructured web contexts. In
Proc. of Workshop on Collective Knowledge Capturing and Representation (CKCaR’09) at K-CAP’09,
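Given relatedness scores like those in the table, active context selection reduces to keeping the most related keywords. A sketch, assuming the scores are already computed (the web co-occurrence measure itself is not reproduced) and that ties keep their input order; `select_active_context` is a hypothetical name:

```python
def select_active_context(relatedness: dict[str, float], max_size: int = 4) -> list[str]:
    """Return up to `max_size` keywords, most related first.

    sorted() is stable even with reverse=True, so tied scores keep their input order.
    """
    ranked = sorted(relatedness, key=relatedness.get, reverse=True)
    return ranked[:max_size]


# Relatedness to "blackberry", copied from the table above.
blackberry = {
    "phone": 0.347, "hardware": 0.347, "trackball": 0.311, "mouse": 0.311,
    "computer": 0.288, "desk": 0.287, "problem": 0.246, "rubber ball": 0.246,
    "grime": 0.190, "gunk": 0.168,
}
```

`select_active_context(blackberry)` returns `["phone", "hardware", "trackball", "mouse"]`, the same context listed for Blackberry on the Sem4Tags slide.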
Identifying Topics in Social Media Posts using DBpedia ⎢22
23. Performance of topic identification
Disambiguation Criteria
OPTION 1: Most frequent sense of the ambiguous word
• Determined by Wikipedia editors (the first link on a disambiguation page)
OPTION 2: Vector space model
1. A vector is built from the keyword and its context
2. For each candidate sense, a vector of its top N terms is built using TF-IDF (Term Frequency-Inverse Document Frequency)
3. Cosine similarity determines which sense vector is most similar to the keyword vector
DBpedia resource | Definition | Similarity
BlackBerry | Is a line of mobile e-mail and smartphone | 0.224
Blackberry | Is an edible fruit | 0.15
BlackBerry_(song) | Is a song by the Black Crowes | 0.0
BlackBerry_Township,_Itasca_County,_Minnesota | Is a township in … Itasca County | 0.0
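Option 2 can be sketched in plain Python as below. Assumptions: IDF is computed only over the candidate sense descriptions, N defaults to 20, and the descriptions in `SENSES` are hypothetical longer versions of the definitions above. The sketch therefore does not reproduce the similarity values in the table.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(documents: list[list[str]]) -> dict[str, float]:
    """log(N / df) over the candidate sense descriptions."""
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(len(documents) / count) for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float], top_n: int = 20) -> dict[str, float]:
    """TF-IDF weights of known terms, truncated to the `top_n` heaviest ones."""
    weights = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    return dict(sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_n])


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def disambiguate(keyword: str, context: list[str], senses: dict[str, str], top_n: int = 20):
    names = list(senses)
    docs = [tokenize(senses[name]) for name in names]
    idf = idf_weights(docs)
    # Step 1: keyword + context vector. The query is not truncated to top N terms.
    query = tfidf(tokenize(" ".join([keyword, *context])), idf, top_n=len(idf) + 1)
    # Steps 2-3: a top-N vector per sense, compared by cosine similarity.
    scores = {name: cosine(query, tfidf(doc, idf, top_n)) for name, doc in zip(names, docs)}
    return max(scores, key=scores.get), scores


SENSES = {  # hypothetical descriptions
    "BlackBerry": "BlackBerry is a line of mobile e-mail and smartphone devices, some with a trackball",
    "Blackberry": "The blackberry is an edible fruit produced by species in the genus Rubus",
    "BlackBerry_(song)": "BlackBerry is a song by the Black Crowes",
    "BlackBerry_Township,_Itasca_County,_Minnesota": "Blackberry Township is a township in Itasca County, Minnesota",
}
best, scores = disambiguate("blackberry", ["phone", "hardware", "trackball", "mouse"], SENSES)
```

Because "blackberry" occurs in every description, its IDF is 0 and only "trackball" separates the senses here, so `best` is `"BlackBerry"`. With richer descriptions, more context terms would contribute.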
24. Performance of topic identification
Evaluation settings
Evaluated a random sample of 1,816 posts (18.16%)
47 human evaluators
Each post, together with the topics identified for it, was shown to 3 different evaluators
Evaluation options:
1. The topic is not related to the post
2. The topic is somewhat related to the post
3. The topic is closely related to the post
4. The evaluator does not have enough information to decide
Fleiss’ kappa test
• Strength of agreement for 2 evaluators = 0.826 (very good)
• Strength of agreement for 3 evaluators = 0.493 (moderate)
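Fleiss' kappa compares the observed agreement among a fixed number of raters with the agreement expected by chance. A minimal sketch, taking a subjects × categories matrix in which each cell counts how many evaluators chose that option for that post (`fleiss_kappa` is a hypothetical helper, not the tool used in the study):

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a subjects x categories matrix of rating counts.

    counts[i][j] is how many raters put subject i in category j; every subject
    must be rated by the same number of raters n >= 2.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    # Mean per-subject agreement: share of agreeing rater pairs for each subject.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts
    ) / n_subjects
    # Chance agreement from the overall share of ratings in each category.
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    if p_e == 1.0:
        return float("nan")  # undefined when every rating falls in one category
    return (p_bar - p_e) / (1 - p_e)
```

For instance, two posts on which all three evaluators agree, one rated option 1 and the other option 3 (`[[3, 0, 0, 0], [0, 0, 3, 0]]`), give a kappa of 1.0.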
25. Performance of topic identification
Evaluation Results
Precision depends on the channel
• From 59.19% for social networks
More misspellings
More common nouns
• To 88.89% for review sites
Concrete products and brands
Proper nouns tend to have a Wikipedia entry
The best context selection criterion also depends on the channel
• Active context selection works better for microblogs and review sites
• Using all the post's keywords as context works better for blogs
• No context selection works better in the remaining cases (most of the channels)
Naïve default sense selection is effective
27. Conclusions
We found differences among social media sources in every experiment we ran
The distribution of PoS categories varies across sources
• Since PoS tagging is a preliminary step for many NLP techniques, their performance may be affected
E.g., using nouns as context for term disambiguation: more nouns → more context
E.g., using adjectives and adverbs for sentiment analysis
Language identification is less accurate for content extracted from Twitter
Sentiment analysis is less accurate for content extracted from Twitter
Precision of topic identification also depends on the source
• With respect to context selection, no single technique performs best for all sources