How to automatically distinguish between high-quality and low-quality content in Twitter?
Twitter is a rapidly growing microblogging platform that hosts a large volume of content of diverse topics and varying quality. To surface higher-quality content (e.g. posts mentioning news, events, useful facts or well-formed opinions) when a user searches for tweets on Twitter, we propose a new method to filter and rank tweets according to their quality. To model tweet quality, we devise a new set of link-based features in addition to content-based features. We examine the implicit links between tweets, URLs, hashtags and users, and propose novel metrics that reflect the popularity as well as the quality-based reputation of websites, hashtags and users. We then evaluate both the content-based and link-based features in terms of classification effectiveness and identify an optimal feature subset that achieves the best classification accuracy.
Presentation given at the DASFAA 2012 conference (15-18 April 2012, Busan, South Korea).
Authors: Jan Vosecky, Kenneth Wai-Ting Leung, and Wilfred Ng
Full paper: http://www.cse.ust.hk/~wilfred/paper/dasfaa12.pdf
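As a rough illustration of the two feature families the abstract describes, the sketch below computes a few content-based signals from a tweet's text and looks up link-based reputation scores for its URLs and hashtags. All feature names, weights, and reputation values are invented for illustration; they are not the paper's actual features.

```python
# Hypothetical sketch: combining content-based and link-based features
# into one feature dictionary for a tweet-quality classifier.

def content_features(tweet: str) -> dict:
    """Simple content-based signals computed from the tweet text alone."""
    words = tweet.split()
    return {
        "length": len(words),
        "has_url": any(w.startswith("http") for w in words),
        "num_hashtags": sum(w.startswith("#") for w in words),
        "all_caps_ratio": sum(w.isupper() for w in words) / max(len(words), 1),
    }

def link_features(tweet: str, url_reputation: dict, tag_reputation: dict) -> dict:
    """Link-based signals: look up reputation scores for the URLs and hashtags
    a tweet mentions (in the paper, such scores would come from the implicit
    link graph between tweets, URLs, hashtags and users)."""
    words = tweet.split()
    urls = [w for w in words if w.startswith("http")]
    tags = [w for w in words if w.startswith("#")]
    return {
        "url_rep": max((url_reputation.get(u, 0.0) for u in urls), default=0.0),
        "tag_rep": max((tag_reputation.get(t, 0.0) for t in tags), default=0.0),
    }

tweet = "Breaking: new open-data portal launched http://data.example.org #opendata"
feats = {**content_features(tweet),
         **link_features(tweet,
                         {"http://data.example.org": 0.9},
                         {"#opendata": 0.7})}
print(feats)
```

The combined dictionary could then be fed to any off-the-shelf classifier to separate high- from low-quality tweets.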
It's not just privacy, porn, and pipe-bombs: Libraries and the ethics of service, by Lane Wilkinson
Slides from a 10/12/12 talk at the University of Illinois at Urbana-Champaign, given as an invited speaker during Ethics Awareness Week. Thanks to the UIUC libraries, the GSLIS, and the National Center for Professional and Research Ethics.
PatentTransformer is built on BERT and GPT-2, state-of-the-art deep learning models for natural language processing (NLP), natural language understanding (NLU), and natural language generation (NLG). PatentTransformer applies transfer learning: BERT and GPT-2 are pre-trained as unsupervised language models on a large corpus, then fine-tuned into a domain-specific language model on downstream patent-claim tasks using much less data.
Collaborative Personalized Twitter Search with Topic-Language Models, by Jan Vosecky
The vast amount of real-time and social content in microblogs results in an information overload for users when searching microblog data. Given the user’s search query, delivering content that is relevant to her interests is a challenging problem. Traditional methods for personalized Web search are insufficient in the microblog domain, because of the diversity of topics, sparseness of user data and the highly social nature. In particular, social interactions between users need to be considered, in order to accurately model user’s interests, alleviate data sparseness and tackle the cold-start problem. In this paper, we therefore propose a novel framework for Collaborative Personalized Twitter Search. At its core, we develop a collaborative user model, which exploits the user’s social connections in order to obtain a comprehensive account of her preferences. We then propose a novel user model structure to manage the topical diversity in Twitter and to enable semantic-aware query disambiguation. Our framework integrates a variety of information about the user’s preferences in a principled manner.
Dynamic Multi-Faceted Topic Discovery in Twitter, by Jan Vosecky
Discovering high-level topics from social streams is important for many downstream applications. However, traditional text mining methods that rely on the bag-of-words model are insufficient to uncover the rich semantics and temporal aspects of topics in Twitter. In particular, topics in Twitter are inherently dynamic and often focus on specific entities, such as people or organizations. In this paper, we therefore propose a method for mining multifaceted topics from Twitter streams. The Multi-Faceted Topic Model (MfTM) is proposed to jointly model latent semantics among terms and entities and captures the temporal characteristics of each topic. We develop an efficient online inference method for MfTM, which enables our model to be applied to large-scale and streaming data. Our experimental evaluation shows the effectiveness and efficiency of our model compared with state-of-the-art baselines. We further demonstrate the effectiveness of our framework in the context of tweet clustering.
More info: http://www.cse.ust.hk/~jvosecky/
This is a tutorial that was presented at: The 20th International Conference on Software and Systems Reuse (ICSR'22)
Date of Conference: 15-17 June 2022
Conference Location: Virtual
Conference website: https://icsr2022v2.wp.imt.fr/
The document presents a model to predict question quality in community question answering sites. It aims to predict user satisfaction and question quality in both the online and offline scenarios. In the online scenario, it uses features from question text and the asker's profile, while in the offline scenario it adds features from community responses. Experimental results show that predicting satisfaction achieves 70% accuracy using logistic regression with additional text features. Community interaction features are more predictive than question content features alone. The model performs better at predicting unsatisfied questions.
This document summarizes a study of programming questions and answers on StackOverflow. The study found that the most recognized answers included concise code examples with explanations in the context of the question. Attributes like step-by-step solutions and links to external resources also contributed to higher-scored answers. In contrast, lower-scored answers lacked code and thorough explanations. The study suggests customized answers may better help programmers compared to typical documentation due to providing familiar contexts and reducing cognitive effort.
The document discusses best practices for code review from three perspectives: communication, psychology, and work processes. It provides tips for effective communication in code reviews such as using clear and concise language, sharing context, and resolving discussions properly. On the psychology side, it notes the difficulty of admitting mistakes and not taking feedback personally due to factors like self-worth being tied to work. For work processes, it recommends habits like reviewing your own code first, focusing discussions, and approving pull requests once only minor issues remain. The overall goal is for code reviews to achieve quality assurance, knowledge sharing, and collective ownership in the most efficient and pleasant way possible.
Twitter has recently attracted much attention as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers on tweet data often faces a data sparsity problem, partly due to the large variety of short and irregular forms introduced into tweets by the 140-character limit. In this work we propose using two different sets of features to alleviate the data sparseness problem. One is the semantic feature set, where we extract semantically hidden concepts from tweets and then incorporate them into classifier training through interpolation. The other is the sentiment-topic feature set, where we extract latent topics and the associated topic sentiment from tweets, then augment the original feature space with these sentiment-topics. Experimental results on the Stanford Twitter Sentiment Dataset show that both feature sets outperform the baseline model using unigrams only. Moreover, using semantic features rivals the previously reported best result, and using sentiment-topic features achieves 86.3% sentiment classification accuracy, outperforming existing approaches.
Our method ranks Twitter conversations by giving importance scores to tweets based on their position in conversations and the participating users. We filter tweets, create a word index considering SMS language, and score tweets based on term frequency and the tweet and user scores. Experimental settings included removing or adding constraints like stop words and duplicate words. Results were subjective as accurately evaluating conversations is difficult without contextual knowledge. Future work could infer conversation context and use precision/recall by programmatically tagging tweets.
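A minimal sketch of the scoring idea described above: a tweet's score combines a term-frequency-based text score with a weight for its position in the conversation and a score for its author. The exact formula, weights, and toy data here are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def score_tweet(text, position, n_tweets, user_score, df, n_docs):
    """tf-idf-style term score, damped by conversation position and
    scaled by the participating user's score."""
    tf = Counter(text.lower().split())
    term_score = sum(count * math.log((n_docs + 1) / (df.get(term, 0) + 1))
                     for term, count in tf.items())
    # Earlier tweets in a conversation weigh more (illustrative choice).
    position_weight = 1.0 - position / (2 * n_tweets)
    return term_score * position_weight * user_score

df = {"match": 3, "great": 1}   # toy document frequencies over 4 tweets
s0 = score_tweet("great match today", 0, 4, 0.8, df, n_docs=4)
s3 = score_tweet("great match today", 3, 4, 0.8, df, n_docs=4)
print(s0, s3)  # same text, but the earlier tweet scores higher
```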
Presented at: The 6th International Workshop on Refactoring (IWoR 2022)
Date of Workshop: 14 October 2022
Conference Location: Oakland Center, Michigan, USA
Analysing and Reporting Qualitative Data.pdf, by Sarah Pollard
This document provides guidance on analysing qualitative data. It discusses four main stages of qualitative analysis: 1) reading transcripts to become familiar with context, 2) identifying recurring themes, 3) organizing themes into sections and dividing text among themes, and 4) writing conclusions by discussing themes and their interactions. It also addresses frequently asked questions about qualitative analysis, such as using quotes to support interpretations, cleaning quotes, shortening extracts, and using identifiers to link quotes to respondents.
The document describes a thesis that aims to develop a trust-aware recommender system for Twitter. It introduces recommender systems and discusses computing trust through direct interactions and propagating trust through random walks. It then describes building a Twitter crawler and prototype recommender system that scores tweets based on trust and content. Experiments analyze trust properties in Twitter and benchmark the recommender system. The thesis contributes a trust metric, crawler, and initial recommender system while finding that trust models may be useful but do not exhibit transitivity in Twitter.
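The random-walk flavour of trust propagation mentioned above can be sketched as follows: trust from a source user to other users is estimated by how often random walks starting at the source visit them. The graph, stop probability, and walk count below are illustrative assumptions, not the thesis's actual metric.

```python
import random

def propagate_trust(edges, source, walks=2000, stop_prob=0.3, seed=42):
    """Estimate trust from `source` to every reachable node as the
    normalised visit count over many random walks along follow edges."""
    rng = random.Random(seed)
    visits = {}
    for _ in range(walks):
        node = source
        while True:
            neighbours = edges.get(node)
            # Walk ends at a sink or, with some probability, at each step.
            if not neighbours or rng.random() < stop_prob:
                break
            node = rng.choice(neighbours)
            visits[node] = visits.get(node, 0) + 1
    total = sum(visits.values()) or 1
    return {n: c / total for n, c in visits.items()}

# Toy follow graph: alice follows bob and dave; bob follows carol.
edges = {"alice": ["bob", "dave"], "bob": ["carol"]}
trust = propagate_trust(edges, "alice")
print(trust)  # directly-followed users accrue more visits than 2-hop ones
```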
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T..., by Traian Rebedea
- PolyCAFe is an automatic assessment tool that analyzes collaborative chat conversations and provides feedback without requiring tutor time.
- It uses natural language processing, social network analysis, and information retrieval techniques to evaluate utterances, measure collaboration, and generate feedback.
- An experiment found that tutors could prepare feedback 35% faster with PolyCAFe and that students found its feedback useful, accurate, and helpful for their learning.
How Anonymous Can Someone be on Twitter? by George Sam
The document describes a study that aimed to identify Twitter users based on their writing styles in tweets. The researchers collected over 3 million tweets from 100 users and extracted features such as word counts and frequencies. They were able to correctly identify the author of tweets 67% of the time using tf-idf vectors and the extracted stylistic features.
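A toy reconstruction of the attribution pipeline the study describes: represent each user's tweets as a tf-idf vector and attribute a new tweet to the user with the highest cosine similarity. The corpus, users, and weighting details below are made up for illustration.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def identify(query, docs):
    """docs: {author: concatenated tweets}. Returns the closest author."""
    tf = {a: Counter(t.lower().split()) for a, t in docs.items()}
    df = Counter(term for counts in tf.values() for term in set(counts))
    n = len(docs)
    def vec(counts):
        return {t: c * math.log((n + 1) / (df.get(t, 0) + 1))
                for t, c in counts.items()}
    vecs = {a: vec(c) for a, c in tf.items()}
    q = vec(Counter(query.lower().split()))
    return max(vecs, key=lambda a: cosine(q, vecs[a]))

docs = {
    "alice": "love hiking trails mountain hiking gear reviews",
    "bob": "stock market update earnings market analysis today",
}
print(identify("new hiking gear arrived", docs))
```

A real system would, as in the study, add stylistic features (word counts and frequencies) alongside the tf-idf vectors.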
Themes identification techniques in qualitative research, by Ghulam Qambar
The document discusses qualitative data analysis techniques for identifying themes in text. It describes codes, categories, and themes, with themes representing major elements or concepts in the analyzed text. Several techniques for identifying themes are outlined, including analyzing word repetitions, comparing and contrasting text, examining linguistic features like metaphors and transitions, and physically manipulating text through cutting and sorting. The selection of appropriate techniques depends on the type of data, required skills and labor, and desired number/types of themes to ensure validity and reliability of the analysis.
The document discusses test-driven development (TDD). TDD involves writing tests before writing code to specify desired functionality, then writing just enough code to pass each test. This process of red-green-refactor evolves the system design through tests. TDD results in decoupled, well-designed code and provides benefits like confidence, documentation, and safe refactoring. The document also provides tips for writing better tests, such as keeping tests small, expressive, isolated, and automated.
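As a tiny illustration of the red-green-refactor cycle described above (the function and tests are invented for this sketch): the tests are written first to specify behaviour, then just enough code is written to make them pass.

```python
# Step 1 (red): specify desired behaviour as tests before implementing.
# Step 2 (green): write just enough code to pass them.
# Step 3 (refactor): clean up with the tests as a safety net.

def slugify(title: str) -> str:
    """Minimal implementation, written only after the tests below existed."""
    return "-".join(title.lower().split())

def test_slugify_lowercases_and_joins():
    assert slugify("Hello World") == "hello-world"

def test_slugify_collapses_whitespace():
    assert slugify("  Test   Driven  ") == "test-driven"

# Small, expressive, isolated, automated tests, as the slides recommend.
test_slugify_lowercases_and_joins()
test_slugify_collapses_whitespace()
print("all tests pass")
```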
The document discusses various qualitative research methods including focus groups, depth interviews, and projective techniques. It provides descriptions of these methods, how they differ, their advantages and disadvantages. Focus groups involve group discussion and interaction, depth interviews probe individuals in-depth, and projective techniques seek to uncover subconscious motivations through indirect questioning techniques.
Building a multi-headed model capable of detecting different types of toxicity such as threats, obscenity, insults and identity-based hate. Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to efficiently facilitate conversations, leading many communities to limit or completely shut down user comments. So far we have a range of publicly available models served through the Perspective APIs, including toxicity. But the current models still make errors, and they don't allow users to select which type of toxicity they're interested in finding. Pallam Ravi, Hari Narayana Batta, Greeshma S, Shaik Yaseen, "Toxic Comment Classification", published in the International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN 2456-6470, Volume 3, Issue 4, June 2019. URL: https://www.ijtsrd.com/papers/ijtsrd23464.pdf
Paper URL: https://www.ijtsrd.com/computer-science/other/23464/toxic-comment-classification/pallam-ravi
Assignment 1: Discussion—Developing Trust
Communicating ethically to employees as well as other internal and external stakeholders is of the utmost importance for leaders and managers. Being able to build and maintain trust in order to be credible is essential.
Using the module readings, the online library resources, and the Internet, research ways of developing trust.
Respond to the following:
Explain how leaders and managers can overcome obstacles and develop trust in communicating corporate ethics.
By Wednesday, July 1, 2015, post your response to the appropriate Discussion Area. Through Wednesday, July 8, 2015, review and comment on at least two peers' responses.
Write your initial response in 300–500 words. Your response should be thorough and address all components of the discussion question in detail, include citations of all sources, where needed, according to APA style, and demonstrate accurate spelling, grammar, and punctuation.
Do the following when responding to your peers:
Read your peers’ answers.
Provide substantive comments by
contributing new, relevant information from course readings, Web sites, or other sources;
building on the remarks or questions of others; or
sharing practical examples of key concepts from your professional or personal experiences.
Respond to feedback on your posting and provide feedback to other students on their ideas.
Make sure your writing
is clear, concise, and organized;
demonstrates ethical scholarship in accurate representation and attribution of sources; and
displays accurate spelling, grammar, and punctuation.
Grading Criteria

Initial response (16 points):
Insightful, original, accurate, and timely.
Substantive and demonstrated advanced understanding of concepts.
Compiled/synthesized theories and concepts drawn from a variety of sources to support statements and conclusions.

Discussion Response and Participation (16 points):
Responded to a minimum of two peers in a timely manner.
Offered points of view supported by research.
Asked challenging questions that promoted discussion.
Drew relationships between one or more points in the discussion.

Writing (8 points):
Wrote in a clear, concise, formal, and organized manner.
Responses were error free.
Information from sources, where applicable, was paraphrased appropriately and accurately cited.

Total: 40 points
Learning Analytics for Online Discussions: A Pedagogical Model for Intervent..., by alywise
This document summarizes Alyssa Friend Wise's pedagogical model for using learning analytics to intervene in online discussions. The model focuses on capturing meaningful traces of student activity through metrics of "speaking" and "listening". It presents analytics to students in an extracted table format and embedded in the discussion interface. The model supports interpretation through a framework that emphasizes integration with learning activities, agency in interpretation, reflection, and dialogue between students and instructors. An initial implementation in a doctoral seminar found students engaged productively with the model.
Need help with this assignmentPreliminary research is attached w.docx, by gibbonshay
Need help with this assignment
Preliminary research is attached with sources. Also a sample is attached to give you an idea of how it should be formatted.
Evaluate a Source
ASSIGNMENT: For this essay, you will select one of the sources you have found through your preliminary research about your research topic. Which source you choose is up to you; however, it should be substantial enough that you will be able to talk about it at length, and intricate enough that it will keep you (and your reader) interested.
In order to foster learning and growth, all essays you submit must be newly written specifically for this course. Any recycled work will be sent back with a 0, and you will be given one attempt to redo the Touchstone.
The introduction of this paper will involve introducing the source: Provide the author, the title, and the context (where you found the source, where it was originally published, who sponsored it, etc.)
You will then go on to evaluate the source on two levels:
1. Credibility: Using the information in this unit as a guide, evaluate the source’s authenticity and reliability. Look at all the information that you can find about the source to establish the author’s (or sponsor’s) trustworthiness.
2. Usefulness: Using a combination of summary and analysis, examine the source on a critical level. Determine what the source’s purpose (thesis) is, and how it arrives at that goal. Examine its value to you and the project you are working on. How will it help you prove your own points? How might it come in handy to back up a claim (or address a counter-claim)?
Finally, you will include a conclusion which shows your final assessments on both counts.
Sample Touchstone
A. Assignment Guidelines
DIRECTIONS: Refer to the list below throughout the writing process. Do not submit your Touchstone until it meets these guidelines.
1. Source Identification
The introduction of this paper will be introducing the source:
❒ Have you provided the author's name?
❒ Have you provided the source title?
❒ Have you provided the context (where you found the source, where it was originally published, who sponsored it, etc.)?
2. Source Evaluation
❒ Have you provided a judgment on the source's credibility?
❒ Have you used specific examples from the source to illustrate your judgment on credibility?
❒ Have you provided a judgment on the source's usefulness?
❒ Have you used specific examples from the source to illustrate your judgment on usefulness?
3. Reflection
❒ Have you answered all reflection questions thoughtfully and included insights, observations, and/or examples in all responses?
❒ Are your answers included on a separate page below the main assignment?
B. Reflection Questions
DIRECTIONS: Below your assignment, include answers to all of the following reflection questions.
1. What types of questions did you ask yourself when evaluating the credibility and usefulness of your source? (2-3 sentences)
2. How do you feel this evaluation practice will help you as you continue to move through the research process? (2-3 sentences)
This document presents a system for detecting semantically similar questions in online forums like Quora to reduce duplicate content. It proposes using natural language processing techniques like tagging questions with keywords, vectorizing text with Google News vectors, and calculating similarity with Word Mover's Distance. The system cleans and preprocesses questions before generating tags and calculating similarity between questions to identify duplicates. An evaluation of the system achieved accurate detection of matching and non-matching question pairs.
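A simplified sketch of the similarity step: embed each word, then compare questions with a relaxed variant of Word Mover's Distance in which each word simply travels to its nearest counterpart. The tiny hand-made 2-d embeddings stand in for the Google News vectors the system actually uses, and the relaxation is an assumption for brevity (true WMD solves an optimal-transport problem).

```python
import math

emb = {                       # toy 2-d "embeddings"
    "buy": (1.0, 0.1), "purchase": (0.9, 0.2),
    "car": (0.1, 1.0), "vehicle": (0.2, 0.9),
    "cook": (-1.0, 0.3), "pasta": (-0.8, -0.9),
}

def dist(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def relaxed_wmd(q1, q2):
    """Average distance from each word in q1 to its nearest word in q2,
    a cheap relaxation of true Word Mover's Distance."""
    w1 = [emb[w] for w in q1.split() if w in emb]
    w2 = [emb[w] for w in q2.split() if w in emb]
    if not w1 or not w2:
        return float("inf")
    return sum(min(dist(u, v) for v in w2) for u in w1) / len(w1)

same = relaxed_wmd("buy car", "purchase vehicle")
diff = relaxed_wmd("buy car", "cook pasta")
print(same < diff)  # paraphrased questions end up closer than unrelated ones
```

Question pairs whose distance falls below a threshold would be flagged as duplicates.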
The document provides an overview of qualitative research design and methods, including focus groups, depth interviews, and projective techniques. It discusses the characteristics, advantages, and disadvantages of each method. Focus groups generate ideas through group dynamics but don't allow for in-depth probing of individuals. Depth interviews enable deeper probing of individuals but lack group synergy. Projective techniques can uncover subconscious motivations but require skilled interpreters and carry risks of bias. Qualitative research aims to understand perspectives rather than quantify results.
Touchstone 2.1 Evaluate a SourceASSIGNMENT For this essay, y.docx, by novabroom
Touchstone 2.1: Evaluate a Source
ASSIGNMENT:
For this essay, you will select one of the sources you have found through your preliminary research about your research topic. Which source you choose is up to you; however, it should be substantial enough that you will be able to talk about it at length, and intricate enough that it will keep you (and your reader) interested.
The introduction of this paper will involve introducing the source: Provide the author, the title, and the context (where you found the source, where it was originally published, who sponsored it, etc.)
You will then go on to evaluate the source on two levels:
Credibility:
Using the information in this unit as a guide, evaluate the source’s authenticity and reliability. Look at all the information that you can find about the source to establish the author’s (or sponsor’s) trustworthiness.
Usefulness:
Using a combination of summary and analysis, examine the source on a critical level. Determine what the source’s purpose (thesis) is, and how it arrives at that goal. Examine its value to you and the project you are working on. How will it help you prove your own points? How might it come in handy to back up a claim (or address a counter-claim)?
Finally, you will include a conclusion which shows your final assessments on both counts.
Sample Touchstone
A. Assignment Guidelines
DIRECTIONS:
Refer to the list below throughout the writing process. Do not submit your Touchstone until it meets these guidelines.
1. Source Identification
The introduction of this paper will be introducing the source:
❒ Have you provided the author's name?
❒ Have you provided the source title?
❒ Have you provided the context (where you found the source, where it was originally published, who sponsored it, etc.)?
2. Source Evaluation
❒ Have you provided a judgment on the source's credibility?
❒ Have you used specific examples from the source to illustrate your judgment on credibility?
❒ Have you provided a judgment on the source's usefulness?
❒ Have you used specific examples from the source to illustrate your judgment on usefulness?
3. Reflection
❒ Have you answered all reflection questions thoughtfully and included insights, observations, and/or examples in all responses?
❒ Are your answers included on a separate page below the main assignment?
B. Reflection Questions
DIRECTIONS:
Below your assignment, include answers to all of the following reflection questions.
What types of questions did you ask yourself when evaluating the credibility and usefulness of your source? (2-3 sentences)
How do you feel this evaluation practice will help you as you continue to move through the research process? (2-3 sentences)
C. Rubric
Advanced (90-100%) | Proficient (80-89%) | Acceptable (70-79%) | Needs Improvement (50-69%) | Non-Performance (0-49%)
Thesis Statement
Provide a clear thesis statement with sufficient support. The thesis statement consists of an original observation that is clear, focused, ...
Essentials of Automations: The Art of Triggers and Actions in FME, by Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
The document discusses best practices for code review from three perspectives: communication, psychology, and work processes. It provides tips for effective communication in code reviews such as using clear and concise language, sharing context, and resolving discussions properly. On the psychology side, it notes the difficulty of admitting mistakes and not taking feedback personally due to factors like self-worth being tied to work. For work processes, it recommends habits like reviewing your own code first, focusing discussions, and approving pull requests once only minor issues remain. The overall goal is for code reviews to achieve quality assurance, knowledge sharing, and collective ownership in the most efficient and pleasant way possible.
Twitter has brought much attention recently as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers from tweets data often faces the data sparsity problem partly due to the large variety of short and irregular forms introduced to tweets because of the 140-character limit. In this work we propose using two different sets of features to alleviate the data sparseness problem. One is the semantic feature set where we extract semantically hidden concepts from tweets and then incorporate them into classifier training through interpolation. Another is the sentiment-topic feature set where we extract latent topics and the associated topic sentiment from tweets, then augment the original feature space with these sentiment-topics. Experimental results on the Stanford Twitter Sentiment Dataset show that both feature sets outperform the baseline model using unigrams only. Moreover, using semantic features rivals the previously reported best result. Using sentiment-topic features achieves 86.3% sentiment classification accuracy, which outperforms existing approaches.
Our method ranks Twitter conversations by giving importance scores to tweets based on their position in conversations and the participating users. We filter tweets, create a word index considering SMS language, and score tweets based on term frequency and the tweet and user scores. Experimental settings included removing or adding constraints like stop words and duplicate words. Results were subjective as accurately evaluating conversations is difficult without contextual knowledge. Future work could infer conversation context and use precision/recall by programmatically tagging tweets.
Presented at: The 6th International Workshop on Refactoring (IWoR 2022)
Date of Workshop: 14 October 2022
Conference Location: Oakland Center, Michigan, USA
Analysing and Reporting Qualitative Data.pdf (Sarah Pollard)
This document provides guidance on analysing qualitative data. It discusses four main stages of qualitative analysis: 1) reading transcripts to become familiar with context, 2) identifying recurring themes, 3) organizing themes into sections and dividing text among themes, and 4) writing conclusions by discussing themes and their interactions. It also addresses frequently asked questions about qualitative analysis, such as using quotes to support interpretations, cleaning quotes, shortening extracts, and using identifiers to link quotes to respondents.
The document describes a thesis that aims to develop a trust-aware recommender system for Twitter. It introduces recommender systems and discusses computing trust through direct interactions and propagating trust through random walks. It then describes building a Twitter crawler and prototype recommender system that scores tweets based on trust and content. Experiments analyze trust properties in Twitter and benchmark the recommender system. The thesis contributes a trust metric, crawler, and initial recommender system while finding that trust models may be useful but do not exhibit transitivity in Twitter.
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T... (Traian Rebedea)
- PolyCAFe is an automatic assessment tool that analyzes collaborative chat conversations and provides feedback without requiring tutor time.
- It uses natural language processing, social network analysis, and information retrieval techniques to evaluate utterances, measure collaboration, and generate feedback.
- An experiment found that tutors could prepare feedback 35% faster with PolyCAFe and that students found its feedback useful, accurate, and helpful for their learning.
How Anonymous Can Someone be on Twitter? (George Sam)
The document describes a study that aimed to identify Twitter users based on their writing styles in tweets. The researchers collected over 3 million tweets from 100 users and extracted features such as word counts and frequencies. They were able to correctly identify the author of tweets 67% of the time using tf-idf vectors and the extracted stylistic features.
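The tf-idf plus stylistic-features pipeline described above can be sketched as follows. This is a minimal pure-Python illustration with invented toy users and tweets (the study's 100 users, 3 million tweets, and exact feature set are not reproduced): it attributes a tweet by cosine similarity of its tf-idf vector to each user's training tweets.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute tf-idf vectors (dicts) for a list of token lists."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}  # smoothed idf
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({w: (tf[w] / len(doc)) * idf[w] for w in tf})
    return vecs, idf

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented training tweets for two hypothetical users with distinct styles
train = [
    ("alice", "omg lol this movie is so cool lol".split()),
    ("alice", "lol cant even omg so funny".split()),
    ("bob",   "the quarterly report shows steady market growth".split()),
    ("bob",   "market analysis suggests steady growth this quarter".split()),
]
vecs, idf = tfidf_vectors([doc for _, doc in train])

def predict(tokens):
    """Attribute a tweet to the user with the most similar training tweet."""
    tf = Counter(tokens)
    q = {w: (tf[w] / len(tokens)) * idf.get(w, 0.0) for w in tf}
    scores = {}
    for (user, _), v in zip(train, vecs):
        scores[user] = max(scores.get(user, 0.0), cosine(q, v))
    return max(scores, key=scores.get)

print(predict("omg lol so cool".split()))                # likely "alice"
print(predict("steady market growth expected".split()))  # likely "bob"
```

A real system would add the stylistic features the study mentions (word counts and frequencies) alongside the tf-idf terms.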
Themes identification techniques in qualitative research (Ghulam Qambar)
The document discusses qualitative data analysis techniques for identifying themes in text. It describes codes, categories, and themes, with themes representing major elements or concepts in the analyzed text. Several techniques for identifying themes are outlined, including analyzing word repetitions, comparing and contrasting text, examining linguistic features like metaphors and transitions, and physically manipulating text through cutting and sorting. The selection of appropriate techniques depends on the type of data, required skills and labor, and desired number/types of themes to ensure validity and reliability of the analysis.
The document discusses test-driven development (TDD). TDD involves writing tests before writing code to specify desired functionality, then writing just enough code to pass each test. This process of red-green-refactor evolves the system design through tests. TDD results in decoupled, well-designed code and provides benefits like confidence, documentation, and safe refactoring. The document also provides tips for writing better tests, such as keeping tests small, expressive, isolated, and automated.
The document discusses various qualitative research methods including focus groups, depth interviews, and projective techniques. It provides descriptions of these methods, how they differ, their advantages and disadvantages. Focus groups involve group discussion and interaction, depth interviews probe individuals in-depth, and projective techniques seek to uncover subconscious motivations through indirect questioning techniques.
Building a multi-headed model that's capable of detecting different types of toxicity like threats, obscenity, insults and identity-based hate. Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to efficiently facilitate conversations, leading many communities to limit or completely shut down user comments. So far we have a range of publicly available models served through the Perspective APIs, including toxicity. But the current models still make errors, and they don't allow users to select which type of toxicity they're interested in finding. Pallam Ravi | Hari Narayana Batta | Greeshma S | Shaik Yaseen, "Toxic Comment Classification", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23464.pdf
Paper URL: https://www.ijtsrd.com/computer-science/other/23464/toxic-comment-classification/pallam-ravi
Assignment 1: Discussion—Developing Trust
Communicating ethically to employees as well as other internal and external stakeholders is of the utmost importance for leaders and managers. Being able to build and maintain trust in order to be credible is essential.
Using the module readings, the online library resources, and the Internet, research ways of developing trust.
Respond to the following:
Explain how leaders and managers can overcome obstacles and develop trust in communicating corporate ethics.
By Wednesday, July 1, 2015, post your response to the appropriate Discussion Area. Through Wednesday, July 8, 2015, review and comment on at least two peers' responses.
Write your initial response in 300–500 words. Your response should be thorough and address all components of the discussion question in detail, include citations of all sources, where needed, according to APA Style, and demonstrate accurate spelling, grammar, and punctuation.
Do the following when responding to your peers:
Read your peers’ answers.
Provide substantive comments by
contributing new, relevant information from course readings, Web sites, or other sources;
building on the remarks or questions of others; or
sharing practical examples of key concepts from your professional or personal experiences
Respond to feedback on your posting and provide feedback to other students on their ideas.
Make sure your writing
is clear, concise, and organized;
demonstrates ethical scholarship in accurate representation and attribution of sources; and
displays accurate spelling, grammar, and punctuation.
Grading Criteria
Initial response (16 points): insightful, original, accurate, and timely; substantive and demonstrated advanced understanding of concepts; compiled/synthesized theories and concepts drawn from a variety of sources to support statements and conclusions.
Discussion response and participation (16 points): responded to a minimum of two peers in a timely manner; offered points of view supported by research; asked challenging questions that promoted discussion; drew relationships between one or more points in the discussion.
Writing (8 points): wrote in a clear, concise, formal, and organized manner; responses were error free; information from sources, where applicable, was paraphrased appropriately and accurately cited.
Total: 40 points
Learning Analytics for Online Discussions: A Pedagogical Model for Intervent... (alywise)
This document summarizes Alyssa Friend Wise's pedagogical model for using learning analytics to intervene in online discussions. The model focuses on capturing meaningful traces of student activity through metrics of "speaking" and "listening". It presents analytics to students in an extracted table format and embedded in the discussion interface. The model supports interpretation through a framework that emphasizes integration with learning activities, agency in interpretation, reflection, and dialogue between students and instructors. An initial implementation in a doctoral seminar found students engaged productively with the model.
Need help with this assignment: Preliminary research is attached w.docx (gibbonshay)
Need help with this assignment
Preliminary research is attached with sources. Also a sample is attached to give you an idea of how it should be formatted.
Evaluate a Source
ASSIGNMENT: For this essay, you will select one of the sources you have found through your preliminary research about your research topic. Which source you choose is up to you; however, it should be substantial enough that you will be able to talk about it at length, and intricate enough that it will keep you (and your reader) interested.
In order to foster learning and growth, all essays you submit must be newly written specifically for this course. Any recycled work will be sent back with a 0, and you will be given one attempt to redo the Touchstone.
The introduction of this paper will involve introducing the source: Provide the author, the title, and the context (where you found the source, where it was originally published, who sponsored it, etc.)
You will then go on to evaluate the source on two levels:
1. Credibility: Using the information in this unit as a guide, evaluate the source’s authenticity and reliability. Look at all the information that you can find about the source to establish the author’s (or sponsor’s) trustworthiness.
2. Usefulness: Using a combination of summary and analysis, examine the source on a critical level. Determine what the source’s purpose (thesis) is, and how it arrives at that goal. Examine its value to you and the project you are working on. How will it help you prove your own points? How might it come in handy to back up a claim (or address a counter-claim)?
Finally, you will include a conclusion which shows your final assessments on both counts.
Sample Touchstone
A. Assignment Guidelines
DIRECTIONS: Refer to the list below throughout the writing process. Do not submit your Touchstone until it meets these guidelines.
1. Source Identification
The introduction of this paper will be introducing the source:
❒ Have you provided the author's name?
❒ Have you provided the source title?
❒ Have you provided the context (where you found the source, where it was originally published, who sponsored it, etc.)?
2. Source Evaluation
❒ Have you provided a judgment on the source's credibility?
❒ Have you used specific examples from the source to illustrate your judgment on credibility?
❒ Have you provided a judgment on the source's usefulness?
❒ Have you used specific examples from the source to illustrate your judgment on usefulness?
3. Reflection
❒ Have you answered all reflection questions thoughtfully and included insights, observations, and/or examples in all responses?
❒ Are your answers included on a separate page below the main assignment?
B. Reflection Questions
DIRECTIONS: Below your assignment, include answers to all of the following reflection questions.
1. What types of questions did you ask yourself when evaluating the credibility and usefulness of your source? (2-3 sentences)
2. How do you feel this evaluation practice will help you as you continue to move through the research process? (2-3 sentences)
This document presents a system for detecting semantically similar questions in online forums like Quora to reduce duplicate content. It proposes using natural language processing techniques like tagging questions with keywords, vectorizing text with Google News vectors, and calculating similarity with Word Mover's Distance. The system cleans and preprocesses questions before generating tags and calculating similarity between questions to identify duplicates. An evaluation of the system achieved accurate detection of matching and non-matching question pairs.
The document provides an overview of qualitative research design and methods, including focus groups, depth interviews, and projective techniques. It discusses the characteristics, advantages, and disadvantages of each method. Focus groups generate ideas through group dynamics but don't allow for in-depth probing of individuals. Depth interviews enable deeper probing of individuals but lack group synergy. Projective techniques can uncover subconscious motivations but require skilled interpreters and carry risks of bias. Qualitative research aims to understand perspectives rather than quantify results.
Touchstone 2.1 Evaluate a SourceASSIGNMENT For this essay, y.docx (novabroom)
Touchstone 2.1: Evaluate a Source
ASSIGNMENT:
For this essay, you will select one of the sources you have found through your preliminary research about your research topic. Which source you choose is up to you; however, it should be substantial enough that you will be able to talk about it at length, and intricate enough that it will keep you (and your reader) interested.
The introduction of this paper will involve introducing the source: Provide the author, the title, and the context (where you found the source, where it was originally published, who sponsored it, etc.)
You will then go on to evaluate the source on two levels:
Credibility:
Using the information in this unit as a guide, evaluate the source’s authenticity and reliability. Look at all the information that you can find about the source to establish the author’s (or sponsor’s) trustworthiness.
Usefulness:
Using a combination of summary and analysis, examine the source on a critical level. Determine what the source’s purpose (thesis) is, and how it arrives at that goal. Examine its value to you and the project you are working on. How will it help you prove your own points? How might it come in handy to back up a claim (or address a counter-claim)?
Finally, you will include a conclusion which shows your final assessments on both counts.
Sample Touchstone
A. Assignment Guidelines
DIRECTIONS:
Refer to the list below throughout the writing process. Do not submit your Touchstone until it meets these guidelines.
1. Source Identification
The introduction of this paper will be introducing the source:
❒ Have you provided the author's name?
❒ Have you provided the source title?
❒ Have you provided the context (where you found the source, where it was originally published, who sponsored it, etc.)?
2. Source Evaluation
❒ Have you provided a judgment on the source's credibility?
❒ Have you used specific examples from the source to illustrate your judgment on credibility?
❒ Have you provided a judgment on the source's usefulness?
❒ Have you used specific examples from the source to illustrate your judgment on usefulness?
3. Reflection
❒ Have you answered all reflection questions thoughtfully and included insights, observations, and/or examples in all responses?
❒ Are your answers included on a separate page below the main assignment?
B. Reflection Questions
DIRECTIONS:
Below your assignment, include answers to all of the following reflection questions.
What types of questions did you ask yourself when evaluating the credibility and usefulness of your source? (2-3 sentences)
How do you feel this evaluation practice will help you as you continue to move through the research process? (2-3 sentences)
C. Rubric
Advanced (90-100%) | Proficient (80-89%) | Acceptable (70-79%) | Needs Improvement (50-69%) | Non-Performance (0-49%)
Thesis Statement
Provide a clear thesis statement with sufficient support.The thesis statement consists of an original observation that is clear, focused, ...
Similar to Searching for Quality Microblog Posts: Filtering and Ranking based on Content Analysis and Implicit Links (20)
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Infrastructure Challenges in Scaling RAG with Custom AI models (Zilliz)
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 (Neo4j)
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, what a Lego brick and the XZ backdoor have in common might be that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and training. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (which is where her nickname deneb_alpha comes from).
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Searching for Quality Microblog Posts: Filtering and Ranking based on Content Analysis and Implicit Links
1. Searching for Quality Microblog Posts: Filtering and Ranking based on Content Analysis and Implicit Links
Jan Vosecky, Kenneth Wai-Ting Leung, and Wilfred Ng
Department of Computer Science and Engineering
HKUST
Hong Kong
DASFAA’12
2. Introduction Method Features Experiments Conclusions
Agenda
Introduction
Proposed method
Quality features of tweets
Experiments
Conclusions
3. Introduction
4. Microblogs
[Annotated screenshot: two example tweets, with labels for user, mentioned user, timestamp, hashtag and URL link]
Both social network and social media
Links between users (follow, mention, re-tweet)
Users post updates (tweets)
5. Searching for “ipad” on Twitter
Around 50 tweets mentioning “iPad” posted within a 1-minute period
6. Research challenge
Twitter: user-generated content
Short messages, often comments or opinions
High volume
Varying quality
“Most tweets are not of general interest (57%)” (Alonso et al.’10)
Information overload
Research questions:
How to distinguish content worth reading from useless or less important messages?
How to promote ‘high quality’ content?
10. Research goals
Quality-based tweet filtering
Filtering out low-quality tweets
In twitter feeds
In search results
Quality-based tweet ranking
Re-ranking Twitter search results
For a given time period
11. Proposed Method
12. Representation of tweets
Vector-space model: not sufficient
Short tweet length, terms often malformed
Ignores special features in Twitter
Feature-vector representation
Extract features from tweet
Traditional features: e.g. length, spelling
Twitter-specific features:
Exploiting hashtags, URL links, mentioned usernames
13. Quality Features of Tweets
14. Feature categories
1. Punctuation and Spelling: number of exclamation marks; number of question marks; max. no. of repeated letters; % of correctly spelled words; no. of capitalized words; max. no. of consecutive capitalized words
2. Syntactic and semantic complexity: max. & avg. word length; length of tweet; percentage of stopwords; contains numbers; contains a measure; contains emoticons; uniqueness score
3. Grammaticality: has first-person part-of-speech; formality score; number of proper names; max. no. of consecutive proper names; number of named entities
4. Link-based: contains link; is reply-tweet; is re-tweet; no. of mentions of users; number of hashtags; URL domain reputation score; RT source reputation score; hashtag reputation score
5. Timestamp
15. Punctuation and spelling
Excessive punctuation
Number of exclamation marks
Number of question marks
Max. number of consecutive dots
Capitalization
Presence of all-capitalized words
Largest number of consecutive words in capital letters
Spellchecking
Number of correctly spelled words
Percentage of words found in a dictionary
RT @_ChocolateCoco: WHO IS CHUCK NORRIS??!!!??
lls. He's only the greatest guy next to jesus lmao
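The punctuation and capitalization features above can be computed with a few lines of string handling. A sketch, run on the example tweet from the slide; the paper's exact tokenization and feature definitions may differ.

```python
import re

def punctuation_features(tweet):
    """Sketch of the punctuation/capitalization features listed above."""
    words = tweet.split()
    # Strip surrounding punctuation to judge capitalization per word
    bare = [w.strip(".,!?:;\"'") for w in words]
    caps = [w.isupper() and len(w) > 1 for w in bare]
    # Longest run of consecutive all-caps words
    run = best = 0
    for c in caps:
        run = run + 1 if c else 0
        best = max(best, run)
    return {
        "exclamation_marks": tweet.count("!"),
        "question_marks": tweet.count("?"),
        "max_consecutive_dots": max((len(m) for m in re.findall(r"\.+", tweet)), default=0),
        "max_repeated_letters": max((len(m.group()) for m in re.finditer(r"([A-Za-z])\1*", tweet)), default=0),
        "all_caps_words": sum(caps),
        "max_consecutive_caps": best,
    }

f = punctuation_features(
    "RT @_ChocolateCoco: WHO IS CHUCK NORRIS??!!!?? "
    "lls. He's only the greatest guy next to jesus lmao"
)
print(f)  # e.g. 3 exclamation marks, 4 question marks, 4 consecutive caps words
```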
16. Syntactic and semantic complexity
Syntactic complexity
Tweet length
Max. & avg. word length
Percentage of stopwords
Presence of emoticons and other sentiment indicators
Presence of measure symbols ($, %)
Numbers – number of digits
Tweet uniqueness
Uniqueness of the tweet relative to other tweets by the author
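The uniqueness formula itself did not survive the slide extraction; as an illustration only, here is a hypothetical Jaccard-based stand-in that scores a tweet against the author's other tweets (not the paper's actual definition).

```python
def uniqueness(tweet, other_tweets):
    """Hypothetical uniqueness score: 1 minus the highest word overlap
    (Jaccard similarity) with any other tweet by the same author."""
    words = set(tweet.lower().split())
    if not other_tweets or not words:
        return 1.0
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    return 1.0 - max(jaccard(words, set(t.lower().split())) for t in other_tweets)

history = ["good morning twitter", "good morning everyone have a nice day"]
print(uniqueness("good morning twitter", history))            # exact repeat -> 0.0
print(uniqueness("reading about microblog search", history))  # no overlap -> 1.0
```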
17. Grammaticality
Parts-of-speech labelling
Presence of first person parts-of-speech
Formality score [Heylighen’02]:
F = (noun freq. + adjective freq. + preposition freq. + article freq. − pronoun freq. − verb freq. − adverb freq. − interjection freq. + 100) / 2
Names
Number of ‘proper names’ as words with a single initial capital letter
Number of consecutive ‘proper names’
Number of named entities
F. Heylighen and J.-M. Dewaele. Variation in the contextuality of language: An empirical measure. Context in Context. Special issue Foundations of Science, 7(3):293–340, 2002.
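The formality score above translates directly into code. A sketch assuming the part-of-speech counts are already available (a real system would obtain them from a POS tagger); frequencies are percentages of all tokens, per Heylighen and Dewaele.

```python
def formality_score(pos_counts):
    """Heylighen & Dewaele F-score. pos_counts maps POS category -> token
    count; each frequency is taken as a percentage of all tokens."""
    total = sum(pos_counts.values())
    freq = lambda tag: 100.0 * pos_counts.get(tag, 0) / total
    return (freq("noun") + freq("adjective") + freq("preposition") + freq("article")
            - freq("pronoun") - freq("verb") - freq("adverb") - freq("interjection")
            + 100.0) / 2.0

# "The cat sat on the mat" -> 2 articles, 2 nouns, 1 verb, 1 preposition
print(formality_score({"article": 2, "noun": 2, "verb": 1, "preposition": 1}))
```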
18. Link-based features
Links to other items
Re-tweet(RT), reply tweet, mention of other users
Presence of a URL link
Number of hashtags as indicated by the “#” sign
Link target’s quality reputation
Metrics to reflect the quality of tweets which relate to a URL domain, a hashtag, or a user
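The simpler link-based features can be extracted with regular expressions. A sketch; the example tweet and regexes are illustrative, not the paper's implementation.

```python
import re

def link_features(tweet):
    """Sketch of link-based features: re-tweet/reply markers, URL presence,
    hashtag and user-mention counts."""
    return {
        "is_retweet": tweet.startswith("RT @"),
        "is_reply": tweet.startswith("@"),
        "contains_link": bool(re.search(r"https?://\S+", tweet)),
        "num_hashtags": len(re.findall(r"#\w+", tweet)),
        "num_mentions": len(re.findall(r"@\w+", tweet)),
    }

f = link_features("RT @wilfred: quality search on #Twitter http://cse.ust.hk #DASFAA")
print(f)
```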
19. URL domain reputation
Observation: tweets which link to news articles are usually of better quality than tweets which link to photo-sharing websites
[Diagram: six tweets with quality labels Q=1 to Q=5 linking to two domains; the lower-quality tweets link to Tweetpic.com, the higher-quality tweets link to NYtimes.com]
Questions:
What does the quality of tweets linking to a website say about its quality?
Can we predict the quality of future tweets linking to that website?
20. URL domain reputation
Step 1: URL translation
Short link to original link: bit.ly/e2jt9F → http://www.reuters.com/4151120
Step 2: summarize tweets linking to a URL domain
Accumulate “quality reputation” over time
21. URL domain reputation
Average URL domain quality: AvgQ(d) = (1 / |Td|) · Σ_{t ∈ Td} qt
Td = set of tweets linking to domain d
qt = quality label of tweet t
Weakness:
Does not reflect the number of inlink tweets in the score
Favours domains with few inlink tweets
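The AvgQ computation can be sketched as follows, with Step 1's short-link expansion assumed already done (quality labels and URLs below are invented). The weakness is visible in the output: a domain with a single inlink tweet gets a score resting on far less evidence than a domain with many.

```python
from collections import defaultdict
from urllib.parse import urlparse

def average_domain_quality(labeled_tweets):
    """AvgQ(d): mean quality label of tweets linking to each domain.
    labeled_tweets is a list of (quality_label, expanded_url) pairs."""
    by_domain = defaultdict(list)
    for quality, url in labeled_tweets:
        by_domain[urlparse(url).netloc].append(quality)
    return {d: sum(q) / len(q) for d, q in by_domain.items()}

tweets = [
    (5, "http://www.reuters.com/4151120"),
    (4, "http://www.reuters.com/4151200"),
    (1, "http://tweetpic.com/abc"),       # single inlink tweet
]
avgq = average_domain_quality(tweets)
print(avgq)
```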
22. URL domain reputation
Domain reputation score
where AvgQ(d) is between [-1, +1]
“Collecting evidence” behaviour:
Score getting higher with more good quality inlink tweets
[Plot: DRS (from −4.00 to 4.00) as a function of |Td| (1 to 1000, log scale), with one curve per AvgQ value from −1 to +1]
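The DRS formula itself did not survive the slide extraction, so the following is explicitly an assumed form, not the paper's: scaling AvgQ (rescaled to [−1, +1]) by the log of the inlink count reproduces the described "collecting evidence" behaviour, where more good-quality inlink tweets push the score higher.

```python
import math

def domain_reputation(avg_q, num_inlinks):
    """Hypothetical DRS sketch (assumed formula, not from the paper):
    DRS = AvgQ_scaled * log10(1 + |Td|), where AvgQ_scaled maps the
    1-5 average quality onto [-1, +1]."""
    avg_scaled = (avg_q - 3.0) / 2.0   # map [1, 5] -> [-1, +1]
    return avg_scaled * math.log10(1 + num_inlinks)

print(domain_reputation(5.0, 10))    # good domain, some evidence
print(domain_reputation(5.0, 1000))  # good domain, lots of evidence -> higher
print(domain_reputation(1.0, 1000))  # poor domain, lots of evidence -> lower
```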
24. Reputation of hashtag & user
[Diagram: six tweets with quality labels Q=1 to Q=5 linking to two hashtags; the lower-quality tweets link to #justforfun, the higher-quality tweets link to #DASFAA]
Hashtag reputation: #DASFAA vs. #justforfun
Re-tweet source user reputation: @barackobama vs. @wysz22212
25. Experiments
26. Dataset
10,000 tweets
100 users, 100 recent tweets per user
Users:
50 random users
50 influential users
Selected from listorious.com
5 categories: technology, business, politics,
celebrities, activism
10 users per category
27. Labelling
Crowdsourcing
Amazon Mechanical Turk
3 labels per tweet from different reviewers
Possible labels: 1 to 5
1 = low quality, 5 = high quality
Random order of tweets
28. Labelling results
[Chart: tweet quality distribution, with the quality score computed from the three reviewer labels per tweet]
29. Feature analysis
Total: 29 features
Top 5 features by Information Gain:
Domain reputation: 0.374
Contains link: 0.287
Formality score: 0.130
Num. proper names: 0.127
Max. proper names: 0.113
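Information Gain, the criterion behind this ranking, is the drop in class entropy once a feature's value is known; a small sketch for discrete features:

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(labels, feature):
    # IG = H(class) - H(class | feature), for discrete feature values.
    n = len(labels)
    gain = entropy(labels)
    for v in set(feature):
        subset = [l for l, f in zip(labels, feature) if f == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# A feature that perfectly separates high/low quality has IG = H(class);
# an uninformative one has IG = 0.
print(information_gain([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
print(information_gain([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0
```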
Feature selection
Greedy attribute selection
15 selected features:
Domain reputation, RT source reputation, Formality, Tweet uniqueness,
No. named entities, % correctly spelled words, Max. no. repeated letters,
No. hash-tags, Contains numbers, No. capitalized words, Is reply-tweet,
Is re-tweet, Avg. word length, Contains first-person, No. exclamation marks
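Greedy (forward) attribute selection repeatedly adds whichever remaining feature improves an evaluation function the most, stopping when nothing helps; the evaluator below is a toy stand-in for cross-validated classifier accuracy:

```python
def greedy_select(features, evaluate):
    # Forward selection: grow the subset one feature at a time,
    # always taking the feature with the highest evaluated score,
    # and stop when no addition improves on the current best.
    selected, best = [], evaluate([])
    remaining = list(features)
    while remaining:
        score, feature = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected

# Toy evaluator (stand-in for classifier accuracy): counts how many
# genuinely useful features the subset contains.
useful = {"domain_reputation", "formality"}
chosen = greedy_select(["domain_reputation", "formality", "noise"],
                       lambda subset: len(set(subset) & useful))
print(sorted(chosen))  # ['domain_reputation', 'formality']
```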
Classification and Ranking
Method
Classification:
SVM, binary classification (high-quality vs. low-quality)
50/50 split for training/testing
Ranking:
Learning-to-rank (Rank SVM)
30 queries from 5 topic categories
Process:
1. Retrieve tweets matching a query
2. Extract features from the tweets
3. “Query-tweet vector” pairs + quality scores of the tweets used for training
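The classification protocol (binary high/low quality labels, 50/50 train/test split, precision and recall per class) can be sketched end to end; a one-feature threshold rule stands in for the SVM, and the data are synthetic:

```python
import random

def precision_recall(y_true, y_pred, positive=1):
    # Per-class precision and recall, as reported in the results table.
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Synthetic one-feature dataset: a "domain reputation" value per tweet,
# with the quality label derived from it (no noise, for illustration).
random.seed(0)
reputation = [random.uniform(-1, 1) for _ in range(200)]
labels = [1 if x > 0 else 0 for x in reputation]

# 50/50 split for training/testing, as in the experiments.
train_x, test_x = reputation[:100], reputation[100:]
train_y, test_y = labels[:100], labels[100:]

# Stand-in for the SVM: classify by thresholding the single feature.
predictions = [1 if x > 0 else 0 for x in test_x]
p, r = precision_recall(test_y, predictions)
print(p, r)  # 1.0 1.0 on this noiseless toy data
```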
Classification results
Features                    #attrs   HQ Prec  HQ Rec  LQ Prec  LQ Rec  AUC
Link only                        1   0.798    0.702   0.894    0.934   0.818
TF-IDF                        3322   0.862    0.665   0.885    0.960   0.813
Subset.Reputation                3   0.812    0.746   0.909    0.936   0.841
Subset.SVM (“greedy”)           15   0.715    0.758   0.912    0.936   0.847
All quality features            29   0.815    0.660   0.882    0.944   0.802
All quality ftrs + TF-IDF     3351   0.739    0.775   0.915    0.899   0.837
Optimal feature set (15 attrs.) outperforms TF-IDF (3322 attrs.)
Link-based “reputation” features (3 attrs.) achieve the 2nd best result
Combining quality features + TF-IDF does not improve result
Classification results (cont.)
[Chart: AUC vs. training time and storage cost for each feature set]
The optimal feature set achieves reduced training time and storage cost.
Ranking results
NDCG@N = DCG@N / IDCG@N,
where DCG@N = Σ_{i=1..N} (2^rel_i − 1) / log2(i + 1)
Features                #attrs  NDCG@1  NDCG@2  NDCG@5  NDCG@10  MAP
Link only                    1   0.067   0.111   0.220    0.324  0.398
Subset.Reputation            3   0.822   0.777   0.777    0.764  0.661
Subset.SVM (“greedy”)       15   0.867   0.767   0.778    0.769  0.653
All quality features        29   0.733   0.733   0.763    0.753  0.637
Optimal feature set (15 attrs.) achieves the best result
Link-based “reputation” features (3 attrs.) achieve the 2nd best result
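NDCG@N, the ranking metric above, compares a ranking's discounted gain against that of the ideal ordering; a sketch using the common exponential-gain form:

```python
import math

def dcg(relevances, n):
    # Discounted cumulative gain over the top-n positions.
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:n]))

def ndcg(relevances, n):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(relevances, reverse=True), n)
    return dcg(relevances, n) / ideal if ideal else 0.0

perfect = [3, 2, 1]    # quality scores in ranked order
reversed_ = [1, 2, 3]
print(ndcg(perfect, 3))          # 1.0
print(ndcg(reversed_, 3) < 1.0)  # True
```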
Conclusions
Summary
Method for quality-based classification and ranking of tweets
Proposed and evaluated a set of tweet features to capture tweet quality
Link-based features lead to the best performance
Future work
Consider different types of queries in Twitter
E.g. searching for hot topics, movie reviews, facts, opinions, etc.
Different features may be important in different scenarios
Incorporating recent hot topics
Personalized re-ranking
Q/A
Thank You
Related work
Spam detection
Bag-of-words, keyword-based
Feature-based approaches
Combinations
Social networks
Finding quality answers in Q-A systems
E.g. Yahoo Answers
Feature-based
Web search
Quality-based ranking of web documents
Feature-based quality score (WSDM'11)
ROC Curve
Area under the ROC curve: the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
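That probabilistic definition translates directly into code: AUC is the fraction of (positive, negative) pairs the scorer orders correctly, counting ties as half:

```python
def auc(scores, labels):
    # AUC via its pairwise definition: the probability that a random
    # positive instance outscores a random negative one (ties = 0.5).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0 (perfect separation)
print(auc([0.4, 0.9, 0.8, 0.5], [1, 0, 1, 0]))  # 0.25
```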