This document discusses user-generated content on social media and the challenges and opportunities it presents. It focuses on analyzing content at the micro level by examining named entities, topics, intentions, and word usage. It presents approaches for identifying cultural entities, determining user intentions, and measuring the complexity of extracting entities from different contexts. Analyzing user intentions by identifying patterns surrounding named entities can help recognize information-seeking, sharing, and transactional intents, with applications such as targeted online advertising.
Video of the talk: https://www.youtube.com/watch?v=7k-u_TUew3o
Abstract: Social media has experienced immense growth in recent times. These platforms are becoming increasingly common venues for information seeking and consumption, and with this growing popularity, information overload poses a significant challenge to users. For instance, Twitter alone generates around 500 million tweets per day, and it is impractical for users to parse through such an enormous stream to find information that is interesting to them. This situation necessitates efficient personalized filtering mechanisms that let users consume relevant, interesting information from social media.
Building a personalized filtering system involves understanding users' interests and utilizing those interests to deliver relevant information. These tasks primarily involve analyzing and processing social media text, which is challenging due to its short length and the real-time nature of the medium. The challenges include: (1) Lack of semantic context: social media posts are short on average, which provides limited semantic context for textual analysis. This is particularly detrimental for topic identification, a necessary task for mining users' interests. (2) Dynamically changing vocabulary: most social media websites, such as Twitter and Facebook, generate posts on topics of current (timely) interest to users. Due to this real-time nature, information relevant to dynamic topics of interest evolves to reflect changes in the real world. This in turn changes the vocabulary associated with these topics, making it harder to filter relevant information. (3) Scalability: the number of users on social media platforms is very large, making it difficult for centralized systems to scale to deliver relevant information to every user. This dissertation is devoted to exploring semantic techniques and Semantic Web technologies to address the above challenges in building a personalized information filtering system for social media. In particular, the necessary semantics (knowledge) is derived from crowd-sourced knowledge bases such as Wikipedia to improve the context available for understanding short text and dynamic topics on social media.
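The Wikipedia-derived semantics described above can be illustrated as a context-expansion step that runs before interest matching. The category map below is an invented stand-in for knowledge mined from Wikipedia; a real system would link entities against the live knowledge base rather than a hard-coded dictionary.

```python
# Sketch of the context-enrichment idea: expand a short post with
# knowledge-base semantics before matching it against user interests.
# WIKI_CATEGORIES is a toy stand-in for knowledge mined from Wikipedia.
WIKI_CATEGORIES = {
    "nadal": {"tennis", "sports", "french open"},
    "wimbledon": {"tennis", "sports", "tournaments"},
}

def enrich(post):
    """Expand a post's tokens with categories of any linked entities."""
    tokens = {t.strip("!?.,").lower() for t in post.split()}
    expanded = set(tokens)
    for token in tokens:
        expanded |= WIKI_CATEGORIES.get(token, set())
    return expanded

def matches_interest(post, interest):
    """A post is relevant if the interest appears in its enriched context."""
    return interest.lower() in enrich(post)

# "Nadal wins!" never mentions tennis, but the enriched context recovers it
print(matches_interest("Nadal wins!", "tennis"))  # True
```

The point of the sketch is that the short post alone carries too little semantic context to match the interest "tennis"; the knowledge base supplies the missing link.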
Prateek Jain's dissertation defense, Kno.e.sis, Wright State University
The recent emergence of the “Linked Data” approach for publishing data represents a major step forward in realizing the original vision of a web that can "understand and satisfy the requests of people and machines to use the web content" – i.e., the Semantic Web. This new approach has resulted in the Linked Open Data (LOD) Cloud, which includes more than 70 large datasets contributed by experts from diverse communities such as geography, entertainment, and life sciences. However, the current interlinks between datasets in the LOD Cloud – as we will illustrate – are too shallow to realize many of the promised benefits. If this limitation is left unaddressed, the LOD Cloud will merely be more data suffering from the same kinds of problems that plague the Web of Documents, and the vision of the Semantic Web will fall short.
This thesis presents a comprehensive solution to the problems of alignment and relationship identification using a bootstrapping-based approach. By alignment we mean the process of determining correspondences between the classes and properties of ontologies. We identify subsumption, equivalence, and part-of relationships between classes; part-of relationships between instances; and subsumption and equivalence relationships between properties. By bootstrapping we mean utilizing the information contained within the datasets themselves to improve the data within them. The work showcases the use of bootstrapping-based methods to identify and create richer relationships between LOD datasets. The BLOOMS project (http://wiki.knoesis.org/index.php/BLOOMS) and the PLATO project, both built as part of this research, have provided evidence of the feasibility and applicability of the solution.
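The bootstrapping idea behind class alignment can be sketched in miniature: relate two ontology classes by comparing category sets harvested from a crowd-sourced source. The sets below are invented for illustration; BLOOMS itself derives such evidence from the Wikipedia category hierarchy with considerably more machinery.

```python
# Toy sketch of bootstrapped class alignment: decide how two classes
# relate from the overlap of their (invented) category sets.
CATEGORY_SETS = {
    "Person": {"people", "humans"},
    "Author": {"people", "humans", "writers"},
    "Writer": {"people", "humans", "writers"},
    "Place":  {"locations", "geography"},
}

def align(a, b):
    """Return a coarse relationship between two classes from set overlap."""
    ca, cb = CATEGORY_SETS[a], CATEGORY_SETS[b]
    if ca == cb:
        return "equivalent"
    if ca < cb:                      # a's categories are a strict subset,
        return f"{a} subsumes {b}"   # so a is the broader class
    if cb < ca:
        return f"{b} subsumes {a}"
    return "overlap" if ca & cb else "unrelated"

print(align("Person", "Author"))  # Person subsumes Author
print(align("Author", "Writer"))  # equivalent
```

The key property of the approach is that the evidence for the alignment comes from the data itself (here the category sets), not from a hand-written mapping.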
Kalpa Gunaratna's Ph.D. dissertation defense: April 19, 2017
The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress of the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer, and add rich, explicit semantics on top of that layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of it is called the schema (ontology), where the relationships of the entity descriptions, their classes, and the hierarchies of those relationships and classes are defined. Today, large knowledge graphs exist in the research community (e.g., encyclopedic datasets like DBpedia and Yago) and in the corporate world (e.g., the Google knowledge graph) that encapsulate a large amount of knowledge for human and machine consumption. Typically, they consist of millions of entities and billions of facts describing these entities. While it is good to have this much knowledge available on the Web, it also leads to information overload, and hence proper summarization (and presentation) techniques need to be explored.
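The two layers described above can be made concrete with a minimal triple-based sketch: a data layer of entity facts, a schema layer of class hierarchy, and a simple rule that infers additional knowledge from the two. The entities and classes are invented examples, not drawn from any particular knowledge graph.

```python
# Minimal rendering of a knowledge graph's two layers as triples.
data_layer = [
    ("Marie_Curie", "type", "Physicist"),
    ("Marie_Curie", "bornIn", "Warsaw"),
]
schema_layer = [
    ("Physicist", "subClassOf", "Scientist"),
]

# inference rule: if (x type C) and (C subClassOf D), then (x type D)
inferred = [(s, "type", sup)
            for s, p, c in data_layer if p == "type"
            for sub, _, sup in schema_layer if sub == c]

print(inferred)  # [('Marie_Curie', 'type', 'Scientist')]
```

This is the sense in which the schema layer "adds explicit semantics": the fact that Marie Curie is a scientist is never stated in the data layer, yet it follows from the hierarchy.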
In this dissertation, we focus on creating both comprehensive and concise entity summaries at: (i) the single-entity level and (ii) the multiple-entity level. To summarize a single entity, we propose a novel approach called FACeted Entity Summarization (FACES) that considers both the importance of facts, computed by combining popularity and uniqueness, and the diversity of the facts selected for the summary. We first conceptually group facts using semantic expansion and hierarchical incremental clustering techniques, forming facets (i.e., groupings) that go beyond syntactic similarity. Then we rank both the facts and the facets using Information Retrieval (IR) ranking techniques, picking the highest-ranked facts from these facets for the summary. The important and unique contribution of this approach is that, by generating facets, it adds diversity to entity summaries, making them comprehensive. For creating multiple-entity summaries, we process facts belonging to the given entities simultaneously using combinatorial optimization techniques. In this process, we maximize the diversity and importance of facts within each entity summary and the relatedness of facts between the entity summaries. The proposed approach uniquely combines semantic expansion, graph-based relatedness, and combinatorial optimization techniques to generate relatedness-based multi-entity summaries.
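A toy sketch in the spirit of FACES may make the facet-then-rank step concrete. The facts, facet labels, and corpus frequencies below are invented, and the facets are given directly rather than derived by clustering as FACES actually does; the point is only to show how drawing the top fact from each facet yields a diverse summary.

```python
from collections import Counter

# Invented facts about one entity, each tagged with a facet label.
FACTS = [
    ("birthPlace", "Honolulu", "biography"),
    ("spouse", "Michelle Obama", "biography"),
    ("office", "President of the United States", "career"),
    ("party", "Democratic Party", "career"),
    ("almaMater", "Harvard Law School", "education"),
]
# Invented corpus-wide property frequencies: frequent properties are
# generic, rare ones are more unique and therefore more informative.
PROPERTY_FREQ = Counter({"birthPlace": 900, "spouse": 400, "office": 50,
                         "party": 300, "almaMater": 200})

def summarize(facts, k=3):
    """Pick the most unique fact from each facet until k facts are chosen."""
    by_facet = {}
    for prop, value, facet in facts:
        by_facet.setdefault(facet, []).append((prop, value))
    summary = []
    for members in by_facet.values():
        members.sort(key=lambda pv: PROPERTY_FREQ[pv[0]])  # rarest first
        summary.append(members[0])
        if len(summary) == k:
            break
    return summary

print(summarize(FACTS))
```

Because one fact is drawn per facet, the three-fact summary covers biography, career, and education rather than three variations on a single aspect.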
Complementing the entity summarization approaches, we introduce a novel approach using light Natural Language Processing (NLP) techniques to enrich knowledge graphs by adding type semantics to literals.
Slides of presentation given to the Organization of Information Resources class at the University of Washington iSchool, Saturday, February 10, 2007, by Michael Braly and Geoff Froh
2010-03-10 PARC Augmented Social Cognition Research Overview, Ed Chi
This is an overview of three years of research done by the Augmented Social Cognition research group at PARC.
See blog at: http://asc-parc.blogspot.com
Breaking Down Walls in Enterprise with Social Semantics, John Breslin
Keynote Talk at the Workshop on New Trends in Service Oriented Architecture for massive Knowledge processing in Modern Enterprise (SOA-KME 2012) / Palermo, Italy / 6th July 2012
The Social Semantic Server - A Flexible Framework to Support Informal Learnin..., Sebastian Dennerlein
Introduction: Scaling Informal Workplace Learning
System Design: Designing a flexible framework for informal workplace learning
Theoretical Underpinning
Design Principles
System Implementation: SOA for a Hybrid Knowledge Representation
Software Architecture
Services
Applications: B&P, KnowBrain & Bookmarker/ Attacher
Conclusion on the Support of Informal Learning
Future Work: Next Steps & What else can be achieved by the SSS?
Linked Data and Semantic Technologies can support a next generation of science. This talk shows examples of discovery, access, integration, analysis, and shows directions towards prediction and vision.
LSS'11: Charting Collections Of Connections In Social Media, Local Social Summit
Keynote Title: Charting Collections of Connections in Social Media: Creating Maps and Measures with NodeXL
Abstract: Networks are a data structure commonly found across all social media services that allow populations to author collections of connections. The Social Media Research Foundation's NodeXL project makes analysis of social media networks accessible to most users of the Excel spreadsheet application. With NodeXL, networks become as easy to create as pie charts. Applying the tool to a range of social media networks has already revealed the variations present in online social spaces. A review of the tool and images of Twitter, Flickr, YouTube, and email networks will be presented.
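The kind of network described above can be sketched in a few lines: an edge list of who-connected-to-whom, plus the simplest per-vertex measure, degree. The usernames and edges are invented, and this plain-Python sketch stands in for what NodeXL does inside Excel.

```python
from collections import defaultdict

# Toy reply/mention network: each edge is one authored connection.
edges = [("ann", "bob"), ("ann", "cara"), ("bob", "cara"), ("dave", "ann")]

# degree: how many edges touch each vertex
degree = defaultdict(int)
for src, dst in edges:
    degree[src] += 1
    degree[dst] += 1

hub = max(degree, key=degree.get)
print(hub, degree[hub])  # ann 3
```

Even this minimal measure already surfaces structure: "ann" touches three of the four edges and is the hub of this tiny graph, the sort of pattern that becomes visible in a NodeXL chart.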
Slides for a talk at the International Symposium on Convergence Technology (ConTech 2011) – Smart & Humane World – on November 3rd in Seoul, South Korea.
Date: 2011 November 3 (Thurs)
Place: COEX Grand Ballroom, Seoul, Korea
Organized by Advanced Institutes of Convergence Technologies (AICT), Seoul National University (SNU)
In Cooperation with Ministry of Knowledge Economy, Ministry of Education, Science and Technology, National Research Foundation of Korea, Graduate School of Convergence Science and Technology (GSCST)
This talk introduces Linked Data and the Semantic Web using two examples - a population sciences grid and semantAqua, a semantically enabled environmental monitoring system. It shows a few tools and the semantic methodology, and opens a discussion of LOD and team science.
Annual LIANZA / SLANZA Weekend School, held in Nelson, New Zealand on 28 April 2007. This keynote presentation explores the Web 2.0 world and the 'possibilities' for libraries in a digitally networked world.
The Semantic Web and Libraries in the United States: Experimentation and Achi..., New York University
This presentation reflects the paper titled "The Semantic Web and Libraries in the United States: Experimentation and Achievements," published in the proceedings of the 75th IFLA General Conference and Assembly, Satellite Meeting: Emerging Trends in Technology: Libraries between Web 2.0, Semantic Web and Search Technology, August 19-20, 2009, in Florence, Italy, presented by Sharon Yang (Rider University), Yanyi Lee (Wagner College), and Amanda Xu (St. John's University). Here is the URL to the full paper: http://www.ifla2009satelliteflorence.it/meeting3/program/assets/SharonYang.pdf
Description: Ajith defended his thesis on application and data portability in cloud computing. More details on Ajith's research and publications can be found at http://knoesis.wright.edu/researchers/ajith/
Video can be found at: http://www.youtube.com/watch?v=oDBeBIIFmHc&list=UUORqXk1ZV44MOwpCorAROyQ&index=1&feature=plpp_video
Cory Henson defended his thesis on "A Semantics-based Approach to Machine Perception".
Video can be found at: http://www.youtube.com/watch?v=L8M7eoGKtSE
Video: https://www.youtube.com/watch?v=ZCToaDgxnAs
Abstract:
People's emotions can be gleaned from their text using machine learning techniques that build models exploiting large amounts of self-labeled emotion data from social media. Further, this self-labeled emotion data can be effectively adapted to train emotion classifiers in target domains where training data are sparse.
Emotions are both prevalent in and essential to most aspects of our lives. They influence our decision-making, affect our social relationships, and shape our daily behavior. With the rapid growth of emotion-rich textual content, such as microblog posts, blog posts, and forum discussions, there is a growing need to develop algorithms and techniques for identifying the emotions people express in text. This has valuable implications for studies of suicide prevention, employee productivity, well-being, customer relationship management, and more. However, emotion identification is quite challenging, partly for the following reasons: i) it is a multi-class classification problem that usually involves at least six basic emotions, and because text describing an emotion-causing event or situation can be devoid of explicit emotion-bearing words, the distinction between different emotions can be very subtle, making it difficult to glean emotions purely from keywords; ii) manual annotation of emotion data by human experts is labor-intensive and error-prone; iii) existing labeled emotion datasets are relatively small and fail to provide comprehensive coverage of emotion-triggering events and situations.
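The self-labeling idea can be sketched concretely: authors' own hashtags (e.g. "#joy") serve as emotion labels, so a classifier can be trained without any manual annotation. The four-post corpus and the unigram-overlap scoring below are invented for illustration and far simpler than the models the abstract refers to.

```python
from collections import Counter, defaultdict

# Hashtag-labeled training posts: the label is the author's own tag.
posts = [
    ("finally got the job offer #joy", "joy"),
    ("best day at the beach ever #joy", "joy"),
    ("my flight got cancelled again #anger", "anger"),
    ("stuck in traffic for hours #anger", "anger"),
]

# per-emotion word counts, with the label hashtag itself stripped out
word_counts = defaultdict(Counter)
for text, label in posts:
    word_counts[label].update(t for t in text.split() if not t.startswith("#"))

def classify(text):
    """Assign the emotion whose training vocabulary overlaps the most."""
    tokens = text.split()
    return max(word_counts,
               key=lambda lab: sum(word_counts[lab][t] for t in tokens))

print(classify("stuck in traffic again"))  # anger
```

Note that the test post carries no explicit emotion word at all; the event description alone ("stuck in traffic") is what the self-labeled data lets the classifier pick up on, which is exactly the subtlety point i) above raises.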
Vahid Taslimitehrani's Dissertation Defense: Friday, February 19, 2015.
Ph.D. Committee: Drs. Guozhu Dong, Advisor, T.K. Prasad, Amit Sheth, Keke Chen
and Jyotishman Pathak, Division of Health Informatics, Weill Cornell Medical College, Cornell University.
ABSTRACT:
Regression and classification techniques play an essential role in many data mining tasks and have broad applications. However, most of the state-of-the-art regression and classification techniques are often unable to adequately model the interactions among predictor variables in highly heterogeneous datasets. New techniques that can effectively model such complex and heterogeneous structures are needed to significantly improve prediction accuracy.
In this dissertation, we propose novel types of accurate and interpretable regression and classification models, named Pattern Aided Regression (PXR) and Pattern Aided Classification (PXC), respectively. Both PXR and PXC rely on identifying regions of the data space where a given baseline model has large modeling errors, characterizing such regions using patterns, and learning specialized models for those regions. Each PXR/PXC model contains several pairs of contrast patterns and local models, where a local model is applied only to data instances matching its associated pattern. We also propose a class of classification and regression techniques, called Contrast Pattern Aided Regression (CPXR) and Contrast Pattern Aided Classification (CPXC), to build accurate and interpretable PXR and PXC models.
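The pattern-aided idea can be illustrated with a minimal regression sketch: fit a simple baseline, notice the region where it errs badly, characterize that region with a pattern, and attach a specialized local model there. The piecewise data and the single threshold pattern "x >= 10" are invented; real CPXR mines contrast patterns rather than assuming them.

```python
# Piecewise data that no single through-the-origin slope can capture.
data = ([(x, 2.0 * x) for x in range(1, 10)] +
        [(x, 10.0 * x) for x in range(10, 20)])

def fit_slope(points):
    """Least-squares slope for a through-the-origin model y = s * x."""
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)

baseline = fit_slope(data)            # one global slope, wrong everywhere

# the pattern "x >= 10" matches the instances where the baseline errs most
region = [(x, y) for x, y in data if x >= 10]
local = fit_slope(region)             # specialized local model: slope 10

def predict(x):
    """Apply the local model only to instances matching the pattern."""
    return (local if x >= 10 else baseline) * x
```

The (pattern, local model) pair is both more accurate in its region and interpretable: the pattern states exactly which instances the specialized model covers.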
We have conducted a set of comprehensive performance studies to evaluate CPXR and CPXC. The results show that CPXR and CPXC outperform state-of-the-art regression and classification algorithms, often by significant margins, and that they are especially effective for heterogeneous and high-dimensional datasets. Besides being new types of models, PXR and PXC can also provide insights into data heterogeneity and diverse predictor-response relationships.
We have also adapted CPXC to handle the classification of imbalanced datasets, introducing a new algorithm called Contrast Pattern Aided Classification for Imbalanced Datasets (CPXCim). In CPXCim, we apply a weighting method to boost minority instances, as well as a new filtering method to prune patterns with imbalanced matching datasets.
Finally, we applied our techniques in three real applications, two in the healthcare domain and one in the soil mechanics domain. PXR and PXC models are significantly more accurate than other learning algorithms in all three applications.
Dissertation Defense:
"Mining and Analyzing Subjective Experiences in User Generated Content"
By Lu Chen
Tuesday, April 9, 2016
Dissertation Committee: Dr. Amit Sheth, Advisor; Dr. T. K. Prasad; Dr. Keke Chen; Dr. Ingmar Weber; and Dr. Justin Martineau
Pictures: https://www.facebook.com/Kno.e.sis/photos/?tab=album&album_id=1225911137443732
Video: https://youtu.be/tzLEUB-hggQ
Lu's Home page: http://knoesis.wright.edu/researchers/luchen/
ABSTRACT
Web 2.0 and social media enable people to create, share, and discover information instantly, anywhere, anytime. A great amount of this information is subjective information: information about people's subjective experiences, ranging from feelings about what is happening in our daily lives to opinions on a wide variety of topics. Subjective information is useful to individuals, businesses, and government agencies to support decision making in areas such as product purchases, marketing strategy, and policy making. However, much useful subjective information is buried in ever-growing user-generated data on social media platforms, and it is still difficult to extract high-quality subjective information and make full use of it with current technologies.
Current subjectivity and sentiment analysis research has largely focused on classifying text polarity: whether the opinion expressed about a specific topic in a given text is positive, negative, or neutral. This narrow definition does not take into account other types of subjective information, such as emotion, intent, and preference, which may prevent the exploitation of subjective information from reaching its full potential. This dissertation extends the definition and introduces a unified framework for mining and analyzing diverse types of subjective information. We have identified four components of a subjective experience: an individual who holds it, a target that elicits it (e.g., a movie or an event), a set of expressions that describe it (e.g., "excellent", "exciting"), and a classification or assessment that characterizes it (e.g., positive vs. negative). Accordingly, this dissertation contributes novel and general techniques for identifying and extracting these components.
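The four components identified above can be rendered as a minimal data structure. The field names below are paraphrases of the text, not the dissertation's actual schema, and the example values are invented.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SubjectiveExperience:
    holder: str              # the individual who holds the experience
    target: str              # what elicits it (a movie, an event, ...)
    expressions: List[str]   # the expressions that describe it
    assessment: str          # the characterization, e.g. "positive"

# a made-up instance for a movie opinion
exp = SubjectiveExperience(
    holder="@moviefan",
    target="Gravity",
    expressions=["excellent", "exciting"],
    assessment="positive",
)
print(exp.assessment)  # positive
```

Framing a subjective experience this way makes the extraction tasks explicit: each of the four fields corresponds to one component that must be identified in the text.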
We first explore the task of extracting sentiment expressions from social media posts. We propose an optimization-based approach that extracts a diverse set of sentiment-bearing expressions, including formal and slang words/phrases, for a given target from an unlabeled corpus. Instead of associating an overall sentiment with a given text, this method assesses the more fine-grained, target-dependent polarity of each sentiment expression. Unlike pattern-based approaches, which often fail to capture the diversity of sentiment expressions due to the informal nature of language usage and writing style in social media posts, the proposed approach is capable of identifying sentiment phrases.
Sujan Perera's Dissertation Defense: Friday, August 12, 2016
Ph.D. Committee: Drs. Amit Sheth, Advisor; T.K. Prasad, Michael Raymer, and Pablo Mendes (IBM Research)
Video: https://youtu.be/pbjJ1zb8ayY
ABSTRACT:
Natural language is a powerful tool developed by humans over hundreds of thousands of years. The extensive usage and flexibility of the language, the creativity of human beings, and the social, cultural, and economic changes that have taken place in daily life have added new constructs, styles, and features to the language. One such feature is its ability to express ideas, opinions, and facts in an implicit manner. This feature is used extensively in day-to-day communication in situations such as: 1) expressing sarcasm, 2) trying to recall forgotten things, 3) conveying descriptive information, 4) emphasizing the features of an entity, and 5) communicating a common understanding.
Consider the tweet 'New Sandra Bullock astronaut lost in space movie looks absolutely terrifying' and the text snippet extracted from a clinical narrative 'He is suffering from nausea and severe headaches. Dolasteron was prescribed.' The tweet implicitly mentions the entity Gravity, and the clinical text snippet implicitly mentions the relationship between the medication Dolasteron and the clinical condition nausea. Such implicit references to entities and relationships are common occurrences in daily communication, and they add unique value to conversations. However, extracting implicit constructs has not received enough attention. This dissertation focuses on extracting implicit entities and relationships from clinical narratives and extracting implicit entities from tweets.
This dissertation demonstrates manifestations of implicit constructs in text, studies their characteristics, and develops a solution that is capable of extracting implicit factual information from text. The developed solution starts by acquiring relevant knowledge to solve the implicit information extraction problem. The relevant knowledge includes domain knowledge, contextual knowledge, and linguistic knowledge. The acquired knowledge can take different syntactic forms such as a text snippet, structured knowledge represented in standard knowledge representation languages like Resource Description Framework (RDF) or custom formats. Hence, the acquired knowledge is processed to create models that can be understood by machines. Such models provide the infrastructure to perform implicit information extraction of interest.
This dissertation focuses on three different use cases of implicit information and demonstrates the applicability of the developed solution in these use cases. They are:
- implicit entity linking in clinical narratives,
- implicit entity linking in Twitter,
- implicit relationship extraction from clinical narratives.
Understanding users' latent intents behind search queries is essential for satisfying a user's search needs. Search intent mining can help search engines enhance their ranking of search results, enabling new search features like instant answers, personalization, search result diversification, and the recommendation of more relevant ads. Consequently, there has been increasing attention on how to effectively mine search intents by analyzing search engine query logs. While state-of-the-art techniques can identify the domain of a query (e.g., sports, movies, health), identifying domain-specific intent is still an open problem. Among all the topics available on the Internet, health is one of the most important in terms of impact on the user, and it is one of the most frequently searched areas. This dissertation presents a knowledge-driven approach for domain-specific search intent mining, with a focus on health-related search queries.
First, we identified 14 consumer-oriented health search intent classes based on inputs from focus group studies, analyses of popular health websites, literature surveys, and an empirical study of search queries. We defined the problem of classifying millions of health search queries into zero or more intent classes as a multi-label classification problem. Popular machine learning approaches for multi-label classification tasks (namely, problem transformation and algorithm adaptation methods) were not feasible due to the limitations of labeled-data creation and health-domain constraints. Another challenge in solving the search intent identification problem was mapping terms used by laymen to medical terms. To address these challenges, we developed a semantics-driven, rule-based search intent mining approach leveraging rich background knowledge encoded in the Unified Medical Language System (UMLS) and a crowdsourced encyclopedia (Wikipedia). The approach can identify search intent in a disease-agnostic manner and has been evaluated on three major diseases.
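The idea of rule-based, multi-label intent tagging can be sketched as a lexicon that maps layman phrases to zero or more intent classes. The phrases and class names below are hypothetical stand-ins, not the actual UMLS/Wikipedia-derived rules used in the dissertation.

```python
# Toy lexicon: layman phrase -> set of health-intent classes (illustrative only)
INTENT_LEXICON = {
    "symptoms of": {"symptoms"},
    "signs of": {"symptoms"},
    "how to treat": {"treatment"},
    "cure for": {"treatment"},
    "side effects": {"drug", "side-effects"},
}

def classify_query(query: str) -> set:
    """Return zero or more intent classes for a health search query (multi-label)."""
    q = query.lower()
    labels = set()
    for phrase, classes in INTENT_LEXICON.items():
        if phrase in q:
            labels |= classes
    return labels

print(classify_query("early symptoms of diabetes"))  # {'symptoms'}
print(classify_query("weather in dayton"))           # set() -> no health intent
```

Because a query can trigger several rules, the output is naturally a set of labels, matching the "zero or more intent classes" formulation above.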
While users often turn to search engines to learn about health conditions, a surprising amount of health information is also shared and consumed via social media, such as public platforms like Twitter. Although Twitter is an excellent information source, identifying informative tweets within the deluge of tweets is a major challenge. We used a hybrid approach consisting of supervised machine learning, rule-based classifiers, and biomedical domain knowledge to facilitate the retrieval of relevant and reliable health information shared on Twitter in real time. Furthermore, we extended our search intent mining algorithm to classify health-related tweets into health categories. Finally, we performed a large-scale study comparing health search intents, and the features that contribute to the expression of search intent, across 100+ million search queries from smart devices (smartphones/tablets) and personal computers (desktops/laptops).
Sensors and mobile devices are rapidly intertwining with the fabric of our lives. This has resulted in unprecedented growth in the number of observations from the physical and social worlds reported in the cyber world. A system of sensing and computational components embedded in the physical world is termed a Cyber-Physical System (CPS). The current science of CPS has yet to effectively integrate citizen observations into CPS analysis. We demonstrate the role of citizen observations in CPS and propose a novel approach to perform a holistic analysis of machine and citizen sensor observations. Specifically, we demonstrate the complementary, corroborative, and timely aspects of citizen sensor observations compared to machine sensor observations in Physical-Cyber-Social (PCS) Systems.
Physical processes are inherently complex and embody uncertainties. They manifest as machine and citizen sensor observations in PCS Systems. We propose a generic framework for moving from observations to decision-making and actions in PCS systems, consisting of: (a) PCS event extraction, (b) PCS event understanding, and (c) PCS action recommendation. We demonstrate the role of Probabilistic Graphical Models (PGMs) as a unified framework to deal with the uncertainty, complexity, and dynamism involved in translating observations into actions. Data-driven approaches alone are not guaranteed to synthesize PGMs that accurately reflect real-world dependencies. To overcome this limitation, we propose to empower PGMs with declarative domain knowledge. Specifically, we propose four techniques: (a) automatic creation of massive training data for Conditional Random Fields (CRFs) using domain knowledge of entities, used in PCS event extraction; (b) Bayesian Network structure refinement using causal knowledge from ConceptNet, used in PCS event understanding; (c) knowledge-driven piecewise linear approximation of nonlinear time series dynamics using Linear Dynamical Systems (LDS), used in PCS event understanding; and (d) transformation of knowledge of goals and actions into a Markov Decision Process (MDP) model, used in PCS action recommendation.
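Technique (a) above, auto-creating CRF training data from domain knowledge, can be illustrated by labeling tokens with BIO tags whenever they match a gazetteer of known entities. The gazetteer contents and tag scheme below are made up for illustration; the actual system's knowledge sources are far richer.

```python
# Hypothetical gazetteer of known two-token location entities (lowercased)
GAZETTEER = {("interstate", "75"), ("downtown", "dayton")}

def bio_label(tokens):
    """Assign BIO tags to tokens by matching bigrams against the gazetteer,
    producing (token, tag) pairs usable as CRF training data."""
    labels = ["O"] * len(tokens)
    for i in range(len(tokens) - 1):
        if (tokens[i].lower(), tokens[i + 1].lower()) in GAZETTEER:
            labels[i], labels[i + 1] = "B-LOC", "I-LOC"
    return list(zip(tokens, labels))

print(bio_label("accident on Interstate 75 near downtown Dayton".split()))
```

Run over a large unlabeled corpus, such weak labeling can mass-produce training sequences without manual annotation, which is the point of the technique.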
We evaluate the benefits of the proposed techniques on real-world applications involving traffic analytics and Internet of Things (IoT).
Social media provides a natural platform for the dynamic emergence of citizen (as) sensor communities, where citizens share information, express opinions, and engage in discussions. Often such an Online Citizen Sensor Community (CSC) has stated or implied goals related to the workflows of organizational actors with defined roles and responsibilities; for example, a community of crisis response volunteers may inform the prioritization of responses to resource needs (e.g., medical) to assist the managers of crisis response organizations. However, in a CSC there are challenges related to information overload for organizational actors, including finding reliable information providers and finding actionable information from citizens. This threatens the awareness and articulation of workflows needed for cooperation between citizens and organizational actors. CSCs supported by Web 2.0 social media platforms offer new opportunities and pose new challenges. This work addresses issues of ambiguity in interpreting unconstrained natural language (e.g., 'wanna help' appearing in messages both asking for and offering help during crises), sparsity of user and group behaviors (e.g., expression of specific intent), and diversity of user demographics (e.g., medical or technical professionals) for interpreting the user-generated data of citizen sensors. Interdisciplinary research involving the social and computer sciences is essential to address these socio-technical issues in CSCs and to give organizational actors better access to user-generated data at a higher level of information abstraction. This study presents a novel web information processing framework focused on actors and actions in cooperation, called Identify-Match-Engage (IME), which fuses top-down and bottom-up computing approaches to design a cooperative web information system between citizens and organizational actors. It includes: (a) identification of action-related seeking-offering intent behaviors from short, unstructured text documents using a classification model based on both declarative and statistical knowledge; (b) matching of seeking and offering intentions; and (c) engagement models of users and groups in the CSC to prioritize whom to engage, by modeling context with social theories using features of users, their generated content, and their dynamic network connections in user interaction networks. The results show a greater improvement in modeling efficiency from the fusion of top-down knowledge-driven and bottom-up data-driven approaches than from conventional bottom-up approaches alone for modeling intent and engagement. Applications of this work include use of the engagement interface tool during recent crises to enable efficient citizen engagement, spreading critical information about prioritized needs to ensure that citizens donate only required supplies. The engagement interface application also won the Young Innovators 2014 award from ITU, the United Nations ICT agency.
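The matching step (b) can be sketched as pairing seekers with offerers of the same resource type. The intent records, user handles, and resource names below are hypothetical; the real system models far richer context.

```python
# Hypothetical seeking/offering intent records extracted from crisis tweets
seeking = [
    {"user": "@relieforg", "intent": "seek", "resource": "medical"},
    {"user": "@shelter1", "intent": "seek", "resource": "water"},
]
offering = [
    {"user": "@donor42", "intent": "offer", "resource": "medical"},
]

def match(seeks, offers):
    """Pair each seeker with all offerers of the same resource type."""
    by_resource = {}
    for o in offers:
        by_resource.setdefault(o["resource"], []).append(o["user"])
    return {s["user"]: by_resource.get(s["resource"], []) for s in seeks}

print(match(seeking, offering))  # {'@relieforg': ['@donor42'], '@shelter1': []}
```

An empty match list (as for the water request above) is itself actionable: it flags an unmet need that organizational actors can prioritize.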
Literature-Based Discovery (LBD) refers to the process of uncovering hidden connections that are implicit in scientific literature. Numerous hypotheses have been generated from scientific literature, influencing innovations in diagnosis, treatment, prevention, and overall public health. However, much of the existing research on discovering hidden connections among concepts has used distributional statistics and graph-theoretic measures to capture implicit associations. Such metrics do not explicitly capture the semantics of hidden connections. ...
While effective in some situations, the practice of relying on domain expertise, structured background knowledge, and heuristics to complement distributional and graph-theoretic approaches has serious limitations. ...
This dissertation proposes an innovative context-driven, automatic subgraph creation method for finding hidden and complex associations among concepts along multiple thematic dimensions. It outlines definitions for context and shared context, based on implicit and explicit (or formal) semantics, which compensate for deficiencies in statistical and graph-based metrics. It also eliminates the need for a priori heuristics. An evidence-based evaluation of the proposed framework showed that 8 out of 9 existing scientific discoveries could be recovered using this approach. Additionally, insights into the meaning of associations could be obtained using provenance provided by the system. In a statistical evaluation to determine the interestingness of the generated subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 MEDLINE articles, on average. These results suggest that leveraging implicit and explicit context, as defined in this dissertation, advances the state of the art in LBD research.
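The core LBD idea of recovering a hidden A-to-C connection through intermediate concepts can be sketched as a search over a graph of predications. The tiny hand-built graph below echoes Swanson's classic fish-oil/Raynaud's example; the actual system operates over semantic predications extracted from MEDLINE, with context-driven filtering rather than plain breadth-first search.

```python
from collections import deque

# Toy predication graph: concept -> concepts it is directly associated with
PREDICATIONS = {
    "fish oil": {"blood viscosity"},
    "blood viscosity": {"raynaud's disease"},
    "raynaud's disease": set(),
}

def find_path(start, goal):
    """Breadth-first search for a chain of predications from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in PREDICATIONS.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(find_path("fish oil", "raynaud's disease"))
```

The recovered chain is the provenance: each hop corresponds to explicit statements in the literature, even though the end-to-end association was never stated directly.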
Ph.D. Committee: Drs. Amit Sheth (Advisor), TK Prasad, Michael Raymer,
Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NLM) and Varun Bhagwan (Yahoo! Labs)
Relevant Publications (more at: http://knoesis.wright.edu/students/delroy/)
D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Leveraging Distributional Semantics for Domain Agnostic Literature-Based Discovery (under preparation)
D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13), 46(2): 238–251, 2013
D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11), pp. 512–519, 2011 (acceptance rate=19.4%)
D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10), 14, 2010
Krishnaprasad Thirunarayan, Trust Management: Multimodal Data Perspective,
Invited Tutorial, The 2015 International Conference on Collaboration
Technologies and Systems (CTS 2015), June 2015
Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers - Amit Sheth
Abstract
Kno.e.sis (http://knoesis.org) is a world-class research center that uses semantic, cognitive, and perceptual computing for gathering insights from physical/IoT, cyber/Web, and social and enterprise (e.g., clinical) big data. We innovate and employ semantic web, machine learning, NLP/IR, data mining, network science and highly scalable computing techniques. Our highly interdisciplinary research impacts health and clinical applications, biomedical and translational research, epidemiology, cognitive science, social good, policy, development, etc. A majority of our $12+ million in active funds come from the NSF and NIH. In this talk, I will provide an overview of some of our major research projects.
Kno.e.sis is highly successful in its primary mission of exceptional student outcomes: our students have exceptional publication records and real-world impact, and our PhDs compete with their counterparts from top-10 schools for initial jobs in research universities, top industry research labs, and highly competitive companies. A key reason for Kno.e.sis' success is its unique work culture, which involves teamwork to solve complex problems. Practically all our work involves real-world challenges, real-world data, interdisciplinary collaborators, path-breaking research, real-world deployments, real-world use, and measurable real-world impact.
In this talk, I will also seek to discuss our choice of research topics and our unique ecosystem that prepares our students for exceptional careers.
This tutorial presents tools and techniques for effectively utilizing the Internet of Things (IoT) for building advanced applications, including Physical-Cyber-Social (PCS) systems. The issues and challenges related to IoT, semantic data modelling, annotation, knowledge representation (e.g., modelling for constrained environments, complexity issues, and time/location dependency of data), integration, analysis, and reasoning will be discussed. The tutorial will describe recent developments in creating annotation models and semantic description frameworks for IoT data (e.g., the W3C Semantic Sensor Network ontology). A review of enabling technologies and common scenarios for IoT applications from the data and knowledge engineering point of view will be discussed. Information processing, reasoning, and knowledge extraction, along with existing solutions related to these topics, will be presented. The tutorial summarizes state-of-the-art research and developments on PCS systems, IoT-related ontology development, linked data, domain knowledge integration and management, querying large-scale IoT data, and AI applications for automated knowledge extraction from real-world data.
Related: Semantic Sensor Web: http://knoesis.org/projects/ssw
Physical-Cyber-Social Computing: http://wiki.knoesis.org/index.php/PCS
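Semantic annotation of sensor data, as covered in the tutorial, can be sketched as attaching RDF-style triples to an observation. The namespace URI and property names below are simplified stand-ins in the spirit of the W3C SSN ontology, not actual SSN terms, and no RDF library is assumed.

```python
SSN = "http://example.org/ssn-lite#"  # hypothetical namespace, not the real SSN URI

# An observation annotated as (subject, predicate, object) triples
observation = [
    (SSN + "obs1", "rdf:type", SSN + "Observation"),
    (SSN + "obs1", SSN + "observedProperty", SSN + "Temperature"),
    (SSN + "obs1", SSN + "hasValue", "21.5"),
]

def values_of(triples, subject, predicate):
    """Query the triple set for all objects of (subject, predicate)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(values_of(observation, SSN + "obs1", SSN + "hasValue"))  # ['21.5']
```

Once observations carry such machine-readable annotations, integration and reasoning across heterogeneous IoT sources reduces to queries over a shared vocabulary.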
Smart Data - How you and I will exploit Big Data for personalized digital hea... - Amit Sheth
Amit Sheth's keynote at IEEE BigData 2014, Oct 29, 2014.
Abstract from:
http://cci.drexel.edu/bigdata/bigdata2014/keynotespeech.htm
Big Data has captured a lot of interest in industry, with emphasis on the challenges of the four Vs of Big Data -- Volume, Variety, Velocity, and Veracity -- and their applications to drive value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is personalized digital health, which relates to making better decisions about our health, fitness, and well-being. Consider, for instance, understanding the reasons for, and avoiding, an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or the Internet of Things around, on, and inside humans), public health signals (e.g., information coming from the healthcare system, such as hospital admissions), and population health signals (such as tweets by people related to asthma occurrences and allergens, or Web services providing pollen and smog information). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has a different set of relevant data!
In this talk, I will describe Smart Data, which is realized by extracting value from Big Data to benefit not just large companies but each individual. If my child is an asthma patient, then for all the data relevant to my child with the four V-challenges, what I care about is simply, "How is her current health, and what is the risk of her having an asthma attack in her current situation (now and today), especially if that risk has changed?" As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. I will motivate the need for a synergistic combination of techniques, similar to the close interworking of the top brain and the bottom brain in cognitive models.
For harnessing Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement, represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city.
Academic Visibility - online presentation, 13 October 2011 - Laura Czerniewicz
A presentation for academics at the University of Cape Town on issues of online presence and visibility, risks, and how to take control of one's digital footprint.
In order to improve personal and organizational knowledge, people have to take time to make sense of the information torrent. If not, it remains merely information. Unfortunately, many of today’s knowledge workers don’t have the time, discipline or the essential skills to select, filter, evaluate and comprehend their multifarious information sources. This can lead to missed opportunities, poor decision-making and suboptimal performance. The 21st century knowledge worker needs to be confident and comfortable with using social technologies and engaging with communities and social learning networks to update his or her knowledge in order to remain relevant. This session explores some of the tools, skills and processes that can help with information sense-making, and looks at the emergent roles of the Community Manager and Digital Curator in delivering value to learning networks.
Digital Connectedness: Taking Ownership of Your Professional Online Presence - Sue Beckingham
Developing pathways to connectedness essentially commences with family and friends, but over time new connections outside of these circles begin to form ever-increasing and interlinking circles. These informal and formal networks have the potential to help you unlock doors to new opportunities. Social media can, without doubt, provide excellent communication channels and a space to develop your network of connections. Nonetheless, as your online presence expands, it leaves behind both digital footprints and digital shadows, and this needs to be given due consideration. This keynote will look at the value of developing a professional online presence and why, as future graduates, you need to take ownership of it.
http://www.yorksj.ac.uk/ltd/ltd/student-engagement/undergraduate-research-confere.aspx
Slides accompanying the VLDB 2010 Journal paper -
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, "Multimodal Social Intelligence in a Real-Time Dashboard System", to appear in a special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media"
Monetizing User Activity on Social Networks - Challenges and Experiences, 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009
User-Generated Content on Social Media
1. User-Generated Content on Social Media: Challenges, Opportunities
Meena Nagarajan, Kno.e.sis, Wright State
meena@knoesis.org, http://knoesis.wright.edu/researchers/meena/
Tuesday, October 27, 2009
2. The Shift.. in the rules of the game
• Online Media: Packaged Goods Media
to a Conversational Media
• Variety of networked
interactions, many in near
real-time
• Information economy: from a dearth of
signals to an overabundance
http://gregverdino.typepad.com/greg_verdinos_blog/images/2007/07/09/web2_logos.jpg
3. Social Media Investigations
• Network: Social structure
emerges from the aggregate of
relationships (ties)
• People: poster identities, the
active effort of accomplishing
interaction
• Content : studying the content of
communication."Who says what, to
whom, why, to what extent and with
what effect?" [Lasswell]
4. Effects of Networked Publics
• Certain social phenomena are admittedly
more complex
• begs for a people-content-network confluence
• Micro-level variations of Content-
people-network on macro-level features
• “How do the topic of discussion, emotional charge
of a conversation, poster characteristics & network
connections affect ....?”
5. People-Content-Network - Possibilities
• Emerging social order in online
conversations
• How are the people-content-network
dynamics shaping online
conversations?
• Can we understand the Influentials
theory, information diffusion properties
in networks (etc.) while taking people
and content into account?
6. Stand on the shoulders of micro-giants
• The point is that we need a strong grasp
on the micro-level variables of the
content, people and network
dimensions to begin explaining what
they are doing to any social
phenomenon...
• My focus is on the micro-level variables
in the content dimension.
8. Dimensions Of Analysis
WHAT
• What are the Named Entities and topics that people are making references to?
• How are they interpreting any situation in local contexts and supporting them in their variable observations?
• Named Entity Identification and Disambiguation
• Cultural Named Entities
• Music artist, track named entities (IBM) [ISWC09a, VLDB09], Movie named entities (MSR) [WWW2010]
• Summaries of user perceptions behind real-time events from Twitter
9. Dimensions Of Analysis
WHAT
• What are the Named Entities and topics that people are making references to?
• How are they interpreting any situation in local contexts and supporting them in their variable observations?
• Examples: www.evri.com, http://memetracker.org/
10. Dimensions Of Analysis
WHY
• What are the diverse intentions
that produce the diverse content
on social media?
• Why we share: looking at what
we predominantly do with the
medium. Value derived,
repurposing..
• Emotion, sentiment expressions..
11. Dimensions Of Analysis
WHY
• Mapping User Intentions
• Information Seeking, Sharing, Transactional intents [WI09]
• What is the intention landscape of social media?
• Where is the monetization potential?
12. Dimensions Of Analysis
HOW
• Self-presentation in Online Dating Profiles (with Prof. Marti Hearst, UC Berkeley) [ICWSM09]
• What do word usages tell us about an active population, or about the medium?
• Dynamics of a conversation - snubs, flaming words, coordination.. or lack thereof!
13. Dimensions Of Analysis
HOW
• What do word usages tell us
about an active population?
• Self-presentation
• Dynamics of a conversation -
snubs, flaming words,
coordination.. or lack thereof!
http://wordwatchers.wordpress.com/2008/10/06/language-in-speeches-and-interviews-summary-comparisons/
14. The Social Media Content
Landscape..
15. +,'((*-./&0)*
!"#$%$#&'()* 9"$:/.,*7'.83*-./&0)*
;353./83"3/&)**
1'.$'2(3*4/"5.$2&6/"*7'.83*-./&0)* 1'.$'2(3*4/"5.$2&6/"**
7'.83*-./&0)*
Population, Medium Diversity
Some mediation
Rate of exchange (asynchronous, synchronous)
Many-to-many reach
Shared Contexts
Slangs, abbreviations, grammar, spelling, media-specific vocabulary
Interpersonal interactions
17. Variety & Formality
Formality Score = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun
freq. – verb freq. – adverb freq. – interjection freq. + 100)/2 *

TYPE OF DATA                                        FORMALITY
Nat Broadcast Reportage                             62.2
Informational writing                               61
Academic Social Science                             60.6
Writing                                             58
Professional Letters                                57.5
Non Acad Social Science                             56.9
Broadcasts                                          55
Blog corpus                                         53.3
Scripted Speech                                     53
Email Corpus                                        50.8
Critic Music reviews (www.metacritic.com/music/)    50.13
Yahoo Personals AboutMe                             50.10
MySpace About Me                                    50.07
MySpace comments on Artist Pages                    50.06
Prepared speeches                                   50
Personal Letters                                    49.7
Twitter                                             49.46
Facebook posts                                      48.20
Imaginative writing                                 47
Fiction Prose                                       46.3
Interviews                                          46
Unscripted Speeches                                 44.4
Spontaneous speech                                  44
Conversations                                       38
Phone Conversations                                 36

* Heylighen, F. & Dewaele, J. Variation in the contextuality of language: An empirical measure. Foundations of Science, 2002, 293-340
Weblogs, Genres and Individual Differences: How bloggers write for who they write for; Scott Nowson
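The F-score above is simple to compute once part-of-speech frequencies (each as a percentage of all words) are available; a minimal sketch, with POS tagging itself assumed and not shown:

```python
def formality_score(pos_freq):
    """Heylighen & Dewaele F-score.

    pos_freq maps a POS category to its frequency as a percentage of all
    words in the corpus; missing categories default to 0.
    """
    f = lambda tag: pos_freq.get(tag, 0.0)
    return (f("noun") + f("adjective") + f("preposition") + f("article")
            - f("pronoun") - f("verb") - f("adverb") - f("interjection")
            + 100) / 2

# A deictic, involved register (many pronouns and verbs) scores low:
print(formality_score({"pronoun": 15, "verb": 25, "noun": 10, "adjective": 5}))  # → 37.5
```

With all frequencies zero the score is 50, the midpoint of the scale, which is why conversational social media corpora cluster near 50 in the table above.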
18. Making up for lack of context..
• Supplement what the data is showing
you with what you already know..
• Statistical NLP + Contextual
Knowledge
• Ontologies, Taxonomies, Dictionaries,
social medium, shared spatio-
temporal contexts..
24. Cultural NER
WHAT
• "It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking "GI Jane" worse now"
• "LOVED UR MUSIC YESTERDAY!"
• "I decided to check out the Wanted demo today even though I really did not like the movie minus Mrs Jolie a.k.a Fox of course!"
• "Obama the Dark Knight of socialism.. the man is not as impressive as Ledger yea"
25. Intuitions..
• Spotting and Sense Identification
• Open vs. Closed world
• unlike person and location named entities, contexts and senses change fairly rapidly
• We assume an open-world wrt senses
• No comprehensive sense knowledge base
• Reduce it to a spotting and binary sense classification problem
Example: "It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking "GI Jane" worse now"
26. Two flavors..
• Artist and tracks spotting in MySpace
music forums
• using the MusicBrainz Taxonomy
• with Daniel Gruhl, Jan Pieper, Christine Robson, IBM
Almaden, Amit Sheth, Knoesis [ISWC09a]
• on Thursday Oct 29, Session: Discovering Semantics
• Movie names from Weblogs
• with Amir Padovitz, Social Streams MSR, [WWW2010]
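The MySpace work spots artist and track names against the MusicBrainz taxonomy. A naive dictionary spotter over a toy taxonomy (entries here are illustrative, not MusicBrainz data) shows the starting point, before any context or domain knowledge is applied:

```python
# Toy stand-in for the MusicBrainz taxonomy; entries are hypothetical.
TAXONOMY = {"merry happy": "track", "lily allen": "artist", "smile": "track"}

def spot(text, taxonomy=TAXONOMY):
    """Return (surface form, type) for every taxonomy entry found as a
    whole-word match. Ambiguous spots (e.g. 'smile' as an ordinary word
    vs. a track name) are left to a later sense-classification step."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [(name, kind) for name, kind in taxonomy.items()
            if " " + name + " " in padded]
```

For example, `spot("that smile by lily allen was great")` finds both the track and the artist; deciding whether "smile" really denotes the track is exactly the binary sense-classification problem the previous slide reduces to.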
27. Cultural NER in Weblogs
• Goal: Supplement classifiers with
information that will help them
disambiguate the reference of a term better!
• A Complexity of Extraction measure
associated with an entity in target sense in
a corpus
• with all cues equal, systems that are ‘complexity
aware’ will treat cues differently
28. Measure of Extraction Complexity
• Feature extraction: Graph-based spreading activation and clustering (general weblogs)
• entity sense definition from Wikipedia + evidence a corpus presents for the target sense of the entity
• Ranked list speaks for itself
• More varied senses and contexts implies higher extraction complexity
Extraction Complexity ranking: Time Travellerʼs Wife, Angels and Demons, The Hangover, Star Trek, Wanted, Up, Twilight, ...
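The measure itself combines Wikipedia-derived sense definitions with spreading activation and clustering; as a toy proxy for the intuition that more varied senses and contexts mean higher complexity, one can score an entity by the entropy of the sense clusters its mentions fall into (the clustering step is assumed, not shown):

```python
import math
from collections import Counter

def extraction_complexity(sense_labels):
    """Entropy (in bits) over the sense clusters an entity's mentions
    fall into: one dominant sense gives 0, varied senses score higher."""
    counts = Counter(sense_labels)
    n = len(sense_labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# An entity like "Up", mostly the movie, vs. one split across senses:
low = extraction_complexity(["movie"] * 9 + ["other"])
high = extraction_complexity(["movie", "game", "adjective", "movie"])
```

A complexity-aware classifier can then weight otherwise equal cues differently depending on how hard the entity is to extract in its target sense.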
29. Feature as a Prior
Decision Tree and Boosting Classifiers; 1500+ hand-labeled data points
X axis: precision, Y axis: recall
Blue: basic features
Red: with Entropy baseline
Green: with our Complexity of Extraction feature
30. As a Prior in Binary Classification
Average F-measure and Average Accuracy over 1000 decision tree, boosting models; 1500+ hand-labeled data points
Blue: basic features
Red: with Entropy baseline
Green: with our Complexity of Extraction feature
31. To chew on..
• The concept of ‘Extraction Complexity’
as an additional prior is very promising
• applies to general NER
32. User Intention Mapping
WHY
• Unlike Web search intent, entity alone is
insufficient to characterize intent here..
• Three broad intentions: information
seeking, sharing, transactional,
combinations thereof.
• ‘i am thinking of getting X’ (transactional)
‘i like my new X’ (information sharing)
‘what do you think about X’ (information seeking)
33. Action Patterns
• Resorted to ‘action patterns’ surrounding named
entities
• “where can i find a psp cam..”
• A minimally supervised bootstrapping
algorithm
• 10 seed action patterns, learn new ones from unannotated corpus,
relying on empirical and semantic similarity with seed patterns
• semantic similarity from communicative functions of words
Linguistic Inquiry Word Count (www.LIWC.net)
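The bootstrapping loop can be sketched minimally as follows; plain token-overlap (Jaccard) similarity stands in for the LIWC-based communicative-function similarity, and the seed patterns and threshold are illustrative, not the ones used in the work:

```python
SEEDS = {"does anyone know", "where can i find"}  # illustrative seeds

def jaccard(a, b):
    """Token-overlap similarity between two patterns."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def bootstrap(posts, seeds=SEEDS, threshold=0.4, rounds=2):
    """Grow the pattern set: any candidate n-gram sufficiently similar
    to an already-known pattern is admitted on the next round."""
    patterns = set(seeds)
    for _ in range(rounds):
        new = set()
        for post in posts:
            toks = post.lower().split()
            for n in (3, 4):
                for i in range(len(toks) - n + 1):
                    cand = " ".join(toks[i:i + n])
                    if cand not in patterns and any(
                            jaccard(cand, p) >= threshold for p in patterns):
                        new.add(cand)
        patterns |= new
    return patterns
```

For example, `bootstrap(["does anyone know how to unlock this phone"])` picks up variants such as "does anyone know how" from the unannotated post.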
34. Information Seeking, Transactional
• Patterns learned using 8000 uncategorized posts
on MySpace forums
Sample learned patterns
does anyone know how
know where i can
was wondering if someone
Im not sure how
someone tell me how
• Intent recognition recall using pre-classified user
posts from Facebook Marketplace (to buy): 81%
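Once patterns are learned, recognition itself can be as simple as matching them against a post; a toy recognizer (pattern lists here are illustrative, not the learned set):

```python
# Illustrative pattern lists; the real ones are learned by bootstrapping.
INTENT_PATTERNS = {
    "information seeking": ["does anyone know", "know where i can",
                            "was wondering if", "someone tell me how"],
    "transactional": ["i am thinking of getting", "where can i find"],
    "information sharing": ["i like my new", "just got a"],
}

def classify_intent(post):
    """Return every intent whose action patterns match the post;
    a post can carry combinations of intents."""
    text = post.lower()
    return [intent for intent, pats in INTENT_PATTERNS.items()
            if any(p in text for p in pats)] or ["unknown"]
```

So `classify_intent("where can i find a psp cam")` yields a transactional intent, the kind of monetizable post the next slides target for advertising.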
35. Impact on Online Advertising?
• Generate ads from user profile
(interests, hobbies) or from posts with
monetizable intents?
36. Targeted Content Delivery Platform
• Of all the ads generated using profile (hobbies,
interests) information, 7% received attention
• Ads generated using authored, monetizable
posts, 59% received attention
More at [WI09], Beyond Search and Internet Economics Workshop,
MSR, Redmond, WA
http://research.microsoft.com/en-us/um/redmond/about/collaboration/awards/beyondsearchawards.aspx
37. Self-Presentation
HOW
• On Online-dating profiles [ICWSM09] (with Prof.
Marti Hearst, UCB)
• quantifying usages of words from linguistic,
personal and psychological categories in LIWC
• Exploratory Factor Analysis to identify
systematic co-occurrence patterns among LIWC
variables
• grouping user profiles on the basis of their
shared multi-dimensional features to compare
and contrast self-presentation
38. Imitate to Impress !?
• More similarities than differences
• Men displaying a higher usage of tentative
words (maybe, perhaps..)
• typically attributed to feminine discourse
• Many similarities in word combinations and
words used!
• Perhaps, self-expression tends towards
attempting homophily in online dating..
40. BBC SoundIndex, IBM
"A pioneering project to tap into the online buzz surrounding artists and songs, by leveraging several popular online sources"
De-spam, slang transliterations, entity identification, voting theory to combine multi-modal online data sources [ICSC08a, VLDB09]
http://www.almaden.ibm.com/cs/projects/iis/sound/
(Dimensions: What, Why, When, Where, Who)
41. Twitris: Kno.e.sis
Real-time user perceptions as the fulcrum for browsing the Web [ISWC09b]
(Dimensions: What, When, Where)
42. Iran elections: Discussions in the US and Iran on the same day
The mystery of Soylent Green: information where you can use it
43. Twitris poster (International Semantic Web Challenge, ISWC '09)
Twitris: a tetris-like approach to Twitter, gathering aggregated social signals through space, time and theme. http://twitris.knoesis.org
• Semantics for thematic aggregation: TFIDF-based, spatio-temporally weighted text analytics extract event descriptors and storylines; News and Wikipedia articles put extracted descriptors in context (concept cloud, news and related articles widgets).
• Integrate user observations with news on a particular day; correlate citizen journalism with the fourth estate. Example: on September 18, Obama was talking about illegal immigrants in the context of health care; "legalize is no more the news for Obama!" captured on October 12.
• Architecture: parallel crawling to scale; a data processing pipeline (Twitter, geocode services, DBpedia) to streamline analytics and handle heterogeneity; sharded in-memory geocode lookup and data dumpers; live resource aggregation; near real time, processing up to a day before.
• Statistics (unit: tweets): Healthcare (Aug 19 - Oct 20): 721K (US only); Obama (Oct 8 - 20): 312K (US only); H1N1 (Oct 5 - 20): 232K (US only); Iran Election (June 5 - Oct 20): 2.8M (worldwide).
• Caveats and future work: (1) handle Twitter constructs such as hashtags, retweets, mentions and replies better; (2) visualization widgets such as time series showing changing perceptions from a place for an event, and demographic-based visualizations; (3) sentiment analysis; (4) robust computing approaches (Cloud, Hadoop); (5) FB Connect for sharing and personalization.
Come see & play with Twitris @ the International Semantic Web Challenge at ISWC '09.
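The descriptor-extraction step in the Twitris pipeline is TFIDF-based, with spatio-temporal weighting on top. Plain TFIDF over tokenized tweet collections looks like this (the spatio-temporal weights are omitted, and the example documents are illustrative):

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists (e.g. tweets grouped by space-time-theme).
    Returns, per document, a {term: tf-idf weight} dict."""
    n = len(docs)
    df = Counter()          # document frequency of each term
    for d in docs:
        df.update(set(d))
    return [{w: (c / len(d)) * math.log(n / df[w])
             for w, c in Counter(d).items()} for d in docs]
```

Terms that occur in every group (like a ubiquitous event keyword) get weight zero, while terms specific to one space-time-theme slice surface as its descriptors.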
45. References
http://knoesis.wright.edu/researchers/meena/pubs.php
[WISE09] Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh
Jadhav, Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences, Tenth
International Conference on Web Information Systems Engineering, Oct 5-7, 2009.
[ISWC09a] Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain
Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference, 2009.
[ISWC09b] Twitris, Submission to the International Semantic Web Challenge, collocated with the International
Semantic Web Conference 2009.
[VLDB09] Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Multimodal Social
Intelligence in a Realtime Dashboard System, Pending Review, VLDB Journal, Special Issue on "Data Management
and Mining on Social Networks and Social Media", 2009.
[WWW2010] Meenakshi Nagarajan, Amir Padovitz, A Measure of Extraction Complexity: a Novel Prior for Improving
Recognition of Cultural Entities, Manuscript in Preparation, for The Nineteenth International World Wide Web
Conference, 2010.
[ICSC08a] Alfredo Alba, Varun Bhagwan, Julia Grace, Daniel Gruhl, Kevin Haas, Meenakshi Nagarajan, Jan Pieper,
Christine Robson, Nachiketa Sahoo. Applications of Voting Theory to Information Mashups, Second IEEE
International Conference on Semantic Computing, ICSC 2008.
[ICSC08b] Meenakshi Nagarajan, Cartic Ramakrishnan, Amit Sheth, “Text Analytics for Semantic Computing - the
good, the bad and the ugly”, Second IEEE International Conference on Semantic Computing Santa Clara, CA, USA,
2008.
[WI09] Meenakshi Nagarajan, Kamal Baid, Amit P. Sheth, and Shaojun Wang, Monetizing User Activity on Social
Networks - Challenges and Experiences, 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep
15-18 2009.
[ICWSM09] Meenakshi Nagarajan, Marti A. Hearst. An Examination of Language Use in Online Dating Personals, 3rd
Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2009
[IC09] Amit Sheth, Meenakshi Nagarajan. Semantics-Empowered Social Computing IEEE Internet Computing 13(1),
2009.