This paper sets out to explore whether data about the us-
age of hashtags on Twitter contains information about their
semantics. Towards that end, we perform initial statisti-
cal hypothesis tests to quantify the association between us-
age patterns and semantics of hashtags. To assess the util-
ity of pragmatic features { which describe how a hashtag
is used over time { for semantic analysis of hashtags, we
conduct various hashtag stream classication experiments
and compare their utility with the utility of lexical features.
Our results indicate that pragmatic features indeed contain
valuable information for classifying hashtags into semantic
categories. Although pragmatic features do not outperform
lexical features in our experiments, we argue that pragmatic
features are important and relevant for settings in which tex-
tual information might be sparse or absent (e.g., in social
video streams).
This document summarizes research posters being presented at a computer science and electrical engineering department research review. It describes 8 posters presented by BS, MS, and PhD students. The posters cover topics such as identifying political affiliations in blogs, statistically weighted visualization hierarchies, voter verifiable optical-scan voting, predictive caching in mobile networks, generating statistical volume models, predicting appropriate semantic web terms, approximating online social network community structure, and utilizing semantic policies for managing BGP route dissemination.
UNIT V TEXT AND OPINION MINING
Text Mining in Social Networks -Opinion extraction – Sentiment classification and clustering -
Temporal sentiment analysis - Irony detection in opinion mining - Wish analysis – Product review mining – Review Classification – Tracking sentiments towards topics over time
The document describes an entity linking system with three main modules: 1) Entity linking to map entity mentions in text to entities in a knowledge base, which is challenging due to name variations and ambiguity. 2) A knowledge base containing entities. 3) Candidate entity ranking to rank potential entities for a mention using evidence like supervised ranking methods. The system aims to address issues in entity linking like predicting unlinkable mentions.
Entity linking with a knowledge base issues techniques and solutionsCloudTechnologies
The document discusses entity linking, which is the task of linking entity mentions in text to corresponding entries in a knowledge base. It presents challenges like name variations and ambiguity. The paper then surveys main approaches to entity linking and discusses applications like knowledge base population and question answering. It also covers evaluation of entity linking systems and directions for future work.
Slides for the class, From Pattern Matching to Knowledge Discovery Using Text Mining and Visualization Techniques, presented June 13, 2010, at the Special Libraries Association 2010 annual meeting.
Distributed Link Prediction in Large Scale Graphs using Apache SparkAnastasios Theodosiou
This document summarizes an approach to distributed link prediction in large graphs using Apache Spark. It discusses using machine learning techniques like locality sensitive hashing to predict links between nodes in a graph based on document similarity metrics and other structural features. The approach is tested on a graph of 27,770 academic papers linked by 352,857 citations. Both supervised and unsupervised machine learning methods are explored, including treating it as a binary classification problem and using locality sensitive hashing and MinHashLSH through Apache Spark to efficiently handle the large data volumes. The results suggest this distributed approach can accurately predict new links in large graphs.
Social Media Posts On Platforms Such As Twitter Or Instagram Use Hashtags,Which Are Author-Created
Labels Representing Topics Or Themes, Toassist In Categorization Of Posts And Searches For Posts Of
Interest. The Structural Analysis Of Hashtags Is Necessary As Precursor To Understandingtheir Meanings.
This Paper Describes Our Work On Segmenting Nondelimited Strings Of Hashtag-Type English Text. We
Adapt And Extend Methods Used Mostly In Non-Eng
This document summarizes research posters being presented at a computer science and electrical engineering department research review. It describes 8 posters presented by BS, MS, and PhD students. The posters cover topics such as identifying political affiliations in blogs, statistically weighted visualization hierarchies, voter verifiable optical-scan voting, predictive caching in mobile networks, generating statistical volume models, predicting appropriate semantic web terms, approximating online social network community structure, and utilizing semantic policies for managing BGP route dissemination.
UNIT V TEXT AND OPINION MINING
Text Mining in Social Networks -Opinion extraction – Sentiment classification and clustering -
Temporal sentiment analysis - Irony detection in opinion mining - Wish analysis – Product review mining – Review Classification – Tracking sentiments towards topics over time
The document describes an entity linking system with three main modules: 1) Entity linking to map entity mentions in text to entities in a knowledge base, which is challenging due to name variations and ambiguity. 2) A knowledge base containing entities. 3) Candidate entity ranking to rank potential entities for a mention using evidence like supervised ranking methods. The system aims to address issues in entity linking like predicting unlinkable mentions.
Entity linking with a knowledge base issues techniques and solutionsCloudTechnologies
The document discusses entity linking, which is the task of linking entity mentions in text to corresponding entries in a knowledge base. It presents challenges like name variations and ambiguity. The paper then surveys main approaches to entity linking and discusses applications like knowledge base population and question answering. It also covers evaluation of entity linking systems and directions for future work.
Slides for the class, From Pattern Matching to Knowledge Discovery Using Text Mining and Visualization Techniques, presented June 13, 2010, at the Special Libraries Association 2010 annual meeting.
Distributed Link Prediction in Large Scale Graphs using Apache SparkAnastasios Theodosiou
This document summarizes an approach to distributed link prediction in large graphs using Apache Spark. It discusses using machine learning techniques like locality sensitive hashing to predict links between nodes in a graph based on document similarity metrics and other structural features. The approach is tested on a graph of 27,770 academic papers linked by 352,857 citations. Both supervised and unsupervised machine learning methods are explored, including treating it as a binary classification problem and using locality sensitive hashing and MinHashLSH through Apache Spark to efficiently handle the large data volumes. The results suggest this distributed approach can accurately predict new links in large graphs.
Social Media Posts On Platforms Such As Twitter Or Instagram Use Hashtags,Which Are Author-Created
Labels Representing Topics Or Themes, Toassist In Categorization Of Posts And Searches For Posts Of
Interest. The Structural Analysis Of Hashtags Is Necessary As Precursor To Understandingtheir Meanings.
This Paper Describes Our Work On Segmenting Nondelimited Strings Of Hashtag-Type English Text. We
Adapt And Extend Methods Used Mostly In Non-Eng
Tweet Segmentation and Its Application to Named Entity Recognition1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
This document discusses social data mining. It begins by defining data, information, and knowledge. It then defines data mining as extracting useful unknown information from large datasets. Social data mining is defined as systematically analyzing valuable information from social media, which is vast, noisy, distributed, unstructured, and dynamic. Common social media platforms are described. Graph mining and text mining are discussed as important techniques for social data mining. The generic social data mining process of data collection, modeling, and various mining methods is outlined. OAuth 2.0 authorization is also summarized as an important process for applications to access each other's data.
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionLuca Nannini
We all have mental representations of events: physical or cognitive processes - yet how we create, retain and negotiate them as frames is a matter of how well we do perform causal reasoning of the dynamics involved. But how we could keep track of them when it comes to interaction? how people do argue about causal explanations where a multitude of actors and variables are interplaying? If just quantifying these features is a compelling yet jeopardizing investigation - keeping track of how they do evolve in time and we could retrieve and forecast information diffusion and contagion is where complex networks analysis meets cognitive semiotics, behavioral studies, pragmatic theories, and NLP methods.
A Survey Of Collaborative Filtering Techniquestengyue5i5j
This document provides a survey of collaborative filtering techniques. It begins with an introduction to collaborative filtering and its main challenges, such as data sparsity, scalability, and synonymy. It then describes three main categories of collaborative filtering techniques: memory-based, model-based, and hybrid approaches. Representative algorithms from each category are discussed and analyzed in terms of their predictive performance and ability to address collaborative filtering challenges. The document concludes with a discussion of evaluating collaborative filtering algorithms and commonly used datasets.
This document discusses a content-based recommendation system for blogs. It proposes an algorithm called BRank that ranks blogs based on social relationships between blogs. BRank modifies the PageRank algorithm to consider relationship scores between blogs, which are determined by factors like the type of relationship (e.g. comments), number of relationships, and blog quality scores. BRank computes popularity scores for blogs within a community by applying a random walk approach across the social network of blogs using the relationship scores to determine navigation probabilities. The goal is to surface the most popular blogs within a topic or community based on social connections.
The document discusses the emergence of the semantic web, which aims to make data on the web more interconnected and machine-readable. It describes Tim Berners-Lee's vision of a "Giant Global Graph" that connects all web documents based on what they are about rather than just linking documents. This would allow user data and profiles to be seamlessly shared across different sites without having to re-enter the same information. The semantic web uses standards like RDF, RDFS and OWL to represent relationships between data in a graph structure and enable automated reasoning. Several companies are working to build applications that take advantage of this interconnected semantic data.
Top k query processing and malicious node identification based on node groupi...redpel dot com
Top k query processing and malicious node identification based on node grouping in manets
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Syntactic search relies on keywords contained in a query to find suitable documents. So, documents that do
not contain the keywords but contain information related to the query are not retrieved. Spreading
activation is an algorithm for finding latent information in a query by exploiting relations between nodes in
an associative network or semantic network. However, the classical spreading activation algorithm uses all
relations of a node in the network that will add unsuitable information into the query. In this paper, we
propose a novel approach for semantic text search, called query-oriented-constrained spreading activation
that only uses relations relating to the content of the query to find really related information. Experiments
on a benchmark dataset show that, in terms of the MAP measure, our search engine is 18.9% and 43.8%
respectively better than the syntactic search and the search using the classical constrained spreading
activation.
Traffic analysis is a process of great importance, when it comes in securing a network. This analysis can be classified in
different levels and one of most interest is Deep Packet Inspection (DPI). DPI is a very effective way of monitoring the network,
since it performs traffic control over mostly of the OSI model’s layers (from L3 to L7). Regular Expressions (RegExp) on the
other hand is used in computer science and can make use of a group of characters, in order to create a searching pattern. This
technique can be combined with a series of mathematical algorithms for helping the individual to quickly find out the search
pattern within a text and even replace it with another value.
In this paper, we aim to prove that the use of Regular Expressions is much more productive and effective when used for
creating matching rules needed in DPI. We design, test and put into comparison Regular Expression rules and compare it
against the conventional methods. In addition to the above, we have created a case study of detecting EternalBlue and
DoublePulsar threats, in order to point out the practical and realistic value of our proposal.
Evolving Swings (topics) from Social Streams using Probability ModelIJERA Editor
This document presents a probability model for detecting evolving topics from social media streams. It focuses on the social aspects of posts reflected in user mentioning behavior, rather than textual content. The model captures the number of mentions per post and frequency of mentioned users. It analyzes individual posting anomalies and aggregates anomaly scores. SDNML change point analysis and burst detection are then used to identify evolving topics from the aggregated scores. Experimental results show that link-based detection using this approach performs better than key-based detection using textual content alone. The model overcomes limitations of prior frequency-based approaches and can detect topics from both textual and non-textual social media posts.
Discovering emerging topics in social streams via link anomaly detectionFinalyear Projects
This document proposes a method to detect emerging topics in social media streams by analyzing anomalies in how users mention and link to each other, rather than analyzing textual content. It presents a probability model to capture normal user mentioning behavior and detect anomalies. Anomaly scores from many users are aggregated and analyzed with change point detection to identify when new topics emerge. The method is tested on real Twitter data and shown to detect emerging topics as early or earlier than text-based methods, especially when textual keywords are ambiguous.
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...IJDKP
As existing computer search engines struggle to understand the meaning of natural language, semantically
enriched metadata may improve interest-based search engine capabilities and user satisfaction.
This paper presents an enhanced version of the ecosystem focusing on semantic topic metadata detection
and enrichments. It is based on a previous paper, a semantic metadata enrichment software ecosystem
(SMESE). Through text analysis approaches for topic detection and metadata enrichments this paper
propose an algorithm to enhance search engines capabilities and consequently help users finding content
according to their interests. It presents the design, implementation and evaluation of SATD (Scalable
Annotation-based Topic Detection) model and algorithm using metadata from the web, linked open data,
concordance rules, and bibliographic record authorities. It includes a prototype of a semantic engine using
keyword extraction, classification and concept extraction that allows generating semantic topics by text,
and multimedia document analysis using the proposed SATD model and algorithm.
The performance of the proposed ecosystem is evaluated using a number of prototype simulations by
comparing them to existing enriched metadata techniques (e.g., AlchemyAPI, DBpedia, Wikimeta, Bitext,
AIDA, TextRazor). It was noted that SATD algorithm supports more attributes than other algorithms. The
results show that the enhanced platform and its algorithm enable greater understanding of documents
related to user interests.
Multiple Regression to Analyse Social Graph of Brand AwarenessTELKOMNIKA JOURNAL
Social Network Analysis (SNA) has become a common tool to conduct social and business
research. SNA can be used to measure how well a marketing campaign affect conversation in social
media. A good marketing campaign is expected to stimulate conversation between users in social media.
In this paper we use SNA metrics to understand the nature of network of top brand awareness products.
We analyses networks structure of social media conversation regarding cellular service provider and
smartphone brand in Indonesia that achieve top brand awareness in 2015. We use conversational
datasets acquired from Twitter. To get more understanding we also compare the result with network
structure of knowledge dissemination. We use multiple regression algorithm, a machine learning algorithm
that is extension of linear regression, to analyses network properties to get insight on the correlation of the
network structure and brand awareness' rank of a product. The result suggests how we should define
network properties in brand awareness context.
This document summarizes an algorithm for efficiently clustering short messages like tweets into general domains or topics. The algorithm breaks the clustering task into two stages: (1) batch clustering of user-annotated data using hashtags to create dense virtual documents, and (2) online clustering of new tweets using the centroids from stage 1. Experimental results show the algorithm outperforms other clustering methods and can accurately cluster large streams of sparse, short messages efficiently.
USING HASHTAG GRAPH-BASED TOPIC MODEL TO CONNECT SEMANTICALLY-RELATED WORDS W...Nexgen Technology
TO GET THIS PROJECT COMPLETE SOURCE ON SUPPORT WITH EXECUTION PLEASE CALL BELOW CONTACT DETAILS
MOBILE: 9791938249, 0413-2211159, WEB: WWW.NEXGENPROJECT.COM,WWW.FINALYEAR-IEEEPROJECTS.COM, EMAIL:Praveen@nexgenproject.com
NEXGEN TECHNOLOGY provides total software solutions to its customers. Apsys works closely with the customers to identify their business processes for computerization and help them implement state-of-the-art solutions. By identifying and enhancing their processes through information technology solutions. NEXGEN TECHNOLOGY help it customers optimally use their resources.
Graph-based Analysis and Opinion Mining in Social NetworkKhan Mostafa
This is the final report for Networks & Data Mining Techniques project focusing on mining social network to estimate public opinion about entities and associated keywords. This project mines Twitter for recent feeds and analyzes them to estimate sentiment score, discussed entity and describing keywords in each tweet. This data is then exploited to elicit overall sentiment associated with each entity. Entities and keywords extracted is also used to form an entity-keyword bigraph. This graph is further used to detect entity communities and keywords found within those communities. Presented implementation works in linear time.
IRJET - Deep Collaborrative Filtering with Aspect InformationIRJET Journal
This document discusses a proposed system for deep collaborative filtering with aspect information. The system aims to help web users efficiently locate relevant information on unfamiliar topics to increase their knowledge. It utilizes techniques like multi-keyword search, synonym matching, and ontology mapping to return relevant web links, images, and news articles to the user based on their search terms. The proposed system architecture includes an index structure to efficiently search and rank results based on similarity to the search query terms. The implementation and evaluation of the proposed system are also discussed.
Semantic domain ontologies are increasingly seen as the key for enabling
interoperability across heterogeneous systems and sensor-based applications.
The ontologies deployed in these systems and applications are developed by
restricted groups of domain experts and not by semantic web experts. Lately,
folksonomies are increasingly exploited in developing ontologies. The
“collective intelligence”, which emerge from collaborative tagging can be
seen as an alternative for the current effort at semantic web ontologies.
However, the uncontrolled nature of social tagging systems leads to many
kinds of noisy annotations, such as misspellings, imprecision and ambiguity.
Thus, the construction of formal ontologies from social tagging data remains
a real challenge. Most of researches have focused on how to discover
relatedness between tags rather than producing ontologies, much less domain
ontologies. This paper proposed an algorithm that utilises tags in social
tagging systems to automatically generate up-to-date specific-domain
ontologies. The evaluation of the algorithm, using a dataset extracted from
BibSonomy, demonstrated that the algorithm could effectively learn a
domain terminology, and identify more meaningful semantic information for
the domain terminology. Furthermore, the proposed algorithm introduced a
simple and effective method for disambiguating tags.
Semantic Grounding Strategies for Tagbased Recommender Systems dannyijwest
Recommender systems usually operate on similarities between recommended items or users. Tag based
recommender systems utilize similarities on tags. The tags are however mostly free user entered phrases.
Therefore, similarities computed without their semantic groundings might lead to less relevant
recommendations. In this paper, we study a semantic grounding used for tag similarity calculus. We show a
comprehensive analysis of semantic grounding given by 20 ontologies from different domains. The study
besides other things reveals that currently available OWL ontologies are very narrow and the percentage
of the similarity expansions is rather small. WordNet scores slightly better as it is broader but not much as
it does not support several semantic relationships. Furthermore, the study reveals that even with such
number of expansions, the recommendations change considerably.
This dissertation analyzes social media data and outlines approaches for understanding online communication and collaboration. It presents algorithms for detecting communities using structural and semantic properties. It analyzes blog subscription patterns and the microblogging phenomenon. Systems are developed for opinion retrieval from blogs and identifying influential users. The growth of social media and tagging behavior are also studied through analysis of tags and social graphs.
RAPID INDUCTION OF MULTIPLE TAXONOMIES FOR ENHANCED FACETED TEXT BROWSINGijaia
In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific
taxonomies from crawled data. The first method involves a sentence-level words co-occurrence frequency
method for building the taxonomy, while the second involves the bootstrapping of a Word2Vec based
algorithm with a directed crawler. We exploit the multilingual open-content directory of the World Wide
Web, DMOZ1
to seed the crawl, and the domain name to direct the crawl. This domain corpus is then input
to our algorithm that can automatically induce taxonomies. The induced taxonomies provide hierarchical
semantic dimensions for the purposes of faceted browsing. As part of an ongoing personal semantics
project, we applied the resulting taxonomies to personal social media data (Twitter, Gmail, Facebook,
Instagram, Flickr) with an objective of enhancing an individual’s exploration of their personal information
through faceted searching. We also perform a comprehensive corpus based evaluation of the algorithms
based on many datasets drawn from the fields of medicine (diseases) and leisure (hobbies) and show that
the induced taxonomies are of high quality.
Tweet Segmentation and Its Application to Named Entity Recognition1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
This document discusses social data mining. It begins by defining data, information, and knowledge. It then defines data mining as extracting useful unknown information from large datasets. Social data mining is defined as systematically analyzing valuable information from social media, which is vast, noisy, distributed, unstructured, and dynamic. Common social media platforms are described. Graph mining and text mining are discussed as important techniques for social data mining. The generic social data mining process of data collection, modeling, and various mining methods is outlined. OAuth 2.0 authorization is also summarized as an important process for applications to access each other's data.
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionLuca Nannini
We all have mental representations of events: physical or cognitive processes - yet how we create, retain and negotiate them as frames is a matter of how well we do perform causal reasoning of the dynamics involved. But how we could keep track of them when it comes to interaction? how people do argue about causal explanations where a multitude of actors and variables are interplaying? If just quantifying these features is a compelling yet jeopardizing investigation - keeping track of how they do evolve in time and we could retrieve and forecast information diffusion and contagion is where complex networks analysis meets cognitive semiotics, behavioral studies, pragmatic theories, and NLP methods.
A Survey Of Collaborative Filtering Techniquestengyue5i5j
This document provides a survey of collaborative filtering techniques. It begins with an introduction to collaborative filtering and its main challenges, such as data sparsity, scalability, and synonymy. It then describes three main categories of collaborative filtering techniques: memory-based, model-based, and hybrid approaches. Representative algorithms from each category are discussed and analyzed in terms of their predictive performance and ability to address collaborative filtering challenges. The document concludes with a discussion of evaluating collaborative filtering algorithms and commonly used datasets.
This document discusses a content-based recommendation system for blogs. It proposes an algorithm called BRank that ranks blogs based on social relationships between blogs. BRank modifies the PageRank algorithm to consider relationship scores between blogs, which are determined by factors like the type of relationship (e.g. comments), number of relationships, and blog quality scores. BRank computes popularity scores for blogs within a community by applying a random walk approach across the social network of blogs using the relationship scores to determine navigation probabilities. The goal is to surface the most popular blogs within a topic or community based on social connections.
The document discusses the emergence of the semantic web, which aims to make data on the web more interconnected and machine-readable. It describes Tim Berners-Lee's vision of a "Giant Global Graph" that connects all web documents based on what they are about rather than just linking documents. This would allow user data and profiles to be seamlessly shared across different sites without having to re-enter the same information. The semantic web uses standards like RDF, RDFS and OWL to represent relationships between data in a graph structure and enable automated reasoning. Several companies are working to build applications that take advantage of this interconnected semantic data.
Top k query processing and malicious node identification based on node groupi...redpel dot com
Top k query processing and malicious node identification based on node grouping in manets
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Syntactic search relies on keywords contained in a query to find suitable documents. So, documents that do
not contain the keywords but contain information related to the query are not retrieved. Spreading
activation is an algorithm for finding latent information in a query by exploiting relations between nodes in
an associative network or semantic network. However, the classical spreading activation algorithm uses all
relations of a node in the network that will add unsuitable information into the query. In this paper, we
propose a novel approach for semantic text search, called query-oriented-constrained spreading activation
that only uses relations relating to the content of the query to find really related information. Experiments
on a benchmark dataset show that, in terms of the MAP measure, our search engine is 18.9% and 43.8%
respectively better than the syntactic search and the search using the classical constrained spreading
activation.
Traffic analysis is a process of great importance, when it comes in securing a network. This analysis can be classified in
different levels and one of most interest is Deep Packet Inspection (DPI). DPI is a very effective way of monitoring the network,
since it performs traffic control over mostly of the OSI model’s layers (from L3 to L7). Regular Expressions (RegExp) on the
other hand is used in computer science and can make use of a group of characters, in order to create a searching pattern. This
technique can be combined with a series of mathematical algorithms for helping the individual to quickly find out the search
pattern within a text and even replace it with another value.
In this paper, we aim to prove that the use of Regular Expressions is much more productive and effective when used for
creating matching rules needed in DPI. We design, test and put into comparison Regular Expression rules and compare it
against the conventional methods. In addition to the above, we have created a case study of detecting EternalBlue and
DoublePulsar threats, in order to point out the practical and realistic value of our proposal.
Evolving Swings (topics) from Social Streams using Probability ModelIJERA Editor
This document presents a probability model for detecting evolving topics from social media streams. It focuses on the social aspects of posts reflected in user mentioning behavior, rather than textual content. The model captures the number of mentions per post and frequency of mentioned users. It analyzes individual posting anomalies and aggregates anomaly scores. SDNML change point analysis and burst detection are then used to identify evolving topics from the aggregated scores. Experimental results show that link-based detection using this approach performs better than key-based detection using textual content alone. The model overcomes limitations of prior frequency-based approaches and can detect topics from both textual and non-textual social media posts.
Discovering emerging topics in social streams via link anomaly detectionFinalyear Projects
This document proposes a method to detect emerging topics in social media streams by analyzing anomalies in how users mention and link to each other, rather than analyzing textual content. It presents a probability model to capture normal user mentioning behavior and detect anomalies. Anomaly scores from many users are aggregated and analyzed with change point detection to identify when new topics emerge. The method is tested on real Twitter data and shown to detect emerging topics as early or earlier than text-based methods, especially when textual keywords are ambiguous.
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...IJDKP
As existing computer search engines struggle to understand the meaning of natural language, semantically
enriched metadata may improve interest-based search engine capabilities and user satisfaction.
This paper presents an enhanced version of the ecosystem focusing on semantic topic metadata detection
and enrichments. It is based on a previous paper, a semantic metadata enrichment software ecosystem
(SMESE). Through text analysis approaches for topic detection and metadata enrichments this paper
propose an algorithm to enhance search engines capabilities and consequently help users finding content
according to their interests. It presents the design, implementation and evaluation of SATD (Scalable
Annotation-based Topic Detection) model and algorithm using metadata from the web, linked open data,
concordance rules, and bibliographic record authorities. It includes a prototype of a semantic engine using
keyword extraction, classification and concept extraction that allows generating semantic topics by text,
and multimedia document analysis using the proposed SATD model and algorithm.
The performance of the proposed ecosystem is evaluated using a number of prototype simulations by
comparing them to existing enriched metadata techniques (e.g., AlchemyAPI, DBpedia, Wikimeta, Bitext,
AIDA, TextRazor). It was noted that SATD algorithm supports more attributes than other algorithms. The
results show that the enhanced platform and its algorithm enable greater understanding of documents
related to user interests.
Multiple Regression to Analyse Social Graph of Brand AwarenessTELKOMNIKA JOURNAL
Social Network Analysis (SNA) has become a common tool to conduct social and business
research. SNA can be used to measure how well a marketing campaign affect conversation in social
media. A good marketing campaign is expected to stimulate conversation between users in social media.
In this paper we use SNA metrics to understand the nature of network of top brand awareness products.
We analyses networks structure of social media conversation regarding cellular service provider and
smartphone brand in Indonesia that achieve top brand awareness in 2015. We use conversational
datasets acquired from Twitter. To get more understanding we also compare the result with network
structure of knowledge dissemination. We use multiple regression algorithm, a machine learning algorithm
that is extension of linear regression, to analyses network properties to get insight on the correlation of the
network structure and brand awareness' rank of a product. The result suggests how we should define
network properties in brand awareness context.
This document summarizes an algorithm for efficiently clustering short messages like tweets into general domains or topics. The algorithm breaks the clustering task into two stages: (1) batch clustering of user-annotated data using hashtags to create dense virtual documents, and (2) online clustering of new tweets using the centroids from stage 1. Experimental results show the algorithm outperforms other clustering methods and can accurately cluster large streams of sparse, short messages efficiently.
USING HASHTAG GRAPH-BASED TOPIC MODEL TO CONNECT SEMANTICALLY-RELATED WORDS W...Nexgen Technology
TO GET THIS PROJECT COMPLETE SOURCE ON SUPPORT WITH EXECUTION PLEASE CALL BELOW CONTACT DETAILS
MOBILE: 9791938249, 0413-2211159, WEB: WWW.NEXGENPROJECT.COM,WWW.FINALYEAR-IEEEPROJECTS.COM, EMAIL:Praveen@nexgenproject.com
NEXGEN TECHNOLOGY provides total software solutions to its customers. Apsys works closely with the customers to identify their business processes for computerization and help them implement state-of-the-art solutions. By identifying and enhancing their processes through information technology solutions. NEXGEN TECHNOLOGY help it customers optimally use their resources.
Graph-based Analysis and Opinion Mining in Social NetworkKhan Mostafa
This is the final report for Networks & Data Mining Techniques project focusing on mining social network to estimate public opinion about entities and associated keywords. This project mines Twitter for recent feeds and analyzes them to estimate sentiment score, discussed entity and describing keywords in each tweet. This data is then exploited to elicit overall sentiment associated with each entity. Entities and keywords extracted is also used to form an entity-keyword bigraph. This graph is further used to detect entity communities and keywords found within those communities. Presented implementation works in linear time.
IRJET - Deep Collaborrative Filtering with Aspect InformationIRJET Journal
This document discusses a proposed system for deep collaborative filtering with aspect information. The system aims to help web users efficiently locate relevant information on unfamiliar topics to increase their knowledge. It utilizes techniques like multi-keyword search, synonym matching, and ontology mapping to return relevant web links, images, and news articles to the user based on their search terms. The proposed system architecture includes an index structure to efficiently search and rank results based on similarity to the search query terms. The implementation and evaluation of the proposed system are also discussed.
Semantic domain ontologies are increasingly seen as the key for enabling
interoperability across heterogeneous systems and sensor-based applications.
The ontologies deployed in these systems and applications are developed by
restricted groups of domain experts and not by semantic web experts. Lately,
folksonomies are increasingly exploited in developing ontologies. The
“collective intelligence”, which emerge from collaborative tagging can be
seen as an alternative for the current effort at semantic web ontologies.
However, the uncontrolled nature of social tagging systems leads to many
kinds of noisy annotations, such as misspellings, imprecision and ambiguity.
Thus, the construction of formal ontologies from social tagging data remains
a real challenge. Most of researches have focused on how to discover
relatedness between tags rather than producing ontologies, much less domain
ontologies. This paper proposed an algorithm that utilises tags in social
tagging systems to automatically generate up-to-date specific-domain
ontologies. The evaluation of the algorithm, using a dataset extracted from
BibSonomy, demonstrated that the algorithm could effectively learn a
domain terminology, and identify more meaningful semantic information for
the domain terminology. Furthermore, the proposed algorithm introduced a
simple and effective method for disambiguating tags.
Semantic Grounding Strategies for Tagbased Recommender Systems dannyijwest
Recommender systems usually operate on similarities between recommended items or users. Tag based
recommender systems utilize similarities on tags. The tags are however mostly free user entered phrases.
Therefore, similarities computed without their semantic groundings might lead to less relevant
recommendations. In this paper, we study a semantic grounding used for tag similarity calculus. We show a
comprehensive analysis of semantic grounding given by 20 ontologies from different domains. The study
besides other things reveals that currently available OWL ontologies are very narrow and the percentage
of the similarity expansions is rather small. WordNet scores slightly better as it is broader but not much as
it does not support several semantic relationships. Furthermore, the study reveals that even with such
number of expansions, the recommendations change considerably.
This dissertation analyzes social media data and outlines approaches for understanding online communication and collaboration. It presents algorithms for detecting communities using structural and semantic properties. It analyzes blog subscription patterns and the microblogging phenomenon. Systems are developed for opinion retrieval from blogs and identifying influential users. The growth of social media and tagging behavior are also studied through analysis of tags and social graphs.
RAPID INDUCTION OF MULTIPLE TAXONOMIES FOR ENHANCED FACETED TEXT BROWSINGijaia
In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific
taxonomies from crawled data. The first method involves a sentence-level words co-occurrence frequency
method for building the taxonomy, while the second involves the bootstrapping of a Word2Vec based
algorithm with a directed crawler. We exploit the multilingual open-content directory of the World Wide
Web, DMOZ1
to seed the crawl, and the domain name to direct the crawl. This domain corpus is then input
to our algorithm that can automatically induce taxonomies. The induced taxonomies provide hierarchical
semantic dimensions for the purposes of faceted browsing. As part of an ongoing personal semantics
project, we applied the resulting taxonomies to personal social media data (Twitter, Gmail, Facebook,
Instagram, Flickr) with an objective of enhancing an individual’s exploration of their personal information
through faceted searching. We also perform a comprehensive corpus based evaluation of the algorithms
based on many datasets drawn from the fields of medicine (diseases) and leisure (hobbies) and show that
the induced taxonomies are of high quality.
This document describes a study that aims to improve tag clouds as visual information retrieval interfaces. The researchers propose selecting tags based on their usefulness in representing resources rather than just frequency of use. They define tag usefulness based on how well a tag represents the resources it is assigned to compared to other tags of those resources, as well as how many unique resources it covers. They also propose grouping tags visually based on similarity using clustering techniques. An evaluation on a sample of tagged web resources found their proposed tag selection method achieved better coverage of resources with less overlap between tags compared to traditional frequency-based selection.
IMPLEMENTATION OF FOLKSONOMY BASED TAG CLOUD MODEL FOR INFORMATION RETRIEVAL ...ijscai
In the magnitude of internet one need to devote extra time to investigate anticipated resource, especially
when one need to search information from documents. For the higher range internet there is serious need
to demand the essentiality to discover the reserved resources. One of the solutions for information retrieval
from document repository is to attach tags to documents. Numerous online social bookmarking services
permit users to attach tags with resources which are eventually meta-data, frequently stated as folksonomy.
In current paper, authors implemented this model for information retrieval by utilizing these tags, after
retrieving by using delicious API and synthesize tag cloud in an Indian University to search and retrieve
information from document repository.
Implementation of Folksonomy Based Tag Cloud Model for Information Retrieval ...IJSCAI Journal
In the magnitude of internet one need to devote extra time to investigate an
ticipated resource, especially
when one need to search information from documents. For the higher range internet there is serious need
to demand the essentiality to discover the reserved resources. One of the solutions for information retrieval
from docume
nt repository is to attach tags to documents. Numerous online social bookmarking services
permit users to attach tags with resources which are eventually meta
-
data, frequently stated as folksonomy.
In current paper, authors implemented this model for infor
mation retrieval by utilizing these tags, after
retrieving by using delicious API and synthesize tag cloud in an Indian University to search and retrieve
information from document repository
IMPLEMENTATION OF FOLKSONOMY BASED TAG CLOUD MODEL FOR INFORMATION RETRIEVAL ...ijscai
In the magnitude of internet one need to devote extra time to investigate anticipated resource, especially
when one need to search information from documents. For the higher range internet there is serious need
to demand the essentiality to discover the reserved resources. One of the solutions for information retrieval
from document repository is to attach tags to documents. Numerous online social bookmarking services
permit users to attach tags with resources which are eventually meta-data, frequently stated as folksonomy.
In current paper, authors implemented this model for information retrieval by utilizing these tags, after
retrieving by using delicious API and synthesize tag cloud in an Indian University to search and retrieve
information from document repository.
The document presents a proposed approach for sentiment analysis on big social data using Spark. It discusses collecting opinions from social media to analyze large events by tracking public behavior in real-time. The proposed system provides a adaptable sentiment analysis approach using Spark that analyzes social media posts and classifies them by subject in real-time. It also discusses using sentiment data from social media to inform decisions.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
Social Networks has become one of the most popular platforms to allow users to communicate, and share their interests without being at the same geographical location. With the great and rapid growth of Social Media sites such as Facebook, LinkedIn, Twitter…etc. causes huge amount of user-generated content. Thus, the improvement in the information quality and integrity becomes a great challenge to all social media sites, which allows users to get the desired content or be linked to the best link relation using improved search / link technique. So introducing semantics to social networks will widen up the representation of the social networks. In this paper, a new model of social networks based on semantic tag ranking is introduced. This model is based on the concept of multi-agent systems. In this proposed model the representation of social links will be extended by the semantic relationships found in the vocabularies which are known as (tags) in most of social networks.The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation(E-LDA) as a semantic indexing algorithm, combined with Tag Rank as social network ranking algorithm. The improvements on (E-LDA) phase is done by optimizing (LDA) algorithm using the optimal parameters. Then a filter is introduced to enhance the final indexing output. In ranking phase, using Tag Rank based on the indexing phase has improved the output of the ranking. Simulation results of the proposed model have shown improvements in indexing and ranking output.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGIJwest
The document presents a new model for intelligent social networks based on semantic tag ranking. It uses a multi-agent system approach with agents performing indexing and ranking. For indexing, it uses an enhanced Latent Dirichlet Allocation (E-LDA) model that optimizes LDA parameters. Tags above a threshold from E-LDA output are ranked using Tag Rank. Simulation results showed improvements in indexing and ranking over conventional methods. The model introduces semantics to social networks to improve search and link recommendation.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
Social Networks has become one of the most popular platforms to allow users to communicate, and share
their interests without being at the same geographical location. With the great and rapid growth of Social
Media sites such as Facebook, LinkedIn, Twitter...etc. causes huge amount of user-generated content.
Thus, the improvement in the information quality and integrity becomes a great challenge to all social
media sites, which allows users to get the desired content or be linked to the best link relation using
improved search / link technique. So introducing semantics to social networks will widen up the
representation of the social networks.
Here are the key points about using content-based filtering techniques:
- Content-based filtering relies on analyzing the content or description of items to recommend items similar to what the user has liked in the past. It looks for patterns and regularities in item attributes/descriptions to distinguish highly rated items.
- The item content/descriptions are analyzed automatically by extracting information from sources like web pages, or entered manually from product databases.
- It focuses on objective attributes about items that can be extracted algorithmically, like text analysis of documents.
- However, personal preferences and what makes an item appealing are often subjective qualities not easily extracted algorithmically, like writing style or taste.
- So while content-based filtering can
International Journal of Computational Engineering Research(IJCER) ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology.
Integrated expert recommendation model for online communitiesst02IJwest
Online communities have become vital places for Web 2.0 users to share knowledg
e and experiences.
Recently, finding expertise user in community has become an important research issue. This paper
proposes a novel cascaded model for expert recommendation using aggregated knowledge extracted from
enormous contents and social network fe
atures. Vector space model is used to compute the relevance of
published content with respect
to a specific query while PageRank
algorithm is applied to rank candidate
experts. The experimental results sho
w that the proposed model is
an effective recommen
dation which can
guarantee that the most candidate experts are both highly relevant to the specific queries and highly
influential in corresponding areas
int.ere.st: SCOT-based Tag Sharing ServicesHaklae Kim
This document discusses the need for sharing tags across different systems and platforms. It proposes using SCOT (Social Semantic Cloud of Tags), an ontology for representing tagging data, and the OpenTagging platform for enabling interoperability among tags from different sources. The OpenTagging platform uses SCOT's data model and provides APIs to retrieve, search, and integrate tagging data from various social tagging applications and services in a unified manner. This allows tags to be portable and shared across different users, communities and sources.
Similar to Meaning as Collective Use: Predicting Semantic Hashtag Categories on Twitter (20)
Como a cultura maker vai mudar o modo de produção globalGabriela Agustini
O documento discute como a cultura maker está mudando o modo de produção global através da democratização das ferramentas de inovação e fabricação digital. A cultura maker estimula a experimentação e a produção customizada em pequena escala através de comunidades e redes de pessoas. Estudos indicam que o movimento maker está impulsionando novos micro e pequenos negócios focados em nichos específicos.
Cidadãos como protagonistas das transformações sociaisGabriela Agustini
O documento descreve como as tecnologias digitais transformaram a sociedade nos últimos 20 anos através da descentralização do conhecimento e da cultura colaborativa na internet. É destacado o papel da Wikipedia como exemplo de como a internet permitiu que o conhecimento seja construído e compartilhado coletivamente de forma descentralizada.
O documento discute a cultura maker e inovações como impressão 3D, destacando como essas tecnologias permitem a criação decentralizada de produtos. Também aborda tendências como a Internet das Coisas, que conecta objetos à internet, e a prototipagem rápida, que reduz custos de inovação. A cultura maker valoriza a compreensão de como as tecnologias funcionam e a colaboração em sua criação.
O documento discute o movimento maker e seu papel a favor da educação. Defende que o movimento estimula a criatividade, a resolução de problemas e a aprendizagem baseada em projetos através da experimentação, colaboração e o uso de novas tecnologias como robótica e impressão 3D. Além disso, o movimento ajuda a democratizar as ferramentas de inovação.
O documento discute estratégias digitais para engajar o público em organizações culturais, mencionando o uso de ibeacons no Brooklyn Museum para fornecer experiências aos visitantes e a conversa com obras de arte na Pinacoteca de São Paulo usando a IBM Watson. Também recomenda colocar o público no centro do processo e usar mapas de empatia para identificar melhor os públicos-alvo.
The document discusses how cultural institutions need to change their mentality to remain relevant to citizens in the digital age. It argues that institutions must rethink how they work with new technologies and urban growth, use culture to drive social innovation, change how buildings are used and create spaces for public dialogue, and rethink their formats. This will allow culture and the arts to better serve as an engine for social change.
Crowdsourcing na arte permite que públicos anteriormente passivos se tornem criadores ativos, empoderando pessoas fora do mundo da arte. Isso fornece milhares de horas de trabalho gratuito ou barato para artistas, permitindo projetos de grande escala que um único artista levaria uma vida para criar sozinho. Estratégias digitais devem entender como a tecnologia pode ajudar a missão da organização e integrar a estratégia digital à visão geral, se baseando no comportamento do público para estabelecer novas conexões.
1) A cultura é um processo dinâmico de transformação e reinvenção constante, não algo estático. Quanto maior o grau de compartilhamento cultural, mais democrática e tolerante será uma sociedade.
2) A diversidade cultural é essencial para o desenvolvimento sustentável e para assegurar a livre expressão das culturas em sociedades dinâmicas.
3) A cultura digital representa a nova era de uma cultura de rede, onde a diversidade cultural é seu fundamento.
Social Entrepreneurship - International School of Law and TechnologyGabriela Agustini
This class is part of the International School of Law and Technology, organized by ITS Rio on July 2017. Gabriela Agustini, executive director of Olabi, and Luiza Mesquita, head of innovation at ITS Rio.
A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?Gabriela Agustini
O documento discute como a tecnologia pode ser usada para o bem da humanidade, mencionando que a conectividade já alcançou metade da população e deve ser estendida a todos. Também ressalta que é preciso aprender com modelos adotados na África e em Shenzhen e ouvir mais as pessoas nas margens da sociedade.
This report was assembled by the co-organizers of the Makers for Global Good Summit, Stephanie Santoso, Kate Gage and Sam Bloch, with help from summit participants. The summit was made possible through the generous support of the Gordon and Betty Moore Foundation, Engineering for Change and in collaboration with The Tech Museum of Innovation. For more information on Makers for Global Good, visit makersforglobalgood.com.
Apresentação olabi institucional interna - abril 17Gabriela Agustini
Este documento apresenta a missão, visão, valores e objetivos da organização Olabi, que tem como objetivo democratizar a produção de tecnologia para promover mais igualdade social. O Olabi oferece ferramentas, um espaço físico para experimentação e um sistema para empoderar cidadãos no uso de tecnologias. Suas linhas de trabalho incluem educação para o século 21, resolver problemas urbanos e empoderamento feminino.
O documento discute a utilização criativa de acervos de museus e a gestão da propriedade intelectual. Ele lista exemplos de makerspaces, fabricação digital, computação imersiva e projetos de remix de acervos que promovem o conhecimento compartilhado. Também destaca o papel educacional dos museus e seu potencial para sensibilizar o público sobre o valor do patrimônio cultural e a importância de sua preservação.
A pretalab é uma plataforma do Olabi para estimular que mais meninas e mulheres negras e indígenas tenham acesso ao universo das inovações e da tecnologia.
A aula 2 de Cultura Digital discute como a internet e as tecnologias digitais transformaram a sociedade e a cultura, criando novas formas de comunicação, compartilhamento de informações e produção e consumo de conteúdo.
A cultura digital está mudando a forma como nos comunicamos e interagimos. As redes sociais e a internet transformaram nossas vidas, permitindo novas formas de compartilhar informações e criar conteúdo online. É importante entender esses novos hábitos e como eles afetam nossa sociedade.
Inovação de baixo para cima e o poder dos cidadãos Gabriela Agustini
O documento discute como makerspaces e fablabs estão permitindo que inovações surjam de baixo para cima, dando poder aos cidadãos comuns para ajudar a resolver problemas locais usando novas tecnologias e ferramentas de fabricação digital. A autora reflete sobre como essas comunidades de "makers" podem ajudar a construir um mundo melhor aproveitando todo o potencial criativo das pessoas.
Este documento descreve o Olabi Makerspace, um espaço de criação, aprendizagem e inovação em Rio de Janeiro. O Olabi oferece ferramentas, máquinas e cursos para hobbystas, empreendedores, aprendizes e comunidade em geral desde 2014. O documento destaca a importância dos princípios de empoderamento, criação de comunidade e aprendizagem prática nos makerspaces e fornece links para artigos sobre o uso de makerspaces na educação.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
Meaning as Collective Use: Predicting Semantic Hashtag Categories on Twitter
1. Meaning as Collective Use:
Predicting Semantic Hashtag Categories on Twitter
Lisa Posch Claudia Wagner
Graz University of Technology, JOANNEUM RESEARCH,
Knowledge Management Institute for Information and
Institute Communication Technologies
Inffeldgasse 13, 8010 Graz, Steyrergasse 17, 8010 Graz,
Austria Austria
lposch@sbox.tugraz.at clauwa@sbox.tugraz.at
Philipp Singer Markus Strohmaier
Graz University of Technology, Graz University of Technology,
Knowledge Management Knowledge Management
Institute Institute
Inffeldgasse 13, 8010 Graz, Inffeldgasse 13, 8010 Graz,
Austria Austria
philipp.singer@tugraz.at markus.strohmaier@tugraz.at
ABSTRACT Thus, new methods and techniques for analyzing the se-
This paper sets out to explore whether data about the us- mantics of hashtags are definitely needed.
age of hashtags on Twitter contains information about their A simplistic view on Wittgenstein’s work [17] suggests
semantics. Towards that end, we perform initial statisti- that meaning is use. This indicates that the meaning of
cal hypothesis tests to quantify the association between us- a word is not defined by a reference to the object it denotes,
age patterns and semantics of hashtags. To assess the util- but by the variety of uses to which the word is put. There-
ity of pragmatic features – which describe how a hashtag fore, one can use the narrow, lexical context of a word (i.e.,
is used over time – for semantic analysis of hashtags, we its co-occurring words) to approximate its meaning. Our
conduct various hashtag stream classification experiments work builds on this observation, but focuses on the prag-
and compare their utility with the utility of lexical features. matics of a word (i.e., how a word, or in our case a hashtag,
Our results indicate that pragmatic features indeed contain is used by a large group of users) – rather than its narrow,
valuable information for classifying hashtags into semantic lexical context.
categories. Although pragmatic features do not outperform The aim of this work is to investigate to what extent prag-
lexical features in our experiments, we argue that pragmatic matic characteristics of a hashtag (which capture how a large
features are important and relevant for settings in which tex- group of users uses a hashtag) may reveal information about
tual information might be sparse or absent (e.g., in social its semantics. Specifically, our work addresses the following
video streams). research questions:
• Do different semantic categories of hashtags reveal sub-
Categories and Subject Descriptors stantially different usage patterns?
H.3.5 [Information Storage and Retrieval]: Online In- • To what extent do pragmatic and lexical properties of
formation Services—Web-based services hashtags help to predict the semantic category of a
hashtag?
Keywords To address these research questions we conducted an em-
Twitter; hashtags; social structure; semantics pirical study on a broad range of diverse hashtag streams be-
longing to eight different semantic categories (such as tech-
nology, sports or idioms) which have been identified in pre-
1. INTRODUCTION vious research [12] and have shown to be useful for grouping
A hashtag is a string of characters preceded by the hash hashtags. From each of the eight categories, we selected ten
(#) character and it is used on platforms like Twitter as sample hashtags at random and collected temporal snap-
descriptive label or to build communities around particular shots of messages containing at least one of these hashtags
topics [15]. To outside observers, the meaning of hashtags at three different points in time. To quantify how hashtags
is usually difficult to analyze, as they consist of short, often are used over time, we extended the set of pragmatic stream
abbreviated or concatenated concepts (e.g., #MSM2013). measures which we introduced in our previous work [16] and
applied them to the hashtag streams in our dataset. These
Copyright is held by the International World Wide Web Conference pragmatic measures capture not only the social structure of
Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink
to the author’s site if the Material is used in electronic media.
a hashtag at specific points in time, but also the changes in
WWW 2013 Companion, May 13–17, 2013, Rio de Janeiro, Brazil. social structure over time.
ACM 978-1-4503-2038-2/13/05.
2. To answer the first research question, we used statisti- grounded and classified into predefined semantic categories.
cal standard tests which allow to quantify the association For example, Noll and Meinel [8] presented a study of the
between pragmatic characteristics of hashtag streams and characteristics of tags and determined their usefulness for
their semantic categories. To tackle the second research web page classification [9]. Similar to our work, Overell et
question, we firstly computed lexical features using a a stan- al. [10] presented an approach which allows classifying tags
dard bag-of-words model with term frequency (TF). Then, into semantic categories. They trained a classifier to classify
we trained several classification models with lexical features Wikipedia articles into semantic categories, mapped Flickr
only, pragmatic features only and a combination of both. We tags to Wikipedia articles using anchor texts in Wikipedia
compared the performance of different classification models and finally classified Flickr tags into semantic categories by
by using standard evaluation measures such as the F1-score using the previously trained classifier. Their results show
(which is defined as the harmonic mean of precision and that their ClassTag system increases the coverage of the vo-
recall). To get a fair baseline for our classification mod- cabulary by 115% compared to a simple WordNet approach
els, we constructed a control dataset by randomly shuffling which classifies Flickr tags by mapping them to WordNet
the category labels of the hashtag streams. That means we via string matching techniques. Unlike our work, they did
destroyed the original relationship between the pragmatic not take into account how tags are used, but learn relations
properties and the semantic categories of hashtags. between tags and semantic categories via mapping them to
Our results show that pragmatic features indeed reveal Wikipedia articles.
information about hashtags’ semantics and perform signifi- Pragmatics and semantics of hashtags: On Twitter,
cantly better than the baseline. They can therefore be useful users have developed a tagging culture by adding a hash
for the task of semantically annotating social media con- symbol (#) in front of a short keyword. The first introduc-
tent. Not surprisingly, our results also show that lexical fea- tion of the usage of hashtags was provided by Chris Messina
tures are more suitable than pragmatic features for the task in a blog post [7]. Huang et al. [4] state that this kind
of semantically categorizing hashtag streams. However, an of new tagging culture has created a completely new phe-
advantage of pragmatic features is that they are language- nomenon, called micro-meme. The difference between such
and text-independent. Pragmatic features can be applied to micro-memes and other social tagging systems is that the
tasks where the creation of lexical features is not possible – participation in micro-memes is an a-priori approach, while
such as multimedia streams. Also for scenarios where tex- other social tagging systems follow an a-posteriori approach.
tual content is available, pragmatic features allow for more This is due to the fact that users are influenced by the ob-
flexibility due to their independence of the language used in servation of the usage of micro-meme hashtags adopted by
the corpus. Our results are relevant for social media and other users. The work of [4] suggests that hashtagging in
semantic web researchers who are interested in analyzing Twitter is more commonly used to join public discussions
the semantics of hashtags in textual or non-textual social than to organize content for future retrieval. The role of
streams (e.g., social video streams). hashtags has also been investigated in [18]. Their study
This paper is structured as follows: Section 2 gives an confirms that a hashtag serves both as a tag of content and
overview of related research on analyzing the semantics of a symbol of community membership. Laniado and Mika [6]
tags in social bookmarking systems and research on hash- explored to what extent hashtags can be used as strong iden-
tagging on Twitter in general. In Section 3 we describe tifiers like URIs are used in the Semantic Web. They mea-
our experimental setup, including our datasets, feature en- sured the quality of hashtags as identifiers for the Semantic
gineering and evaluation approach. Our results are reported Web, defining several metrics to characterize hashtag usage
in Section 4 and further discussed in Section 5. Finally, we on the dimensions of frequency, specificity, consistency, and
conclude our work in Section 6. stability over time. Their results indicate that the lexical
usage of hashtags can indeed be used to identify hashtags
2. RELATED WORK which have the desirable properties of strong identifiers. Un-
like our work, their work focuses on lexical usage patterns
In the past, a considerable effort has been spent on study-
and measures to what extent those patterns contribute to
ing the semantics of tags (e.g., tags in social bookmarking
the differentiation between strong and weak semantic iden-
systems), but also hashtags in Twitter have received atten-
tifiers (binary classification) while we use usage patterns to
tion from the research community.
classify hashtags into semantic categories.
Semantics of tags: On the one hand, researchers ex-
Recently, researchers have also started to explore the dif-
plored to what extent semantics emerge from folksonomies
fusion dynamics of hashtags - i.e., how hashtags spread in
by investigating different algorithms for extracting tag net-
online communities. For example the work of [15] aims to
works and hierarchies from such systems (see e.g., [1], [3] or
predict the exposure of a hashtag in a given time frame
[13]). The work of [14] evaluated three state-of-the-art folk-
while [12] are interested in the temporal spreading patterns
sonomy induction algorithms in the context of five social
of hashtags.
tagging systems. Their results show that those algorithms
specifically developed to capture intuitions of social tagging
systems outperform traditional hierarchical clustering tech- 3. EXPERIMENTAL SETUP
niques. K¨rner et al. [5] investigated how tagging usage pat-
o Our experiments are designed to explore to what extent
terns influence the quality of the emergent semantics. They pragmatic properties of hashtag streams can be used to
found that ‘verbose’ taggers (describers) are more useful for gauge the semantic category of a hashtag. We are not only
the emergence of tag semantics than users who use a small interested in the idiosyncrasies of hashtag usage within one
set of tags (categorizers). semantic category but also in the deltas between different
On the other hand, researchers investigated to what extent semantic categories. In this section, we first introduce our
tags (and the resources they annotate) can be semantically dataset as well as the pragmatic and lexical measures which
3. 3/4/2012 1 week 4/1/2012 4/29/2012
we used to describe hashtag streams. Then we present the
t0 t1 t2
methodology and evaluation approach which we used to an-
swer our research questions. crawl of crawl of crawl of
stream stream stream
social social social
tweets tweets tweets
structure structure structure
3.1 Dataset
In this work we use data that we acquired from Twit- Figure 1: Timeline of the data collection process
ter’s API. Romero et al. [12] conducted a user study and a
classification experiment and identified eight broad semantic
categories (concretely, we removed the two hashtags #bsb
categories of hashtags: celebrity, games, idiom, movies/TV,
and #mj). We also decided to remove inactive hashtag
music, political, sports and technology. We used a list con-
streams (those where less than 300 posts where retrieved)
sisting of the 500 hashtags which were used by most users
as estimating information theoretic measures is problematic
within their dataset and which were manually assigned to
if only few observations are available [11]. The most common
the eight categories as a starting point for creating our own
solution is to restrict the measurements to situations where
dataset.
one has an adequate amount of data. We found four inactive
For each category, we chose ten hashtags at random (see
hashtags in the category games and seven in the category
Table 1). We biased our random sample towards active
celebrity. The removal of these hashtag streams resulted in
hashtag streams by re-sampling hashtags for which we found
the complete removal of the category celebrity as it was only
less than 1000 posts at the beginning of our data collection
left with one hashtag stream (#michaeljackson). A possi-
(March 4th, 2012). For those categories for which we could
ble explanation for the low number of tweets in the hashtag
not find ten hashtags that had more than 1000 posts (i.e.,
streams for this category is that topics related to celebrities
games and celebrity), we selected the most active hashtags
have a shorter life-span than topics related to other cate-
per category (i.e., the hashtags for which we found the most
gories. Our final datasets consist of 64 hashtag streams and
posts).
seven semantic categories which were sufficiently active dur-
The dataset consists of three parts, each part representing
ing our observation period.
a time frame of four weeks. The different time frames ensure
that we can observe the usage of a hashtag over a given
period of time. The time frames are independent of each Table 2: Description of the complete dataset
other, i.e., the data collected at one time frame does not t0 t1 t2
contain any information of the data collected at another time
Tweets 94,634 94,984 95,105
frame. Authors 53,593 54,099 53,750
At the start of each time frame, we retrieved the most Followers 56,685,755 58,822,119 66,450,378
recent tweets in English for each hashtag using Twitter’s Followees 34,025,961 34,263,129 37,674,363
Friends 21,696,134 21,914,947 24,449,705
public search API. Afterwards, we retrieved the followers Mean Followers per Author 1,057.71 1,087.31 1,236.29
and followees of each user who had authored at least one Mean Followees per Author 634.90 633.34 700.92
message in our hashtag stream dataset. Some pragmatic fea- Mean Friends per Author 404.83 405.09 454.88
tures capture information about who potentially consumes
a hashtag stream (followers) or who potentially informs au-
thors of a hashtag stream (followees) and therefore require 3.2 Feature Engineering
the one-hop neighborhood of hashtag streams’ authors. In In the following, we define the pragmatic and lexical fea-
this work, we call users who hold both of these roles (i.e., tures which we designed to capture the different social and
have established a bidirectional link with an author) friends. message based structures of hashtag streams. For our prag-
The starting dates of the time frames were March 4th (t0 ), matic features we further differentiate between static prag-
April 1st (t1 ) and April 29th, 2012 (t2 ). Table 2 depicts matic features (which capture the social structure of a hash-
the number of tweets and relations between users that we tag at a specific point in time) and dynamic pragmatic fea-
collected during each time frame. tures (which combine information from several time points).
The stream tweets were retrieved on the first day of each
time frame, fetching tweets that were authored a maximum 3.2.1 Static Pragmatic Measures:
of seven days previous to the date of retrieval. During the Entropy Measures are used to measure the random-
first week of each time frame, the user IDs of the followers ness of streams’ authors and their followers, followees and
and followees were collected. Figure 1 depicts this process. friends. For each hashtag stream, we rank the authors
Since we were interested in learning what types of char- by the number of messages they published in that stream
acteristics are useful for describing a semantic hashtag cat- (norm entropy author) and we rank the followers (norm -
egory, we removed hashtag streams that belong to multiple entropy follower), followees (norm entropy followee) and
Table 1: Randomly selected hashtags per category (ordered alphabetically)
technology idioms sports political games music celebrity movies
blackberry factaboutme f1 climate e3 bsb ashleytisdale avatar
ebay followfriday football gaza games eurovision brazilmissesdemi bbcqt
facebook dontyouhate golf healthcare gaming lastfm bsb bones
flickr iloveitwhen nascar iran mafiawars listeningto michaeljackson chuck
google iwish nba mmot mobsterworld mj mj glee
iphone nevertrust nhl noh8 mw2 music niley glennbeck
microsoft omgfacts redsox obama ps3 musicmonday regis movies
photoshop oneofmyfollowers soccer politics spymaster nowplaying teamtaylor supernatural
socialmedia rememberwhen sports teaparty uncharted2 paramore tilatequila tv
twitter wheniwaslittle yankees tehran wow snsd weloveyoumiley xfactor
4. friends (norm entropy friend) by the number of stream’s 3.2.3 Lexical Measures:
authors they are related with. A high author entropy indi- We use vector-based methods which allow representing
cates that the stream is created in a democratic way since all each microblog message as a vector of terms and use term
authors contribute equally much. A high follower entropy frequency (TF ) as weighting schema. In this work lexical
and friend entropy indicate that the followers and friends do measures are always computed for individual time points
not focus their attention towards few authors but distribute and are therefore static measures.
it equally across all authors. A high followee entropy and
friend entropy indicate that the authors do not focus their 3.3 Usage Patterns of Hashtag Categories
attention on a selected part of their audience. Our first aim is to investigate whether different seman-
Overlap Measures describe the overlap between the au- tic categories of hashtags reveal substantially different us-
thors and the followers (overlap authorfollower), followees age patterns (such as that they are used and/or consumed
(overlap authorfollowee) or friends (overlap authorfriend) of by different sets of users or that they are used for differ-
a hashtag stream. If overlap is one, all authors of a stream ent purpose). To compare the pragmatic fingerprints of
are also followers, followees or friends of stream authors. hashtags belonging to different semantic categories and to
This indicates that the stream is consumed and produced by quantify the differences between categories, we conducted
the same users. A high overlap suggests that the community a pairwise Mann-Whitney-Wilcoxon-Test which is a non-
around the hashtag is rather closed, while a low overlap indi- parametric statistical hypothesis test for assessing whether
cates that the community is more open and that active and one of two samples of independent observations tends to have
passive part of the community do not extensively overlap. larger values than the other. We used a non-parametric test
Coverage Measures characterize a hashtag stream via since the Shapiro-Wilk-Test revealed that not all features
the nature of its messages. We introduce four coverage are normally distributed, even after applying arcsine trans-
measures. The informational coverage measure (informa- formation to ratio measures. Holm-Bonferroni method was
tional) indicates how many messages of a stream have an used for adjusting the p-values and counteract the problem
informational purpose - i.e., contain a link. The conversa- of multiple comparisons. For this experiment, we used the
tional coverage (conversational) measures the mean number timeframes t0→1 and t1→2 .
of messages of a stream that have a conversational purpose
- i.e., those messages that are directed to one or several
t1→0 t2→1
specific users (e.g., through @replies). The retweet cover-
age (retweet) measures the percentage of messages which
are retweets. The hashtag coverage (hashtag) measures the
mean number of hashtags per message in a stream.
t0 t1 t2
3.2.2 Dynamic Pragmatic Measures:
To explore how the social structure of a hashtag stream
changes over time, we measure the distance between the
tweet-frequency distributions of authors at different time t0→1 t1→2
points, and the author-frequency distributions of followers,
followees or friends at different time points. The intuition
behind these features is that certain semantic categories of Figure 2: Illustration of our time frames
hashtags may have a fast changing social structure since
new people start and stop using those types of hashtags
frequently, while other semantic categories may have a more 3.4 Hashtag Classification
stable community around them which changes less over time. Our second aim is to investigate to what extent pragmatic
We use a symmetric variation of the Kullback-Leibler di- and lexical properties of hashtag streams contribute to clas-
vergence (DKL ) which represents a natural distance mea- sify them into their semantic categories. That means we
sure between two probability distributions (A and B) and aim to classify temporal snapshots of hashtag streams into
2
1
is defined as follows: 1 DKL (A||B) + 2 DKL (B||A). The KL their correct semantic categories (to which they were as-
divergence is also known as relative entropy or information signed in [12]) just by analyzing how they are used over
divergence. The KL divergence is zero if the two distri- time. We then compare the performance of the pragmati-
butions are identical and approaches infinity as they differ cally informed classifier with the performance of a classifier
more and more. We measure the KL divergence for the dis- informed by lexical features within a semantic multiclass
tributions of authors (kl authors), followers (kl followers), classification task. We used the timeframes t1→0 and t2→1
followees (kl followees) and friends (kl friends). for this experiment in order to avoid including information
Figure 2 visualizes the different time frames and their no- from the ‘future’ in our classification.
tation. t0 only contains the static features computed from We performed grid search with varying hyperparameters
data collected at t0 . Consequently, t1 and t2 only contain using Support Vector Machine (linear and RBF kernels) and
the static features computed from data collected at t1 or an ensemble method with extremely randomized trees. Since
t2 , respectively. t0→1 includes static features computed on extremely randomized trees are a probabilistic method and
data collected at t0 and the dynamic measures computed perform slightly different in each run, we run them ten times
on data collected at t0 and t1 . t1→0 includes static features and report the average scores. The features were standard-
computed on data collected at t1 and the dynamic measures ized by subtracting the mean and scaling to unit variance.
computed on data collected at t0 and t1 . t1→2 and t2→1 are We used stratified 6-fold cross-validation (CV) to train and
defined in the same way. test each classification model.
5. Since we have two different types of pragmatic features, the 64 hashtag streams. That means we destroyed the orig-
static and dynamic ones, we trained and tested three sepa- inal relationship between the pragmatic properties and the
rate classification models which were only informed by prag- semantic categories of hashtags and evaluated the perfor-
matic information: mance of a classifier which tries to use pragmatic properties
Static Pragmatic Model: We trained and tested this to classify hashtags into their shuffled categories within a 6-
classification model with static pragmatic features on data fold cross-validation. We repeated the random shuffling 100
collected at t1 using stratified 6-fold CV. The experiment times and used the resulting average F1-score as our base-
was repeated on the data collected at t2 . line performance. For the baseline classifier we also used grid
Dynamic Pragmatic Model: We trained and tested search to determine the optimal parameters prior to train-
the classification model with dynamic pragmatic features on ing. Our baseline classifier tests how well randomly assigned
data collected at t0 and t1 using stratified 6-fold CV. The categories can be identified compared to our real semantic
computation of our dynamic features requires at least two categories. One needs to note that a simple random guesser
time points. We repeated this experiment on data collected baseline would be a weaker baseline than the one described
at t1 and t2 . above and would lead to a performance of 1/7.
Combined Pragmatic Model: We combined the static To gain further insights into the impact of individual prop-
and dynamic pragmatic features, and trained and tested the erties, we analyzed their information gain (IG) with respect
classification model on the data of t1→0 using stratified 6- to the categories. The information gain measures how accu-
fold CV. Again, we repeated the experiment on the data of rately a specific stream property P is able to predict stream’s
t2→1 . category C and is defined as follows: IG(C, P ) =H(C) −
We also performed our classification with a model using H(C | P ) where H denotes the entropy.
our lexical features (i.e., TF weighted words). Finally, we
trained and tested a combined classification model using
pragmatic and lexical features, which leads to the follow- 4. RESULTS
ing classification models: In the following section, we present the results from our
Lexical Model: We trained and tested the model on empirical study on usage patterns of different semantic cat-
data from t1 using stratified 6-fold CV and repeated the egories of hashtags.
experiment on data collected at t2 .
Combined Pragmatic and Lexical Model: We trained 4.1 Usage Patterns of Hashtag Categories
and tested the mixed classifier on the data of t1→0 us- To answer our first research question, we explored to what
ing stratified 6-fold CV, then repeated this experiment for extent usage patterns of hashtag streams in different seman-
t2→1 . A simple concatenation of pragmatic and lexical fea- tic categories are indeed significantly different.
tures is not useful, since the vast amount of lexical features Our results indicate that some pragmatic measures are
would overrule the pragmatic features. Therefore, we used indeed significantly different for distinct semantic categories.
a stacking method (see [2]) and performed firstly a clas- This indicates that hashtags of certain categories are used
sification using lexical features alone and Leave-One-Out in a very specific way which may allow us to relate these
cross-validation. We used a SVM with linear kernel for hashtags with their semantic categories just by observing
this classification since it worked best for these features. how users use them. Table 3 depicts the measures that show
Secondly, we combined the pragmatic features with the re- statistically significant (p < 0.05) differences in both t0→1
sulting seven probability features which we got from the and t1→2 . In total, 35 pragmatic category differences were
previous classification model and which describe how likely found to be statistically significant (with p < 0.05) for t0→1
each semantic class is for a certain stream given its words. and 33 for t1→2 . 26 pragmatic category differences were
To get a fair baseline for our experiment, we constructed a found to be significant for both t0→1 and t1→2 which suggests
control dataset by randomly shuffling the category labels of that results are independent of our choice of time frame.
Table 3: Features which showed a statistically significant difference (with p < 0.05) for each pair of categories
in both t0→1 and t1→2
games idioms movies music political sports
idioms informational
retweet
movies informational
music informational
political kl followers kl authors
kl followers
kl followees
informational
hashtag
sports kl followers kl authors
kl followers
informational
technology kl followers kl authors kl friends kl friends overlap authorfollower
kl followers overlap authorfriend
kl followees
kl friends
informational
retweet
hashtag
6. technology technology
sports sports
political political
music music
movies movies
idioms idioms
games games
0.0 0.2 0.4 0.6 0.8 1.0 6 8 10 12 14
Informational Coverage KL Divergence Authors
(a) This figure shows the percentage of mes- (b) This figure shows how much the au-
sages of hashtag streams belonging to differ- thors’ tweet-frequency distributions of hash-
ent categories that contain at least one link. tag streams of different categories change on
average.
technology technology
sports sports
political political
music music
movies movies
idioms idioms
games games
1 2 3 4 5 1 2 3 4 5 6
KL Divergence Followers KL Divergence Friends
(c) This figure shows how much the follow- (d) This figure shows how much the friends’
ers’ author-frequency distributions of hash- author-frequency distributions of hashtag
tag streams of different categories change on streams of different categories change on av-
average. erage.
Figure 3: Each plot shows the feature distribution of different categories of one of the 4 best pragmatic
features for t0→1 . We obtained similar results for t1→2 .
Not surprisingly, the category which shows the most spe- The category technology can be distinguished from all
cific usage patterns is idioms and therefore the hashtags of other categories except sports, particularly because its fol-
this category can be distinguished from all hashtags just by lowers’ and friends’ author-frequency distributions have
analyzing their pragmatic properties. Hashtag streams of significantly lower symmetric KL divergences than hash-
the category idioms exhibit a significantly lower informa- tags in the categories games, idioms, movies and music
tional coverage than hashtag streams of all other categories (see Figures 3(c) and 3(d)). This indicates that hashtag
(see Figure 3(a)) and a significantly higher symmetric KL di- streams in the category technology have a stable social
vergence for author’s tweet-frequency distributions (see Fig- structure which changes less over time. This is not sur-
ure 3(b)). Also the followers’ and friends’ author-frequency prising since this semantic category denotes a topical area
distributions tend to have a higher symmetric KL divergence and users who are interested in such areas may consume
for idioms hashtags than for other hashtags (see Figures 3(c) and provide information on a regular base. It is especially
and 3(d)). This indicates that the social structure of hashtag interesting to note that the only pragmatic measures which
streams in the category idioms changes faster than hashtags allows distinguishing political and technological hashtag
of other categories. Furthermore, hashtag streams of this streams are the author-follower and author-friend over-
category are less informative - i.e., contain significantly less laps since these overlaps are significantly lower for the
links per message on average. category technology compared to the category political.
7. 1.0
1.0
Baseline performance Model performance Baseline performance Model performance
0.8
0.8
F1 measure
F1 measure
0.6
0.6
0.4
0.4
0.2
0.2
0.0
0.0
Comb. Pragmatic and Lexical
Combined Pragmatic
Dynamic Pragmatic
Lexical
Static Pragmatic
Comb. Pragmatic and Lexical
Combined Pragmatic
Dynamic Pragmatic
Lexical
Static Pragmatic
(a) t1→0 (b) t2→1
Figure 4: Weighted averaged F1-scores of different classification models trained and tested on t1→0 4(a) and
t2→1 4(b) using 6-fold cross-validation
This indicates that the content of hashtag streams of the
category political is far more likely to be produced and con- Table 4: Top features for two different datasets
sumed by the same people than content of technological ranked via Information Gain
Rank t1→0 t2→1
hashtag streams.
Comparing the individual measures reveals that the infor- 1 informational kl followers
2 kl followers informational
mational coverage (six category pairs) and the symmetric 3 kl friends hashtag
KL divergences of followers’ author-frequency distributions 4 hashtag kl followees
(six category pairs), authors’ tweet-frequency distributions 5 norm entropy friend kl friends
(three pairs) and friends’ follower-frequency distributions
(three pairs) are the most discriminative measures. Figure 3
depicts the distributions of these four measures per category. 4.2.1 Feature Ranking:
Other measures that show significant differences in medians In addition to the overall classification performance which
for both t0→1 and t1→2 are the symmetric KL divergence can be achieved solely based on analyzing the pragmatics of
of followees’ author-frequency distributions (two pairs), the hashtags, we were also interested in gaining insights into the
author-follower and the author-friend overlap (one pair) as impact of individual pragmatic features. To evaluate the in-
well as the retweet and hashtag coverage (two pairs). dividual performance of the features we used information
Some measures like the conversational coverage measure gain (with respect to the categories) as a ranking criterion.
did not show any significant differences for any of the cat- The ranking was performed on t1→0 and t2→1 with stratified
egory pairs, for any time frame. This indicates that in all 6-fold cross-validation. Table 4 shows that the top five fea-
hashtag streams an equal amount of conversational activities tures (i.e., the pragmatic features which reveal most about
take place. the semantic of hashtags) are features which capture the
temporal dynamics of the social context of a hashtag (i.e.,
4.2 Hashtag Classification the temporal follower, followees and friends dynamics) as
In order to quantify the value of different pragmatic and well as the informational and hashtag coverage. This indi-
lexical properties of hashtag streams for predicting their se- cates that the collective purpose for which a hashtag is used
mantic category, we conducted a hashtag stream classifi- (i.e., if it used to share information rather than for other
cation experiment and systematically compared the perfor- purposes) and the social dynamics around a hashtags – i.e.,
mance of various classification models trained with different who uses a hashtag for whom – play a key role in under-
sets of features. standing its semantics.
Figure 4 shows the performance of the best classifier (ex-
tremely randomized trees) trained with different sets of fea-
tures. One can see from this figure that in general lexical 5. DISCUSSION OF RESULTS
features perform better than pragmatic features, but also Although our results show that lexical features work best
that pragmatic features (both static and dynamic) signifi- within the semantic classification task, those features are
cantly outperform a random baseline. This indicates that text and language dependent. Therefore, their applicability
pragmatic features indeed reveal information about a hash- is limited to settings where text is available. Pragmatic fea-
tag’s meaning, even though they do not match the perfor- tures on the other hand rely on usage information which is
mance of lexical features in this case. In 4(a) we can see that independent of the type of content which is shared in social
for t1→0 the combination of lexical and pragmatic features streams and can therefore also be computed for social video
performs slightly better than using lexical features alone. or image streams.
8. We believe that pragmatic features can supplement lex- Proceedings of the 19th International World Wide
ical features if lexical features alone are not sufficient. In Web Conference (WWW 2010), Raleigh, NC, USA,
our experiments, we could see that the performance may Apr. 2010. ACM.
slightly increase when combining pragmatic and lexical fea- [6] D. Laniado and P. Mika. Making sense of twitter. In
tures. However, the effect was not significant. We think P. F. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika,
the reason for this is that in our setup lexical features alone L. Zhang, J. Z. Pan, I. Horrocks, and B. Glimm,
already achieved good performance. editors, International Semantic Web Conference (1),
The classification results coincide with the results of the volume 6496 of Lecture Notes in Computer Science,
statistical significance tests. Ranking the properties by in- pages 470–485. Springer, 2010.
formation gain showed that the most discriminative proper- [7] C. Messina. Groups for twitter; or a proposal for
ties (the ones that showed a statistical significance in both twitter tag channels. http://factoryjoe.com/blog/
t0→1 and t1→2 for the highest amount of category pairs) 2007/08/25/groups-for-twitter-or-a-proposal-
found in 4.1 were also the top ranked features (informational for-twitter-tag-channels/, 2007.
coverage and the KL divergences). [8] M. G. Noll and C. Meinel. Exploring social
annotations for web document classification. In
6. CONCLUSIONS AND IMPLICATIONS Proceedings of the 2008 ACM symposium on Applied
Our work suggests that the collective usage of hashtags computing, SAC ’08, pages 2315–2320, New York, NY,
indeed reveals information about their semantics. How- USA, 2008. ACM.
ever, further research is required to explore the relations [9] M. G. Noll and C. Meinel. The metadata triumvirate:
between usage information and semantics, especially in do- Social annotations, anchor texts and search queries. In
mains where limited text is available. We hope that our Web Intelligence, pages 640–647. IEEE, 2008.
research is a first step into this direction since it shows that [10] S. Overell, B. Sigurbj¨rnsson, and R. van Zwol.
o
hashtags of different semantic categories are indeed used in Classifying tags using open content resources. In
different ways. Proceedings of the Second ACM International
Our work has implications for researchers and practition- Conference on Web Search and Data Mining, WSDM
ers interested in investigating the semantics of social media ’09, pages 64–73, New York, NY, USA, 2009. ACM.
content. Social media applications such as Twitter provide [11] S. Panzeri and A. Treves. Analytical estimates of
a huge amount of textual information. Beside the textual limited sampling biases in different information
information, also usage information can be obtained from measures. Network: Computation in Neural Systems,
these platforms and our work shows how this information 7(1):87–107, 1996.
can be exploited for assigning semantic annotations to tex- [12] D. M. Romero, B. Meeder, and J. Kleinberg.
tual data streams. Differences in the mechanics of information diffusion
across topics: idioms, political hashtags, and complex
7. ACKNOWLEDGEMENTS contagion on twitter. In Proceedings of the 20th
international conference on World wide web, WWW
This work was supported in part by a DOC-fForte fellow- ’11, pages 695–704, New York, NY, USA, 2011. ACM.
ship of the Austrian Academy of Science to Claudia Wagner.
[13] P. Schmitz. Inducing ontology from flickr tags. In
Furthermore, this work is in part funded by the FWF Aus-
Proceedings of the Workshop on Collaborative Tagging
trian Science Fund Grant I677 and the Know-Center Graz.
at WWW2006, Edinburgh, Scotland, May 2006.
[14] M. Strohmaier, D. Helic, D. Benz, C. K¨rner, and
o
8. REFERENCES R. Kern. Evaluation of folksonomy induction
[1] D. Benz, C. K¨rner, A. Hotho, G. Stumme, and
o algorithms. Transactions on Intelligent Systems and
M. Strohmaier. One tag to bind them all: Measuring Technology, 2012.
term abstractness in social metadata. In Proc. of 8th [15] O. Tsur and A. Rappoport. What’s in a hashtag?:
Extended Semantic Web Conference ESWC 2011, content based prediction of the spread of ideas in
Heraklion, Crete, (May 2011), 2011. microblogging communities. In Proceedings of the fifth
[2] T. Hastie, R. Tibshirani, and J. Friedman. The ACM international conference on Web search and
Elements of Statistical Learning. Springer Series in data mining, WSDM ’12, pages 643–652, New York,
Statistics. Springer New York Inc., New York, NY, NY, USA, 2012. ACM.
USA, 2001. [16] C. Wagner and M. Strohmaier. The wisdom in
[3] P. Heymann and H. Garcia-Molina. Collaborative tweetonomies: Acquiring latent conceptual structures
creation of communal hierarchical taxonomies in social from social awareness streams. In Semantic Search
tagging systems. Technical Report 2006-10, Computer Workshop at WWW2010, 2010.
Science Department, April 2006. [17] L. Wittgenstein. Philosophical Investigations.
[4] J. Huang, K. M. Thornton, and E. N. Efthimiadis. Blackwell Publishers, London, United Kingdom, 1953.
Conversational tagging in twitter. In Proceedings of Republished 2001.
the 21st ACM conference on Hypertext and [18] L. Yang, T. Sun, M. Zhang, and Q. Mei. We know
hypermedia, HT ’10, pages 173–178, New York, NY, what @you #tag: does the dual role affect hashtag
USA, 2010. ACM. adoption? In Proceedings of the 21st international
[5] C. K¨rner, D. Benz, M. Strohmaier, A. Hotho, and
o conference on World Wide Web, WWW ’12, pages
G. Stumme. Stop thinking, start tagging - tag 261–270, New York, NY, USA, 2012. ACM.
semantics emerge from collaborative verbosity. In