Social network analysis (SNA) is a science that studies the structure of social networks and the interactions within them. It uses tools such as sociograms and centrality metrics to understand information flow and strategic positions, with the goal of improving communication, resilience, and trust within networks. Semantic social network analysis expresses these analyses with ontologies that model social networks and their properties on the semantic web.
This thesis aims to help analyze the characteristics of the heterogeneous social networks that emerge from the use of web-based social applications, with an original contribution that combines Social Network Analysis with Semantic Web frameworks. Social Network Analysis (SNA) proposes graph algorithms to characterize the structure of a social network and its strategic positions. Semantic Web frameworks allow representing and exchanging knowledge across web applications with a rich typed graph model (RDF), a query language (SPARQL) and schema definition frameworks (RDFS and OWL). In this thesis, we merge both models in order to go beyond the mining of the flat link structure of social graphs by integrating a semantic processing of the network typing and the emerging knowledge of online activities. In particular we investigate how (1) to bring online social data to ontology-based representations, (2) to conduct a social network analysis that takes advantage of the rich semantics of such representations, and (3) to semantically detect and label communities of online social networks and social tagging activities.
2. Social Network Analysis?
[Wasserman & Faust 1994] [Scott 2000] [Mika 2007]
• A science to understand the structure, the interactions and the strategic positions in social networks.
• Sociograms [Moreno, 1933]
• What for?
– To control information flow
– To improve/stimulate communication
– To improve network resilience
– To trust
3. Community detection
• Global structure
• Distribution of actors and activities
Influences the way information is shared
Influences the way actors behave
[Coleman 1988] [Burt 2000]
4. Centrality: strategic positions
[Freeman 1979]
Degree centrality: local attention
Closeness centrality: capacity to communicate
Community detection: distribution of actors and activities
Betweenness centrality: reveals brokers, "A place for good ideas" [Burt 1992] [Burt 2004]
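The three centrality measures named on this slide can be illustrated with a small pure-Python sketch. It is not taken from the slides: the graph, the node names (`a1`, `broker`, and so on) and the function names are all hypothetical, closeness is taken as the inverse of the summed hop distances, and betweenness uses Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def degree_centrality(adj):
    """Number of direct neighbours: local attention."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def closeness_centrality(adj):
    """Inverse of the summed distances: capacity to communicate."""
    result = {}
    for node in adj:
        total = sum(bfs_distances(adj, node).values())
        result[node] = 1.0 / total if total else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes' algorithm: how often a node lies on shortest paths."""
    score = {node: 0.0 for node in adj}
    for source in adj:
        stack = []
        preds = {node: [] for node in adj}
        sigma = {node: 0 for node in adj}  # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical graph: two triangles sharing a single "broker" node.
EDGES = [("a1", "a2"), ("a1", "broker"), ("a2", "broker"),
         ("broker", "b1"), ("broker", "b2"), ("b1", "b2")]
ADJ = build_adjacency(EDGES)
DEGREE = degree_centrality(ADJ)
CLOSENESS = closeness_centrality(ADJ)
BETWEENNESS = betweenness_centrality(ADJ)
```

In this toy graph only `broker` has a non-zero betweenness, which matches the slide's reading of betweenness as the measure that reveals brokers.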
11. SNA on the semantic web
[Paolillo and Wright 2006]
foaf:knows
foaf:interest
Rich graph representations are reduced to simple untyped graphs in order to apply SNA
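The loss described here can be made concrete with a small sketch. The triples and helper names below are invented for illustration; only the `foaf:knows` and `foaf:interest` property names come from the slide.

```python
# Hypothetical RDF-like triples: (subject, property, object).
TRIPLES = [
    ("alice", "foaf:knows", "bob"),
    ("bob", "foaf:knows", "carol"),
    ("alice", "foaf:interest", "semantic_web"),
    ("carol", "foaf:interest", "semantic_web"),
]


def flatten(triples):
    """Drop the property names: the untyped graph classical SNA works on."""
    return {(subject, obj) for subject, _prop, obj in triples}


def edges_of_type(triples, prop):
    """Keep only the edges carrying one property, preserving the typing."""
    return {(subject, obj) for subject, p, obj in triples if p == prop}


UNTYPED = flatten(TRIPLES)
KNOWS = edges_of_type(TRIPLES, "foaf:knows")
INTEREST = edges_of_type(TRIPLES, "foaf:interest")
```

After flattening, an acquaintance link and a shared interest are indistinguishable edges; keeping the property lets an analysis choose which relation it measures.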
13. Semantic paths in social graphs
[Figure: typed graph. Recoverable labels: mainDish, type, ingredient, likes, subclassOf, Food]
14. [Figure: social graph. Recoverable actors: Fabien, Mylène, Gérard, Michel, Yvonne. Recoverable relation labels: knows, colleague, sister, father, mother, sibling, parent, brother. Annotation: d<family>(guillaume)]
15. [Figure: social graph. Recoverable actors: Fabien, Mylène, Gérard, Michel, Yvonne. Recoverable relation labels: knows, colleague, sister, father, mother, sibling, parent, brother. Annotation: d<family>(guillaume) = 3]
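A degree parametrized by a relation type, as in d<family>(guillaume) = 3, can be sketched as follows. The figure's exact edges and property hierarchy are not recoverable from this transcript, so both the hierarchy and the edge list below are assumptions chosen to reproduce the value 3; the function names are hypothetical as well.

```python
# Assumed sub-property hierarchy (child -> parent); not confirmed by the slide.
HIERARCHY = {
    "sister": "sibling",
    "brother": "sibling",
    "father": "parent",
    "mother": "parent",
    "sibling": "family",
    "parent": "family",
    "family": "knows",
    "colleague": "knows",
}

# Assumed typed edges around guillaume (lower-cased names, hypothetical layout).
EDGES = [
    ("guillaume", "father", "michel"),
    ("guillaume", "mother", "yvonne"),
    ("guillaume", "sister", "mylene"),
    ("guillaume", "colleague", "fabien"),
    ("guillaume", "colleague", "gerard"),
]


def is_subproperty(prop, ancestor, hierarchy):
    """True if prop equals ancestor or specializes it in the hierarchy."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = hierarchy.get(prop)
    return False


def parametrized_degree(node, rel_type, edges, hierarchy):
    """Count the edges touching node whose property falls under rel_type."""
    return sum(
        1
        for subject, prop, obj in edges
        if node in (subject, obj) and is_subproperty(prop, rel_type, hierarchy)
    )


FAMILY_DEGREE = parametrized_degree("guillaume", "family", EDGES, HIERARCHY)
COLLEAGUE_DEGREE = parametrized_degree("guillaume", "colleague", EDGES, HIERARCHY)
KNOWS_DEGREE = parametrized_degree("guillaume", "knows", EDGES, HIERARCHY)
```

Under these assumptions the same node has a different degree for each relation type, which is the point of parametrizing the metric.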
16. Closeness centrality
Cc<type>(y)
select ?y ?to pathLength($path) as ?length
       sum(?length) as ?centrality
where {
  ?y $path ?to
  filter(match($path, star(param[type]), 'sa'))
}
group by ?y
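A rough Python analogue of this query, under stated assumptions: paths follow the direction of the triples, only edges whose property equals the given type are traversed (no sub-property reasoning), and the shortest path to each reachable node is used. The triples and function names are hypothetical. As in the query above, the aggregate is the sum of path lengths; the usual closeness score is the inverse of that sum, which the last helper computes.

```python
from collections import deque


def typed_distances(triples, source, rel_type):
    """Shortest hop counts from source along edges of one property type."""
    adj = {}
    for subject, prop, obj in triples:
        if prop == rel_type:
            adj.setdefault(subject, []).append(obj)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[source]  # the query binds ?to to other nodes only
    return dist


def path_length_sum(triples, node, rel_type):
    """Counterpart of sum(?length) grouped by ?y."""
    return sum(typed_distances(triples, node, rel_type).values())


def typed_closeness(triples, node, rel_type):
    """Conventional closeness: inverse of the summed distances."""
    total = path_length_sum(triples, node, rel_type)
    return 1.0 / total if total else 0.0


# Hypothetical data: the rel:knows edge is ignored when type is rel:worksWith.
TRIPLES = [
    ("ann", "rel:worksWith", "ben"),
    ("ben", "rel:worksWith", "cat"),
    ("ann", "rel:knows", "dan"),
]
ANN_SUM = path_length_sum(TRIPLES, "ann", "rel:worksWith")
ANN_CLOSENESS = typed_closeness(TRIPLES, "ann", "rel:worksWith")
```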
17. Parametrized Component
C<type>(G)
add {
  ?x semsna:isMemberOf ?uri
}
select ?x ?y genURI(<myorg>) as ?uri
from G
where {
  ?x $path ?y
  filter(match($path, star(param[type]), 'sa'))
}
group by any
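The idea of a component parametrized by a relation type can be sketched in Python as connected components computed over edges of that type only. This is an illustration rather than a translation of the query: edges are treated as undirected, the `component-N` labels merely stand in for the generated URIs, and the data and function name are hypothetical.

```python
def typed_components(triples, rel_type):
    """Map each node to a component label, using only edges of rel_type."""
    adj = {}
    for subject, prop, obj in triples:
        if prop == rel_type:
            adj.setdefault(subject, set()).add(obj)
            adj.setdefault(obj, set()).add(subject)
    membership = {}
    next_id = 0
    for start in sorted(adj):  # sorted for reproducible labels
        if start in membership:
            continue
        label = f"component-{next_id}"
        next_id += 1
        membership[start] = label
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adj[node]:
                if neighbor not in membership:
                    membership[neighbor] = label
                    stack.append(neighbor)
    return membership


# Hypothetical data: the rel:knows edge does not merge the two work groups.
TRIPLES = [
    ("ann", "rel:worksWith", "ben"),
    ("ben", "rel:worksWith", "cat"),
    ("dan", "rel:worksWith", "eve"),
    ("cat", "rel:knows", "dan"),
]
WORK_COMPONENTS = typed_components(TRIPLES, "rel:worksWith")
```

Here `ann`, `ben` and `cat` share one label and `dan` and `eve` another, even though a `rel:knows` edge links the two groups.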
21. Most popular manager in a work subnetwork
select ?y ?indegree where {
  ?y rdf:type domain:Manager .
  ?y semsna:hasInDegree ?z .
  ?z semsna:isDefinedForProperty rel:worksWith .
  ?z semsna:hasValue ?indegree .
  ?z semsna:hasDistance 2
}
order by desc(?indegree)
22. Current community detection algorithms
• Hierarchical algorithms
– Agglomerative (based on vertex proximity):
• [Donetti and Munoz 2004] [Zhou and Lipowsky 2004]
– Divisive (mostly based on centrality):
• [Girvan and Newman 2002] [Radicchi et al 2004]
• Based on heuristics (modularity, random walk, etc.)
• [Newman 2004], [Pons and Latapy 2005], [Wu and Huberman 2004]
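Modularity, the first heuristic listed above, scores a partition by comparing the share of edges inside each community with the share expected if edges were placed at random while preserving node degrees. A minimal sketch for an undirected, unweighted graph follows; the graph, the partitions and the function name are hypothetical, and the code is not one of the cited algorithms, only the score they optimize or evaluate.

```python
def modularity(edges, communities):
    """Q = sum over communities of (internal edges / m) - (degree sum / 2m)^2."""
    m = len(edges)
    label = {node: idx for idx, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Hypothetical graph: two triangles joined by one bridging edge (c, d).
EDGES = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
Q_SPLIT = modularity(EDGES, [{"a", "b", "c"}, {"d", "e", "f"}])
Q_SINGLE = modularity(EDGES, [{"a", "b", "c", "d", "e", "f"}])
```

Splitting the graph into its two triangles scores higher than keeping every node in one community, which is the kind of comparison a modularity-based heuristic relies on.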