Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, "Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Experiences," Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009, Poland.
3. Motivation, Goals
Mumbai Terror Attack 2008
Citizen sensor observations (flickr, twitter,
blogs..)
No matter where you looked, tapping into a
cultural perception was impossible
We wanted to know what people in India
were saying vs. those in Pakistan or the
U.S.A
4. Spatio-Temporal-Thematic Slices of
Real-time Data
Around NEWS-WORTHY EVENTS
Using space and time as cues for extracting
social perceptions (behind signals)
Summarizing hundreds and thousands of
real-time observations
12. Browsing Real-time Data in Context
Find resources related to social perceptions
News and Wikipedia articles to put extracted descriptors in context
SOYLENT GREEN and the HEALTH CARE REFORM
✓ Exploit spatio, temporal semantics for thematic aggregation
17. Topical Tweets
Gathering event-specific tweets: Iran Election
1: Pick trending hashtags from Twitter -
#iranelection; #iran ..
2: Google insights to expand hashtag list
22. Topical Tweets
3. Issue a Twitter Search (API) every 30 seconds
for every hashtag, keyword
1500 tweets per query
4. Obtain other Hashtags in crawled tweets
Check for topic drifts
5. Repeat from Step 3 and babysit!
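The gathering steps above can be sketched as a small polling loop. This is only an illustration under stated assumptions: `search` is a hypothetical callable standing in for a Twitter Search API client (it returns `(tweet_id, text)` pairs for one term), `accept` is a hypothetical hook for the topic-drift check that the slides describe as manual "babysitting", and `min_count` is an invented threshold for promoting a newly seen hashtag. The Google Insights expansion of step 2 is not modeled.

```python
import re
import time
from collections import Counter

HASHTAG = re.compile(r"#\w+")


def crawl_event(seed_terms, search, accept=lambda tag: True,
                rounds=3, interval_s=30.0, min_count=2, sleep=time.sleep):
    """Poll `search` for every term, then grow the term list from new hashtags.

    `search(term)` is an assumed interface, not a real Twitter client call.
    `accept(tag)` stands in for the human topic-drift check.
    """
    terms = {t.lower() for t in seed_terms}
    tweets = {}  # tweet id -> text, so repeated results are stored once
    for round_no in range(rounds):
        new_tags = Counter()
        for term in sorted(terms):
            for tweet_id, text in search(term):
                if tweet_id in tweets:
                    continue
                tweets[tweet_id] = text
                new_tags.update(tag.lower() for tag in HASHTAG.findall(text))
        # Step 4: promote hashtags seen often enough, subject to the drift check.
        for tag, count in new_tags.items():
            if count >= min_count and tag not in terms and accept(tag):
                terms.add(tag)
        # Step 5: wait before repeating the search step.
        if round_no < rounds - 1:
            sleep(interval_s)
    return tweets, terms
```

Passing `sleep` and `search` as parameters keeps the loop testable without network access; a real crawler would also need rate-limit and error handling.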
23. Architecture
Step1: Gathering event-relevant tweets
Step2: Spatial, Temporal metadata of tweets
24. Geo-Coordinates of Tweets
Location a tweet originates from
Location it mentions
Approximation: Poster location on Twitter
profile
Location: Dayton, OH (Google geocoder service, GeoDB)
Location: “best place in the world” (fail!)
25. Architecture
Step1: Gathering event-relevant tweets
Step2: Spatial, Temporal metadata of tweets
Step3: Spatio-temporal clusters
26. Spatio-Temporal Clusters of Tweets
Because every event is different.. and we want to preserve social perceptions
that generated this data!
Long-running, world-wide events (Iran Election Protest)
clusters by country and week?
Short, world-wide events (Olympics)
clusters by country and day?
Long-running, evolving, local events (Health Care
Reform Debate)
clusters by state and day?
Tunable parameters
27. Tweets in a Spatio-Temporal Cluster
Spatio-temporal bias dictates the granularity of
processing tweets
Mumbai Terror Attack
Cluster1: Tweets from India, 08/1/08
Cluster2: Tweets from Pakistan, 08/1/08
Cluster n: Tweets from USA, 08/13/08
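A minimal sketch of the clustering idea, assuming each tweet has already been reduced to a dict with hypothetical `country`, `time` and `text` fields (the field names and the two supported granularities are illustrative choices, not taken from the slides). The spatial unit is fixed to country here; the slides also mention state-level clusters.

```python
from collections import defaultdict
from datetime import datetime


def time_bucket(ts, granularity):
    """Map a timestamp to a day or ISO-week label."""
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    raise ValueError(f"unsupported granularity: {granularity}")


def spatio_temporal_clusters(tweets, granularity="day"):
    """Group tweet texts by (country, time bucket)."""
    clusters = defaultdict(list)
    for tweet in tweets:
        key = (tweet["country"], time_bucket(tweet["time"], granularity))
        clusters[key].append(tweet["text"])
    return dict(clusters)


# Hypothetical sample data, for illustration only.
sample = [
    {"country": "India", "time": datetime(2009, 6, 15, 9), "text": "a"},
    {"country": "India", "time": datetime(2009, 6, 16, 9), "text": "b"},
    {"country": "USA", "time": datetime(2009, 6, 15, 9), "text": "c"},
]
by_day = spatio_temporal_clusters(sample, "day")
by_week = spatio_temporal_clusters(sample, "week")
```

Switching `granularity` from `"day"` to `"week"` merges the two India clusters into one, which is the kind of tuning the slides describe for long-running versus short events.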
28. Architecture
Step1: Gathering event-relevant tweets
Step2: Spatial, Temporal metadata of tweets
Step3: Spatio-temporal clusters
Step4: Thematic Descriptors in spatio-temporal cluster
30. n-gram descriptors
“President Obama in trying to regain control of the
health-care debate will likely shift his pitch in September”
1-grams: President, Obama, in, trying, to, regain, ...
2-grams: “President Obama”, “Obama in”, “in
trying”, “trying to”...
3-grams: “President Obama in”, “Obama in trying”;
“in trying to”...
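The n-gram lists above can be reproduced with a few lines of Python. This sketch assumes plain whitespace tokenization, which is a simplification; the slides do not say how tweets were tokenized.

```python
def ngrams(text, n):
    """Return all contiguous n-word sequences in `text` (whitespace tokens)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = ("President Obama in trying to regain control of the "
            "health-care debate will likely shift his pitch in September")
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)
```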
34. Thematic Descriptors
“President” “President Obama” “President Obama in”
A descriptor is an n-gram weighted by:
Thematic Importance
redundancy: statistically discriminatory in nature
variability: contextually important
Spatial Importance (local vs. global popularity)
Temporal Importance (always popular vs. currently
trending)
35. Thematic Importance of an n-gram
“President” “President Obama” “President Obama in”
Exploiting Redundancy
tfidf of n-gram (Lucene Index)
amplify by fraction of nouns in the n-gram
(Stanford Natural Language Parser)
amplify by fraction of non-stop words (‘going to
try’)
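One possible reading of the redundancy score is sketched below. Several parts are assumptions rather than the authors' method: the tf-idf is a plain `tf * log(N / df)` instead of a Lucene index lookup, the noun set is passed in by hand instead of coming from the Stanford parser, the stop-word list is a tiny hypothetical one, and "amplify by fraction" is interpreted as multiplying by `(1 + fraction)`.

```python
import math

# Tiny hypothetical stop-word list, for illustration only.
STOP_WORDS = {"in", "to", "the", "of", "going"}


def fraction(tokens, predicate):
    """Share of tokens that satisfy `predicate`."""
    return sum(1 for t in tokens if predicate(t)) / len(tokens)


def redundancy_score(ngram, tf, df, total_docs, nouns, stop_words=STOP_WORDS):
    """tf-idf of an n-gram, amplified by its noun and non-stop-word fractions.

    The (1 + fraction) multipliers are an assumed form of "amplify".
    """
    tokens = ngram.lower().split()
    base = tf * math.log(total_docs / df)
    noun_frac = fraction(tokens, lambda t: t in nouns)
    content_frac = fraction(tokens, lambda t: t not in stop_words)
    return base * (1 + noun_frac) * (1 + content_frac)


nouns = {"president", "obama"}  # stand-in for part-of-speech tagger output
full = redundancy_score("President Obama", tf=10, df=5, total_docs=100, nouns=nouns)
partial = redundancy_score("Obama in", tf=10, df=5, total_docs=100, nouns=nouns)
weak = redundancy_score("going to try", tf=10, df=5, total_docs=100, nouns=nouns)
```

With identical tf and df, the all-noun phrase scores highest and the mostly stop-word phrase lowest, which matches the intent of the two amplification steps.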
36. Thematic Importance of an n-gram
Exploiting Variability
Big three/Big 3; Ford, GM, Chrysler, General Motors..
Contextually relevant words boost statistical importance
Focus word (fw): “big three”
Associated words (awi): co-occurring in spatio-temporal set of tweets
[Figure: focus word “big three” linked to the associated words gm, chrysler, general motors, ford]
37. Thematic Importance of an n-gram
focus word (fw): Big Three
associated word (awi): Ford
Thematic importance of focus word combines:
tfidf of fw
tfidf of awi
association strength of fw and awi
[Figure: focus word “big three” with associated words gm, chrysler, general motors, ford]
38. Contextual Relevance
Association strength of fw and awi depends on contexts
Contexts of associated word awi ‘Ford’: chrysler, GM, big 3; focus, model, release..
Strength of association is measured through word co-occurrence in language [9], using the notion of point-wise mutual information between the focus word and the contexts of awi.
Let the contexts for awi be $C_{aw_i} = \{caw_1, caw_2, \ldots\}$, where the $caw_k$ are strong descriptors that collocate with awi. The association strength is
$$assoc_{str}(fw, aw_i) = \frac{\sum_{k} pmi(fw, caw_k)}{|C_{aw_i}|}, \quad \forall\, caw_k \in C_{aw_i}$$
Pointwise Mutual Information
$$pmi(fw, caw_k) = \log \frac{p(fw, caw_k)}{p(fw)\, p(caw_k)} = \log \frac{p(caw_k \mid fw)}{p(caw_k)}$$
where $p(fw) = n(fw)/N$ and $p(caw_k \mid fw) = n(caw_k, fw)/n(fw)$; $n(fw)$ is the frequency of fw.
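As a worked sketch, the association strength can be computed from raw counts as the mean point-wise mutual information between the focus word and each context of the associated word. Assumptions here: `p(caw_k)` is estimated as `n(caw_k) / N` by analogy with `p(fw)`, every co-occurrence count is positive (otherwise the logarithm is undefined), and the counts in the example are invented.

```python
import math


def pmi(n_joint, n_fw, n_ctx, total):
    """log( p(ctx | fw) / p(ctx) ) estimated from counts."""
    p_ctx_given_fw = n_joint / n_fw
    p_ctx = n_ctx / total  # assumed estimator, analogous to p(fw) = n(fw) / N
    return math.log(p_ctx_given_fw / p_ctx)


def assoc_strength(n_fw, contexts, total):
    """Mean PMI between the focus word and the contexts of one associated word.

    `contexts` is a list of (n_ctx, n_joint) pairs: the frequency of each
    context word and its co-occurrence count with the focus word.
    """
    scores = [pmi(n_joint, n_fw, n_ctx, total) for n_ctx, n_joint in contexts]
    return sum(scores) / len(contexts)


# Invented counts: focus word seen 50 times in a corpus of 1000 observations.
strength = assoc_strength(n_fw=50, contexts=[(100, 25), (200, 10)], total=1000)
```

In this example the first context co-occurs with the focus word five times more often than chance, while the second co-occurs at chance level and contributes zero.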
39. Temporal Importance of a Descriptor
Certain descriptors always dominate discussions
“Terrorism” in Mumbai Terror Attack Tweets
“Healthcare” in Health Care reform debate
Allow recent (possibly interesting) ones to surface
The thematic weights of the associated words, along with their strengths, are plugged into Eqn 1 to compute the thematic score ngrami(th) of the n-gram descriptor. Thematic scores are good indicators of what is important in a spatio-temporal set, but certain descriptors tend to dominate discussions. To allow possibly interesting descriptors to surface, the thematic score of a descriptor is discounted depending on how popular it has been in the recent past. The discount score for an n-gram, a tuneable factor depending on the event, is calculated over a period of time as:
$$ngram_i(te) = temporal_{bias} \cdot \sum_{d=1}^{D} \frac{ngram_i(th)_d}{d}$$
where $ngram_i(th)_d$ is the enhanced thematic score of the descriptor.
0-1 bias: less to more importance to recent n-grams
[Fig. 2: (a) Extracted descriptors sorted by TFIDF vs. spatio-temporal... (b) Top 15 extracted descriptors in the US for the Mumbai attack event]
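The temporal discount is a bias-weighted sum of past thematic scores, each divided by its index d. A minimal sketch, assuming that d counts days back from the present with d = 1 the most recent day (the slides do not state the direction), and using invented scores:

```python
def temporal_discount(thematic_by_day, temporal_bias):
    """temporal_bias * sum over d of thematic_score_d / d.

    `thematic_by_day[0]` is assumed to be the score for d = 1.
    """
    if not 0.0 <= temporal_bias <= 1.0:
        raise ValueError("temporal_bias is expected to lie in [0, 1]")
    return temporal_bias * sum(
        score / d for d, score in enumerate(thematic_by_day, start=1)
    )


# Invented thematic scores for the last three days of one descriptor.
discount = temporal_discount([4.0, 2.0, 3.0], temporal_bias=0.5)
no_discount = temporal_discount([4.0, 2.0, 3.0], temporal_bias=0.0)
```

A descriptor that has scored highly for many days accumulates a larger discount, so a bias near 1 pushes persistently popular terms down and lets newer ones surface.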
40. Spatial Importance of a Descriptor
Local descriptors are more interesting compared to global ones
The temporal discount is applied over a chosen duration, for example the recent week. Because this discount might not be relevant in all cases, a temporal_bias weight ranging from 0 to 1 is also applied: a weight closer to 1 gives more importance, and a weight closer to 0 less importance, to past activity.
Spatial importance of an event descriptor: the importance of a descriptor is also discounted based on its occurrence in other spatio-temporal sets. Descriptors that occur all over the world on a given day are less interesting compared to those that occur only in the spatio-temporal set of interest. The spatial discount score for an n-gram is defined as a fraction of the spatial partitions (e.g. countries) that had activity surrounding this descriptor.
Spatial discount
$$ngram_i(sp) = \frac{k}{|\text{spatio-temporal sets}|} \cdot (1 - spatial_{bias})$$
where k is the number of spatio-temporal clusters the n-gram occurred in
spatial bias closer to 0 = global importance
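A short sketch of the spatial discount, assuming each spatio-temporal cluster has already been reduced to a set of descriptors (the cluster names and contents below are invented). The value k is taken to be the number of clusters that contain the n-gram.

```python
def spatial_discount(ngram, clusters, spatial_bias):
    """Fraction of clusters containing `ngram`, scaled by (1 - spatial_bias)."""
    k = sum(1 for descriptors in clusters.values() if ngram in descriptors)
    return k / len(clusters) * (1.0 - spatial_bias)


# Invented clusters keyed by country, each holding its extracted descriptors.
clusters = {
    "India": {"terrorism", "taj hotel"},
    "Pakistan": {"terrorism"},
    "USA": {"terrorism"},
    "UK": {"cricket"},
}
global_term = spatial_discount("terrorism", clusters, spatial_bias=0.0)
local_term = spatial_discount("taj hotel", clusters, spatial_bias=0.0)
ignored = spatial_discount("terrorism", clusters, spatial_bias=1.0)
```

A descriptor present in most clusters receives a large discount, while setting `spatial_bias` to 1 switches the spatial discount off entirely.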
41. STT Score of an n-gram
Spatio-temporal-thematic score of a descriptor = thematic score - spatio-temporal discounts
A spatial bias closer to 0 gives importance to the global presence of the descriptor. Depending on the event of interest, both discounting factors can differ across spatio-temporal sets, and they are set before the processing of observations begins. For example, when processing tweets for the Mumbai attack, setting the spatial_bias to 1 eliminates the spatial signals, while for tweets from the US a global bias may be preferable, given that the event did not originate there.
The spatial and temporal effects are discounted from the thematic score to give the final spatio-temporal-thematic (STT) weight of the n-gram:
$$w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)$$
The effect of the enhanced STT weights is illustrated on descriptors pertaining to the Mumbai terror attack event.
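The final weight subtracts the temporal and spatial discounts from the thematic score. The sketch below ranks descriptors by that weight; the descriptor names and all three component scores are invented for illustration and are not results from the talk.

```python
from typing import Dict, List, NamedTuple, Tuple


class Components(NamedTuple):
    thematic: float  # ngram_i(th)
    temporal: float  # ngram_i(te), temporal discount
    spatial: float   # ngram_i(sp), spatial discount


def stt_weight(c: Components) -> float:
    """w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)."""
    return c.thematic - c.temporal - c.spatial


def rank_descriptors(scores: Dict[str, Components]) -> List[Tuple[str, float]]:
    """Sort descriptors by STT weight, highest first."""
    weighted = [(name, stt_weight(c)) for name, c in scores.items()]
    return sorted(weighted, key=lambda item: item[1], reverse=True)


# Invented component scores for two hypothetical descriptors.
ranking = rank_descriptors({
    "terrorism": Components(thematic=10.0, temporal=6.0, spatial=3.0),
    "taj hotel": Components(thematic=7.0, temporal=1.0, spatial=0.5),
})
```

Here the descriptor with the higher raw thematic score ends up ranked lower, because it is both persistently popular and globally widespread.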