3. 19/3
Introduction
• Over the last decade, social networks have changed the way online communities live.
• Social data is used in:
• Academic applications
• E-commerce
• Discovering the habits and interests of online communities in different geographical areas
• Sentiment analysis of users
• Purpose: Support analysts in decision-making and optimal resource
management in businesses as well as web maintenance.
4. 19/4
Introduction (continued)
• Social data is a powerful source of data for:
• Gaining knowledge about social communities
• Investigating the behavior and other aspects of online communities
• User-generated content (UGC) helps online organizations enhance their services based on user perspectives.
• Data mining techniques are used to discover hidden, interesting, and meaningful knowledge in social data.
5. 19/5
Related Works
• TwitterEcho
• Collects data through a distributed architecture (Portuguese Twittosphere)
• Uses micro-blogging as a means to predict political sentiment
• TWICALL
• Discovers important events, categorizes and classifies them
• NIF-T
• Explores data published on micro-blogging websites (e.g. Twitter)
7. 19/7
Collecting and preprocessing of tweets
• Access tweets using the Twitter API.
• Received tweets are unsuitable for the subsequent processing steps.
• They include information that is not required for the problem under consideration.
• Remove unnecessary information and transform the tweets into items and related contextual features.
Pipeline: Access data using the Twitter API → Remove unnecessary information → Transform into a suitable format → Map into a transactional database
8. 19/8
Collecting and preprocessing of tweets
• Transformed tweets are then mapped into a transactional database.
• Each transaction is composed of a set of stems.
• E.g. “Imagination is more important than knowledge” may be mapped into {imagination, important, knowledge}
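The mapping from a tweet to a transaction can be sketched as follows. This is a minimal illustration, not the presented framework's implementation: the stopword list, the regular expressions, and the function name `tweet_to_transaction` are assumptions, and no real stemmer is applied (the slide's example keeps whole words).

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "is", "more", "of", "than", "the", "to"}

def tweet_to_transaction(text):
    """Map raw tweet text to a set of lowercase keyword items."""
    text = re.sub(r"https?://\S+", " ", text)    # drop URLs
    text = re.sub(r"@\w+", " ", text)            # drop user mentions
    tokens = re.findall(r"[a-z]+", text.lower()) # keep alphabetic tokens only
    return {t for t in tokens if t not in STOPWORDS}

transaction = tweet_to_transaction("Imagination is more important than knowledge")
print(sorted(transaction))  # ['imagination', 'important', 'knowledge']
```

Representing each transaction as a set discards word order and repetition, which is what frequent itemset mining expects as input.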
9. 19/9
Discovery of Correlations
• Use the Apriori algorithm to mine frequent itemsets.
• An association rule is usually represented as: If Body then Head
• If Body occurs, Head is more likely to occur as well
• The rule expresses the relationship between Body and Head
• The strength of a rule depends on its support and confidence
• The stronger the rule, the stronger the association between its terms.
• Example: imagination ⇒ knowledge
• Support = 40%
• Confidence = 70%
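Support and confidence, and a level-wise Apriori search, can be sketched as below. This is an illustrative sketch under assumptions: the toy transactions are made up (they do not reproduce the 40% / 70% figures above), and the function names are not taken from the presented framework.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(body, head, transactions):
    """support(body and head together) / support(body); body must occur at least once."""
    return support(set(body) | set(head), transactions) / support(body, transactions)

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    current = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while current:
        # Keep only candidates that reach the support threshold.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune
        # any candidate that has an infrequent k-item subset.
        k += 1
        current = {a | b for a in level for b in level if len(a | b) == k}
        current = {c for c in current
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Made-up toy transactions, for illustration only.
transactions = [
    {"imagination", "knowledge"},
    {"imagination", "knowledge", "important"},
    {"imagination"},
    {"knowledge"},
    {"peace", "world"},
]
print(support({"imagination", "knowledge"}, transactions))       # 2 of 5 transactions
print(confidence({"imagination"}, {"knowledge"}, transactions))  # 2 of the 3 with "imagination"
print(sorted(sorted(s) for s in apriori(transactions, 0.4)))
```

The pruning step relies on the Apriori property: an itemset can only be frequent if all of its subsets are frequent.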
10. 19/10
Taxonomy Generation
• Automatically generates a taxonomy from tweet attributes (i.e. the frequent keywords produced in the previous phase).
• More general, higher-level concepts and correlations can then be extracted.
• The taxonomy nodes represent distinct terms extracted from tweet contents
• Generated in two steps:
• Graph extraction
• Graph partitioning and pruning
11. 19/11
Taxonomy Generation (Graph extraction)
• Strong correlations are detected from the results of the previous phase.
• The generated correlations are represented as a graph
• Edges: the implications expressed by the rules
• Vertices: items from the tweet contents
• Example rules:
• country ⇒ World
• society, people ⇒ country
• peace ⇒ World
• society ⇒ World
• society ⇒ country
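The graph extraction step can be sketched with a plain adjacency dictionary. The rules are the ones listed on the slide; the choice to add one edge from each body item to the head of a multi-item rule, and the name `rules_to_graph`, are assumptions made for illustration.

```python
from collections import defaultdict

# Rules from the slide, written as (body items, head item).
rules = [
    (("country",), "World"),
    (("society", "people"), "country"),
    (("peace",), "World"),
    (("society",), "World"),
    (("society",), "country"),
]

def rules_to_graph(rules):
    """Build a directed graph with one edge body_item -> head per implication."""
    graph = defaultdict(set)
    for body, head in rules:
        for item in body:
            graph[item].add(head)
        graph.setdefault(head, set())  # make sure the head is present as a vertex
    return dict(graph)

graph = rules_to_graph(rules)
for vertex in sorted(graph):
    print(vertex, "->", sorted(graph[vertex]))
```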
12. 19/12
Taxonomy Generation (Graph partitioning and pruning)
• Makes the graph compact
• Prunes edges that do not represent a strongly relevant relationship by labeling vertices (each label represents a taxonomy level)
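The slides do not spell out the labeling or pruning criteria, so the following is only one plausible reading, stated as an assumption: label each vertex with the length of its longest path to a root, then keep only the edges that connect a vertex to the level directly above it. The graph is assumed to be acyclic.

```python
from functools import lru_cache

# Graph built from the example rules: vertex -> set of more general terms.
graph = {
    "society": {"country", "World"},
    "people": {"country"},
    "country": {"World"},
    "peace": {"World"},
    "World": set(),
}

def label_levels(graph):
    """Label each vertex with its longest distance to a root (a vertex with no outgoing edge)."""
    @lru_cache(maxsize=None)
    def level(v):
        return 0 if not graph[v] else 1 + max(level(h) for h in graph[v])
    return {v: level(v) for v in graph}

def prune(graph, levels):
    """Keep only edges that link a vertex to the level directly above it."""
    return {v: {h for h in heads if levels[v] - levels[h] == 1}
            for v, heads in graph.items()}

levels = label_levels(graph)
taxonomy = prune(graph, levels)
print(levels)
print(taxonomy)
```

Under this reading, the edge society ⇒ World is dropped because the same relationship is already covered by society ⇒ country ⇒ World.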
13. 19/13
Analyzing Correlations
• Selection and ranking of the significant correlations
• The selection is driven by either:
• A rule schema, e.g. <Keyword, *> ⇒ <Place, *>
• Given interesting rule items, e.g. <Keyword, School> ⇒ <Place, London>
• The results are ranked by their support and confidence quality indexes.
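Selecting rules by schema or by specific items, and then ranking them, can be sketched as below. The `Rule` class, the `'*'` wildcard handling, the sort order (support first, then confidence), and the toy quality values are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    body: tuple        # (attribute, value) pairs, e.g. (("Keyword", "School"),)
    head: tuple
    support: float
    confidence: float

def matches(items, pattern):
    """True if each pattern pair matches some item; '*' matches any value."""
    return all(
        any(attr == p_attr and p_val in ("*", val) for attr, val in items)
        for p_attr, p_val in pattern
    )

def select_and_rank(rules, body_pattern, head_pattern):
    """Keep rules that fit the pattern, strongest first (support, then confidence)."""
    kept = [r for r in rules
            if matches(r.body, body_pattern) and matches(r.head, head_pattern)]
    return sorted(kept, key=lambda r: (r.support, r.confidence), reverse=True)

# Toy rules with made-up quality values, for illustration only.
rules = [
    Rule((("Keyword", "School"),), (("Place", "London"),), 0.10, 0.80),
    Rule((("Keyword", "Peace"),), (("Place", "Paris"),), 0.20, 0.60),
    Rule((("Keyword", "School"),), (("Keyword", "Teacher"),), 0.30, 0.90),
]

by_schema = select_and_rank(rules, [("Keyword", "*")], [("Place", "*")])
by_items = select_and_rank(rules, [("Keyword", "School")], [("Place", "London")])
print([r.head for r in by_schema])
print([r.head for r in by_items])
```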
14. 19/14
Experimental Evaluation
• The proposed framework highlights popular topical subjects (e.g. the European Union)
• The results include 58 transactions with 209 distinct items (i.e. keywords).
• Firstly, the effectiveness is presented in two scenarios:
• User behavior analysis
• Topic trend analysis
• Secondly, the effectiveness is presented as quality of generated taxonomies.
15. 19/15
User Behavior Analysis
• Extracted correlations allow experts to highlight hidden and potentially
interesting user behaviors.
• peace ⇒ World, society ⇒ country, country ⇒ World
• The proposed framework automatically generates the taxonomy from the mined rules.
• The taxonomy clearly highlights the behavior of people towards peace.
16. 19/16
Topic Trend Analysis
• Discovery and analysis of current matters of contention on Twitter.
• A domain expert wants to discover subjects of topical interest for Twitter users.
• The taxonomy suggests that society in general, and people in particular, are concerned with peace in the World.
17. 19/17
Quality of generated taxonomies
• Taxonomy generation is evaluated using:
• Global quality (using the geometric mean)
• Local quality (degree of correlation between non-leaf and leaf nodes)
• Spread (number of nodes traversed in the taxonomy to move from a node to its root node)
• The results are compared with the approach of
• “Evolutionary Taxonomy Construction from Dynamic Tag Space”, 2010
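As an illustration of the spread measure only, the sketch below counts the steps needed to walk from a node up to its root. The child-to-parent dictionary reuses the example terms from the earlier slides, and counting steps (edges) rather than visited nodes is an assumption, since the slide does not fix the convention.

```python
# Hypothetical taxonomy stored as child -> parent; the root has parent None.
parent = {
    "society": "country",
    "people": "country",
    "country": "World",
    "peace": "World",
    "World": None,
}

def spread(node, parent):
    """Count the steps needed to walk from `node` up to its root."""
    steps = 0
    while parent[node] is not None:
        node = parent[node]
        steps += 1
    return steps

print({node: spread(node, parent) for node in parent})
```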
18. 19/18
Quality of generated taxonomies (continued)
• Global quality is the same for both approaches.
• The proposed approach produces well-balanced local quality and spread indexes.
• The proposed approach takes slightly less time than the compared approach.
19. 19/19
Conclusion
• Presents a mechanism for extracting hidden correlations between contents.
• Generated correlations are helpful to understand the hidden associations
among the textual and contextual features of the UGC.
• The proposed approach automatically generates a taxonomy.
• The experimental results validate the efficiency and effectiveness of the
proposed framework.