Knowledge base enabled Information Filtering on Social Web -- EMC

Invited talk at EMC on Information Filtering on the social web. Includes Continuous Semantics and Hierarchical Interest Graphs.

Published in: Social Media

  1. 1. Knowledge-base Enabled Information Filtering on Social Web Pavan Kapanipathi Kno.e.sis Center, Wright State University Advisor: Amit Sheth 1
  2. 2. Kno.e.sis 2
  3. 3. Social Web in 60 secs 3
  4. 4. Social Web in 60 secs 500M users generate 500M tweets per day 4
  5. 5. Disaster Management Organizations utilize Social Web 35% of 20M tweets during Hurricane Sandy shared information and news about the disaster 5
  6. 6. Healthcare Issues 6
  7. 7. Healthcare Issues 7
  8. 8. Personalized Filtering on Social Web Following Dynamically Evolving Topics as interests 8
  9. 9. Personalization on Social Web • Following Dynamically Evolving Topics • Indian Elections • US Elections • Healthcare Debate 9
  10. 10. Personalization on Social Web • Following Dynamically Evolving Topics • Indian Elections • US Elections • Healthcare Debate 10
  11. 11. Dynamic Topics 11
  12. 12. Dynamic Topics Continuously Evolving on Twitter Entity – Event relevance changes Many entities are involved 12
  13. 13. Dynamic Topics Manually crawl using keywords “indianelection” “jan25” “sandy” “swineflu” “ebola” 13
  14. 14. Dynamic Topics Manually updating keywords to get topic relevant tweets is not feasible “indianelection” “modi” “bjp” “congress” “jan25” “egypt” “tunisia” “arabspring” “sandy” “newyork” “redcross” “fema” “swineflu” “ebola” 14
  15. 15. Problem How can we automatically update the filters to track a dynamically evolving topic on Twitter? 15
  16. 16. Hashtags as Filters • Identify a topic on Twitter • Tweets with hashtags are more informative • Users have a lot of freedom to create them • Some get popular, most die 16
  17. 17. Exploring Hashtags as Evolving Filters for Dynamic Topics Colorado Shooting 17
  18. 18. Exploring Hashtags as Evolving Filters for Dynamic Topics Colorado Shooting Occupy Wall Street 18
  19. 19. Exploring Hashtags as Evolving Filters for Dynamic Topics Colorado Shooting (CS): Tweets: 122,062; Tags: 192,512; Distinct: 12,350; 100% Retrieval: 7,763. Occupy Wall Street (OWS): Tweets: 6,077,378; Tags: 15,963,209; Distinct: 191,602; 100% Retrieval: 21,314. 19
  20. 20. Exploring Hashtags as Evolving Filters for Dynamic Topics Colorado Shooting (CS): Tweets: 122,062; Tags: 192,512; Distinct: 12,350; 100% Retrieval: 7,763. Occupy Wall Street (OWS): Tweets: 6,077,378; Tags: 15,963,209; Distinct: 191,602; 100% Retrieval: 21,314. HASHTAG FILTERS 20
  21. 21. Colorado Shooting Occupy Wall Street Hashtag Filters Co-occurrence Graph 21
  22. 22. Colorado Shooting Occupy Wall Street Event Related Hashtags co-occur with each other Hashtag Filters Co-occurrence Graph 22
  23. 23. Summarizing Hashtag Analysis Starting with one of the event relevant hashtags, by co-occurrence we can reach other relevant hashtags 23
  24. 24. Determining Relevancy of Co-occurring Hashtags #indianelection2015 #modikisarkar Too many co-occurring hashtags 24
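The idea of reaching other hashtags from one seed hashtag can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cooccurring_hashtags`, the regular expression, and the sample tweets are all hypothetical and are not taken from the talk.

```python
import re
from collections import Counter

# Hypothetical hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")

def cooccurring_hashtags(tweets, seed):
    """Count hashtags that appear in the same tweet as the seed hashtag."""
    seed = seed.lower().lstrip("#")
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if seed in tags:
            # Every other hashtag in this tweet co-occurs with the seed once.
            counts.update(tags - {seed})
    return counts

# Made-up sample tweets for illustration only.
tweets = [
    "Polls open today #indianelection #modi #bjp",
    "Rally in Delhi #indianelection #congress",
    "New phone launch #tech",
    "Campaign speech #modi #bjp",
]
counts = cooccurring_hashtags(tweets, "#indianelection")
```

Repeating the same step from each newly found hashtag walks the co-occurrence graph outward, which is also why unrelated tags creep in and a relevance check is needed.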
  25. 25. Hashtag Filters distributions 25
  26. 26. Not surprising: it’s a power-law distribution Hashtag distributions 26
  27. 27. Top 1% retrieves around 85% of the tweets Hashtag distributions 27
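A coverage figure of this kind can be computed as follows. This is a minimal sketch, assuming each tweet is represented as a set of hashtags; the function name `coverage_of_top_tags` and the toy data are hypothetical, and the 85% figure on the slide comes from the talk's own datasets, not from this code.

```python
from collections import Counter

def coverage_of_top_tags(tweet_tags, fraction):
    """Share of tweets containing at least one of the most frequent hashtags.

    tweet_tags: list of sets, one set of hashtags per tweet.
    fraction:   portion of distinct hashtags to keep (0.01 means the top 1%).
    """
    freq = Counter(tag for tags in tweet_tags for tag in tags)
    k = max(1, int(len(freq) * fraction))  # keep at least one hashtag
    top = {tag for tag, _ in freq.most_common(k)}
    hits = sum(1 for tags in tweet_tags if top & set(tags))
    return hits / len(tweet_tags)

# Toy data: four distinct hashtags, so fraction=0.25 keeps only the top one.
tweet_tags = [{"ows"}, {"ows", "nyc"}, {"ows"}, {"p2"}, {"ows", "tcot"}]
cov = coverage_of_top_tags(tweet_tags, 0.25)
```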
  28. 28. Clustering Co-efficient of Hashtag Co-occurrence network (1%) Clustering co-efficient The top ones co-occur with each other the best 28
  29. 29. Determining Relevancy of Co-occurring Hashtags #indianelection2015 #modikisarkar Co-occurring: Threshold δ Preferably a prominent hashtag 29
  30. 30. Does hashtag co-occurrence work? o No. Co-occurrence alone does not work o Many noisy or unrelated hashtags co-occur o Determine the “dynamic” relevance of the top co-occurring hashtag with the dynamic topic 30
  31. 31. Determining Relevancy of Co-occurring Hashtags (Vector Space Model) #indianelection2015 #modikisarkar Co-occurring: Threshold δ Latest K (200,500) Narendra Modi: 0.9 BJP: 0.7 NDA: 0.6 India: 0.4 Elections: 0.2 Rahul Gandhi: 0.2 Congress: 0.2 Entity Extraction and Scoring Normalized Frequency Scoring 31
  32. 32. Determining Relevancy of Co-occurring Hashtags (Vector Space Model) #indianelection2015 #modikisarkar Co-occurring: Threshold δ Latest K (200,500) Narendra Modi: 0.9 BJP: 0.7 NDA: 0.6 India: 0.4 Elections: 0.2 Rahul Gandhi: 0.2 Congress: 0.2 Entity Extraction and Scoring Indian General Election, 2014 Dynamically Updated Background Knowledge 32
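The entity scoring step could look roughly like the sketch below. It assumes that "normalized frequency" means the fraction of the latest K tweets that mention an entity, and it replaces real entity extraction with a hypothetical alias lookup (`entity_aliases`); neither detail is confirmed by the slides.

```python
import re
from collections import Counter

def score_entities(tweets, entity_aliases, k=200):
    """Score entities by the fraction of the latest k tweets mentioning them.

    entity_aliases maps an entity name to a set of lowercase tokens.
    This lookup is a stand-in for a real entity extraction step.
    """
    latest = tweets[-k:]
    if not latest:
        return {}
    counts = Counter()
    for text in latest:
        # Whole-token matching, so '#modikisarkar' does not count as 'modi'.
        tokens = set(re.findall(r"\w+", text.lower()))
        for entity, aliases in entity_aliases.items():
            if tokens & aliases:
                counts[entity] += 1
    return {entity: n / len(latest) for entity, n in counts.items()}

# Made-up tweets carrying the hashtag under test.
tweets = [
    "Modi addresses BJP rally #modikisarkar",
    "BJP manifesto released #modikisarkar",
    "Huge crowd for Modi #modikisarkar",
    "NDA seat talks with Modi #modikisarkar",
]
aliases = {"Narendra Modi": {"modi"}, "BJP": {"bjp"}, "NDA": {"nda"}}
hashtag_vector = score_entities(tweets, aliases, k=200)
```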
  33. 33. Event Relevant Background Knowledge o Wikipedia Event Pages 33
  34. 34. o Wikipedia Event Pages Event Relevant Background Knowledge 34
  35. 35. o Entities mentioned on the Event page of Wikipedia are relevant to the Event Event Relevant Background Knowledge 35
  36. 36. o Wikipedia’s Hyperlink structure is very rich o Page-Page (Wikipedia) links Indian General Election, 2014 Narendra Modi Rahul Gandhi NDA (India) UPA (India) BJP Indian National Congress Event Relevant Background Knowledge – Graph Structure 36
  37. 37. Determining Relevancy of Co-occurring Hashtags (Vector Space Model) #indianelection2015 #modikisarkar Co-occurring: Threshold δ Latest K (200,500) Narendra Modi: 0.9 BJP: 0.7 NDA: 0.6 India: 0.4 Elections: 0.2 Rahul Gandhi: 0.2 Congress: 0.2 Entity Extraction and Scoring Indian General Election, 2014 Extract, Periodically Update Hyperlink structure One hop from Event Page 37
  38. 38. o Hyperlink structure is dynamically updated Indian General Election, 2014 Narendra Modi Rahul Gandhi NDA (India) UPA (India) BJP Indian National Congress 10 May 2010 Event Relevant Background Knowledge 38
  39. 39. o Hyperlink structure is dynamically updated Indian General Election, 2014 Narendra Modi Rahul Gandhi NDA (India) UPA (India) BJP Indian National Congress 10 May 2010 29 March 2013 29 March 2013 29 March 2013 29 March 2013 Event Relevant Background Knowledge 39
  40. 40. o Hyperlink structure is dynamically updated Indian General Election, 2014 Narendra Modi Rahul Gandhi NDA (India) UPA (India) BJP Indian National Congress 10 May 2010 29 March 2013 29 March 2013 29 March 2013 29 March 2013 20 May 2013 20 May 2013 Event Relevant Background Knowledge 40
  41. 41. Determining Relevancy of Co-occurring Hashtags (Vector Space Model) #indianelection2015 #modikisarkar Co-occurring: Threshold δ Latest K (200,500) Narendra Modi: 0.9 BJP: 0.7 NDA: 0.6 India: 0.4 Elections: 0.2 Rahul Gandhi: 0.2 Congress: 0.2 Entity Extraction and Scoring Indian General Election, 2014 Extract, Periodically Update Hyperlink structure Entity scoring based on relevance to the Event One hop from Event Page 41
  42. 42. o Edge Based Measure o Link Overlap Measure: Jaccard similarity o Out(c) are the links in Wikipedia page “c” o Final Score: r(c,E) = ed(c,E) + oco(c,E) Hyperlink Entity Scoring India General Election, 2014 Narendra Modi India General Election, 2014 India General Election, 2009 1 Mutually Important ed (c,E) = 1 ed (c,E) = 2 42
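The score r(c,E) = ed(c,E) + oco(c,E) can be sketched as below. The reading of ed (1 for a one-way link, 2 for a mutual link) follows the example on the slide; the function names and the tiny `out_links` graph are hypothetical, and the link sets are invented rather than real Wikipedia data.

```python
def edge_measure(c, event, out_links):
    """ed(c, E): 1 for a one-way link, 2 when the pages link to each other."""
    forward = c in out_links.get(event, set())
    backward = event in out_links.get(c, set())
    return int(forward) + int(backward)

def link_overlap(c, event, out_links):
    """oco(c, E): Jaccard similarity of Out(c) and Out(E)."""
    a = out_links.get(c, set())
    b = out_links.get(event, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def relevance(c, event, out_links):
    """r(c, E) = ed(c, E) + oco(c, E)."""
    return edge_measure(c, event, out_links) + link_overlap(c, event, out_links)

# Invented outgoing-link sets, keyed by page title.
E14 = "Indian general election, 2014"
E09 = "Indian general election, 2009"
out_links = {
    E14: {"Narendra Modi", E09, "BJP", "India"},
    "Narendra Modi": {"BJP", "India", "Gujarat"},
    E09: {E14, "BJP", "India", "UPA"},
}
r_modi = relevance("Narendra Modi", E14, out_links)
r_2009 = relevance(E09, E14, out_links)
```

In this toy graph the 2009 election page scores higher than the Narendra Modi page because the two election pages link to each other.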
  43. 43. Determining Relevancy of Co-occurring Hashtags (Vector Space Model) #indianelection2015 #modikisarkar Co-occurring: Threshold δ Latest K (200,500) Narendra Modi: 0.9 BJP: 0.7 NDA: 0.6 India: 0.4 Elections: 0.2 Rahul Gandhi: 0.2 Congress: 0.2 Entity Extraction and Scoring Indian General Election, 2014 Extract, Periodically Update Hyperlink structure Entity scoring based on relevance to the Event One hop from Event Page Indian General Elec: 1.0 India: 0.9 Elections: 0.7 UPA: 0.6 BJP: 0.3 NDA: 0.3 Narendra Modi: 0.3 43
  44. 44. Determining Relevancy of Co-occurring Hashtags (Vector Space Model) #indianelection2015 #modikisarkar Co-occurring: Threshold δ Latest K (200,500) Narendra Modi: 0.9 BJP: 0.7 NDA: 0.6 India: 0.4 Elections: 0.2 Rahul Gandhi: 0.2 Congress: 0.2 Entity Extraction and Scoring Indian General Election, 2014 Extract, Periodically Update Hyperlink structure Entity scoring based on relevance to the Event One hop from Event Page Indian General Elec: 1.0 India: 0.9 Elections: 0.7 UPA: 0.6 BJP: 0.3 NDA: 0.3 Narendra Modi: 0.3 Similarity Check Relevance Score: 0.6 44
  45. 45. o Set Based o Jaccard Similarity o Considers the entities without the scores o Vector Based o Symmetric o Cosine Similarity o Asymmetric o Subsumption Similarity Similarity Check 45
  46. 46. India General Election 2014 Narendra Modi Intuition behind Asymmetric India General Election 2014 Narendra Modi Penalized Ignored Similarity Symmetric Asymmetric 46
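The three kinds of similarity check can be compared on the two example vectors from the slides. Jaccard and cosine are standard. The `asymmetric` function is only one plausible reading of the "penalized vs. ignored" intuition: it ignores event entities that the hashtag never mentions. The talk's actual subsumption formula is not given in this transcript, so treat that function as an assumption.

```python
import math

def jaccard(h, e):
    """Set-based: compares entity sets and ignores the scores."""
    a, b = set(h), set(e)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(h, e):
    """Symmetric: event entities missing from the hashtag lower the score."""
    dot = sum(w * e.get(k, 0.0) for k, w in h.items())
    norm_h = math.sqrt(sum(w * w for w in h.values()))
    norm_e = math.sqrt(sum(w * w for w in e.values()))
    return dot / (norm_h * norm_e) if norm_h and norm_e else 0.0

def asymmetric(h, e):
    """Assumed asymmetric variant: drop event entities absent from the hashtag."""
    restricted = {k: w for k, w in e.items() if k in h}
    return cosine(h, restricted)

# Example vectors copied from the slides.
hashtag_vec = {"Narendra Modi": 0.9, "BJP": 0.7, "NDA": 0.6, "India": 0.4,
               "Elections": 0.2, "Rahul Gandhi": 0.2, "Congress": 0.2}
event_vec = {"Indian General Election": 1.0, "India": 0.9, "Elections": 0.7,
             "UPA": 0.6, "BJP": 0.3, "NDA": 0.3, "Narendra Modi": 0.3}

scores = {
    "jaccard": jaccard(hashtag_vec, event_vec),
    "cosine": cosine(hashtag_vec, event_vec),
    "asymmetric": asymmetric(hashtag_vec, event_vec),
}
```

Under this reading the asymmetric score is never lower than the cosine score, because removing unmatched event entities can only shrink the event vector's norm.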
  47. 47. Determining Relevancy of Co-occurring Hashtags (Vector Space Model) #indianelection2015 #modikisarkar Co-occurring: Threshold δ Latest K (200,500) Narendra Modi: 0.9 BJP: 0.7 NDA: 0.6 India: 0.4 Elections: 0.2 Rahul Gandhi: 0.2 Congress: 0.2 Entity Extraction and Scoring Indian General Election, 2014 Extract, Periodically Update Hyperlink structure Entity scoring based on relevance to the Event One hop from Event Page Indian General Elec: 1.0 India: 0.9 Elections: 0.7 UPA: 0.6 BJP: 0.3 NDA: 0.3 Narendra Modi: 0.3 Similarity Check Relevance Score: 0.6 47
  48. 48. o 2 events o US Presidential Elections (#election2012) o Hurricane Sandy (#sandy) o Top 25 co-occurring hashtags Evaluation – Dataset 48
  49. 49. o Ranking Problem o Rank the Top 25 hashtags based on the relevancy of tweets to the event o Experiment with all the similarity metrics o Manually annotated the tweets of these hashtags as relevant/irrelevant (Gold Standard) o Ranking Evaluation Metrics o Mean Average Precision o NDCG Evaluation – Strategy 49
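For reference, the two ranking metrics named above can be computed as in this sketch. It uses the common textbook definitions (average precision over the relevant items in the ranked list, and NDCG with a log2 discount); whether the talk's evaluation used exactly these variants is an assumption.

```python
import math

def average_precision(relevances):
    """Average precision for one ranked list of 0/1 relevance labels."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """Mean of the average precision over several ranked lists (e.g. events)."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

def ndcg_at_k(gains, k):
    """NDCG@k with a log2 discount; gains are graded relevance values."""
    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values[:k]))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0

# Toy labels: ranks 1 and 3 are relevant, rank 2 is not.
ap = average_precision([1, 0, 1])
perfect = ndcg_at_k([3, 2, 1], k=3)
imperfect = ndcg_at_k([1, 0, 1], k=3)
```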
  50. 50. Evaluation 50
  51. 51. Evaluation Evaluated tweets comprising top-relevant hashtags detected for dynamic topics • NDCG - 92% at top-5 Mean Average Precision 51
  52. 52. A little pause for Questions? 52
  53. 53. Personalized Filtering 53 User Interest Identification/User Modeling Filtering Module Twitter Streaming API Tweets Network Filtered Tweets
  54. 54. Personalized Filtering 54 User Interest Identification/User Modeling Filtering Module Twitter Streaming API Tweets Network Filtered Tweets Dynamic Topics as Interests Interest: Indian Elections
  55. 55. Personalized Filtering 55 User Interest Identification/User Modeling Filtering Module Twitter Streaming API Tweets Network Filtered Tweets A Significant Module
  56. 56. o User Interest Identification on Twitter o Content-based (Only Tweets) o Term-based (semantic, web, #semanticweb) o Entity-based (semantic web <same as> #semanticweb) o Interest Graphs derived from knowledge-base (Hierarchical Interest Graphs) o Collaborative (Users’ Friends) o Hybrid User Modeling 56
  57. 57. A simple solution to most problems I am trying to solve
  58. 58. Hierarchical Interest Graphs 58
  59. 59. What is in your mind? (Next concept/term) 59
  60. 60. What is in your mind? (Next concept/term) Fruit 60
  61. 61. What is in your mind? (Next concept/term) Fruit Other Fruit Names 61
  62. 62. Cognitive Science o Human memory has been argued to be structured as a hierarchy of concepts (Semantic Network) o Spreading activation theory has been utilized to simulate search on semantic network o This theory has not been well explored for user interest modeling 62
  63. 63. Hierarchical Interest Graphs o Extending user profiles from Twitter to comprise a hierarchy of concepts o The hierarchy of concepts is derived from the Wikipedia Category Structure o Each concept in the hierarchy is scored based on the user’s extent of interest 63
  64. 64. 64
  65. 65. Semantic Search Linked Data Metadata 0.8 0.2 0.6 Scores for Interests 65 User Interests
  66. 66. Internet Semantic Search Linked Data Metadata Technology World Wide Web Semantic Web Structured Information 0.8 0.2 0.6 Scores for Interests 66 User Interests
  67. 67. Internet Semantic Search Linked Data Metadata Technology World Wide Web Semantic Web Structured Information 0.8 0.2 0.6 Scores for Interests 67 User Interests 0.7 0.5 0.4 0.3
  68. 68. 68 Tweets Approach
  69. 69. 69 Tweets Approach
  70. 70. 70 Wikipedia Category Graph Contains Cycles More abstract: World Wide Web or Semantic Web?
  71. 71. 71 Wikipedia Hierarchy Hierarchical Levels No Cycles 1 2 3 4 5 6
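One way to turn a cyclic category graph into a levelled hierarchy is sketched below. The slides do not say how the hierarchy was built, so this breadth-first approach is an assumption: each category gets the level of its shortest path from a chosen root, and only links that point to a strictly deeper level are kept. The function names and the small `subcats` graph are hypothetical.

```python
from collections import deque

def assign_levels(subcats, root):
    """Breadth-first search: level = 1 + shortest distance from the root."""
    levels = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in subcats.get(node, ()):
            if child not in levels:
                levels[child] = levels[node] + 1
                queue.append(child)
    return levels

def to_hierarchy(subcats, root):
    """Keep only links to strictly deeper levels, which removes all cycles."""
    levels = assign_levels(subcats, root)
    return {
        parent: [c for c in children if levels.get(c, 0) > levels[parent]]
        for parent, children in subcats.items()
        if parent in levels
    }

# Invented graph with a cycle between 'World Wide Web' and 'Semantic Web'.
subcats = {
    "Technology": ["Internet"],
    "Internet": ["World Wide Web"],
    "World Wide Web": ["Semantic Web"],
    "Semantic Web": ["World Wide Web", "Linked Data"],
}
levels = assign_levels(subcats, "Technology")
hierarchy = to_hierarchy(subcats, "Technology")
```

Because every kept link increases the level, no path can return to a node it already visited.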
  72. 72. 72 Tweets Approach
  73. 73. 73 http://en.wikipedia.org/wiki/Semantic_search http://en.wikipedia.org/wiki/Ontology o Extracting Wikipedia entities o Interest Scoring o Frequency based User Profile Generation
  74. 74. Internet Semantic Search Linked Data Metadata Technology World Wide Web Semantic Web User Interests Structured Information 0.8 0.2 0.6 Scores for Interests 74
  75. 75. 75 Tweets Approach
  76. 76. 76 Cricket M S Dhoni Virat Kohli Sachin Tendulkar Sports Indian Cricket Indian Cricketers 0.8 0.2 0.6 0.5 0.4 0.25 0.1 Activation Function Determines the extent of spreading Example
  77. 77. Activation Function o Simple Activation Function: $A_j = \sum_{i=0}^{n} A_i \times W_{ij} \times D$, where $i$ is an activated child or subcategory of $j$, $j$ is the category to be activated, $W_{ij}$ is the edge weight between $j$ and $i$, and $D$ is the decay factor. 77
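The simple activation function can be sketched as a bottom-up pass over the hierarchy: each category sums the activation of its activated children, multiplied by the edge weight and the decay factor. The function name `spread`, the uniform edge weight of 1.0, the decay of 0.5, and the level numbers are illustrative assumptions; only the entity names and their initial scores come from the cricket example in the slides.

```python
def spread(initial, parents, levels, decay=0.5, weight=1.0):
    """Propagate activation upward: A_j = sum_i A_i * W_ij * D.

    initial: activation of the entities found in tweets.
    parents: maps a node to the categories directly above it.
    levels:  hierarchical level of every node (larger means deeper).
    A single edge weight is used for all links as a simplification.
    """
    activation = dict(initial)
    # Visit the deepest nodes first so each node is complete before it spreads.
    for node in sorted(levels, key=levels.get, reverse=True):
        a = activation.get(node, 0.0)
        if a == 0.0:
            continue
        for parent in parents.get(node, ()):
            activation[parent] = activation.get(parent, 0.0) + a * weight * decay
    return activation

levels = {"Sports": 1, "Cricket": 2, "Indian Cricket": 3, "Indian Cricketers": 4,
          "M S Dhoni": 5, "Virat Kohli": 5, "Sachin Tendulkar": 5}
parents = {
    "M S Dhoni": ["Indian Cricketers"],
    "Virat Kohli": ["Indian Cricketers"],
    "Sachin Tendulkar": ["Indian Cricketers"],
    "Indian Cricketers": ["Indian Cricket"],
    "Indian Cricket": ["Cricket"],
    "Cricket": ["Sports"],
}
initial = {"M S Dhoni": 0.8, "Virat Kohli": 0.2, "Sachin Tendulkar": 0.6}
result = spread(initial, parents, levels)
```

With these assumed parameters the activation halves at every level, so broad categories such as Sports end up with small scores unless many entities feed into them.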
  78. 78. Challenges – Wikipedia Category Graph o Uneven distribution of nodes in the hierarchy o Many-to-many category-subcategory relationships 78
  79. 79. Challenges – Wikipedia Category Graph o Uneven distribution of nodes in the hierarchy o Many-to-many category-subcategory relationships 79
  80. 80. Challenges – Wikipedia Category Graph o Uneven distribution of nodes in the hierarchy o Many-to-many category-subcategory relationships 80
  81. 81. Addressing Uneven Node Distribution [Figure: number of nodes (0 to 300,000) per hierarchical level (1 to 16)] 81
  82. 82. Challenges – Wikipedia Category Graph o Uneven distribution of nodes in the hierarchy o Many-to-many category-subcategory relationships 82
  83. 83. Preferential Path Constraint – Many to Many Links 83
  84. 84. Preferential Path Constraint – Many to Many Links 84
  85. 85. Preferential Path Constraint – Many to Many Links [Figure: paths labeled 1 2 3 4] 85
  86. 86. Boosting Common Ancestors o Nodes that intersect domains/subcategories activated by diverse entities 86
  87. 87. Boosting Common Ancestors Cricket M S Dhoni Virat Kohli Sachin Tendulkar Sports Indian Cricket Indian Cricketers 3 3 5 5 Michael Clarke Shane Watson Australian Cricket Australian Cricketers 2 2 87
  88. 88. Boosting Common Ancestors 88
  89. 89. Activation Functions o Bell: $A_j = \sum_{i=0}^{n} A_i \times F_j$ o Bell Log: $A_j = \sum_{i=0}^{n} A_i \times FL_j$ o Priority Intersect: $A_j = \sum_{i=0}^{n} A_i \times FL_j \times P_{ji} \times B_j$ 89
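The transcript does not define $F_j$ or $FL_j$. The sketch below assumes they normalize by how crowded the hierarchical level of $j$ is, with $F_j = 1/N$ and $FL_j = 1/\log_{10} N$ for $N$ nodes at that level. These definitions, the function names, and the node count of 1000 are assumptions for illustration. The Priority Intersect terms $P_{ji}$ and $B_j$ are left out because nothing in the slides pins them down.

```python
import math

def bell(node_count):
    """Assumed F_j: inverse of the number of nodes at the level of j."""
    return 1.0 / node_count

def bell_log(node_count):
    """Assumed FL_j: inverse of log10 of the node count at the level of j."""
    if node_count <= 1:
        raise ValueError("bell_log needs more than one node at the level")
    return 1.0 / math.log10(node_count)

def activate(child_activations, node_count, factor):
    """A_j = sum_i A_i * factor(j), with the factor depending only on j."""
    return sum(child_activations) * factor(node_count)

children = [0.4, 0.1, 0.3]  # activations arriving from three subcategories
a_bell = activate(children, 1000, bell)
a_bell_log = activate(children, 1000, bell_log)
```

Under these assumptions the logarithmic variant penalizes crowded levels far more gently than the plain inverse.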
  90. 90. Evaluation User Study • 37 Users • 30K Tweets Evaluated the top-10 categories of interests derived from the hierarchy • 76% Mean Average Precision • 98% Mean Reciprocal Recall • 70% are not mentioned in tweets 90
  91. 91. o Working on a Tweet recommendation system that utilizes Hierarchical Interest Graph o Preliminary results are “interesting”  91 Tweet Recommendation using Hierarchical Interest Graph
  92. 92. Conclusion o Focus on “Information” overload instead of “Data” overload. o Personalized Information Filtering o Knowledge-base enabled solutions for challenges in Tweets filtering o Wikipedia hyperlink structure and category graph leveraged for Twitter data filtering o More Research on User Specific Attribute Extraction (Personalization) from Twitter Data o Activity Estimation o Location Prediction
  93. 93. 93 More at Kno.e.sis
  94. 94. kHealth Knowledge-enabled Healthcare Applied to ADHF, Asthma, GI, and Dementia 94
  95. 95. Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information canary in a coal mine Empowering Individuals (who are not Larry Smarr!) for their own health kHealth: knowledge-enabled healthcare 95
  96. 96. Social Health Signals 96
  97. 97. Motivational Scenario Manually going through news articles, diabetes forums, blogs, etc. - Time consuming - Relevant? Interesting? Informative? Useful? 97 How about all the relevant and important health information aggregated at one platform? A diabetic patient is interested in keeping himself up to date with new information about diabetes
  98. 98. 98 Search and Explore X Controls Cancer X = diet, treatment, exercise (Pattern-based Approach leveraging domain semantics) Top Health News Informative news about selected disease Faceted search (by health topics) Learn about disease Source: Wikipedia Search & Explore Top Health News Tweet Traffic Learn about Disease Home
  99. 99. Thanks Contact: Email-pavan@knoesis.org Twitter:@pavankaps Webpage: http://knoesis.org/researchers/pavan 99
