
Addressing Volume and Velocity Challenge on the Social Web using Crowd-Sourced Knowledge Bases



An overview of my research addressing big data challenges on the Social Web.



Addressing Volume and Velocity Challenge on the Social Web using Crowd-Sourced Knowledge Bases
Pavan Kapanipathi, Kno.e.sis Center, Wright State University, Dayton, OH, USA

Twitter
• Unidirectional paradigm and open to research.
• Twitter users generate around 433k tweets, around 12 TB/min.
• Being explored to understand user behavior, support disaster management, and follow trending topics and news.

Wikipedia
• Collaborative encyclopedia with more than 4M articles.
• Prominent source of an evolving knowledge base.
• Structured representation of Wikipedia available as DBpedia.
• Wikipedia's hyperlink structure is a powerful resource for finding semantic relatedness between text and entities.

Twitter and Wikipedia
This work has primarily focused on addressing the volume and velocity challenges on the Social Web, specifically Twitter. To address these challenges, we have utilized Wikipedia as the knowledge base.

Overview
Volume – Hierarchical Interest Graphs: Generate a Hierarchical Interest Graph from users' tweets. The Hierarchical Interest Graphs are later used for filtering and recommendations.
Velocity – Tracking Dynamic Events on Twitter: Events change their topics (sub-events) dynamically, which makes tracking them on Twitter challenging. We utilize the evolving Wikipedia structure to track dynamic events on Twitter.

Volume Challenge
• Handles information overload by utilizing user profiles of interest, while also addressing the cold-start and data sparsity problems.
• Represents interests hierarchically by inferring the hierarchy from knowledge bases.
• Generates entities of interest from users' tweets.
• Maps the entities to those on Wikipedia and infers the appropriate categories from the Wikipedia category hierarchy.
• The spreading activation function depends on:
  1. Prominence of the category for its sub-category (handling multiple categories).
  2. Importance of the node in the user's interest hierarchy.
  3. Normalization based on the distribution of categories in the hierarchy.
Evaluation
• User study with 37 participants.
• Evaluated the top-30 categories in three different experiments.
• The best approach achieved a MAP of 76% at top-5 with an MRR of 98%.

Velocity Challenge
• Dynamic events on Twitter are challenging to follow, whether for information or for real-time analysis.
• During dynamic events, Wikipedia evolves due to its collaborative nature.
• This work leverages Wikipedia's dynamic nature and hashtag co-occurrence on Twitter to track event tweets.
• Our hashtag co-occurrence analysis shows that:
  1. Only a very small percentage of event-related hashtags is needed to capture most of the event-related tweets.
  2. These popular hashtags frequently co-occur.
• Starting with an initial event-relevant hashtag, we check the relevance of co-occurring hashtags against the Wikipedia event page.
• Relevance is measured by representing each hashtag by its co-occurring entities and the Wikipedia event page by its linked entities.
Evaluation
• Created a gold standard for 3 events (75 hashtags, 15,000 tweets).
• Evaluated the tweets tagged with the top hashtags.
• NDCG of 92% for the top 5 hashtags.

Publications
• Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, and Amit Sheth. User Interests Identification on Twitter Using a Hierarchical Knowledge Base. In: The Semantic Web: Trends and Challenges, ESWC 2014.
• Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, and Amit Sheth. Hierarchical Interest Graph from Tweets. In: Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web (WWW Companion '14), 2014.
• Pavan Kapanipathi, Krishnaprasad Thirunarayan, Amit Sheth, and Pascal Hitzler. A Real-time Approach for Continuous Crawling of Events on Twitter by Leveraging Wikipedia. Technical report, 2013.
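The spreading activation step for building a hierarchical interest graph can be illustrated with a small sketch. This is not the published method: the toy hierarchy, the `decay` factor, the even split of activation among multiple parent categories (a stand-in for category prominence), and the final sum-to-one rescaling (a stand-in for the distribution-based normalization) are all illustrative assumptions. It also assumes an acyclic hierarchy; the real Wikipedia category graph contains cycles that would have to be broken first.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: category -> parent categories (assumed acyclic).
PARENTS = {
    "Basketball teams": ["Basketball", "Sports teams"],
    "Basketball": ["Sports"],
    "Sports teams": ["Sports"],
    "Sports": [],
}


def spread_activation(entity_scores, entity_categories, parents, decay=0.5):
    """Propagate user-interest weights from entities up a category hierarchy.

    entity_scores: entity -> weight (e.g. how often the user mentions it)
    entity_categories: entity -> categories the entity is directly filed under
    parents: category -> parent categories (must be acyclic)
    decay: fraction of a node's activation passed on to its parents (assumed)
    """
    activation = defaultdict(float)
    frontier = defaultdict(float)
    # Seed the leaf categories; an entity in several categories splits its weight.
    for entity, weight in entity_scores.items():
        cats = entity_categories.get(entity, [])
        for cat in cats:
            frontier[cat] += weight / len(cats)
    # Push activation upward one level at a time until the roots are reached.
    # A node reached by several paths accumulates activation from each of them.
    while frontier:
        next_frontier = defaultdict(float)
        for cat, value in frontier.items():
            activation[cat] += value
            ups = parents.get(cat, [])
            for parent in ups:
                # Stand-in for category prominence: split evenly over parents.
                next_frontier[parent] += decay * value / len(ups)
        frontier = next_frontier
    # Stand-in normalization: rescale so all category scores sum to 1.
    total = sum(activation.values())
    return {cat: v / total for cat, v in activation.items()} if total else {}


if __name__ == "__main__":
    scores = spread_activation(
        {"Chicago Bulls": 4.0},
        {"Chicago Bulls": ["Basketball teams"]},
        PARENTS,
    )
    for cat, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{cat}: {score:.3f}")
```

In this toy run, "Sports" receives activation along two paths (via "Basketball" and via "Sports teams"), which is the kind of multi-parent case the prominence factor is meant to handle.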
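The hashtag-tracking loop (start from a seed hashtag, then keep co-occurring hashtags whose entities resemble the Wikipedia event page) can be sketched as follows. Several details here are assumptions rather than the poster's method: entities are taken as already extracted and linked to Wikipedia, relevance is scored with cosine similarity over entity counts, and `threshold=0.3` is an arbitrary illustrative cutoff. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(tweets):
    """tweets: list of (hashtags, entities) pairs, entities already linked to Wikipedia."""
    profiles = {}  # hashtag -> Counter of co-occurring entities
    cooccur = {}   # hashtag -> set of hashtags seen in the same tweet
    for tags, entities in tweets:
        for tag in tags:
            profiles.setdefault(tag, Counter()).update(entities)
            cooccur.setdefault(tag, set()).update(t for t in tags if t != tag)
    return profiles, cooccur


def expand_hashtags(seed, tweets, event_page_entities, threshold=0.3):
    """Grow the tracked hashtag set from a seed, keeping co-occurring tags
    whose entity profile is similar enough to the Wikipedia event page."""
    profiles, cooccur = build_profiles(tweets)
    event_vector = Counter(event_page_entities)
    tracked, to_visit = {seed}, [seed]
    while to_visit:
        tag = to_visit.pop()
        for candidate in cooccur.get(tag, ()):
            if candidate in tracked:
                continue
            if cosine(profiles[candidate], event_vector) >= threshold:
                tracked.add(candidate)
                to_visit.append(candidate)
    return tracked


if __name__ == "__main__":
    toy_tweets = [
        ({"#quake", "#earthquake"}, ["Earthquake", "Tsunami"]),
        ({"#earthquake", "#relief"}, ["Red Cross", "Earthquake"]),
        ({"#quake", "#nba"}, ["Basketball"]),
    ]
    event_page = ["Earthquake", "Tsunami", "Red Cross", "Aftershock"]
    print(sorted(expand_hashtags("#quake", toy_tweets, event_page)))
```

On the toy data, `#nba` co-occurs with the seed but shares no entities with the event page, so it is filtered out, while `#relief` is reached indirectly through `#earthquake`. Because the event page is re-read as it evolves, the same loop could be re-run periodically to pick up hashtags for new sub-events.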
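The evaluation figures reported in the poster (MRR, MAP at top-5, and NDCG for the top 5 hashtags) follow standard information-retrieval definitions. The sketch below shows one common formulation of each. The poster does not state which average-precision normalization or relevance-gain scale was used, so those details are assumptions.

```python
import math


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is relevant."""
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0


def average_precision_at_k(ranked, relevant, k):
    """Mean of precision@i over the ranks i <= k that hold a relevant item.

    Normalizes by the number of relevant items retrieved in the top k
    (one common variant; others divide by min(k, len(relevant))).
    """
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def ndcg_at_k(gains, k):
    """gains: graded relevance of each ranked item, in rank order.

    The ideal ranking is built from the same list of gains (a simplification).
    """
    def dcg(values):
        return sum(g / math.log2(i + 1) for i, g in enumerate(values, start=1))

    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal else 0.0


def mean(values):
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d", "e"]
    relevant = {"b", "d"}
    print(reciprocal_rank(ranked, relevant))
    print(average_precision_at_k(ranked, relevant, 5))
    print(ndcg_at_k([3, 2, 3, 0, 1], 5))
```

MRR and MAP are then the means (via `mean`) of `reciprocal_rank` and `average_precision_at_k` over all users or queries, which is how a single per-experiment figure such as "MAP of 76% at top-5" is obtained.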