Tpdl Doctoral consortium 2012
 

    TPDL Doctoral Consortium 2012: Presentation Transcript

    • Detecting, Modeling, & Predicting User Temporal Intention in Social Media
        Hany M. SalahEldeen, Old Dominion University, Department of Computer Science
        Advisor: Dr. Michael L. Nelson
        TPDL ’12 Doctoral Consortium
        Hany SalahEldeen & Michael Nelson, Doctoral Consortium
    • Let’s break down the title first… Detecting, Modeling, & Predicting User Temporal Intention in Social Media
    • Scenario 1: Jenny reading Jeff’s tweets
    • Michael Jackson dies. Snapshot on: June 25th 2009 http://web.archive.org/web/20090625232522/http://www.cnn.com/
    • Jeff tweets about it… Published on: June 25th 2009 https://twitter.com/mdnitehk/status/2333993907
    • Jenny is off the grid… Jeff’s friend Jenny was on vacation in Hawaii for a month
    • Jenny starts catching up a month later. When she came back she checked Jeff’s tweets and was shocked! Read on: July 26th 2009 https://twitter.com/mdnitehk/status/2333993907
    • Jenny follows the link on July 26th. She quickly clicked on the link in the tweet… CNN page on: July 26th 2009 http://web.archive.org/web/20090726234411/http://www.cnn.com/
    • Jenny is confused!
        • Implication: Jenny thought Jeff was making a joke about her favorite singer, and she got mad at him.
        • Problem: The tweet and the resource the tweet links to have become unsynchronized.
    • Scenario 2: The Egyptian Revolution
    • The Egyptian Revolution, Jan 2011
    • Reading about it on Storify.com a year later, in March 2012 http://storify.com/maq4sure/egypts-revolution
    • I noticed some shared images are missing http://storify.com/maq4sure/egypts-revolution
    • Some tweets are still intact https://twitter.com/miss_amy_qb/status/32477898581483521
    • …and some lost their meaning with the disappearance of the images https://twitter.com/aishes/status/32485352102952960 (image missing) https://twitter.com/omar_chaaban/status/32203697597452289
    • The tweet remains but the shared image disappeared… http://yfrog.com/h5923xrvbqqvgzj
    • Cairo… we have a problem!
        • Implication: The reader cannot understand what the author of the tweet meant because the image is not available.
        • Problem: The post is available but the linked resource (image) is completely missing.
    • …back to the title: Detecting, Modeling, & Predicting User Temporal Intention in Social Media
    • The Anatomy of a Tweet [annotated screenshot; labeled parts: author’s username, other user mention, social post, tweet body, interaction options, publishing timestamp, shortened URL to resource, hashtag, shared resource]
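The labeled parts of a tweet can be pulled out of the raw text with a few patterns. This is a minimal sketch, not the authors' tooling: the regular expressions are simplified assumptions (Twitter's real tokenization rules are more involved), and the sample tweet and short URL are made up for illustration.

```python
import re

# Simplified, hypothetical patterns for illustration only.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")


def tweet_parts(text):
    """Split a tweet body into user mentions, hashtags, and shared URLs."""
    urls = URL.findall(text)
    # Strip URLs first so '#fragment' parts of links are not read as hashtags.
    stripped = URL.sub(" ", text)
    return {
        "mentions": MENTION.findall(stripped),
        "hashtags": HASHTAG.findall(stripped),
        "urls": urls,
    }


# Made-up tweet; the short URL is a placeholder, not a real link.
parts = tweet_parts("RT @jeff: Michael Jackson dies http://bit.ly/example #MJ")
print(parts)
```

Each URL found this way is one of the places where a shared post can later break.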
    • 3 URIs = 3 chances to fail
    • Explanation in MJ’s example [timeline figure with points t1 through tn]
    • User’s Temporal Intention [diagram]
        – Share time: implicit or explicit (instrumented shortener)
        – Click time: implicit or explicit (instrumented web client)
        – Labels: the focus of our research; out of our scope (purview of Facebook, Twitter, Google, …etc); engineering problem (solved by providing tools)
    • Sometimes you want a previous version. The correct temporal intention: CNN.com at the closest time to the tweet, 25th June 2009, ~7pm
    • Sometimes you want the current version. The correct temporal intention: in this case, the current state of the press releases page
    • Research Question: Can we estimate the users’ intention at the time of posting and reading to predict and maintain temporal consistency?
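When the intended version is the one "closest to the tweet", choosing it comes down to a nearest-timestamp search over the available snapshots. The sketch below reuses the two CNN snapshot timestamps from the earlier slides. The 14-digit format is taken from those archive URLs, while the exact share time (7pm, time zone ignored) and the function names are assumptions for illustration.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a 14-digit archive timestamp such as 20090625232522."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def closest_snapshot(snapshots, shared_at):
    """Return the snapshot timestamp nearest in time to the share time."""
    return min(snapshots, key=lambda ts: abs(parse_ts(ts) - shared_at))


# The two CNN snapshots shown in Scenario 1.
snapshots = ["20090625232522", "20090726234411"]
# Assumed share time: "25th June 2009 ~7pm", time zone ignored.
shared_at = datetime(2009, 6, 25, 19, 0)

best = closest_snapshot(snapshots, shared_at)
url = f"http://web.archive.org/web/{best}/http://www.cnn.com/"
print(url)
```

When the intention is the current version instead, no lookup is needed and the reader is simply sent to the live URI.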
    • Research Goals
        • Detect the temporal intention of:
            1. The author at sharing time
            2. The reader at dereferencing time
        • Model this intention as a function of time, the nature of the resource, and its context.
        • Predict how resources change with time, and the intention behind sharing them, to minimize inconsistency.
        • Implement the prediction model to automatically preserve vulnerable social content that is prone to change or loss.
        • Create an environment implementing this framework that provides smooth temporal navigation of the social web.
    • Related Work
        • User’s Web Search Intention
            – A. Ashkan ECIR ’09
            – C. Lee AINA ’05
            – A. Loser IRSW ’08
            – L. Azzopardi ECIR ’09
            – R. Baeza-Yates SPIR ’06
            – N. Dai HT ’11
        • Commercial Intention
            – Q. Guo SIGIR ’10
            – A. Benczur AIRWeb ’07
        • Sentiment Analysis
            – G. Mishne AAAI ’06
            – J. Bollen JCS ’11
        • Social Networks Growth and Evolution
            – B. Meeder WWW ’11
        • Persistence of shared resources
            – M. Nelson D-Lib ’02
            – R. Sanderson OR ’11
            – F. McCown JCDL ’07
        • URL Shortening
            – D. Antoniades WWW ’11
        • Tweeting, Micro-blogging and Popularity
            – S. Wu WWW ’11
            – A. Java SNA-KDD ’07
            – H. Kwak WWW ’10
        • Access to Archives
            – H. Van de Sompel OR ’09
    • Dissertation Plan
        – BEGIN
        – Read Literature
        – Collect Datasets
        – Analyze Archives Coverage
        – Analyze Shortened URIs
        – Prototype Application
        – Analyze Shared Resources Persistence and Coverage
        – Analyze Contextual Intention (current state)
        – Create Intention-based dataset
        – Extract Intention Features
        – Train a Parametric Model to predict intention
        – Evaluate, test, cross-validate the model
        – Create a mockup application
        – Extend the model to induce preservation
        – Finish Writing the Dissertation
        – PhD Defense
    • Estimating Web Archiving Coverage
        • Goal: Estimate how much of the public web is present in the public archives, and how many copies are available.
        • Action: Get 4 different datasets from 4 different sources:
            – Search engine indices
            – Bit.ly
            – DMOZ
            – Delicious
        • Results: [table not captured in this transcript; table courtesy of Ahmed AlSum, JCDL 2011]
        • Publications:
            – How much of the web is archived? JCDL ’11
    • Shortened URI Analysis
        • Goal: Better understand URI shortening and resolving, the effect of time on this process, and the correlation between a page’s features and characteristics and its resolution.
        • Action:
            – Fresh Bit.lys
            – Get hourly click logs, rate of change, social networking spread, and other contextual information
            – Longitudinal study
        • Evaluation:
            – Compare results with the frequency-of-change analysis of Cho and Garcia-Molina.
            – Compare results with Antoniades et al., WWW 2011.
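Resolving a shortened URI means following a chain of redirects until a final target is reached, and every hop is a point where resolution can fail. The sketch below shows only that chain-following logic. The `lookup` callable stands in for an HTTP request that returns the redirect target, and the redirect table, URIs, and function name are all hypothetical.

```python
def resolve_chain(uri, lookup, max_hops=10):
    """Follow redirects from uri and return every URI visited, in order.

    lookup(uri) must return the redirect target, or None when uri does
    not redirect. It is a stand-in for a real HTTP request.
    """
    chain = [uri]
    while len(chain) <= max_hops:
        target = lookup(chain[-1])
        # Stop at a final resource, or when a redirect loop is detected.
        if target is None or target in chain:
            break
        chain.append(target)
    return chain


# Hypothetical redirect table: short URI -> second shortener -> final page.
redirects = {
    "http://bit.ly/a": "http://t.co/b",
    "http://t.co/b": "http://example.com/story",
}

chain = resolve_chain("http://bit.ly/a", redirects.get)
print(chain)
```

Keeping the whole chain, rather than only the final URI, makes it possible to record which hop stopped resolving in a longitudinal study.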
    • Estimating Loss of Shared Resources in Social Media
        • Goal: Estimate how much of the public web is present in the public archives, and how many copies are available.
        • Action:
            – Sampling from 6 public events
            – Events spanning 3 years
            – Existence in the current web
            – Existence in the public archives
            – Find relation with time
        • Results:
            – After the first year, ~11% will be lost
            – After that, a further 0.02% is lost per day
        • Publications:
            – A year after the Egyptian revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
            – Losing my revolution: How Many Resources Shared on Social Media Have Been Lost? TPDL ’12
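The two reported figures can be combined into a rough estimate of how much shared content is gone after a given number of days. This is a minimal sketch that assumes the 0.02% daily loss simply adds linearly on top of the ~11% first-year loss. The slide gives no figure for the first 365 days, so the function refuses to guess there. The function and constant names are illustrative.

```python
FIRST_YEAR_LOSS = 0.11   # ~11% lost after the first year (from the slide)
DAILY_LOSS = 0.0002      # 0.02% lost per day after that (from the slide)


def estimated_loss(days_since_shared):
    """Estimated fraction of shared resources lost, for ages of one year or more."""
    if days_since_shared < 365:
        # The slide reports no within-year rate, so none is assumed here.
        raise ValueError("estimate only defined from day 365 onward")
    extra_days = days_since_shared - 365
    # Assumed linear accumulation, capped at 100%.
    return min(1.0, FIRST_YEAR_LOSS + DAILY_LOSS * extra_days)


two_years = estimated_loss(730)
print(f"{two_years:.1%}")
```

Under these assumptions, content shared two years ago would have lost roughly 0.11 + 365 × 0.0002 ≈ 18.3% of its linked resources.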
    • User Intention Analysis
        • Goal: Better understand user intention and the factors that affect it. Also create a new testing and training set.
        • Action:
            – Get a sample set of tweets selected at random
            – Extract the URIs
            – Get the closest Memento
            – Download the snapshot and the current version
            – Use Amazon’s Mechanical Turk to choose the best version
        • Evaluation:
            – Measure cross-rater agreement and confidence.
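The evaluation step calls for measuring cross-rater agreement but does not name a statistic. One common choice when several Mechanical Turk workers label each item is Fleiss' kappa, sketched below as an assumption rather than the authors' stated method. The vote counts are made up: 4 tweets, 5 raters each, choosing between "archived version" and "current version".

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = raters assigning item i to category j.

    Every item must be rated by the same number of raters (at least 2).
    Raises ZeroDivisionError if all ratings fall into a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all ratings that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement among rater pairs, per item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(p_item) / n_items
    expected = sum(p * p for p in p_cat)
    return (observed - expected) / (1 - expected)


# Made-up votes per tweet: [archived version, current version].
votes = [[5, 0], [4, 1], [0, 5], [1, 4]]
kappa = fleiss_kappa(votes)
print(round(kappa, 3))
```

A value near 1 would indicate that raters largely agree on which version the author intended, while a value near 0 would mean the agreement is no better than chance.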
    • Proposed Work
        • Data Gathering
        • Feature Extraction
        • Modeling the intention engine
        • Evaluation
        • Application: Prediction and Preservation
    • Possible Solution for Jenny [mockup prompt]: “The resource has changed since the last time it was shared. Do you wish to see the version the author intended or the current version?” Options: Current Version, Intended Version
    • Proposed Framework [diagram labels: Feature Extraction, Classifier, Archived Version, Current Version]
        • Example features:
            – Tweet content
            – Click logs
            – Other tweets
            – Shared resource
            – Timemaps
    • Extra Slides
    • Archive Shortener Application
    • Estimating Shared Resources Loss in Social Media
    • My Publications
        – S. G. Ainsworth, A. Alsum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. How much of the web is archived? In Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, JCDL ’11, pages 133–136, 2011.
        – H. SalahEldeen and M. L. Nelson. Losing my revolution: How many resources shared on social media have been lost? Accepted in TPDL 2012.
        – H. SalahEldeen and M. L. Nelson. Losing my revolution: A year after the Egyptian revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
    • References
        – D. Antoniades, I. Polakis, G. Kontaxis, E. Athanasopoulos, S. Ioannidis, E. P. Markatos, and T. Karagiannis. we.b: the web of short URLs. In Proceedings of the 20th international conference on World wide web, WWW ’11, pages 715–724, New York, NY, USA, 2011. ACM.
        – A. Ashkan, C. L. Clarke, E. Agichtein, and Q. Guo. Classifying and characterizing query intent. In Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval, ECIR ’09, pages 578–586, Berlin, Heidelberg, 2009. Springer-Verlag.
        – L. Azzopardi and M. de Rijke. Query intention acquisition: A case study on automatically inferring structured queries. In Proceedings DIR-2006, 2006.
        – R. Baeza-Yates, L. Calderon-Benavides, and C. Gonzalez-Caro. The intention behind web queries. In F. Crestani, P. Ferragina, and M. Sanderson, editors, String Processing and Information Retrieval, volume 4209 of Lecture Notes in Computer Science, pages 98–109. Springer Berlin / Heidelberg, 2006. doi:10.1007/11880561_9.
        – A. Benczur, I. Bíró, K. Csalogany, and T. Sarlos. Web spam detection via commercial intent analysis. In Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, AIRWeb ’07, pages 89–92, New York, NY, USA, 2007. ACM.
        – J. Bollen, H. Mao, and X.-J. Zeng. Twitter mood predicts the stock market. CoRR, abs/1010.3003, 2010.
        – N. Dai, X. Qi, and B. D. Davison. Bridging link and query intent to enhance web search. In Proceedings of the 22nd ACM conference on Hypertext and hypermedia, HT ’11, pages 17–26, New York, NY, USA, 2011. ACM.
        – N. Dai, X. Qi, and B. D. Davison. Enhancing web search with entity intent. In Proceedings of the 20th international conference companion on World wide web, WWW ’11, pages 29–30, New York, NY, USA, 2011. ACM.
        – K. Durant and M. Smith. Predicting the political sentiment of web log posts using supervised machine learning techniques coupled with feature selection. In O. Nasraoui, M. Spiliopoulou, J. Srivastava, B. Mobasher, and B. Masand, editors, Advances in Web Mining and Web Usage Analysis, volume 4811 of Lecture Notes in Computer Science, pages 187–206. Springer Berlin / Heidelberg, 2007. doi:10.1007/978-3-540-77485-3_11.
        – Q. Guo and E. Agichtein. Ready to buy or just browsing?: detecting web searcher goals from interaction data. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’10, pages 130–137, New York, NY, USA, 2010. ACM.
        – A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, WebKDD/SNA-KDD ’07, pages 56–65, New York, NY, USA, 2007. ACM.
        – H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 591–600, New York, NY, USA, 2010. ACM.
        – C.-H. L. Lee and A. Liu. Modeling the query intention with goals. In Proceedings of the 19th International Conference on Advanced Information Networking and Applications - Volume 2, AINA ’05, pages 535–540, Washington, DC, USA, 2005. IEEE Computer Society.
        – A. Loser, W. M. Barczynski, and F. Brauer. What’s the intention behind your query? A few observations from a large developer community. In IRSW, 2008.
        – F. McCown, N. Diawara, and M. L. Nelson. Factors affecting website reconstruction from the web infrastructure. In JCDL ’07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 39–48, 2007.
        – B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes. We know who you followed last summer: inferring social link creation times in twitter. In Proceedings of the 20th international conference on World wide web, WWW ’11, pages 517–526, New York, NY, USA, 2011. ACM.
        – G. Mishne. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), 2006.
        – M. L. Nelson and B. D. Allen. Object persistence and availability in digital libraries. D-Lib Magazine, 8(1), 2002.
        – R. Sanderson, M. Phillips, and H. Van de Sompel. Analyzing the persistence of referenced web resources with memento. CoRR, abs/1105.3459, 2011.
        – H. Van de Sompel, M. L. Nelson, R. Sanderson, L. Balakireva, S. Ainsworth, and H. Shankar. Memento: Time travel for the web. CoRR, abs/0911.1112, 2009.
        – S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on twitter. In Proceedings of the 20th international conference on World wide web, WWW ’11, pages 705–714, New York, NY, USA, 2011. ACM.