Challenges of Building Web Observatories

Invited Talk at WebSci workshop on Building Web Observatories

Published in: Technology, Education

  1. Steffen Staab (staab@uni-koblenz.de), WeST. Vote for free Web Science MOOC!
  2. You want to have more free Web Science education on the Web? Vote for our course at https://moocfellowship.org/ now!
  3. Title slide: The Challenges of Building Interoperable Web Observatories. Steffen Staab, WeST (Web Science & Technologies), University of Koblenz ▪ Landau, Germany. http://wow.west.webobservatory.org/
  4. What to observe? (diagram)
     • Observable micro-interactions in the Web: Produce, Consume, Cognition, Emotion, Behavior, Socialisation, Knowledge
     • Observable macro-effects in the Web: Apps, Protocols, Data & Information, Governance, WWW
  5. Why to observe? Understanding, Collecting, Describing, Analyzing, Modeling, Predicting, Repeating!
  6. Why to observe? Understanding, Collecting, Describing, Analyzing, Modeling, Predicting, Repeating!
  7. What to observe? (diagram, now annotated with the two observation methods: Web Crawling and Usage Logging)
     • Observable micro-interactions in the Web: Produce, Consume, Cognition, Emotion, Behavior, Socialisation, Knowledge
     • Observable macro-effects in the Web: Apps, Protocols, Data & Information, Governance, WWW
  8. Challenges – Data Collection Issues: Legal and/or Ethical
     • Crawling: may be disallowed by provider
     • Usage logging: privacy of individuals
     • Even if it is allowed....
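One common way a provider signals that crawling is disallowed is a robots.txt file; terms of service are another, and the slide does not say which it has in mind. The following is a minimal sketch of such a check using Python's standard `urllib.robotparser`. The robots.txt content, the `observatory-bot` user agent, and the example.org URLs are all hypothetical and only illustrate the idea.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_to_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/public/page", "http://example.org/private/log"):
        print(url, allowed_to_crawl(ROBOTS_TXT, "observatory-bot", url))
```

Passing this check only covers the technical signal; as the slide notes, ethical questions remain even where crawling is allowed.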
  9. Challenges – Data Collection Issues: Crawling
     • What does it mean to crawl a heavily interactive site?
     • Incomplete data
       – Unreachability
       – Time outs
  10. Challenges – Data Collection Issues: Crawling
      • What does it mean to crawl a heavily interactive site?
      • Incomplete data
      • Where to start?
        – We cannot observe everything!
          » Even just for data size!
          » What appear to be most fruitful starting points?
  11. Challenges – Data Collection Issues: Crawling
      • What does it mean to crawl a heavily interactive site?
      • Incomplete data
      • Where to start?
      • Where to stop?
        – Each crawl is a view
          » Twitter: Tweet, URL, Web Page, Subweb, Followers, Followers' Followers, ...
  12. Challenges – Data Collection Issues: Crawling
      • What does it mean to crawl a heavily interactive site?
      • Incomplete data
      • Where to start?
      • Where to stop?
      • Synchronous vs asynchronous
        – Strictly speaking: only asynchronous crawling possible
          » But in [Dellschaft&Staab] we targeted the construction of models for streams of tags
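The "where to start / where to stop" questions above can be made concrete with a bounded breadth-first crawl: the seeds fix the start, and a hop limit plus a node budget fix the stop, so every parameter choice yields a different view. The sketch below runs on a small in-memory link graph instead of the live Web. The graph edges are an assumption loosely modelled on the Twitter chain from the slides, and `crawl_view` is a hypothetical helper, not code from the talk.

```python
from collections import deque

# Hypothetical link graph standing in for the Web: node -> outgoing links.
LINKS = {
    "tweet": ["url", "followers"],
    "url": ["web page"],
    "web page": ["subweb"],
    "followers": ["followers' followers"],
    "subweb": [],
    "followers' followers": [],
}


def crawl_view(links, seeds, max_depth, max_nodes):
    """Breadth-first crawl from seeds, stopping at max_depth hops or max_nodes nodes."""
    seen = []          # nodes in the order they were visited
    visited = set()
    queue = deque((seed, 0) for seed in seeds)
    while queue and len(seen) < max_nodes:
        node, depth = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        seen.append(node)
        if depth < max_depth:  # only expand nodes inside the hop limit
            for target in links.get(node, []):
                if target not in visited:
                    queue.append((target, depth + 1))
    return seen


if __name__ == "__main__":
    print(crawl_view(LINKS, ["tweet"], max_depth=1, max_nodes=10))
    print(crawl_view(LINKS, ["tweet"], max_depth=2, max_nodes=10))
```

Two crawls with the same seed but different limits return different node sets, which is the sense in which each crawl is only a view of the data.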
  13. Challenges – Data Publishing Issues: Legal and/or Ethical
      • Example issues
        – AOL query log
        – Netflix challenge
        – Delicious: http://www.tagora-project.eu/data/
        – Twitter: collecting, but no sharing
          » SocialSensor project
  14. Challenges – Data Publishing Issues: Technical/Modelling issues
      • Generic format, e.g. RDF
      • Format ready for digestion by a certain software, e.g. for Matlab processing
      • Openness to other data
        – E.g. references to DBPedia/Wikipedia
      • Accuracy of publishing
        – http://me.org showed „...“
        – http://me.org showed „...“@2013-05-01:0900CEST
        – http://me.org showed „...“@2013-05-01:0900CEST called from IP 193.99.144.85 using browser...version...history...
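The three "accuracy of publishing" statements above differ only in how much context accompanies the observation. A minimal sketch of that idea is a record whose context fields are optional, so the same observation can be published at increasing levels of precision. The `Observation` class and its field names are hypothetical, and the fixed UTC+2 offset used for CEST is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One published observation, with optional context that increases its accuracy."""
    url: str
    content: str
    observed_at: Optional[datetime] = None  # when the page showed the content
    client_ip: Optional[str] = None         # where the request came from
    user_agent: Optional[str] = None        # which browser made the request

    def describe(self) -> str:
        """Render the observation, adding only the context that is known."""
        parts = [f'{self.url} showed "{self.content}"']
        if self.observed_at is not None:
            parts.append(f"@{self.observed_at.isoformat()}")
        if self.client_ip is not None:
            parts.append(f"called from IP {self.client_ip}")
        if self.user_agent is not None:
            parts.append(f"using {self.user_agent}")
        return " ".join(parts)


# Assumption: CEST modelled as a fixed UTC+2 offset.
CEST = timezone(timedelta(hours=2), "CEST")

if __name__ == "__main__":
    bare = Observation("http://me.org", "...")
    timed = Observation("http://me.org", "...", datetime(2013, 5, 1, 9, 0, tzinfo=CEST))
    full = Observation("http://me.org", "...", datetime(2013, 5, 1, 9, 0, tzinfo=CEST),
                       client_ip="193.99.144.85")
    for obs in (bare, timed, full):
        print(obs.describe())
```

The more context a record carries, the more repeatable the observation becomes, but fields such as the client IP also raise the privacy issues discussed under data collection.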
  15. Sharing Software
      • Software
        – For crawling or usage logging
        – Rather than sharing the data, share the code for observing
        – Example: code for crawling Twitter in a certain way
      • Issues
        – Limited repeatability
        – Disturbance liability („Störerhaftung“) – at least in DE
          » If you provide source code for crawling, e.g., Facebook, even if you do not crawl FB, FB can sue you
  16. Why to observe? Understanding, Collecting, Describing, Analyzing, Modeling, Predicting, Repeating!
  17. WEB OBSERVATORY WIKI: In spite of all this....
  18. Ongoing discussion
      • What to do about sharing Web Science datasets?
      • Let's do simple things first
        – Collect pointers!
        – Publish whatever you can publish – others will reuse
        – Make it more archival
      • In a way that makes it easy to expand to handle more complex issues
        – Semantic Wiki!
  19. Web Observatory Wiki
      • Main goals:
        – Registry of Web Science datasets
        – Compiled by Web Observatory participants – YOU!
      • Minor goals:
        – Semantically store all information about datasets
        – Make it explorable, queryable, and reusable
  20. Quick Facts - 1
      • Semantic MediaWiki + Forms Extension
      • URL: http://wow.west.webobservatory.org/
      • Main classes, with examples:
        – Dataset_Repository: KONECT
        – Dataset: Slashdot Zoo
        – Organization: WeST
  21. Quick Facts - 2
      • Semantic MediaWiki + Forms Extension
      • URL: http://wow.west.webobservatory.org/
      • Class hierarchy example, with attributes:
        – Dataset: Dublin Core + size, license, URL, …
        – Network: node count
        – Social Network: …
  22. Semantic Exploration by Views
  23. Semantic Forms: Providing Data
  24. Schema (Excerpt) (RDF graph diagram; only the node and edge labels survive in the transcript)
      • Resources: ko:konect, ko:slashdot-zoo, ko:twitter
      • Classes: wow:dataset-repository, wow:dataset, wow:network, wow:social-network
      • Properties: wow:contains, wow:size, wow:network-volume, rdf:type, rdfs:subClassOf, rdfs:domain, rdfs:range
      • Literal values: 1944, 120000000
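To illustrate how a class hierarchy like the one in this schema makes datasets queryable, here is a minimal sketch that stores a few statements as plain (subject, predicate, object) tuples and infers every class an entity belongs to by following `rdfs:subClassOf`. The identifiers come from the slide, but which nodes each edge connects is an assumption: the transcript keeps only the labels, so the triples below are reconstructed from the "Quick Facts" slides (KONECT as a dataset repository, Slashdot Zoo as a dataset, Dataset > Network > Social Network). The helper functions are hypothetical and are not part of the wiki.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Assumed triples; the diagram's actual edges are not recoverable from the transcript.
TRIPLES = [
    ("wow:network", SUBCLASS_OF, "wow:dataset"),
    ("wow:social-network", SUBCLASS_OF, "wow:network"),
    ("ko:konect", RDF_TYPE, "wow:dataset-repository"),
    ("ko:konect", "wow:contains", "ko:slashdot-zoo"),
    ("ko:slashdot-zoo", RDF_TYPE, "wow:social-network"),
]


def superclasses(cls, triples):
    """Return all classes reachable from cls via rdfs:subClassOf (transitively)."""
    result = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in result:
                result.add(o)
                frontier.append(o)
    return result


def types_of(entity, triples):
    """Return the entity's asserted classes plus all their superclasses."""
    direct = {o for s, p, o in triples if s == entity and p == RDF_TYPE}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


if __name__ == "__main__":
    print(sorted(types_of("ko:slashdot-zoo", TRIPLES)))
```

Under these assumed triples, a query for all instances of `wow:dataset` would also find entries that were only entered as social networks, which is the kind of exploration the semantic wiki is meant to support.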
  25. Discussion & Q&A
      • Access to wiki
        – Current model:
          » Edits allowed by IPs and users
          » Everyone can be blocked, including IPs
      • Contribute:
        – Content
        – Modeling requirements
        – ...
      • Let us know!
  26. Sanity Check
      • Understanding
      • Collecting (to some extent: commodity service)
      • Describing (WOW)
      • Analyzing
      • Modeling
      • Predicting
      • Repeating!
      So far ad hoc – needs much more:
      • Experience
      • Guidelines
      • Processing workflow
      • Executable code shares (on big data!)
      • ...
  27. What else do we need?
  28. Vote at: https://moocfellowship.org/
