
Man vs Machine (with notes)

Web 2.0 is not only about making sites easier for people to interact with. It is also about creating webs of data that machines can interact with. These slides look at a few examples of technologies that can help weave the data web, and show some example applications, with a focus on science.

Published in: Technology, Business

Transcript

  • 1. Man vs Machine. Main theme: Web 2.0 is as much about machine-consumable data as human-consumable data.
  • 2. Web 1.0 → Web 2.0: DoubleClick → Google AdSense; Ofoto → Flickr; Akamai → BitTorrent; mp3.com → Napster; Britannica Online → Wikipedia; personal websites → blogging; evite → upcoming.org and EVDB; domain name speculation → search engine optimization; page views → cost per click; screen scraping → web services; publishing → participation; CMS → wikis; directories (taxonomy) → tagging (folksonomy); stickiness → syndication. The Web 2.0 meme was shaped by comparing companies from before the dot-com bubble with companies from after it. What is the difference between the list on the left and the list on the right? Take the example of Britannica vs Wikipedia. The information in Britannica is centrally controlled: it has a relatively small number of contributors, and the workload per contributor is high. Wikipedia is open to anyone to contribute, and a collaboration of thousands can produce a work of quality equal to that of a more centrally controlled method. Britannica's revenues decreased from 650M to 50M over a 10-year period! The new sites make it easy to add information and then use that information to answer questions or solve problems for people.
  • 3-8. [Chart built up over six slides: ease of contributing (easy to hard) plotted against ease of mining. Items added in turn: semantic web; plain text, emails; hyperlinks, views, tags, citations?; academic papers; microformats.] Two key parts of Web 2.0 are the easy addition of information into the system (user-generated content), followed by ways of mining that information. One thesis we are following in this work is that by recognizing how information flows, and which ways of mining it are available, we can create useful solutions to real problems. Companies that find ways to do this should succeed.
  • 9-13. The kind of information we can capture with Connotea is typical of many sites. For Connotea we have: citation information; usage patterns (when was an item added to our database, and how many times has it been added?); user-generated metadata such as tags; and, potentially, social network information (how many of my friends have added this item?).
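As a rough illustration of how two of these signals could be computed, here is a minimal Python sketch. The `Bookmark` record, the `friends` mapping and the function names are hypothetical, not Connotea's actual schema or code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bookmark:
    user: str
    item: str                  # hypothetical item key, e.g. a DOI or URL
    added: date
    tags: tuple[str, ...] = ()


def usage_pattern(item, bookmarks):
    """Return (first date added, number of times added) for one item."""
    dates = [b.added for b in bookmarks if b.item == item]
    return (min(dates), len(dates)) if dates else (None, 0)


def friends_who_added(item, me, friends, bookmarks):
    """Return, sorted, which of my friends have added this item."""
    mine = friends.get(me, set())
    return sorted({b.user for b in bookmarks if b.item == item and b.user in mine})
```

For example, if two users bookmark the same DOI on different dates, `usage_pattern` returns the earlier date and a count of 2.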
  • 14. Gathering, Trusting, Integrating, Analyzing, Triangles; del.icio.us. Many Web 2.0 sites have created islands of data. Some key technologies for bridging these islands include Fire Eagle, OpenID and OAuth. RFID and Fire Eagle point the way to merging these islands with the real world.
  • 15. What's the process? Gathering the data; trusting the data; integrating / disambiguating; understanding and analyzing the data.
  • 16. DOI. Some key technologies for bridging these islands include Fire Eagle, OpenID and OAuth. In the publishing world, DOIs are a key technology.
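Here is a small sketch of why a shared identifier helps. Once their DOIs are normalized, records about the same paper from two sites can be merged. The stripped prefixes, the dict-based record layout and the function names are assumptions made for illustration. The lower-casing relies on DOI names being case-insensitive.

```python
import re

# Prefixes under which the same DOI commonly appears (assumed, not exhaustive).
_DOI_PREFIX = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(raw):
    """Strip a common prefix and lower-case; DOI names are case-insensitive."""
    return _DOI_PREFIX.sub("", raw.strip()).lower()


def merge_by_doi(*sources):
    """Merge record dicts from several sites into one dict per DOI.

    Later sources only fill in fields that earlier sources lacked.
    """
    merged = {}
    for records in sources:
        for rec in records:
            target = merged.setdefault(normalize_doi(rec["doi"]), {})
            for field, value in rec.items():
                target.setdefault(field, value)
    return merged
```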
  • 17. [Diagram: OpenID compared with OAuth; labels: Internet, Site, Site or Application.] OpenID allows a single person to interact with multiple websites using one log-in mechanism. OAuth allows both desktop and web applications to share a user's data through one standard authorization mechanism.
  • 18. [Figure: tags on items rated 5/5 compared with items rated 1/5, including Redemption, Based-on-Play, Android, Love, Refugee, Spacecraft, Time-Travel, Soldier, Famous-Score, Hope, Alien, Blockbuster, Broken-Heart, Space War, Futuristic, Based-on-Novel, Racism, Artificial-Intelligence, Hero, Melodrama.] Once you merge the data, you have to understand it. The tags a person uses across different services can give you a more holistic picture of their interests.
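Here is a minimal sketch of combining one person's tags from several services into a single interest profile. The normalization rule (case-folding, and treating spaces and underscores as hyphens) is an assumption. Real services differ in how they store tags.

```python
from collections import Counter


def normalize_tag(tag):
    """Assumed rule: case-fold, then join words with hyphens."""
    return "-".join(tag.casefold().replace("_", " ").split())


def tag_profile(*services):
    """Combine per-service tag lists into one count of a person's interests."""
    profile = Counter()
    for tags in services:
        profile.update(normalize_tag(t) for t in tags)
    return profile
```

With this rule, "Time-Travel", "time travel" and "time_travel" all count toward the same interest.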
  • 19. However, tags can be ambiguous. Semantic web technologies are addressing this; look at projects such as Tagora http://www.tagora-project.eu/, DBpedia http://dbpedia.org/, SIOC http://sioc-project.org/ and FOAF http://www.foaf-project.org/.
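One simple way to see the problem is to pick a meaning for an ambiguous tag from the tags that co-occur with it. This is a hypothetical context-overlap heuristic, not how Tagora or DBpedia work. The sense inventory below is made up for illustration and uses DBpedia-style URIs as identifiers.

```python
# Hypothetical sense inventory: candidate URIs with context words that suggest each one.
SENSES = {
    "java": {
        "http://dbpedia.org/resource/Java_(programming_language)": {"programming", "jvm", "code", "software"},
        "http://dbpedia.org/resource/Java": {"indonesia", "island", "travel", "jakarta"},
    },
}


def disambiguate(tag, co_tags, senses=SENSES):
    """Return the candidate URI whose context words overlap most with co_tags, or None."""
    candidates = senses.get(tag.lower())
    if not candidates:
        return None
    context = {t.lower() for t in co_tags}
    best = max(candidates, key=lambda uri: len(candidates[uri] & context))
    # With no overlap at all there is no evidence either way.
    return best if candidates[best] & context else None
```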
  • 20. Open Science, Web 2.0, Semantic Web. Though not exactly the same, Web 2.0, open science and the semantic web work well together. They share some common traits, namely sharing, openness and the minability of information.
  • 21. Growth in submissions to the arXiv. This demonstrates growth in scientific output, and certainly growth in the amount of data available online in electronic form. There is some discussion about whether there is an information overload: the main journals are still the important ones, but reading habits have changed.
  • 22. Discussion groups and mailing lists contain a huge amount of information, from snippets of computer code to long discussions about topics. MarkMail, from MarkLogic, is a site that mines this information. Here we see a comparison of a search for FORTRAN vs a search for Java. At the moment these kinds of archives are mainly relevant to computer science, but these kinds of conversations are going on all the time in every field. http://markmail.org/
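Here is a rough sketch of the kind of comparison shown. It counts, per year, how many archived messages mention each term. It assumes the archive is available as (year, text) pairs and uses whole-word, case-insensitive matching. It is not MarkMail's implementation.

```python
import re
from collections import Counter, defaultdict


def term_trend(messages, terms):
    """Count, per year, how many messages mention each term (whole word, any case)."""
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms}
    trend = defaultdict(Counter)
    for year, text in messages:
        for term, pattern in patterns.items():
            if pattern.search(text):
                trend[term][year] += 1
    return trend
```

Whole-word matching keeps "JavaScript" from counting as "Java". Terms that end in symbols, such as "C++", would need a different pattern.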
  • 23. Amazon uses page views and a database of user purchases to find things you might like. Again, they are using data that they get for free from people using their site. Google PageRank is another canonical example.
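To make the idea concrete, here is a minimal item-to-item co-occurrence sketch ("people who had this also had..."). It is a deliberately simple stand-in, not Amazon's actual algorithm. Each basket is assumed to be one user's list of purchased or viewed items.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a, b] += 1
            counts[b, a] += 1
    return counts


def recommend(item, counts, k=3):
    """Return up to k items most often seen alongside `item`, ties broken by name."""
    related = [(other, n) for (x, other), n in counts.items() if x == item]
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:k]]
```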
  • 24. CrystalEye; Social/Knowledge Networking. These are examples of two types of use in science: CrystalEye http://wwmm.ch.cam.ac.uk/crystaleye/ (for example, bond lengths for a structure: http://wwmm.ch.cam.ac.uk/crystaleye/bondlengths/H-Rb.svg), and Nature Network, for human-human interaction.
  • 25. Nature Web Publishing Group; OTMI. The main products we have developed so far are: database gateways; OTMI (Open Text Mining Interface); podcasts; Scintilla; Nature Network; Nature Precedings; Connotea.
  • 26-31. There are also other tools out there that are doing the same kind of thing, but I'm partial.
  • 32-39. [Diagram built up over eight slides: one Repository, then five Repositories, then the labels Citation, Pubmed, Activity, Management, Integration, Listing.] Discuss how social silos can act as interchange points between repositories, and also between repositories and the applications that might be built on top of those social silos.
  • 40. Connotea citation parsing modules. This model was quick and easy to implement, but it uses the URL as the unique key.
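This sketch illustrates the per-site module idea. It assumes a hypothetical registry that maps URL patterns to handler functions. It also shows the drawback of using the URL as the unique key: the same arXiv paper reached over http and over https yields two keys but only one identifier. The patterns and handler names are illustrative only; Connotea's own modules are written in Perl.

```python
import re

# Hypothetical registry of (compiled URL pattern, handler) pairs.
PARSERS = []


def parser(pattern):
    """Decorator: register a handler for URLs matching `pattern`."""
    def register(handler):
        PARSERS.append((re.compile(pattern), handler))
        return handler
    return register


@parser(r"^https?://(?:www\.)?arxiv\.org/abs/(?P<id>[^/?#]+)")
def arxiv(match):
    return {"source": "arXiv", "id": match.group("id")}


@parser(r"^https?://(?:dx\.)?doi\.org/(?P<doi>10\.\S+)")
def doi(match):
    return {"source": "DOI", "id": match.group("doi")}


def citation_for(url):
    """Return citation data from the first matching parser, or None."""
    for pattern, handler in PARSERS:
        match = pattern.match(url)
        if match:
            return handler(match)
    return None
```

Keying a library by URL would store `http://arxiv.org/abs/0704.0001` and `https://arxiv.org/abs/0704.0001` as two entries, although both resolve to the identifier `0704.0001`.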
  • 41. Amazon.pm DOI.pm LivingReviews.pm PLoS.pm RIS.pm SpamDNSBL.pm autodiscovery.pm BibTeX.pm Dlib.pm NASA.pm PMC.pm Scitation.pm Springer.pm blog.pm Blackwell.pm Highwire.pm NPG.pm PNAS.pm Self.pm Wiley.pm ePrints.pm BmcPdf.pm Hubmed.pm OUP.pm Pubmed.pm Simple.pm arXiv.pm. We have a number of citation modules. They currently have to be written in Perl, and this is a problem: there is nothing similar to the Scaffold infrastructure that Zotero has.
  • 42-48. [Build slides highlighting, in turn, the citation fields captured: Title; Date; Author; PMID/DOI.]
  • 49-51. Getting data in, part 2. The metadata from the paper has been captured. When you begin to add tags, suggested tags are presented based on tags you have already used. A paper by Huberman et al. shows that displaying all tags drives tagonomies to a stable state (Polya-Renyi urn model). You need to display the full community tags, which we don't do ... yet.
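Here is a minimal sketch of prefix-based tag suggestion. It ranks the user's own tags by how often they have been used and can optionally include community tags. The function and its ranking rule are assumptions, not Connotea's code. Adding community counts is a hypothetical extension that corresponds to displaying the full community tags.

```python
from collections import Counter


def suggest_tags(prefix, my_tags, community_tags=None, k=5):
    """Suggest up to k tags starting with `prefix`, most-used first.

    my_tags and community_tags are Counters mapping tag -> times used.
    """
    pool = Counter(my_tags)
    if community_tags:              # hypothetical: fold in community usage
        pool.update(community_tags)
    p = prefix.lower()
    matches = [t for t in pool if t.lower().startswith(p)]
    matches.sort(key=lambda t: (-pool[t], t))
    return matches[:k]
```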
  • 52-54. User home page, with a toolbox on the right: user tags, related tags, related users, groups.
  • 55-56. Getting data out. Open data is important. Export only gets out the citation data, not the extra metadata that the user has added, such as comments or tags. Formats: plain text, RDF, BibTeX, RIS, EndNote. An API?
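To show why an export like this loses user metadata, here is a minimal BibTeX writer that emits only bibliographic fields. The field list and the dict-based citation record are assumptions made for illustration.

```python
def to_bibtex(key, citation):
    """Render the bibliographic fields of `citation` as a BibTeX @article entry.

    Only the fields listed below are written, so user tags and comments
    in the record are silently dropped.
    """
    lines = [f"@article{{{key},"]
    for name in ("author", "title", "journal", "year", "doi"):
        if name in citation:
            value = citation[name]
            if isinstance(value, (list, tuple)):
                value = " and ".join(value)  # BibTeX separates authors with "and"
            lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)
```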
  • 57. Perl, mod_perl, Template Toolkit, MySQL. Open source, GPL2.5, v1.8.1. web1.75 application. Discuss the reasons for open source, and discuss web1.8.1: - we hope for community involvement. - The code is not MVC-structured, and this has led to some problems with adoption. - Some people are running their own instances and have given us some feedback, but we would like to eventually make the code easier to work with. - Why not port it? That's a big can of worms, and someone would need to convince me of the benefits. - If for some reason we choose to no longer support Connotea, the data and the code could be hosted by someone else. - Someone asked me how they know we don't cheat and preferentially return NPG articles in searches. Well, the code is open, so if you are that paranoid you can run an instance yourself and check up on us.
  • 58. http://www.connotea.org/user/IanMulvany http://www.connotea.org/users/tag/scifoo http://www.connotea.org/user/IanMulvany/tag/scifoo http://www.connotea.org/user/IanMulvany/tag/science http://www.connotea.org/user/IanMulvany/tag/science2.0+citation. Examples of calls to query the data, with HTML output.
  • 59. http://www.connotea.org/data/user/IanMulvany http://www.connotea.org/data/users/tag/scifoo http://www.connotea.org/data/user/IanMulvany/tag/scifoo http://www.connotea.org/data/user/IanMulvany/tag/science http://www.connotea.org/data/user/IanMulvany/tag/science2.0+citation. Examples of API calls (you don't have to type them in green when making the call).
  • 60. http://www.connotea.org/rss/user/IanMulvany http://www.connotea.org/rss/users/tag/scifoo http://www.connotea.org/rss/user/IanMulvany/tag/scifoo http://www.connotea.org/rss/user/IanMulvany/tag/science http://www.connotea.org/rss/user/IanMulvany/tag/science2.0+citation. Examples of RSS calls (you don't have to type them in green when making the call). We create an RSS feed of everything.
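The three URL families above differ only in their first path segment: none for HTML, `data` for the API and `rss` for feeds. A small helper can therefore build any of them. This sketch only reproduces the patterns shown on the slides. The slides do not say what `+` between tags means, and the service may no longer be available.

```python
from urllib.parse import quote

BASE = "http://www.connotea.org"  # as shown on the slides


def connotea_url(view="", user=None, tags=()):
    """Build a query URL following the slide patterns.

    view: "" for HTML output, "data" for the API, "rss" for a feed.
    """
    parts = [BASE]
    if view:
        parts.append(view)
    if user:
        parts += ["user", quote(user)]
    else:
        parts.append("users")
    if tags:
        parts += ["tag", "+".join(quote(t) for t in tags)]
    return "/".join(parts)
```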
  • 61. [Chart: Bookmark Growth in Connotea, showing entries in all libraries (thousands, axis 0 to 600) by month from Jan 2005 to Mar 2008.]
  • 62. Information visualization of links in Connotea by Mirko Gontek at the University of Cologne. These social links can create networks of information on top of the basic information. This is what we want to use to start building collaborative intelligence into these systems.
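Here is a minimal sketch of how such social links could be derived. It connects two users whenever they bookmarked the same item, and weights each link by the number of shared items. The input format and the weighting are assumptions. The slide does not say how the visualization was built.

```python
from collections import defaultdict
from itertools import combinations


def user_links(bookmarks):
    """Weighted user-user links: weight = number of items both users bookmarked.

    `bookmarks` is an iterable of (user, item) pairs; duplicates are ignored.
    """
    users_by_item = defaultdict(set)
    for user, item in bookmarks:
        users_by_item[item].add(user)
    links = defaultdict(int)
    for users in users_by_item.values():
        for a, b in combinations(sorted(users), 2):
            links[a, b] += 1
    return dict(links)
```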
