Data linking with kblog
Published in: Technology
  • So, today I am going to talk about data linking with Knowledge Blog. Normally, talks start at the beginning. I thought I would buck this trend and instead...
  • ...start at the end. The long tail was mentioned yesterday. Much research data comes from individual research labs, from individual researchers, each producing relatively small amounts of data but collectively producing a lot. So, long tail or big science? My field, bioinformatics, does both.
  • But the data from the long tail and from big science are different. Big science generally produces sequence data, which is generally all of the same type. The long tail doesn’t. For example, we start with microarray expression data. Then we have MIAME-compliant metadata, an RNA degradation plot and finally a paper, in this case a random one that I found on PLoS yesterday. We have data standards for many of these parts: the second part, often called “metadata” even though it isn’t, uses MIAME, which is one of the older information content standards in bioinformatics. To me, all of this is data. Without the latter three, the “raw data” is just junk.
  • The paper is the richest form in terms of expressivity: it carries the most complex ideas and uses the largest vocabulary. It is also the least open to reuse, although in general it gives meaning to all the rest. And it is the form of scientific data storage which has changed the least.
  • So, what is the problem? Well, first, the process of publishing is very time-consuming. Secondly, it’s very expensive. And finally, to misquote Douglas Adams, it’s a process which is so amazingly primitive that we still think PDFs are a pretty neat idea. But in general, this form of data capture only happens for the most cherry-picked data: the positive data, the significant data, the data where the experiment worked. What about the negative data, the insignificant data? What about the standard operating procedure, the tutorial information and so on? This is not a small issue: the massive publication bias in biology hampers our understanding of the way that organisms function. In medicine, people die not through lack of knowledge, but because we cannot collate information that exists.
  • So, why is this the case? Well, scientific publishing is basically still at the stage of coach building. Consider these stats: the second biggest STM publisher in the world looks like this, and costs 1.5 billion euros per annum. This is Elsevier. The biggest looks like this. It only costs 10 million dollars per annum. This is Wikipedia. Is this comparison fair? Are the two equivalent? No, probably not, but they are not two orders of magnitude different either.
  • Consider, for example, this process from one of the major publishers that I have published with. I wrote my article in LaTeX. I converted it to PDF. The website converted it to another PDF (which I had to check). The publishers then (and this is true) converted it to a Word doc. From there, they turned it into XML, which was finally converted to HTML and, yes, you guessed it, another PDF. Now, not only is this a waste of time, but it’s inaccurate. Errors happen. And trying to get structured or data-linked publications through this process? You might as well give up.
  • My solution: WordPress. Actually, more importantly, commodity software. And by commodity, I mean commodity, and not research. There are some excellent tools from academia that are widely used. Open Journal Systems, for example, powers 6000 journals. WordPress is behind 10% of ALL websites.
  • Why WordPress? Well, it has an edit dialog. But it’s not very good. But you can blog from Word; I don’t think that is very good either. But it is the way that it is, it’s what people use. So WordPress fits in with people’s workflows. It supports everything. Nothing would ever convince me to add this level of support to a tool.
  • What other features are suitable for academic publishing? Well, here, we borrowed, stole and occasionally wrote our own. Reviewing, courtesy of EditFlow. Metadata and crawlability features we added. Multiple authors we borrowed. These allow archiving (this comes from the UK Web Archive) and also searchability (Google Scholar).
  • Bi-directional links. As well as permalinks, it also supports legacy identifiers in the shape of DOIs, thanks to DataCite. And it’s extensible. So I added nice-looking maths (scalable, thanks to MathJax) and syntax highlighting. Bibliographic support exists. We can do typed linking with CiTO (thanks to David Shotton), although clunkily at the moment. This will be improved; we also want to make it client-renderable, so that the user can choose the citation format. And finally, EPUB and even PDF export.
  • We also want to extend bi-directional linking: blogs do this out of the box, but support is required at both ends. And finally we want to be able to embed the data directly into the paper.
  • So, why are people not doing this already? I’ve now spent a fair bit of time learning PHP and JavaScript. And while poking around in the innards of WordPress I have discovered something that I now reveal to you.
  • Short articles, single author, example based articles.
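
The cost comparison in the coach-building note can be checked with a little arithmetic. The sketch below uses only the two annual cost figures quoted in the talk and, as a simplifying assumption that is not in the original, treats euros and dollars as roughly interchangeable. The function name `cost_gap` is made up for illustration.

```python
import math

# Annual cost figures as quoted in the talk.
ELSEVIER_COST_EUR = 1.5e9   # 1.5 billion euros per annum
WIKIPEDIA_COST_USD = 10e6   # 10 million dollars per annum


def cost_gap(larger, smaller):
    """Return the cost ratio and its size in orders of magnitude (log10)."""
    ratio = larger / smaller
    return ratio, math.log10(ratio)


# Assumption: no currency conversion is applied; this is only a rough comparison.
ratio, orders = cost_gap(ELSEVIER_COST_EUR, WIKIPEDIA_COST_USD)
print(f"cost ratio: {ratio:.0f}x, about {orders:.2f} orders of magnitude")
```

Under that assumption the costs differ by a little over two orders of magnitude, which is the gap the talk contrasts with the much smaller difference in what the two deliver.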
  • Transcript

    • 1. Data linking with kblog
      Phillip Lord
      Newcastle University
    • 2. The Long Tail
    • 3. Example Data
      1007_s_at 2.867330709
      1053_at 10.50302152
      117_at 2.702517066
      121_at 3.052316166
      1255_g_at 2.278998026
      1294_at 5.360226024
      1316_at 5.496447322
      1320_at 4.475412175
      1405_i_at 2.301359647
    • 4. The paper
    • 5. The problem?
    • 6. Coach Building
      250,000 articles per year
      240 million Downloads
      Cost: 1.5 Billion Euro
      17 million articles
      > 20 languages
      365 million readers
      Total Cost: 10 million dollars
    • 7. The process
    • 8. Our Solution
    • 9. Wordpress
      Has one critical feature
      It has an edit dialog
      Open Office
      By email
    • 10. Features
      Metadata – coins, metatags *
      Crawlability *
      Multiple authors
      Archiving (UKWA)
    • 11. Features
      Bi-directional links
      Permalinks (purls to follow)
      DOIs (datacite!)
      Nice maths * (and mathjax)
      Syntax Highlighting
      Bibliographic Support (with DOIs, and incompletely CiTO) *
      ePUB and PDF (!?) export
    • 12. Data Linking
      Bi-directional links require support at both ends
      Adding this generically
      Adding this for specific data sets (microarray)
      Data linking into papers
    • 13. Old technology
      Most of this technology pre-exists
      So why don’t people use it!
      There is a good reason...
    • 14. Content
      Now has 15k page views (not hits!)
      25 articles, multiple authors
      Seeking pubmed inclusion
      Advertising: two blog articles about ontogenesis appeared within 1 day of the first article.
      10 articles
      About scientific workflows
      Supplement to myExperiment
    • 15. Well...
      These stats are not going to scare either Elsevier or Wikipedia
      But, they are not bad either
      And it allows primary scientific content of many different forms
      We believe it can form part of the scientific landscape
    • 16. Acknowledgements
      Phillip Lord (me!)
      Dan Swan
      Simon Cockell
      Robert Stevens (Manchester)
      Georgina Moulton (Manchester)
      Thanks also to JISC, David Shotton, BL, Datacite, and wordpress.
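
The "Metadata – coins, metatags" item on the Features slide refers to COinS, a convention for embedding citation metadata in HTML as an empty `span` with class `Z3988` whose `title` attribute holds URL-encoded OpenURL key/value pairs. The following is a minimal Python sketch of that idea, not kblog's actual implementation (which lives in WordPress plugins). The function name `coins_span`, the example URL, and the date are hypothetical placeholders, and the choice of Dublin Core style keys is an assumption made for illustration.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, creator, date, identifier):
    """Build a COinS <span> describing one article (illustrative sketch only)."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),                  # OpenURL ContextObject version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:dc"),  # assumed: Dublin Core key set
        ("rft.title", title),
        ("rft.creator", creator),
        ("rft.date", date),
        ("rft.identifier", identifier),
    ]
    # URL-encode the key/value pairs, then HTML-escape the result so that
    # the "&" separators are safe inside an attribute value.
    query = urlencode(fields)
    return '<span class="Z3988" title="{}"></span>'.format(escape(query, quote=True))


# Placeholder values: the URL and date below are invented for the example.
span = coins_span(
    title="Data linking with kblog",
    creator="Phillip Lord",
    date="2000-01-01",
    identifier="https://example.org/kblog/data-linking",
)
print(span)
```

Tools that understand COinS (reference managers, for instance) can read the span without any change to how the page looks, which is part of what makes a blog post crawlable as a citable article.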