Mduke sagecite-jisc-march11


A presentation on the SageCite project given at the JISC MRD International Workshop in March 2011. Describes the application domain and citation challenges in SageCite.

Published in: Technology

  • One of the CLIP group of projects funded by JISC. The focus of the talk is on the domain, as some of the citation issues will be common with other projects. Concentrating on Sage, its data and some of the implications. Not covering everything done in the project.
  • Holy Trinity as written up in the proposal.
  • There is something missing from the triangle. It cuts across the 3 areas of data, process and publication, and is central to issues of attribution and credit. That’s the contributors! Will come back to them later.
  • This is an overview of a more complex process (simplified view). Does not show original sources which enter the process. Each part can be looked at in more detail – example coming up. Different people involved as each stage is specialized.
  • This is just one of the stages from the previous diagram. We have a version of this for each of the stages, but not enough time to go into each of them. Each stage has input and output, and tools employed, e.g. R scripts.
  • Previous slide was a simplified view; the actual process is broken down into a workflow, with configuration details, actual scripts, input and output.
  • Main contribution has been to capture this process, document it better via Taverna, and make it more understandable and re-usable. We need to understand the domain before we can ask the questions about citation. Lessons learned: publications have many gaps, Perl scripts are not very user friendly, working from a document shared by Sage Bionetworks. Based on a visit to Sage Bionetworks, funded by Sage, which built relationships and dialogue leading to data sharing.
  • Moving on to a slightly different perspective, from the domain to the general citation question that we start to address. Started to think about how citation happens. Scenarios are on the blog. For each of these stages we will think about the questions that our example throws up. I cite others: input data is derived from somewhere. I make my work citable: main work of the project (Taverna). Credit: the motivation, least addressed so far in the project.
  • Identifying the contributor is an issue, e.g. the geographical area of work; may need to identify the organisation that funded the work. Does the modification change the original? How do I preserve the link?
  • This has been the main area of work for SageCite; the next slide shows the role of Taverna.
  • Assign DataCite DOIs. Generate metadata: open question; linked data approach. Store is temporary for the purposes of the demonstrator.
  • Extra steps have been added to the workflow, within Taverna
  • More slides on the last question…
  • This was a diagram we had early on in the project - What about other types of publication?
  • I presented this at SOLO earlier this month [or say elsewhere, in recent & ongoing ORCID activities]. I reach for Geoff Bilder’s slides again and nick a few things. What I want to do here is replace Geoff’s silly little dude with glasses with my much, much cooler ‘academic dude’, as we call him in the office. ### SWITCH ### I want to show you in the next few slides a hypothetical scenario involving this dude, representing me, submitting a dataset to this digital repository, which is a companion to Geoff’s Psychoceramics Review journal. This will demonstrate some of the practicalities of how we might actually use ORCID in data publication.
  • Coming back to those people... ORCID is addressed by a presentation later on; we focussed effort on discussions and building scenarios.
  • Have started discussions, no service yet: how can tools like myExperiment and Taverna, which are on the desktop and manage identity (not global), work with a service like ORCID to exchange information, including for validation?
  • Finally, an advert……..
  • Collaborators on some of the projects provided some of the slides; Sage funded the visit and shared data and documentation.
  • Mduke sagecite-jisc-march11
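
The "I cite others" notes above ask how to keep track of the data that was used and how to preserve the link to the original once it has been modified. The following is a minimal sketch of one possible approach, not something the SageCite project is stated to have built: it assumes content checksums (SHA-256) are an acceptable way to notice a modification. The names `DataRecord`, `record_original`, `record_modified` and the identifier `hypothetical-source-id` are invented for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataRecord:
    """One data set used in an analysis (all field names are illustrative)."""
    label: str
    checksum: str                        # SHA-256 of the content
    source_id: Optional[str] = None      # identifier of the cited original
    derived_from: Optional[str] = None   # checksum of the data set this was modified from


def checksum(content: bytes) -> str:
    """Return a hex digest that changes whenever the content changes."""
    return hashlib.sha256(content).hexdigest()


def record_original(label: str, content: bytes, source_id: str) -> DataRecord:
    """Record an input data set together with the identifier to cite."""
    return DataRecord(label, checksum(content), source_id=source_id)


def record_modified(label: str, content: bytes, original: DataRecord) -> DataRecord:
    """Record a modified copy while keeping the link back to the original."""
    new_sum = checksum(content)
    if new_sum == original.checksum:
        return original  # nothing changed, so the original can be cited directly
    return DataRecord(label, new_sum,
                      source_id=original.source_id,
                      derived_from=original.checksum)


# Hypothetical example data; the identifier below is a placeholder.
raw = record_original("expression data", b"gene,value\nA,1.0\n", "hypothetical-source-id")
clean = record_modified("expression data (normalised)", b"gene,value\nA,0.5\n", raw)
print(clean.derived_from == raw.checksum)
```

The modified record carries both the original's identifier and its checksum, so a later citation can point at the source while still making clear that the data was changed.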

    1. 1. UKOLN is supported by: Monica Duke Project Manager/Researcher 29th March 2011 Aston Business School SageCite Project #sagecite [email_address]
    2. 2. Citation in the domain of disease network modelling
    3. 3. [Diagram labels: Data; Process; Publication; Research Object; Citation Chains; Credit and Attribution]
    4. 4. [Diagram labels: Data; Process; Publication; ?]
    5. 5. Sage data and processes
          • The idealised Sage modelling process can be divided into 7 stages
          • A combination of phenotypic, genetic, and expression data are processed to determine a list of genes associated with diseases
          • Different people are responsible for different stages of the modelling process. One person oversees the whole process though.
    6. 6. Stage 2: Statistical QC
          • Actual values in data sets are validated for quality to check for experimental artifacts
          • The checks made are dependent on the type of data set and involve the use of R scripts and tools like Plink
          • The output is a normalised data set
          [Diagram labels: Validated & curated data sets; Curated data sets; Statistical QC]
    7. 8. Domain Complexity
          • Multistage process
              • Each stage is specialised
          • Several people involved
          • Size/specialisation
    8. 9. Unpacking Citation
          • I cite others
              • I need to give attribution to others
          • I make my work citable
              • Make it easy to cite my work
          • Others cite me
              • Get credit when others attribute me
    9. 10. I cite others
          • Challenges
              • Tracking what data I have used
              • Some information may be confidential
              • Some data may be restricted access
              • What if I have modified the data?
    10. 11. I make my work citable
          • Support others to: Discover, Re-use, Access
    11. 12. [Diagram labels: Taverna workflow; DataCite API; Google API; DataCite; sagecitedemorepository; Produces; Citable data; Mint; Register, submit metadata; Generate landing page for data and store DOIs; Resolve to landing page; Workflow metadata; For referring to data reported in the provenance. Steps are numbered 1 to 6 on the slide.] The relationships between data via DataCite DOIs with tools are captured by the provenance (OPM) produced by Taverna.
    12. 13. Additional steps for citing data
    13. 14. I make my work citable
          • Challenges
              • Making my work re-usable
              • Granularity of credit
              • When to assign a new identifier
                  • What type of identifier
              • What represents intellectual input: which contributions deserve to be cited?
    14. 15. Others cite me
          • Recognising contributions other than publications
          • Granularity of roles and contribution
          • Will added value be recognised?
          • What metrics to use
          • Linking all my contributions together
          • What constitutes “publication”?
    15. 16. [Diagram labels: Workflow; Provenance; Gene URI; myExperiment URI; Data sets in GEO database; Register; Submit; Open Provenance Model; W3C Provenance Incubator; RDF; Data creator ORCID; DOI, Pubmed Id; Scientific publication; Publication?; Sage bionetwork model (Co-expn, Bayesian); DataCite; Workflow user ORCID]
    16. 17. Identity Workshop prep-meeting, Helsinki, January 27 2011
          • Publishing a journal article
          • Publishing a dataset
          G. A. Thorisson, University of Leicester
    17. 18. Different forms of publication
          • As support for an article
          • Publish to a repository/archive
          • Blogs or other social networking sites
          • Micro-attribution (nano-publication)
    18. 19. Working with ORCID
          • Contributor ID
    19. 20. Identity Workshop prep-meeting, Helsinki, January 27 2011
          Centrally-managed informatics infrastructure:
          i) for researchers to manage & use profile
          ii) for tracking author-to-publication attribution links
          iii) interaction with other systems (e.g. publishers, digital libraries)
          • ORCID ID: G-1442-2009 J. Smith, Univ. North Pole
          • ORCID ID: D-2400-2010 J. Smith, Luthor Corporation
          • ORCID ID: B-1242-2010 G. Thorisson, Univ. Leicester
          • G. A. Thorisson, Univ. Leicester
          • G. A. Thorisson, Cold Spring Harbor Lab.
          G. A. Thorisson, University of Leicester
    20. 21. Special issue
          • New Models of Semantic Publishing in Science
          • Deadline: 1st May
    21. 22. Acknowledgements
          • University of Manchester
              • Carole Goble
              • Peter Li
          • British Library
              • Max Wilkinson
              • Tom Pollard
          • Sage Bionetworks
          • UKOLN
              • Liz Lyon
              • Monica Duke
          • Nature Genetics
              • Myles Axton
          • PLoS Comp Bio
              • Phil Bourne
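
Slide 12 says that the relationships between data and tools are captured by the provenance (OPM) that Taverna produces, and slide 15 asks how contributions can be linked together. As a rough illustration of why such provenance helps with citation chains, here is a small sketch that walks back through a provenance graph to collect the upstream data and the contributors behind a result. It is a toy in-memory model, not Taverna's actual output format. The edge names `used`, `wasGeneratedBy` and `wasControlledBy` follow the Open Provenance Model vocabulary; every node name (including the `model-building` stage and the `contributor-N` placeholders, which stand in for DOIs and contributor IDs) is an assumption made for the example.

```python
# Each edge is (source, relation, target), read as "source relation target".
# Node names are placeholders standing in for DOIs and contributor identifiers.
EDGES = [
    ("normalised-data", "wasGeneratedBy", "statistical-qc"),
    ("statistical-qc", "used", "curated-data"),
    ("statistical-qc", "wasControlledBy", "contributor-1"),
    ("network-model", "wasGeneratedBy", "model-building"),
    ("model-building", "used", "normalised-data"),
    ("model-building", "wasControlledBy", "contributor-2"),
]


def build_index(edges):
    """Group edge targets by (source, relation) for quick lookup."""
    index = {}
    for source, relation, target in edges:
        index.setdefault((source, relation), []).append(target)
    return index


def citation_chain(artifact, index):
    """Walk back from an artifact; return (upstream data, contributors)."""
    data, people = [], []
    seen = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Find the process that generated this artifact, then who ran it
        # and which inputs it consumed.
        for process in index.get((current, "wasGeneratedBy"), []):
            for person in index.get((process, "wasControlledBy"), []):
                if person not in people:
                    people.append(person)
            for used in index.get((process, "used"), []):
                if used not in data:
                    data.append(used)
                stack.append(used)
    return data, people


upstream, contributors = citation_chain("network-model", build_index(EDGES))
print(upstream)       # ['normalised-data', 'curated-data']
print(contributors)   # ['contributor-2', 'contributor-1']
```

Because each stage records who controlled it, the walk returns the people responsible for every stage behind the final model, which is the granularity of credit the "Others cite me" slide asks about.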