Coleman: Latest trends in Data Analysis for the Scholarly and Academic Publishing Community



Latest trends in Data Analysis for the Scholarly and Academic Publishing Community by Lee-Ann Coleman, PhD, Head of Science, Technology and Medicine, The British Library for the October 16, 2013 NISO Virtual Conference: Revolution or Evolution: The Organizational Impact of Electronic Content.

Published in: Education, Technology
  • The blood-letting zodiac man is taken from one of the Library’s 15th-century Harley Manuscripts. They were donated to the nation in 1753 and form one of the foundation collections of the BL. I’m not going to spend 20 minutes giving you a discourse on the representation of medicine in the medieval period, although it is interesting to reflect on what some of this tells us about information: when you are complaining about writing up your work, at least you don’t have to write it by hand and draw the pictures. Will it still be around in 700 years? Will people laugh at it?
  • PhD focus groups: People Science & Policy. We built evidence based on our own user research, research from the literature, and internal consultation. But theoretical evidence of discovery as the route for the Library to take needed to be backed up with something more concrete: a pilot. A pilot would allow us to test the proposition with users and get concrete evidence for the Library’s work in this area, and would also give us something we could show those internal to the Library.
  • Some of this information was relevant to metadata, hence the need to have something in place to start selection properly. We couldn’t go out and select everything as an STM dataset at once, so for the pilot we chose a specific subject area: Living with Environmental Change, that is, data from monitoring or modeling the environment. This has now also expanded to Biodiversity (for the International Year of Biodiversity in 2010), and soon there will also be records available on Neglected Tropical Diseases. In the next year we will be expanding to Food security, spanning environmental and bioscience topics, from crop genetics and animal breeding to soil quality and pollination. Guidelines were otherwise very much based on existing STM selection criteria.
  • And this is what it looks like! Research datasets material type, accessible via the ‘I want this’ tab, direct link.
  • Darker blues are ‘very useful’ or just ‘useful’. Lighter blues are ‘not useful’. The survey confirmed that our approach was suitable, but was the service actually being used? Initial effort in promoting the service to get feedback was high, but towards the end of the first year, when little or no time was put into promotion, usage stats showed that usage remained stable, even given the ‘pilot’ status of the Search Our Catalogue interface itself. These data exclude staff IP ranges, so they cover reading room and external visitors only. And it wasn’t just curiosity: this graph shows that people were clicking through to the website containing the data.
  • And just referring to other published articles where readers can’t check the facts for themselves can be problematic. A recent study (shown on the slide) demonstrated that ‘conventional wisdom’ is often not based on experimental data. The study looked at reported incubation times for various viral infections and found that half of the studies did not even provide a source for their estimate. Mapping the citation networks enabled the authors to show that the information about incubation times was often based on a small fraction of the data or on no empirical evidence. But there would be benefits if researchers could actually cite the dataset itself. This would enable people to check the facts for themselves and obtain easier access to the data (theoretically researchers should share any data that underpins a published article, but in reality they often don’t). Funding organisations could show better value for money if the data generated could be re-used. Many people or data centres who actually manage data do not receive credit for doing so, and citation could offer a form of acknowledgement. In addition, the process of science is aided by openness and transparency.
  • Researchers don’t get DOIs for their papers directly; they must go via a publisher. In the same way, they must go via a data centre to get a DOI for their data. When we say ‘data centre’, this is shorthand to save time: we include trusted digital/institutional repositories in this! We work at an organisational level.
  • You will see that this DOI appears quite long – data centres are free to determine the format for the DOI suffix (see slide #24).

    1. A National Library: Playing a role in data • Lee-Ann Coleman, PhD, Head of Science, Technology & Medicine • @ScienceBL
    2. Custodians of old books
    3. And we are embracing digital • National library of the UK • Here for everyone who wants to do research • Archiving since 1662 • Legal deposit incl. non-print publications (from April 2013) • Print occupies > 600km shelving • 300TB of data in the Digital Library • Provide access to 45k eJournals & newspapers, eBooks, datasets & 800 bibliographic databases • 2M sound recordings, 4M maps, 5M reports, theses, conference papers, the world’s largest patents collection (c.50M) & 8M stamps
    4. Catering for contemporary science • Managing collections • Delivering new content • Developing services • Research • Engaging and inspiring • Science team • Collaborations & Partnerships
    5. Information lifecycle
    6. The value of research data • Data are a vital part of the scientific record • But what is/should be/will be the role of libraries in this changing landscape? • Data as a format is very different from traditional library content, so are libraries equipped with the knowledge, technology and capacity to deal with it? • How should libraries prepare for this? We examined the landscape of data and assessed the services that the British Library might offer
    7. Testing dataset discovery. A service involving a ‘new’ material type raised questions about: • Users • Selection • Metadata • Operational sustainability. Preliminary work: • Studies conducted on our behalf • Literature review of user behaviour • Internal scoping to define suitable processes and systems. Led to a pilot service, using existing systems. (Image: SDASM Archives. Public Domain via Flickr)
    8. Selection criteria. These considered: • Scope: Subject, Value to research • Access: Restrictions, Stability, Copyright • Quality: Creators, Publishers
    9. Datasets discovery in Explore the British Library • >500 research datasets • Environmental Science • Tropical & Rare Diseases
    10. Results • [Chart: Metadata for SEARCH; % conversion from dataset view to click through] • A wide variety of approaches were used • Usage statistics suggest the service to search was used to find research data
    11. The benefits of citing data • Checking facts • Obtaining easier access to data • Enabling re-use of data • Providing acknowledgement to a wider group – the data centre, curators etc. • Supporting openness and transparency. Reich NG, Perl TM, Cummings DAT, Lessler J (2011) Visualizing Clinical Evidence: Citation Networks for the Incubation Periods of Respiratory Viral Infections. PLoS ONE 6(4): e19496. doi:10.1371/journal.pone.0019496
    12. Why finding and citing data is not easy • No widely used method to identify datasets • No widely used method to cite datasets • No effective way to link between articles and datasets • How can we solve these challenges?
    13. Why DOIs? The Digital Object Identifier is a persistent identifier that directs users to an online object, even if it changes location. Why DOIs? • Most widely used identifier for research articles • Researchers and publishers already know how to use them • Puts datasets on the same playing field as articles • The DOI system offers an easy way to connect the article with the underlying data
    14. DataCite • Established in 2009 as a not-for-profit organisation • A member of the International DOI Foundation • A Registration Agency for DOI names • 18 full members from Europe, North America, Asia and Australia (2m DOIs) • Members work with data centres in their own countries • Provide a shared infrastructure for minting DOIs
    15. British Library's role in DataCite • The British Library is one of 18 international members of DataCite • We are an allocating agent • We provide the DataCite infrastructure, enabling UK Data Centres to ‘mint’ DOIs for data • While the aim is to support researchers, we do not work with individuals - they must deposit to a data centre/institution. [Diagram labels: International DOI Foundation; Member; DataCite; Member Institution; Data Centre; Data Centre; Data Client]
    16. British Library DataCite Service
    17. Examples of UK data centres with DOIs • DOI: 10.5285/1a91c7d1-ec44-4858-9af2-98d80f169bbd
    18. Thank you! E-mail:
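
As a rough illustration of the DOI structure touched on in the notes and slides (a suffix chosen by the data centre, attached to a registrant prefix and resolved through a central service), the Python sketch below splits a DOI at its first slash and builds a resolver link. The function names `split_doi` and `resolver_url` are hypothetical. The `https://doi.org/` resolver address is an assumption that does not appear in the slides, and the check for a `10.` prefix is a simplification, not full DOI validation.

```python
from typing import Tuple
from urllib.parse import quote

# Assumption: the public DOI resolver; this address is not given in the slides.
DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    Accepts an optional leading "doi:" label, as used in the Reich et al.
    reference on slide 11.
    """
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    prefix, sep, suffix = doi.partition("/")
    # Simplified sanity check only: DOI prefixes begin with "10."
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a link that asks the resolver to redirect to the object's current location."""
    prefix, suffix = split_doi(doi)
    # Percent-encode any unusual characters in the freely chosen suffix.
    return f"{DOI_RESOLVER}{prefix}/{quote(suffix, safe='/')}"


if __name__ == "__main__":
    # The dataset DOI from slide 17 and the article DOI from slide 11.
    for doi in (
        "10.5285/1a91c7d1-ec44-4858-9af2-98d80f169bbd",
        "doi:10.1371/journal.pone.0019496",
    ):
        print(split_doi(doi), resolver_url(doi))
```

The long suffix in the slide 17 example is treated as an opaque string here, which matches the note that data centres are free to choose the suffix format.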