I am here to tell you about the datasets programme across the BL and in the Social Sciences team. Rapid changes in the digital landscape have led people to generate and share ever-increasing volumes of data. We refer to collections of data as datasets. While the nature of datasets varies across disciplines, researchers within each discipline typically agree on what constitutes a dataset for them. Examples of datasets include (1) volcanic data, (2) a cluster of chromosomes inside a breast cancer cell, and (3) a UK poll of voting intention (blue for Conservative, red for Labour, yellow for Liberal Democrat). Within the Dataset Programme, we consider a dataset to be an organised collection of digital objects that is produced or consumed during research. We emphasise the role that the dataset plays in the research activity, its importance to researchers, its impact, and its potential for reuse. Despite the differing nature of datasets, many of the services required by researchers are shared, such as methods of citation, discovery, and preservation.
So why focus on datasets? Data is the foundation for research and an essential component of the scientific record. It is time-consuming and costly to produce, and re-acquisition may be impossible. It is therefore essential that data is preserved and shared.
As a result of these challenges, in December 2009 the BL produced a Datasets Strategy. This strategy has been turned into a programme of work across a number of departments in the Library, and into a number of significant projects. The datasets programme has been established to explore how the Library can help… Not only do we want to ensure data is preserved, we envision a future where… Our approach is to foster collaboration and…
Example Project 3 – Resource Discovery. The BL is developing improved discovery services by deploying the Primo system from Ex Libris. We are investigating ways of including datasets alongside other catalogue material such as articles and monographs. Now you can see how this works together with DataCite. There is a link next to the dataset that shows that you can get it as an online resource. This link uses the DataCite DOI. If we follow it, the DOI system takes us directly to the dataset. The same mechanism is also being used to link to articles and datasets in Elsevier's ScienceDirect and Thomson Reuters Web of Science.
In social sciences we haven’t assigned any DataCite DOIs yet (hopefully that is coming soon), but we are using the Primo system in our new projects. Dataset resources will be included in the release of the management and business studies portal. As you can see from this search for flexible working, dataset results from UKDA are displayed alongside articles. We have tested the search functionality with users and have had some good feedback, which we are incorporating before the launch. A resource guide for the MBS datasets will also be published on the portal at launch.
The types of data linked to include data from ESDS/UKDA, UK government data, regional and local government data, data from international organisations, etc.
Search by URL, title, full-text Browse by
Example Project 1 – DataCite. Our long-term vision is to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence.
How does DataCite work? The approach that DataCite is taking, using DOIs, has some important social benefits. Researchers, authors, and publishers are comfortable with DOIs, understand them, and know how to use them. DOIs put datasets on a level playing field with articles.
We have recently launched our Social Sciences team blog, which focuses on research methods and resources. It has posts from our curators about projects we are working on, but we are also keen to hear what members of the research community are doing, so we gladly accept guest posts and contributions. If any of you would like a place to talk about your work, please get in touch.
Social Science Data and Digital Resources February 2012 John Kaye – Lead Curator Digital Social Science http://www.slideshare.net/johnkayebl
Overview
• British Library Datasets Strategy
• ESDS
• Census Resources
• Spatial Data
• Open Data
• UK Web Archive
• Other Data and Resources
• Tools, Software and Visualisation
• Identifying, Citing and Sharing Data
What is a dataset?
• Seismic measurements taken by a geologist.
• Genetic data collected by a medical researcher.
• A survey of public opinions collected by a sociologist.
The Foundation for Research
• Data is a crucial component of the scholarly record.
• Re-acquisition may be impossible.
• Datasets are essential to the British Library’s mission to advance the world’s knowledge.
The British Library Datasets Strategy
We envision a future where researchers can:
• Discover, access, reuse, and reference datasets.
• Track the impact of the data that they generate and receive appropriate credit.
Our approach is to:
• Provide a focus for the community to establish needs, requirements and agreement.
• Explore novel technology and creative solutions.
Economic and Social Data Service (ESDS) www.esds.ac.uk
• Data search and download
• Research method guides
• Thematic guides
• Online analysis
• Secure Data Service http://securedata.data-archive.ac.uk/
• UK Data Service
Economic and Social Data Service (ESDS) www.esds.ac.uk
• ESDS Government: large-scale government surveys, such as the Labour Force Survey and the General Household Survey
• ESDS International: multi-nation databanks, such as the World Bank's World Development Indicators, and survey data including Eurobarometer
• ESDS Longitudinal: major UK surveys following individuals over time, such as the British Household Panel Survey
• ESDS Qualidata: a range of multimedia qualitative data sources
2011 Census
• Data available on www.ons.gov.uk; the latest release is output area key statistics
• Academic releases will be made available via the Census Dissemination Unit, via their InFuse tool http://cdu.mimas.ac.uk/
• Lots of information and support at http://census.ac.uk/
• Experian Geodemographic Data http://cdu.mimas.ac.uk/experian/index.htm
Previous Censuses
• Data available for 1981-2011 on http://www.nomisweb.co.uk/
• Academic data release from 1971 to 2001 on Casweb (also contains geographic boundary data) http://casweb.mimas.ac.uk/
• Histpop – The Online Historical Population Reports collection (OHPR) provides online access to population reports for Britain and Ireland from 1801 to 1937 http://www.histpop.org/
• Look at changes between census questions, structures and geographies
BL Official Publications Collection – Census Reports
UK Census Reports
• BL holds statistical reports relating to each census.
• Reports for 1921-1991 in the reading room on open shelves
• National and county aggregate reports for England and Wales, Scotland, Northern Ireland and Great Britain
• Aggregate statistical information at each level for all census questions
• Complements Histpop, which has digitised reports between 1801 – 1937, and Casweb: 1971 – 2001
• Some older reports can be found in parliamentary papers
[Figure: 1851 Population Pyramid; number of people by age group, male and female]
[Figure: Number of Households Lacking or Sharing Amenities (England and Wales), 1951 to 2001]
Maps
The Library holds a number of maps generated with census and population data from the UK and all over the world.
Ireland map for railways
Augustus Petermann, Map of the British Isles, elucidating the distribution of the population based on the 1841 census. London, 1861.
Spatial Data
• Edina Digimap and UKBorders http://edina.ac.uk/digimap/ http://edina.ac.uk/ukborders/
• Go Geo! Search http://www.gogeo.ac.uk/cgi-bin/index.cgi
Spatial Data
• Landmap http://landmap.mimas.ac.uk/
• Ordnance Survey Open Data http://www.ordnancesurvey.co.uk/oswebsite/products/os-opendata.htm
UK Government Open Data
• http://data.gov.uk/ Admin and Statistical data portal
• Office for National Statistics http://www.statistics.gov.uk/default.asp http://www.neighbourhood.statistics.gov.uk/dissemination/ https://www.nomisweb.co.uk/Default.asp
• National Digital Archive of Datasets http://www.ndad.nationalarchives.gov.uk/
• Regional http://data.london.gov.uk/ http://datagm.org.uk/
International open data
• United Nations http://data.un.org/
• European Union http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/
• OECD http://www.oecd.org/statsportal/
• World Bank http://data.worldbank.org/
• IMF http://www.imf.org/external/data.htm
• Public Data EU http://publicdata.eu/
UK Web Archive http://www.webarchive.org.uk
• Selective Web Archive
• Over 11,000 websites collected since 2004
• Over 50,000 instances
• Over 16TB of compressed data
• British Library, National Library of Wales, JISC
• Also National Library of Scotland, the National Archives, Wellcome Library
• Many collaborators, e.g. Women’s Library, Live Arts Development Agency, Quakers in Britain
A typical event-based special collection. Collect, preserve, and make accessible web sites of cultural and scholarly importance from the UK domain.
A comprehensive special collection. Collect, preserve, and make accessible web sites of cultural and scholarly importance from the UK domain.
JISC UK Web Domain Dataset (1996-2010)
• Funded by JISC to create a research collection of UK websites
• Collaboration between the Internet Archive, JISC and the British Library
• Copy of the subset of the Internet Archive’s web collection that relates to the UK
• 470,466 files, mostly arc.gz, with 4,494 warc.gz. Total size: 32TB
• No local access – possible through the Internet Archive
• Can be used to generate secondary datasets and make these available
• Analytical access the main route
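The arc.gz and warc.gz files mentioned on this slide are gzip-compressed containers of captured web resources. As a rough illustration of how a secondary dataset (here, just a list of captured URLs) might be derived from one, the sketch below reads record headers from a WARC stream using only the Python standard library. It is not the programme's tooling: the function names and the sample records are hypothetical, the records are simplified (real ones carry further mandatory fields such as WARC-Record-ID and WARC-Date), and the older ARC format, which makes up most of the collection, has a different layout and is not handled.

```python
import gzip
import io


def iter_warc_headers(stream):
    """Yield a dict of header fields for each WARC record in a binary stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # Skip the payload; its size is given by Content-Length.
        stream.read(int(headers.get("Content-Length", 0)))
        yield headers


def make_record(uri, body):
    """Build a simplified, hypothetical WARC record for demonstration only."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


# Two made-up records, each compressed as its own gzip member and
# concatenated, mimicking the usual layout of a .warc.gz file.
SAMPLE = b"".join(
    gzip.compress(make_record(uri, body))
    for uri, body in [
        ("http://example.org.uk/", b"<html>home</html>"),
        ("http://example.org.uk/about", b"<html>about</html>"),
    ]
)

with gzip.GzipFile(fileobj=io.BytesIO(SAMPLE)) as stream:
    uris = [h["WARC-Target-URI"] for h in iter_warc_headers(stream)]

print(uris)
```

For a real file, the `io.BytesIO(SAMPLE)` wrapper would be replaced by an opened warc.gz file; records without a WARC-Target-URI field (for example `warcinfo` records) would need to be filtered out first.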
Other Data and Resources
• Arts and Humanities Data Service (AHDS) http://ahds.ac.uk
• Guardian Data Store http://www.guardian.co.uk/data-store
• Financial Times http://www.ft.com/home/uk
• Economist Intelligence Unit http://www.eiu.com/Default.aspx
• UK Government Web Archive http://www.nationalarchives.gov.uk/webarchive/
Other Data and Resources
• The Mass Observation Archive: specialises in material about everyday life in Britain. It contains papers generated by the original Mass Observation social research organisation (1937 to early 1950s), and newer material collected continuously since 1981. http://www.massobs.org.uk/index.htm
• A Vision of Britain through Time: contains historical maps, census reports, election reports and other historical material, searchable by local area. http://www.visionofbritain.org.uk/
• Charles Booth Online Archive: gives access to archive material from the Booth collections of the London School of Economics and Political Science and the Senate House Library. http://booth.lse.ac.uk/
[Images from the Mass Observation Archive]
Online Mapping Tools using Google Maps
• MapTube http://www.maptube.org/
• Google Drive https://drive.google.com
• Gmap Creator http://www.casa.ucl.ac.uk/software/gmapcreator.asp
Other, more advanced online mapping (requires coding):
• Open Layers http://openlayers.org/
• OS Openspace http://www.ordnancesurvey.co.uk/oswebsite/web-services/os-openspace/index.html
Data Visualization
• Presenting data in a useful and interesting manner
• Allowing concepts to be easily understood
• Lots of examples online, e.g.: http://flowingdata.com/ http://datavisualization.ch/ http://www.guardian.co.uk/news/datablog
DataCite
DataCite is an international consortium which aims to:
• Establish easier access to research data on the Internet
• Increase acceptance of research data as legitimate, citable contributions to the scholarly record
• Support data archiving that will permit results to be verified and re-purposed for future study
http://datacite.org/
Connecting an Article with the Underlying Data
• URLs are not persistent (e.g. Wren JD: URL decay in MEDLINE - a 4-year follow-up study. Bioinformatics. 2008, Jun 1;24(11):1381-5).
• Digital Object Identifiers (DOIs) offer a solution
• Most widely used identifier for scientific articles
• Researchers, authors, publishers know how to use them
• Put datasets on the same playing field as articles
Dataset: Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840
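To make the DOI mechanism on this slide concrete, here is a minimal Python sketch that turns the dataset DOI quoted above into a link via the public resolver at https://doi.org/, and assembles a citation in the same layout as the slide's example. The function names are hypothetical and the citation layout simply mirrors the slide; it is not presented as DataCite's own tooling or an official format.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

# Labels and resolver prefixes that people commonly paste in front of a DOI.
_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def normalise_doi(raw):
    """Return the bare DOI, e.g. '10.1594/PANGAEA.587840'."""
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator '10.' and has a '/'
    # separating the registrant prefix from the suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_to_url(raw):
    """Build a persistent link that the DOI system will redirect."""
    return DOI_RESOLVER + quote(normalise_doi(raw), safe="/")


def format_citation(creators, year, title, publisher, doi):
    """Assemble a dataset citation in the layout used on the slide."""
    return f"{creators} ({year}). {title}. {publisher}. doi:{normalise_doi(doi)}"


print(doi_to_url("doi:10.1594/PANGAEA.587840"))
# https://doi.org/10.1594/PANGAEA.587840
print(format_citation("Yancheva et al", 2007,
                      "Analyses on sediment of Lake Maar",
                      "PANGAEA", "10.1594/PANGAEA.587840"))
```

Because the link goes through the resolver rather than pointing at the data centre's own web address, it keeps working if the dataset's landing page moves, which is the persistence argument the slide makes against plain URLs.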
Open Researcher and Contributor ID (ORCID) http://about.orcid.org/
• Infrastructure is being created for researchers to build up an open portfolio of research objects
Open Researcher and Contributor ID (ORCID)
• Register an ORCID ID at www.orcid.org and link published papers using ORCID’s tools
Sharing Data - Figshare
• Non-published outputs (working papers, datasets) can be deposited in figshare http://figshare.com/, given a DataCite DOI, linked back, and added to an ORCID profile
Impact of Data
• View the impact of your work using traditional citation metrics and social citations http://www.impactstory.org/
Depositing and Archiving Data
• Why Archive?
• Institutional Repositories
• UK Data Archive/ESDS
• Metadata and Code!
BL Social Science Research Blog http://britishlibrary.typepad.co.uk/socialscience/
Contact Details
John Kaye
Lead Curator – Digital Social Science
Social Sciences
The British Library
96 Euston Road
London NW1 2DB
Telephone: 020 7412 7450
Email: firstname.lastname@example.org
Twitter: @johnkayebl
http://britishlibrary.typepad.co.uk/socialscience/
Slides - http://www.slideshare.net/johnkayebl