Finding, searching and sharing qualitative data: the uses of XML

An introduction to XML and an explanation of how it may be used to encode qualitative data produced by health researchers. Talk given by Libby Bishop of the UK Data Service at the Data Management in Practice workshop, which took place on 14 November 2013 at the London School of Hygiene and Tropical Medicine.

Published in: Technology, Education
  • Main point: the aim is not to teach XML. Researchers need to locate, explore and use data. XML, behind the scenes, makes that possible. Even if you are not technical, it is useful to understand.
  • We have a lot to be proud of, but technologies are advancing, and we want to improve the ways we provide access to data and disseminate it to our users.
  • We try to listen to and learn from researchers, and to respond to their (your) needs. Those expectations are rising quickly.
  • A bit like HTML, but for marking up structural features (paragraphs), not formatting (bold). Hard (for me) to grasp in the abstract, so I am going to turn now to examples of uses of XML: for search, use, citation and sharing.
  • Here is a list of the data types we are going to talk about today. We are not going to go into a huge amount of detail, but this talk will give you a taste of the data we host.
  • 391 hits from search on health survey
  • Similar search, but limited to keyword = tropical and data only. We find an LSHTM data collection!
  • This is just a small part of the catalogue record for this collection. Note fields like title and depositor: pretty standard kinds of metadata (data about data) needed for any data collection. Note the link in the upper right: get DDI XML record.
  • No need to be scared. This is exactly the same information, but you can see its XML structure, using tags.
  • Using XML for metadata, and doing it in a standardised way, greatly increases the power of searching for data. DDI is a standardised set of tags for handling social science data.
  • Now, a quick look at some other capabilities made possible by this XML-structured metadata.
  • The ability to find and locate variables across surveys and decide if they are “close enough” for your purposes.
  • And the ability to search for health data across archives – for example, this portal for European data archives.
  • Digital Futures (DF): intended to address a couple of those areas where the UKDS wants to improve.
  • Search results show all interviews containing the search term, with context (the surrounding sentence) and with metadata about the interviewee: age, gender, region, SES.
  • Here is an example of some of the paper being digitised: essays about future life written by 16-year-old school leavers, Sheppey. For this collection there is value in retaining an image file, because the handwriting itself is data.
  • We are scanning, then using OCR, to fully digitise the text. This version shows the original spelling. What to do: show the corrected or the original spelling?
  • XML allows us to keep both in the same document.
  • A typical transcript that a user could download. Good, but it cannot be browsed online, and the display cannot be modified.
  • Available online. Formatting can be used to display turn-taking, and speaker tags can be modified with multiple versions of metadata.
  • This is the QuDEx schema (a bit like DDI): Qualitative Data Exchange, a family of tags created specifically for qualitative data.
  • Made possible by GUID – Globally Unique ID – for every utterance
  • Finding, searching and sharing qualitative data: the uses of XML
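The search notes above (results shown in context, with metadata about the interviewee, and a globally unique ID for every utterance) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Digital Futures implementation: the element and attribute names (`interview`, `u`, `who`, `id`, `ses`) and the sample utterances are invented for the example, and `uuid4` is just one possible way to mint an identifier.

```python
import uuid
import xml.etree.ElementTree as ET

# Invented sample transcript; tag and attribute names are hypothetical,
# not taken from the DDI, QuDEx or TEI schemas.
TRANSCRIPT = """
<interview age="34" gender="female" region="London" ses="C1">
  <u who="interviewer">How did you find the clinic?</u>
  <u who="respondent">The clinic was hard to reach by bus.</u>
  <u who="respondent">My sister drove me the second time.</u>
</interview>
""".strip()


def assign_ids(interview):
    """Give every utterance an identifier so it can be cited on its own."""
    for utterance in interview.iter("u"):
        if "id" not in utterance.attrib:
            utterance.set("id", str(uuid.uuid4()))


def search(interview, term):
    """Return matching utterances together with interviewee metadata."""
    metadata = {key: interview.get(key) for key in ("age", "gender", "region", "ses")}
    hits = []
    for utterance in interview.iter("u"):
        text = utterance.text or ""
        if term.lower() in text.lower():
            hits.append({
                "utterance_id": utterance.get("id"),
                "speaker": utterance.get("who"),
                "text": text,
                "interviewee": metadata,
            })
    return hits


interview = ET.fromstring(TRANSCRIPT)
assign_ids(interview)
results = search(interview, "clinic")
for hit in results:
    print(hit["utterance_id"], hit["speaker"], hit["text"], hit["interviewee"])
```

Because each hit carries its own identifier, a citation could in principle point at a single utterance rather than at the whole collection.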

    1. 1. Finding, searching and sharing qualitative data: the uses of XML Libby Bishop Producer Relations and Research Ethics Data Management in Practice LSHTM, London, 14 November 2013
    2. 2. UK Data Service seeking to improve
        • We have one of the largest qualitative data collections: over 300 data collections in the social sciences
        • Currently users find and download these from our website. This is generally good, but we would like to improve:
            • No searching within collections
            • Hard to display complex relationships among related files within a collection (transcript, audio, image, memo)
            • Cannot reliably cite parts of data
    3. 3. What researchers want from data centres
        • Search: find data regardless of location
        • Use: ways to use data flexibly
            • Examine interview extract in context, online
            • Decide before download
            • Support analysis led by research questions (not technology)
        • Cite: get and give credit appropriately
        • Preserve: for own or others’ use later
        XML is not a miracle cure, just a (key) part of the solution
    4. 4. XML – eXtensible Mark-up Language
        • Language: system for communication
        • Mark-up: encoding descriptive features of text
            • Tags, e.g. <u>words spoken in an interview</u>
        • Extensible: set of tags is not fixed
            • Text Encoding Initiative (TEI) has 100s
        • Independent of specific hardware/software
        • Open
        XML allows qualitative data (rich, deep, but messy, unstructured) to benefit from computing power typically applied to structured, numeric data.
    5. 5. Search: all types of resource available
        • Data collections: studies, variables
        • Case studies: research, teaching
        • ESRC outputs: conference paper, article, report, research summary
        • Support/‘how to’ guides: dataset, theme, methods/statistics
    6. 6. Search
    7. 7. What makes all this possible? XML…..
    8. 8. Data Documentation Initiative (DDI) DDI: A metadata specification for the social sciences
    9. 9. Use and Cite: Digital Futures project
        • Build a user-friendly system for publishing and exploring qualitative data online
        • Project includes large-scale digitisation of precious and undigitised materials
        • Browse search results in context
        • Improve display of complex data
        • Offer a mechanism for reliably citing data located in the system
    10. 10. Search results – displayed in context
    11. 11. Many formats for different research questions
    12. 12. School Leaver Essay 53 – My Past aaa In 1978 I left school, I was sixteen years old. I came straight out of school into an apprenticeship heavy meter machanics. I served my four year apprenticeship in a garage for another year and the left and started my own garage. At the age of twenty three I got married. The garage was doing well so I didn’t have Much prodlems setting up a home. One year After I had/been married my wife had her first child. When I had some spare time I made up a car for rally cross racing but In the time I was racing I only won a few. When I was twenty five our second child was born. Once when rally driving I had a smash and was in hospital for five months when I was twenty nine we had our third child. I would get up at six o clock and drive to the garage and open it at Saturdays. On some Sundays when I wasn’t rally driving the family would go horse riding or for a picnic whilst I went fishing. In the garage I took an apprenticship from people who had just left school. When I was thirty six we had our fourth child. My first child would come and help in the garage at least when he left school he would get a job. When I was forty I had an extension built on to the garage. I also bought 4 acres of land and built a racetrack and made go-karts for my second and third eldest sons when my last child was eight I brought her a pony and taught her to ride. From when I was forty four My mother died and my father had died when I was twenty nine.
    13. 13. Corrected spelling – for accurate searches <sic>apprenticship</sic><corr>apprenticeship</corr>
    14. 14. Status quo - RTF transcript for download
    15. 15. DF - Target page for an interview
    16. 16. Objects in collection metadata
    17. 17. Richer metadata = richer discovery
        • Use of DDI 2.5, QuDEx and TEI schema
        • QuDEx allows identification of data objects:
            • Interview transcript or audio recording etc.
            • Relationship to another data object or part of data
            • Descriptive categories at the object level, e.g. mime type, interview characteristics, interview setting
            • Capacity to capture rich annotation of parts of data
        • QuDEx model in use (Schema at: www.data-archive.ac.uk/create-manage/projects/qudex/)
        Object-level description = a lot of manual work!
    18. 18. Citation – of collection, and utterance World Health Organization and International Collaborative Study of Medical Care Utilization, WHO/ICS Medical Care Utilization Study Data, 1968-1969 [computer file]. Colchester, Essex: UK Data Archive [distributor], January 1981. SN: 1427, http://dx.doi.org/10.5255/UKDA-SN-1427-1
    19. 19. Preservation – benefits of XML
        • Open standard
        • Widely adopted as the basis for interchange of documents and data over the Web
        • Human readable
        • Best for metadata; some challenges for preserving data itself
    20. 20. How can researchers help?
        • Produce and share high-quality metadata and documentation
        • Using XML is not that different from text processing and spreadsheets
    21. 21. Questions Libby Bishop ebishop@essex.ac.uk
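As a rough sketch of the idea on slide 13 (keeping the original and the corrected spelling in the same document), the Python below stores both readings and renders either one on demand. It assumes the pair is wrapped in a TEI-style `<choice>` element, which the slide itself does not show, and it uses a single sentence from the school leaver essay as sample text. The `render` function is a hypothetical helper written for this example, not part of any UK Data Service tool.

```python
import xml.etree.ElementTree as ET

# One sentence from the essay on slide 12, with the misspelling encoded
# as a <sic>/<corr> pair. The <choice> wrapper is an assumption.
ESSAY = (
    "<p>In the garage I took an "
    "<choice><sic>apprenticship</sic><corr>apprenticeship</corr></choice>"
    " from people who had just left school.</p>"
)


def render(elem, version):
    """Return the text of elem, picking <sic> or <corr> inside each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            chosen = child.find(version)
            if chosen is not None and chosen.text:
                parts.append(chosen.text)
        else:
            parts.append(render(child, version))
        # Text that follows the child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(ESSAY)
original = render(root, "sic")    # what the school leaver actually wrote
corrected = render(root, "corr")  # normalised spelling, useful for search

print(original)
print(corrected)
```

A search index could be built from the corrected reading while the display keeps the original, so neither the accuracy of searches nor the evidence in the handwriting is lost.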
