Collaborative Data
Mark-up & Distribution

Jacqueline Whyte Appleby
Scholars Portal
October 17, 2013
CASRAI
Ontario Data Documentation, Extraction Service, and
Infrastructure
<odesi>
• An online data research tool developed between
2007 and 2009
• Jointly funded by the Ontario Council of
Universi...
<odesi> in context
is managed by

which is a service of

which is governed by

21 Ontario
university
libraries
<odesi> goals
• Facilitate discovery, downloading, and analysis of
data products
• Create a tool that is useful to both ex...
<odesi>: where does the content
come from?
Confidential Microdata
available through the RDC

Statistics Canada
(data produ...
<odesi> : where does the content
come from?
ICPSR metadata
Public Use Microdata Files
(PUMFs)
Available through the DLI
Ca...
<odesi>: the catalogue
<odesi>: the repository
<odesi> in use
Broad questions:
• “I want to write a paper on women in the workforce…”
<odesi> in use
Broad questions:
• “I’m interested in exploring on-reserve housing issues.”
<odesi> in use
Testing a hypothesis
• “How many Ontarians smoke today compared with 10 years ago?”
<odesi> in use
Testing a hypothesis
• “How many Ontarians smoke today compared with 50 years ago?”
<odesi> highlights
• Metadata is bilingual and DDI-compliant
• Don’t need statistical software to run many
analyses
• Surv...
MarkIt! program
• OCUL members (usually data librarians) apply for
funding
• Funds pay for student employees, who are
trai...
MarkIt! program
MarkIt! program best practices
• Be flexible; always be ready to shift priorities
• Establish best practices and adhere to...
MarkIt! Program expansion?
Next up: Geospatial metadata?
Next up: Dataverse support?
Next up: Dataverse support?
Next up: Dataverse support?
Thank you!

http://odesi.ca
http://scholarsportal.info
http://geo.scholarsportal.info
http://dataverse.scholarsportal.info...
Presented at CASRAI 2013: Reconnect Big Data.

Appreciation to Amber Leahey, the metadata librarian at Scholars Portal, whose 2012 iASSIST slides were very useful in putting this together.

Published in: Education, Technology
  • Good afternoon, my name is Jacqueline Whyte Appleby and I’m the Client Services Librarian, as well as the interim data and geospatial librarian, which means I both manage Odesi and work in teaching and research support, helping libraries and end users with Odesi implementation and use. Talk about Odesi: a platform for finding and working with data that is used by Ontario universities. Special thanks to Amber Leahey, whose presentation at iASSIST 2013 informed some of this presentation.
  • Odesi is a tool developed between 2007 and 2009, and I also want to acknowledge Jeff Moon, who was with this project from the beginning. Odesi was jointly funded by OCUL, which is the parent organization of Scholars Portal, and OntarioBuys, which is a government program we’ve had a lot of success with. Odesi was developed specifically to support the Ontario university community, but we’ve now expanded beyond Ontario and there are schools in other provinces using the service.
  • So, just so the structure is clear here: Odesi is managed on a day-to-day basis by Scholars Portal, meaning by myself, by our programmers, and by our metadata staff. Scholars Portal is a service of the Ontario Council of University Libraries, which is governed by the 21 Ontario university libraries. So staff at all of these universities work to make Scholars Portal services what they are, and it’s really the data librarians at many of these schools who pushed Odesi to realization, and who continue to build it.
  • Odesi was created with the goal of facilitating discovery, downloading, and analysis of a range of data products. To do this, it was important that Odesi be useful to both experienced and new researchers, so it needed to be sophisticated enough to allow for advanced searching and analysis, but it also needed to be friendly enough that an undergraduate with a question could play around with it and get something useful.
  • As I’ve alluded to, Odesi has a lot of content, and much of it comes from Statistics Canada. As Jeff discussed, the RDC needs to be visited in person; it’s for people with very specific research needs. As Sylvie discussed, PUMFs are available through the DLI. This is data that you don’t need to go to the RDC to get, but it’s not available to everyone: you need to have signed the DLI license. So we include many of the PUMFs, as well as a lot of supporting and related documentation that StatsCan publishes, which includes the codebooks, sometimes copies of the actual surveys themselves, and reports based on the survey results. So we have about 3000 surveys from this source.
  • We also have data from other sources. The ICPSR (the Inter-university Consortium for Political and Social Research) houses a lot of excellent data, and most of our universities actually subscribe to it separately. So we’ve set up a script to run monthly and pull all metadata for ICPSR so that these surveys are also searchable in Odesi; the students will then be directed to the ICPSR website. We also host a large number of Canadian Gallup polls and the Canadian Opinion Research Archive (CORA) data, which is based at Queen’s. These are really rich sources of social data for students wondering what people thought about smoking, or the Middle East, or the prime minister over time. They go back to the 70s (confirm).
  • Odesi has two pieces, and the first piece is the catalogue. You can search or browse for data in the catalogue, and you can do so at the series level (Census of Canada) or the study level (Census of Canada 2006), and using keywords. What’s really great about Odesi is that you can also search at the variable level: you can find particular questions and the answers to those questions. The Odesi catalogue was built in-house using MarkLogic (about).
  • Once you’ve found data you want to explore, you’ll move into the repository. The repository is run on a platform called Nesstar (about). It has this front end for users, and it also has a publisher’s backend, which is how we get all of this great metadata in there. It’s DDI compliant. You’re looking at a question from the 2006 census on field of study, and you can see there’s the literal question asked and the breakdown of responses. This is available for every question in almost every survey we have in Odesi. Users can run a cross tabulation on any variables that interest them right in the interface, or they can download a whole data set, or just part of one, in a number of formats, including SPSS, SAS, Stata, and CSV. In other words, it’s very easy to say: give me a file with the responses to every question by women who are over 50 (so a subset of cases), or I want to see everyone’s response to the question of how much exercise they get, and which province they live in, so I can compare across provinces (so a subset of variables).
  • DDI: the Data Documentation Initiative standard for data documentation. Have 3300 data sets, many more records. ICPSR records are pulled in by a script. Stats Can is adding new data all the time; how to stay on top of that?
  • Metadata markup is used in our MarkLogic database to allow for searching at granular levels. For example, if you wanted to know how many surveys had variables that asked about smoking, you can search for this using Odesi. The more complete the markup that the students and librarians provide, the better and more accurate the searching.
  • Students grab the file off the Stats Can FTP server; mark up the variables and the study-level metadata (e.g. weighting, abstract, sampling procedures); this means we can get data sets up quite quickly.
  • Dataverse: one-time deposits of legacy data, but also studies in process, with geographically dispersed researchers contributing to marking up data. We may also be in a good position to develop some guidelines for researchers doing all of their own depositing, or for librarians working to support them.
  • Collaborative Data Mark-up & Distribution

    1. Collaborative Data Mark-up & Distribution Jacqueline Whyte Appleby Scholars Portal October 17, 2013 CASRAI
    2. Ontario Data Documentation, Extraction Service, and Infrastructure
    3. <odesi> • An online data research tool developed between 2007 and 2009 • Jointly funded by the Ontario Council of University Libraries (OCUL) and OntarioBuys • Developed to serve the Ontario university community, now expanding beyond the province
    4. <odesi> in context is managed by which is a service of which is governed by 21 Ontario university libraries
    5. <odesi> goals • Facilitate discovery, downloading, and analysis of data products • Create a tool that is useful to both experienced and new researchers
    6. <odesi>: where does the content come from? Confidential Microdata available through the RDC Statistics Canada (data producers) Public Use Microdata Files (PUMFs) available through the DLI Other public products available through statcan.gc.ca
    7. <odesi> : where does the content come from? ICPSR metadata Public Use Microdata Files (PUMFs) Available through the DLI Canadian Gallup Polls data Other public products Available through statcan.gc.ca Canadian Opinion Research Archive (CORA) data
    8. <odesi>: the catalogue
    9. <odesi>: the repository
    10. <odesi> in use Broad questions: • “I want to write a paper on women in the workforce…”
    11. <odesi> in use Broad questions: • “I’m interested in exploring on-reserve housing issues.”
    12. <odesi> in use Testing a hypothesis • “How many Ontarians smoke today compared with 10 years ago?”
    13. <odesi> in use Testing a hypothesis • “How many Ontarians smoke today compared with 50 years ago?”
    14. <odesi> highlights • Metadata is bilingual and DDI-compliant • Don’t need statistical software to run many analyses • Surveys also include all supplementary material • New surveys added daily
    15. MarkIt! program • OCUL members (usually data librarians) apply for funding • Funds pay for student employees, who are trained to mark up surveys using DDI 2 standards • 2013-2014: Carleton, U of Ottawa, Queen’s and McMaster are participating, as well as Scholars Portal
    16. MarkIt! program
    17. MarkIt! program best practices • Be flexible; always be ready to shift priorities • Establish best practices and adhere to them • Make QA and editing each others’ work the norm (35% of datasets are marked up at more than one school)
    18. MarkIt! Program expansion?
    19. Next up: Geospatial metadata?
    20. Next up: Dataverse support?
    21. Next up: Dataverse support?
    22. Next up: Dataverse support?
    23. Thank you! http://odesi.ca http://scholarsportal.info http://geo.scholarsportal.info http://dataverse.scholarsportal.info jacqueline@scholarsportal.info
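
The speaker notes describe variable-level search: because each survey question is marked up in DDI, a user can ask which variables mention smoking and see the literal question and the breakdown of responses. The following is a minimal Python sketch of that idea, not a description of how Odesi, MarkLogic, or Nesstar actually implement it. It assumes a simplified, namespace-free fragment that borrows DDI Codebook element names (`var`, `labl`, `qstn`/`qstnLit`, `catgry`, `catValu`, `catStat`); the study title, variable names, question wording, and frequencies are all hypothetical, and the function name `search_variables` is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DDI Codebook-style fragment. Real DDI files carry
# an XML namespace and far more detail; all values below are made up.
SAMPLE_CODEBOOK = """
<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Example Health Survey, 2006</titl></titlStmt></citation>
  </stdyDscr>
  <dataDscr>
    <var name="SMK_01">
      <labl>Smoking status</labl>
      <qstn><qstnLit>Do you smoke cigarettes daily, occasionally or not at all?</qstnLit></qstn>
      <catgry><catValu>1</catValu><labl>Daily</labl><catStat type="freq">120</catStat></catgry>
      <catgry><catValu>2</catValu><labl>Occasionally</labl><catStat type="freq">45</catStat></catgry>
      <catgry><catValu>3</catValu><labl>Not at all</labl><catStat type="freq">835</catStat></catgry>
    </var>
    <var name="PROV">
      <labl>Province of residence</labl>
      <qstn><qstnLit>In which province do you live?</qstnLit></qstn>
      <catgry><catValu>35</catValu><labl>Ontario</labl><catStat type="freq">600</catStat></catgry>
      <catgry><catValu>24</catValu><labl>Quebec</labl><catStat type="freq">400</catStat></catgry>
    </var>
  </dataDscr>
</codeBook>
"""


def search_variables(codebook_xml, keyword):
    """Return variables whose label or literal question contains `keyword`.

    Matching is a case-insensitive substring test; a production search
    engine would use proper indexing and stemming instead.
    """
    root = ET.fromstring(codebook_xml.strip())
    study_title = root.findtext("stdyDscr/citation/titlStmt/titl", default="")
    needle = keyword.lower()
    hits = []
    for var in root.iterfind("dataDscr/var"):
        label = var.findtext("labl", default="")
        question = var.findtext("qstn/qstnLit", default="")
        if needle in label.lower() or needle in question.lower():
            # Map each response category label to its recorded frequency.
            categories = {
                cat.findtext("labl", default=""): int(cat.findtext("catStat", default="0"))
                for cat in var.iterfind("catgry")
            }
            hits.append({
                "study": study_title,
                "name": var.get("name"),
                "label": label,
                "question": question,
                "categories": categories,
            })
    return hits


if __name__ == "__main__":
    for hit in search_variables(SAMPLE_CODEBOOK, "smok"):
        print(hit["study"], hit["name"], hit["question"])
        for category, freq in hit["categories"].items():
            print("  ", category, freq)
```

The point of the sketch is that the search is only as good as the markup: a variable with no label or literal question recorded would never match, which is why consistent student markup and QA matter.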