The Team Approach: Developing a
Data Services Program at the
University of Washington
University of Washington Libraries
Matthew Parsons & Theodore Gerontakos
Agenda
* Background / Structure
* Our Approach
* Metadata
* Moving Forward
Background
Data Services Program Planning Committee,
June 2009-January 2010
Outcomes from report:
• Data Services Coordinator (July 2010)
• Data Services Team (September 2010)
• Data Services CE Fund
Head, Reference & Research Services
Data Services Coordinator
Data Services Team roster:
Organizational Structure
Assoc. Dean Research &
Instructional Services
Assoc. Dean for Resource
Access/Description & Info.
Technology Services
Data Services Coordinator, Chair (Stephanie) Business Computer-Based Librarian (Corey)
Library Specialist, Monographic Srvcs. (Will) Geospatial Data and Maps Librarian (Matt)
U.S. Documents Librarian (Cass) Systems Librarian (Anjanette)
Library Specialist, Monographic Srvcs. (Heather) Metadata Librarian (Theo)
Regional Tech. Coord., NN/LM PNR (Mahria)
Our Approach: A Three Step
Process
• Outreach
• Continuing education for Libraries
staff
• Developing resources and services
Outreach
Outreach efforts were priority for the first year
• Institute for Health Metrics & Evaluation
• Center for Advanced Research Technology in the
Arts & Humanities
• Center for the Study of Demography & Ecology
• Center for Social Science Computational Research
• Office of Research
Continuing Education for
Libraries Staff
• Webinars
• Speakers
• Conferences
• Workshops
• Courses
Developing Resources and
Services
• Data Services LibGuide
• Data Management LibGuide
• Email list: uwlib-datainfo@u
• data.blogspot.com
• Data Citation - EZID
• Data management
reference & consultation
• Dataset collection
• Presentations to groups
& classes on RDM
• Data management pilot
survey
PSNERP
http://wagda.lib.washington.edu/data/geography/wa_state/#PSNERP
The PSNERP Example
We Hold These Truths to Be
Self-Evident…
International Hero
Metadata Fixations
Theo
Everything isn't metadata, right?
2012 Olympics
Zhou Lulu
Olympics Data
Olympics Metadata
Downloadable Metadata!
<rdf:Description
rdf:about="http://reference.data.gov.uk/id/salary-
range/32469-54113">
<rdf:type
rdf:resource="http://reference.data.gov.uk/def/central-
government/SalaryRange"/>
</rdf:Description>
<rdf:Description
rdf:about="http://reference.data.gov.uk/id/salary-
range/32469-54113">
<rdfs:label>£32469 - £54113</rdfs:label>
</rdf:Description>
Data Data Data Data Data
Data/Metadata
Data/Metadata
Metadata Will Manage Datasets
Metadata is essential to data management
“If the data is to be analyzed by
generic tools, the tools need to
“understand” the data. You cannot
just present a bundle-of-bytes to a tool and
expect the tool to intuit where the data values
are and what they mean. The tool will want to
know the metadata.”
--Jim Gray, David T. Liu, Maria Nieto-Santisteban, Alex Szalay, David
J. DeWitt, Gerd Heber, “Scientific Data Management in the Coming
Decade,” in ACM SIGMOD Record, Vol. 34, No. 4, Dec. 2005, p. 34-
41
[From an earlier slide in a
previous version of this talk:]
Data Services Team Roster:
Data Services Coordinator, Chair (Stephanie)
Business Computer-Based Librarian (Corey)
Library Specialist, Monographic Srvcs. (Will)
Geospatial Data and Maps Librarian (Matt)
U.S. Documents Librarian (Cass)
Systems Librarian (Anjanette)
Library Specialist, Monographic Srvcs. (Heather)
Metadata Librarian (Theo)
Regional Tech. Coord., NN/LM PNR (Mahria)
Data Services Specialist (Malyka)
Metadata Workers
One Metadata Librarian
Lots of Metadata Workers
Metadata/Cataloging Librarian
"Traditional" Cataloging
Common Standards
01338nam a22003730
4500001001200000002000900012005001700021008004100038040001300079066000600092066000700098090002
000105099001900125099001900144130004400163245008900207260004600296300004600342500005100388700003
0004397000038004697400036005078800043005438800079005868800049006658800046007148800032007608800
04300792880003100835941001500866910001400881941001500895987005400910ocm227237840154295320011018135
859.0901121s1667 xx a 000 0 chi d aWAUcWAU c1 c$1 aDS707b.S4 1667 aDS707 .S4 1667 aDS707 .S4 16670
6880-01aShan hai jing (Chinese classic)106880-02aShan hai jingb[18 juan, tu 5 juan,cGuo Pu zhuan] Wu Zhiyi [Wu
Renchen] zhu 6880-03a[n.p.bn.p.]cKangxi 6 (1667) xu] a6 v. (double leaves) in 1.billus.c26 cm 6880-04aCaption title:
Shan hai jing guang zhu1 6880-10aGuo, Pu,d276-3241 6880-11aWu, Renchen,d1628?-1689?0 6880-05aShan hai jing guang
zhu006130-01/$1a山海經 (Chinese classic)106245-02/$1a山海經b[18卷, 圖5卷,c郭璞傳] 吳志伊[吳任臣]注0 6260-
03/$1a[n.p.bn.p.]c康熙6 (1667) 序] 6500-04/$1aCaption title: 山海經廣注106700-10/$1a郭璞,d276-324106700-11/$1a吳任
臣,d1628?-1689?eed016740-05/$1a山海經廣注 a.b25105383 amt-75 rev a.b25105383
aPINYINbOCoLCc20011014dre09.30.2001f130 REV /01086nam a22003370
4500001001200000002000900012005001700021008004100038040001300079066000600092066000700098090001
6001050990015001210990015001362450050001512600072002013000041002735000038003146510031003527000038
00383740003000421880004500451880006600496880003800562880003700600880002800637941001500665910001
000680941001500690987004300705ocm229633460155417320011018140158.0910115q15731620xx 000 0 chi d
aWAUcWAU c1 c$1 aDS736b.I15 aDS736 .I15 aDS736 .I15006880-01aYi zhou shub[10 juan]cKong Chao zhu 6880-
02a[n.p.,bMing Jiang shichang kan banc[between 1573 to 1620] a2 v. (double leaves) in case.c29 cm 6880-03aSong Jiading
15 (1222) ba 0aChinaxHistoryxPhilosophy1 6880-04aKong, Chao,d3rd/4th cent0 6880-05aJi zhong Zhou shu006245-
01/$1a逸周書b[10卷]c孔晁注0 6260-02/$1a[n.p.,b明姜士昌刊本c[between 1573 to 1620] 6500-03/$1a宋嘉定15 (1222)跋
]106700-04/$1a孔晁,d3rd/4th cent016740-05/$1a汲冢周書 a.b25217008 acl-71 a.b25217008
aPINYINbOCoLCc20011014dce09.30.200101261nam a22003610…
August 1, 2007
UW Digital Collections
Metadata Work in Progress
MIG Data/Metadata
Data Dictionaries
Metadata Galaxies
Diverse Projects, Collaborations, etc
Schemas, Transforms, XML Technologies
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:local="http://example.org/local">
<xs:annotation>
<xs:documentation>A schema to validate Dublin Core Qualified records as defined by the
DSpace simple archive format. See the DSpace manual, ver. 1.6.1, section
8.3.1.</xs:documentation>
</xs:annotation>
<xs:element name="dublin_core" type="dublincoreType">
<xs:annotation>
<xs:documentation> Top level container for the individual record </xs:documentation>
</xs:annotation>
</xs:element>
<xs:complexType name="dublincoreType">
<xs:sequence maxOccurs="unbounded">
<xs:element ref="dcvalue"/>
</xs:sequence>
</xs:complexType>
<xs:element name="dcvalue" type="dcvalueType">
<xs:annotation>
<xs:documentation>The element that contains data values, with attributes for DC element
name, qualifier, and optional language.</xs:documentation>
</xs:annotation>
Brumfield Russian Architecture
Project: Data Entry Form
OAI-PMH Cheat Sheet
Catalog Card = Data?
XSLT Processing Diagram
Advanced Data Model
Data Service "Initiators"
Data Repository
The Data Deluge
The Data Deluge
The Data Deluge?
The DNA Data Deluge!
Number of hours in a year: 8,766
Who has the time?
COMPLEXITY
This is our moment!!!
[insert quote here on the benefits knowing
everything]
Metadata Worker
Lots to learn, however
What have I actually done?
Some tasks performed by the
Metadata Librarian (1)
Routine tasks as a member of the Data
Services Team:
• Regular meetings
• Planning report
• Data management guide
• Survey: create/distribute
• Group meetings with UW faculty/staff
Some tasks performed by the
Metadata Librarian (2)
Professional development:
• UK Data Archive: "How to Run a Data Service"
• DDI tutorial
• TEI workshop
• Science Commons Symposium at Microsoft
• XML technologies study group (UW Libraries)
• Linked data project group (UW Libraries)
• Geospatial metadata workshop
Some tasks performed by the
Metadata Librarian (3)
Digital humanities:
• Guest lectures on Digital Humanities
• Survey: personally invited eresearchers to
participate
• Text markup (TEI) project
Some tasks performed by the
Metadata Librarian (4)
Other stuff:
• ARL/DLF E-Science Institute
• Data Curation profiles workshop
• Analyzed ISO 191xx and FGDC CSDGM
metadata and schemas
• Dialogue with UW DDI Alliance
representative based in the Center for Studies
in Demography and Ecology
Credits (1)
slide 12: Image generated by Wordle at http://www.wordle.net/ using an RSS feed from the DCMI Home page
http://dublincore.org/ on June 20, 2012.
slide 13: cartoon by Betsy Elswit, accessed at
http://www.library.cornell.edu/staffweb/kaleidoscope/volume19/april2011.html.
slide 16: Photo by psd, Paul Downey, accessed at his Flickr photostream at
http://www.flickr.com/photos/psd/1428129861/.
slide 17: Image of “Data near Here” taken from http://web.cecs.pdx.edu/~vmegler/. This was part of a research
project by Veronika M. Megler, a Ph. D. student in CS at the Maseeh College of Engineering & Computer
Science, Portland State University, in Portland Oregon.
Slide 18: cover of Marin Dimitrov's Metadata management for the BBCs 2010 World Cup site using OWLIM,
European Semantic Technology Conference 2010, found at http://pdfcast.org/pdf/metadata-management-
for-the-bbc-rsquo-s-2010-world-cup-site-using-owlim.
Slide 19: University of Edinburgh web site, Information Services page on Data Documentation and Metadata.
http://www.ed.ac.uk/schools-departments/information-services/services/research-support/data-
library/research-data-mgmt/data-mgmt/data-documentation.
Slide 20: Jim Gray, David T. Liu, Maria Nieto-Santisteban, Alex Szalay, David J. DeWitt, Gerd Heber,
“Scientific Data Management in the Coming Decade,” in ACM SIGMOD Record, Vol. 34, No. 4, Dec.
2005, p. 34-41.
Credits (2)
Slide 21: Abstract accessed at http://adsabs.harvard.edu/abs/2011AGUFMIN41B1408M on June 22, 2012.
Slide 22: Denise Rogers, “Database management: Metadata is more important than you think!” in Database
Journal, March 24, 2010. Accessed June 24, 2012 at
http://www.databasejournal.com/sqletc/article.php/3870756/Database-Management-Metadata-is-more-
important-than-you-think.htm.
Slide 25: Taken from http://sharilopatin.com/2011/09/15/im-tired-of-writing/, Shar Lopatin’s blog, Shari
Lopatin: Rogue Writer. No image credits cited at that site.
Slide 34: Library Card taken from http://www.ccp.edu.ph/main/index.php/how-to-guides/locate-a-book at
Central College of the Phillippines web site.
Slide 35: XSLT processing model taken from
http://www.w3.org/Consortium/Offices/Presentations/XSLT_XPATH/#(1), “Overview of XSLT and
Xpath,” published by the W3C, 2005.
Slide 36: Handwritten model taken from http://dinesh-logbook.blogspot.com/2008/12/design-and-implement-
data-model.html, the Dinesh LogBook, probably by Dinesh Kumar.
Slide 37: “Terminators” taken from
http://warhammer40k.wikia.com/wiki/File:Librarian_Terminator_Blood_Angels.jpg which is an image
page in http://warhammer40k.wikia.com/wiki/Warhammer_40k_Wiki, a wiki to the game Warhammer
40,000 maintained by Lead Administrator Montonius.
Credits (3)
Slide 38: Data repository diagram taken from http://www.w3.org/2001/sw/sweo/public/UseCases/FAO/
“Semantic Web Use Cases and Case Studies” by Gauri Salokhe, Margherita Sini, Johannes Keizer, Food
and Agriculture Organization of the United Nations, Italy, and published by W3C, 2007.
Slide 39: Found at http://afeatheradrift.wordpress.com/2009/10/03/learning-stuff-i-dont-wanna-know/, “A
Feather Adrift,” Sherry Peyton's blog, but she attributes it to howtosplitanatom.com, Steve Spalding's blog,
where the image appears at http://howtosplitanatom.com/news/how-to-improve-productivity/ with no
attribution.
Slide 40: Dallas Museum of Art Metadata Standards Crosswalk available at
http://www.museumsandtheweb.com/mw2007/papers/gutierrez/gutierrez.fig.11.pdf
Slide 41: DDI Word Cloud found at http://www.ddialliance.org/what and seems to have been generated at
www.tagxedo.com.
Data Services: Moving Forward &
Why It Matters
University of Washington Libraries
Stephanie Wright
Agenda
* What Does It All Mean
* Why It Matters
* What To Do Next
* Q&A
Jargon
DATA
That which is collected, observed, or created,
for purposes of analyzing to produce original
research results.
DISC-UK DataShare Report:
http://ie-repository.jisc.ac.uk/336/1/DataSharefinalreport.pdf
Jargon
BIG DATA
D
Images: http://thearmyyouhave.blogspot.com/ and http://www.tickdata.com/wp-
content/uploads/2010/06/options-data.jpg
Jargon
LONG TAIL OF DATA
Image credit: disruptormonkey.typepad.com
So what?
Image credit: http://www.trueloveconquersall.com/2012/07/huh.html
Institutional Level
•Digital Preservation Network (DPN)
“The Digital Preservation Network is being
created by research-intensive universities to
ensure long-term preservation of the complete
digital scholarly record.”
http://d-p-n.org/
Departmental Level
“… the UW’s computer science
department was able to attract world-
class professors in “big data,” “machine
learning” and “visual data analysis…”
http://www.geekwire.com/2012/love-marriage-university-washington-bolstered-machine-learning-big-data-staff/
http://seattletimes.nwsource.com/html/localnews/2018546054_computerscience28m.html
Departmental Level
Data Curation Faculty Position
“The University of Washington Information School is building an
innovative program in data sciences and data curation.”
Tenure Track Assistant or Associate Professor of Human
Centered Design & Engineering
”Position open in “big data” with relationship to crowdsourcing, social
computing, visualization, visual analytics, individual and group
behavior, ubiquitous computing.”
Department of Communication, Assistant Professor
“… seeks a full-time tenure-track Assistant Professor to develop
communication scholarship with “big data” computational approaches.”
In Short…
•Data is coming here.
•We need to support it.
•It affects all of us.
Image credit: http://tinyurl.com/9ojjx64
It’s Not That Bad
Image credit: http://www.freedigitalphotos.net/images/Emotions_g96-Scared_p33305.html
A Little Help From Your
Friends
•Data Services LibGuide
ohttp://guides.lib.washington.edu/data
•Email list
ohttps://mailman1.u.washington.edu/mailman/
listinfo/uwlib-datainfo
•Data Management LibGuide
ohttp://guides.lib.washington.edu/dmg
•EZID
oService to register permanent identifiers for datasets
ohttp://guides.lib.washington.edu/loader.php?type=d
&id=502026
Planning Ahead
•In the short term (this coming year)
oResearchWorks
oResearch Data Management Needs Survey
oCurriculum for library staff and
researchers (data literacy, data services,
dm)
oInvited Speakers
oCommunication Plan
oStrategic Agenda
Planning Ahead
•More broadly (ongoing)
oOpportunities for collaboration
Campus data initiatives
Outreach to new faculty / research groups
“doing data”
oStay abreast of new developments
SciVal
oIdentify & prioritize needs and services
Digital Humanities
Remember…
We don’t have to do it all.
Questions?
Thank you!
Stephanie Wright, Data Services Coordinator: swright@uw.edu
Theo Gerontakos, Metadata Librarian: tgis@uw.edu
Matt Parsons, Geospatial Data/Maps Librarian: parsonsm@uw.edu

UW Libraries Data Services Forum

  • 1.
    The Team Approach:Developing a Data Services Program at the University of Washington University of Washington Libraries Matthew Parsons & Theodore Gerontakos
  • 2.
    Agenda * Background /Structure * Our Approach * Metadata * Moving Forward
  • 3.
    Background Data Services ProgramPlanning Committee, June 2009-January 2010 Outcomes from report: • Data Services Coordinator (July 2010) • Data Services Team (September 2010) • Data Services CE Fund
  • 4.
    Head, Reference &Research Services Data Services Coordinator Data Services Team roster: Organizational Structure Assoc. Dean Research & Instructional Services Assoc. Dean for Resource Access/Description & Info. Technology Services Data Services Coordinator, Chair (Stephanie) Business Computer-Based Librarian (Corey) Library Specialist, Monographic Srvcs. (Will) Geospatial Data and Maps Librarian (Matt) U.S. Documents Librarian (Cass) Systems Librarian (Anjanette) Library Specialist, Monographic Srvcs. (Heather) Metadata Librarian (Theo) Regional Tech. Coord., NN/LM PNR (Mahria)
  • 5.
    Our Approach: AThree Step Process • Outreach • Continuing education for Libraries staff • Developing resources and services
  • 6.
    Outreach Outreach efforts werepriority for the first year • Institute for Health Metrics & Evaluation • Center for Advanced Research Technology in the Arts & Humanities • Center for the Study of Demography & Ecology • Center for Social Science Computational Research • Office of Research
  • 7.
    Continuing Education for LibrariesStaff • Webinars • Speakers • Conferences • Workshops • Courses
  • 8.
    Developing Resources and Services •Data Services LibGuide • Data Management LibGuide • Email list: uwlib-datainfo@u • data.blogspot.com • Data Citation - EZID • Data management reference & consultation • Dataset collection • Presentations to groups & classes on RDM • Data management pilot survey
  • 9.
  • 10.
  • 11.
    We Hold TheseTruths to Be Self-Evident…
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 18.
  • 19.
  • 20.
  • 21.
    Data Data DataData Data
  • 22.
  • 23.
  • 24.
  • 25.
    Metadata is essentialto data management “If the data is to be analyzed by generic tools, the tools need to “understand” the data. You cannot just present a bundle-of-bytes to a tool and expect the tool to intuit where the data values are and what they mean. The tool will want to know the metadata.” --Jim Gray, David T. Liu, Maria Nieto-Santisteban, Alex Szalay, David J. DeWitt, Gerd Heber, “Scientific Data Management in the Coming Decade,” in ACM SIGMOD Record, Vol. 34, No. 4, Dec. 2005, p. 34- 41
  • 29.
    [From an earlierslide in a previous version of this talk:] Data Services Team Roster: Data Services Coordinator, Chair (Stephanie) Business Computer-Based Librarian (Corey) Library Specialist, Monographic Srvcs. (Will) Geospatial Data and Maps Librarian (Matt) U.S. Documents Librarian (Cass) Systems Librarian (Anjanette) Library Specialist, Monographic Srvcs. (Heather) Metadata Librarian (Theo) Regional Tech. Coord., NN/LM PNR (Mahria) Data Services Specialist (Malyka)
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
    Common Standards 01338nam a22003730 4500001001200000002000900012005001700021008004100038040001300079066000600092066000700098090002 000105099001900125099001900144130004400163245008900207260004600296300004600342500005100388700003 0004397000038004697400036005078800043005438800079005868800049006658800046007148800032007608800 04300792880003100835941001500866910001400881941001500895987005400910ocm227237840154295320011018135 859.0901121s1667xx a 000 0 chi d aWAUcWAU c1 c$1 aDS707b.S4 1667 aDS707 .S4 1667 aDS707 .S4 16670 6880-01aShan hai jing (Chinese classic)106880-02aShan hai jingb[18 juan, tu 5 juan,cGuo Pu zhuan] Wu Zhiyi [Wu Renchen] zhu 6880-03a[n.p.bn.p.]cKangxi 6 (1667) xu] a6 v. (double leaves) in 1.billus.c26 cm 6880-04aCaption title: Shan hai jing guang zhu1 6880-10aGuo, Pu,d276-3241 6880-11aWu, Renchen,d1628?-1689?0 6880-05aShan hai jing guang zhu006130-01/$1a山海經 (Chinese classic)106245-02/$1a山海經b[18卷, 圖5卷,c郭璞傳] 吳志伊[吳任臣]注0 6260- 03/$1a[n.p.bn.p.]c康熙6 (1667) 序] 6500-04/$1aCaption title: 山海經廣注106700-10/$1a郭璞,d276-324106700-11/$1a吳任 臣,d1628?-1689?eed016740-05/$1a山海經廣注 a.b25105383 amt-75 rev a.b25105383 aPINYINbOCoLCc20011014dre09.30.2001f130 REV /01086nam a22003370 4500001001200000002000900012005001700021008004100038040001300079066000600092066000700098090001 6001050990015001210990015001362450050001512600072002013000041002735000038003146510031003527000038 00383740003000421880004500451880006600496880003800562880003700600880002800637941001500665910001 000680941001500690987004300705ocm229633460155417320011018140158.0910115q15731620xx 000 0 chi d aWAUcWAU c1 c$1 aDS736b.I15 aDS736 .I15 aDS736 .I15006880-01aYi zhou shub[10 juan]cKong Chao zhu 6880- 02a[n.p.,bMing Jiang shichang kan banc[between 1573 to 1620] a2 v. (double leaves) in case.c29 cm 6880-03aSong Jiading 15 (1222) ba 0aChinaxHistoryxPhilosophy1 6880-04aKong, Chao,d3rd/4th cent0 6880-05aJi zhong Zhou shu006245- 01/$1a逸周書b[10卷]c孔晁注0 6260-02/$1a[n.p.,b明姜士昌刊本c[between 1573 to 1620] 6500-03/$1a宋嘉定15 (1222)跋 ]106700-04/$1a孔晁,d3rd/4th cent016740-05/$1a汲冢周書 a.b25217008 acl-71 a.b25217008 aPINYINbOCoLCc20011014dce09.30.200101261nam a22003610…
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
    Schemas, Transforms, XMLTechnologies <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:local="http://example.org/local"> <xs:annotation> <xs:documentation>A schema to validate Dublin Core Qualified records as defined by the DSpace simple archive format. See the DSpace manual, ver. 1.6.1, section 8.3.1.</xs:documentation> </xs:annotation> <xs:element name="dublin_core" type="dublincoreType"> <xs:annotation> <xs:documentation> Top level container for the individual record </xs:documentation> </xs:annotation> </xs:element> <xs:complexType name="dublincoreType"> <xs:sequence maxOccurs="unbounded"> <xs:element ref="dcvalue"/> </xs:sequence> </xs:complexType> <xs:element name="dcvalue" type="dcvalueType"> <xs:annotation> <xs:documentation>The element that contains data values, with attributes for DC element name, qualifier, and optional language.</xs:documentation> </xs:annotation>
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
    The DNA DataDeluge!
  • 55.
    Number of hoursin a year: 8,766
  • 56.
  • 57.
  • 58.
    This is ourmoment!!! [insert quote here on the benefits knowing everything]
  • 59.
  • 60.
  • 61.
    What have Iactually done?
  • 62.
    Some tasks performedby the Metadata Librarian (1) Routine tasks as a member of the Data Services Team: • Regular meetings • Planning report • Data management guide • Survey: create/distribute • Group meetings with UW faculty/staff
  • 63.
    Some tasks performedby the Metadata Librarian (2) Professional development: • UK Data Archive: "How to Run a Data Service" • DDI tutorial • TEI workshop • Science Commons Symposium at Microsoft • XML technologies study group (UW Libraries) • Linked data project group (UW Libraries) • Geospatial metadata workshop
  • 64.
    Some tasks performedby the Metadata Librarian (3) Digital humanities: • Guest lectures on Digital Humanities • Survey: personally invited eresearchers to participate • Text markup (TEI) project
  • 65.
    Some tasks performedby the Metadata Librarian (4) Other stuff: • ARL/DLF E-Science Institute • Data Curation profiles workshop • Analyzed ISO 191xx and FGDC CSDGM metadata and schemas • Dialogue with UW DDI Alliance representative based in the Center for Studies in Demography and Ecology
  • 66.
    Credits (1) slide 12:Image generated by Wordle at http://www.wordle.net/ using an RSS feed from the DCMI Home page http://dublincore.org/ on June 20, 2012. slide 13: cartoon by Betsy Elswit, accessed at http://www.library.cornell.edu/staffweb/kaleidoscope/volume19/april2011.html. slide 16: Photo by psd, Paul Downey, accessed at his Flickr photostream at http://www.flickr.com/photos/psd/1428129861/. slide 17: Image of “Data near Here” taken from http://web.cecs.pdx.edu/~vmegler/. This was part of a research project by Veronika M. Megler, a Ph. D. student in CS at the Maseeh College of Engineering & Computer Science, Portland State University, in Portland Oregon. Slide 18: cover of Marin Dimitrov's Metadata management for the BBCs 2010 World Cup site using OWLIM, European Semantic Technology Conference 2010, found at http://pdfcast.org/pdf/metadata-management- for-the-bbc-rsquo-s-2010-world-cup-site-using-owlim. Slide 19: University of Edinburgh web site, Information Services page on Data Documentation and Metadata. http://www.ed.ac.uk/schools-departments/information-services/services/research-support/data- library/research-data-mgmt/data-mgmt/data-documentation. Slide 20: Jim Gray, David T. Liu, Maria Nieto-Santisteban, Alex Szalay, David J. DeWitt, Gerd Heber, “Scientific Data Management in the Coming Decade,” in ACM SIGMOD Record, Vol. 34, No. 4, Dec. 2005, p. 34-41.
  • 67.
    Credits (2) Slide 21:Abstract accessed at http://adsabs.harvard.edu/abs/2011AGUFMIN41B1408M on June 22, 2012. Slide 22: Denise Rogers, “Database management: Metadata is more important than you think!” in Database Journal, March 24, 2010. Accessed June 24, 2012 at http://www.databasejournal.com/sqletc/article.php/3870756/Database-Management-Metadata-is-more- important-than-you-think.htm. Slide 25: Taken from http://sharilopatin.com/2011/09/15/im-tired-of-writing/, Shar Lopatin’s blog, Shari Lopatin: Rogue Writer. No image credits cited at that site. Slide 34: Library Card taken from http://www.ccp.edu.ph/main/index.php/how-to-guides/locate-a-book at Central College of the Phillippines web site. Slide 35: XSLT processing model taken from http://www.w3.org/Consortium/Offices/Presentations/XSLT_XPATH/#(1), “Overview of XSLT and Xpath,” published by the W3C, 2005. Slide 36: Handwritten model taken from http://dinesh-logbook.blogspot.com/2008/12/design-and-implement- data-model.html, the Dinesh LogBook, probably by Dinesh Kumar. Slide 37: “Terminators” taken from http://warhammer40k.wikia.com/wiki/File:Librarian_Terminator_Blood_Angels.jpg which is an image page in http://warhammer40k.wikia.com/wiki/Warhammer_40k_Wiki, a wiki to the game Warhammer 40,000 maintained by Lead Administrator Montonius.
  • 68.
    Credits (3) Slide 38:Data repository diagram taken from http://www.w3.org/2001/sw/sweo/public/UseCases/FAO/ “Semantic Web Use Cases and Case Studies” by Gauri Salokhe, Margherita Sini, Johannes Keizer, Food and Agriculture Organization of the United Nations, Italy, and published by W3C, 2007. Slide 39: Found at http://afeatheradrift.wordpress.com/2009/10/03/learning-stuff-i-dont-wanna-know/, “A Feather Adrift,” Sherry Peyton's blog, but she attributes it to howtosplitanatom.com, Steve Spalding's blog, where the image appears at http://howtosplitanatom.com/news/how-to-improve-productivity/ with no attribution. Slide 40: Dallas Museum of Art Metadata Standards Crosswalk available at http://www.museumsandtheweb.com/mw2007/papers/gutierrez/gutierrez.fig.11.pdf Slide 41: DDI Word Cloud found at http://www.ddialliance.org/what and seems to have been generated at www.tagxedo.com.
  • 69.
    Data Services: MovingForward & Why It Matters University of Washington Libraries Stephanie Wright
  • 70.
    Agenda * What DoesIt All Mean * Why It Matters * What To Do Next * Q&A
  • 71.
    Jargon DATA That which iscollected, observed, or created, for purposes of analyzing to produce original research results. DISC-UK DataShare Report: http://ie-repository.jisc.ac.uk/336/1/DataSharefinalreport.pdf
  • 72.
    Jargon BIG DATA D Images: http://thearmyyouhave.blogspot.com/and http://www.tickdata.com/wp- content/uploads/2010/06/options-data.jpg
  • 73.
    Jargon LONG TAIL OFDATA Image credit: disruptormonkey.typepad.com
  • 74.
    So what? Image credit:http://www.trueloveconquersall.com/2012/07/huh.html
  • 75.
    Institutional Level •Digital PreservationNetwork (DPN) “The Digital Preservation Network is being created by research-intensive universities to ensure long-term preservation of the complete digital scholarly record.” http://d-p-n.org/
  • 76.
    Departmental Level “… theUW’s computer science department was able to attract world- class professors in “big data,” “machine learning” and “visual data analysis…” http://www.geekwire.com/2012/love-marriage-university-washington-bolstered-machine-learning-big-data-staff/ http://seattletimes.nwsource.com/html/localnews/2018546054_computerscience28m.html
  • 77.
    Departmental Level Data CurationFaculty Position “The University of Washington Information School is building an innovative program in data sciences and data curation.” Tenure Track Assistant or Associate Professor of Human Centered Design & Engineering ”Position open in “big data” with relationship to crowdsourcing, social computing, visualization, visual analytics, individual and group behavior, ubiquitous computing.” Department of Communication, Assistant Professor “… seeks a full-time tenure-track Assistant Professor to develop communication scholarship with “big data” computational approaches.”
  • 78.
    In Short… •Data iscoming here. •We need to support it. •It affects all of us.
  • 79.
  • 80.
    It’s Not ThatBad Image credit: http://www.freedigitalphotos.net/images/Emotions_g96-Scared_p33305.html
  • 81.
    A Little HelpFrom Your Friends •Data Services LibGuide ohttp://guides.lib.washington.edu/data •Email list ohttps://mailman1.u.washington.edu/mailman/ listinfo/uwlib-datainfo •Data Management LibGuide ohttp://guides.lib.washington.edu/dmg •EZID oService to register permanent identifiers for datasets ohttp://guides.lib.washington.edu/loader.php?type=d &id=502026
  • 82.
    Planning Ahead •In theshort term (this coming year) oResearchWorks oResearch Data Management Needs Survey oCurriculum for library staff and researchers (data literacy, data services, dm) oInvited Speakers oCommunication Plan oStrategic Agenda
  • 83.
    Planning Ahead •More broadly(ongoing) oOpportunities for collaboration Campus data initiatives Outreach to new faculty / research groups “doing data” oStay abreast of new developments SciVal oIdentify & prioritize needs and services Digital Humanities
  • 84.
  • 85.
  • 86.
    Thank you! Stephanie Wright,Data Services Coordinator: swright@uw.edu Theo Gerontakos, Metadata Librarian: tgis@uw.edu Matt Parsons, Geospatial Data/Maps Librarian: parsonsm@uw.edu

Editor's Notes

  • #2 Welcome and introductions. In our forum we are going to describe the University of Washington Libraries' Data Services Team and Data Services Program.
  • #3 The forum will begin with some Background information on how and why the Data Services Team was formed and take a quick look at its organizational structure. Next, will be an overview of the Approach that the Team has taken to implement data services to the campus community. Theo will expand on the role of the metadata librarian as in integral member of our Data Services Team. Lastly, Steph will talk about next steps and future projects for the Data Services Team.
  • #4 In June of 2009, UW Libraries' Administration created the Data Services Program Planning Committee comprised of librarians and staff from various units within the LIbraries. Its charge: Identify a broad set of data services that the Libraries could offer to students and faculty. Over the next six months, the DSPPC: In Jan 2010, the Committee submitted its report to Libraries' Administration. Committee recommendations within the report included: Creating a new position: Data Services Librarian Creating admin. level position: Director for Data Services Creating a Data Services Team Adding elements data services responsibilities to appropriate position descriptions of subject liaisons and library administrators Moving forward with recommendations made in previous proposals, such as the "Geospatial Data Services Infrastructure Enhancement" Making sure the Libraries does not overextend its resources; fall short of data manipulation and data enhancements The UW Libraries' Cabinet decided to implement the following actions: Move ahead with Geospatial Data Services Infrastructure Enhancements (bump up its profile/priority status) Create a temporary Data Services Coordinator position (internal recruitment, 2 year renewable, currently under consideration as permanent appt) Create a Data Services Team comprised of public services and technical services staff and librarians Continue the engagement w/campus partners to develop campus-wide data community Allocate funds to support professional development re data management, data services, etc.
  • #5 To provide a bit more context of how the Data Services Team and the Data Services Coordinator function, here is how our organizational structure looks: The Data Services Coordinator's direct supervisor is the Head of Reference & Research Services. However, the DSC also indirectly reports to Assoc. Deans, Cynthia and Bill (who formed the programming planning committee and control purse strings of the data services fund for CE). The DSC chairs the DST. DST was formed by: identifying staff who possessed a passion for data; recognizing strategic units (ITS, Health Sciences, Maps/GIS, etc.) that already deal with data in some fashion; as well as looking at who was on the planning committee. After assessing current employee workloads, specific librarians and staff were invited to become members of the DST. The current membership of the DST is: Stephanie, Chair (Data Services Coordinator) Will (Library Specialist, Monographic Services) - interest in data, serves on Libraries strategic planning cmte. Cass (U.S. Documents Librarian) Heather (Library Specialist, Monographic Services) - worked on Digital Initiatives projects Mahria (Regional Technology Coordinator, NN/LM PNR) effective July 1, 2012 [National Network of Libraries of Medicine, PNW Region] Corey (Business Computer-Based Services Librarian) - also has visualization skills/interest Matt (Geospatial Data and Maps Librarian) Anjanette (Systems Librarian) Theo (Metadata Librarian) CHARGE The role of the Data Services Team is to serve as a working group with the following responsibilities: Prepare a short and long term plan, guided by the Libraries overall strategic directions, which sets priorities and develops strategies for the provision of data services by the Libraries to students, staff and faculty. Work in close collaboration with appropriate staff, Libraries committees and campus entities that have data management and literacy training as part of their portfolios, to achieve Data Services objectives. Ensure that appropriate staff are kept informed of service changes and training opportunities provided by the Data Services Team, the Libraries, the UW or outside entities. Provide in-person and virtual reference, instruction and consultation services related to data literacy, management and visualization to students, staff and faculty. MEMBERSHIP The Data Services Team is appointed by Head of Reference and Research Services, Associate Dean of RIS, and Associate Dean of RAD/ITS and the Data Services Coordinator. The Data Services Coordinator serves as chair of the team. Members serve a staggered renewable two-year term and should include staff from GIS, ITS, GovPubs, Health Sciences, Social Sciences, Humanities and Sciences. Member representation roles may overlap and should include staff from public and technical services. The team will consult with groups and committees inside and outside the Libraries as additional expertise is required. MEMBER EXPECTATIONS Members are expected to: Attend Data Services Team meetings Bring their knowledge and expertise to bear on the growth of the data services provided by the Libraries including responding to reference queries in person or via QP, chat or referral from reference desks. Raise issues or opportunities for investigation or consideration Play an active role in ensuring that the work of the group is consistently communicated and made available to Libraries staff, in accordance with Libraries Communication Guidelines. Participate in the shared learning environment and contribute to the ongoing education of the group Actively engage all staff in the evolution and provision of data services
  • #6 The Data Services Team took a three step approach to fulfill its charge obligations : outreach, continuing education, and developing resources/services.
  • #7 #1 Outreach. As you all know, the UW is a decentralized campus, therefore outreach efforts were a priority for the first year. Do not underestimate outreach. We identified stakeholders and met with them (formally and informally) to find out what they were doing. Additionally, we used these meetings to discover who else with which we should meet. In this way, we were able to meet with people from across the spectrum. But, most importantly these meetings led to opportunities for collaboration. Collaborative efforts as result of these meetings: CSSCR: training for library staff on data management CSDE: UW Research / City of Seattle symposium invite IHME: InForum presenting on GHDx (Global Health Data Exchange - data catalog for IHME) One might say, the goal was really about PR. Outreach helps to foster community through not only forming connections but through doing stuff together and sharing interests.
  • #8 Some of the continuing education activities that we've done: Through meeting with CSSCR Webinars - data related information: data specific, DMPs, Data Fair Sponsored librarians to participate in an online course for spatial literacy (ALA RUSA) Hosted Data Curation Profiles workshop: Jake Carlson from Purdue Sent librarians to the Data Curation Summer Institute at UofIlinois Sent librarians to the UK Data Archive workshop on setting up a data service - opportunity to meet w/ British Library, London School of Economics & Oxford to discuss data management efforts Sponsored a team of librarians to participated in the ARL/DLF eScience Institute: The ARL/DLF E-Science Institute's mission: to help research libraries develop strategies for engaging with e-science and digital research on their campuses Funded librarians to attend King County GIS Center's GIS Academy Supported attendance at external events - Microsoft Research, conferences/workshops related to data
  • #9 Because our outreach and continuing education efforts gave us a better understanding of the trending practices in the field and the data needs on our own campus, we were able to move forward with developing resources and services: Data Services LibGuide - evolution (more robust guide) on DMPs EZID Data Info email list: uwlib-datainfo@u.washington.edu. Data blog Developing - Survey and follow up interviews on data management needs evaluating data set collection data management reference and consultation
  • #10 An example of trying our hand at set consultation and collection came to us via PSNERP. PSNERP: Puget Sound Nearshore Ecosystem Restoration Project is a partnership of organizations that endeavor to protect and restore natural processes that create and maintain Puget Sound nearshore ecosystems. A sub-group of PSNERP created an integrated geodatabase that characterizes the physical conditions affecting nearshore environments of Puget Sound. Its purpose is to provide an analytical approach to systematically quantify historic change over the last approximately 125 years. One intention of the project was that this "Change Analysis geodatabase," and associated metadata and methodology be available for public download. So, they needed a place to host it. Matt and Theo met with oceanography Miles Logsdon. and his graduate student to discuss the data and talk about the process of handing it over to the Libraries. This was a fairly simple process: there was good metadata, an accompanying methodology report, it was in a ready-to-go gdb, and was small in size. WAGDA was the best place to put it and it is now available to the public at large. http://wagda.lib.washington.edu/data/geography/wa_state/#PSNERP, in the Washington State Geospatial Data Archive
  • #11 As with the PSNERP data upload... Slide: http://wagda.lib.washington.edu/data/geography/wa_state/#PSNERP, in the Washington State Geospatial Data Archive
  • #12 ...it makes sense to me when offering data services that, at some point... Slide; http://truedataservices.com/tds/images/ad.gif taken from the http://truedataservices.com/tds/ web site
  • #13 ...you would call in the metadata guy... Slide: http://www.flickr.com/photos/melomcc/3339786669/, by Mel McC His comment: Mascot for the metadata class I'm taking, inspired by the on-site weekend. Suggested accessories welcome. Please submit images of accessories via links in comments
  • #14 ...not just because that's my opinion... Slide: Image generated by Wordle at http://www.wordle.net/ using an RSS feed from the DCMI Home page http://dublincore.org/ on June 20, 2012.
  • #15 ...as I'm a person for whom just about everything resolves into metadata... Slide: Taken from http://www.newworldencyclopedia.org/entry/File:DNA_replication.svg in New World Encyclopedia, “DNA,” possibly contributed by Rick Swarts
  • #16 ...for example, the Olympics: Slide: Olympic rings taken from http://scansnapcommunity.com/features/7689-faster-higher-stronger-olympics-edition/ on the ScanSnap web site
  • #17 ...I was interested that Zhou Lulu broke a world record for total weight lifted in the women’s +75 kg class, and won the gold medal.... Slide; http://www.london2012.com/athlete/zhou-lulu-1067584/pictures.html#zhou-lulu-china-lifts-during-the-women-75kg-weightlifting , picture credit not given, taken from the Official London 2012 web site
  • #18 ...and I was interested that Zhang Jike remained the #1 player in the world, winning all his matches relatively easily (with the exception of his first match). Slide: http://www.butterflyonline.com/Images/Site/Wallpapers/ZhangJike1600.jpg , taken from the Butterfly North America web site.
  • #19 ...but I was equally interested in Olympics data… Both slides: http://www.bbc.co.uk/blogs/bbcinternet/2012/08/olympic_data_xml_latency.html, “Building the Olympic Data Services” by David Rogers.
  • #20 …and metadata, which featured an adaptation of ontotext OWLIM first implemented during the 2010 World Cup… Slide:
  • #21 …and how there’s Olympics data even publicly downloadable in a range of formats. Slide: data downloaded from http://www.transparency.culture.gov.uk/oda, a page on the site of the Department for Culture, Media and Sport, especially offering salaries of those in the Olympic Delivery Authority.
  • #22 So I’m thinking often about data, and where there’s data... Slide: http://2.bp.blogspot.com/-0e_J_YTx0Fw/TbWFeNK2PMI/AAAAAAAABJ8/nAJHc2_xNsc/s1600/raw+data.png actually appears in the page http://startingsalarylegalsecretary.blogspot.com/2011_04_01_archive.html, a page in the starting salary legal secretary blog.
  • #23 …metadata never seems very far away…
  • #24 …and sometimes it’s difficult to tell what is the data and what is the metadata.
  • #25 Agan, it's not just my opinion: it’s generally understood, among people in the concerned professions, that metadata plays a central role in managing data – as evidenced here in this page from University of Edinburgh; Slide: Slide 19: University of Edinburgh web site, Information Services page on Data Documentation and Metadata. http://www.ed.ac.uk/schools-departments/information-services/services/research-support/data-library/research-data-mgmt/data-mgmt/data-documentation.
  • #26 ...and evidenced here, in this quote by famous people; Slide: Slide 20: Jim Gray, David T. Liu, Maria Nieto-Santisteban, Alex Szalay, David J. DeWitt, Gerd Heber, “Scientific Data Management in the Coming Decade,” in ACM SIGMOD Record, Vol. 34, No. 4, Dec. 2005, p. 34-41.
  • #27 ...and here it’s sufficient to just look at the source and the headline;
  • #28 ...all these recent slides are demonstrating a mode of thought, as in this slide -- but I’m assuming you do not need to be convinced of this, and that most of you are probably “sold” on the importance of metadata.
  • #29 Fortunately, University of Washington Libraries administrators who initiated the current phase of our data services included a metadata librarian in the planning phase…
  • #30 …then a Metadata Librarian was on the roster of the actual Data Services Team that emerged out of the planning phase...
  • #31 …so we had decision makers who were also sold on the importance of metadata in data services -- and the choice of which metadata librarian to place in data services was relatively easy, I assume…
  • #32 ...as we have only one Metadata Librarian (me) – BUT --
  • #33 …we have lots of metadata workers in Cataloging and Metadata Services (formerly MSD). We have 33 members in CaMS, mostly creating and maintaining our bibliographic data using traditional library metadata standards;
  • #34 ...in fact, the Metadata Librarian’s title is not simply the Metadata Librarian, but the Metadata/Cataloging Librarian.
  • #35 The idea was to ensure that the Metadata Librarian stays rooted in MARC, AACR2, LCRIs, LCSH, etc, and now, of course, RDA…
  • #36 …to keep everybody largely on the same page…
  • #37 This full time Metadata/Cataloging Librarian (me) officially started on August 1, 2007 – the day Architecture Hall re-opened after its $25 million renovation... Slide: from the blog “TheWorldAroundUs” maintained by Michael Peterson, owner of Material Practice, a Seattle based design firm, who may have taken the photograph; accessed at http://www.materialpractice.com/blog/?tag=architecture
  • #38 …well after non-MARC metadata work was well established here; Slide: From Boyd and Braas Photographs collection, accessed at http://content.lib.washington.edu/cdm4/item_viewer.php?CISOROOT=/boydBraas&CISOPTR=1&CISOBOX=1&REC=10
  • #39 ...there was even a UW-MIG with 8 members. Once hired, the Metadata Librarian chaired this group experienced in creating digital projects.
  • #40 Our MIG is known mostly for advising digital projects and helping develop metadata application profiles for these digital projects.
  • #41 Our MAPs are largely text-based metadata schemas for digital projects. It became routine for UW-MIG to meet with projects, discuss data modeling, metadata structures, ways of populating those structures, and so forth, by fixating on the production of this document, the data dictionary or MAP.
  • #42 There have always been challenges as to what the scope of our metadata services would be… Slide: “Seeing Standards: a Visualization of the Metadata Universe” by Jenn Riley and Devin Becker, 2009-2010, available from http://www.dlib.indiana.edu/~jenlrile/metadatamap/.
  • #43 ...and MIG is always moving into various areas in which its members need to learn new things... Slide: image taken from http://filmstandards.org/fsc/index.php/How_EN_15744_and_EN_15907_came_into_being, a page in the filmstandards.org site
  • #44 …from using XML technologies…
  • #45 …to populating databases…
  • #46 …to exploring new standards, beyond our familiar MARC and Dublin Core. Slide: OAI-PMH Cheat Sheet by Richard Durban, available at http://www.flickr.com/photos/musebrarian/3380447326/
  • #47 All this is to say that as data creators -- largely creators of “records…”
  • #48 …as data transformers…
  • #49 …as data modelers and metadata advisors and project liaisons…
  • #50 …the Metadata Librarian and MIG members at the UW had already begun providing data services...
  • #51 ...and now, as we’re talking about archiving datasets, giving access to datasets, and generally managing data, metadata workers are going to be relied upon, perhaps beyond "comfort levels."
  • #52 And just in time! as we are in the beginning of The Data Deluge, right? This was the Feb 27 2010 issue of The Economist. Slide: Cover of the Economist, February 27, 2010, accessed at http://www.economist.com/node/21521550.
  • #53 This book was published by Libraries Unlimited at the end of 2009. Slide: Cover of The data deluge : can libraries cope with e-science? / Deanna B. Marcum and Gerald George, editors ; foreword by Kakugyo S. Chiku. Accessed at the Libraries Unlimited web site at http://www.abc-clio.com/product.aspx?isbn=9781591588870
  • #54 Is this the “data deluge?” This was the November 2011 issue of Popular Science. Slide: image accessed at the Popular Science web site, at the page http://www.popsci.com/announcements/article/2011-10/november-2011-data-power.
  • #55 And then there's the undeniably overwhelming amount of data being produced in selected disciplines, as in human genome sciences.
  • #56 So: in doing this work there’s a lot to learn and keep up with – more than any one person can even hope to do…
  • #57 ...and some of us just have to know when to draw the line and say No, there’s not enough time in the day…
  • #58 …while others, metadata workers, maybe "the select few," perhaps over-ambitious, full of energy and enthusiasm to take on the most complex and challenging problems of our day…
  • #60 So, returning to me…
  • #61 …there I am, amazingly well suited to the work, right? Slide: cover image taken from Amazon.com at http://www.amazon.com/The-Right-Skills-Job-Perspectives/dp/0821387146; cover of the book The Right Skills for the Job?: Rethinking Training Policies for Workers published by World Bank Publications. The book is available at http://publications.worldbank.org/.
  • #62 But what have I actually done? Slide: I found this as the logo of “rentalix”,” a commenter on the site funnyjunk (http://www.funnyjunk.com/) but it is clearly Rainbow Dash, who first appeared in generation 3 of My Little Pony, initiated in 2003 by Hasbro.
  • #66 ISO 191xx: this was a preparation to understand the validation in our geospatial metadata repository. I prepared a report explaining the mechanisms in preparation for depositing valid UW metadata to the repository.
  • #70 So now that Matt & Theo have told you what the Team and specifically they have been up to I’d like to delve a bit more into what Data Services is about and where we’re going. I’ll start up high then drill down to some specifics.
  • #71 I’m going to start out by providing some definitions for terms you may have heard flying around. I’ll also explain why you need to understand them. And since you’ve already heard what we’ve been doing, I wanted to give you some broad strokes on what Data Services is planning to focus on next with a couple new services that have just gone “live”. And we’ll finish up with an opportunity for more questions (and hopefully answers).
  • #72 As you might guess, I’ve been talking and reading about this stuff a LOT so before I get too far and start talking in data speak, I want to define a few terms so we’re talking the same language. First, there’s “data”. You would not believe how many definitions you are for such a tiny word. This one is my favorite definition. (Yes, I have a favorite definition of data. Doesn’t everyone?) It’s adapted from the DISC-UK DataShare Report and I like it because 1) it’s short and 2) it doesn’t overtly align itself to a particular discipline or data format. It can be numerical, qualitative, images, videos, computer models… it’s all data. And when we’re talking data services, we’re mostly looking at supporting digital data services. So while this definition could include such things as something tabulated on a piece of graph paper, that’s not really what we’re focused on.
  • #73 I admit I really don’t like this term… everybody seems to have a different definition for it and every definition seems to depend on the focus of the person doing the defining. You may have heard of these – Rodents Of Unusual Size. In general terms, you can think of “big data” as D. O. U. S.s – data of unusual size. It’s basically just sets of data that are too big to manipulate, manage, whatever using the usual software tools. Think of NASA and all the data they’ll be collecting from the Mars Rover mission. Or all the photos in Facebook which as of last year was amounting to about 6 billion (with a b) per day. So are 50 GB big data? Maybe for some people… 50 terabytes… could be. No one has specifically defined it… so you can perhaps understand why I don’t like this term. But, nonetheless, it’s being widely used.
  • #74 This is the term that probably requires the most explanation and you will probably most frequently hear it used in conjunction with the previous term because this is usually what “big data” is not and this graph actually explains it pretty well. The vertical axis (the up and down line) is Frequency of Use. The horizontal axis (side to side) is the total inventory of data – everything, all collected data. The green part represents datasets that are popular, widely used and well managed (think of all the climate data collected and maintained over a hundred years by the National Climate Data Center). The yellow part represents datasets that are less frequently used and are managed in some informal manner (think departmental shared network folders – like SharedDocs). The red part – that is the long tail. It’s data that is rarely used and not managed in any kind of organized fashion. And the scary part… it’s estimated that it’s 80-85% of all data collected. It’s that red part we really want to focus on with data services because that’s where the needs are greatest. It’s the data that a researcher collected 10 yrs, 5yrs, 1 yr ago that’s sitting on a floppy, a CD or a thumb drive in, or worse, under a researcher’s desk. You may notice that size of the dataset is not represented on this graph. Datasets in the long tail of data are not determined by size but for the most part, folks dealing w/ big data know what to do with it and are managing it, which is not always the case for folks dealing with smaller datasets. There are exceptions, of course.
  • #75 Sooo…. Why have I bored you with these definitions? Because this stuff matters now. Research data is beginning to be recognized as a valid AND valued piece of scholarly output. And it doesn’t just matter to the researchers and data geeks. It matters at the institutional level.
  • #76 Earlier this year the Digital Preservation Network (or DPN) was formed and the UW became a member with the support of President Young – and while not focused solely on data, its mission is to “ensure long-term preservation of the complete digital scholarly record”. Not just ejournal articles and ebooks, the COMPLETE digital scholarly record and data certainly is part of that. There’s not much at the website yet but give it time… it’s still relatively new. And if you’re wondering why the Libraries should care… look at that last part of the mission statement again. Take out “digital” and isn’t that why academic libraries were born?
  • #77 Let’s drill down a little and talk about it at the departmental level. How many saw this headline in June? This isn’t the actual Seattle Times headline which also ran an article about this but I liked the title of this one better. The Computer Sciences Dept pulled off a major coup by recruiting three very highly regarded faculty from Carnegie Mellon, University of Pennsylvania and Stanford. And look at their areas of research. Now you might be thinking… Okay, that’s Computer Sciences. Duh… of course they’d be interested in data. But check this out.
  • #78 These are all positions posted here in the last month or so. I ran out of room to include the other iSchool position recently posted. They’re looking to hire a full-time lecturer and one of the areas of interest is data management. I want to point out that these are coming from different disciplines… data isn’t just in the sciences. In fact, I just got back from a week-long workshop held by THE largest social science repository in the world. Ask Deb Raftus & Ann Lally about the Digital Humanities Summer Institute they attended. And that brings up another point… it’s not just the subject librarians who need to be aware of this stuff. You’ve heard Theo and what he’s been up to. Ask ILL staff about the requests for datasets they’ve been getting. Think of Special Collections and the data that comes with the transfer of materials to the University Archives. What I’m trying to say is…
  • #79 Data is recognized as a valuable scholarly output Therefore we need to learn and teach how to manage, preserve and provide access to it. And if we don’t? Everybody is going to try to figure out a way to do it themselves and just for themselves. To put it in perspective, imagine if every department on campus was maintaining the books and journals in their own subject areas. This is what we’re supposed to do… it’s why we’re here. And we’re all going to have to take those skills we’ve always used to do those same things we’ve always done for traditional scholarly outputs and learn how to adapt them to meet scholarly data needs. Okay… now that I’ve said that, some of you may be thinking…
  • #80 It was all I could do not to add sound effects to this one.
  • #81 Relax. Really… it’s not. And I’m not just saying that because I love this stuff. I’m saying that because… #1. This is all still relatively new for EVERYONE, even those of us on the Data Services Team. Even the folks I’ve been seeing at workshops and conferences. I see a lot of the same people there because we’re all still trying to learn about it. So even if you know a little, you know a lot. #2. Most of you probably know more than you think you do because conceptually it’s not ALL that different than what we’ve been trained to do already. Trust me. It’s information management… and who does that better than we do? #3. Finally, the Data Services Team has put together some resources to help and that’s what I’m going to show you now.
  • #82 We have put a few tools in place to help you out. Matt already mentioned these and the first two have been in place for about a year and a half now. I want to go into a little more detail on the last two items. Just this week we’ve gone “live” with a new resource to help library staff and researchers with questions about data management and that’s the Data Management Guide (affectionately known as the DMG) (Quick demo) Finally, this is more for researchers than it is for library staff but it’s a service we now provide and we finally have it up and running so will start marketing it and you might get questions about it. Back in January, we managed to negotiate a consortial deal for a service called EZID. There are some details about it which I’ll send out at a later date, but in a nutshell, it’s a service that’s specifically designed to register permanent identifiers (DOIs for example) for datasets. The cool part about this is that EZID has been negotiating with Thomson Reuters to get them to index datasets registered through EZID and include them in Web of Knowledge databases. That means it could be possible to link articles with the datasets on which they are based and it would be possible to track the use and citation of a dataset. It’s all pretty darn cool. I have an info sheet linked at the bottom but it is more easily accessible through the DMG on the citation page you just saw. That’s stuff we have in place now… now on to what we’re working on…
  • #83 First piece of news, Data Services, along with Geospatial Services is moving under the ResearchWorks umbrella – This is not a change in organizational structure, just a shift in how RW is defined. Think of it as a suite of services that support research, not confined to services provided by Digital Initiatives. RDMS – with help from Steve Hiller and a fabulous Directed Fieldwork Student from the iSchool, pilot tested the survey and follow-up interviews this Spring. Plan to distribute more widely in Fall. Use results to identify gaps in service and direct efforts for collaboration and support in the future. Curriculum – We had hoped to get some of this done this summer but it didn’t happen so now it’s a project for the coming year. We’ve been talking with Lauren about working with the Graduate School to provide data mgmt workshops in the Research Commons this coming year. Would also like to come up with workshops on data mgmt best practices for faculty. And last but so not least, we want to create a curriculum specifically for library staff so you know how to support the data mgmt needs of the UW community, whether that’s helping faculty w/ data mgmt needs or providing instruction in data literacy to classes with data-related assignments. Beginning conversations w/ Ann Lally and Judith Henchy to come up with a Digital Humanities speaker series. Health Sciences, particularly the RML has been rocking and rolling with fabulous events like hosting Dr. Jian Qin from Syracuse for an eScience workshop and currently they are working on an upcoming inter-regional eScience forum w/ University of Utah. We’d like to add to that with more expert speakers in various areas of data management and services. Comm Plan: Matt mentioned the email list which a few of you have joined. He also mentioned that we have a blog. I haven’t publicized that because I’m still trying to feel out the scope of that. What I’d really like to do is figure out the best methods for getting info out to everyone but also come up with ways for you to keep us informed of issues and trends related to data services in your work. Finally, Cynthia Fugate, Tania Bardyn and I attended the capstone event for the ARL/DLF eScience Institute in January. And from that and work was done with Maureen Nolan and Theo prior to that, we came back with a draft strategic agenda. I thought it was a great framework for a long-term planning document for Data Services so I’d like to revisit that this year. If anyone is interested in working on any of this stuff, please, come talk with me.
  • #84 Looking a little further ahead … We plan to continue to keep our eyes and ears open to more opportunities for collaboration We want to keep apprised of any data initiatives taking place on campus I showed you that list of new faculty coming, we want to reach out to them and anyone else that we learn is “doing data” on campus to see what services they need to support their work. There always seems to be new developments & tools available SciVal researcher profile and networking tool, Office of Research subscribed and is populating with information now We need to learn more about it and how we can use it to enhance the services we provide Digital Humanities & institutional repositories. Again, we can use your help in letting us know of any of these areas that need our attention. This all may sound like a lot (at least it does to me) but a key thing to remember…
  • #85 We’ve got partners out there who have specifically said they would like to work with us. However… we do need to be proactive. Try to make connections when and where we can so we don’t get left behind.
  • #86 Questions?