May 2, 2013
Shared Data:
What it Means for the Future of
Libraries
Peter Murray,
LYRASIS Digital Technology Services
Robin Fay,
Head, DBM/Cataloging
University of Georgia Libraries
Agenda
• Overview of big data
• What is big data? What is shared data?
• Implications and challenges
• Discussion
How did our data get big?
• Technology that has unforeseen consequences
• Technology changes
• We leave digital trails wherever we go
• Think: internet browsing history, email, medical
records, bank transactions, buying history at
shopping sites, Amazon reviews, Facebook
photos, comments on websites, and much more
How did our data get big?
• “Collectively the data
that we leave behind
is Big Data”
• And of course, there
is the data that others
(people and
machines) create
about us
• Big Data is about us
and has far-reaching
consequences
What is Big Data?
• It is not a technology –
it is a shift in how we
view and use information
• Taking large amounts of
information spread
across many different
resources in different
formats and making them
explorable
• It doesn’t have to be
“that big, just bigger than
what you can go through
by hand”
3 attributes of Big Data
• Large (volume)
• Fast (velocity; too much
manual time needed)
• Unstructured (variety;
formats differ)
= the 3 Vs of Big Data
Big Data
• Relational (relationships) database - our integrated library systems (ILSs) are often
relational databases
• Mathematical database – computations
• Big Data is the intersection of the two
• Health – analyzing health records to identify allergies, sickness, etc.
• Philanthropy (DataKind) – analyzing the behavior of farmers and
knowledge workers to evaluate the impact (ROI) of philanthropic
work
• Think about the potential for library use: we have patron data,
bibliographic data, and more!
Concerns: Big Data
• Privacy – Big Data can erode privacy, potentially leaking private
information
• Stereotypes – data can be misused to justify stereotypes
and polarize social groups
• Facebook open graph search – pulling together seemingly innocent
information from diverse sources, such as movie-watching habits or
music tastes, can be used in negative ways to reinforce stereotypes
or draw conclusions about people
• “Personalization can look like prejudice”
• We live in grey areas
• Computers do not understand that
Which side of the fence?
• Big Data is going to change our lives!
• Are you:
• A semantic idealist? If we can “taxonomize” and
organize it, we can make sense of it
– Wolfram Alpha – we can ask it and it will reason
(mathematical)
• A chaotic nihilist? Algorithms will handle it – correct
data will bubble up given enough information
– Watson – doesn’t know the answers but will analyze the data to
interpret an answer
So, how would you file a cup of coffee?
• Depends upon how you will use the
information!
• Our understandings do not take
advantage of digital information,
which slows semantic idealism –
much information is not organized, so
we have to rely on algorithms (for now),
but that approach is vulnerable.
• Tagging is often done by machines
– even in libraries we batch load,
harvest, update data globally.
Humans and technology
• Our reasoning can be flawed - we make decisions
evolutionarily – we look at simple correlations and
patterns (false positives)
• If comments after a post are highly negative,
responders are more likely to take polarizing
viewpoints
• Even when math is good, data can be wrong
Shared data
• We are a mosaic of data from other resources
• Unified digital history – a record of all of our data that could,
for example, aggregate health information and share it with doctors
• Veracity (can verify) and Value (how we can make sense of
our data)
• Shared data: connecting networks will collect data;
algorithms will tag and assign metadata, but it will be up to
humans to add value - this can then be shared in ways that
are useful
Linked data makes it possible
• Linked data keeps us from having to re-enter or
copy information
It makes data:
• reusable
• easy to correct (correct one record instead of
multiples)
• efficient
• and potentially useful to others
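The "correct one record instead of multiples" point can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory catalog in which bibliographic records point to an author by identifier rather than copying the name. The identifiers, field names, and the display_author helper are illustrative assumptions, not any library system's actual schema.

```python
# Hypothetical authority records, keyed by a made-up identifier.
authors = {
    "author:1": {"name": "Twain, Marc"},  # deliberate typo, corrected below
}

# Bibliographic records link to the author by identifier instead of
# re-entering the name in every record.
books = [
    {"title": "Adventures of Huckleberry Finn", "author_id": "author:1"},
    {"title": "The Adventures of Tom Sawyer", "author_id": "author:1"},
]


def display_author(book):
    """Resolve the link at display time, so every record sees the same name."""
    return authors[book["author_id"]]["name"]


# Correct the name once, in the single shared record.
authors["author:1"]["name"] = "Twain, Mark"

# Both book records now show the corrected name without being edited.
names = [display_author(book) for book in books]
print(names)
```

Because the name lives in one place, the correction reaches every record that links to it; a design that copied the name into each record would need one fix per copy.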
Linked data makes it possible
• It can build relationships in different ways -
allowing us to create temporary collections (a user
could organize their search results in a way that
makes sense to them) or more permanent ones
(collocating ALL works by a particular author more
easily; pulling together photographs more easily)
• It can help make sense of Big Data and facilitate
sharing data
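Collocating all works by a particular author can be sketched as a grouping over linked statements. This is only an illustration, assuming data expressed as simple (subject, predicate, object) statements; the identifiers, predicate names, and the titles_by_creator helper are made up for the example and do not follow any particular vocabulary.

```python
from collections import defaultdict

# Illustrative (subject, predicate, object) statements; identifiers are made up.
triples = [
    ("work:1", "title", "Adventures of Huckleberry Finn"),
    ("work:1", "creator", "person:twain"),
    ("work:2", "title", "The Adventures of Tom Sawyer"),
    ("work:2", "creator", "person:twain"),
    ("work:3", "title", "Little Women"),
    ("work:3", "creator", "person:alcott"),
]


def titles_by_creator(statements):
    """Collocate work titles under the creator each work links to."""
    titles = {s: o for s, p, o in statements if p == "title"}
    grouped = defaultdict(list)
    for s, p, o in statements:
        if p == "creator":
            grouped[o].append(titles[s])
    return dict(grouped)


collections = titles_by_creator(triples)
print(collections["person:twain"])
```

The same statements could be regrouped by any other shared link (subject, format, date) to build a temporary collection, without copying or re-entering the underlying records.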
Thinking of data in the library environment
• Automation and new technologies
• The web has changed
• Large-scale bibliographic databases
• User expectations and needs
• Patron data
• Cooperative cataloging
• Greater variety of media in library collections (electronic!)
• FRBR is our data model – semantic web friendly!
Discussion points
• Obviously, WorldCat is a shared data resource we
have all been using for years. What are some other
examples of big data, shared data, or linked data
that libraries use now?
• What are some examples of data that libraries
could share that we aren't sharing already?
• What are some of the pitfalls of data sharing on a
massive scale?
Thank you!
• Our speakers
• You!
• Questions?
• russell.palmer@lyrasis.org
