Brief overview of open data, big data, and shared data; discussion followed (based on Alistair Croll's presentation at ALA). Robin Fay @georgiawebgurl; Peter Murray (Lyrasis)
Shared Data & Big Data for Libraries
1. Shared Data: What it Means
for the Future of Libraries
Robin Fay @georgiawebgurl
Head, DBM/Cataloging / UGA Libraries
Peter Murray
Lyrasis
Draft Content for Discussion group 05.01.2013 / robinfay
2. Agenda
• Overview of big data
▫ What is big data? What is shared data?
▫ Implications and challenges
▫ Background: Alistair Croll talk at ALA Midwinter
http://www.youtube.com/watch?v=Ic_BlPesEls
• Discussion framed around Alistair’s
presentation topics
Draft Content for Discussion group 04.30.2013
3. How did our data get big?
• Technology that has unforeseen consequences
• Technology changes.
• We leave digital trails wherever we go.
• Think: internet browsing history, email,
medical records, bank transactions, buying
history at shopping sites, Amazon reviews,
Facebook photos, comments on websites, and
much more.
4. How did our data get big?
• “Collectively, the data
that we leave behind
is Big Data.”
• And of course, there
is the data that
others (people and
machines) create
about us.
• Big Data is about us
and has far-reaching
consequences.
5. What is Big Data?
• It is not a technology
– it is a shift in how we
view and use
information
• Taking large amounts of
information spread
across many different
resources in different
formats and making them
explorable
• It doesn’t have to be
that big, “just bigger
than what you can go
through by hand”
6. 3 attributes of Big Data
• Large (volume)
• Fast (velocity;
manual time needed)
• Unstructured
(variety; formats
differ)
= the 3 Vs of Big
Data
7. Big Data
• Relational (relationships) database - our ILS systems
are often relational databases
• Mathematical database – computations
• Big Data is the intersection of the two
• Health – analyzing health records to identify allergies,
sickness, etc.
• Philanthropy (DataKind) – analyzing the behavior of
farmers and knowledge workers to evaluate the impact
(ROI) of philanthropic work
• Think about potential for library use: we have patron
data, bibliographic data and more!
8. Concerns of Big Data
• Privacy – Big Data can erode privacy, potentially leaking
private information
• It can be used to justify stereotypes (data can be misused
or used in a negative way) and polarize social groups
• Facebook Open Graph search – pulling together
information from diverse sources to get lists of
seemingly innocent things, such as movie-watching
habits or music, can be used in negative ways to
reinforce stereotypes or draw conclusions about
people
• “Personalization can look like prejudice”
• We live in grey areas
• Computers do not understand that
9. Which side of the fence?
• Big Data is going to change our lives!
• Are you:
• A semantic idealist? If we can taxonomize
and organize it, we can make sense of it
▫ Wolfram Alpha – we can ask it and it will reason
(mathematical)
• A chaotic nihilist? Algorithms will handle it –
correct data will bubble up given enough information
▫ Watson – doesn’t know the answers but will analyze to
interpret an answer
10. So, how would you file a cup of coffee?
• Depends upon how you will use
the information!
• Our understandings do not yet take
advantage of digital information,
which slows semantic idealism –
much information is not organized,
so we have to rely on algorithms (for
now), but that approach is vulnerable.
• Tagging is often done by
machines – even in libraries we
batch load, harvest, update data
globally.
11. Humans and technology
• Our reasoning can be flawed – we make
decisions in evolutionary ways – we look at
simple correlations and patterns (false
positives)
• If comments after a post are highly
negative, responders are more likely to
take polarizing viewpoints
• Even when the math is good, the data can be
wrong
12. Shared data
• We are a mosaic of data from other resources
• Unified digital history – a record of all of our
data; it could aggregate health information and
share it with doctors (just one example)
• Veracity (can verify) and Value (how we can
make sense of our data)
• Shared data: connecting networks will collect
data; algorithms will tag and assign metadata
but it will be up to humans to add value - this
can then be shared in ways that are useful
13. Linked data makes it possible
• Linked data keeps us from having to re-enter or
copy information
It makes data:
• reusable
• easy to correct (correct one record instead of
multiples)
• efficient
• and potentially useful to others
14. Linked data makes it possible
• It can build relationships in different ways,
allowing us to create temporary collections (a
user could organize their search results in a way
that makes sense to them) or more permanent
ones (collocating ALL works by a particular author
more easily; pulling together photographs more
easily)
• It can help make sense of Big Data and
facilitate sharing data.
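
As a rough illustration of the two linked data slides above, here is a minimal Python sketch. It is not taken from the presentation; the identifiers, record shapes, and helper names (`authors`, `works`, `display`, `works_by`) are all hypothetical. It assumes each work stores a link to an author record rather than a copied name string, so a correction is made once and collocation is a simple lookup.

```python
from dataclasses import dataclass


@dataclass
class Author:
    name: str


@dataclass
class Work:
    title: str
    author_id: str  # a link to an author record, not a copied name


# Hypothetical records for illustration; the author name contains a typo.
authors = {"auth:1": Author(name="Twain, Marc")}
works = [
    Work("Adventures of Huckleberry Finn", "auth:1"),
    Work("The Adventures of Tom Sawyer", "auth:1"),
]


def display(work: Work) -> str:
    """Build a display line by following the link to the author record."""
    return f"{work.title} / {authors[work.author_id].name}"


def works_by(author_id: str) -> list[str]:
    """Collocate all titles that link to the same author record."""
    return [w.title for w in works if w.author_id == author_id]


# Correct one record instead of multiples: every linked work picks it up.
authors["auth:1"].name = "Twain, Mark"

for work in works:
    print(display(work))
```

Because the name lives in one place, both display lines show the corrected form after a single edit, and `works_by("auth:1")` pulls the author's works together without matching on name strings.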
16. Thinking of data in the library environment
• Automation and new technologies
• The web has changed
• Large scale bibliographic databases
• User expectations and needs
• Patron data
• Cooperative cataloging
• Greater variety of media in library collections
(electronic!)
• FRBR is our data model – semantic web
friendly!
17. Discussion points
1. Obviously, WorldCat is a shared data resource
we have all been using for years. What are
some other examples of big data, shared data,
or linked data that libraries use now?
2. What are some examples of data that
libraries could share that we aren't sharing
already?
3. What are some of the pitfalls of data sharing
on a massive scale?
Editor's Notes
Did we get the internet exactly wrong? We started out using wired connections for local communications and satellite for long distance; we now rely on satellites and wifi for long distance.