4. Agenda
• Overview of big data
• What is big data? What is shared data?
• Implications and challenges
• Discussion
5. How did our data get big?
• Technology that has unforeseen consequences
• Technology changes
• We leave digital trails wherever we go
• Think: internet browsing history, email, medical
records, bank transactions, buying history at
shopping sites, Amazon reviews, Facebook
photos, comments on websites, and much more
6. How did our data get big?
• “Collectively the data
that we leave behind
is Big Data”
• And, of course, there
is the data that others
(people and
machines) create
about us
• Big Data is about us
and has far reaching
consequences
7. What is Big Data?
• It is not a technology –
it is a shift in how we
view and use information
• Taking large amounts of
information spread
across many different
resources in different
formats and making them
explorable
• It doesn’t have to be
“that big, just bigger than
what you can go through
by hand”
8. 3 attributes of Big Data
• Large
• Fast (manual
time needed)
• and unstructured
(formats differ)
= the 3 Vs of Big Data (volume, velocity, variety)
9. Big Data
• Relational (relationships) database - our ILS systems are often
relational databases
• Mathematical database – computations
• Big Data is the intersection of the two
• Health – analyzing health records to identify allergies, sickness, etc.
• Philanthropy (DataKind) – analyze behavior of farmers and
knowledge workers to evaluate the impact (ROI) of philanthropic
work
• Think about potential for library use: we have patron data,
bibliographic data and more!
10. Concerns: Big Data
• Privacy – erodes privacy, potentially leaking private
information
• Can justify stereotypes (data can be misused or used in a
negative way) and polarize social groups
• Facebook Open Graph search – pulling together information
from diverse sources to get lists of seemingly innocent
things, such as movie-watching habits or music, can be used in
negative ways to reinforce stereotypes or draw conclusions
about people
• “Personalization can look like prejudice”
• We live in grey areas
• Computers do not understand that
11. Which side of the fence?
• Big Data is going to change our lives!
• Are you
• A semantic idealist? If we can “taxonomize” and
organize it, we can make sense of it
– Wolfram Alpha – we can ask it and it will reason
(mathematical)
• A chaotic nihilist? Algorithms will handle it – correct
data will bubble up given enough information
– Watson – doesn’t know answers but will analyze to
interpret answer
12. So, how would you file a cup of coffee?
• Depends upon how you will use the
information!
• Understandings do not take
advantage of digital information
which slows semantic idealism –
much information is not organized, so
we have to rely on algorithms (for now)
but it is vulnerable.
• Tagging is often done by machines
– even in libraries we batch load,
harvest, update data globally.
13. Humans and technology
• Our reasoning can be flawed - we make decisions
evolutionarily – we look at simple correlations and
patterns (false positives)
• If comments after a post are highly negative,
responders are more likely to take polarizing
viewpoints
• Even when math is good, data can be wrong
14. Shared data
• We are a mosaic of data from other resources
• Unified digital history – a record of all of our data that could
aggregate health information and share it with doctors – just
one example
• Veracity (can verify) and Value (how we can make sense of
our data)
• Shared data: connecting networks will collect data;
algorithms will tag and assign metadata but it will be up to
humans to add value - this can then be shared in ways that
are useful
15. Linked data makes it possible
• Linked data keeps us from having to re-enter or
copy information
It makes data:
• reusable
• easy to correct (correct one record instead of
multiples)
• efficient
• and potentially useful to others
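
A minimal Python sketch of the "correct one record instead of multiples" point. The identifiers (`author:1`, `book:1`, `book:2`) and field names are hypothetical, invented for illustration; real linked data would typically use URIs and a shared vocabulary rather than these short strings.

```python
# Hypothetical authority record: the author's name is stored once.
# The name is deliberately misspelled so it can be corrected below.
authors = {"author:1": {"name": "Jane Austin"}}

# Bibliographic records link to the author by identifier
# instead of copying the name into every record.
books = [
    {"id": "book:1", "title": "Emma", "author": "author:1"},
    {"id": "book:2", "title": "Persuasion", "author": "author:1"},
]


def display(book):
    """Resolve the author link at display time."""
    return f'{book["title"]} / {authors[book["author"]]["name"]}'


# One correction to the shared record...
authors["author:1"]["name"] = "Jane Austen"

# ...is reflected in every record that links to it.
lines = [display(b) for b in books]
print(lines)
```

Because the name lives in one place, no book record had to be edited; the copy-the-name-everywhere alternative would have needed one fix per record.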
16. Linked data makes it possible
• It can build relationships in different ways -
allowing us to create temporary collections (a user
could organize their search results in a way that
makes sense to them) or more permanent ones
(collocating ALL works by a particular author more
easily; pulling together photographs more easily)
• It can help make sense of Big Data and facilitate
sharing data
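
As a rough illustration of building relationships in different ways, the Python sketch below groups the same records once by creator (a more permanent collocation) and once in an arbitrary user-chosen order (a temporary collection). All identifiers, field names, and the `collocate` helper are assumptions made up for this example, not part of any library system.

```python
from collections import defaultdict

# Hypothetical records; identifiers and field names are invented.
records = [
    {"id": "work:1", "title": "Emma", "creator": "person:austen"},
    {"id": "work:2", "title": "Persuasion", "creator": "person:austen"},
    {"id": "work:3", "title": "Middlemarch", "creator": "person:eliot"},
]


def collocate(records, link):
    """Group record titles by the identifier stored under `link`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[link]].append(rec["title"])
    return dict(groups)


# More permanent grouping: all works by each creator.
by_creator = collocate(records, "creator")

# Temporary collection: a user arranges the same records their own way,
# here simply sorted by title in reverse alphabetical order.
my_view = sorted((r["title"] for r in records), reverse=True)

print(by_creator)
print(my_view)
```

Neither view copies or changes the underlying records; each is just a different walk over the same links.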
17. Linked data makes it possible
• Linked data keeps us from having to re-enter or
copy information
It makes data:
• reusable
• easy to correct (correct one record instead of
multiples)
• efficient
• and potentially useful to others
18. Thinking of data in the library environment
• Automation and new technologies
• The web has changed
• Large scale bibliographic databases
• User expectations and needs
• Patron data
• Cooperative cataloging
• Greater variety of media in library collections (electronic!)
• FRBR is our data model – semantic web friendly!
19. Discussion points
• Obviously, WorldCat is a shared data resource we
have all been using for years. What are some other
examples of big data, shared data, or linked data
that libraries use now?
• What are some examples of data that libraries
could share that we aren't sharing already?
• What are some of the pitfalls of data sharing on a
massive scale?