WorldCat as Big Data in
Library and Information
Centers
Date: 16-03-2017
BY
ANJAIAH MOTHUKURI
Assistant Professor
Dept. of Library and Information
Science
DRAVIDIAN UNIVERSITY-KUPPAM
9908694950
E-mail: anjaiahlib@gmail.com
LAYOUT OF THE PAPER
 Introduction
 Meaning of Big Data
 Definitions of Big Data
 Concept of Big Data & History
 The Term Big Data
 Characteristics of Big Data & the Hadoop
Applications
 WorldCat-Meaning
 OCLC-Role in WorldCat
 Conclusion
 Suggestions
INTRODUCTION
 Big data is one of the most popular terms these days.
Hospitals, manufacturers, colleges and universities,
banks, retailers and governments are all collecting
so-called “big data”. Libraries are doing it too. The
ultimate goal is to use these data to provide new,
useful services or to improve efficiency.
 Since 2012, nearly every sector has developed a
fascination with the seemingly new discovery of Big
Data and its unprecedented capabilities to fuel analytic
breakthroughs.
 Big data refers to extremely large data sets that may be
analysed computationally to reveal patterns, trends, and
associations, especially relating to human behaviour and
interactions.
INTRODUCTION
 The term “Big Data” is increasingly used almost
everywhere on the planet, online and offline.
 It is not related to computers only. It comes
under the blanket term Information
Technology, which is now part of almost all other
technologies, fields of study and
businesses. Big Data is not a big deal.
 It is clear that the use of Big Data as an
information resource will continue to become
more prevalent as it is employed in academic
research and data-driven decision making, and
even emerges as a vehicle for government
transparency.
Meaning of Big Data:
Big data really does mean
big data: it is a collection
of large datasets that
cannot be processed
using traditional
computing techniques.
Big data is not merely
data; it has
become a complete
subject, which involves
various tools, techniques
and frameworks.
The Term “Big Data” & Definition
 The term ‘data’ is not new to us. It is one of the first
things taught when you opt for Information Technology
and computers.
 If you recall, data is considered the raw form of
information. Though it has been around for a decade, the
term Big Data is a buzzword these days.
 As the term suggests, Big Data is loads and loads of data,
and it can be processed in different ways using
different methods and tools to obtain the required
information.
 Big Data is also defined as the innovative techniques and
technologies used to capture, store, distribute, manage and
analyze datasets that traditional data management
methods are unable to handle.
 Doug Laney, a pioneer in the field of data warehousing
Concept of Big Data &
History
 Big data was first defined by Laney in 2001.
 The term Big Data has launched a veritable
industry of processes, personnel and technology
to support what appears to be an exploding new
field.
 Giant companies like Amazon and Wal-Mart, as
well as bodies such as the U.S. government and
NASA, are using Big Data to meet their business
and/or strategic objectives.
 Big data can also play a role for small or medium-sized
companies and organizations that recognize
the possibilities to capitalize upon the gains.
Concept of Big Data & History (continued)
 In August 2013, Mark
van Rijmenam added
"veracity, variability,
visualization, and value" to
the definition, broadening
the realm even further.
Van Rijmenam stated: "90% of all
data ever created was
created in the past two
years. From now on, the
amount of data in the world
will double every two years."
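The doubling claim quoted above can be turned into a small calculation. What follows is only an illustrative sketch: the function name `growth_factor` and the sample horizons of 2, 4 and 10 years are assumptions chosen for the example, not figures from the source.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Return how many times larger the data volume is after `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical horizons, picked only to illustrate the arithmetic.
    for years in (2, 4, 10):
        print(f"after {years} years: {growth_factor(years):g} times the starting volume")
```

Under this assumption the volume grows geometrically rather than linearly: five doubling periods (ten years) multiply the starting volume by 2 to the power 5.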
Big Data: Its Characteristics (3Vs/5Vs)
 According to Rob Kitchin, the characteristics are:
 volume (managing extremely large and growing sources;
it’s called “big” for a reason),
 velocity (created in real time or close to it),
 variety (capturing many kinds of data, both
structured and unstructured),
 exhaustive (trying to capture entire populations or
systems),
 fine-grained (extremely detailed),
 relational (connectable to other datasets),
 flexible.
VOLUME:
 As the size of collection volumes and the
number of collection attributes increase,
big data techniques could allow us to extract
more rapidly, and subsequently analyze,
patterns buried in the data.
 The so-called “big data” in libraries could be
used in many ways, such as improving
usability and helping users find the
patterns of interest to them.
 In general, the data stored in a library can certainly
be classified as large: it holds hundreds of
years of collections, tens of small research
data sets, and the data captured while
users use the library.
VELOCITY:
 The velocity characteristic of big data can also be
found in library data.
 Libraries maintain multiple copies of files on servers
and on tape, in geographically distributed locations.
Files therefore move between and
within organizations.
 More and more research is under way, and
the research data it produces join the dataset
dynamically.
 At the same time, library data need to be
processed fast so that researchers can put them to
good use and ordinary users receive the search
results they need right away.
VARIETY:
 In general, libraries contain different types of
data: books, journals, reports, notes, maps,
films, pictures, audio recordings, etc.
 Some are unstructured. Unstructured data
consists of language-based data (e.g., notes,
Twitter messages, books) and non-language-based
data (e.g., pictures, slides, audio,
videos).
 Even digital research data come in every
imaginable shape and form, from scans of
historical negative photographs to digital
microscope images of unicellular organisms
taken hundreds at a time at varying depths of
Why is Big Data so Hot Right Now?
Need & Use of Big Data
Due to the advent of new
technologies, devices, and
means of communication such as social
networking sites, the amount of data
produced by mankind is growing
rapidly every year.
The amount of data produced
from the beginning of time until 2003
was 5 billion gigabytes.
 The same amount was created
every two days in 2011, and every
ten minutes in 2013. This rate is still
growing enormously.
 Though all this information
is meaningful and can be useful
when processed, it is being
neglected.
 90% of the world’s data was
generated in the last few years.
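The figures quoted on this slide imply daily production rates that are easy to work out. The sketch below only restates that arithmetic; the names `BASELINE_GB` and `gb_per_day` are assumptions introduced for the example, and the inputs are the slide's own numbers taken at face value.

```python
BASELINE_GB = 5_000_000_000  # 5 billion gigabytes, the amount quoted for "until 2003"
MINUTES_PER_DAY = 24 * 60


def gb_per_day(interval_minutes: float, amount_gb: float = BASELINE_GB) -> float:
    """Daily data production if `amount_gb` is created once every `interval_minutes`."""
    return amount_gb * MINUTES_PER_DAY / interval_minutes


rate_2011 = gb_per_day(2 * MINUTES_PER_DAY)  # the same amount every two days
rate_2013 = gb_per_day(10)                   # the same amount every ten minutes

if __name__ == "__main__":
    print(f"2011: {rate_2011:,.0f} GB per day")
    print(f"2013: {rate_2013:,.0f} GB per day")
    print(f"increase from 2011 to 2013: {rate_2013 / rate_2011:g} times")
```

Read this way, the quoted figures correspond to 2.5 billion gigabytes per day in 2011 and 720 billion gigabytes per day in 2013, a 288-fold increase in two years.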
Forms of Big Data
Big Data Ecosystems
OCLC Headquarters, Ohio, USA
BIG DATA:
WorldCat as Big Data for Library
and Information Centers
14-03-2017
Online Computer Library Center
(OCLC)
 OCLC was founded in 1967 as the Ohio College
Library Center.
 The Online Computer Library Center (OCLC) is
a US-based non-profit cooperative
organization "dedicated to the public purposes of
furthering access to the world's information and
reducing information costs".
 OCLC and its member libraries cooperatively
produce and maintain WorldCat, the largest
online public access catalogue (OPAC) in the world.
OCLC (continued)
 OCLC is funded mainly by the fees that libraries
have to pay for its services (around $200 million
annually as of 2016).
 OCLC libraries collectively steward a vast
quantity of knowledge. Working together, we
make this information more visible and
accessible to end users.
 This sharing of ideas creates connections both
inside and outside the library community.
 It unites thinkers and doers around common
purposes. And it helps researchers and
learners achieve their goals by putting the
world’s knowledge in reach.
NATIONAL LIBRARIES AT GLOBAL
LEVEL
BIG DATA-Various forms
WorldCat as Big Data
 WorldCat-Meaning:
 WorldCat is the world's largest network of
library content and services. WorldCat libraries
are dedicated to providing access to their
resources on the Web,
where most people start their search for
information.
 WorldCat is the world’s most comprehensive
database of information about library
collections.
 Libraries co-operatively contribute, enhance
and share bibliographic data through
WorldCat, connecting people to cultural and
scholarly resources in libraries worldwide.
Rich Collections of WorldCat
 WorldCat is a union catalog that itemises the
collections of 72,000 libraries in 170 countries and
territories that participate in the Online Computer
Library Center (OCLC) global cooperative.
 It is operated by OCLC Online Computer Library
Center, Inc. The subscribing member libraries
collectively maintain WorldCat's database.
 Library collections are closely tied to
linked data, which forms a larger web of big data.
The British Library studied the linked data of library
collections and tried to model the people, events and
places related to holdings in the library.
 A library could also collect data on how users
search and use library data, and such data
could certainly reach a volume similar to that of
Twitter and others.
WorldCat: Available Products
& Services on the Web
 WorldCat Discovery Services
 WorldShare Management Services
 WorldShare Metadata Services
 WorldShare Interlibrary Loan
 OCLC Cataloging Subscription
 EZproxy
 Dewey Services
 ILLiad
 CONTENTdm
TYPES OF LIBRARIES: WorldCat
 Libraries of all types from all
over the world contribute to
the quantity and quality of
WorldCat records, so the
records shared here
represent many diverse
interests.
 Every library, museum or
archive that contributes
metadata to WorldCat,
including through a group,
receives the membership
benefits of the OCLC
cooperative.
Academic & National Libraries
 Academic libraries support students and
faculty with specialized research on a wide
variety of topics. They contribute records to
WorldCat for these resources and their unique
holdings, such as dissertations, theses,
published research papers and often the data
sets that support that research.
 National libraries all over the world share their
collections through WorldCat. This allows
libraries everywhere to connect people with
information about many cultures and national
identities.
Public & Special Libraries
 Public libraries form the centerpiece of their
communities by providing a wide variety of
services and by archiving local history and
genealogical resources. By cataloging their
materials in WorldCat, public libraries connect
people around the world with resources for job
searches, school science projects, book clubs,
cooking and many other topics.
 Special libraries support distinct organizations,
such as a government office, church, corporation,
hospital, museum or research center. These
libraries contribute incredibly deep collections to
WorldCat on very specific topics that are
HOW TO WORK WITH BIG DATA IN LIBRARIES
 Work on big data in libraries also exists,
because library data need to be transformed into
information or knowledge that users can then
use.
 Bell explored the issues and possibilities of
big data in libraries.
 Parry studied how colleges are using big data to
help students choose classes, to retain students, and
to provide necessary advising.
 Government initiatives on big data for
libraries, and their impact on library collections,
have been discussed by Schwartz.
OCLC Quality Team:
OCLC staff improve WorldCat every day
 500 IT professionals work at OCLC across a
variety of programming environments, systems
responsibilities and product portfolios.
 Staff members with 30+ years of technology
expertise work alongside new professionals, all focused
on delivering excellence.
 The WorldCat Quality Team maintains and
monitors Duplicate Detection and Resolution
(DDR) software, which scans existing WorldCat
records to identify and merge duplicates.
 Records merged annually by the WorldCat Quality
Team: 668,074 (July 2015–June 2016)
 Duplicates removed by DDR software since May
2009: 21,485,921 (as of February 2017)
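To make the idea of duplicate detection and merging concrete, here is a minimal sketch. It is not OCLC's DDR algorithm, whose matching rules are not described in this presentation. The record fields (`title`, `author`, `year`, `holdings`), the functions `match_key` and `merge_duplicates`, and the sample records are all hypothetical, chosen only to show the general pattern of normalising records, grouping those that match, and merging each group.

```python
import re
from collections import defaultdict


def match_key(record: dict) -> tuple:
    """Build a crude comparison key from title, author and year.
    Lowercasing and stripping punctuation hides trivial differences."""
    def norm(value) -> str:
        return re.sub(r"[^a-z0-9]+", " ", str(value).lower()).strip()

    return (norm(record.get("title", "")),
            norm(record.get("author", "")),
            norm(record.get("year", "")))


def merge_duplicates(records: list) -> list:
    """Group records that share a match key and merge each group into one.
    The first record in a group survives; holdings of all members are combined."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    merged = []
    for group in groups.values():
        survivor = dict(group[0])
        holdings = set()
        for record in group:
            holdings.update(record.get("holdings", []))
        survivor["holdings"] = sorted(holdings)
        merged.append(survivor)
    return merged


if __name__ == "__main__":
    # Hypothetical sample records, not taken from WorldCat.
    sample = [
        {"title": "Library Automation", "author": "Rao, K.", "year": 2010, "holdings": ["LIB-A"]},
        {"title": "library automation.", "author": "RAO, K", "year": "2010", "holdings": ["LIB-B"]},
        {"title": "Digital Libraries", "author": "Devi, S.", "year": 2012, "holdings": ["LIB-A"]},
    ]
    for record in merge_duplicates(sample):
        print(record["title"], record["holdings"])
```

A production system would need far more careful matching (editions, formats, transliteration, identifiers), so an exact-key comparison like this one should be read only as the simplest possible instance of the technique.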
BIG DATA IN LIBRARIES (continued)
 Affelt described how traditional library skill sets
could match up to the needs of data analysis and
discussed big data technology for libraries and how
librarians could use it.
 Reinhalter and Wittmann mentioned that
librarians could fill a service gap by enforcing
standards and best practices in the big-data era
because they could create trustworthy data
repositories for researchers.
 ProQuest used big data technology to understand
the behavior of library users, such as how they
perform searches. They mentioned their
work could help to develop some search services
CONCLUSION
 We live in an era of Big Data, in which we are
able to collect and analyze data at a speed and
scale that is unprecedented.
 Academic libraries face many new challenges in
an era of Big Data. They will be called upon to
support the use and preservation of data as an
increasingly valuable piece of our knowledge
ecosystem, which will require developing new
library programs and skill sets.
 Big Data is very useful to users as well as
to administrators in evolving policies for the
development of the nation.
SUGGESTIONS
 As we know well, in this Information Age, Digital
Age or Tech Age, library and information
centers play a pivotal role in every field of
knowledge.
 So governments, and in India especially the
central Government, should take immediate
steps to digitize all types of library
resources from all libraries, create a Big
Data base, sign MoUs with all Indian
and some reputed international vendors, and
acquire current as well as needed material, as
in Western countries.
 Then our nation will become the
strongest country in the world.
