From Records to Data:
It’s Not Just About
Collections Any More




       Leslie Johnston, Library of Congress
              Best Practices Exchange 2011
What are the Biggest
   Insights that we have
 Learned in Fifteen Years of
Building Digital Collections?
Researchers do not use digital
collections the same way that
 they use analog collections
We Can Never Guess Every
Way that Our Collections Will
          Be Used
Stewardship organizations
have, until recently, spoken of
“collections” or “content” or
“records” or even “files,” but
not data.
We Have Data in our Libraries,
  Archives and Museums?

            Yes.

Data is not just generated by
 satellites, identified during
  experiments, or collected
       during surveys.
Datasets are not just scientific and business
tables and spreadsheets: our collections are
now considered data.

They are the building blocks for interpretation
and discovery that transform and combine
them into entities that we may not recognize.
More and more researchers want to use
collections as a whole, mining and organizing
the information in novel ways.

Researchers use algorithms to mine the rich
information and tools to create pictures that
translate that information into knowledge.

Researchers may want to interact with a
collection of artifacts, or they may want to
work with a data corpus.
Consider the Digging Into Data
Challenge
The repositories available for research include not only
scientific information—astronomy, geology, physics, biology,
social science surveys—but images, film, sound,
newspapers, maps, art, archaeology, architecture and
government records.


               http://www.diggingintodata.org/
What Constitutes “Big Data”?
The definition of Big Data is very fluid, as it is a moving
target — what cannot be easily manipulated with common
tools — and specific to the organization: what can be
managed and stewarded by any one institution in its
infrastructure. One researcher or organization’s concept of
a large data set is small to another.

Not too long ago, an organization would be surprised to
need 10 TB of storage for a large digital collection. Now a
collection can increase by 10 TB in a single week.
We still have collections. But what we also
have is Big Data, which requires us to rethink
the infrastructure that is needed to support
Big Data services. Our community used to
expect researchers to come to us, ask us
questions about our collections, and use our
digital collections in our environment.

Now our collections are, more often than not,
self-serve.
Case Study: Web Archives
          •   Web Archives, such as the one at the
              Library of Congress, may comprise
              billions of files.
          •   When we began archiving election web
              sites, we imagined users browsing
              through the web pages, studying the
              graphics or use of phrases or links.
              When our first researchers came to the
              Library, they did want to know about all
              those topics, but they used scripts to
              query for them and sort them into
              categories. They were not much
              interested in reading web pages.

               http://www.loc.gov/webarchiving/
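
The kind of script those first researchers brought might look something like the sketch below. It is only an illustration: the category names, keywords, and sample pages are invented here, and it assumes the text has already been extracted from the archived pages.

```python
from collections import defaultdict

# Hypothetical categories and keywords; a real study would define its own.
CATEGORY_KEYWORDS = {
    "economy": ["jobs", "taxes", "budget"],
    "health": ["health care", "medicare"],
}


def categorize(pages):
    """Group page URLs under every category whose keywords appear in the text."""
    by_category = defaultdict(list)
    for url, text in pages:
        lowered = text.lower()
        for category, keywords in CATEGORY_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                by_category[category].append(url)
    return dict(by_category)


# Invented sample records standing in for text extracted from archived pages.
sample_pages = [
    ("http://example.org/a", "Our plan creates jobs and lowers taxes."),
    ("http://example.org/b", "Protecting Medicare for seniors."),
    ("http://example.org/c", "Meet the candidate."),
]
result = categorize(sample_pages)
```

Nobody reads a page in this workflow: the collection is queried and sorted as a whole.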
Case Study: Historic Newspapers
               •   The Chronicling America collection
                   has over 4 million page images from
                   historic newspapers with OCR from
                   organizations in 25 states.
               •   The site gets approximately 4 million
                   views per day.
               •   Some researchers want to search
                   for stories in historic newspapers.
               •   Some researchers want to mine
                   newspaper OCR for trends across
                   time periods and geographic areas.
               •   Requests have come in to analyze
                   all 4 million page images.

                   http://chroniclingamerica.loc.gov/
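
As a rough sketch of what mining OCR for trends across time can mean, the example below counts how often a term appears per publication year. The input shape (an issue date plus OCR text for each page) and the sample records are assumptions made for illustration, not the Chronicling America data format.

```python
from collections import Counter


def term_counts_by_year(pages, term):
    """Count occurrences of `term` in OCR text, grouped by publication year."""
    counts = Counter()
    needle = term.lower()
    for date_issued, ocr_text in pages:
        year = date_issued[:4]  # assumes dates are written as YYYY-MM-DD
        counts[year] += ocr_text.lower().count(needle)
    return dict(sorted(counts.items()))


# Invented sample pages: (issue date, OCR text).
sample = [
    ("1898-02-16", "Influenza reported in the city."),
    ("1918-10-05", "Influenza closes schools; influenza cases rise."),
    ("1918-11-12", "Armistice signed."),
]
trend = term_counts_by_year(sample, "influenza")
```

Run across millions of pages, the same loop becomes an infrastructure question rather than a search question.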
Case Study: Twitter
       •   The Twitter archive has tens of billions
           of tweets in it.
       •   Research requests have included users
           looking for their own Twitter history, the
           study of the geographic spread of news,
           the study of the spread of epidemics,
           and the study of the transmission of
           new uses of language.
[Figure: word cloud of terms: social, science, visualization, social media, status, events, personal, privacy, commercial]
Can each of our organizations support real-
time querying of billions of full-text
items? Can we provide tools for collection
analysis and visualization? Can we support
the frequent downloading by researchers of
collections that may be over 200 TB each?

These are among the questions that all of our
institutions are grappling with as we build
large digital collections and discover new
ways in which they can be used.
So what are our
institutions doing
about preservation
and access to our
Big Collections and
Big Data?
Collaboration
                             www.digitalpreservation.gov/ndsa


The National Digital Stewardship Alliance is an
initiative of the National Digital Information
Infrastructure and Preservation Program at the
Library of Congress, with almost 100 member
organizations that share a sense of dedication to
digital preservation, and want to work
collaboratively across the community.

The NDSA operates through five working groups:
Content; Standards and Practices; Infrastructure;
Innovation; and Outreach.
Tool Development

All stewardship organizations can and should
participate in the development and use of open
access tools for use across the community.

NDIIPP is revising its Tools and Services
Directory to include a broader range of projects,
some of which are always looking for other
organizations to contribute!

http://www.digitalpreservation.gov/partners/resources/tools
As an Example…

Seeing and Sharing Digital Cultural
Heritage Collections Differently
with ViewShare/Recollection
biggish ideas

› heterogeneous data
› one big distributed collection
› open distributed infrastructure
› mindset: records -> data
Beyond thinking
like records
to thinking
like data
the ViewShare idea
digital cultural heritage collections
include temporal, locative, and
categorical data that could be
tapped to better dynamically
interact with and understand those
collections.
the challenges
› we all have different kinds of
metadata
› that data is in different kinds of
systems
› much of that data is messy
› much of that data is not in the
format we might wish it was
what
ViewShare
does
take this
or this
and make…
ingest collection
descriptions from
spreadsheets, MODS
records, or Atom and
RSS feeds
Augment: derive
ISO dates,
latitude and
longitude
coordinates, and
break apart
data
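
To make the augmentation step concrete, here is a minimal sketch of two of these derivations: normalizing messy date strings to ISO dates and breaking apart a multi-valued field. It is not ViewShare's code; the function names and the list of accepted date formats are assumptions, and deriving latitude and longitude is left out because it depends on an external geocoding service.

```python
from datetime import datetime

# Assumed input patterns; real collections will need more.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y"]


def to_iso_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    cleaned = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def split_values(raw, delimiter=";"):
    """Break a multi-valued field into a list of trimmed, non-empty values."""
    return [part.strip() for part in raw.split(delimiter) if part.strip()]
```

Values that match no known format come back as None rather than a guess, so the messy records stay visible for review.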
design views:
graphical interface
for assembling
views
publish views on the site or embed
views with one line of javascript into
any HTML document.
visually review data
share data and views
share not only the end results, but
also the raw data for others to
create their own views.

data use and re-use
recent work
› support for public/private views and data
› beta support for OAI and ContentDM data
loading
› full open source release on SourceForge:
http://sourceforge.net/projects/loc-recollect/
what’s next?
› viewshare.org public launch on
November 1, 2011
› big data sets: in a while
› remix across data sets: long view
contact us
› Let us know through the web site if you
are interested in participating in the
NDSA
› Let us know if there is a tool or service that
is missing from our directory
› visit http://recollection.zepheira.com/ to get
a sneak peek at ViewShare
› email NDIIPPaccess@loc.gov if you are
interested in a ViewShare account
Questions?




             Leslie Johnston
             lesliej@loc.gov

Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 

Leslie Johnston Keynote, Best Practices Exchange 2011

  • 1. From Records to Data: It’s Not Just About Collections Any More Leslie Johnston, Library of Congress Best Practices Exchange 2011
  • 2. What are the Biggest Insights that we have Learned in Fifteen Years of Building Digital Collections?
  • 3. Researchers do not use digital collections the same way that they use analog collections
  • 4. We Can Never Guess Every Way that Our Collections Will Be Used
  • 5. Stewardship organizations have, until recently, spoken of “collections” or “content” or “records” or even “files,” but not data.
  • 6. We Have Data in our Libraries, Archives and Museums? Yes. Data is not just generated by satellites, identified during experiments, or collected during surveys.
  • 7. Datasets are not just scientific and business tables and spreadsheets: our collections are now considered data. They are the building blocks for interpretation and discovery that transform and combine them into entities that we may not recognize.
  • 8. More and more researchers want to use collections as a whole, mining and organizing the information in novel ways. Researchers use algorithms to mine the rich information and tools to create pictures that translate that information into knowledge. Researchers may want to interact with a collection of artifacts, or they may want to work with a data corpus.
  • 9. Consider the Digging Into Data Challenge The repositories available for research include not only scientific information—astronomy, geology, physics, biology, social science surveys—but images, film, sound, newspapers, maps, art, archaeology, architecture and government records. http://www.diggingintodata.org/
  • 10. What Constitutes “Big Data?” The definition of Big Data is very fluid, as it is a moving target — what cannot be easily manipulated with common tools — and specific to the organization: what can be managed and stewarded by any one institution in its infrastructure. One researcher or organization’s concept of a large data set is small to another. Not too long ago, an organization would be surprised to need 10 TB of storage for a large digital collection. Now a collection can increase by 10 TB in a single week.
  • 11. We still have collections. But what we also have is Big Data, which requires us to rethink the infrastructure that is needed to support Big Data services. Our community used to expect researchers to come to us, ask us questions about our collections, and use our digital collections in our environment. Now our collections are, more often than not, self-serve.
  • 12. Case Study: Web Archives • Web Archives, such as the one at the Library of Congress, may comprise billions of files. • When we began archiving election web sites, we imagined users browsing through the web pages, studying the graphics or use of phrases or links. Our first researchers did want to know about all those topics, but they used scripts to query for them and sort them into categories. They were not much interested in reading web pages. http://www.loc.gov/webarchiving/
  • 13. Case Study: Historic Newspapers • The Chronicling America collection has over 4 million page images from historic newspapers with OCR from organizations in 25 states. • The site gets approximately 4 million views per day. • Some researchers want to search for stories in historic newspapers. • Some researchers want to mine newspaper OCR for trends across time periods and geographic areas. • Requests have come in to analyze all 4 million page images. http://chroniclingamerica.loc.gov/
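The trend-mining request mentioned above can be illustrated with a minimal sketch. It assumes the OCR text has already been exported as (year, state, text) records; the sample strings, the `PAGES` list, and the function name are invented for illustration and are not part of Chronicling America or any Library of Congress tool.

```python
import re
from collections import Counter

# Hypothetical sample records: (publication year, state, OCR text).
# Real OCR text would come from the collection; these strings are invented.
PAGES = [
    (1898, "NY", "War declared. The war news reached the city by telegraph."),
    (1898, "TX", "Cotton prices fell as war news spread."),
    (1905, "NY", "The telegraph office reported a quiet week."),
]


def term_counts_by_year(pages, term):
    """Count case-insensitive whole-word occurrences of `term` per year."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for year, _state, text in pages:
        # Adding zero still records the year, so gaps in usage stay visible.
        counts[year] += len(pattern.findall(text))
    return dict(counts)


if __name__ == "__main__":
    for year, count in sorted(term_counts_by_year(PAGES, "war").items()):
        print(year, count)
```

Grouping by the state field instead of the year would give the geographic view; at the scale of millions of pages the same counting would need to run as a batch job rather than in memory.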
  • 14. Case Study: Twitter • The Twitter archive has tens of billions of tweets in it. • Research requests have included users looking for their own Twitter history, the study of the geographic spread of news, the study of the spread of epidemics, and the study of the transmission of new uses of language. (Keywords from the slide graphic: social science visualization social media status events personal privacy commercial)
  • 15. Can each of our organizations support real-time querying of billions of full-text items? Can we provide tools for collection analysis and visualization? Can we support the frequent downloading by researchers of collections that may be over 200 TB each? These are among the questions that all of our institutions are grappling with as we build large digital collections and discover new ways in which they can be used.
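To give the 200 TB download question a sense of scale, here is a rough back-of-the-envelope sketch. It assumes decimal terabytes (10^12 bytes), a fully sustained link rate, and no protocol overhead; the link speeds are illustrative assumptions, not figures from the talk.

```python
def transfer_days(size_tb, link_gbps):
    """Days needed to move `size_tb` terabytes over a `link_gbps` Gbit/s link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gbit = 10**9 bits),
    a fully sustained transfer rate, and no protocol overhead.
    """
    bits = size_tb * 10**12 * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    for gbps in (0.1, 1, 10):  # illustrative link speeds only
        print(f"200 TB at {gbps} Gbit/s: {transfer_days(200, gbps):.1f} days")
```

Under these assumptions a single 200 TB download occupies a dedicated 1 Gbit/s link for roughly two and a half weeks, which is why frequent whole-collection downloads are an infrastructure question rather than a convenience feature.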
  • 16. So what are our institutions doing about preservation and access to our Big Collections and Big Data?
  • 17. Collaboration www.digitalpreservation.gov/ndsa The National Digital Stewardship Alliance is an initiative of the National Digital Information Infrastructure and Preservation Program at the Library of Congress, with almost 100 member organizations that share a sense of dedication to digital preservation, and want to work collaboratively across the community. The NDSA operates through five working groups: Content; Standards and Practices; Infrastructure; Innovation; and Outreach.
  • 18. Tool Development All stewardship organizations can and should participate in the development and use of open access tools for use across the community. NDIIPP is revising its Tools and Services Directory to include a broader range of projects, some of which are always looking for other organizations to contribute to! http://www.digitalpreservation.gov/partners/resources/tools
  • 19. As an Example… Seeing and Sharing Digital Cultural Heritage Collections Differently with ViewShare/Recollection
  • 20. bigish ideas › heterogeneous data › one big distributed collection › open distributed infrastructure › mindset: records -> data
  • 23. the ViewShare idea digital cultural heritage collections include temporal, locative, and categorical data that could be tapped to better dynamically interact with and understand those collections.
  • 24. the challenges › we all have different kinds of metadata › that data is in different kinds of systems › much of that data is messy › much of that data is not in the format we might wish it was
  • 33. ingest collection descriptions from spreadsheets, MODS records, or Atom and RSS
  • 34. Augment: derive ISO dates, latitude and longitude coordinates, and break apart data
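As a rough illustration of the augment step, the sketch below normalizes free-text dates to ISO 8601, splits a multi-valued field, and looks up coordinates. It is not ViewShare's implementation: the field names, the list of date formats (which assumes US month/day ordering and English month names), and the tiny `GAZETTEER` table standing in for a geocoding service are all assumptions made for this example.

```python
from datetime import datetime

# Assumed input formats; "%m/%d/%Y" presumes US month/day ordering.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y")

# Hypothetical lookup table standing in for a real geocoding service.
GAZETTEER = {"Washington, D.C.": (38.8951, -77.0364)}


def to_iso_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def split_values(raw, sep=";"):
    """Break a delimited field into a list of trimmed, non-empty values."""
    return [part.strip() for part in raw.split(sep) if part.strip()]


def augment(record):
    """Return a copy of `record` with derived date, subject, and location fields."""
    out = dict(record)
    out["date_iso"] = to_iso_date(record.get("date", ""))
    out["subjects"] = split_values(record.get("subjects", ""))
    out["lat_long"] = GAZETTEER.get(record.get("place", ""))
    return out


if __name__ == "__main__":
    print(augment({
        "date": "November 1, 2011",
        "subjects": "maps; newspapers ;; film",
        "place": "Washington, D.C.",
    }))
```

Values that cannot be parsed or located come back as `None` rather than a guess, which keeps messy source data visibly messy instead of silently wrong.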
  • 36. publish views on the site or embed views with one line of javascript into any HTML document.
  • 39. share data and views: share not only the end results, but also the raw data for others to create their own views. data use and re-use
  • 40. recent work › support for public/private views and data › beta support for OAI and CONTENTdm data loading › full open source release on SourceForge: http://sourceforge.net/projects/loc-recollect/
  • 41. what’s next? › viewshare.org public launch on November 1, 2011 › big data sets: in a while › remix across data sets: long view
  • 42. contact us › Let us know if you are interested in participation in the NDSA through the web site › Let us know if there is a tool or service that is missing from our directory › visit http://recollection.zepheira.com/ to get a sneak peek at ViewShare › email NDIIPPaccess@loc.gov if you are interested in a ViewShare account
  • 43. Questions? Leslie Johnston lesliej@loc.gov