This document discusses how digital collections are now considered data rather than just records or content. It notes that researchers want to analyze entire collections as data sets rather than individual records. Large digital collections like web archives, historic newspapers, and Twitter archives contain billions of records that researchers want to query, analyze, and visualize as data. Institutions are collaborating through groups like the National Digital Stewardship Alliance and developing open source tools like ViewShare to support access to and preservation of these "big data" collections.
Leslie Johnston Keynote, Best Practices Exchange 2011
6. We Have Data in our Libraries,
Archives and Museums?
Yes.
Data is not just generated by
satellites, identified during
experiments, or collected
during surveys.
7. Datasets are not just scientific and business
tables and spreadsheets: our collections are
now considered data.
They are the building blocks for interpretation
and discovery, which transform and combine
them into entities that we may not recognize.
8. More and more researchers want to use
collections as a whole, mining and organizing
the information in novel ways.
Researchers use algorithms to mine the rich
information and tools to create pictures that
translate that information into knowledge.
Researchers may want to interact with a
collection of artifacts, or they may want to
work with a data corpus.
9. Consider the Digging Into Data
Challenge
The repositories available for research include not only
scientific information—astronomy, geology, physics, biology,
social science surveys—but images, film, sound,
newspapers, maps, art, archaeology, architecture and
government records.
http://www.diggingintodata.org/
10. What Constitutes “Big Data”?
The definition of Big Data is fluid. It is a moving
target (whatever cannot be easily manipulated with
common tools) and it is specific to the organization:
what any one institution can manage and steward in
its own infrastructure. A data set that one researcher
or organization considers large is small to another.
Not too long ago, an organization would be surprised to
need 10 TB of storage for a large digital collection. Now a
collection can increase by 10 TB in a single week.
11. We still have collections. But what we also
have is Big Data, which requires us to rethink
the infrastructure that is needed to support
Big Data services. Our community used to
expect researchers to come to us, ask us
questions about our collections, and use our
digital collections in our environment.
Now our collections are, more often than not,
self-serve.
12. Case Study: Web Archives
• Web Archives, such as the one at the
Library of Congress, may comprise
billions of files.
• When we began archiving election web
sites, we imagined users browsing
through the web pages, studying the
graphics or use of phrases or links. Our
first researchers at the Library were
interested in all of those topics, but
they used scripts to query for them and
sort the results into categories. They
had little interest in reading the web
pages themselves.
http://www.loc.gov/webarchiving/
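The scripted approach those researchers took can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the page records, the URLs, and the category phrases are hypothetical, and a real web archive would be read from archive files or an index rather than from a list in memory.

```python
from collections import defaultdict

# Hypothetical records standing in for text extracted from archived pages.
PAGES = [
    {"url": "http://example.org/candidate-a/health",
     "text": "Our plan for health care and hospitals."},
    {"url": "http://example.org/candidate-b/jobs",
     "text": "Jobs, wages, and the economy come first."},
    {"url": "http://example.org/candidate-a/news",
     "text": "Town hall on the economy and health care."},
    {"url": "http://example.org/candidate-c/about",
     "text": "Meet the candidate and the family."},
]

# Hypothetical categories, each defined by a few trigger phrases.
CATEGORIES = {
    "health": ["health care", "hospital"],
    "economy": ["economy", "jobs", "wages"],
}


def categorize(pages, categories):
    """Return {category: [urls]}; a page may land in several categories."""
    result = defaultdict(list)
    for page in pages:
        text = page["text"].lower()
        matched = False
        for name, phrases in categories.items():
            if any(phrase in text for phrase in phrases):
                result[name].append(page["url"])
                matched = True
        if not matched:
            result["uncategorized"].append(page["url"])
    return dict(result)


by_category = categorize(PAGES, CATEGORIES)
for name, urls in sorted(by_category.items()):
    print(name, len(urls))
```

The point of the sketch is the shape of the work: the researcher never opens a page, only queries the text and sorts the hits.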
13. Case Study: Historic Newspapers
• The Chronicling America collection
has over 4 million OCR'd page images
of historic newspapers from
organizations in 25 states.
• The site gets approximately 4 million
views per day.
• Some researchers want to search
for stories in historic newspapers.
• Some researchers want to mine
newspaper OCR for trends across
time periods and geographic areas.
• Requests have come in to analyze
all 4 million page images.
http://chroniclingamerica.loc.gov/
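Mining OCR for trends across time and place amounts to counting terms per year and per state. The sketch below assumes hypothetical page records (year, state, OCR text) held in memory; it does not use the Chronicling America site or any of its interfaces, and real OCR would need cleanup that is omitted here.

```python
import re
from collections import Counter

# Hypothetical page records: (year, state, OCR text).
PAGES = [
    (1898, "Ohio", "War declared. The war news reached the city by telegraph."),
    (1898, "Texas", "Cotton prices fall; little war news today."),
    (1905, "Ohio", "The new telephone exchange opens. Telegraph rates drop."),
    (1905, "Texas", "Telephone lines extended to the county seat."),
]


def term_counts(pages, term):
    """Count whole-word, case-insensitive occurrences of term per (year, state)."""
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    counts = Counter()
    for year, state, text in pages:
        counts[(year, state)] += len(pattern.findall(text.lower()))
    return counts


war = term_counts(PAGES, "war")
telephone = term_counts(PAGES, "telephone")

for key in sorted(war):
    print(key, "war:", war[key], "telephone:", telephone[key])
```

Grouping by the (year, state) pair keeps both dimensions available, so the same counts can be rolled up by period or by region afterwards.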
14. Case Study: Twitter
• The Twitter archive holds tens of billions
of tweets.
• Research requests have included users
looking for their own Twitter history, the
study of the geographic spread of news,
the study of the spread of epidemics,
and the study of the transmission of
new uses of language.
[Slide graphic, keywords: social science, visualization, social media status, events, personal, privacy, commercial]
15. Can each of our organizations support real-
time querying of billions of full-text
items? Can we provide tools for collection
analysis and visualization? Can we support
the frequent downloading by researchers of
collections that may be over 200 TB each?
These are among the questions that all of our
institutions are grappling with as we build
large digital collections and discover new
ways in which they can be used.
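A rough calculation shows why downloading a 200 TB collection is a real infrastructure question. The sketch assumes decimal terabytes, a sustained line rate with no protocol overhead, and two illustrative link speeds; none of these figures come from the talk.

```python
def transfer_days(size_tb, gigabits_per_second):
    """Days needed to move size_tb terabytes at a sustained line rate."""
    bits = size_tb * 1e12 * 8                      # decimal terabytes -> bits
    seconds = bits / (gigabits_per_second * 1e9)   # bits / (bits per second)
    return seconds / 86400                         # seconds -> days


for rate in (1, 10):
    days = transfer_days(200, rate)
    print(f"200 TB at {rate} Gbit/s: {days:.1f} days")
```

At a sustained 1 Gbit/s, a single 200 TB download takes roughly 18.5 days, so "frequent downloading" by many researchers quickly becomes a capacity-planning problem.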
16. So what are our
institutions doing
about preservation
and access to our
Big Collections and
Big Data?
17. Collaboration
www.digitalpreservation.gov/ndsa
The National Digital Stewardship Alliance is an
initiative of the National Digital Information
Infrastructure and Preservation Program at the
Library of Congress, with almost 100 member
organizations that share a sense of dedication to
digital preservation, and want to work
collaboratively across the community.
The NDSA operates through five working groups:
Content; Standards and Practices; Infrastructure;
Innovation; and Outreach.
18. Tool Development
All stewardship organizations can and should
participate in the development and use of open
access tools for use across the community.
NDIIPP is revising its Tools and Services
Directory to include a broader range of projects,
some of which are always looking for other
organizations to contribute to!
http://www.digitalpreservation.gov/partners/resources/tools
19. As an Example…
Seeing and Sharing Digital Cultural
Heritage Collections Differently
with ViewShare/Recollection
20. bigish ideas
› heterogeneous data
› one big distributed collection
› open distributed infrastructure
› mindset: records -> data
23. the ViewShare idea
digital cultural heritage collections
include temporal, locative, and
categorical data that could be
tapped to interact with those
collections more dynamically and
understand them better.
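One way to picture that idea is to treat the temporal, locative, and categorical fields as facets and count records along each. This is a sketch under stated assumptions, not a description of how ViewShare works: the records and field names are hypothetical.

```python
from collections import Counter

# Hypothetical records with a temporal, a locative, and a categorical field.
RECORDS = [
    {"title": "Main Street, looking north", "year": 1912,
     "place": "Dayton, OH", "type": "photograph"},
    {"title": "Flood relief map", "year": 1913,
     "place": "Dayton, OH", "type": "map"},
    {"title": "Harbor at dusk", "year": 1924,
     "place": "Galveston, TX", "type": "photograph"},
]


def facet(records, key_fn):
    """Count records by whatever value key_fn extracts from each one."""
    return Counter(key_fn(record) for record in records)


by_decade = facet(RECORDS, lambda r: r["year"] // 10 * 10)  # temporal
by_place = facet(RECORDS, lambda r: r["place"])             # locative
by_type = facet(RECORDS, lambda r: r["type"])               # categorical

print(dict(by_decade))
print(dict(by_place))
print(dict(by_type))
```

Counts like these are the raw material for a timeline, a map, or a chart of the same collection.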
24. the challenges
› we all have different kinds of
metadata
› that data is in different kinds of
systems
› much of that data is messy
› much of that data is not in the
format we might wish it was
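The challenges above usually mean a normalization step before any shared view can be built. The following is a minimal sketch, assuming hypothetical source field names and a deliberately crude rule for dates (take the first four-digit year); real collections would need a richer mapping.

```python
import re

# Hypothetical mapping from source-specific field names to one shared name.
FIELD_ALIASES = {
    "title": "title", "Title": "title", "dc:title": "title",
    "date": "date", "Date Created": "date", "dc:date": "date",
    "coverage": "place", "Location": "place", "dc:coverage": "place",
}


def extract_year(value):
    """Pull the first four-digit year (1500-2099) out of a messy date, or None."""
    match = re.search(r"\b(1[5-9]\d\d|20\d\d)\b", str(value))
    return int(match.group(1)) if match else None


def normalize(record):
    """Rename known fields, trim whitespace, and derive a numeric year."""
    out = {}
    for field, value in record.items():
        name = FIELD_ALIASES.get(field)
        if name is None:
            continue  # unknown fields are dropped in this sketch
        out[name] = value.strip() if isinstance(value, str) else value
    out["year"] = extract_year(out.get("date", ""))
    return out


# Three hypothetical records from three differently structured systems.
SOURCES = [
    {"Title": " Flood relief map ", "Date Created": "ca. 1913",
     "Location": "Dayton, OH"},
    {"dc:title": "Harbor at dusk", "dc:date": "1924-06-01",
     "dc:coverage": "Galveston, TX"},
    {"title": "Untitled", "date": "n.d."},
]

normalized = [normalize(record) for record in SOURCES]
for row in normalized:
    print(row)
```

Records that cannot be parsed keep a year of None rather than a guessed value, so messy data stays visible instead of being silently "fixed".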
39. share data and views
share not only the end results, but
also the raw data for others to
create their own views.
data use and re-use
40. recent work
› support for public/private views and data
› beta support for OAI and ContentDM data
loading
› full open source release on SourceForge:
http://sourceforge.net/projects/loc-recollect/
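For readers unfamiliar with OAI data loading, here is a hedged sketch of what harvesting over OAI-PMH looks like, assuming "OAI" on the slide refers to that protocol. It is not ViewShare's loader: the base URL and record are hypothetical, the sample response is trimmed (a real one also carries responseDate and request elements), and nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 responses and by Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Trimmed, hypothetical response body for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Flood relief map</dc:title>
          <dc:date>1913</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, title, and date from each record in a response."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iterfind(".//oai:record", NS):
        rows.append({
            "id": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
            "date": record.findtext(".//dc:date", namespaces=NS),
        })
    return rows


url = list_records_url("http://example.org/oai")  # hypothetical endpoint
rows = parse_records(SAMPLE)
print(url)
print(rows)
```

A full harvester would also follow resumptionToken elements to page through large result sets; that is left out to keep the sketch short.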
41. what’s next?
› viewshare.org public launch on
November 1, 2011
› big data sets: in a while
› remix across data sets: long view
42. contact us
› Let us know if you are interested in
participation in the NDSA through the web
site
› Let us know if there is a tool or service that
is missing from our directory
› visit http://recollection.zepheira.com/ to get
a sneak peek at ViewShare
› email NDIIPPaccess@loc.gov if you are
interested in a ViewShare account