Co-presented for the course INLS 720: Metadata Architectures and Applications at UNC SILS. Subsequently, we also presented at the February 2013 meeting of the UNC Scholarly Communications Working Group. This presentation covered copyright in the context of metadata re-use, plus two case studies (one examining Duke University Press and the other examining open bibliographic data).
Data Lakehouse, Data Mesh, and Data Fabric (r1) - James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do these terms mean, and how do they compare to a data warehouse? In this session I’ll cover each of them in detail, compare their pros and cons, and include use cases so you can see which approach will work best for your big data needs.
SQL Analytics Powering Telemetry Analysis at Comcast - Databricks
Comcast is one of the leading providers of communications, entertainment, and cable products and services. At the heart of this is Comcast’s RDK (Reference Design Kit), pre-bundled open-source firmware for a complete home platform covering video, broadband, and IoT devices, which provides the telemetry backbone for the industry. The RDK team at Comcast analyzes petabytes of data, collected every 15 minutes from 70 million video, broadband, and IoT devices installed in customer homes. The team runs ETL and aggregation pipelines and publishes analytical dashboards daily to reduce customer calls and inform firmware rollouts. The analysis is also used to calculate a WiFi happiness index, a critical KPI for Comcast customer experience.
In addition, the RDK team tracks releases by analyzing RDK firmware quality. SQL Analytics allows customers to operate a lakehouse architecture that provides data warehousing performance at data lake economics, with up to 4x better price/performance for SQL workloads than traditional cloud data warehouses.
We present the results of a “Test and Learn” with SQL Analytics and the Delta engine, conducted in partnership with the Databricks team. We include a quick demo introducing the native SQL interface, the challenges we faced with migration, the results of the execution, and our journey of productionizing this at scale.
Achieving Lakehouse Models with Spark 3.0 - Databricks
It’s very easy to be distracted by the latest and greatest approaches in technology, but sometimes there’s a reason old approaches stand the test of time. Star schemas and Kimball modelling aren’t going anywhere, but as we move towards the “Data Lakehouse” paradigm, how appropriate is this modelling technique, and how can we harness the Delta Engine and Spark 3.0 to maximise its performance?
Democratizing Data Quality Through a Centralized Platform - Databricks
Bad data leads to bad decisions and broken customer experiences. Organizations depend on complete and accurate data to power their business, maintain efficiency, and uphold customer trust. With thousands of datasets and pipelines running, how do we ensure that all data meets quality standards, and that expectations are clear between producers and consumers? Investing in shared, flexible components and practices for monitoring data health is crucial for a complex data organization to rapidly and effectively scale.
At Zillow, we built a centralized platform to meet our data quality needs across stakeholders. The platform is accessible to engineers, scientists, and analysts, and seamlessly integrates with existing data pipelines and data discovery tools. In this presentation, we will provide an overview of our platform’s capabilities, including:
Giving producers and consumers the ability to define and view data quality expectations using a self-service onboarding portal
Performing data quality validations using libraries built to work with Spark (see the sketch after this list)
Dynamically generating pipelines that can be abstracted away from users
Flagging data that doesn’t meet quality standards at the earliest stage and giving producers the opportunity to resolve issues before use by downstream consumers
Exposing data quality metrics alongside each dataset to provide producers and consumers with a comprehensive picture of health over time
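As a concrete illustration of the Spark-based validations mentioned in the list above, here is a minimal sketch of what one such check might look like. It is illustrative only: the dataset path, column names, and expectations are assumptions, not Zillow’s actual implementation.

```python
# Minimal sketch of a Spark data quality check in the spirit of the platform
# described above. The path, columns, and thresholds are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.read.parquet("s3://example-bucket/listings/")  # hypothetical dataset
total = df.count()

# Expectation 1: the primary key is never null.
null_ids = df.filter(F.col("listing_id").isNull()).count()

# Expectation 2: the primary key is unique.
distinct_ids = df.select("listing_id").distinct().count()

# Expectation 3: prices fall within a plausible range.
bad_prices = df.filter((F.col("price") <= 0) | (F.col("price") > 1e9)).count()

checks = {
    "listing_id_not_null": null_ids == 0,
    "listing_id_unique": distinct_ids == total,
    "price_in_range": bad_prices == 0,
}

# Fail at the earliest stage, so producers can resolve issues before
# downstream consumers are affected (as the bullets above describe).
failed = [name for name, ok in checks.items() if not ok]
if failed:
    raise ValueError(f"Data quality checks failed: {failed}")
```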
Modernizing to a Cloud Data Architecture - Databricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their successful migration of data and workloads to the cloud.
Building the Enterprise Data Lake - Important Considerations Before You Jump In - SnapLogic
In this webinar, learn from industry analyst and big data thought leader Mark Madsen about the future of big data and the importance of the new Enterprise Data Lake reference architecture.
This webinar also covers what’s important when building a modern, multi-use data infrastructure, the difference between a Hadoop application and a Data Lake infrastructure, and an enterprise data lake reference architecture to get you started.
To learn more, visit: www.snaplogic.com/big-data
Deep-dive into Microservices Patterns with Replication and Stream Analytics
Target Audience: Microservices and Data Architects
This is an informational presentation about microservices event patterns, GoldenGate event replication, and event stream processing with Oracle Stream Analytics. This session will discuss some of the challenges of working with data in a microservices architecture (MA), and how the emerging concept of a “Data Mesh” can go hand-in-hand to improve microservices-based data management patterns. You may have already heard about common microservices patterns like CQRS, Saga, Event Sourcing and Transaction Outbox; we’ll share how GoldenGate can simplify these patterns while also bringing stronger data consistency to your microservice integrations. We will also discuss how complex event processing (CEP) and stream processing can be used with event-driven MA for operational and analytical use cases.
Business pressures for modernization and digital transformation drive demand for rapid, flexible DevOps, which microservices address, but also for data-driven analytics, machine learning, and data lakes, which is where data management tech really shines. Join us for this presentation, where we take a deep look at the intersection of microservice design patterns and modern data integration tech.
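To make one of the patterns named above concrete, here is a minimal, technology-agnostic sketch of the Transaction Outbox pattern: the business write and its event record are committed in a single local transaction, and a separate relay (in the talk’s architecture, GoldenGate-style log-based replication would do the tailing) publishes each event downstream. The tables, topic name, and payload are illustrative assumptions.

```python
# Transaction Outbox sketch: write the business row and the event atomically,
# then let a relay publish from the outbox table. Uses sqlite3 purely so the
# example is self-contained and runnable.
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id TEXT PRIMARY KEY, topic TEXT, payload TEXT)")

def place_order(order_id: str) -> None:
    # One atomic transaction: either both rows land, or neither does.
    # This avoids the dual-write problem of updating a database and a
    # message broker separately.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "PLACED"))
        conn.execute(
            "INSERT INTO outbox VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "orders.placed", json.dumps({"order_id": order_id})),
        )

place_order("o-123")

# A relay process (or a CDC/replication tool) tails the outbox and publishes
# each event, giving consumers eventual consistency without dual writes.
for event_id, topic, payload in conn.execute("SELECT * FROM outbox"):
    print(f"publish {topic}: {payload}")
```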
A brief overview of and introduction to metadata, from how it is used on the web (including SEO and tagging) to its use in Flickr and library catalogs, by Robin Fay (georgiawebgurl@gmail.com).
NISO Webinar:
Experimenting with BIBFRAME: Reports from Early Adopters
About the Webinar
In May 2011, the Library of Congress officially launched a new modeling initiative, the Bibliographic Framework Initiative, as a linked data alternative to MARC. The Library then announced the proposed model, called BIBFRAME, in November 2012. Since then, the library world has been moving from mainly theorizing about the BIBFRAME model to practical experimentation and testing. This experimentation is iterative, and it continues to shape the model so that it becomes stable and broadly acceptable enough for adoption.
In this webinar, several institutions will share their progress in experimenting with BIBFRAME within their library systems. They will discuss existing, developing, and planned projects at their institutions, as well as the challenges and opportunities in exploring and implementing BIBFRAME.
Agenda
Introduction
Todd Carpenter, Executive Director, NISO
Experimental Mode: The National Library of Medicine and experiences with BIBFRAME
Nancy Fallgren, Metadata Specialist Librarian, National Library of Medicine, National Institutes of Health, US Department of Health and Human Services (DHHS)
Exploring BIBFRAME at a Small Academic Library
Jeremy Nelson, Metadata and Systems Librarian, Colorado College
Working with BIBFRAME for discovery and production: Linked Data for Libraries / Linked Data for Production
Nancy Lorimer, Head, Metadata Dept, Stanford University Libraries
Presentation on Data Mesh: the paradigm shift is a new type of ecosystem architecture, a “shift left” toward a modern distributed architecture that treats domain-specific data as a product (“data-as-a-product”) and enables each domain to handle its own data pipelines.
Exploring Machine Learning for Libraries and Archives: Present and Future - Bohyun Kim
A conference presentation given by Bohyun Kim, Chief Technology Officer & Professor, University of Rhode Island Libraries, USA, for Bite-sized Internet Librarian International 2021 on September 22, 2021.
Active Governance Across the Delta Lake with Alation - Databricks
Alation provides a single interface through which users and stewards can apply active, agile data governance across Databricks Delta Lake and the Databricks SQL Analytics service. Understand how Alation can expand adoption of the data lake while supporting safe and responsible data consumption.
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million - DataWorks Summit
A Fortune 100 company recently introduced Hadoop into their data warehouse environment and ETL workflow to save $30 million. This session examines the specific use case to illustrate the design considerations, as well as the economics, behind ETL offload with Hadoop. It will also reference how the Hadoop platform was leveraged to support extended analytics.
Data Lakes are meant to support many of the same analytics capabilities as Data Warehouses while overcoming some of their core problems. Yet Data Lakes have a distinctly different technology base. This webinar will provide an overview of the standard architecture components of Data Lakes.
This will include:
The Lab and the factory
The base environment for batch analytics
Critical governance components
Additional components necessary for real-time analytics and ingesting streaming data
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ... - Heiko Paulheim
In recent years, sophisticated methods for extracting knowledge graphs from Wikipedia, like DBpedia, YAGO, and CaLiGraph, have been developed. In this talk, I revisit some of these methods and examine whether and how they can be replaced by prompting a large language model like ChatGPT.
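As a rough illustration of the approach the talk examines, the sketch below asks a language model for subject-predicate-object triples. The prompt, the JSON output convention, and the call_llm stand-in are all assumptions for illustration; wire call_llm to whatever chat-completion client you have available.

```python
# Hypothetical sketch of LLM-based triple extraction from Wikipedia-style text.
import json

PROMPT = """Extract knowledge-graph triples from the text below.
Return only a JSON list of [subject, predicate, object] triples.

Text: {text}"""

def call_llm(prompt: str) -> str:
    # Placeholder: connect this to an actual chat-completion API of your choice.
    raise NotImplementedError

def extract_triples(text: str) -> list:
    raw = call_llm(PROMPT.format(text=text))
    return json.loads(raw)  # e.g. [["Berlin", "capitalOf", "Germany"]]

# Usage, once call_llm is implemented:
# extract_triples("Berlin is the capital and largest city of Germany.")
```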
Overview of the end-to-end lifecycle to productize and commercialize alternative datasets at S&P Global Market Intelligence.
Benefits to discuss:
How S&P Market Intelligence develops new alternative datasets
How S&P Market Intelligence develops robust production processes for alternative data
S&P Global Market Intelligence GTM strategy and capabilities to sell alternative data
Architecting Agile Data Applications for Scale - Databricks
Data analytics and reporting platforms have historically been rigid, monolithic, hard to change, and limited in their ability to scale up or down. I can’t tell you how many times I have heard a business user ask for something as simple as an additional column in a report, only for IT to say it will take six months to add because it doesn’t exist in the data warehouse. As a former DBA, I can tell you about the countless hours I have spent “tuning” SQL queries to hit pre-established SLAs. This talk covers how to architect modern data and analytics platforms in the cloud to support agility and scalability, including end-to-end data pipeline flow, data mesh and data catalogs, live and streaming data, advanced analytics, applying agile software development practices like CI/CD and testability to data applications, and taking advantage of the cloud for infinite scalability both up and down.
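To make the CI/CD and testability point concrete, here is a minimal sketch of one practice it implies: factoring a pipeline step into a pure function that a test suite can exercise before deployment. The transformation and column names are illustrative assumptions, not taken from the talk.

```python
# A pipeline transformation factored into a pure, unit-testable function.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

def add_price_per_sqft(df: DataFrame) -> DataFrame:
    # Pure transformation: no I/O, so CI can test it cheaply on tiny inputs.
    return df.withColumn("price_per_sqft", F.col("price") / F.col("sqft"))

def test_add_price_per_sqft() -> None:
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([(200000.0, 1000.0)], ["price", "sqft"])
    assert add_price_per_sqft(df).first()["price_per_sqft"] == 200.0

test_add_price_per_sqft()
```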
Data Catalog as the Platform for Data IntelligenceAlation
Data catalogs are in wide use today across hundreds of enterprises as a means to help data scientists and business analysts find and collaboratively analyze data. Over the past several years, customers have increasingly used data catalogs in applications beyond their search & discovery roots, addressing new use cases such as data governance, cloud data migration, and digital transformation. In this session, the founder and CEO of Alation will discuss the evolution of the data catalog, the many ways in which data catalogs are being used today, the importance of machine learning in data catalogs, and the future of the data catalog as a platform for a broad range of data intelligence solutions.
Security and Data Ownership in the Cloud
Andrew K. Pace, Executive Director, Networked Library Services, OCLC; Councilor-at-large, American Library Association
If Big Data is data that exceeds the processing capacity of conventional systems, thereby necessitating alternative processing measures, we are looking at an essentially technological challenge that IT managers are best equipped to address.
The DCC is currently working with 18 HEIs to support and develop their capabilities in the management of research data and, whilst the aforementioned challenge is not usually core to their expressed concerns, are there particular issues of curation inherent to Big Data that might force a different perspective?
We have some understanding of Big Data from our contacts in the Astronomy and High Energy Physics domains, and the scale and speed of development in Genomics data generation is well known, but the inability to provide sufficient processing capacity is not one of their more frequent complaints.
That’s not to say that Big Science and its Big Data are free of challenges in data curation; only that those challenges are shared with their lesser cousins, where one might say that the real challenge is less one of size than of diversity and complexity.
This brief presentation explores those aspects of data curation that go beyond the challenges of processing power but which may lend a broader perspective to the technology selection process.
CC Tools and Resources for Librarians and Libraries - Jane Park
Webinar I gave to librarians across the state of New York as part of NY3R (http://www.ny3rs.org/).
Recording from 2 May 2014: http://rrlc.adobeconnect.com/p3wrr1dlws0/.
Abstract:
Creative Commons is a librarian's best friend when it comes to explaining copyright, pointing others to free academic and educational resources, and highlighting reuse and attribution best practices. Learn about Creative Commons -- the organization and its mission; its copyright licenses; its public domain tools, especially CC0 (read CC Zero); how to discover, find and attribute CC-licensed content; and how to license your own content with a CC license. We will also go over a few of the major organizations and institutions that have adopted CC licensing.
Thinking about resource issues: copyright and open access - Allison Fullard
The presentation was given to an international group of public health academics from African and Asian countries who are preparing learning content for courses to be delivered in blended learning environments. It considers how copyright needs to be recalibrated for our circumstances in the 21st century. Two publicly shared video clips are embedded in the file.
This presentation ponders what ‘forever’ access to licensed resources means, both as intellectual property and as technological access. New initiatives such as Controlled Digital Lending (CDL) and Occam’s Reader are potential tools that work for the public good. While new initiatives can be exciting, the promise of perpetual access can be difficult to fulfill. Specific examples of how libraries and publishers have met, or failed to meet, license terms regarding perpetual access will be presented. How best to provide perpetual access to items outside of license agreements, such as Open Access journals and OER, will also be broached. We will examine how practical, economic, and culturally responsive library initiatives fit within the constraints and opportunities allowed under licensing, copyright, and staffing levels. Participants will be invited to consider whether perpetual access is a goal that is necessary, merely encouraged, or something else entirely.
Michelle Polchow, Electronic Resources Librarian, University of California, Davis
Rebecca Grant - Facilitating Connectivity: reducing copyright-related barrier... - dri_ireland
Facilitating Connectivity: reducing copyright-related barriers to sharing - a presentation by Rebecca Grant at the Pararchive conference, Connecting Communities: Storytelling and the Digital Archive, Leeds, 27th March 2015.
This paper focuses on the issues encountered by the DRI team regarding intellectual property, copyright, and licensing while building a repository that does not own the rights to the digital content it holds, and it presents some of the solutions put in place to address this challenge.
Building Pyramids: Creating Partnerships in Digital Scholarship - Chelcie Rowell
Both the Z. Smith Reynolds Library of Wake Forest University and the UNCG University Libraries have designed service models to support scholarly digital projects on their respective campuses. Both institutions are designing these new library services to be scalable and sustainable from the outset. Additionally, rather than only being involved at the endpoints of scholarship (providing inputs and preserving outputs), both institutions are positioning librarians to partner with faculty throughout the scholarship lifecycle.
Chelcie Juliet Rowell and Richard Cox will discuss the history of their distinct campus strategies, as well as the current state of their initiatives, including but not limited to their environment, goals, types of services offered, and outcomes. Past scholarly digital projects will serve as real-world examples, including mapping applications, research data mashups, and more. They will also touch upon both the expected challenges they face, as well as how their individual approaches to supporting scholarly research are applicable to other institutions.
Z. Smith Reynolds Library is working to implement an infrastructure to support digital projects at Wake Forest University that is responsive to the particularities of the university's context, mission, and size. The aim is to provide a solution that is both scalable and sustainable, from simple course blogs to custom web applications. Existing in parallel with Find.ZSR, our library's catalog, Build.ZSR is intended to convey that our library is not just a place to seek information resources, but also a place to construct new knowledge. We will share our process for conceptualizing and implementing Build.ZSR, as well as elicit thoughtful critique from other participants.
Building Library Exhibits with BiblioBoard Creator - Chelcie Rowell
A presentation at the meeting of the ACRL Digital Curation Interest Group at the 2015 ALA Annual Conference in San Francisco. Wake Forest University’s Special Collections and Archives employs a number of both commercial and open source technologies to preserve, curate, and display its digital collections. One that is relatively new and lesser known is BiblioBoard Creator, which aids in creating digital exhibits. This presentation will provide a brief overview of the platform and describe how WFU is using it to engage students in curatorial work, in the process generating increased interest in the university’s institutional history among both current students and alumni. Undergraduate students have been busy identifying hundreds of yearbook articles, issues of the student newspaper, and football programs. They are also uploading these items, creating descriptions, and organizing them into an online exhibit. The speaker will address things to consider when evaluating this and similar tools, including ease of use, intended audience, metadata input and output, presentation quality, mobile accessibility, and cost.
A Pond Feeding a Lake Feeding an Ocean: A DPLA Contributing Institution's Pe... - Chelcie Rowell
Chelcie Juliet Rowell will share Wake Forest University's perspective as a contributing institution to the DPLA via the North Carolina Digital Heritage Center Service Hub. Like many institutions, we are grappling with how to represent archival materials at the item-level as the DPLA data model requires. In addition, we are using participation in the DPLA as an opportunity to clean up our metadata. Borrowing the principle of iterative and incremental development from the agile software development community, we treat each monthly harvest as a four-week development cycle during which we identify and implement small but meaningful improvements to our metadata. A presentation at the Society of North Carolina Archivists 2014 Annual Conference.
DSpace for Digital Special Collections: The Wake Forest Experience - Chelcie Rowell
Z. Smith Reynolds Library (ZSR) embraces an open-source ethos for library technology. In support of that ethos, ZSR adopted DSpace as its institutional repository platform in 2009. Through WakeSpace, a DSpace instance, we provide access to digital special collections as well as Wake Forest University faculty and student scholarship. Although some functionality is supported out of the box, considerable staff time must be devoted to developing interface improvements. Chelcie Juliet Rowell, Digital Initiatives Librarian, will discuss the advantages and disadvantages of adopting DSpace, its impact on the ZSR community, and goals for future development. A presentation at the Society of North Carolina Archivists 2014 Annual Conference.
A Pond Feeding a Lake Feeding an Ocean: Wake Forest University as a Contribut... - Chelcie Rowell
Wake Forest University has begun contributing digital collections to the Digital Public Library of America (DPLA) via the North Carolina Digital Heritage Center Service Hub. Each month, the North Carolina Digital Heritage Center aggregates OAI-PMH feeds of digital collections of contributing North Carolina institutions, and the DPLA in turn harvests this aggregation. Wake Forest is using participation in the DPLA as an opportunity to assess and clean up its metadata. Borrowing the principle of iterative and incremental development from the agile software development community, each monthly harvest is treated as a four-week development cycle during which small but meaningful improvements to metadata are identified and implemented (e.g. revising rights statements or populating the dc.date.created field). In contrast to a model that delivers a finished product only at the end of a project timeline, this approach allows the organization to immediately reap the benefits of participation in the DPLA, such as increased referrals to digital materials from the DPLA site and API. A presentation at the Coalition for Networked Information 2014 Spring Membership Meeting.
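A monthly cleanup cycle like the one described could start from a pass over the OAI-PMH feed itself. Below is a minimal sketch, assuming a Dublin Core feed; the endpoint URL is hypothetical, and the checks (missing dc.date, missing rights statement) mirror the example improvements above. Pagination via resumptionToken is omitted for brevity.

```python
# Sketch: harvest an oai_dc feed and flag records missing dc.date or dc.rights.
import urllib.request
import xml.etree.ElementTree as ET

OAI_URL = "https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc"  # hypothetical
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

with urllib.request.urlopen(OAI_URL) as resp:
    tree = ET.parse(resp)

for record in tree.iter(f"{OAI}record"):
    identifier = record.findtext(f"{OAI}header/{OAI}identifier")
    metadata = record.find(f"{OAI}metadata")
    if metadata is None:
        continue  # e.g. a deleted record with no metadata payload
    if metadata.find(f".//{DC}date") is None:
        print(f"{identifier}: missing dc.date")
    if metadata.find(f".//{DC}rights") is None:
        print(f"{identifier}: missing dc.rights")
```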
Digital Preservation Policy at the Library of Congress
Metadata Ownership & Metadata Rights
1. Metadata Ownership & Metadata Rights
Introduction by Jane Greenberg
Tim Elfenbein, Will Midgeley, Emily Roscoe, Chelcie Rowell, & Jessica Wood
UNC Scholarly Communications Working Group
13 February 2013
2. Overview ➡ Will Midgley
(presented by Jane Greenberg)
Copyright 101 ➡ Emily Roscoe
Metadata & Policy ➡ Jessica Wood
(presented by Chelcie Rowell)
Case Study: Duke University Press ➡ Tim Elfenbein
Case Study: Open Bibliographic Data ➡ Chelcie Rowell
4. Introduction
Can metadata be considered a commodity, product, or intellectual creation, and if so, when?
Who owns metadata? What rights issues surround metadata? How, why, or when might rights issues apply?
6. What is copyrightable expression?
• Google PageRank?
• Netflix "suggestions for you" algorithm?
• WorldCat record?
• Flickr tags?
7. Metadata as Intellectual Property
• Intellectual property made possible through legal rights for creators of intellectual products: copyrights, patents, trademarks, trade secrets, etc.
Purpose: “to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries”
• Can’t own facts, however
8. Is Metadata Copyrightable?
• In Feist Publications v. Rural Telephone Service, the Supreme Court rejected the “sweat of the brow” doctrine
Result: collections of facts are not copyrightable; an intellectual product must contain a modicum of creativity for protection
The copyright section of the U.S. Code also states that “compilations” can be sufficiently original to secure copyright
9. Who Owns Metadata?
• Producers of metadata are often taxpayer-funded (e.g., the Library of Congress, individual libraries uploading their records to OCLC's WorldCat, informatics staff at government agencies and labs)
• Can metadata be “owned”? Is it more like a fact about the world, or a creative expression?
10. When Might Rights Issues Apply to Metadata?
SkyRiver v. OCLC
• SkyRiver is a for-profit company providing bibliographic services.
• It sued OCLC for overcharging libraries that switched to SkyRiver for cataloging
• SkyRiver’s catalog is made up of LoC records (public domain), British Library records, member library records, etc.
• The suit is mostly about pricing, but SkyRiver wants access to the OCLC database
11. When Might Rights Issues Apply to Metadata?
OCLC v. Library Hotel
• The Library Hotel (www.libraryhotel.com) used a Dewey Decimal Classification theme
• The DDC is owned (trademarked and partially copyrighted) by OCLC
• OCLC sued over trademark infringement
• The issue here was more about the metadata’s “brand” than the actual scheme itself
12. Drawbacks of Open Metadata
• "Loss of potential attribution" and "loss of potential income" (Oomen and Baltussen)
OCLC v. Library Hotel is an example of the first; SkyRiver v. OCLC is an example of the second
13. Benefits of Open Metadata
• Driving users to your content
• Stimulating collaboration
• Enabling new scholarship that can only be done with open data
• Allowing creation of new services for discovery
• "Increas[ing] relevance to digital society" (Oomen and Baltussen)
15. Provided Protections
• Title 17, United States Code (U.S.C.)
• Protects authors of “original works of authorship” (published and unpublished), granting them the exclusive right to do and to authorize others to do the following:
Reproduce
Distribute copies
Perform (audio works)
Display (visual works)
Prepare derivative works
16. Limitations on Author Rights
Exemptions from copyright liability
• Fair Use, weighed by four factors:
The purpose and character of the use
The nature of the copyrighted work
The amount and substantiality of the portion used in relation to the copyrighted work as a whole
The effect of the use upon the potential market for, or value of, the copyrighted work
• Compulsory License
17. Copyright Points to Remember
• Copyright is secured automatically, though registration provides added protection
• Works consisting entirely of information that is common property and containing no original authorship are NOT protected by copyright
• Copyright protection expires
18. Who Is the Author?
In the case of works made for hire, the employer, not the employee, is considered the author. Section 101 of the copyright law defines a “work made for hire” as:
• a work prepared by an employee within the scope of his or her employment; or
• a work specially ordered or commissioned for use
19. Case Law: Claims of Meta[data] Rights Violations
Trademark violation claims in meta tags (HTML)
• Playboy Enterprises, Inc. v. Welles, 279 F.3d 796 (2002)
"Nominative use"
Copyright violation claims in page number use
• Matthew Bender & Co., Inc. v. West Publishing Co., 158 F.3d 674 (1998)
21. Survey of Metadata Reuse Policies
• University of Pittsburgh institutional repository
• University of Edinburgh institutional repository
• University of Surrey institutional repository
• National Estuarine Research Reserve System
• IMLS Digital Collections & Content project
22. Common Features of Metadata Reuse Policies
Access:
• Anyone has the right to access metadata
Re-use:
• Non-profit users have unrestricted permission
• For-profit users must request permission
In both cases of re-use, the metadata creator must receive attribution.
23. OCLC's "Community Norms"
• Based on the Open Data Commons Attribution License
• Established through a collective process among OCLC member institutions
• Adopted by Harvard Library and others
• Includes recommended language for contracts with outside IT vendors
Policy rationale regarding use of WorldCat bibliographic data: "to encourage the widespread use of WorldCat bibliographic data while also supporting the ongoing and long-term viability and utility of WorldCat and of WorldCat-based services"
24. OCLC Community Norms: Acceptable Uses of WorldCat Metadata
• Incorporating into local library catalogs
• Supporting patron research, facilitating resource discovery
• Verifying bibliographic data on local holdings
• Granting access to non-OCLC members for personal, scientific, or institutional research/re-use
• Transferring WorldCat data on local holdings to outside vendors providing services to one's local institution
These norms were established by the OCLC Cooperative and last updated on June 2, 2010.
25. OCLC Community Norms: Discouraged Uses of WorldCat Metadata
• Unauthorized distribution of OCLC log-ins and passwords
• Mass downloads of WorldCat records without prior permission from OCLC
• Mass distribution of data directly from WorldCat to non-members without prior permission from OCLC
Significant violations of the Community Norms, if reported, will be sent to the Global Council and the OCLC Board of Trustees for arbitration.
(OCLC 2012-2013 Board of Trustees)
27. Duke University Press
Non-Profit Scholarly Publisher
Medium-sized university press digitizing its book and journal backlist, as well as producing new digital content
Wishes to integrate journal and book content on a common platform, and to improve findability and searchability
Huge influx of newly digitized content with little metadata
28. Taxonomy Strategies
Commercial Metadata Services
Joseph Busch: ex-president of ASIS, ex-board member of the Dublin Core Metadata Initiative
Services: taxonomy construction, workshops, training, and project definition (taxonomy governance)
Proposal: review DUP content, interview stakeholders, identify priority content metadata and controlled vocabulary, develop and test a content taxonomy, establish governance guidelines, provide training
Cost: mid-five figures
29. TEMIS
Semantic Content Enrichment Platform
Metadata extraction, semantic annotation, classification and clustering, facet and filter building, etc. Uses existing domain taxonomies or creates new or enhanced ones.
30. DUP’s Options for Metadata Development
Create and maintain our own taxonomy
• Pros – best fit for content, possible competitive edge
• Cons – expensive and time-consuming, not interoperable, only a first step
Opt in to a semantic tagging initiative with other publishers, hosted by HighWire
• Pros – access to a larger corpus for taxonomy development, interoperable metadata formats, off-loading labor
• Cons – expensive, loss of DUP focus and control, could get squished by the needs of bigger players
32. Open Knowledge Foundation: Principles of Open Bibliographic Data
Explicit and robust license statement
Recognized waiver or license, as defined by the Open Definition
Explicitly placed in the Public Domain via ODC-PDDL or CC0
33. JISC Open Bibliographic Data Guide
"One of the possibilities that open bibliographic data offers is the chance for libraries and indeed anyone to reuse the data to build innovative services for researchers, teachers, students and librarians."
—Andy McGregor, JISC Programme Manager
34. JISC Open Bibliographic Data Guide: Building Business Cases
WHY? The core rationale is discoverability: gaining credibility the more our resources are discovered from ‘out there’ (through the likes of Google) rather than from ‘in here’ (through the local OPAC).
HOW? Cost-effectively, and while maintaining control at the point of release of the data.
37. Richard Wallis Reflects on OCLC's Release of WorldCat as Linked Data
1. Hundreds of millions of items
2. Used the Schema.org vocabulary
3. Human-readable and machine-readable (RDFa) on WorldCat.org (see the sketch below)
4. OCLC is cooperating with other communities to extend Schema.org for libraries
5. Open Data Commons Attribution license (ODC-BY)
6. A first step in an ongoing process!
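To make point 3 concrete, here is a small sketch of the kind of Schema.org description WorldCat.org embeds for a record. WorldCat itself uses RDFa markup in its pages; JSON-LD is shown here as an equivalent, easier-to-read serialization, and the values follow the WorldCat record cited in the references.

```python
# Build and print a Schema.org description of a book as JSON-LD.
import json

record = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Harry Potter and the Deathly Hallows",
    "author": {"@type": "Person", "name": "J. K. Rowling"},
    "url": "http://www.worldcat.org/oclc/155131850",
}

print(json.dumps(record, indent=2))
```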
40. Harvard Bibliographic Data Set
"The accessibility of the entire set of data for each item will, we hope, spur imaginative uses that will find new value in what libraries know."
—Mary Lee Kennedy, Senior Associate Provost for the Harvard Library
41. Within hours of release, one user developed his own search interface
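The Harvard set was distributed as MARC21 records, so a first pass for a quick search interface could look something like the sketch below, using the pymarc library. The file name is an illustrative assumption; see the Bueno parser in the references for a real example.

```python
# Sketch: read MARC21 records and pull out title statements for indexing.
from pymarc import MARCReader

with open("harvard_bibliographic_dataset.mrc", "rb") as fh:
    for record in MARCReader(fh):
        titles = record.get_fields("245")  # 245 = title statement field
        if titles:
            print(titles[0].value())
```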
45. Open Bibliographic Data in Balance
Benefits:
• in line with mission to disseminate knowledge & enable innovation
• publicity & status for first movers
• creates opportunities for third parties to develop services that may drive traffic back to library and library holdings
Drawbacks:
• no going back – once data is released it is difficult to withdraw or re-release with more stringent terms
• open-ended commitment of time & resources
47. References
Bueno, Carlos. A parser for the Harvard Library Bibliographic Dataset. https://github.com/aristus/copymine-harvard#readme.
DPLA API query for items contributed by Harvard. http://api.dp.la/v0.03/item/?filter=dpla.contributor:harvard_edu.
Eaton, Alf. Working With the Harvard Library Bibliographic Data Set. HubLog. http://hublog.hubmed.org/archives/001953.html.
Gray, Jonathan. Europeana opens up data on 20 million cultural items. Guardian DataBlog. http://www.guardian.co.uk/news/datablog/2012/sep/12/europeana-cultural-heritage-library-europe.
Harvard Library Bibliographic Dataset. http://openmetadata.lib.harvard.edu/bibdata.
IMLS Digital Collections and Content Project. Metadata Reuse Policy. http://imlsdcc.grainger.illinois.edu/MetadataReuse.
JISC Open Bibliographic Data Guide. http://obd.jisc.ac.uk/.
National Estuarine Research Reserve System Centralized Data Management Office. NOAA Ocean and Coastal Resource Management Policy for the NERRS National Monitoring Program. http://cdmo.baruch.sc.edu/data/policy.cfm.
48. References
Open Bibliographic Working Group of the Open Knowledge Foundation. Principles on Open Bibliographic Data. http://openbiblio.net/principles/.
Open Data Commons Attribution License. http://opendatacommons.org/licenses/by/.
Record Use Policy Council. WorldCat Rights and Responsibilities for the OCLC Cooperative. http://www.oclc.org/worldcat/recorduse/policy/default.htm.
Schwartz, Meredith. Harvard releases metadata into public domain. The Digital Shift. http://www.thedigitalshift.com/2012/04/metadata/harvard-releases-metadata-into-public-domain/.
Shieber, Stuart. The new Harvard Library open metadata policy. The Occasional Pamphlet on Scholarly Communication. http://blogs.law.harvard.edu/pamphlet/2012/04/27/the-new-harvard-library-open-metadata-policy/.
University of Edinburgh. DataShare data policy for full-text and other full data items and metadata policy for information describing items in the repository. http://www.ed.ac.uk/schools-departments/information-services/services/research-support/data-library/data-repository/service-policies/data-metadata-policy.
49. References
University of Pittsburgh. D-Scholarship@Pitt metadata policy for information describing items in the repository. http://d-scholarship.pitt.edu/policies.html.
University of Surrey. Surrey Research Insight (SRI) Open Access metadata policy for information describing items in the repository and access and reuse policy for full-text and other full data items. http://epubs.surrey.ac.uk/policies.html.
Wallis, Richard. OCLC WorldCat Linked Data Release – Significant in Many Ways. Data Liberate. http://dataliberate.com/2012/06/oclc-worldcat-linked-data-release-significant-in-many-ways/.
"What Does One Do With Millions of MARC records?" http://gavialib.com/2012/05/what-does-one-do-with-millions-of-marc-records/.
WorldCat Record for Harry Potter and the Deathly Hallows. http://www.worldcat.org/oclc/155131850.