Before I jump into the meat of this session, I want to talk briefly about what brought this question to the forefront of my mind. Take a step back in time with me to Summer 2012. I had just finished my first year of library school and had started interning at the UCLA Social Science Data Archive. During that first year, I had taken the two quarters of data curation classes that UCLA offers. One of the first teams I worked with in the Archive was an education research team. They needed to write a data management plan (DMP), so I was walking them through the different elements of a DMP.
It was going really well until we got to the part of the DMP about Intellectual Property Rights, which, as you can see here, ICPSR defines as the entities or persons who will hold the intellectual property rights to the data. That was when I realized that even after two quarters of data management classes, I really didn’t know what to tell them about data ownership. After looking at UCLA’s institutional policies, I found out that at UCLA the university owns the data collected by its employees, but when I looked online at what other people thought, the answers were all over the map.
I soon found out that I wasn’t the only person who was confused about intellectual property issues related to datasets. According to a 2013 study done at the University of Oxford, intellectual property is the element of data curation that causes the most confusion for academic researchers, more than long-term preservation and metadata.
Does your institution have clear policies on data ownership? (Yes / No / I don’t know)
Do you know what those policies are? (Yes / No / Ummm… sort of)
In this session today, I want to frame some of the complications around data ownership and intellectual property. I also want to tell you about what I’ve discovered about data ownership at my own institution after looking at our data policies and speaking to our Office of General Counsel. Lastly, I would like to share the results of an informal environmental scan I’ve done over the past few weeks of data ownership policies at other institutions.
I would like to do these things with two caveats. First, I am a librarian, not a lawyer. If you have specific questions about datasets at your institution, I recommend you speak to your institution’s legal counsel. Talking to them is the best way to mitigate risk, but knowing the concepts we are going to talk about today before you talk to them is a great way to have a fruitful and informed conversation. The second caveat is that there are very few absolutes in copyright law, so you may hear me say “It depends” or “That’s a grey area” more than once in this presentation. Sometimes with copyright, you just have to look at each situation individually.
Okay, let’s start at the very beginning… Before we get into data ownership specifically, I want to make sure we all understand some of the basic concepts behind intellectual property, starting with what we even mean when we say intellectual property. I like this definition from the Cornell University Law School Legal Information Institute, which says that intellectual property is “any product of the human intellect that the law protects from unauthorized use by others.” In other words, intellectual property deals with ownership rights for creations of the mind, as opposed to physical things like your baseball card collection or your house.
There are three common types of IP rights. Copyright covers things like literary or artistic works. As librarians, this is the area of intellectual property law we are usually most familiar with, and it is what we will mostly talk about today. Patents apply to inventions, such as the George Foreman Grill. Trademarks are the third type of intellectual property: a trademark is any word, name, symbol, or design used in commerce to identify and distinguish the goods of one manufacturer from those of another. This benefits consumers and businesses alike, because when we go to what looks like a Starbucks or a Five Guys, we know that’s exactly what it is. It also allows brands to distinguish themselves through marketing. One example is McDonald’s trademark “Twoallbeefpattiesspecialsaucelettucecheesepicklesonionsonasesameseedbun” (http://www.mcdonalds.com/us/en/terms_conditions.html). It has no relation to academic research data; it’s just a bonus.
Copyright is the area of intellectual property that we will primarily be looking at today, although we will pay a small amount of attention to patents at the end. Copyright is automatic. Unlike with patents and trademarks, you don’t have to register for copyright, although you do have to register if you would like to sue someone for copyright infringement. The term of my copyright in Stan’s picture is the life of the author (me) plus 70 years. Because copyright terms run to the end of the calendar year, if I die in 2060 (at 80 years old), this photo enters the public domain on January 1, 2131. Copyright ownership can only pass from the creator of an original work of authorship to another party through a written agreement. The exception is a work made for hire, such as a work created by an employee within the scope of their job, which belongs to the employer from the start.
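To make the term arithmetic concrete, here is a minimal Python sketch. It assumes a work created by an individual author (not a work made for hire) on or after January 1, 1978, so that the U.S. life-plus-70 term of 17 U.S.C. §302(a) applies, and that, per 17 U.S.C. §305, the term runs to the end of the calendar year in which it would otherwise expire. The function name `public_domain_year` is hypothetical, made up for this illustration.

```python
def public_domain_year(author_death_year: int, term_years: int = 70) -> int:
    """Return the year a work enters the U.S. public domain.

    Assumes a life-plus-70 term (individual author, not a work made for
    hire) and that terms run to the end of the calendar year.
    """
    # Protection lasts through December 31 of (death year + term)...
    last_protected_year = author_death_year + term_years
    # ...so the work becomes public domain on January 1 of the next year.
    return last_protected_year + 1


if __name__ == "__main__":
    # Author dies in 2060: protected through 2130, public domain in 2131.
    print(public_domain_year(2060))
```

Works made for hire and anonymous works have a different term under U.S. law, so this sketch does not cover them.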
Copyright is not just one right; it’s actually a bundle of rights, which include the rights to reproduce the copyrighted work, create derivative works, distribute copies of the work, and perform and display the work publicly. We bestow these rights on authors to incentivize them to keep creating new and original things. If someone could rip off a work the moment it was published, the author wouldn’t have enough time to profit from the sale of their intellectual efforts. On the other hand, copyright has always been limited in scope, because the framers of copyright law also thought it was important for the public to have unlimited access to information after a certain period of time. So copyright law is a balancing act between the right of authors to benefit exclusively from the sale and distribution of their work and the right of the public to eventually have unrestricted access to content.
Let’s shift gears to the primary topic of this webinar, which is academic research data. When we say academic research data, we might be talking about anything from straightforward numeric lists to highly annotated audio, video, and text files. In academic research, our definition of data also tends to include not just the data themselves but everything needed to make sense of the data for secondary use, which means that “data” may also include highly original materials such as survey instruments or interview protocols.
Data might differ significantly depending on your field of study. Scientific data might include observations, computational models, and lab notebooks. In the social sciences, data might include survey results, video recordings, and field notes. In the humanities, data might come from mining texts, both big and small, such as newspapers and records of human history. Intellectual property law does not treat all kinds of data the same, for reasons we will discuss today.
Another characteristic of academic research data is that it seems to sit between government data and proprietary data on a spectrum of openness. Government data is open by default unless there are very good reasons for keeping it private, like national security. Openly available government data like the U.S. Census forms the backbone of our research data commons, especially in the social sciences. At the other end of the spectrum, we have trade secrets like the formula for Coke or the Google search algorithm. This data is eligible for protection under trade secret law, but that protection requires two things: 1) the owner of the data needs to take measures to protect its confidentiality, and 2) the information must derive independent economic value from, among other things, remaining confidential. Academic research data sits squarely in the space between government data that must be shared with anyone and proprietary data that shouldn’t be shared with anyone. Unlike government data, academic research data doesn’t usually become openly available automatically. In the past, if it was shared at all, it was often shared through a gift-exchange culture among scientists on a request-by-request basis. In the past few years, however, there has been a movement toward making this data more open, especially if it was funded with public money. Other factors make academic research data more or less available as well. Sometimes it depends on the discipline. Disciplines where the data may have commercial value (like chemistry or gaming) tend to be more closed, whereas in disciplines where the data has primarily scholarly rather than commercial value (like astronomy), the data may lean toward more openness.
Why do we care about data ownership? Why does it even matter whether researchers know what rights are associated with their data? The number one reason is reproducibility. Reproducibility is one of the cornerstones of science, and without data sharing, reproducing research results is almost impossible. Knowing who owns the data also helps determine who is ultimately responsible for its long-term preservation. Is it the researcher? The institution? Who should we direct data requests to? Lastly, as data librarians, many of us are helping our researchers create DMPs, and we need to provide sound guidance on this issue. Almost all DMP requirements ask about ownership and IP.
Hopefully we all agree at this point that clarifying data ownership is important for all the reasons we just talked about, but parsing data ownership can be difficult because of a variety of complications. The first complication is the number of stakeholders and potential rights holders involved in data collection.
Researchers are often seen as the first and clearest stakeholders when it comes to data ownership. Since the general rule of copyright is that creators are automatically granted copyright in their original works of authorship, it’s logical to assume that researchers are the rightful owners of the data. And in fact, researchers often do consider themselves the owners of their data. Until recently, most researchers were never asked to share their data or think about who technically owns it unless the data was commercially viable, which describes only a small percentage of research data at most academic institutions. This sense of ownership is especially strong among individual researchers or small groups whose data is hard won. Someone who had to travel across the world to collect physical samples by hand may feel more ownership over the data than a large research team whose data streams in automatically through sensors.
Universities are another stakeholder when it comes to data ownership. Universities provide the infrastructure, support, and salaries that make data collection possible. Some funding agencies even state clearly in their project contracts that they are contracting with the University, not the individual researcher. And it’s safe to say that if a researcher conducted their research out of their house instead of at a prestigious research institution, their chances of securing funding for data collection would go down tremendously.
Which brings us to funding agencies. Like institutions, funding agencies have a large financial investment in data collection, and it can seem only right for them to stake their claim to the data. They are often the ones putting stipulations on how the data must be shared once the results of the research are published.
The final stakeholder is the general public. When you work at a public institution or on a federally or state-funded grant, taxpayers are paying for the data collection as well. They have a vested interest in making sure the results of public funding are made publicly available.
With all of these stakeholders, it can be difficult to know who actually owns the data. This is a 2012 study done at UNC that asked researchers on campus who owns the data. 46% of the researchers thought that the researcher owns the data, 15% thought the university did, 8% thought the funding agency did, and 9% thought the public did. In reality, if you work at an academic institution, the most likely owner of the data is the university at which you work. But as we saw earlier with the DaMaRo study and now with this UNC study, this issue is far from well understood in the research community.
Another complication around data ownership is that it’s not always clear that data is even eligible for copyright protection. Under the law, data are often treated as “facts,” which lack the creativity required for copyright protection. The idea is that a person who discovers something about nature or the world doesn’t create that fact; he or she discovers it. For instance, Isaac Newton didn’t create gravity, he discovered it. Melville Nimmer, author of the preeminent treatise on copyright law, puts it like this…
Courts have long agreed with Nimmer’s sentiment that facts about the world can’t be copyrighted. “Sweat of the brow,” the effort spent gathering facts, is not sufficient for protection. The logic behind all of these rulings is that facts belong in the public domain because they lack originality, are not created by the discoverer, and would severely impair further research if held behind a statutory curtain.
So that’s it, right? Data is not eligible for copyright? Not exactly. Many kinds of data created during the research process, such as survey instruments, photographs, maps, and texts, are subject to the same rights as literary and artistic works.
The third complication is terminology. Data ownership refers to ownership of the data as physical or intellectual property; it includes copyright to the extent that the materials are eligible for copyright. Data governance refers to who can do what with the data. Just because a researcher doesn’t own the data doesn’t mean they don’t have certain rights over it, like the right to select a repository for the data. Data stewardship is the care and feeding of the data. Where is it stored? Is there sufficient documentation to understand it? What metadata schema will be used to describe it? The vast majority of the time, the researcher on the study is responsible for data stewardship.
Now that we have a solid background on the basics of intellectual property and some of the complications around data ownership, we are going to shift gears to talk about data policy. At my institution, the University of Utah, this is the data ownership policy for researchers… Personally, I think it’s interesting that the U included stewardship in its policy, since stewardship seems a lot less clear-cut to me than ownership.
Unless you have a contract with a sponsor that would trump University policy, the University owns the data. Contracts trump statutes.
When new faculty or staff members come to work at the U, they sign an Employee Intellectual Property Assignment Agreement saying, basically, that the work they do while employed at the University is work for hire. Under that agreement, University IP includes “the tangible and intangible results of research (including for example data, lab notebooks, charts, etc.)”
There is an exception to this rule: faculty members retain copyright in their “traditional scholarly products,” like journal articles and monographs. But when I asked our general counsel whether “traditional scholarly products” would include data, she said that it’s unlikely, because the term is fairly narrowly defined. However, it would have to be evaluated on a case-by-case basis. There is also a bootstrap provision: if staff members work with a faculty member on a joint work, the policy allows them to be treated as faculty. “Joint work” is a term of art in copyright that describes different people contributing to ONE work, such as joint authorship on a paper or the music and lyrics of a song. That doesn’t really have to do with data, but I found it an interesting fact nonetheless.
Here is something that is not a grey area: you cannot independently make money off the research you do at the University. That is a clear violation of your contract with the University. If you plan on making money off your research products, you must speak to TVC.
We have other policies at the U that relate to data but not necessarily to IP. Data responsibility is different from data ownership, and the P.I. is the person who is responsible for the data.
This retention requirement is based on FOIA regulations dating back to the 1980s. Of course, researchers may want to archive their data for longer, but three years is the minimum amount of time they are required to keep it. The policy also says “The P.I. should develop appropriate procedures for proper archiving and tracking of…” We can help them with that! I’ve been doing a lot of training lately at the U on data management and organization, and every time, someone tells me this is something they have never learned about before. The need is there!
RECAP! Who owns the data at the U…
The obvious question at this point is how typical the U’s policies are. Are we the only institution that so overtly claims ownership of research data? Over the past few months, I’ve done an informal environmental scan of the data policies of other institutions and found similar results. Every institution I looked at online either had no data policy or had a data policy stating that if you perform research as an employee of the university, the university owns the data unless specified otherwise in a formal agreement or contract. I did not find a single policy that said the researcher owns the data.
Here are more examples of data policies from other academic institutions…
In the news: Paul Aisen, an Alzheimer’s researcher at UCSD, moved his research to USC. When he tried to take a massive research database with him, UCSD filed a lawsuit claiming that it was the rightful owner of the dataset, and a California judge issued an injunction restoring control of the database to UCSD. Disputes like this are most likely if your data brings a great deal of prestige or prominence to your institution.
Let’s shift gears for a moment to a question I receive from time to time. Some researchers wonder whether depositing their data in a data repository means handing over copyright to the repository. The answer is usually no. Data repositories steward the data, but they do not usually expect to have rights over the data collections they distribute or provide access to. They will often work with researchers to choose a license, and it is important to assign a clear license that determines how the data can be used in the future. Many data repositories use Creative Commons licenses to share data, and the license can usually be negotiated with the copyright holder. The copyright holders must agree to the terms of the deposit.
In our last few minutes together, I want to talk about the three ways researchers can share their data. Because copyright is automatic and bestowed on the author without any effort on their part, copyright owners who want to allow others to access and use their datasets must manage their copyright so that it permits that use. There are three legal mechanisms for sharing data: contracts, licenses, and waivers. Contracts are common when accessing data that is not publicly available or is only available through commercial vendors. They often come in the form of data use agreements or click-through agreements (for example, at the Roper Center or the Utah Population Database). If you agree to a contract’s terms, those are the terms under which you must work with the data. Licenses are like standardized versions of a contract. Many repositories share data under licenses, and Creative Commons (CC) licenses are the most common. Beware: not all licenses are the same! Take CC BY-NC-ND. The BY (attribution) element is okay, and NC (noncommercial) may be acceptable, but ND (no derivatives) means people can’t remix the data, which limits its value and reuse. I recommend against no-derivatives licenses because they are not very useful for research. Also beware of license incompatibility: if people want to combine datasets released under incompatible licenses, it causes issues and impedes the reusability of the data. Waivers are the most open way of sharing data, because the data author gives up all rights to the data. However, that also means giving up the right to attribution, which most researchers want to keep. They don’t mind sharing their research, but they want to get credit for the work of data collection.
This is a question I keep coming back to and have not found a satisfactory answer to. Many institutional repositories allow their researchers to assign Creative Commons licenses to their datasets. Many researchers do make their data openly available, either because their funding agency or journal requires it or because they simply want to make their research more transparent. Not only does this not seem to be a problem, but when I spoke to our general counsel at the University, they confirmed that sharing data openly is in line with the University’s mission to spread knowledge broadly. The only exception seems to be when your data is commercializable in some way. This is not spelled out in policy, but the University will likely only step in if you plan on commercializing your data. So it seems safe to assume that if their academic research data has no commercial value and sharing it doesn’t violate any existing contracts, researchers can share their data with a Creative Commons license or waiver, or release their code on GitHub. However, it still doesn’t quite sit right with me, because only rights holders can assign new rights to a copyrightable work. So even if it’s common practice, is it correct, or enforceable down the road? I would love to hear your thoughts on this…
Read the research policies, and talk to your copyright librarian or Office of General Counsel. DMPs are a great time to educate your researchers about the rights they have over their data. Most researchers are surprised to hear that they do not actually own the data they collect. If you choose a Creative Commons license, strive for one that makes data openly sharable and reusable, such as a CC BY license, which still allows data authors to get credit for their work. Get involved! NISO, CODATA, BRDI, Data Q, Creative Commons, Open Knowledge Foundation.
Ownership, intellectual property, and governance considerations for academic research data
Research Data Management Librarian, University of Utah
DataFOUR Professional Development Webinar Series
September 25, 2015
1. Frame some of the complications around data ownership and intellectual property
2. Tell you about what I have discovered about data ownership at my own institution
3. Share the results of my environmental scan about data ownership at other institutions
1. I am a librarian, not a lawyer
2. There are very few absolutes
Photos: National Archives, Hathi Trust
Slide adapted with permission from Amy Rudersdorf and Franky Abbott, DPLA
“Any product of the human intellect that the law
protects from unauthorized use by others.”
-Cornell University Law School Legal Information Institute
• Copyright (Literary and artistic works)
• Patents (Inventions)
• Trademarks (Symbols, names, and images used in commerce)
Patented January 11, 1995
Michael Boehm and Robert Johnson
• “Copyright” refers to reproduction or publication restrictions on an item – its copyright status according to U.S. law.
• A form of protection for authors that applies to original works of authorship that are “fixed in a tangible
• Copyright is automatic.
• The right to reproduce the copyrighted work
• The right to create derivative works
• The right to distribute copies of the work
• The right to perform the copyrighted work publicly
• The right to display the copyrighted work publicly
Copyright = A bundle of rights
Portal to Texas History
Academic research data
“The recorded factual material commonly accepted in the research community as necessary to validate research findings.” – U.S. OMB,
Ownership of Data – 2012 Survey
Table from “Research Data Stewardship at UNC,” 2012
Complication #2 – Data and IP
“The discoverer of a scientific fact as to the nature of the physical world, an historical fact, a contemporary news event, or any other ‘fact’ may not claim to be the ‘author’ of that fact. If anyone may claim authorship of facts, it must be the Supreme Author of us all. The discoverer merely finds and records.”
Melville Nimmer, 1963
• Baker v. Selden (1879) – documents must contain a significant amount of originality to qualify for copyright
• Feist v. Rural (1991) – phone books and other compilations of facts are not eligible for copyright;
• Miller v. Universal Studios, Inc. (1981) – Aggregated research is not eligible for copyright.
• If data are selected, arranged, and coordinated in an original way, they may be eligible for copyright (17 U.S.C. §101. Definitions)
• “Data” often includes materials that are highly original
• Data laws are not harmonized worldwide (Reichman & Uhlir, 2003)
Complication #3 - Terminology
• Data ownership
• Data governance
• Data stewardship
“The University of Utah retains ownership and stewardship of the scientific data and records for projects conducted at the University or that use University of Utah personnel or resources.”
- Research Handbook, Section 9.9
University Policy (cont.)
“Except where precluded by the specific terms of a sponsored agreement, tangible research property, including the scientific data and other records of research conducted by the faculty or staff of the University, belongs to the
- Research Handbook, Section 9.9
But what about IP?
University IP includes “the tangible and intangible results of research (including for example data, lab notebooks, charts, etc.)”
- Employee Intellectual Property Assignment
Copyright of Faculty Members
Faculty members retain copyright over their “traditional scholarly products” but that term is narrowly defined and would have to be evaluated on a case-by-case basis.
Intellectual Property - Patents
At the U, if you plan on commercializing your data, you must speak with TVC (Technology, Venture, and
“The P.I. is responsible for the collection, management, maintenance, and retention of research data accumulated under a research project. The University must retain research data in sufficient detail and for an adequate period of time to enable appropriate responses to questions about accuracy, authenticity, privacy, and compliance with laws and regulations governing the conduct of research. It is the P.I.’s responsibility to determine what records need to be retained to comply with sponsor
Research Handbook, 9.9.2
Data Responsibility (cont.)
“Research data must be archived for a minimum of three years after the final project closeout.”
“The P.I. should develop appropriate procedures for proper archiving and tracking of
Research Handbook, 9.9.4
Who owns the data at the U?
• The University
• The project sponsor if they negotiated data ownership in the contract
• Another institution or commercial entity with which you are collaborating
• IF you are a faculty member and IF your data can be defined as a “traditional scholarly work,” you would retain copyright of your data
At other institutions?
• Columbia University
• Cornell University
• Duke University
• Johns Hopkins University
• University of Massachusetts
• University of Kentucky
• University of Minnesota
• Virginia Commonwealth University
University of Minnesota
Data publishers do not usually expect to have rights over the data collections they provide access to.
Mechanisms for data sharing
Krier and Strasser, 2014
More shades of grey…
Can a researcher assign a Creative Commons license to
Can a researcher make their data open source without
What can librarians do?
1. Be familiar with your institution’s policies
2. Educate your researchers about the ownership issues surrounding their data
3. Encourage waivers and unrestrictive licenses to encourage open sharing of data
4. Become part of the conversation on your campus and in the library community around data ownership