First, it is important to point out that doing nothing is not an option, at least not a very good one. In other words, if you publish data without a corresponding contract, license, or waiver, there are still legal consequences.
To explain those consequences, I have to start by giving a quick primer on copyright law.
Publishing data without a license, contract, or waiver means, at least arguably, that your published data is subject to “all rights reserved” copyright protection by default. That means no one can reproduce, modify, or distribute your data without your permission. Of course, that rule only applies to the extent your data is subject to copyright. As I explained earlier, the scope of copyright protection in data is complicated. The facts themselves are never protected, but the selection and arrangement of those facts may be. People can also use the data to the extent their use falls under an exception or limitation to copyright. As you can probably guess, these determinations are incredibly difficult to make, even for lawyers who specialize in copyright law. The practical result is that anyone accessing your data faces potentially significant legal liability if they reuse it without consulting a lawyer. This legal uncertainty could create a chilling effect on reuse of your data.
To avoid creating a legal landmine for those accessing your data, you should use one of three legal mechanisms when sharing it. I will start with the option you are probably most familiar with: contracts.
You might know them as “data use agreements” or “data access policies.” They can take an infinite number of forms. Sometimes the data is subject to an underlying exclusive right (like copyright), but not always. The only requirements are the legal formalities of a contract, including offer and acceptance. Acceptance can be obtained by requiring people to “click to agree,” or through a general statement in the contract that, by accessing the data, the user agrees to the terms and conditions. There is a bit of legal fiction here.
The advantage of contracts is that they are customizable: you can dictate who can access the data and what can be done with it.
Contracts also have drawbacks. 1. Even the best lawyer can’t possibly delineate every scenario. 2. If your requirements are too burdensome or complicated, many people may decide that using your data is not worth the cost; others who are less conscientious will just ignore your requirements. 3. Control can also be adverse to public policy (see the Canadian example).
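Example: the Government of Canada’s terms made it a violation to use the data in a way that hurt the reputation of the government. That term was removed quickly. Terms like this are troublesome, especially when they are rarely read or negotiated.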
4. Enforceability is uncertain because of the legal fiction of offer and acceptance, especially where the contract terms sit behind a link at the bottom of the page, which most users will not even notice. It is hard to argue they are “accepting an offer” by using the site. This is especially true when these un-negotiated contracts include terms that are particularly unfair or onerous. 5. Contracts only bind the parties to the agreement. If someone takes the data from your site and shares it elsewhere, anyone who obtains the data from that second user will not be bound by the terms and conditions of your contract.
The next option is to use a public license, like the ones offered by Creative Commons (CC).
While licenses may look a lot like contracts because they condition reuse of content upon certain terms, they are different in an important respect. Licenses require an underlying exclusive right to be enforceable. Where the underlying right begins and ends, the license begins and ends. That means, to understand the scope of a license, you have to understand the scope of the underlying right.
In the context of sharing scientific data, that right is typically copyright, sui generis database rights, or both. As I explained, the scope of copyright law is murky, and it varies by jurisdiction. As a general rule of thumb, in the science and research context, you can often assume that a database is subject to copyright because of the way in which the data are selected and arranged. (The selection and arrangement typically meets the very low originality threshold required, or the “sweat of the brow” standard in some jurisdictions.) Nevertheless, in the U.S. and elsewhere, you can also assume that the data itself can be extracted and reused without infringing the copyright in the database structure, because data is typically factual. This rule does not apply in the European Union. EU member countries have enacted a sui generis database right. This right gives the maker of a database the power to prevent extraction or reuse of a substantial part of the contents of that database, even if the data does not meet the originality requirement of copyright law.
A license for data can be rooted in copyright, in the sui generis database right if it applies, or in both. For example, Creative Commons licenses are copyright licenses. If a CC license is applied to a database, by default it covers the database itself and the contents within the database, all to the extent protected by copyright. Any use of the data or database that triggers copyright requires compliance with attribution and the other license conditions. Any use of the data that does not trigger copyright (for example, if the data is in the public domain) does not require attribution, even if it triggers database rights. Note: CC is rethinking its treatment of database rights as we create the new version of the CC license suite.
The advantages of public licenses: (1) Standardization. When something is licensed under a CC license, the terms that apply are the same for all data under that same license. This makes things easier for users, which makes your data more attractive, and it minimizes interoperability problems. (2) The conditions are fairly minimal. The entire suite of CC licenses revolves around four potential license conditions (BY, NC, ND, SA). That makes compliance easier than it would be under a customized license, because the terms are fairly well known and understood worldwide. (3) Far reach. The license conditions apply to anyone who obtains the material, even if they obtain it from a different source (including modified data).
The disadvantages: (1) At the same time, licenses have a limited reach. Because the license maps onto copyright or even the sui generis database right (SGDR), not every use of the data requires compliance with the license conditions (e.g., when the use falls within an exception or limitation to copyright, or when the data is in the public domain). Some see this as a negative. (2) The complexity of copyright law means people may not understand that not every use of their licensed data requires compliance with the conditions. It also creates a risk of under- or over-compliance by users, and it means lawyers often still have to be involved. (3) Attribution requirements likely do not align perfectly with norms and expectations for receiving credit.
Attribution refers to a legally imposed requirement to attribute the rights holder when data is copied or reused in a specified manner. The remedy against someone who fails to attribute correctly is a lawsuit, based on infringement of an intellectual property right. Credit, on the other hand, is what we all want: explicit recognition for our contribution to someone else’s work. The remedy for violating expectations is typically an accusation of plagiarism or some other sort of professional rebuke. Finally, there is citation, which is rooted in norms of scholarly communication. The purpose of citation is to support an argument with evidence. But citation has also become a proxy for credit, albeit an imperfect one. The remedy for failing to cite is a lack of academic or professional credibility.
The attribution obligations written into a license are, by their nature, inflexible. This can lead to absurd situations where a user or aggregator of data is technically required by a license to attribute 1000 data providers, each in his or her own idiosyncratic manner (attribution stacking). If you don’t do this, you are subject to a potential lawsuit. And even if you do, citing 1000 providers in 1000 different ways, you may still not satisfy citation norms or people’s expectations for getting credit.
Note: contracts often have the same problems.
The last legal mechanism I will discuss is the waiver. A waiver can take many forms, but the purpose is to surrender all exclusive rights to data. In effect, it places the data in the public domain.
Waivers are not enforceable in all jurisdictions. To deal with this problem, Creative Commons created a legal tool called CC0. CC0 operates on three levels. First, it is a waiver of all copyright and related rights. If the waiver fails, CC0 has a fall-back license that grants all permissions to the licensed work without any conditions. And, as a final backup, CC0 contains a non-assertion pledge, where the rights holder promises not to assert rights in the licensed work. This three-pronged approach is designed to make CC0 operable worldwide.
Waivers are often coupled with a statement of norms the provider would prefer that reusers follow. This gives people guidelines for citing and giving credit, but it doesn’t create a legally enforceable requirement to follow them.
The advantages of a waiver: Legal certainty. There is no need for users to attempt to parse the copyrightability of an arrangement of data, which eliminates the risk of under- or over-compliance and makes the data more attractive. No interoperability problems. Users do not need to spend time sifting through conflicting attribution requirements from different licenses or contracts. Lawyers are [almost] eliminated from the equation.
The disadvantage: feeling like you don’t have control over your data can be scary. But it is important to remember that most citation norms are self-enforcing, and things typically work fine. (A legal example: failing to follow Bluebook citation does not create cause for a lawsuit, but most people follow it in order to legitimize their work.) You also have to consider whether filing a lawsuit is really the remedy you want to resort to anyway.
To summarize, there are three primary legal mechanisms available, and as a baseline, it’s important to choose one of them. Doing nothing is not a viable alternative if you want your data to be used and cited. 1) Contracts are tempting because of the ability to customize and control, but there are serious transaction costs to doing so, including the risk that your data will be avoided because its terms are too onerous or difficult to discern. 2) Public licenses are a good option if attribution is particularly important to you, but there are limits here, too. The fact that licenses operate atop copyright law or sui generis database rights is a pro, because it makes them easier for people to understand. But it is also a con, because copyright law in particular is complex and difficult to understand. If people misunderstand their obligations, they face a potential lawsuit, which is a scary proposition. There are also a lot of things copyright does not protect, especially when it comes to factual data. 3) In many ways, waivers are an often-overlooked alternative. This approach can scare lawyers because it means there is no legal recourse for failing to give credit, but reliance on norms is not as scary as it sounds. We do it all the time, especially in the academic and professional context. For the most part, people want to do the right thing. And for those who don’t, a contract or license probably wouldn’t stop them anyway. I have now violated my opening rules by letting my bias show through. Hopefully this helps you begin to navigate the legal consequences of the way you share your data.