AI Essentials: From Tools to Strategy
● Understand how US copyright law and licensing terms govern the training
of AI models on copyrighted content.
● Identify key considerations when evaluating vendor contracts and
publisher policies related to AI usage on licensed materials.
● Discuss how institutional policies can shape responsible AI use in
academic settings.
Week 3: AI Governance: Copyright, Licensing, and Compliance
Welcome today’s presenters
Katherine Klosek
Director of Information Policy &
Federal Relations
Association of Research Libraries
Samantha Teremi
Licensing Librarian
University of California, Berkeley
Stephen Wolfson
Asst General Counsel & Copyright
Advisor
University of Pennsylvania
Reminder…
This session offers practical guidance and foresight, not
legal advice.
Quick poll
When it comes to training AI models on copyrighted materials, which
statement best reflects your view?
○ Training on copyrighted works should be freely allowed — it’s part
of innovation and fair use.
○ Training is acceptable only with proper licensing or permission.
○ Training should require transparency about data sources and opt-out
mechanisms.
○ Training should be prohibited unless all rights are cleared in
advance.
○ It depends — we need clearer laws and guidance.
Copyright and AI:
Background and
recent developments
Stephen M. Wolfson
Assistant General Counsel/Copyright Advisor
University of Pennsylvania Libraries
smw@upenn.edu
Copyright basics for AI
● Copyright is a limited-duration property right that allows authors to control
their creative works in several ways
○ Copyright grants authors a bundle of exclusive rights of:
■ Reproduction, creation of derivative works, distribution, public display, public
performance
● Copyright automatically attaches to original works of authorship as soon as
they are created, if they satisfy three conditions:
○ Independently created; creative; and fixed in a tangible medium of expression
○ It is extremely easy to satisfy these three conditions
● Copyright does not protect things like:
○ Facts, ideas, methods of operation, concepts, principles
○ Names, titles, short phrases
○ Works of the federal government
○ The text of cases and statutes
Copyright basics: Fair use
● Fair use is an exception to the exclusive
rights that copyright law gives to
rightsholders
○ it allows others to use copyrighted works in
ways that would otherwise be infringement
without permission from the rightsholders
● Fair use is a balancing test
○ To determine fair use, courts weigh four
factors against each other, considering how
the use fits within the purposes of
copyright, and determine whether, on
balance, a use is fair or not
○ The factors are: 1. The purpose and character
of the use; 2. the nature of the copied work;
3. the amount copied; 4. the harm to the
market for the original
Copyright and generative AI
GenAI Inputs: Is Training Copyright Infringement?
● Large Language Models must be trained on
huge amounts of data to work
○ Most training data today is unlicensed
● LLMs ingest data, analyze it, adjust their
weights and biases, and then purge those data
○ LLMs do not store copies of training data; they learn
patterns in data and can reproduce those patterns
● Whether training constitutes infringement is a
threshold question for the future of the
technology because LLMs can’t exist without
training on data
○ It would be cost-prohibitive for many developers to
license all of the data they need
A line of cases supports the argument that
training is fair use, at least under many
circumstances
GenAI Inputs: Cases supporting fair use
● Authors Guild v. Google, 804 F.3d 202 (2d
Cir. 2015)
○ Using copyrighted works as parts of data in a
database was transformative and fair use
● Kelly v. Arriba Soft, 336 F.3d 811 (9th Cir.
2003)
○ Using image thumbnails in a search engine was
transformative and fair use
● Vanderhye v. iParadigms, LLC, 562 F.3d 630
(4th Cir. 2009)
○ Using and saving copyrighted works for
plagiarism checking software was
transformative and fair use
But a few recent cases may
change our thinking on this
Thomson Reuters v. Ross (D.
Del. 2025)
Thomson Reuters v. Ross (D. Del. Feb. 11, 2025)
● This case involves the legal research database, Westlaw
○ Westlaw is an extremely powerful -- and quite expensive -- database that allows researchers to
find all kinds of legal information resources, including cases, statutes, regulations,
administrative rules, treatises, guides, and scholarship
● Ross built an AI legal research tool using content taken from Westlaw
○ Ross worked with LegalEase to produce “bulk memos” to train its AI model
○ The bulk memos were based on Westlaw’s headnotes
● Headnotes are annotations on rules of law that West editors attach to cases
○ Often annotations are almost exactly the same as the case text
● TR sued, saying that these bulk memos were essentially just West Headnotes
Westlaw headnotes
Westlaw headnotes
West key number system
Thomson Reuters v. Ross (D. Del. Feb. 11, 2025)
● The court first held that the headnotes and the key number system were copyrighted
works
○ Simply selecting some case text was enough to justify copyright protection, even
where the text of the annotation was taken verbatim from the case
● Then the court found that Ross’s use was not fair
○ The most important factors were 1 (purpose and character/commerciality) and 4
(market harm)
■ Ross used the headnotes for the same purpose for which they were created -- to
enable users to find cases -- and for a commercial purpose; both cut against fair
use
■ Ross wanted to create a competing product based on TR’s works -- this cut
against fair use
○ The court seemed very bothered that Ross used TR's works to create a competing
product
Thomson Reuters v. Ross Inc.:
What’s next?
● This will likely be the first decision from an
appellate court on AI training
○ Ross filed its interlocutory appeal
brief in the 3rd Circuit on fair
use/training
○ The 3rd Circuit is currently considering
the case
● Nota bene (maybe)
○ The district court noted that this
is not a generative AI case
○ But I think it is similar enough that it
could have a very important impact
Bartz v. Anthropic (N.D. Cal.
June 23, 2025)
Bartz v. Anthropic (N.D. Cal. June 23, 2025)
● Anthropic created a corpus of around 7 million items to train its LLM, Claude
○ This included
■ The Books3 library of around 200,000 unauthorized book copies and similar online libraries
of digital book copies
■ Digital copies of print books that Anthropic purchased -- including both new and used
books -- then discarded the print copies
● Anthropic stored this corpus to be used for other purposes (including -- but not
limited to -- training)
● A group of plaintiffs whose books are in Books3 sued Anthropic, claiming,
among other things, that copying and storing their books as part of Claude’s
training was copyright infringement
Bartz v. Anthropic (N.D. Cal. June 23, 2025)
● Judge Alsup held that training Claude was fair use, and that copying/storing print
books that Anthropic purchased was fair use, but that building and saving a library of
unauthorized copies was not fair use
○ He described training as “spectacularly” transformative
○ He doubted, but did not rule, that pirating copies one could otherwise buy could ever
be justified
● He seems to reject the market dilution theory
○ The idea that an AI can compete in the market with works in its training data by
creating generally similar works -- not necessarily substantially similar or identical
works -- and thereby dilute the market for the works in the training data
● Market dilution theory seems wrong to me
○ Copyright protects specific expressions; it does not guard against general competition
○ Adopting the theory could significantly narrow the scope of fair use beyond the AI context
Bartz v. Anthropic: What’s happening now
● On July 17, 2025, Judge Alsup certified a
class of “All beneficial or legal copyright
owners” whose works were infringed by
Anthropic’s storing of unauthorized copies
○ This included thousands of claimants, including
both authors and publishers
○ Losing the class action could have cost Anthropic
billions of dollars, possibly ending the company
○ Anthropic fought this initially but decided it
wanted to settle
● On Oct. 17, 2025, Judge Alsup granted
preliminary approval of a $1.5B settlement
in Bartz v. Anthropic
○ Estimated ~$3100/claim
○ https://www.anthropiccopyrightsettlement.com/
Kadrey v. Meta (N.D. Cal. June
25, 2025)
Kadrey v. Meta (N.D. Cal. June 25, 2025)
● Meta trained its LLM, Llama, on the
Books3 library and other shadow
libraries
● A group of authors whose books were
in Books3, including comedian Sarah
Silverman, sued
● Meta moved for summary judgment
that training Llama on these books was
fair use
Kadrey v. Meta
● Judge Chhabria was skeptical that training would be fair use in most cases
○ But under these facts and these arguments, he felt bound to hold in favor of Meta
● Judge Chhabria disagreed with Judge Alsup on the market harm factor and
instead he embraced market dilution
○ “Courts can’t stick their heads in the sand to an obvious way that a new technology might
severely harm the incentive to create, just because the issue has not come up before. Indeed, it
seems likely that market dilution will often cause plaintiffs to decisively win the fourth factor—
and thus win the fair use question overall—in cases like this.”
● He described training as “highly transformative” but still doubted it should be
fair use because of the market harm
● Ultimately, he found in favor of Meta because the Authors did not develop their
argument enough
What’s next?
● More than 50 other cases about copyright and genAI are currently ongoing,
so we will see more holdings in the future
● The 3rd Circuit will probably be the first appellate court to rule on these
issues, so it is the one to watch right now
● These three cases represent three directions courts could take on training
○ Will courts focus on whether and how the tool competes with the training data?
○ Will courts split building a corpus from training?
○ Will courts focus on market dilution?
● I expect SCOTUS to weigh in eventually, maybe in multiple cases
● So we are very far from the end of this story
Questions?
https://www.librarycopyrightalliance.org/
Copyright and Artificial Intelligence Part 3:
Generative AI Training (pre-publication version)
“Even where a model’s outputs are not
substantially similar to any specific copyrighted
work, they can dilute the market for works similar
to those found in its training data, including by
generating material stylistically similar to those
works.”
https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf
Breakout 1
What trends are you seeing in publisher policies
for AI use in academic settings?
How are these trends affecting teaching,
learning, and research?
Licensing
Considerations for
AI
U.S. CONTRACTUAL
OVERRIDE
Even if a use is fair, or if the content is
not protected by copyright at all, there
may be a contract that restricts
scraping, TDM, AI, and/or breaking
DRM to do TDM or use AI
EU AI Act &
Directive on Copyright
● TDM Permitted: Researchers can conduct TDM & retain
copies of mined works for scientific research
● No AI Training Opt-Outs: Copyright owners
may not opt out of allowing works to be
used for AI training for scientific research
● No Contractual Override: License
agreements cannot negate either of these
rights
● Appropriate Security Measures: Rightsholders
may seek reasonable security measures
Example AI Restriction
Customer and its Authorized Users may not:
1. directly or indirectly develop, train, program, improve, and/or enrich any artificial
intelligence tool (“AI Tool”) accessible to anyone other than Customer and its
Authorized Users, whether developed internally or provided by a third party; or
2. reproduce or redistribute the Content to any third-party AI Tool, except to the extent
limited portions of the Content are used solely for research and academic
purposes (including to train an algorithm) and where the third-party AI Tool (a) is
used locally in a self-hosted environment or closed hosted environment solely for use
by Customer or Authorized Users; (b) is not trained or fine-tuned using the
Content or any part thereof; and (c) does not share the Content or any part thereof
with a third party.
Learning From Licensed Content (quadrant diagram)
● Homegrown non-generative
● Homegrown generative
● Third-party non-generative
● Third-party generative
● Center of diagram: Use vs. Training
Negotiation Strategies
● Divide AI uses into subtypes
● Frame clauses in the negative
● Evaluate and address publisher concerns:
○ Licensed content only available to Licensee &
Authorized Users
○ Disruption to functionality
○ Reproduction/redistribution to third parties
○ Competing or commercial products
○ Reasonable information security standards
Negotiation Strategy:
Exclude OA/
Creative Commons-Licensed Works
Except as expressly stated in this Agreement or otherwise permitted in
writing by [Licensor], or as permitted by any Creative Commons licenses
or public domain dedications applied to the Subscribed Products, the
Subscriber and its Authorized Users may not:...
Negotiation Strategy:
Leverage Stakeholder Support
https://ucnet.universityofcalifornia.edu/employee-news/president-drake-and-provost-newman-affirm-the-universitys-commitment-to-protect-author-researcher-and-reader-rights/
Breakout 2
How can your organization or institution ensure that
students and faculty have lawful access to copyrighted
works for AI training in research and academic settings?
How can we foster open dialogue and collaboration
between libraries and publishers during negotiations for
licensed scholarly materials?
Wrap up and takeaways

Klosek, Teremi, and Wolfson "AI Essentials: From Tools to Strategies: A 2025 NISO Training Series, Session Four - AI Governance: Copyright, Licensing, and Compliance"


Editor's Notes

  • #33 I’ll be addressing how contracts factor into lawful AI usage on copyrighted works. It all starts with scholars needing access to large digital data sets to do text and data mining on. Access to the data sets is often governed by contracts, as in the majority of cases, scholars are not purchasing materials but rather acquiring or licensing them pursuant to a contract. For example, libraries enter into contracts with publishers to provide their users with access to Licensed Materials. Libraries are expressly paying for lawful access to the copyrighted content in these agreements, and their scholars are permitted to use AI on these Licensed Materials only for nonprofit educational uses.
  • #34 A problem we have in the U.S., though, is that fair use isn’t protected from being overridden by licensing agreements or terms of use. “Contractual override” means that private parties, like publishers, may “contract around” fair use by requiring libraries to negotiate for otherwise lawful activities (such as conducting TDM or training AI for research). Academic libraries are forced to negotiate and often pay significant sums each year to try to preserve fair use rights for campus scholars through the database and electronic content license agreements they sign. And publishers are now inserting extremely convoluted and nuanced restrictions on the use of AI in license agreements. They are so nuanced that many institutions don’t even realize what they’re agreeing to and giving up. Even worse, many publishers are inserting complete AI bans into their license agreements. It’s also worth noting that some larger vendors are restricting AI in an effort to create their own AI tool and then license back those rights to libraries for a fee, which is quite cost-prohibitive for libraries and also works against their mission of ensuring that fair use is maintained for research methods like AI and Text and Data Mining, so that innovative research can be conducted in a more equitable manner.
  • #35 However, the European Union doesn’t face these same challenges. In more than forty countries—including all those within the European Union (EU)—publishers are prohibited from using contracts to nullify exceptions to copyright in non-profit scholarly and educational contexts. Article 3 of the EU’s Directive on Copyright in the Digital Single Market preserves the right for scholars within research organizations and cultural heritage institutions to conduct TDM for scientific research, and further forbids publishers from invalidating this exception through license agreements. While the EU AI Act established a classification system for AI use based on level of risk, with some AI applications being prohibited altogether, and those dubbed as high-risk AI systems (HRAIS) or general purpose AI (GPAI) applications facing respective compliance requirements in order to be permitted, the AI Act further cements protections in the EU’s Directive on Copyright. For example, under the EU AI Act, copyright owners may not opt out of having their works used in conjunction with artificial intelligence tools in TDM research in these research institution contexts—meaning copyrighted works must remain available for scientific research that is reliant on AI training, and publishers cannot override these AI training rights through contract. Publishers are thus obligated to—and do—preserve fair use-equivalent research exceptions for TDM and AI within the EU, and can do so in the United States and UK, too. The only thing Article 3 builds in, understandably, is the opportunity for rightsholders to seek reasonable security measures.
  • #36 But in the meantime, in the U.S. publishers are trying to take away nearly everything when it comes to AI usage. And everything comes down to what we can negotiate in our license agreements. To illustrate what I mean, here’s an example of a clause sent to the University of California, Berkeley by a publisher. The publisher initially said that neither the Licensee nor the Authorized Users can train or improve any AI tool if it’s accessible or released to third parties. And they tried to forbid the use of any outputs or analysis derived from the licensed content to train any tool available to third parties. But the second paragraph is even more concerning. It wouldn’t even let scholars train a third party AI tool under any circumstance, much less make that tool available to others. Needless to say, UC Berkeley did not agree to this and has since negotiated much better AI usage and training rights.
  • #37 In large part, publishers are trying to stop the dissemination of tools that now "know" something based on the licensed content; that is, they want to prevent tools from knowing facts about the licensed content. But this is literally the purpose of licensing content. When you license content for scholars to read, they learn information from that content. When they write or teach about the content, they are not regenerating its actual expression, the part protected by copyright; they are conveying the lessons learned from it: facts, which copyright does not protect. Prohibiting the training of AI tools and the dissemination of those tools is functionally equivalent to prohibiting authorized users from learning anything from the content you're licensing for them to learn from. And publishers should not be able to monopolize the dissemination of information learned from their content, especially when that information is used non-commercially.
  • #38 These agreements are really challenging to negotiate, so we have found it very helpful to pay careful attention to distinctions along the following lines.
As to the tool itself (the quadrants in this diagram):
○ Whether the AI tool is homegrown or third-party
○ Whether that homegrown or third-party tool is generative or non-generative
As to the use being made of the tool (the bubble in the middle of this diagram):
○ Whether it is mere use, or also training or improvement of the tool
If a publisher fails to make these distinctions, it's a bit of a red flag: the risk levels for each type are not equivalent, so you don't want to accept a broad prohibition. Generally, for all types of AI tools except third-party generative ones, we're often able to negotiate so that scholars need only use reasonable security measures and ensure that the tool they're creating doesn't reproduce or redistribute licensed content to third parties; beyond that, there are no restrictions on disseminating the AI tool, trained or not. The one type of AI tool that often requires additional precautions is third-party generative AI, which must be used and trained in a self-hosted or closed hosted environment, and not released into the wild, or back to the tool's creator, with any licensed content embedded in it.
  • #39 So when going into negotiations, we've found the following strategies helpful:
○ As just mentioned, divide AI uses into subtypes rather than accepting broad language.
○ You might have noticed that the example clause was framed in the negative (what users can't do, with certain exceptions, rather than what they can do). There's no real science to this one, other than that we've noticed it seems less scary for publishers.
○ Lastly, we try to open the door to conversation by telling publishers what we believe their concerns to be and how our language addresses those concerns, and then asking them for feedback. For example, we've found typical concerns to include:
■ Content being made available only to the licensee and authorized users
■ Not disrupting the functionality of the licensed materials
■ Not reproducing or redistributing licensed materials to third parties
■ Not creating a competing or commercial product
■ Using reasonable information security standards
Libraries typically already have language addressing all of these things in their contracts, regardless of AI. So we've found it helpful to remind publishers that while we're happy to reiterate those commitments specifically in relation to AI, we're usually already agreeing not to do those things, and the use of AI per our contract terms does not pose an additional threat in those respects.
  • #40 Next, it's important to ensure that open access content, or content subject to a Creative Commons license, doesn't get overridden by restrictive licensing terms. You want to avoid a situation in which scholars who have chosen to apply a CC license to their works are then prohibited from using AI and making fair uses with those CC-licensed works. So we include in our license agreements a statement that the AI restrictions do not apply to any works that are CC-licensed or in the public domain.
  • #41 Another helpful strategy, which may or may not work for your institution, is setting standards from on high. For UC Berkeley, this looks like our Presidential mandate, in which the President of the entire University of California issued a robust statement supporting our efforts to protect scholars' rights to use and train AI in publisher negotiations. As the President explained, "UC scholars should not be contractually restricted by publishers from analyzing the scholarly literature by advanced computational means; and neither should researchers anywhere else. Collectively, the academy has created a corpus of scholarly literature; collectively, we need to ensure that it can be harnessed to advance our academic as well as public service missions, today and in the future." With these licensing strategies and considerations in mind, I'd like to get you thinking about the next breakout session. So I'll leave you with these questions:
○ How can your organization or institution ensure that students and faculty have lawful access to copyrighted works for AI training in research and academic settings?
○ How can we foster open dialogue and collaboration between libraries and publishers during negotiations for licensed scholarly materials?