This talk looks at:
(a) the progress in regulating GPAI, renamed foundation models, under the EU AI Act as the EU Parliament reaches a final text in May 2023;
(b) what other laws exist meanwhile to regulate generative AI, notably copyright and the GDPR (the latter dealt with in detail here: https://www.slideshare.net/lilianed/can-chatgpt-be-compatible-with-the-gdpr-discuss)
1. How to regulate foundation models: can we do better than the EU AI Act?
Lilian Edwards
Professor of Law,
Newcastle University
@lilianedwards
Lilian.edwards@ncl.ac.uk
April 2023
2. What are large or “foundation” models?
• GPT-2/3/3.5/4 (OpenAI/Microsoft) (prompt to text) (2019 on)
• “Large Language Model” or LLM
• ChatGPT
• DALL-E 2 (text to images – OpenAI)
• Stable Diffusion (open source – text to image)
• HarmonAI – makes AI-generated music (Stability)
• CoPilot (prompt generates computer code – GitHub/OpenAI)
• Make-A-Video (text to video – Meta)
• ERNIE (Baidu, China) (prompt to text)
11. Important (for law) features of large or “foundation” models
• Generative – create text, images etc rather than merely classifying or predicting (ML)
• Trained on unprecedentedly large datasets
• Often scraped from “public” Internet
• Impossible to manually review legality, privacy or harm of every item in datasets
• Computationally expensive and retraining slow ->
• large tech co dominance
• GPT-4 training cost >$100mn
• environmentally worrying
• Training sets allow the model to assess probability of next word, pixel etc – not direct copying
• Models are general, can have multiple uses, eg to write a party invite, a racist attack or provide customer support within an automated hiring system
• Generated content increasingly difficult to distinguish from human-created content (disinfo, deepfakes)
• Outputs may be “hallucinations” (Hoppner, 2023)
12. Issues with large models
PHASE 1 STOCHASTIC PARROTS
• Don’t actually understand, just “parrot”
• Bias, discrimination, misrepresentation and stereotyping of
groups; hate speech
PHASE 2 WILL NO-ONE THINK OF THE ARTISTS?
• Image and video deepfakes
• Pastiche
• Copyright
PHASE 3 FAKE NEWS ON STEROIDS
• Fake news and “hallucination” (text + images)
• Education & plagiarism
• Digital Services Act
PHASE 4 – YOU HAVE ZERO PRIVACY, GET OVER IT
• GDPR
13. Solution 1: the EU AI Act and “GPAI”
AIA “risk-based” approach
• Unacceptable risk – ‘Complete’ prohibition, 4 examples – Article 5
• High-risk – Fixed categories of risky domains, based on intended use; “essential requirements” including dataset quality, human oversight
• Limited risk – Transparency obligations for a few AI systems (chatbots, deepfakes, emotion ID, biometric categorisation) – Article 52
• Minimal risk – Codes of conduct – Article 69
Photo Source: European Commission, Digital Strategy Website
https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
14. High-Risk AI systems (Designation) – Annexes II & III
Annex II – Products
• Machinery
• Toys
• Recreational craft and watercraft
• Lifts
• Equipment and protective systems intended for use in potentially explosive atmospheres
• Radio equipment
• Pressure equipment
• Cableway installations
• PPE
• Medical devices
• [...]
Annex III – Services
• Biometric identification and categorisation of natural persons;
• Management and operation of critical infrastructure;
• Education and vocational training;
• Employment, workers management and access to self-employment;
• Access to and enjoyment of essential private services and public services and benefits;
• Law enforcement;
• Migration, asylum and border control management;
• Administration of justice and democratic processes
15. High-Risk AI systems (Requirements) – Arts. 8–15
• Compliance with requirements (Art. 8);
• Risk management system (Art. 9);
• Data and data governance (Art. 10); (“data quality”)
• “Training, validation and testing data sets shall be relevant, representative, free of errors and complete. They shall have the appropriate statistical properties, including, where applicable, as regards the persons or groups of persons on which the high-risk AI system is intended to be used.”
• Technical documentation (Art. 11);
• Record-keeping (Art. 12);
• Transparency and provision of information to users (Art. 13);
• Human oversight (Art. 14);
• “Human oversight shall aim at preventing or minimising the risks to health, safety or fundamental rights that may emerge when a high-risk AI system is used in accordance with its intended purpose or under conditions of reasonably foreseeable misuse”
• Accuracy, robustness and cybersecurity (Art. 15).
16.
17. Definition of GPAI in EU AIA
LGAIMs = Large Generative AI Models – Hacker, Engel and Mauer, “Regulating ChatGPT and other Large Generative AI Models”, February 2023
Issues: over-inclusive (no emphasis on generality in “ability, task or output”); important for classification ->
Council position:
“‘general purpose AI system’ means an AI system that is trained on broad data at scale, is designed for generality of output, and can be adapted to a wide range of tasks”
EU Parliament, March 2023
18. Developers v deployers
• Developers?
• Akin to manufacturers – control
and have knowledge of the
training sets, weights, algorithms,
content moderation etc (esp if
closed source)
• Most high-risk obligations arise at
development stage (training &
human fine-tuning stages)
• Have practical power and
economic gains
• But can they handle
unforeseeable uses/risks? Or can
only tech giants competition
issues? Open source providers?
• Deployers? [NB “users” in AIA!]
• Originally only duties on deployer
if they make a “substantial
modification” to the AI system, ie
become new provider
• Practicality ?: May be impossible
for them to fix or even audit issues
of data quality etc without access
to upstream source code, training
datasets etc (often
secret/proprietary – eg as with
GPT-3)
• “AI as a service” API/cloud model
will be prevalent till models
smaller
“Pick n Mix?”
19. Solutions in AIA process?
• 13 May 2022, French presidency added amendment excluding GPAIs from AIA
• Back and forth...
• Council position, Dec 2022, arts 4a–4c
• GPAI deemed high risk if they “may be used as high risk AI systems or as components of high risk AI systems”
• Unless they have “explicitly excluded all high-risk uses” – but not if the exclusion is not in “good faith”
• European Parliament
• All generative AI models to go into high risk if
• they generate text that might be mistaken for human
• And all deepfakes and AV content showing something that “never happened”, unless an “obvious artistic work”
• & Providers owe cooperation & transparency obligations to downstream users
• Commission to tweak the high-risk obligations by delegated acts...
20. Foundation models v GPAI?? 19/4/23
“Foundation models”
• “an AI system model that is trained on broad data at scale, is designed for generality of output, and can be adapted to a wide range of distinctive tasks.”
• Eg ChatGPT, Stable Diffusion
• “Trained on data scraped from entire Internet”? Not just if labelled
• Stricter obligations – High risk ++?
• Adds sustainability obligations + independent expert ex ante oversight
• Documented analysis and testing throughout lifecycle
• Disclosures re copyright in training set; filters to avoid delivering unlawful content
“GPAI”
• “AI system that can be used in and adapted to a wide range of applications for which it was not intentionally and specifically designed.”
• ?? E.g. “unlabelled data that need further training by the provider, such as algorithms developed to recognise skin cancer”
• Laxer regime, since obligations fall on high risk providers who build on the data (?)
Deployers & value chain
• “non-binding standard contractual clauses that regulate rights and obligations consistent with each party’s level of control”
21. Solution 2: everything but the AI Act!
• Discrimination/equality law
• Liability (product liability for AI)
• Copyright
• Content, hate speech, libel, fake news (DSA)
• Privacy & data protection
• Personality & image rights
• Competition law
• See Hoppner https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4371681
• Apart from the last, though, only the AIA and DMA allow for structural ex ante regulation
• Private ordering?
22. 1. Copyright & AI generated art: another entire talk
• WaPost: “He used AI to win a fine-arts competition. Was it cheating?
• One judge said the striking piece evoked Renaissance art. But some critics compared it to ‘entering a marathon and driving a Lamborghini to the finish line.’”
• Effect on original artists? A tool or a replacement??
23. • Boris Eldagsen, The Electrician
• “winner”, Sony World Photography Awards, 17 April 2023
24. AI generated art and copyright
• Often copyright in the art used as training dataset inputs (eg Rutkowski) – but they are not part of the model
• No direct copying, though sometimes a perfect copy might emerge (memorisation)
• Is there actual copying of inputs?
• Is there (US) fair use or (UK/EU DSM) research or TDM exception?
• Who owns the outputs?
• Again almost never direct copies
• “After the style of”...
• Derivative works?
• Some partial solutions: opt-out; haveibeentrained; license, royalties or benefit sharing
• Litigation!
25. AI art litigation (US)
• Getty v Stability, based on copying of input works copyrighted by Getty
• Transformative fair use?
• Andersen v Stability & DeviantArt
• Aims to acquire rights over all OUTPUTS as derivative works from artistic works in training set
• Defendants’ motion to dismiss
• NB Stability is open source, so you can analyse the underlying training sets (cf Open (sic) AI & GPT-4)
26. If you thought artists were p*ssed off by genAI... try the music industry!
27. Can GPAI be compliant with GDPR?
• AIA high risk data quality and transparency requirements actually say nothing about privacy of users
• Machine learning process long regarded as dubious but
• no federal DP law in US
• no privacy or confidentiality attaches to data made public in US
• GPAIs use permissionless public data (eg Common Crawl)
• But – Replika decision, Italy, 2/2/23
• Primarily about exposure of children to unsuitable sexualized material
• Children unable to make valid contracts
31. What next?
• “reports of a 400% surge in VPN downloads in Italy”
• Spain, France, pan-European investigation by EDPB
• Canada investigation
• Does GPAI fundamentally challenge the GDPR?
• If so, which gives?? (and does it take ML with it?)
• The end of the data Wild West?
• But are the privacy regulators really the right ones to take generative AI on?
• (and will the UK become a light-touch AI regulation “law haven” for ChatGPT?)
32. Private ordering – control?
• Contracts and licenses
• Eg Responsible AI Licensing (RAIL) (FAccT ’22)
• Model cards, model sheets
• Privacy policies
• EPSRC Generative AI Terms of Service project, Jan–March 2023, report end April!
• Technical controls
• Filters: eg OpenAI NSFW filter
• API control eg Project December app
• Internal human moderation when fine-tuning the model pre-release
• Watermarks for generated content eg PAI Guidelines for Synthetic Media
• Enforcement?
• Privity of contract
33. AGI-nising choices: Nuclear arms or buggy software?
"Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization?" (source; emphasis in original)
Future of Life Institute Open Letter, March 22, 2023
Narayanan and Kapoor https://aisnakeoil.substack.com/p/a-misleading-open-letter-about-sci?
P Hacker: “The propensity of ChatGPT particularly to hallucinate when it does not find readymade answers can be exploited to generate text devoid of any connection to reality, but written in the style of utter confidence”
Clip art
Getty photos
Book covers
Early product concepts, ads, design
Architecture mock-ups
Can add shadows, extend out paintings, produce programme credits, instant editorial cartooning
Eg anthropomorphisation of CBT chatbots, postmortem avatars*
“Socio-economic” harms
Energy – very costly to train giant models
Bans from fan fora – lots of fantasy art, but very derivative – in its nature!
But – effort? “He started with a simple mental image — “a woman in a Victorian frilly dress, wearing a space helmet” — and kept fine-tuning the prompts, “using tests to really make an epic scene, like out of a dream.” He said he spent 80 hours making more than 900 iterations of the art, adding words like “opulent” and “lavish” to fine tune its tone and feel. He declined to share the full series of words he used to create his art, saying it is his artistic product, and that he intends to publish it later. “If there’s one thing you can take ownership of, it’s your prompt,” he said.”
A tool or a copy?
Transparency is to the downstream deployer, not the data subject; data quality is not about whether personal data was permissioned
Eg Project December postmortem avatar app – app withdrawn by GPT-3 API control as inappropriate, but not on application of end-user
Later they stopped asking you to fill in a form re the probity of your application and now it's just money!