Outlines a Generative AI process model combining post-deployment testing, an incident reporting database, rapid response teams, and a possible role for a regulatory agency.
IAC 2024 - IA Fast Track to Search Focused AI Solutions
GenAI Incident DB+Response Team
1. Post-deployment Testing, Incident Database, and Rapid Response Teams
Looking to the future with the US AI Safety Institute: The NIST AI RMF and Playbook, with the extensions of the NIST GAI-PWG, are good initial steps toward a framework for creating Trustworthy AI. However, the explosive pace of Generative AI (GAI) software production will probably overwhelm any recommended voluntary guidelines (see recent OpenAI announcements, especially customizable GPTs and Assistants). One problem is that multiple organizations will often be involved in delivering GAI applications (e.g., data providers, foundation model builders, fine-tuning enhancers, plug-in creators, application deployers, output distributors). It will be necessary to have post-deployment independent red team testing and downstream user incident reporting across many stages of the delivery process.
For generic foundation models, red team post-deployment testing could combine human and generative AI-based tools fine-tuned for testing. For complex domain-specific applications (e.g., health, law, finance, coding, engineering, manufacturing), there should be independent red team domain experts and/or domain-specific generative AI tools that can test and evaluate deployed GAI applications. A regulatory agency could manage this type of testing in coordination with domain-specific professional organizations.
I believe that it will be essential to create a public GAI Incident Database. This Database should include: ID of GAI software, Description of software, Incident Description, Status of Repair, Testing Results, Risk Evaluation, and Warnings. This will be invaluable to potential users of the GAI software. (The Database could also include similar information about data sources.) There will be a vast number of incidents reported with the increasing use of GAI. It is essential to evaluate the potential risks associated with the incidents and track the status of fixes. There should be a mandate to report serious incidents (definition needed) with deployed systems. Regulatory responses should be defined for high-risk incidents. Only a neutral organization (e.g., the U.S. Artificial Intelligence Safety Institute Consortium) with large resources and access to expert evaluators and red teams will be able to maintain a large incident database, determine risks, and validate fixes.
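The fields listed above map naturally onto a structured database record. A minimal sketch in Python, assuming illustrative field and enum names (none of these identifiers come from the proposal itself):

```python
from dataclasses import dataclass, field
from enum import Enum

class RiskLevel(Enum):          # hypothetical risk scale
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class RepairStatus(Enum):       # hypothetical repair lifecycle
    REPORTED = "reported"
    IN_PROGRESS = "in_progress"
    FIXED = "fixed"
    VALIDATED = "validated"

@dataclass
class IncidentRecord:
    """One entry in the proposed public GAI Incident Database."""
    software_id: str            # ID of GAI software
    software_description: str   # Description of software
    incident_description: str   # Incident Description
    repair_status: RepairStatus = RepairStatus.REPORTED  # Status of Repair
    testing_results: str = ""   # Testing Results
    risk_evaluation: RiskLevel = RiskLevel.LOW           # Risk Evaluation
    warnings: list[str] = field(default_factory=list)    # Warnings
```

A neutral maintainer could update `repair_status` and `risk_evaluation` as evaluators and red teams process each report.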
All organizations involved with the Generative AI application delivery process should have rapid response teams for fixing problems discovered in post-deployment testing and use. As incidents are discovered in an organization's deployed applications, the organization's rapid response team should be required to report the status of fixes to a regulatory agency in a timely fashion to avoid being penalized (e.g., decertification of the application for unresolved serious incidents, criminal penalties for deliberate illegal errors). The time allowed for fixes should be based on the seriousness of the problem.
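The rule that fix time depends on seriousness can be sketched as a simple deadline lookup. The specific time windows below are invented for illustration; a real regulator would set the actual values:

```python
from datetime import datetime, timedelta

# Illustrative windows only; not taken from any regulation.
FIX_WINDOWS = {
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
    "low": timedelta(days=90),
}

def fix_deadline(reported_at: datetime, risk_level: str) -> datetime:
    """Date by which the rapid response team must report a fix."""
    return reported_at + FIX_WINDOWS[risk_level]

def is_overdue(reported_at: datetime, risk_level: str, now: datetime) -> bool:
    """Overdue unresolved incidents could trigger penalties such as decertification."""
    return now > fix_deadline(reported_at, risk_level)
```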
The diagram below is a basic illustration of Post-deployment Testing combined with an Incident Database and Rapid Response fixes under the supervision of a regulatory organization.
2. The GAI Deliverable Producer creates deliverables (e.g., input data, foundation model, fine-tuned model, applications, or output data). Hopefully they use the AI RMF guidelines for pre-deployment testing and then release the deliverables. Post-deployment testing could be done by the recipient and/or an independent red team. If the post-deployment testing or use of the deliverables detects an incident, it is sent to the incident database and the Rapid Response team of the GAI Deliverable Producer. A regulatory agency (e.g., the USA AI Safety Institute) evaluates the risk associated with the incident and attaches a warning. The Rapid Response Team is responsible for producing a fix for the incident problem in a timely fashion depending on the risk level. The USA AI Safety Institute tracks the status of the fix and can take action (e.g., penalties, deliverable decertification) if the fix is significantly delayed.
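The flow just described can be sketched as a single routing step. The function and parameter names below are hypothetical stand-ins for the incident database, the regulator's risk evaluation, and the producer's Rapid Response Team:

```python
def route_incident(incident: dict, database: list,
                   evaluate_risk, notify_response_team) -> dict:
    """Sketch of the diagram's flow: record the incident, have the
    regulator evaluate risk and attach a warning, and notify the
    producer's Rapid Response Team so a fix can be produced."""
    risk = evaluate_risk(incident)                 # regulatory risk evaluation
    record = {**incident,
              "risk": risk,
              "warning": f"risk level: {risk}"}    # warning attached by regulator
    database.append(record)                        # public incident database entry
    notify_response_team(record)                   # producer must fix in time
    return record
```

Tracking fix status and applying penalties for delays would then operate on the records accumulated in the database.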
AI RMF on Risk over the GAI Life Cycle
Artificial Intelligence Risk Management Framework (AI RMF 1.0)
https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
“Risk at different stages of the AI lifecycle: Measuring risk at an earlier stage in the AI lifecycle may yield different results than measuring risk at a later stage; some risks may be latent at a given point in time and may increase as AI systems adapt and evolve. Furthermore, different AI actors across the AI lifecycle can have different risk perspectives. For example, an AI developer who makes AI software available, such as pre-trained models, can have a different risk perspective than an AI actor who is responsible for deploying that pre-trained model in a specific use case. Such deployers may not recognize that their particular uses could entail risks which differ from those perceived by the initial developer. All involved AI actors share responsibilities for designing, developing, and deploying a trustworthy AI system that is fit for purpose”