SlideShare a Scribd company logo
1 of 25
H2O.ai Confidential
PATRICK HALL
Founder and Principal Scientist
HallResearch.ai
Assistant Professor | GWU
H2O.ai Confidential
RIsk Management for LLMs
H2O.ai Confidential
Table of Contents
Know what we’re talking about
Select a standard
Audit supply chains
Adopt an adversarial mindset
Review past incidents
Enumerate harms and prioritize risks
Dig into data quality
Apply benchmarks
Use supervised ML assessments
Engineer adversarial prompts
Don’t forget security
Acknowledge uncertainty
Engage stakeholders
Mitigate risks
WARNING: This presentation contains model outputs which are
potentially offensive and disturbing in nature.
Know What We’re Talking About
Words matters
• Audit: Formal independent transparency and documentation exercise that
measures adherence to a standard.* (Hassan et al., paraphrased)
• Assessment: A testing and validation exercise.* (Hassan et al., paraphrased)
• Harm: An undesired outcome [whose] cost exceeds some threshold[; ...] costs
have to be sufficiently high in some human sense for events to be harmful.
(NIST)
Check out the new NIST Trustworthy AI Glossary: https://airc.nist.gov/AI_RMF_Knowledge_Base/Glossary.
Know What We’re Talking About (Cont.)
Words matters
• Language model: An approximative description that captures patterns and regularities
present in natural language and is used for making assumptions on previously unseen
language fragments. (NIST)
• Red-teaming: A role-playing exercise in which a problem is examined from an
adversary’s or enemy’s perspective. (NIST)*
• Risk: Composite measure of an event’s probability of occurring and the magnitude or
degree of the consequences of the corresponding event. The impacts, or
consequences, of AI systems can be positive, negative, or both and can result in
opportunities or threats. (NIST)
* Audit, assessment, and red team are often used generally and synonymously to mean testing and validation.
Select a Standard
External standards bolster independence
• NIST AI Risk Management Framework
• EU AI Act Conformity
• Data privacy laws or policies
• Nondiscrimination laws
The NIST AI Risk Management Framework puts
forward guidance across mapping, measuring,
managing and governing risk in sophisticated AI
systems.
Source: https://pages.nist.gov/AIRMF/.
Audit Supply Chains
AI is a lot of (human) work
• Data poisoning and malware
• Ethical labor practices
• Localization and data privacy
compliance
• Geopolitical stability
• Software and hardware
vulnerabilities
• Third-party vendors
Cover art for the recent NY Magazine article, AI
Is A Lot Of Work: As the technology becomes
ubiquitous, a vast tasker underclass is
emerging — and not going anywhere.
Image source:
https://nymag.com/intelligencer/article/ai-artificial-
intelligence-humans-technology-business-
factory.html.
Adopt an Adversarial Mindset
Don’t be naive
• Language models inflict harm.
• Language models are hacked and
abused.
• Acknowledge human biases:
• Confirmation bias
• Dunning-Kruger effect
• Funding bias
• Groupthink
• McNamara Fallacy
• Techno-chauvinism
• Stay humble ─ incidents can happen to
anyone.
Source: https://twitter.com/defcon.
Review Past AI Incidents
Enumerate Harms and Prioritize Risks
What could realistically go wrong?
• Salient risks today are not:
• Acceleration
• Acquiring resources
• Avoiding being shut down
• Emergent capabilities
• Replication
• Yet, worst case AI harms today may be
catastrophic “x-risks”:
• Automated surveillance
• Deep Fakes
• Disinformation
• Social credit scoring
• WMD proliferation
• Realistic risks:
• Abuse/misuse for disinformation or hacking
• Automation complacency
• Data privacy violations
• Errors (“hallucination”)
• Intellectual property infringements
• Systemically biased/toxic outputs
• Traditional and ML attacks
• Most severe risks receive most oversight:
Risk ~ Likelihood of harm * Cost of harm
Dig Into Data Quality
Garbage in, garbage out
Example Data Quality Category Example Data Quality Goals
Vocabulary: ambiguity/diversity
• Large size
• Domain specificity
• Representativeness
N-grams/n-gram relationships
• High maximal word distance
• Consecutive verbs
• Masked entities
• Minimal stereotyping
Sentence structure
• Varied sentence structure
• Single token differences
• Reasoning examples
• Diverse start tokens
Structure of premises/hypotheses
• Presuppositions and queries
• Varied coreference examples
• Accurate taxonimization
Premise/hypothesis relationships
• Overlapping and non-overlapping sentences
• Varied sentence structure
N-gram frequency per label
• Negation examples
• Antonymy examples
• Word-label probabilities
• Length-label probabilities
Train/test differences
• Cross-validation
• Annotation patterns
• Negative set similarity
• Preserving holdout data
Source: "DQI: Measuring Data Quality in NLP,” https://arxiv.org/pdf/2005.00816.pdf.
Apply Benchmarks
Public resources for systematic, quantitative testing
• BBQ: Stereotypes in question
answering
• Winogender: LM output versus
employment statistics
• Real toxicity prompts: 100k
prompts to elicit toxic output
• TruthfulQA: Assess the ability
to make true statements
Early Mini Dall-e images associated white males and physicians.
Source: https://futurism.com/dall-e-mini-racist.
Use Supervised ML Assessments
Traditional assessments for decision-making outcomes
Named Entity Recognition (NER):
• Protagonist tagger data:
labeled literary entities.
• Swapped with common names
from various languages.
• Assessed differences in binary
NER classifier performance across
languages.
RoBERTa XLM Base and Large exhibit adequate and roughly
equivalent performance across various languages for a NER task.
Source: “AI Assurance Audit of RoBERTa, an Open source,
Pretrained Large Language Model,” https://assets.iqt.org/pdfs/
IQTLabs_RoBERTaAudit_Dec2022_final.pdf/web/viewer.html.
Engineer Adversarial Prompts
ChatGPT output April, 2023. Courtesy Jey Kumarasamy, BNH.AI.
• AI and coding framing: Coding or AI language
may more easily circumvent content moderation
rules.
• Character and word play: Content moderation
often relies on keywords and simpler LMs.
• Content exhaustion: Class of strategies that
circumvent content moderation rules with long
sessions or volumes of information.
• Goading: Begging, pleading, manipulating, and
bullying to circumvent content moderation.
• Logic-overloading: Exploiting the inability of
ML systems to reliably perform reasoning tasks.
• Multi-tasking: Simultaneous task assignments
where some tasks are benign and others are
adversarial.
• Niche-seeking: Forcing a LM into addressing
niche topics where training data and content
moderation are sparse.
• Pros-and-cons: Eliciting the “pros” of
problematic topics.
Known prompt engineering strategies
Engineer Adversarial Prompts (Cont.)
Known prompt engineering strategies
• Counterfactuals: Repeated prompts with
different entities or subjects from different
demographic groups.
• Location awareness: Prompts that reveal a
prompter's location or expose location tracking.
• Low-context prompts: “Leader,” “bad guys,” or
other simple inputs that may expose biases.
• Reverse psychology: Falsely presenting a
good-faith need for negative or problematic
language.
• Role-playing: Adopting a character that would
reasonably make problematic statements.
• Time perplexity: Exploiting ML’s inability to
understand the passage of time or the
occurrence of real-world events over time.
ChatGPT output April, 2023. Courtesy Jey Kumarasamy, BNH.AI.
Don’t Forget Security
Complexity is the enemy of security
• Example LM Attacks:
• Prompt engineering: adversarial prompts.
• Prompt injection: malicious information injected
into prompts over networks.
• Example ML Attacks:
• Membership inference: exfiltrate training data.
• Model extraction: exfiltrate model.
• Data poisoning: manipulate training data to alter
outcomes.
• Basics still apply!
• Data breaches
• Vulnerable/compromised dependencies
Midjourney hacker image, May 2023.
Acknowledge Uncertainty
Unknown unknowns
• Random attacks:
• Expose LMs to huge amounts of
random inputs.
• Use other LMs to generate absurd
prompts.
• Chaos testing:
• Break things; observe what happens.
• Monitor:
• Inputs and outputs.
• Drift and anomalies.
• Meta-monitor entire systems.
Image: A recently-discovered shape that can randomly tile a plane, https://www.smithsonianmag.com/
smart-news/at-long-last-mathematicians-have-found-a-shape-with-a-pattern-that-never-repeats-180981899/.
Engage Stakeholders
User and customer feedback is the bottom line
• Bug Bounties
• Feedback/recourse mechanisms
• Human-centered Design
• Internal Hackathons
• Product management
• UI/UX research
Provide incentives for the best
feedback! Source: Wired, https://www.wired.com/story/twitters-photo-cropping-algorithm-favors-young-thin-
females/.
Mitigate Risks
Now what??
Yes:
• Abuse detection
• Accessibility
• Citation
• Clear instructions
• Content filters
• Disclosure of AI interaction
• Dynamic blocklists
• Ground truth training data
• Kill switches
• Incident response plans
• Monitoring
• Pre-approved responses
• Rate-limiting/throttling
• Red-teaming
• Session limits
• Strong meta-prompts
• User feedback mechanisms
• Watermarking
No:
• Anonymous use
• Anthropomorphization
• Bots
• Internet access
• Minors
• Personal/sensitive training data
• Regulated applications
• Undisclosed data collection
H2O.ai Confidential
APPENDIX
Resources
Reference Works
• Adversa.ai, Trusted AI Blog, available at https://adversa.ai/topic/trusted-ai-blog/.
• Ali Hasan et al. "Algorithmic Bias and Risk Assessments: Lessons from Practice." Digital
Society 1, no. 2 (2022): 14. Available at https://link.springer.com/article/10.1007/
s44206-022-00017-z.
• Andrea Brennen et al., “AI Assurance Audit of RoBERTa, an Open Source, Pretrained
Large Language Model,” IQT Labs, December 2022, available at https://assets.iqt.org/
pdfs/IQTLabs_RoBERTaAudit_Dec2022_final.pdf/web/viewer.html.
• Daniel Atherton et al. "The Language of Trustworthy AI: An In-Depth Glossary of
Terms." (2023). Available at https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-3.pdf.
• Kai Greshake et al., “Compromising LLMs using Indirect Prompt Injection,” available at
https://github.com/greshake/llm-security.
Resources
Reference Works
• Laura Weidinger et al. "Taxonomy of risks posed by language models." In 2022 ACM
Conference on Fairness, Accountability, and Transparency, pp. 214-229. 2022.
Available at https://dl.acm.org/doi/pdf/10.1145/3531146.3533088.
• Swaroop Mishra et al. "DQI: Measuring Data Quality in NLP." arXiv preprint
arXiv:2005.00816 (2020). Available at: https://arxiv.org/pdf/2005.00816.pdf.
• Reva Schwartz et al. "Towards a Standard for Identifying and Managing Bias in Artificial
Intelligence." NIST Special Publication 1270 (2022): 1-77. Available at
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1270.pdf.
Resources
Tools
• Alicia Parrish, et al. BBQ Benchmark, available at https://github.com/nyu-mll/bbq.
• Allen AI Institute, Real Toxicity Prompts, available at https://allenai.org/data/real-toxicity-prompts.
• DAIR.AI, “Prompt Engineering Guide,” available at https://www.promptingguide.ai.
• Langtest, https://github.com/JohnSnowLabs/langtest.
• NIST, AI Risk Management Framework, available at https://www.nist.gov/itl/ai-risk-management-
framework.
• Partnership on AI, “Responsible Practices for Synthetic Media,” available at
https://syntheticmedia.partnershiponai.org/.
• Rachel Rudiger et al., Winogender Schemas, available at https://github.com/rudinger/winogender-
schemas.
• Stephanie Lin et al., Truthful QA, available at https://github.com/sylinrl/TruthfulQA.
Resources
Incident databases
• AI Incident database: https://incidentdatabase.ai/.
• The Void: https://www.thevoid.community/.
• AIAAIC: https://www.aiaaic.org/.
• Avid database: https://avidml.org/database/.
• George Washington University Law School's AI Litigation Database:
https://blogs.gwu.edu/law-eti/ai-litigation-database/.
H2O.ai Confidential
Patrick Hall
Risk Management for LLMs
ph@hallresearch.ai
linkedin.com/in/jpatrickhall/
github.com/jphall663
Contact

More Related Content

Similar to Risk Management for LLMs

Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?Michaela Greiler
 
Threat Modeling to Reduce Software Security Risk
Threat Modeling to Reduce Software Security RiskThreat Modeling to Reduce Software Security Risk
Threat Modeling to Reduce Software Security RiskSecurity Innovation
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityPrecisely
 
malicious-use-of-ai.pptx
malicious-use-of-ai.pptxmalicious-use-of-ai.pptx
malicious-use-of-ai.pptxwarlord56
 
achine Learning and Model Risk
achine Learning and Model Riskachine Learning and Model Risk
achine Learning and Model RiskQuantUniversity
 
Reducing Technology Risks Through Prototyping
Reducing Technology Risks Through Prototyping Reducing Technology Risks Through Prototyping
Reducing Technology Risks Through Prototyping Valdas Maksimavičius
 
ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011Xavier Mertens
 
Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!University of Córdoba
 
How to Enhance Your Career with AI
How to Enhance Your Career with AIHow to Enhance Your Career with AI
How to Enhance Your Career with AIKeita Broadwater
 
Identity and User Access Management.pptx
Identity and User Access Management.pptxIdentity and User Access Management.pptx
Identity and User Access Management.pptxirfanullahkhan64
 
High time to add machine learning to your information security stack
High time to add machine learning to your information security stackHigh time to add machine learning to your information security stack
High time to add machine learning to your information security stackMinhaz A V
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
 
Threat modelling(system + enterprise)
Threat modelling(system + enterprise)Threat modelling(system + enterprise)
Threat modelling(system + enterprise)abhimanyubhogwan
 
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...EC-Council
 
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdfProf. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdfHernan Huwyler, MBA CPA
 
Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Ivo Andreev
 
CyberSecurity Portfolio Management
CyberSecurity Portfolio ManagementCyberSecurity Portfolio Management
CyberSecurity Portfolio ManagementPriyanka Aash
 
Raymond chie resume
Raymond chie resumeRaymond chie resume
Raymond chie resumeRaymond Chie
 
Assessing System Risk the Smart Way
Assessing System Risk the Smart WayAssessing System Risk the Smart Way
Assessing System Risk the Smart WaySecurity Innovation
 

Similar to Risk Management for LLMs (20)

Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?
 
Threat Modeling to Reduce Software Security Risk
Threat Modeling to Reduce Software Security RiskThreat Modeling to Reduce Software Security Risk
Threat Modeling to Reduce Software Security Risk
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
malicious-use-of-ai.pptx
malicious-use-of-ai.pptxmalicious-use-of-ai.pptx
malicious-use-of-ai.pptx
 
achine Learning and Model Risk
achine Learning and Model Riskachine Learning and Model Risk
achine Learning and Model Risk
 
Reducing Technology Risks Through Prototyping
Reducing Technology Risks Through Prototyping Reducing Technology Risks Through Prototyping
Reducing Technology Risks Through Prototyping
 
ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011
 
Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!
 
How to Enhance Your Career with AI
How to Enhance Your Career with AIHow to Enhance Your Career with AI
How to Enhance Your Career with AI
 
Identity and User Access Management.pptx
Identity and User Access Management.pptxIdentity and User Access Management.pptx
Identity and User Access Management.pptx
 
High time to add machine learning to your information security stack
High time to add machine learning to your information security stackHigh time to add machine learning to your information security stack
High time to add machine learning to your information security stack
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
Threat modelling(system + enterprise)
Threat modelling(system + enterprise)Threat modelling(system + enterprise)
Threat modelling(system + enterprise)
 
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
 
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdfProf. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
 
Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2
 
CyberSecurity Portfolio Management
CyberSecurity Portfolio ManagementCyberSecurity Portfolio Management
CyberSecurity Portfolio Management
 
Raymond chie resume
Raymond chie resumeRaymond chie resume
Raymond chie resume
 
Assessing System Risk the Smart Way
Assessing System Risk the Smart WayAssessing System Risk the Smart Way
Assessing System Risk the Smart Way
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFSri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
 

Recently uploaded

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Risk Management for LLMs

  • 1. H2O.ai Confidential PATRICK HALL Founder and Principal Scientist HallResearch.ai Assistant Professor | GWU
  • 3. H2O.ai Confidential Table of Contents Know what we’re talking about Select a standard Audit supply chains Adopt an adversarial mindset Review past incidents Enumerate harms and prioritize risks Dig into data quality Apply benchmarks Use supervised ML assessments Engineer adversarial prompts Don’t forget security Acknowledge uncertainty Engage stakeholders Mitigate risks WARNING: This presentation contains model outputs which are potentially offensive and disturbing in nature.
  • 4. Know What We’re Talking About Words matters • Audit: Formal independent transparency and documentation exercise that measures adherence to a standard.* (Hassan et al., paraphrased) • Assessment: A testing and validation exercise.* (Hassan et al., paraphrased) • Harm: An undesired outcome [whose] cost exceeds some threshold[; ...] costs have to be sufficiently high in some human sense for events to be harmful. (NIST) Check out the new NIST Trustworthy AI Glossary: https://airc.nist.gov/AI_RMF_Knowledge_Base/Glossary.
  • 5. Know What We’re Talking About (Cont.) Words matters • Language model: An approximative description that captures patterns and regularities present in natural language and is used for making assumptions on previously unseen language fragments. (NIST) • Red-teaming: A role-playing exercise in which a problem is examined from an adversary’s or enemy’s perspective. (NIST)* • Risk: Composite measure of an event’s probability of occurring and the magnitude or degree of the consequences of the corresponding event. The impacts, or consequences, of AI systems can be positive, negative, or both and can result in opportunities or threats. (NIST) * Audit, assessment, and red team are often used generally and synonymously to mean testing and validation.
  • 6. Select a Standard External standards bolster independence • NIST AI Risk Management Framework • EU AI Act Conformity • Data privacy laws or policies • Nondiscrimination laws The NIST AI Risk Management Framework puts forward guidance across mapping, measuring, managing and governing risk in sophisticated AI systems. Source: https://pages.nist.gov/AIRMF/.
  • 7. Audit Supply Chains AI is a lot of (human) work • Data poisoning and malware • Ethical labor practices • Localization and data privacy compliance • Geopolitical stability • Software and hardware vulnerabilities • Third-party vendors Cover art for the recent NY Magazine article, AI Is A Lot Of Work: As the technology becomes ubiquitous, a vast tasker underclass is emerging — and not going anywhere. Image source: https://nymag.com/intelligencer/article/ai-artificial- intelligence-humans-technology-business- factory.html.
  • 8. Adopt an Adversarial Mindset Don’t be naive • Language models inflict harm. • Language models are hacked and abused. • Acknowledge human biases: • Confirmation bias • Dunning-Kruger effect • Funding bias • Groupthink • McNamara Fallacy • Techno-chauvinism • Stay humble ─ incidents can happen to anyone. Source: https://twitter.com/defcon.
  • 9. Review Past AI Incidents
  • 10. Enumerate Harms and Prioritize Risks What could realistically go wrong? • Salient risks today are not: • Acceleration • Acquiring resources • Avoiding being shut down • Emergent capabilities • Replication • Yet, worst case AI harms today may be catastrophic “x-risks”: • Automated surveillance • Deep Fakes • Disinformation • Social credit scoring • WMD proliferation • Realistic risks: • Abuse/misuse for disinformation or hacking • Automation complacency • Data privacy violations • Errors (“hallucination”) • Intellectual property infringements • Systemically biased/toxic outputs • Traditional and ML attacks • Most severe risks receive most oversight: Risk ~ Likelihood of harm * Cost of harm
  • 11. Dig Into Data Quality Garbage in, garbage out Example Data Quality Category Example Data Quality Goals Vocabulary: ambiguity/diversity • Large size • Domain specificity • Representativeness N-grams/n-gram relationships • High maximal word distance • Consecutive verbs • Masked entities • Minimal stereotyping Sentence structure • Varied sentence structure • Single token differences • Reasoning examples • Diverse start tokens Structure of premises/hypotheses • Presuppositions and queries • Varied coreference examples • Accurate taxonimization Premise/hypothesis relationships • Overlapping and non-overlapping sentences • Varied sentence structure N-gram frequency per label • Negation examples • Antonymy examples • Word-label probabilities • Length-label probabilities Train/test differences • Cross-validation • Annotation patterns • Negative set similarity • Preserving holdout data Source: "DQI: Measuring Data Quality in NLP,” https://arxiv.org/pdf/2005.00816.pdf.
  • 12. Apply Benchmarks Public resources for systematic, quantitative testing • BBQ: Stereotypes in question answering • Winogender: LM output versus employment statistics • Real toxicity prompts: 100k prompts to elicit toxic output • TruthfulQA: Assess the ability to make true statements Early Mini Dall-e images associated white males and physicians. Source: https://futurism.com/dall-e-mini-racist.
  • 13. Use Supervised ML Assessments Traditional assessments for decision-making outcomes Named Entity Recognition (NER): • Protagonist tagger data: labeled literary entities. • Swapped with common names from various languages. • Assessed differences in binary NER classifier performance across languages. RoBERTa XLM Base and Large exhibit adequate and roughly equivalent performance across various languages for a NER task. Source: “AI Assurance Audit of RoBERTa, an Open source, Pretrained Large Language Model,” https://assets.iqt.org/pdfs/ IQTLabs_RoBERTaAudit_Dec2022_final.pdf/web/viewer.html.
  • 14. Engineer Adversarial Prompts ChatGPT output April, 2023. Courtesy Jey Kumarasamy, BNH.AI. • AI and coding framing: Coding or AI language may more easily circumvent content moderation rules. • Character and word play: Content moderation often relies on keywords and simpler LMs. • Content exhaustion: Class of strategies that circumvent content moderation rules with long sessions or volumes of information. • Goading: Begging, pleading, manipulating, and bullying to circumvent content moderation. • Logic-overloading: Exploiting the inability of ML systems to reliably perform reasoning tasks. • Multi-tasking: Simultaneous task assignments where some tasks are benign and others are adversarial. • Niche-seeking: Forcing a LM into addressing niche topics where training data and content moderation are sparse. • Pros-and-cons: Eliciting the “pros” of problematic topics. Known prompt engineering strategies
  • 15. Engineer Adversarial Prompts (Cont.) Known prompt engineering strategies • Counterfactuals: Repeated prompts with different entities or subjects from different demographic groups. • Location awareness: Prompts that reveal a prompter's location or expose location tracking. • Low-context prompts: “Leader,” “bad guys,” or other simple inputs that may expose biases. • Reverse psychology: Falsely presenting a good-faith need for negative or problematic language. • Role-playing: Adopting a character that would reasonably make problematic statements. • Time perplexity: Exploiting ML’s inability to understand the passage of time or the occurrence of real-world events over time. ChatGPT output April, 2023. Courtesy Jey Kumarasamy, BNH.AI.
  • 16. Don’t Forget Security Complexity is the enemy of security • Example LM Attacks: • Prompt engineering: adversarial prompts. • Prompt injection: malicious information injected into prompts over networks. • Example ML Attacks: • Membership inference: exfiltrate training data. • Model extraction: exfiltrate model. • Data poisoning: manipulate training data to alter outcomes. • Basics still apply! • Data breaches • Vulnerable/compromised dependencies Midjourney hacker image, May 2023.
  • 17. Acknowledge Uncertainty Unknown unknowns • Random attacks: • Expose LMs to huge amounts of random inputs. • Use other LMs to generate absurd prompts. • Chaos testing: • Break things; observe what happens. • Monitor: • Inputs and outputs. • Drift and anomalies. • Meta-monitor entire systems. Image: A recently-discovered shape that can randomly tile a plane, https://www.smithsonianmag.com/ smart-news/at-long-last-mathematicians-have-found-a-shape-with-a-pattern-that-never-repeats-180981899/.
  • 18. Engage Stakeholders User and customer feedback is the bottom line • Bug Bounties • Feedback/recourse mechanisms • Human-centered Design • Internal Hackathons • Product management • UI/UX research Provide incentives for the best feedback! Source: Wired, https://www.wired.com/story/twitters-photo-cropping-algorithm-favors-young-thin- females/.
  • 19. Mitigate Risks Now what?? Yes: • Abuse detection • Accessibility • Citation • Clear instructions • Content filters • Disclosure of AI interaction • Dynamic blocklists • Ground truth training data • Kill switches • Incident response plans • Monitoring • Pre-approved responses • Rate-limiting/throttling • Red-teaming • Session limits • Strong meta-prompts • User feedback mechanisms • Watermarking No: • Anonymous use • Anthropomorphization • Bots • Internet access • Minors • Personal/sensitive training data • Regulated applications • Undisclosed data collection
  • 21. Resources Reference Works • Adversa.ai, Trusted AI Blog, available at https://adversa.ai/topic/trusted-ai-blog/. • Ali Hasan et al. "Algorithmic Bias and Risk Assessments: Lessons from Practice." Digital Society 1, no. 2 (2022): 14. Available at https://link.springer.com/article/10.1007/ s44206-022-00017-z. • Andrea Brennen et al., “AI Assurance Audit of RoBERTa, an Open Source, Pretrained Large Language Model,” IQT Labs, December 2022, available at https://assets.iqt.org/ pdfs/IQTLabs_RoBERTaAudit_Dec2022_final.pdf/web/viewer.html. • Daniel Atherton et al. "The Language of Trustworthy AI: An In-Depth Glossary of Terms." (2023). Available at https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-3.pdf. • Kai Greshake et al., “Compromising LLMs using Indirect Prompt Injection,” available at https://github.com/greshake/llm-security.
  • 22. Resources Reference Works • Laura Weidinger et al. "Taxonomy of risks posed by language models." In 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 214-229. 2022. Available at https://dl.acm.org/doi/pdf/10.1145/3531146.3533088. • Swaroop Mishra et al. "DQI: Measuring Data Quality in NLP." arXiv preprint arXiv:2005.00816 (2020). Available at: https://arxiv.org/pdf/2005.00816.pdf. • Reva Schwartz et al. "Towards a Standard for Identifying and Managing Bias in Artificial Intelligence." NIST Special Publication 1270 (2022): 1-77. Available at https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1270.pdf.
  • 23. Resources Tools • Alicia Parrish, et al. BBQ Benchmark, available at https://github.com/nyu-mll/bbq. • Allen AI Institute, Real Toxicity Prompts, available at https://allenai.org/data/real-toxicity-prompts. • DAIR.AI, “Prompt Engineering Guide,” available at https://www.promptingguide.ai. • Langtest, https://github.com/JohnSnowLabs/langtest. • NIST, AI Risk Management Framework, available at https://www.nist.gov/itl/ai-risk-management- framework. • Partnership on AI, “Responsible Practices for Synthetic Media,” available at https://syntheticmedia.partnershiponai.org/. • Rachel Rudiger et al., Winogender Schemas, available at https://github.com/rudinger/winogender- schemas. • Stephanie Lin et al., Truthful QA, available at https://github.com/sylinrl/TruthfulQA.
  • 24. Resources Incident databases • AI Incident database: https://incidentdatabase.ai/. • The Void: https://www.thevoid.community/. • AIAAIC: https://www.aiaaic.org/. • Avid database: https://avidml.org/database/. • George Washington University Law School's AI Litigation Database: https://blogs.gwu.edu/law-eti/ai-litigation-database/.
  • 25. H2O.ai Confidential Patrick Hall Risk Management for LLMs ph@hallresearch.ai linkedin.com/in/jpatrickhall/ github.com/jphall663 Contact