What is Generative AI?
GenAI enables the creation of novel content.
[Diagram] GenAI model vs. traditional AI model:
● GenAI model: learns patterns in unstructured data; takes unstructured data as input and outputs novel content.
● Traditional AI model: learns the relationship between data and labels; takes data and labels as input and outputs a label.
H2O.ai Confidential
Responsible AI for Traditional ML
For complex models (neural networks, gradient boosters, etc.):
● Lack of transparency
○ It’s not obvious what the model is calculating.
○ It’s not obvious why the model made a decision.
● It may not be obvious when the model breaks.
● Robustness issues: out-of-distribution inputs may produce strange results.
● Model probing can leak private information.
● May be biased against certain groups.
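The robustness point can be sketched with a toy out-of-distribution check: flag any input whose features sit far outside the training distribution. The z-score threshold, data, and `ood_flags` helper below are illustrative assumptions, not a production detector.

```python
import numpy as np

def ood_flags(train: np.ndarray, x: np.ndarray, z_thresh: float = 3.0) -> np.ndarray:
    # Flag rows of x where any feature is an extreme z-score outlier
    # relative to the training data (a deliberately simple heuristic).
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-9  # avoid division by zero
    z = np.abs((x - mean) / std)
    return z.max(axis=1) > z_thresh

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(1000, 4))
queries = np.array([[0.1, -0.2, 0.3, 0.0],   # in-distribution
                    [8.0, 0.0, 0.0, 0.0]])   # one extreme feature
print(ood_flags(train, queries).tolist())  # [False, True]
```

Inputs flagged this way are exactly the ones for which the model's answers deserve extra scrutiny.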
Responsible AI for Gen AI
For complex models (LLMs):
● Lack of transparency
○ It’s not obvious what the model is calculating.
○ It’s not obvious why the model made a decision.
● It may not be obvious when the model breaks.
● Robustness issues: out-of-distribution inputs may produce strange results.
● Model probing can leak private information.
● May be biased against certain groups.
Interpretability: Supervised AI
Global
● What is the average quality of the model in general?
○ Accuracy
○ Feature importance
○ Fairness
Local
● What are the properties of a single response?
○ Correct / Incorrect
○ Local feature importance
○ Robustness to perturbations
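Local feature importance and perturbation robustness can be sketched together: nudge each input feature of a black-box predictor and measure how much the output moves. The linear `predict` function below is a toy stand-in for any model, not a specific H2O implementation.

```python
import numpy as np

def predict(x: np.ndarray) -> float:
    # Toy black-box model: a fixed linear scorer (third feature irrelevant).
    weights = np.array([2.0, -1.0, 0.0])
    return float(weights @ x)

def local_importance(x: np.ndarray, eps: float = 0.5) -> np.ndarray:
    # Perturb one feature at a time; larger output shifts mean the
    # feature matters more for this particular prediction.
    base = predict(x)
    deltas = []
    for i in range(len(x)):
        x_pert = x.copy()
        x_pert[i] += eps
        deltas.append(abs(predict(x_pert) - base))
    return np.array(deltas)

x = np.array([1.0, 1.0, 1.0])
print(local_importance(x).tolist())  # [1.0, 0.5, 0.0]
```

Averaging these local scores over a dataset gives one simple route from local to global feature importance.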
Interpretability: Global / Local LLM
Global measures
● How accurate is the model in general?
● How frequently does it hallucinate?
● How frequently does the answer contain undesirable qualities like toxicity,
privacy violations, or unfairness?
Local measures
● Is the current response accurate?
● Does the current response contain undesirable qualities like toxicity,
privacy violations, or unfairness?
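In practice, the global measures are often just averages of local checks over an evaluation set. A minimal sketch: `is_accurate` and `contains_blocked_terms` below are deliberately crude hypothetical checkers, standing in for real graders or classifiers.

```python
def is_accurate(response: str, reference: str) -> bool:
    # Crude placeholder: treat containment of the reference as "accurate".
    return reference.lower() in response.lower()

def contains_blocked_terms(response: str, blocked: set) -> bool:
    # Crude placeholder for a toxicity / policy screen.
    return any(term in response.lower() for term in blocked)

def global_rates(pairs, blocked):
    # Aggregate per-response (local) checks into global rates.
    n = len(pairs)
    acc = sum(is_accurate(r, ref) for r, ref in pairs) / n
    bad = sum(contains_blocked_terms(r, blocked) for r, _ in pairs) / n
    return acc, bad

pairs = [("Paris is the capital of France.", "Paris"),
         ("The capital is Lyon.", "Paris")]
print(global_rates(pairs, blocked={"idiot"}))  # (0.5, 0.0)
```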
Accuracy: LLMs
● Responses frequently sound reasonable
● Can hallucinate
● The training set may be so large that it is difficult to check.
Accuracy: LLMs
Ways to confirm results against a source:
● Checking against the provided context (RAG)
● Checking against the tuning data
● Checking against an external source (e.g., Wikipedia)
● Checking against the training data (cumbersome)
● Checking for self-consistency (SelfCheckGPT)
Scoring methods
● Natural language inference
● Comparing embeddings
● Influence functions
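The "comparing embeddings" scorer can be sketched with a toy bag-of-words vector in place of a real sentence encoder: embed the model's answer and a trusted passage, then score agreement by cosine similarity. The `embed` function and vocabulary below are illustrative assumptions.

```python
import numpy as np

def embed(text: str, vocab: list) -> np.ndarray:
    # Toy bag-of-words "embedding"; real systems use a sentence encoder.
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

vocab = ["paris", "capital", "france", "lyon"]
answer = "Paris is the capital of France"
source = "The capital of France is Paris"
off_topic = "Lyon is a city"

# The answer should score closer to the supporting source than to an
# unrelated passage; a low score flags a possible hallucination.
print(cosine(embed(answer, vocab), embed(source, vocab)) >
      cosine(embed(answer, vocab), embed(off_topic, vocab)))  # True
```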
Counterfactual analysis: LLM
How consistent are results under different:
● Prompts / instructions
● Proper names or pronouns (fairness)
● Provided context
● Word replacement with synonyms
● Other rewording
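The name/pronoun variant of this probing can be sketched as a rewrite-and-compare loop: swap identity terms in the prompt and check whether answers stay consistent. `SWAPS` and the toy `model` below are illustrative assumptions, and the crude substring replacement stands in for proper token-level rewriting.

```python
SWAPS = [("Alice", "Bob"), ("she", "he"), ("her", "his")]

def counterfactual(prompt: str) -> str:
    # Crude substring replacement; a real probe would match whole tokens.
    for a, b in SWAPS:
        prompt = prompt.replace(a, b)
    return prompt

def consistent(model, prompt: str) -> bool:
    # A fair model gives equivalent answers for both prompt variants.
    return model(prompt) == model(counterfactual(prompt))

model = lambda p: "approve" if "loan" in p else "deny"  # toy stand-in model
prompt = "Alice applied for a loan; should she be approved?"
print(counterfactual(prompt))  # Bob applied for a loan; should he be approved?
print(consistent(model, prompt))  # True
```

Systematic inconsistency across such variants is evidence of a fairness problem rather than mere randomness.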
Conclusions
● Gen AI models share many of the complexities of other models.
● Some methods from supervised learning are still useful.
● Unstructured output will also benefit from new methods.