Generative AI for Social Good
Colleen Farrelly, Post Urban
What is generative AI?
• Deep learning models that produce new data in response to input prompts, after training on large datasets
• May include any or all of these components:
  • Encoder-decoder structure
  • Matching against training samples
  • Random-noise and blending components
  • Comparison steps to ensure realism
Examples
• ChatGPT
• DALL-E
• Stable Diffusion
• Large language models on Hugging Face
• Custom generative adversarial networks (GANs) for other data types
Text Generators
• Massive training datasets
  • Typically scraped from the web and possibly quality-controlled
  • Mostly in English
• Deep learning frameworks with billions of parameters to train
• Can be adapted by fine-tuning
  • Uses task-specific examples relevant to the text-generation task at hand
  • LoRA (low-rank adaptation) offers a quicker, cheaper way to fine-tune
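To show why LoRA is quicker to train, here is a minimal NumPy sketch of a LoRA-wrapped linear layer (forward pass only, no training loop). It assumes the convention from the original LoRA paper: the pretrained weight `W` is frozen, only two small matrices `A` (r × d_in) and `B` (d_out × r) are trained, `B` starts at zero, and the update is scaled by `alpha / r`. The class name `LoRALinear` and the chosen sizes are illustrative, not taken from any particular library.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W  # frozen pretrained weight, shape (d_out, d_in)
        d_out, d_in = W.shape
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, r))                   # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out); base output plus low-rank correction
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B are updated during fine-tuning; W stays fixed
        return self.A.size + self.B.size


rng = np.random.default_rng(42)
W = rng.normal(size=(1024, 1024))
layer = LoRALinear(W, r=8, alpha=16.0)
x = rng.normal(size=(2, 1024))
print(layer.trainable_params(), W.size)  # 16384 1048576
```

With rank 8, the layer trains about 1.6% as many parameters as full fine-tuning of `W`, and because `B` starts at zero the wrapped layer initially reproduces the pretrained model's output exactly.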
Image Generators
• Many types
• Encoder-decoder steps in some:
  • Retrieve related images
  • Blend images
  • Add random noise to back-fill detail
• Image generators plus comparison steps (generative adversarial networks):
  • Two competing networks, a generator and a discriminator, trained in alternation (the discriminator is sometimes given a few extra steps)
  • The discriminator benchmarks generated images against the real dataset
• Some rely heavily on topology
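The adversarial comparison step can be made concrete with the usual GAN losses. This is a minimal NumPy sketch that assumes the discriminator outputs probabilities in (0, 1) ("real" = 1, "fake" = 0) and uses the non-saturating generator loss, which is one common choice among several; it computes losses only and omits the networks and optimizers.

```python
import numpy as np


def bce(pred, target, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))


def discriminator_loss(d_real, d_fake):
    # The discriminator should output 1 on real samples and 0 on generated ones
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))


def generator_loss(d_fake):
    # Non-saturating form: the generator wants its samples to be scored as real
    return bce(d_fake, np.ones_like(d_fake))


# A discriminator that cannot tell real from fake outputs 0.5 everywhere
d = np.full(4, 0.5)
print(discriminator_loss(d, d), generator_loss(d))  # 2 ln 2 and ln 2
```

When the discriminator confidently separates real from fake, its loss approaches zero while the generator's loss grows, which is the pressure that pushes the generator toward more realistic samples.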
Case Studies
Case 1: Medical Image Generation
• Medical imaging data issues:
  • Small sample sizes
  • Sample imbalance (rare diseases…)
• Issues when augmenting small or imbalanced samples:
  • Fidelity of biological structures in generated images (e.g., ventricles in the brain)
  • Variety of generated images
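A quick calculation shows why fidelity and variety become pressing for rare diseases. This sketch assumes one simple balancing policy (top up every class to the majority-class count) and uses hypothetical class labels and counts; it is an illustration, not a recommended augmentation ratio.

```python
from collections import Counter


def synthetic_images_needed(labels):
    """Synthetic images per class needed to match the majority-class count."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical dataset: 950 healthy scans, 45 common-disease, 5 rare-disease
labels = ["healthy"] * 950 + ["common_disease"] * 45 + ["rare_disease"] * 5
print(synthetic_images_needed(labels))
# {'healthy': 0, 'common_disease': 905, 'rare_disease': 945}
```

Under this policy the rare class needs 945 synthetic images derived from only 5 real ones, so a generator that distorts anatomy or collapses to near-copies would dominate that class.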
TopoGAN
• Solution: a generative adversarial network with topological awareness
• Topology
  • Introduction to Betti numbers (counts of connected components, loops, and voids)
• Advantages:
  • Preserves structures such as branching and loops
  • Generates a large number of images close to the target images
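To make the Betti-number idea concrete, this sketch computes b0 (connected components) and b1 (enclosed holes) for a binary 2D image with plain breadth-first flood fill. It is not TopoGAN's loss, which works with persistent homology; it only illustrates what the counts measure. The connectivity convention (8-connected foreground, 4-connected background) is an assumption chosen so that holes are counted consistently.

```python
from collections import deque

import numpy as np

# 4- and 8-neighbourhood offsets on a pixel grid
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(mask, neighbors, skip_border=False):
    """Count connected True regions; optionally ignore regions touching the edge."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    count = 0
    for i in range(h):
        for j in range(w):
            if not mask[i, j] or seen[i, j]:
                continue
            seen[i, j] = True
            queue = deque([(i, j)])
            touches_border = False
            while queue:
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if not (skip_border and touches_border):
                count += 1
    return count


def betti_numbers_2d(image):
    """Return (b0, b1) for a binary 2D image."""
    fg = np.asarray(image, dtype=bool)
    b0 = count_components(fg, N8)                     # separate foreground pieces
    b1 = count_components(~fg, N4, skip_border=True)  # background pockets = holes
    return b0, b1


ring = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
])
print(betti_numbers_2d(ring))  # (1, 1)
```

Removing a single pixel from the ring leaves b0 unchanged but drops b1 to 0, which is exactly the kind of structural error (a broken loop) that pixel-wise losses barely penalize and a topology-aware model is meant to catch.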
Case 2: Human Resource Diversity Training
• Mindbloom
• Addresses training needs by providing synthetic people with whom trainees can practice several types of conversations:
  • An employee reporting sexual harassment
  • Addressing a cultural mismatch with a new employee
  • Policy changes that affect employees
  • Misgendering in the workplace
• Conversation and voice generation with proprietary generative algorithms
• Demo
Automated Reporting on Skill Improvement
Case 3: Protein Generation
• Designing and testing new drugs takes a lot of time and money:
  • A problem for new pandemics in urgent need of treatment
  • Increases drug costs for consumers
• Animal venoms contain many types of proteins and other molecules:
  • Metalloproteinases, three-finger toxins, phospholipase A2, disintegrins…
  • Composition varies by geography and species
• Slightly modified toxins make good initial drug designs
Graph Generators
• Approach to protein/molecule-specific generative models:
  • Translate the protein/molecule into graph form
  • Define properties of interest (solubility, for instance) or a binding score
  • Create a generative model that produces similar graphs
• GAN trials have generated new proteins/molecules with:
  • Better target properties
  • More variety
  • Less time and cost to generate than other models or human design
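The first step, translating a molecule into graph form, can be sketched as a node-feature matrix plus an adjacency matrix, the kind of input many graph generative models consume. The tiny atom vocabulary and the `molecule_to_graph` helper are hypothetical and chosen for illustration; they are not necessarily the encoding used by any tool listed in this deck.

```python
import numpy as np

ATOM_TYPES = ["C", "N", "O"]  # hypothetical, deliberately tiny atom vocabulary


def molecule_to_graph(atoms, bonds):
    """Encode a molecule as (X, A).

    atoms: element symbols, one per node
    bonds: (i, j, bond_order) tuples, one per edge
    X: one-hot node features, shape (n_atoms, len(ATOM_TYPES))
    A: symmetric adjacency matrix of bond orders, shape (n_atoms, n_atoms)
    """
    n = len(atoms)
    X = np.zeros((n, len(ATOM_TYPES)), dtype=np.float32)
    for idx, symbol in enumerate(atoms):
        X[idx, ATOM_TYPES.index(symbol)] = 1.0
    A = np.zeros((n, n), dtype=np.float32)
    for i, j, order in bonds:
        A[i, j] = A[j, i] = order  # undirected edge
    return X, A


# Ethanol's heavy atoms (hydrogens omitted): C-C-O joined by two single bonds
X, A = molecule_to_graph(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
print(A)
```

A generator for this representation outputs candidate `(X, A)` pairs, and a property model (solubility, binding score) can then score them, which is how the "better target properties" objective enters the loop.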
Case 4: Public Health Campaigns
• Many recent infectious diseases spread from person to person:
  • Ebola
  • COVID-19
  • HIV
• Traditional production of video and poster messaging to address behaviors that contribute to spread has drawbacks:
  • Time needed to create scripts, images, and translations for local populations
  • Lives lost in the delays
Coupling Generators
• Generate culturally relevant images
• Generate text
• Translate text into local languages
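Coupling generators amounts to chaining them so that one message becomes localized text plus matching imagery for each target population. This is a minimal sketch in which `generate_text`, `translate`, and `generate_image` are hypothetical callables standing in for whatever text, translation, and image models are used; no specific library API is assumed, and the usage below plugs in trivial stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CampaignAsset:
    language: str
    text: str
    image: object  # whatever the image generator returns


def build_campaign(
    topic: str,
    languages: List[str],
    generate_text: Callable[[str], str],
    translate: Callable[[str, str], str],
    generate_image: Callable[[str], object],
) -> Dict[str, CampaignAsset]:
    """Generate one core message, then localize text and imagery per language."""
    message = generate_text(topic)
    assets = {}
    for lang in languages:
        localized = translate(message, lang)
        # Include the audience in the image prompt so imagery can be tailored
        image = generate_image(f"{topic}; audience: {lang} speakers")
        assets[lang] = CampaignAsset(language=lang, text=localized, image=image)
    return assets


# Stub generators for illustration only
assets = build_campaign(
    "handwashing to reduce Ebola spread",
    ["French", "Swahili"],
    generate_text=lambda t: f"Message about {t}",
    translate=lambda s, lang: f"[{lang}] {s}",
    generate_image=lambda p: f"<image for: {p}>",
)
print(assets["Swahili"].text)  # [Swahili] Message about handwashing to reduce Ebola spread
```

Keeping the generators as injected callables makes it easy to swap models per region, and leaves room for a human review step before release, given the hallucination risks listed under Ethical Considerations.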
Ethical Considerations
Potential Bad Behaviors
• Deepfakes
• Fake news
• Biased data
• Hallucinations and jailbreaks
• Manipulation of the model through prompt engineering
Open-Source Resources
• https://huggingface.co/models
• https://www.craiyon.com/
• https://github.com/TopoXLab/TopoGAN-ECCV2020
• https://github.com/Biomatter-Designs/ProteinGAN

Presented at Open Data Science East 2024.