NLP: Challenges and Opportunities in
Underserved Areas
Colleen M. Farrelly, Machine Learning Lead
Natural Language
Processing
• Many applications of text data
• Customer feedback
• Legal documents
• Job search/other search engines
• Image captions
• Product titles
• Need to wrangle text into matrix form in many
applications
• Embeddings
• Parts-of-speech counts
• Sentiment analysis results
Common Tools:
Sentiment Analysis
• Understand positive/negative/neutral tone of text
data
• Expansion to other emotions:
• Anger
• Sadness
• Surprise
• Some uses:
• Identifying customer churn
• Evaluating educational interventions
• Predicting clinical outcomes
• Some packages exist for some languages and
applications.
• Other languages or emotions require custom code
and dictionaries.
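The dictionary-based approach mentioned above can be sketched as follows. This is a minimal illustration, not the method of any particular package: the `LEXICON` entries, the emotion labels, and the `score_text` helper are all hypothetical, and a real sentiment dictionary for a new language would need to be built with native speakers.

```python
import re

# Hypothetical toy lexicon: word -> (emotion label, weight).
# Illustrative only; not taken from any real sentiment resource.
LEXICON = {
    "great": ("positive", 1.0),
    "helpful": ("positive", 1.0),
    "slow": ("negative", 1.0),
    "broken": ("negative", 1.0),
    "furious": ("anger", 1.0),
}

def score_text(text, lexicon=LEXICON):
    """Sum the lexicon weights per emotion label for the words in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    totals = {}
    for token in tokens:
        if token in lexicon:
            label, weight = lexicon[token]
            totals[label] = totals.get(label, 0.0) + weight
    return totals

scores = score_text("Great staff, but the app is slow and broken.")
print(scores)
```

Adding a new emotion or language in this sketch only means supplying a different lexicon, which is why the dictionaries themselves are the hard part.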
Common Tools:
Embeddings
Embeddings
Capture relative frequency of word use within a text
and across texts
• Can use down-weighting to ignore common words like “a” or
“the”
• Don’t capture context well in the simple versions
• She bolted the door shut.
• She bolted out the door.
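A minimal sketch of a frequency-based representation, assuming a plain TF-IDF weighting (term frequency within a text, down-weighted by how many texts contain the word). The `tokenize` and `tfidf` helpers are hypothetical names written for this illustration. With the two example sentences, "the" is down-weighted to zero, and "bolted" receives the same weight in both texts even though its meaning differs.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def tfidf(docs):
    """Return one {word: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return rows

docs = ["She bolted the door shut.", "She bolted out the door."]
rows = tfidf(docs)
for doc, row in zip(docs, rows):
    print(doc, row)
```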
Pretrained encoder/decoder neural networks that
can capture context
• BERT
• GPT-3
Most pretrained models support only a limited
number of languages (though there are ways to train
a similar model on a new language corpus)…
Consider the
apps you use
every day…
Now imagine
they didn’t
exist in your
language…
NLP Needs in
Underserved
Areas
• Translation and speech-to-text for
unsupported languages (Hausa,
Lingala, Quechua…)
• Sentiment dictionaries for unsupported
languages/emotional nuances of the
language
• NLP-powered apps (search engines,
matching/recommenders, symptom
checkers, conversational agents…)
• Language preservation of endangered
languages
• 308 highly endangered ones just in
Africa
Market Size for NLP Applications
• Worldwide NLP market projected
to grow from $21B in 2021 to
$127B by 2028.
• South America and Africa are
mostly ignored markets for NLP-backed
technology in healthcare, travel,
retail, education, and other
sectors.
• Local companies and universities
are currently trying to meet market
needs.
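For a sense of scale, the projection above implies a compound annual growth rate of roughly 29% per year. The sketch below assumes simple annual compounding over the seven years from 2021 to 2028; the `cagr` helper is a hypothetical name used only for this illustration.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

# Figures from the slide: $21B in 2021, $127B projected for 2028.
rate = cagr(21e9, 127e9, 2028 - 2021)
print(f"Implied annual growth: {rate:.1%}")
```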
Caveats…
• Collecting the data
• Existing sources, creating written sources for non-written languages (3074 of 7139 languages that exist)
• Capturing speech tone variety, storing large audio files for non-written languages
• Getting large enough sample sizes from endangered languages (Domari in Northern Africa/Middle East)
• Ownership of data
• Foreign corporations? Governments? Universities? Local speakers?
• Biases and misuses
• Unintentional translation errors when non-native speakers review the technology (e.g., diseases/symptoms)
• Lack of representation in the languages targeted and in NLP training data (wealthy world vs. developing world)
• Use of technologies to spread conflict (companies, world powers, neighboring countries… interfering)
Case Studies: Recent
Collaborations
Sub-Saharan Africa
Customized
Dictionaries and
Embeddings
• AfroLeadership
• Crowd-source local language
sentiment dictionaries, writing
samples for embeddings…
• Led by students and researchers
at local Cameroonian universities
• Hausa Hackathon
• Non-profit initiative to build
corpus/dictionaries and build
applications to support the Hausa
language
• Hackathons for Hausa speakers
and NLP professionals interested
in Hausa applications
• Masakhane
• Non-profit collaboration of NLP
researchers in Africa
• Broad set of target languages
Companies Powered by NLP
• Mpuza Inc
• Job matching app connecting companies and job
seekers
• Powered by NLP-based matching engine
• Caveat of needing filters for extremism recruiting:
• Rwanda history and neighboring DRC violence
• Need to identify extremist recruitment job posts
• Name changes of extremist groups
• Concealed recruitment/threats…
• False positives for human rights and security positions
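A minimal sketch of how text-based job matching with a review flag might look. This is not Mpuza's implementation: it assumes bag-of-words cosine similarity for matching, and the `WATCHLIST` terms, `rank_posts`, and the other helper names are hypothetical placeholders. Flagged posts are marked for human review rather than removed automatically, since a simple term match would also catch legitimate human rights and security postings.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder watchlist. A real list would need local expertise
# and frequent updates as group names change.
WATCHLIST = {"examplegroup", "examplegroupalias"}

def bag_of_words(text):
    """Count lowercase word tokens in the text."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_posts(profile, posts):
    """Return (score, post, needs_review) tuples, best match first."""
    profile_vec = bag_of_words(profile)
    results = []
    for post in posts:
        vec = bag_of_words(post)
        # Flag for a human reviewer instead of rejecting outright.
        needs_review = any(term in vec for term in WATCHLIST)
        results.append((cosine(profile_vec, vec), post, needs_review))
    return sorted(results, key=lambda r: r[0], reverse=True)

profile = "python data analyst with sql experience"
posts = [
    "Data analyst needed: SQL and Python",
    "Driver wanted for examplegroup logistics",
]
for score, post, needs_review in rank_posts(profile, posts):
    print(round(score, 2), needs_review, post)
```

Exact term matching of this kind would miss renamed groups and concealed wording, which is the limitation the caveats above point to.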
Miami’s Unique Position
Questions…
• How many are familiar with NLP?
• How many lived in another country
as a child?
• How many are interested in making
money or making a social good
impact?
We’re positioned to accelerate NLP
development for underserved populations.
Starting companies
Volunteering time
Creating NLP hackathons
All from where we are in Miami…
Contact Information
cfarrelly@med.miami.edu
