The document discusses challenges and opportunities in natural language processing for underserved areas and languages. It outlines how tools such as sentiment analysis and word embeddings, which support applications in healthcare, education, and other areas, are unavailable for many languages. The document also presents recent collaborations that develop NLP resources for African languages by creating custom dictionaries and datasets.
NLP: Challenges and Opportunities in Underserved Areas
1. NLP: Challenges and Opportunities in Underserved Areas
Colleen M. Farrelly, Machine Learning Lead
2. Natural Language Processing
• Many applications of text data
  • Customer feedback
  • Legal documents
  • Job search/other search engines
  • Image captions
  • Product titles
• Need to wrangle text into matrix form in many applications
  • Embeddings
  • Parts-of-speech counts
  • Sentiment analysis results
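A minimal sketch of what "wrangling text into matrix form" can look like, assuming the simplest possible representation: one row per document and one column per vocabulary word, filled with raw counts. The function names and example sentences are illustrative and not taken from the talk.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on Unicode word characters (a deliberately crude tokenizer)."""
    return re.findall(r"\w+", text.lower())

def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix

# Illustrative customer-feedback snippets (made up for this example).
docs = [
    "Great product, fast delivery",
    "Delivery was slow and the product broke",
]
vocab, matrix = document_term_matrix(docs)
for word, column in zip(vocab, zip(*matrix)):
    print(word, column)
```

Parts-of-speech counts or sentiment scores would add further columns to the same matrix, one per feature.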
3. Common Tools: Sentiment Analysis
• Understand positive/negative/neutral tone of text data
• Expansion to other emotions:
  • Anger
  • Sadness
  • Surprise
• Some uses:
  • Identifying customer churn
  • Evaluating educational interventions
  • Predicting clinical outcomes
• Some packages exist for some languages and applications.
• Other languages or emotions require custom code and dictionaries.
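A rough sketch of the "custom code and dictionaries" route, assuming a plain word-to-score lexicon and per-emotion word lists. The English entries below are hypothetical placeholders; a real dictionary for an unsupported language would be built and validated by native speakers.

```python
import re

# Hypothetical toy lexicon: word -> polarity score.
LEXICON = {"good": 1, "great": 1, "happy": 1, "bad": -1, "slow": -1, "angry": -1}

# Hypothetical emotion word lists for the extra emotions named on the slide.
EMOTION_WORDS = {
    "anger": {"angry", "furious"},
    "sadness": {"sad", "lonely"},
    "surprise": {"shocked", "unexpected"},
}

def tokens_of(text):
    """Lowercase and split on Unicode word characters."""
    return re.findall(r"\w+", text.lower())

def sentiment(text, lexicon=LEXICON):
    """Sum lexicon scores over tokens and map the total to a label."""
    score = sum(lexicon.get(t, 0) for t in tokens_of(text))
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

def emotions(text, emotion_words=EMOTION_WORDS):
    """Count how many tokens fall in each emotion's word list."""
    tokens = tokens_of(text)
    return {name: sum(t in words for t in tokens) for name, words in emotion_words.items()}

print(sentiment("Great service, very happy"))
print(emotions("I was shocked and angry, so angry"))
```

Swapping in a different language only requires replacing the two dictionaries, which is why crowd-sourced lexicons matter for unsupported languages. Word-level lookups like this ignore negation and idiom, so they are a starting point only.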
5. Embeddings
Capture relative frequency of word use within a text and across texts
• Can use down-weighting to ignore common words like “a” or “the”
• Don’t capture context well in the simple versions
  • She bolted the door shut.
  • She bolted out the door.
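A small sketch of the frequency-based approach above, assuming it refers to a TF-IDF-style weighting (the slide does not name the method). Each word's weight is its within-document frequency times log(N / df), where df is the number of documents containing the word.

```python
import math
import re
from collections import Counter

def tfidf(docs):
    """Return one {word: weight} dict per document using tf * log(N / df)."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each word appears in.
    df = Counter(w for tokens in tokenized for w in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {w: (c / len(tokens)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        )
    return weights

docs = ["She bolted the door shut.", "She bolted out the door."]
w1, w2 = tfidf(docs)
print(w1)
print(w2)
```

In this two-sentence corpus, words shared by both sentences (including "the") are weighted down to zero, and "bolted" receives the same weight in each sentence even though it means "locked" in one and "ran" in the other. That is the context problem the two example sentences illustrate.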
Pretrained encoder/decoder neural networks that can capture context
• BERT
• GPT-3
Most pretrained models only support a limited number of languages (though there are ways to train a similar model on a new language corpus)…
7. NLP Needs in Underserved Areas
• Translation and speech-to-text for unsupported languages (Hausa, Lingala, Quechua…)
• Sentiment dictionaries for unsupported languages/emotional nuances of the language
• NLP-powered apps (search engines, matching/recommenders, symptom checkers, conversational agents…)
• Language preservation of endangered languages
  • 308 highly endangered ones just in Africa
8. Market Size for NLP Applications
• Worldwide NLP market projected to grow from $21B in 2021 to $127B by 2028.
• South America and Africa are mostly ignored markets for NLP-backed technology in healthcare, travel, retail, education, and other markets.
• Local companies and universities are currently trying to meet market needs.
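As a quick arithmetic check on the projection above, the two figures imply a compound annual growth rate of roughly 29% per year. The sketch assumes seven compounding periods (2021 to 2028); the function name is illustrative.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value, and span in years."""
    return (end_value / start_value) ** (1 / years) - 1

# $21B in 2021 growing to $127B by 2028 (figures from the slide).
cagr = implied_cagr(21, 127, 2028 - 2021)
print(f"{cagr:.1%}")
```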
9. Caveats…
• Collecting the data
  • Existing sources, creating written sources for non-written languages (3074 of 7139 languages that exist)
  • Capturing speech tone variety, storing large audio files for non-written languages
  • Getting large enough sample sizes from endangered languages (Domari in Northern Africa/Middle East)
• Ownership of data
  • Foreign corporations? Governments? Universities? Local speakers?
• Biases and misuses
  • Unintentional translation issues from non-native speakers reviewing technology (ex. diseases/symptoms)
  • Lack of representation in languages targeted/training in NLP (wealthy world vs. developing world)
  • Use of technologies to spread conflict (companies, world powers, neighboring countries… interfering)
11. Customized Dictionaries and Embeddings
• AfroLeadership
  • Crowd-source local language sentiment dictionaries, writing samples for embeddings…
  • Led by students and researchers at local Cameroonian universities
• Hausa Hackathon
  • Non-profit initiative to build corpus/dictionaries and build applications to support the Hausa language
  • Hackathons for Hausa speakers and NLP professionals interested in Hausa applications
• Masakhane
  • Non-profit collaboration of NLP researchers in Africa
  • Broad set of target languages
12. Companies Powered by NLP
• Mpuza Inc
  • Job matching app connecting companies and job seekers
  • Powered by NLP-based matching engine
  • Caveat of needing filters for extremism recruiting:
    • Rwanda history and neighboring DRC violence
    • Need to identify extremist recruitment job posts
    • Name changes of extremist groups
    • Concealed recruitment/threats…
    • False positives for human rights and security positions
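One way such a filter could be sketched, purely as an illustration and not a description of Mpuza's actual system: match posts against alias lists (so a renamed group is still caught once its new name is added) and route posts that also mention legitimate human rights or security work to a human reviewer instead of flagging them outright. All group names and phrases below are hypothetical placeholders.

```python
import re

# Hypothetical placeholders: canonical group id -> known names and aliases.
GROUP_ALIASES = {"group_a": {"alpha front", "new alpha movement"}}

# Hypothetical phrases that suggest a legitimate human rights or security role.
LEGITIMATE_CONTEXT = {"human rights", "security analyst", "peacekeeping"}

def normalize(text):
    """Lowercase and collapse punctuation/whitespace so phrase matching is consistent."""
    return " ".join(re.findall(r"\w+", text.lower()))

def review_decision(post):
    """Return (decision, matched_groups) for a job post."""
    text = normalize(post)
    matched = sorted(
        group for group, aliases in GROUP_ALIASES.items()
        if any(alias in text for alias in aliases)
    )
    if not matched:
        return "allow", matched
    if any(phrase in text for phrase in LEGITIMATE_CONTEXT):
        # Likely false positive (e.g. a researcher role), so ask a person.
        return "human_review", matched
    return "flag", matched

print(review_decision("Hiring drivers in Kigali"))
print(review_decision("Join the New Alpha Movement today"))
print(review_decision("Human rights researcher tracking the Alpha Front"))
```

Keyword matching of this kind cannot catch concealed recruitment, which is why the caveats above call for more than a simple filter.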
14. Questions…
• How many familiar with NLP?
• How many lived in another country as a child?
• How many interested in making money or making a social good impact?
15. We’re positioned to accelerate NLP development for underserved populations.
Starting companies
Volunteering time
Creating NLP hackathons
All from where we are in Miami…