Introduction to Multilingual Retrieval Augmented Generation (RAG)
COMRADES summary
1. www.comrades-project.eu
Funded by the Horizon 2020 Framework Programme of the European Union
Summary of COMRADES
Harith Alani
Project Coordinator
The Open University
2. COMRADES Tools
• CREES - Crisis Event Extraction Service
chrome.google.com/webstore/detail/crisis-event-extraction-s/jekdamaeeejebcccbgleijlfamjcbilc
• EMINA - Emergent Informativeness and Actionability
github.com/GateNLP/emina
• TwitIE - Twitter Information Extraction pipeline
gate.ac.uk/wiki/twitie.html
• YODIE - Yet another Open Data Information Extraction system
gate.ac.uk/applications/yodie.html
• Veracity classifier
cloud.gate.ac.uk/shopfront/displayItem/rumour-veracity
• COMRADES Ushahidi platform
https://comrades.ushahidi.com
3. 30 PEER-REVIEWED PUBLICATIONS
30 scientific publications
27 deliverables
[Chart: Scientific Outputs and Other Resources, compared across the projects ASSET, CAPTOR, CHAINREACT, COMMONFARE, COMRADES, CROWD4ROADS, HACKAIR, MAZI, NEXTLEAP, PROFIT, PTWIST, SAVINGFOOD, SOCRATIC and STARS4ALL; axis from 0 to 45]
4. ENGAGING STAKEHOLDERS
(exceeding the Description of Action, DoA)
RESPONDERS
POLICY MAKERS
REPORTERS
DEPLOYERS
DoA
6. CROSS-CRISES CLASSIFICATION MODELS
(exceeding DoA)
How can we train models to become less biased towards specific disaster events, or types of events?
[Diagram: a Classification Model shown with the events Hurricane Harvey, Hurricane Irma, Kerala floods and Lombok earthquake, and with question marks against typhoon, train crash, bombing and mass shooting]
DoA
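One common way to probe the bias question on this slide is to hold out one crisis event at a time and train on the rest. The sketch below shows only that splitting step. It is an illustration under assumptions, not the project's documented protocol: the function name `leave_one_event_out`, the label names and the toy posts are all made up for the example.

```python
from typing import Dict, Iterator, List, Tuple

Post = Tuple[str, str]  # (text, label); the label names are hypothetical


def leave_one_event_out(
    posts_by_event: Dict[str, List[Post]],
) -> Iterator[Tuple[str, List[Post], List[Post]]]:
    """Yield (held_out_event, train_posts, test_posts), one split per event."""
    for held_out in sorted(posts_by_event):
        # Train on every event except the one being held out.
        train = [
            post
            for event in sorted(posts_by_event)
            if event != held_out
            for post in posts_by_event[event]
        ]
        # Test only on the event the model has never seen.
        test = list(posts_by_event[held_out])
        yield held_out, train, test


# Made-up toy posts, keyed by the events named on the slide.
toy_posts = {
    "hurricane_harvey": [("streets flooded, need boats", "related")],
    "hurricane_irma": [("power is out across the island", "related")],
    "kerala_floods": [("relief camp open near the school", "related")],
    "lombok_earthquake": [("great coffee this morning", "unrelated")],
}

splits = list(leave_one_event_out(toy_posts))
for held_out, train, test in splits:
    print(f"test on {held_out}: {len(train)} train posts, {len(test)} test posts")
```

A model that scores well on the events it was trained on but poorly on each held-out event is likely relying on event-specific cues. The same split can be made by crisis type instead of by event by changing the dictionary keys.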
7. CROSS-LANGUAGE AI MODELS (exceeding DoA)
Monolingual Classification with Monolingual Models
Train the model on one language and test it on data in the same language. For example, train and test on data written in English. This is the default approach, and can be used as a baseline.
Cross-lingual Classification with Monolingual Models
Run the classifiers on crisis data in languages that were not observed in the training data. For example, we test the classifier on Italian when the classifier was trained on English or Spanish.
Cross-lingual Classification with Machine Translation
Train the classification model on data in a certain language (e.g., Spanish), and use it to classify data that has been automatically translated from other languages (e.g., Italian and English) into the language of the training data.
Experimented with 6 languages: English, Italian, Spanish, French, German, and Portuguese
Evaluated classifiers with multiple features, languages, and types of crises, resulting in a total of 1152 experiments
DoA
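The three setups on this slide differ only in which language the model is trained on, which language it is tested on, and whether the test data is machine-translated first. The sketch below enumerates them for the six languages listed. It assumes every ordered language pair is tried, which the slides do not state; the `Run` record and `enumerate_runs` function are hypothetical names. It does not attempt to reproduce the 1152 total, which also varies features and crisis types.

```python
from dataclasses import dataclass
from typing import List, Optional

# ISO 639-1 codes for the six languages named on the slide.
LANGUAGES = ["en", "it", "es", "fr", "de", "pt"]


@dataclass(frozen=True)
class Run:
    setup: str                   # which of the three setups this run belongs to
    train_lang: str              # language of the training data
    test_lang: str               # original language of the test data
    translate_to: Optional[str]  # target language for MT, or None if untranslated


def enumerate_runs(languages: List[str]) -> List[Run]:
    """List train/test configurations for the three setups (all pairs assumed)."""
    runs: List[Run] = []
    for train in languages:
        # Setup 1: train and test in the same language (the baseline).
        runs.append(Run("monolingual", train, train, None))
        for test in languages:
            if test == train:
                continue
            # Setup 2: test on an unseen language, with no translation.
            runs.append(Run("cross-lingual", train, test, None))
            # Setup 3: translate the test data into the training language first.
            runs.append(Run("cross-lingual+MT", train, test, train))
    return runs


runs = enumerate_runs(LANGUAGES)
for run in runs[:3]:
    print(run)
```

Keeping the configuration separate from the classifier makes it straightforward to cross it with other factors, such as feature sets or crisis types, when building a full experiment grid.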
8. GOOGLE CHROME PLUGINS (exceeding DoA)
CREES automatically processes short texts in a Google Sheet and identifies whether each text is about a crisis, along with its crisis type and information type
Uses Deep Learning methods
Google Sheet Add-on
DoA
9.
• Photo of consortium?
“... I would suggest, then, that the formula for the next 10,000 start-ups is very, very simple, which is to take x and add AI. That is the formula, that's what we're going to be doing. And that is the way in which we're going to make this second Industrial Revolution”
Kevin Kelly, IBM
[Diagram: the COMRADES tools (CREES, EMINA, TwitIE, YODIE, VERACITY, COMRADES PLATFORM) shown with the labels VOLUME, VALUE, VARIETY, VALIDITY and VELOCITY]