The imperative of making Big Data work in languages other than English. How can multilingual data be processed? The need for, and setup of, a European Language Cloud. A window of opportunity for the European Big Data industry.
Presented at the Workshop on Multilingual Data Value Chains in the Digital Single Market, Brussels, Jan 2015.
2. Jan 16, 2015 Multilingual Data Value Chains in the Digital Single Market @JochenHummel 2
Language-neutral Digital Single Market
Handout Note: Do we really have a language problem? Surely not at this conference.
3. Language Is Important, Isn’t It?
Biggest Exporter per Target Country – @BrilliantMaps
Handout Note: Exporting requires speaking customers’ languages. But product localization has been done for decades and is not what the Digital Single Market is about.
4. VP’s Vision – Our Challenge
“Consumers need to be able to buy the best products at the best prices, wherever they are in Europe.”
Vice-President Ansip, Dec 2014
Accelerating growth through a connected Europe:
Speech at GSMA Mobile 360 conference in Brussels
http://europa.eu/rapid/press-release_SPEECH-14-2420_en.htm
5. Vision Broken Already by a Simple Search
Search string: “Rasenmäher” (German for “lawnmower”)
Handout Note: A simple search and consumers are already caught in their language silo and market. They will not find the best product at the best price.
6. Customers/Citizens Control Content
KNOWLEDGE
Handout Note: Content has become increasingly democratized. Customers/citizens write in their mother tongue. Whoever mines the knowledge in this multilingual data will win.
7. Big = Unstructured = Text = Multilingual
8. Is Machine Translation the Answer?
[Diagram: Machine Translation, then Text Analytics]
Attention: MT quality depends on the resources available for the language and domain. Inaccuracies multiply along the processing chain:
0.8 (MT) × 0.8 (Sentiment Analysis) = 0.64 hit rate!
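The multiplication on this slide can be sketched in a few lines. This is a minimal illustration, not something shown in the talk: it assumes that the errors of each stage are independent, so per-stage accuracies simply multiply, and the function name `pipeline_hit_rate` is hypothetical.

```python
from math import prod


def pipeline_hit_rate(stage_accuracies):
    """Return the end-to-end hit rate of a chain of processing stages.

    Assumption (for illustration only): stage errors are independent,
    so the per-stage accuracies multiply.
    """
    for accuracy in stage_accuracies:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError(f"accuracy must be in [0, 1], got {accuracy}")
    return prod(stage_accuracies)


# The slide's example: machine translation followed by sentiment analysis.
mt_then_sentiment = pipeline_hit_rate([0.8, 0.8])
print(f"{mt_then_sentiment:.2f}")  # 0.64
```

Under the same assumption, every further stage chained behind machine translation lowers the figure again; a third stage at 0.8 would give 0.8 × 0.8 × 0.8 = 0.512.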
9. Processing Multilingual Data
[Diagram: Multilingual Knowledge System producing Insights & Sense & Sentiment. Components shown: Language Detection, NLP/Tokenization, ML, Text Analytics, Search, MT, Provenance]
10. Innovation Space European Language Cloud
For companies that process text, the European Language Cloud (ELC) is a web-based set of APIs that provides the basic functionality to build and market products for all languages of the DSM and Europe’s main trading partners.
Unlike previous, incomplete attempts to solve multilingualism, ELC provides easy-to-use API calls at a reliable baseline quality under the same favorable terms.
11. European Language Cloud Stakeholders
[Diagram: stakeholders around the European Language Cloud. Nodes: Member States, Language Resources, SMEs, Global Market, Institutions, Big Biz, Non-profit Industry Assn. Relation labels: maintain, grow, serve, use, operates, promotes, coordinates, bootstraps]
12. Turning a Challenge into a Win
If we manage, in spite of our many cultures and languages, to create a Digital Single Market and cross-border eGov, we will become the fittest for the global market.
13. Manage Enterprise Knowledge Globally Across Languages
Jochen Hummel
m jochen@coreon.com
c +49 172 766 66 33
s jochen.hummel
t @jochenhummel
l Berlin-Mitte
Editor's Notes
Because if you are not a programmer, you are more interested in content.
Especially since content has become democratized.
Control over content is shifting from enterprises and governments to customers and citizens.
And people create content in their mother tongue.
Number crunching is done. We have the algorithms and the computing power.
Text crunching, however, has not progressed much in recent decades.
Text mining tools work well only in English.
Machine translation is not the solution; rather, it will be a by-product of solving multilingualism.