How to turn global textual information into actionable knowledge. Control over content has shifted from enterprises and governments to customers and citizens. This content is big and multilingual. Whoever manages to mine the knowledge in this content wins. An architecture for processing multilingual data. How to build a Multilingual Knowledge System with Coreon.
2. Dec 5, 2014 Turning Global Textual Information into Actionable Knowledge @JochenHummel 2
About Me
Founded and exited
Founded and passed on
Manage
Founded and present
Chair
3. Language-neutral Digital Single Market
Handout Note: Do we really have a language problem? Surely not at this conference.
4. Language Is Important, Isn’t It?
Biggest Exporter per Target Country – @BrilliantMaps
Handout Note: Exporting requires speaking the customers’ languages. But this happens only when products reach customers, and it is often handled by subsidiaries or even distributors.
5. Customers/Citizens Control Content
KNOWLEDGE
Handout Note: Since Web 2.0, however, content is increasingly created by customers and citizens; companies merely curate it. This is practical, but it leaves knowledge extra muros, outside the organisation's walls.
6. Big = Unstructured = Text = Multilingual
7. Is Machine Translation the Answer?
Machine Translation
Text Analytics
Attention: MT quality depends on the resources available for the language and domain. Inaccuracies multiply along the pipeline.
0.8 MT x 0.8 Sentiment Analysis = 0.64 hit rate!
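The arithmetic on this slide can be reproduced in a few lines. This is only a sketch: the function name pipeline_hit_rate is made up for illustration, and multiplying accuracies assumes the errors of the two stages are independent, which the slide implies but does not state.

```python
from math import prod


def pipeline_hit_rate(stage_accuracies):
    """Multiply per-stage accuracies (assumes stage errors are independent)."""
    return prod(stage_accuracies)


mt_accuracy = 0.8         # machine translation, figure from the slide
sentiment_accuracy = 0.8  # sentiment analysis, figure from the slide

hit_rate = pipeline_hit_rate([mt_accuracy, sentiment_accuracy])
print(f"{hit_rate:.2f}")  # prints 0.64
```

Under the same assumption, every additional stage with an accuracy below 1.0 lowers the overall hit rate further.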
8. Processing Multilingual Information
Multilingual Knowledge System
Insights & Sense & Sentiment
Language Detection
NLP/Tokenization
ML
Text Analytics
Search
MT
Provenance
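How some of the stages named on this slide could fit together is sketched below. Everything in it is an assumption made for illustration: the stopword lists stand in for a real language detector, TERM_INDEX is a hypothetical slice of a multilingual knowledge system, and the names Hit, tokenize, detect_language and process do not come from Coreon or any other product.

```python
import re
from dataclasses import dataclass

# Toy stopword lists standing in for a real language detector (assumption).
STOPWORDS = {
    "en": {"the", "is", "and", "my"},
    "de": {"der", "die", "das", "ist", "und", "mein"},
    "fr": {"le", "la", "est", "et", "mon"},
}

# Hypothetical slice of a multilingual knowledge system:
# (language, lower-cased term) -> language-neutral concept.
TERM_INDEX = {
    ("en", "screen"): "screen",
    ("en", "monitor"): "screen",
    ("de", "bildschirm"): "screen",
    ("de", "monitor"): "screen",
    ("fr", "écran"): "screen",
}


@dataclass
class Hit:
    concept: str   # language-neutral concept that was recognised
    term: str      # surface form found in the text
    language: str  # detected language of the text
    source: str    # provenance: where the text came from


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def detect_language(tokens):
    """Pick the language whose stopword list matches the most tokens."""
    return max(STOPWORDS, key=lambda lang: sum(t in STOPWORDS[lang] for t in tokens))


def process(source, text):
    """Language detection, tokenization, then concept lookup with provenance."""
    tokens = tokenize(text)
    language = detect_language(tokens)
    return [
        Hit(TERM_INDEX[(language, token)], token, language, source)
        for token in tokens
        if (language, token) in TERM_INDEX
    ]


for hit in process("post-1", "Mein Bildschirm ist kaputt"):
    print(hit)
```

The point of the design is that analytics run on concepts rather than on translated text, so no machine translation step sits between the source text and the analysis.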
9. A Repository for Knowledge
Concept nodes in the diagram: 08, 45, 76, 35, 17
1: Taxonomy:
output devices
  visual output devices
    screen
  audio output devices
    headphones
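A taxonomy like the one on this slide can be held as a simple map from each concept to its broader concept. The sketch below assumes that screen sits under visual output devices and headphones under audio output devices, as the slide labels suggest; the names BROADER, narrower and ancestors are illustrative only.

```python
# Each concept points to its immediate broader concept.
# The hierarchy is read off the slide labels and is an assumption.
BROADER = {
    "visual output devices": "output devices",
    "audio output devices": "output devices",
    "screen": "visual output devices",
    "headphones": "audio output devices",
}


def narrower(concept):
    """Return the immediate narrower concepts, sorted alphabetically."""
    return sorted(c for c, parent in BROADER.items() if parent == concept)


def ancestors(concept):
    """Walk upwards and return all broader concepts, nearest first."""
    path = []
    while concept in BROADER:
        concept = BROADER[concept]
        path.append(concept)
    return path


print(narrower("output devices"))
print(ancestors("screen"))
```

Storing only the broader link keeps the structure a tree; a real concept system may allow several broader concepts per node, which this sketch does not cover.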
10. A Repository for Knowledge and Language
Concept nodes in the diagram: 08, 45, 76, 35, 17
1: Taxonomy:
2: Synonymy:
• screen • monitor
3: Multilingualism:
• écran
• Bildschirm • Monitor • Display
4: Control:
accepted, rejected
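The layers on this slide (synonymy, multilingualism, control) can be sketched as one concept record that carries terms in several languages, each with a usage status. The slide does not show which terms are accepted and which are rejected, so the statuses below are invented purely for illustration, as are the class and method names.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    text: str
    language: str
    status: str = "accepted"  # "accepted" or "rejected"


@dataclass
class Concept:
    concept_id: str
    terms: list = field(default_factory=list)

    def accepted(self, language):
        """Accepted terms for one language, in stored order."""
        return [
            t.text for t in self.terms
            if t.language == language and t.status == "accepted"
        ]

    def check(self, word, language):
        """Return the usage status of a word, or None if it is unknown."""
        for t in self.terms:
            if t.language == language and t.text.lower() == word.lower():
                return t.status
        return None


# Terms are taken from the slide; the statuses are assumptions.
screen = Concept("screen", [
    Term("screen", "en"),
    Term("monitor", "en", "rejected"),  # assumed status
    Term("écran", "fr"),
    Term("Bildschirm", "de"),
    Term("Monitor", "de"),
    Term("Display", "de", "rejected"),  # assumed status
])

print(screen.accepted("de"))
print(screen.check("display", "de"))
```

A writer's tool could call check to flag rejected terms, while a search engine could use all terms of the concept, accepted or not, to recognise it in text.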
12. One System, One View, All Languages: Concepts, Relations, Terms
• Immediate broader/narrower neighborhood
• Concept metadata
• Terms and synonyms
• Extensive term descriptors
• Location in map
• Alphabetic, multilingual list
13. Two Parallel Approaches to Inventorise and Leverage Knowledge
Termbases
• Control language
• Focus on translation
• Lack knowledge modelling
• Only searching, no exploring
Taxonomies
• For knowledge structuring
• Lack multilingualism
• Lack language control
Huge, unaddressed potential for cross-lingual data analysis, enterprise search, e-discovery, and to facilitate interoperability
boost with data
add structure
… …
mirror base Spiegelfuß
wing mirror Außenspiegel
… …
mirror
wing mirror
left wing mirror
…
14. Responding to Today’s Business Challenges
• Enterprise Search
• Social Media Analytics
• Auto-Classification
• Interoperability
• HR Training
• Globalisation
15. Target Markets and Use Cases
• Single Digital Market: search and matching across borders.
• Global Champions: optimizing global business processes.
• eGov Interoperability: processing information in a linguistically neutral way.
16. Manage Enterprise Knowledge Globally Across Languages
Jochen Hummel
m jochen@coreon.com
c +49 172 766 66 33
s jochen.hummel
t @jochenhummel
l Berlin-Mitte
Editor's Notes
Because if you are not a programmer, you are more interested in content.
Especially since content has become democratized.
Control over content is shifting from enterprises and governments to customers and citizens.
And people create content in their mother tongue.
Number crunching is done. We have the algorithms and the computing power.
Text crunching, however, has not progressed much in recent decades.
Text mining tools work well only in English.
Machine translation is not the solution; rather, it will be a by-product of solving multilingualism.
Missed potential for:
... such as ... researchers, indexers, taxonomists, marketeers, customer support, financial analysts, public officers (e-government), translators, technical writers, etc.
Termbases store concepts only one by one: they fail to master large data, with unsafe use as a result.
Taxonomies achieve structure but ignore the pragmatics of terminologies: they fail to manage multilingual data.
No structured multilingual resources: no cross-lingual data analysis or e-discovery, and missed leverage.
Enterprise Search: tune engines to deliver smarter results (search keywords expand to synonyms, other languages, and more general or more specific terms). Automatic multi-keyword, hierarchical document classification and tagging. Example: a search for “écran” finds documents that deal with “visual output device”.
Interoperability: when two or more concept systems need to come together, for example in EU and international institutions where the players are on an equal footing. Example: which school-leaving qualifications in English are roughly equivalent to the German “Realschule”?
Social media analytics: an intelligent bag of words to filter tweets, posts, etc. multilingually. Automatically classifying many posts across languages supports decision-making; this works only multilingually and only with semantics.
E-discovery: Mining big text data to discover facts that are not written explicitly.
Terminology is the algorithm and engine to cluster and to derive.
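The enterprise search note above (a query for écran finding documents about a visual output device) can be sketched as query expansion over concepts. The sample terms, the documents and the function names expand and search are assumptions made for illustration; matching is naive substring matching and ignores inflection.

```python
# Hypothetical sample: concept -> terms across languages.
TERMS = {
    "visual output device": ["visual output device", "optisches Ausgabegerät"],
    "screen": ["screen", "monitor", "écran", "Bildschirm"],
}

# Concept -> immediate broader concept.
BROADER = {"screen": "visual output device"}


def expand(query):
    """Expand a keyword to synonyms, other languages, and broader/narrower terms."""
    q = query.lower()
    concepts = {c for c, terms in TERMS.items() if q in (t.lower() for t in terms)}
    related = set(concepts)
    for concept in concepts:
        if concept in BROADER:
            related.add(BROADER[concept])  # more general concept
        related.update(n for n, b in BROADER.items() if b == concept)  # more specific
    expanded = {t.lower() for c in related for t in TERMS[c]}
    return expanded or {q}  # unknown keywords fall back to themselves


def search(query, documents):
    """Return ids of documents that contain any expanded term."""
    terms = expand(query)
    return [
        doc_id for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    ]


documents = {
    "d1": "Ein optisches Ausgabegerät zeigt Bilder an.",
    "d2": "Headphones are audio output devices.",
    "d3": "Clean the monitor with a dry cloth.",
}

print(search("écran", documents))  # prints ['d1', 'd3']
```

The French keyword matches a German and an English document because all three are linked through the same concept and its broader neighbor, not through translation of the documents.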
(door opener, springboard, caretaker, bridge builder, stirrup, lever, bridge pier, foundation)
Globalisation: production! - Glob
Enterprise Search, Social Media: magnifying glass, algorithm (Twitter, Facebook)
Auto-classification: documents
Interoperability: map
Employee training: signpost, map