Timo Honkela: Digital Preservation and Computational Modeling of Language and Culture: Some Philosophical and Empirical Aspects

A presentation in the symposium “Interfaces between Language, Literature and Culture: 
Research at Department of Modern Languages” at University of Helsinki, 19th of May, 2014

  1. 1. Timo Honkela, 19 May 2014. Digital Preservation and Computational Modeling of Language and Culture: Some Philosophical and Empirical Aspects. timo.honkela@helsinki.fi. Symposium “Interfaces between Language, Literature and Culture: Research at Department of Modern Languages”
  2. 2. Background
  3. 3. Natural language database interface with dependency-based compositional semantics ● H. Jäppinen, T. Honkela, H. Hyötyniemi & A. Lehtola (1988): A Multilevel Natural Language Processing Model. Nordic Journal of Linguistics 11:69-87. Example question: “What is the turnover of the ten largest stock exchange companies in forestry?” Processing steps: Morphological analysis → Dependency parsing → Logical analysis → Database query formation → Result from the SQL database
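
The processing steps on this slide can be illustrated with a deliberately tiny sketch. It is not the 1988 system: the lexicon, the `company` table, its columns and all function names are hypothetical, and each stage is reduced to a few lines so that the flow from question to SQL result stays visible.

```python
import sqlite3

# Hypothetical toy lexicon: surface form -> (lemma, role). Not from the original system.
LEXICON = {
    "turnover": ("turnover", "attribute"),
    "ten": ("10", "number"),
    "largest": ("large", "superlative"),
    "companies": ("company", "entity"),
    "forestry": ("forestry", "sector"),
}

def morphological_analysis(question):
    """Map each known word to its (lemma, role); unknown words are dropped."""
    words = question.lower().strip("?").split()
    return [LEXICON[w] for w in words if w in LEXICON]

def dependency_parse(tokens):
    """Toy 'parse': attach every modifier to the single entity head."""
    head = next(lemma for lemma, role in tokens if role == "entity")
    deps = {role: lemma for lemma, role in tokens if role != "entity"}
    return {"head": head, "deps": deps}

def logical_analysis(tree):
    """Turn the dependency structure into a flat logical form."""
    deps = tree["deps"]
    return {
        "select": deps["attribute"],
        "from": tree["head"],
        "where_sector": deps["sector"],
        "order_by": "size",  # assumption: 'largest' refers to a size column
        "limit": int(deps["number"]),
    }

def form_query(lf):
    """Build a parameterised SQL query from the logical form."""
    sql = (f"SELECT name, {lf['select']} FROM {lf['from']} "
           f"WHERE sector = ? ORDER BY {lf['order_by']} DESC LIMIT ?")
    return sql, (lf["where_sector"], lf["limit"])

def answer(question, conn):
    tokens = morphological_analysis(question)
    sql, params = form_query(logical_analysis(dependency_parse(tokens)))
    return conn.execute(sql, params).fetchall()

# Hypothetical data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (name TEXT, sector TEXT, size REAL, turnover REAL)")
conn.executemany("INSERT INTO company VALUES (?, ?, ?, ?)", [
    ("A", "forestry", 3.0, 30.0),
    ("B", "forestry", 5.0, 50.0),
    ("C", "banking", 9.0, 90.0),
])
rows = answer("What is the turnover of the ten largest stock exchange companies in forestry?", conn)
print(rows)
```

Keeping each stage as a separate function mirrors the multilevel design named in the slide title: any stage could be replaced by a real analyser without touching the others.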
  4. 4. Classical example: Learning meaning from context: Maps of words in Grimm fairy tales Honkela, Pulkki & Kohonen 1995
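
The idea of learning meaning from context can be sketched without any map at all. The slide's word maps were self-organizing maps; the sketch below swaps in plain co-occurrence counts and cosine similarity, which is only the first step of such an analysis. The corpus, the window size and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math

def context_vectors(sentences, window=1):
    """Count, for each word, the words occurring within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical miniature corpus in a fairy-tale style.
corpus = [
    "the king rode home",
    "the queen rode home",
    "the king slept well",
    "the queen slept well",
    "a wolf ate quickly",
]
vecs = context_vectors(corpus)
king_queen = cosine(vecs["king"], vecs["queen"])  # shared contexts
king_wolf = cosine(vecs["king"], vecs["wolf"])    # no shared contexts
print(king_queen, king_wolf)
```

Words that occur in the same contexts end up with similar vectors, and a map-building method would then place them near each other.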
  5. 5. Map of Finnish Science Chemistry Physics and engineering Biosciences Medicine Culture and society
  6. 6. WordICA Timo Honkela, Aapo Hyvärinen, and Jaakko Väyrynen. WordICA - Emergence of linguistic representations for words by independent component analysis. Natural Language Engineering, 16(3):277–308, 2010. Jaakko J. Väyrynen, Lasse Lindqvist, and Timo Honkela. Sparse distributed representations for words with thresholded independent component analysis. In Proceedings of IJCNN'07, pages 1031–1036, 2007.
  7. 7. Learning taxonomies Mari-Sanna Paukkeri, Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez Unanue and Timo Honkela (2012). Learning a taxonomy from a set of text documents. Applied Soft Computing, 12(3), pp. 1138–1148.
  8. 8. Central Interests: Contextuality and Subjectivity
  9. 9. Meaning is contextual red wine red skin red shirt Gärdenfors: Conceptual Spaces Hardin: Color for Philosophers
  10. 10. Meaning is contextual SNOW - WHITE? WHITE
  11. 11. Meaning is contextual ● “Small”, “big” ● “White house” ● “Get” ● “Every” - “Every Swede is tall/blond” ● etc. Another comment: Strict compositionality cannot be assumed. Fuzziness.
  12. 12. Meaning is subjective
  13. 13. Meaning is subjective ● Good ● Fair ● Useful ● Scientific ● Democratic ● Sustainable ● etc. A proper theory of meaning has to take this into account
  14. 14. Timo Honkela, Ville Könönen, Tiina Lindh-Knuutila, and Mari-Sanna Paukkeri. Simulating processes of concept formation and communication. Journal of Economic Methodology, 15(3):245–259, 2008. Intermediate conclusion: ● Languages, including formal languages, should be considered as tools for coordination, storing and sharing knowledge in a compressed form – approximate and relative to the point of view taken ● Constructing a language or symbol system is an investment, and spreading the language into use in a community is an even larger one
  15. 15. Digital Humanities
  16. 16. Digital humanities ● Research within humanities with the help of computers – Digital resources – Computational models ● Basic motivation – One can already fly to the moon and build sophisticated factory products – The most important open questions in the world are related to humanities and social sciences
  17. 17. Digital Computational Humanities Content storage and transfer Content analysis
  18. 18. ● Heinz von Foerster in “Responsibilities of Competence” (1972): “The hard sciences are successful because they deal with the soft problems; the soft sciences are struggling because they deal with the hard problems”
  19. 19. Fields of science ordered by the relative share of English-language parts in applications (*): Mathematics 95.3; Pharmacy 94.1; Chemistry 93.7; Physics 93.4; Biochemistry, molecular biology, microbiology, genetics and biotechnology 93.4; Cell and developmental biology, physiology and ecophysiology 93.4; Computer science 93.0; Electrical engineering and electronics 92.8; Environmental engineering 92.7; Geosciences 92.1; Ecology, evolutionary biology and systematics 92.1; Mechanical and manufacturing engineering 91.9; Forest sciences 91.4; Space sciences and astronomy 91.0; Process and materials engineering 90.8; Statistics 90.7; Other environmental and natural resources research 90.1. (*) In a corpus of applications submitted to the Academy of Finland
  20. 20. Veterinary medicine 88.5; Public health 88.1; Linguistics 87.6; Philosophy 87.3; Business economics, economic geography and industrial engineering and management 87.2; Dentistry 86.7; Economics 86.3; Civil and community engineering 85.9; Agricultural and food sciences 85.4; Environmental policy, economics and law 85.3; Geography 84.8; Architecture and industrial design 83.7; Communication and information sciences 83.1; Education 82.6; Political science and administrative science 82.2; Art research 81.6; Social sciences 80.4; Cultural studies 79.3; History and archaeology 78.1; Theology 77.0; Law 70.8
  21. 21. Accessing and analyzing digital resources
  22. 22. Archives Libraries Universities Citizens Researchers Media DIGITAL RESOURCES Museums Teachers Artists Companies Societies Municipalities State Decision makers Journalists Information specialists
  23. 23. Texts Images Videos Computational models Numerical data DIGITAL RESOURCES Speeches/ convers. Multimedia documents Interactive systems Computer software
  24. 24. Resource Meta data DIGITAL RESOURCES
  25. 25. Resources Content and information professional Users of the contents (professionals and lay people) Machine learning and pattern recognition systems Formal metadata Language technology resources and systems Other forms of description
  26. 26. Resources Users of the contents (professionals and lay people) Other forms of description Crowdsourcing Importance of openness
  27. 27. Resources Machine learning and pattern recognition systems Formal metadata Other forms of description Classification Clustering Importance of the availability of data
  28. 28. Challenge: A tension between the usability and standardization of content descriptions and richness and evolution of language and its interpretation, genre and style variation, and contextuality, subjectivity and cultural dependence
  29. 29. Computational Methods and Tools
  30. 30. Mainframe computers Personal computers Internet Multimedia Virtual reality World wide web Social media MOOCs Mobile devices Cloud services Games and gamification 3D printing Big Data Pattern recognition Statistical machine learning Robotics
  31. 31. ... Statistics Information theory Probability theory Dynamical systems theory ...
  32. 32. Implications of machine learning ● Machines no longer simply do what they are programmed to do ● Machine learning algorithms are programs in the traditional sense, but they enable evolving “behaviors” of the system based on the “experience” that the system gathers after having been programmed ● This makes it possible for the systems to have a certain level of “conceptual autonomy”: they build their view on some phenomena based on the data/texts/etc. that are given to them
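
A minimal sketch can make the point about “experience” concrete. The class below is an ordinary program, yet nothing in its code mentions any particular word or label; what it answers depends entirely on what it has observed. The class name, the labels and the observations are hypothetical.

```python
from collections import Counter, defaultdict

class AssociationLearner:
    """Learns which label most often accompanies a word in its own experience."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, word, label):
        """Record one piece of experience: `word` was seen with `label`."""
        self.counts[word][label] += 1

    def predict(self, word):
        """Return the most frequently observed label, or None for an unseen word."""
        seen = self.counts.get(word)
        return seen.most_common(1)[0][0] if seen else None

# Two instances of the same program, given different data.
wine_reader = AssociationLearner()
for label in ["drink", "drink", "colour"]:
    wine_reader.observe("red", label)

paint_reader = AssociationLearner()
for label in ["colour", "colour", "drink"]:
    paint_reader.observe("red", label)

print(wine_reader.predict("red"), paint_reader.predict("red"))
```

The two instances share every line of code but give different answers for the same word, which is a small-scale version of systems building their own view from the data given to them.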
  33. 33. Theories Data Models Hypotheses
  34. 34. Conceptual systems
  35. 35. Melissa Bowerman, Max Planck Institute for Psycholinguistics. Space under Construction: Language-Specific Spatial Categorization in First Language Acquisition. Lund University Cognitive Science, 2003
  36. 36. DUTCH: IN, OP, AAN
  37. 37. Categorization of ‘opening’ in English and Korean. English OPEN: open box, open door, open bag, open envelope, open mouth, open clamshell, open pair of shutters, open latched drawer, open hand, open book, open eyes, open fan. Korean: YELTA ‘remove barrier to interior space’; PPAYTA ‘unfit’; TTUTA ‘rise’; PELLITA ‘separate two parts symmetrically’; PHYELCHITA ‘spread out flat thing’; TTUTA ‘tear away from base’. Further actions shown: take off wallpaper, unwrap package, spread legs apart, take off ring, take cassette out of case, sun rises, spread blanket out, peacock spreads tail
  38. 38. Verbs of breaking applied to PLATE, STICK, ROPE and CLOTHES (Pye 1995, 1996). ENGLISH: break, break, break, tear/rip. MANDARIN: può, puòduàn, può. K’ICHE’ MAYAN: -paxi:j (rock, glass, clay thing), -q’upi:j (other hard thing), -tóqopi’j (long, flexible thing), rach’aqij (“tear”). Additional gloss shown: (long rigid thing). http://www.mpi.nl/people/bowerman-melissa http://www.mpi.nl/people/bowerman-melissa/publications
  39. 39. Processing multimodal information
  40. 40. An example of automatic multimedia content analysis. Acknowledgements: Finnish Broadcasting Company (YLE), Jorma Laaksonen and Mikko Kurimo. Links: users.ics.aalto.fi/jorma/ scholar.google.com/citations?user=suHzeyIAAAAJ&hl=en users.ics.aalto.fi/mikkok/ elec.aalto.fi/en/about/careers/professors/mikko_kurimo/
  41. 41. Speaker recognition Video analysis / scene classification Speech recognition (speech to text)
  42. 42. Video analysis / scene classification Speaker recognition Speech recognition (speech to text) OCR
  43. 43. Movement verbs
  44. 44. David Bailey's thesis (1997): Verbs related to hand movement
  45. 45. Point of view from cognitive linguistics ● The meaning of linguistic symbols in the mind of the language users derives from the users' sensory perceptions, their actions with the world and with each other. ● For example: the meaning of the word 'walk' involves – what walking looks like – what it feels like to walk and after having walked – how the world looks when walking (e.g. objects approach at a certain speed, etc.). – ...
  46. 46. Abstract vs concrete grounding Ronald Langacker
  47. 47. Multimodally Grounded Language Technology. A project funded by the Academy of Finland, 2011-2014. Timo Honkela as the Principal Investigator. A collaboration between the departments of Information and Computer Science and Media Technology
  48. 48. Consider how different languages divide the conceptual space in different ways (cf. e.g. Melissa Bowerman et al.) Förger & Honkela 2013
  49. 49. Analysis of subjectivity
  50. 50. GICA: Grounded Intersubjective Concept Analysis
  51. 51. Analysis of “health” in the State of the Union addresses. Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar. Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity. Proc. of IJCNN 2012.
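
The phrase “subjects on objects in contexts” suggests three-way data, and one possible reading is sketched below: each subject's use of each object (word) is described by a vector over contexts, and the three-way data are unfolded into one row per (subject, object) pair so that rows can be compared. This is an illustrative assumption, not the procedure of the cited paper; the subjects, contexts and ratings are invented.

```python
# Hypothetical ratings[subject][object] = values over the contexts below (0..1 scale).
subjects = ["s1", "s2"]
objects = ["health", "care"]
contexts = ["economy", "family", "war"]
ratings = {
    "s1": {"health": [0.9, 0.2, 0.1], "care": [0.8, 0.3, 0.0]},
    "s2": {"health": [0.1, 0.9, 0.2], "care": [0.2, 0.8, 0.1]},
}

def unfold(ratings, subjects, objects):
    """Flatten subjects x objects x contexts into rows keyed by (subject, object)."""
    return {(s, o): ratings[s][o] for s in subjects for o in objects}

def distance(u, v):
    """Euclidean distance between two context profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

rows = unfold(ratings, subjects, objects)

# How differently do two subjects use the same word?
between_subjects = distance(rows[("s1", "health")], rows[("s2", "health")])
# How differently does one subject use two words?
within_subject = distance(rows[("s1", "health")], rows[("s1", "care")])
print(between_subjects, within_subject)
```

In this invented data the same word differs more between the two subjects than two different words differ within one subject, which is the kind of subjectivity such an analysis aims to quantify.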
  52. 52. Thank you for your attention!