The Voice of Time

  1. The Voice of Time. Michael Falk, University of Kent and Western Sydney University
  2. (1) Australia in the Pacific; (2) A Romanticist’s Contribution; (3) Workflow: harvest, clean, decipher
  3. 1 | Australia in the Pacific. Source: AFP
  4. Source: Patrick Kirch, On the Road of the Winds: An Archaeological History of the Pacific Islands before European Contact (Oakland: University of California Press, 2017), p. 6.
  5. Sources: Fishhook map and photo: Val Attenbrow, ‘Aboriginal fishing in Port Jackson’, in The Natural History of Sydney (Sydney, 2010); Dingo photograph: Henry Whitehead - Original photograph, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=12057483; Sahul and Sundaland map: Kirch, p. 57; Linguistic data: Rachel Hendery (personal communication), and POLLEX database.
     Language: Word
     Hawaiian: kapu
     NZ Maori: tapu
     Proto southern Vanuatu: *tabur
     Gugu Yimidhirr: thabul
  6. 2 | A Romanticist’s Contribution
  7. Dunlop’s transcription: Nge a runba wonung bulkirra umbilinto bulwarra; / Pital burra kultan wirripang buntoa
     Modern reconstruction (Wafer 2017, p. 204): ngayaranpa wanang palkirr yampilintu pulwarra pital para katan wiripang pantuwa
     Dunlop’s poetic translation: Our home is the gibber-gunyah / Where hill joins hill on high; / … / And the rushing of wings, as the wangas pass, / Sweeps the wallaby’s print from the glistening grass.
     Modern literal translation (Wafer 2017, p. 206): Ours is the place where the mountains cohabit with the heights / The eaglehawks and wallabies are happy
     The Sydney Morning Herald, 11 Oct 1848
  8. 3 | Workflow: Harvest, Clean, Decipher. © BBC
  9. Workflow diagram. Harvest: Untold Riches Public API; Local PostgreSQL Database. Clean: Tokenised text; Encoder-Decoder Text Correction Model (Train: ALTA dataset of 6000 hand-corrected articles (Cassidy and Mollá, 2017)); Cleaned tokens. Decipher: Language Classification Model (Train: ???; Infer: Cleaned tokens).
  10. Google Colab
  11. RNN Basics: A single time-step. Diagram labels: input letters k, i, n; ‘one-hot’ vector for i: (0, 0, 0, … 1, … 0, 0, 0); RNN Cell (previous time-step), RNN Cell (this time-step), RNN Cell (next time-step); example values 0.99, 0.24, 0.01, ...; c<t-1>(n-1), c<t-1>(n); o<t>; c<t>.
      c<t>: the ‘memory cell’ at time-step t. It is updated at each time-step and saved for the next one.
      o<t>: the ‘output’ at time-step t.
  12. Model design #1: A general English-language model. Diagram: a row of RNN cells, one per character (k, i, n, d, y, n, e, and the symbols S and E), each followed by a softmax (σ). The ‘softmax activation function’ guesses the next letter based on the output of the cell, returning a vector of probabilities that sum to 1, e.g.: (P(a)=0.01, P(b)=0.4, P(c)=0.02, … P(z)=0.001, P(S)=0.1, P(E)=0.002)
  13. The Problem
  14. Model design #2: Binary classification model. Diagram: RNN cells, shown in pairs, over the characters S, k, i, n, e, E; their outputs are concatenated and passed to a single softmax (σ). Here the ‘softmax activation function’ predicts whether the whole word is English or Australian and outputs a two-element vector, e.g.: (P(English)=0.37, P(Australian)=0.63)
  15. Problems and Promises
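
The two model designs on slides 11 to 14 can be illustrated with a short NumPy sketch. It is not the author's model: it uses a plain tanh recurrent cell (the slides' ‘memory cell’ wording may point to an LSTM or similar, but they do not say), random untrained weights, and a hidden size picked arbitrarily. It also assumes that S and E are start-of-word and end-of-word markers, and that the ‘Concatenate’ box on slide 14 joins a forward and a backward pass over the word. All function and variable names are hypothetical.

```python
import numpy as np

# Alphabet: lowercase letters plus "S" and "E", which this sketch ASSUMES are
# start-of-word and end-of-word markers (the slides do not define them).
ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + ["S", "E"]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
V = len(ALPHABET)  # vocabulary size
H = 16             # hidden size, chosen arbitrarily for illustration


def one_hot(ch):
    """Return a vector of zeros with a single 1 at the character's index."""
    v = np.zeros(V)
    v[INDEX[ch]] = 1.0
    return v


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def init_cell(rng):
    """Random, untrained weights for one plain tanh recurrent cell."""
    return (rng.normal(scale=0.1, size=(H, V)),  # input -> state
            rng.normal(scale=0.1, size=(H, H)),  # previous state -> state
            np.zeros(H))                         # bias


def rnn_step(x, c_prev, cell):
    """One time-step: mix the current input with the previous state."""
    W_x, W_c, b = cell
    return np.tanh(W_x @ x + W_c @ c_prev + b)


rng = np.random.default_rng(0)
forward_cell, backward_cell = init_cell(rng), init_cell(rng)
W_next = rng.normal(scale=0.1, size=(V, H))      # next-letter head (design #1)
W_lang = rng.normal(scale=0.1, size=(2, 2 * H))  # two-class head (design #2)


def run(word):
    chars = ["S"] + list(word) + ["E"]

    # Design #1: a softmax over the alphabet at every time-step.
    c = np.zeros(H)
    next_letter_probs = []
    for ch in chars:
        c = rnn_step(one_hot(ch), c, forward_cell)
        next_letter_probs.append(softmax(W_next @ c))

    # Design #2 (assumed reading of "Concatenate"): read the word in both
    # directions, join the two final states, apply one two-class softmax.
    c_back = np.zeros(H)
    for ch in reversed(chars):
        c_back = rnn_step(one_hot(ch), c_back, backward_cell)
    language_probs = softmax(W_lang @ np.concatenate([c, c_back]))

    return np.array(next_letter_probs), language_probs


next_probs, lang_probs = run("wonung")
print(next_probs.shape)  # one probability vector per character, incl. S and E
print(lang_probs)        # [P(English), P(Australian)] from untrained weights
```

Because the weights are untrained, the probabilities are close to uniform; the point is only the shape of the computation: one softmax per character in the first design, and a single softmax over the whole word in the second.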

Editor's Notes

  • Kiribati, Cook Islands, Tonga and Solomon Islands
  • Green = training set, blue = validation set