Austin Data 2014 
Grounding Text 
Jason Baldridge 
@jasonbaldridge 
Associate Professor Co-founder & Chief Scientist 
Frid...
What does “barbecue” mean? 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
2 
Friday, September 5, 14
What does “barbecue” mean? Barbecue’ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
2 
Friday, September 5, 1...
What I thought semantics was before 2005 
From: John Enrico and Jason Baldridge. 2011. Possessor Raising, Demonstrative Ra...
Updated perspective a la Ray Mooney (UT Austin CS) 
http://www.cs.utexas.edu/users/ml/slides/chen-icml08.ppt 
© 2012 Jason...
Travel at the Turn of the 20th Century http://www.lib.utexas.edu/books/travel/index.html 
© 2012 Jason M Baldridge Text An...
Motivation: Google Lit Trips [http://www.googlelittrips.com/] 
http://www.googlelittrips.com/GoogleLit/9-12/Entries/2006/1...
Crisis response: Haiti earthquake 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
7 
Friday, September 5, 14
Look, Mom, no hands! (Err, um... no metadata.) 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
8 
Friday, Sept...
Look, Mom, no hands! (Err, um... no metadata.) 
Topics with a clear, circumscribed 
© 2013 Jason M Baldridge Text Analytic...
But, of course, metadata is now plentiful. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
9 
Friday, Septembe...
Geotagged Wikipedia 
30° 17′ N 97° 44′ W 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
10 
Friday, September...
01:55:55 RT @USER_dc5e5498: Drop and give me 50.... 
05:09:29 I said u got a swisher from redmond!? He said nah kirkland! ...
Document geolocation: where is this person? 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
12 
Friday, Septem...
Language modeling approach 
Amsterdam, Zaandam, Amstelveen, Diemen, Landsmeer ... 
Frankfurt, Frechen, Hürth, Brühl, Wesse...
Where’s a word on Earth? 
beach mountain 
wine barbecue 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
Friday...
Locations of Twitter users are not uniformly distributed! 
(Small) GeoUT (Twitter) plotted 
on Google Earth, one pin per u...
k-d tree for geotagged Wikipedia, looking at N. America 
Roller, Speriosu, Rallapalli, Wing & Baldridge 2014: 
Supervised ...
Pre-grid clustering [Erik Skiles, MA thesis, UT Austin, Ling] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
...
Four clusters on GeoUT (390 million tweets) 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
18 
Friday, Septem...
Four clusters on GeoUT (390 million tweets) 
All tweets 
West coast East coast Midwest & South Spanish language 
© 2013 Ja...
Automatic document geolocation 
[Serdyukov, Murdock, & van Zwol 2009; Cheng, Caverlee, & Lee 2010; Wing & Baldridge 2011] ...
Image geo-location: http://graphics.cs.cmu.edu/projects/im2gps/ 
© 2010 Jason M Baldridge Text Analytics Summit, June 2013...
Performance (kd-tree with clustering) 
Wikipedia (entire world) 
Half of documents geotagged within 12 km of truth 
Percen...
Hierarchical geo-location with logistic regression 
Wing & Baldridge 2014: Hierarchical Discriminative Classification for ...
Performance (kd-tree with clustering) 
Twitter (USA) 
Half of users geotagged within 170 km of truth 
Percent of documents...
Hierarchical logistic regression beats flat naive Bayes 
Accuracy @ 161 km, kd-tree grid 
Naive Bayes Hierarchical LR 
© 2...
Logistic regression weights good features heavily 
© 2010 Jason M Baldridge Text Analytics Summit, June 2013 
25 
Friday, ...
Toponym (place name) resolution 
They visit Portland every year. 
© 2013 Jason M Baldridge Text Analytics Summit, June 201...
Toponym resolution in context 
Although Elisha Newman made the first land entry in the township of Portland (June, 
1833),...
Spatial minimality 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
28 
Although Elisha Newman made the first l...
Spatial minimality 
GeoNames 
4048392 Portland Mills Portland Mills 39.7781 -87.00918 P PPL US IN 133 0 223 218 America/In...
Spatial minimality 
Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not ...
Spatial minimality often fails 
I moved from Encinitas, CA, a nice beach town in North San Diego County to Asheville, NC. ...
Toponym classifiers 
Strategy: build a textual classifier per toponym by 
obtaining indirectly labeled examples from Wikip...
Results: disambiguating toponyms 
TR-CoNLL 
Reuters News Texts 
August 1996 
Perseus Civil War Corpus 
Books 
Late 19th Ce...
Identifying, disambiguating, and displaying toponyms 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
32 
Frida...
Back to grounding 
Grounding often involves connecting text to 
knowledge sources and other modalities (image, video) 
& b...
Lexical brain decoding [Yarkoni, Poldrack, Nichols, Van Essen & Wager (2011)] 
© 2013 Jason M Baldridge Text Analytics Sum...
He says, she says http://www.tweetolife.com/gender/ 
Friday, September 5, 14
Temporality of words, by hour http://www.tweetolife.com/hour/ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
...
Temporality of expressions, by day: http://www.google.com/trends 
© 2013 Jason M Baldridge Text Analytics Summit, June 201...
Temporality of expressions, by year: http://ngrams.googlelabs.com/ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2...
Temporal resolution [Kumar, Lease, and Baldridge 2011] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
39 
200...
More modalities: videos [Motwani & Mooney, 2012] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
40 
Friday, S...
Beyond word co-occurrences for vector-space models 
bear boat car cow hadoop snow water wrench 
3 234 42 4 1 2 325 0 
© 201...
Combining distributional models with logics 
Erk (2013): “Towards a semantics for distributional representations.” 
Garret...
Multi-component structured vector-space models 
the children visit the beach 
visit 
children beach 
© 2013 Jason M Baldri...
Language learning in context [Kim & Mooney, 2013] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
44 
Friday, ...
All your meaning are belong to us 
Friday, September 5, 14
Grounding matters 
http://davidrothman.net/2009/09/02/all-your-healthbase-are-belong-to-us-want-em-back/ 
Friday, Septembe...
Open Source Software (Scala/Java) 
Junto - label propagation 
https://github.com/scalanlp/junto 
Textgrounder - document g...
This research was sponsored by: 
Grant from the Morris Memorial Trust Fund 
Grant: W911NF-10-1-0533 
Final note: Whitman h...
Document geolocation 
Supervision 
- documents labeled with latitude & longitude 
Methods 
- Language Modeling for Informa...
Toponym resolution 
Supervision 
- indirectly acquired toponym annotations using a gazetteer 
and geo-annotated Wikipedia 
...

Grounding Text

Jason Baldridge, Co-Founder and Chief Scientist at People Pattern and Associate Professor of Computational Linguistics at the University of Texas at Austin, shares recent research of his from UT Austin on text-based geolocation using Wikipedia, Twitter and other sources.


Grounding Text

  1. 1. Austin Data 2014 Grounding Text Jason Baldridge @jasonbaldridge Associate Professor Co-founder & Chief Scientist Friday, September 5, 14
  2. 2. What does “barbecue” mean? © 2013 Jason M Baldridge Text Analytics Summit, June 2013 2 Friday, September 5, 14
  3. 3. What does “barbecue” mean? Barbecue’
  9. 9. What I thought semantics was before 2005 From: John Enrico and Jason Baldridge. 2011. Possessor Raising, Demonstrative Raising, Quantifier Float and Number Float in Haida. International Journal of American Linguistics. 77(2):185-218
  10. 10. Updated perspective a la Ray Mooney (UT Austin CS) http://www.cs.utexas.edu/users/ml/slides/chen-icml08.ppt
  11. 11. Travel at the Turn of the 20th Century http://www.lib.utexas.edu/books/travel/index.html
  12. 12. Motivation: Google Lit Trips [http://www.googlelittrips.com/] http://www.googlelittrips.com/GoogleLit/9-12/Entries/2006/11/1_The_Grapes_of_Wrath_by_John_Steinbeck.html Grapes of Wrath in Google Earth
  14. 14. Crisis response: Haiti earthquake
  17. 17. Look, Mom, no hands! (Err, um... no metadata.) Topics with a clear, circumscribed geographic focus emerge!
  18. 18. But, of course, metadata is now plentiful.
  19. 19. Geotagged Wikipedia 30° 17′ N 97° 44′ W
  20. 20. Geotagged Twitter 47°31’41’’ N 122°11’52’’ W
      01:55:55 RT @USER_dc5e5498: Drop and give me 50....
      05:09:29 I said u got a swisher from redmond!? He said nah kirkland! Lol..ooooooooOkay!
      05:57:35 Lmao!:) havin a good ol time after work! Unexpected! #goodtimes
      06:00:09 RT @USER_d5d93fec: #letsbereal .. No seriously, #letsbereal>>lol. Don't start.
      06:00:37 On my way to get @USER_60939380 yeee! She want some of this strawberry! Sexy! ...
  22. 22. Document geolocation: where is this person?
  23. 23. Language modeling approach Amsterdam, Zaandam, Amstelveen, Diemen, Landsmeer ... Frankfurt, Frechen, Hürth, Brühl, Wesseling, ... Wing & Baldridge 2011: Simple supervised document geolocation with geodesic grids.
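The language modeling approach on this slide can be illustrated with a small sketch: cut the map into cells, collect word counts per cell from geotagged training documents, and assign a new document to the cell whose word distribution explains it best. The sketch below is a toy under stated assumptions, not the method of Wing & Baldridge 2011 as published: the uniform one-degree cells, the naive Bayes scoring with add-one smoothing, the function names `cell_of`, `train`, and `predict`, and the coordinates in the usage note are all illustrative choices.

```python
import math
from collections import Counter, defaultdict


def cell_of(lat, lon, degrees=1.0):
    """Map a coordinate to a (row, col) cell on a uniform lat/lon grid (assumed cell size)."""
    return (math.floor(lat / degrees), math.floor(lon / degrees))


def train(docs, degrees=1.0):
    """docs: iterable of (tokens, lat, lon). Returns per-cell word counts and document counts."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for tokens, lat, lon in docs:
        cell = cell_of(lat, lon, degrees)
        word_counts[cell].update(tokens)
        doc_counts[cell] += 1
    return word_counts, doc_counts


def predict(tokens, word_counts, doc_counts):
    """Return the cell with the highest naive Bayes score (add-one smoothing, an assumption)."""
    vocab_size = len({w for counts in word_counts.values() for w in counts})
    total_docs = sum(doc_counts.values())
    best_cell, best_score = None, -math.inf
    for cell, counts in word_counts.items():
        cell_tokens = sum(counts.values())
        # Log prior of the cell plus smoothed log likelihood of each token.
        score = math.log(doc_counts[cell] / total_docs)
        for w in tokens:
            score += math.log((counts[w] + 1) / (cell_tokens + vocab_size))
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell
```

For example, after training on two toy documents, one placed near (52.37, 4.90) containing "amsterdam zaandam amstelveen" and one near (50.11, 8.68) containing "frankfurt frechen hürth", the query `["zaandam", "diemen"]` is assigned to the first document's cell even though "diemen" was never seen.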
  25. 25. Where’s a word on Earth? beach mountain wine barbecue
  27. 27. Locations of Twitter users are not uniformly distributed! (Small) GeoUT (Twitter) plotted on Google Earth, one pin per user. Density of (all) documents in GeoUT over the USA (390 million tweets)
  28. 28. k-d tree for geotagged Wikipedia, looking at N. America Roller, Speriosu, Rallapalli, Wing & Baldridge 2014: Supervised Text-based Geolocation Using Language Models on an Adaptive Grid.
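Because geotagged documents are far from uniformly distributed, an adaptive grid gives dense regions small cells and sparse regions large ones. A minimal sketch of the k-d tree idea follows, assuming median splits that alternate between latitude and longitude and a hypothetical `bucket_size` parameter; the splitting rule used in the cited paper may differ.

```python
def kd_leaves(points, bucket_size, depth=0):
    """Split (lat, lon) points at the median, alternating axes, until every
    leaf holds at most bucket_size points. Each leaf plays the role of one cell."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    if len(points) <= bucket_size:
        return [list(points)]
    axis = depth % 2  # 0 = latitude, 1 = longitude
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_leaves(ordered[:mid], bucket_size, depth + 1)
            + kd_leaves(ordered[mid:], bucket_size, depth + 1))


def bounds(leaf):
    """Bounding box (min_lat, min_lon, max_lat, max_lon) of a non-empty leaf."""
    lats = [p[0] for p in leaf]
    lons = [p[1] for p in leaf]
    return (min(lats), min(lons), max(lats), max(lons))
```

Each leaf can then take the place of a uniform grid cell: documents falling inside its bounding box contribute to that cell's language model.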
  30. 30. Pre-grid clustering [Erik Skiles, MA thesis, UT Austin, Ling]
  32. 32. Four clusters on GeoUT (390 million tweets) Panels: All tweets; West coast; East coast; Midwest & South; Spanish language
  33. 33. Automatic document geolocation [Serdyukov, Murdock, & van Zwol 2009; Cheng, Caverlee, & Lee 2010; Wing & Baldridge 2011]
  35. 35. Image geo-location: http://graphics.cs.cmu.edu/projects/im2gps/
  36. 36. Performance (kd-tree with clustering)
      Wikipedia (entire world): half of documents geotagged within 12 km of truth; percent of documents within 161 km (100 miles): 91%
      Twitter (USA): half of users geotagged within 330 km of truth; percent of documents within 161 km (100 miles): 40%
      For better or worse, it soon might not matter whether you have location turned on or not... what you say is where you are / are from. (Also, other factors, e.g. who you are linked to, of course.)
  37. 37. Hierarchical geo-location with logistic regression Wing & Baldridge 2014: Hierarchical Discriminative Classification for Text-Based Geolocation.
  38. 38. Performance (kd-tree with clustering)
      Twitter (USA): half of users geotagged within 170 km of truth; percent of documents within 161 km (100 miles): 49%
      Twitter (World): half of users geotagged within 490 km of truth; percent of documents within 161 km (100 miles): 31%
      Flickr (entire world): half of documents geotagged within 18 km of truth; percent of documents within 161 km (100 miles): 66%
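The two figures reported per dataset on this slide are a median error distance and the share of predictions that fall within 100 miles of the truth. A minimal sketch of how such figures can be computed, assuming great-circle (haversine) distance on a spherical Earth of radius 6371 km; the function names and the 161 km default are illustrative.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def evaluate(predicted, truth, threshold_km=161.0):
    """Return (median error in km, fraction of predictions within threshold_km)."""
    errors = [haversine_km(*p, *t) for p, t in zip(predicted, truth)]
    within = sum(e <= threshold_km for e in errors) / len(errors)
    return median(errors), within
```

The median is less sensitive than the mean to the occasional prediction that lands on the wrong continent, which is one reason to report it alongside a thresholded accuracy.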
  39. 39. Hierarchical logistic regression beats flat naive Bayes. Accuracy @ 161 km, kd-tree grid:
      Dataset                 Naive Bayes   Hierarchical LR
      Twitter USA             36.2          49.2
      Twitter World           28.7          31.3
      Flickr                  58.5          66.0
      English Wikipedia       84.5          88.9
      German Wikipedia        89.3          90.2
      Portuguese Wikipedia    77.1          89.5
  40. 40. Logistic regression weights good features heavily
  42. 42. Toponym (place name) resolution They visit Portland every year. Which Portland? (Also: Canada, Australia, Ireland...)
  43. 43. Toponym resolution in context Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with the location, he returned Eastward with his companions, and at White Pigeon made his land entry. Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, whom he had left there against such time as he should have affairs prepared for their comfort. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 27 Friday, September 5, 14
  44. 44. Spatial minimality
  45. 45. Spatial minimality GeoNames
      4048392 Portland Mills Portland Mills 39.7781 -87.00918 P PPL US IN 133 0 223 218 America/Indiana/Indianapolis 2010-02-15
      4084605 Portland Portland 32.15459 -87.1686 P PPL US AL 047 0 30 41 America/Chicago 2006-01-15
      4127143 Portland Portland Portlend,Портленд 33.2379 -91.51151 P PPL US AR 003 430 38 39 America/Chicago 2011-05-14
      4169227 Portland Portland 30.51242 -86.19578 P PPL US FL 131 0 8 14 America/Chicago 2006-01-15
      4217115 Portland Portland 34.05732 -85.03634 P PPL US GA 233 0 229 228 America/New_York 2010-09-05
      4277586 Portland Portland 37.0778 -97.31227 P PPL US KS 191 0 362 364 America/Chicago 2006-01-15
      4305000 Portland Portland 37.12062 -85.44608 P PPL US KY 001 0 220 223 America/Chicago 2006-01-15
      4305001 Portland Portland 38.26924 -85.8108 P PPL US KY 111 0 135 138 America/Kentucky/Louisville 2006-01-15
      4305002 Portland Portland 38.74812 -84.44772 P PPL US KY 191 0 265 266 America/New_York 2006-01-15
      404289 Portland Portland Portlend,Портленд 38.71088 -91.71767 P PPL US MO 027 0 170 172 America/Chicago 2010-01-29
      4521811 Portland Portland Portlend,Портленд 39.00341 -81.77124 P PPL US OH 105 0 187 188 America/New_York 2010-01-29
      4650946 Portland Portland Portlend,Портленд 36.58171 -86.51638 P PPL US TN 165 11480 244 245 America/Chicago 2011-05-14
      4720131 Portland Portland Portlend,Портленд 27.87725 -97.32388 P PPL US TX 409 15099 13 11 America/Chicago 2011-05-14
      4841001 Portland Portland Portlend,Портленд 41.57288 -72.64065 P PPL US CT 007 5862 24 27 America/New_York 2011-05-14
      4871855 Portland Portland 43.12858 -93.12354 P PPL US IA 033 35 327 330 America/Chicago 2011-05-14
      4906524 Portland Portland 41.66253 -89.98012 P PPL US IL 195 0 190 190 America/Chicago 2006-01-15
      5006314 Portland Portland Portlend,Портленд 42.8692 -84.90305 P PPL US MI 067 3883 221 223 America/Detroit 2011-05-14
      5746545 Portland Portland 45.52345 -122.67621 P PPLA2 US OR 051 583776 12 15 America/Los_Angeles 2011-05-14
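Gazetteer records like the GeoNames entries on this slide are the candidate locations a toponym resolver chooses among. The slide shows them flattened with spaces; the sketch below assumes the tab-separated, 19-column layout of the GeoNames `geoname` dump (id, name, asciiname, alternate names, latitude, longitude, feature class, feature code, country code, cc2, admin1 to admin4 codes, population, elevation, dem, timezone, modification date). The `Place` type and `parse_geonames_line` function are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple


class Place(NamedTuple):
    geonameid: int
    name: str
    latitude: float
    longitude: float
    feature_code: str
    country: str
    admin1: str
    population: int


def parse_geonames_line(line):
    """Parse one tab-separated record, assuming the 19-column GeoNames dump layout."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 19:
        raise ValueError(f"expected 19 tab-separated columns, got {len(cols)}")
    return Place(
        geonameid=int(cols[0]),
        name=cols[1],
        latitude=float(cols[4]),
        longitude=float(cols[5]),
        feature_code=cols[7],
        country=cols[8],
        admin1=cols[10],
        population=int(cols[14] or 0),  # an empty population field is treated as 0
    )
```

Grouping the parsed records by `name` gives, for each toponym string, the list of candidate coordinates that a disambiguation step has to choose from.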
  46. 46. Spatial minimality
      Toponym # Locations: Ann Arbor 1; Detroit >7; Ionia >4; Lyons >15; Portland >17; White Pigeon 1
  47. 47. Spatial minimality
      Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with the location, he returned Eastward with his companions, and at White Pigeon made his land entry. Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, whom he had left there against such time as he should have affairs prepared for their comfort.
      Toponyms in the passage and their number of candidate locations in GeoNames: Ann Arbor (1), Detroit (>7), Ionia (>4), Lyons (>15), Portland (>17), White Pigeon (1).
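The intuition behind spatial minimality can be sketched in a few lines: for each toponym, pick the candidate location that keeps all the resolved places as close together as possible. The following is a brute-force illustration only, not the SPIDER algorithm reported in the results. The Portland coordinates come from the GeoNames rows in the slides; the Ionia and Lyons coordinates are approximate values assumed for the example, and `haversine_km` and `resolve` are hypothetical helper names.

```python
import itertools
import math

# Candidate referents per toponym: (label, latitude, longitude).
# Portland rows are taken from the GeoNames listing in the slides;
# Ionia and Lyons coordinates are rough, assumed values for illustration.
CANDIDATES = {
    "Portland": [
        ("Portland, MI", 42.8692, -84.90305),
        ("Portland, OR", 45.52345, -122.67621),
        ("Portland, TX", 27.87725, -97.32388),
    ],
    "Ionia": [
        ("Ionia, MI", 42.99, -85.07),
        ("Ionia, IA", 43.03, -92.46),
    ],
    "Lyons": [
        ("Lyons, MI", 42.98, -84.95),
        ("Lyons, KS", 38.34, -98.20),
    ],
}


def haversine_km(p, q):
    """Great-circle distance in km between two (label, lat, lon) tuples."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[1], p[2], q[1], q[2]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(candidates):
    """Choose one candidate per toponym so the summed pairwise distance is minimal."""
    names = list(candidates)
    best, best_cost = None, float("inf")
    # Enumerate every combination of candidates (exponential; fine for a toy example).
    for combo in itertools.product(*(candidates[n] for n in names)):
        cost = sum(haversine_km(p, q)
                   for p, q in itertools.combinations(combo, 2))
        if cost < best_cost:
            best, best_cost = combo, cost
    return {name: choice[0] for name, choice in zip(names, best)}


if __name__ == "__main__":
    print(resolve(CANDIDATES))
```

With these candidates the Michigan referents win because they lie within a few tens of kilometres of each other. Exhaustive enumeration grows exponentially with the number of toponyms, so a practical system would need an approximate search.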
  49. 49. Spatial minimality often fails
      I moved from Encinitas, CA, a nice beach town in North San Diego County to Asheville, NC. By far, Ashville is more hip, especially West Asheville. Asheville has a lot in common with Portland. Austin, I've never been to so I cannot comment. But what makes a place cool and hip, in my opinion are that give a area "punch". There are a lot of ingredients. One is geography. Add a college or university (and all that they bring- and draw), good restaurants, a good music scene, a progressive attitude and tolerance. Hmmm. I'm sure there are many more to ponder. But that's my start. Oh, lots of bars!
      From: http://www.city-data.com/forum/austin/1694181-what-makes-city-like-austin-portland-3.html
      City-data.com incorrectly marks “West” and “Portland” as the cities in Texas, presumably because of their textual and spatial proximity to “Austin”.
      But it is clear from the text that Portland, Oregon and Austin, Texas are the referents, even though their states are never mentioned and they are far from the other locations!
  56. 56. Toponym classifiers
      Strategy: build a textual classifier per toponym by obtaining indirectly labeled examples from Wikipedia.
      P(Portland-OR | music) > P(Portland-ME | music)
      P(Portland-OR | wharf) < P(Portland-ME | wharf)
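The two inequalities above can be reproduced with a very small per-toponym text classifier. The sketch below swaps in a multinomial Naive Bayes model with add-one smoothing for simplicity (the summary slide for this work lists logistic regression as the actual method), and the training contexts are invented stand-ins for text near geotagged Wikipedia mentions, not real data. The names `TRAIN`, `train`, and `posterior` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy contexts for the toponym "Portland"; in the strategy on the
# slide these would be harvested indirectly from geotagged Wikipedia articles.
TRAIN = [
    ("Portland-OR", "indie music scene food carts breweries"),
    ("Portland-OR", "music festival bridges willamette river"),
    ("Portland-ME", "lobster wharf harbor lighthouse"),
    ("Portland-ME", "old port wharf fishing boats music"),
]


def train(examples):
    """Count words per candidate referent and collect the shared vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, text in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def posterior(model, context):
    """Return P(referent | context words) under add-one-smoothed Naive Bayes."""
    word_counts, label_counts, vocab = model
    total_examples = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        n_tokens = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_examples)
        for word in context.split():
            score += math.log((word_counts[label][word] + 1)
                              / (n_tokens + len(vocab)))
        log_scores[label] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {label: math.exp(s - top) / z for label, s in log_scores.items()}


MODEL = train(TRAIN)

if __name__ == "__main__":
    print(posterior(MODEL, "music"))
    print(posterior(MODEL, "wharf"))
```

On this toy data "music" favours Portland-OR and "wharf" favours Portland-ME, mirroring the slide. One classifier of this kind would be trained for each ambiguous toponym.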
  57. 57. Results: disambiguating toponyms
      Corpora: TR-CoNLL (Reuters news texts, August 1996) and the Perseus Civil War Corpus (books, late 19th century).

                                    TR-CoNLL                        Perseus Civil War
      Method                        Avg. error distance  Accuracy   Avg. error distance  Accuracy
      Population                    216                  81.0       1749                 59.7
      SPIDER (spatial minimality)   2180                 30.9       266                  57.5
      WISTR (Wiki supervised)       279                  82.3       855                  69.1
      SPIDER + WISTR                430                  81.8       201                  85.9

      Take-home message: text classifiers are very effective & can be boosted by spatial minimality algorithms.
  58. 58. Identifying, disambiguating, and displaying toponyms
  60. 60. Back to grounding
      Grounding often involves connecting text to knowledge sources and other modalities (image, video) & bootstrapping.
      Also, they can help us create models for deeper aspects of language, such as syntactic structure and logical form.
  61. 61. Lexical brain decoding [Yarkoni, Poldrack, Nichols, Van Essen & Wager (2011)]
  63. 63. He says, she says http://www.tweetolife.com/gender/
  64. 64. Temporality of words, by hour http://www.tweetolife.com/hour/
  66. 66. Temporality of expressions, by day: http://www.google.com/trends
  68. 68. Temporality of expressions, by year: http://ngrams.googlelabs.com/ (terms plotted: slave, trenches, aircraft, war)
  69. 69. Temporal resolution [Kumar, Lease, and Baldridge 2011] (timeline axis: 4000 BC, 2000 BC, 0 AD, 2000 AD)
  75. 75. More modalities: videos [Motwani & Mooney, 2012]
  76. 76. Beyond word co-occurrences for vector-space models
      Co-occurrence counts for “beach”:
      bear  boat  car  cow  hadoop  snow  water  wrench
      3     234   42   4    1       2     325    0
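As a baseline for what the slide proposes to go beyond, a plain co-occurrence vector like the one for “beach” is usually compared to other word vectors with cosine similarity. The sketch below uses the “beach” counts from the slide; the `harbor` and `garage` vectors are hypothetical counts invented for the comparison, and `cosine` is an illustrative helper name.

```python
import math

# Context dimensions and the "beach" counts are taken from the slide.
CONTEXT_WORDS = ["bear", "boat", "car", "cow", "hadoop", "snow", "water", "wrench"]
beach = [3, 234, 42, 4, 1, 2, 325, 0]

# Hypothetical count vectors for two other words, made up for illustration.
harbor = [0, 310, 25, 0, 0, 1, 280, 2]
garage = [0, 3, 290, 0, 0, 1, 5, 120]


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print("beach vs harbor:", round(cosine(beach, harbor), 3))
    print("beach vs garage:", round(cosine(beach, garage), 3))
```

Because “beach” and the invented “harbor” vector both put most of their mass on boat and water, they score as far more similar than “beach” and “garage”. Grounded models would add dimensions beyond word contexts, which this sketch does not attempt.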
  85. 85. Combining distributional models with logics
      Erk (2013): “Towards a semantics for distributional representations.”
      Garrette et al (2012): “A formal approach to linking logical form and vector-space lexical semantics”
      Beltagy et al (2013): “Montague Meets Markov: Deep Semantics with Probabilistic Logical Form”
  86. 86. Multi-component structured vector-space models
      Example: “the children visit the beach” (diagram labels: visit, children, beach; roles: Agent, Patient)
  87. 87. Language learning in context [Kim & Mooney, 2013]
  89. 89. All your meaning are belong to us
  92. 92. Grounding matters http://davidrothman.net/2009/09/02/all-your-healthbase-are-belong-to-us-want-em-back/
  95. 95. Open Source Software (Scala/Java)
      Junto - label propagation https://github.com/scalanlp/junto
      Textgrounder - document geolocation https://github.com/utcompling/textgrounder
      Fieldspring - toponym resolution https://github.com/utcompling/fieldspring
      Low-resource POS tagging https://github.com/dhgarrette/low-resource-pos-tagging-2013
      Updown - polarity classification https://github.com/utcompling/updown
      OpenNLP - machine learning / NLP http://opennlp.apache.org/
      ScalaNLP:
      Chalk - NLP https://github.com/scalanlp/chalk
      Nak - machine learning https://github.com/scalanlp/nak
      Breeze - linear algebra https://github.com/scalanlp/breeze
  96. 96. This research was sponsored by:
      Grant: W911NF-10-1-0533
      Grant from the Morris Memorial Trust Fund
      Final note: Whitman had it right many years ago! - Walt Whitman, A Song of the Rolling Earth (in Leaves of Grass)
  97. 97. Document geolocation
      Supervision
      - documents labeled with latitude & longitude
      Methods
      - Language Modeling for Information Retrieval
      Code
      - Textgrounder: https://github.com/utcompling/textgrounder
      Publications
      - Stephen Roller, Mike Speriosu, Sarat Rallapalli, Benjamin Wing and Jason Baldridge. 2012. Supervised Text-based Geolocation Using Language Models on an Adaptive Grid. EMNLP 2012. Jeju, Korea.
      - Benjamin Wing and Jason Baldridge. 2011. Simple supervised document geolocation with geodesic grids. In Proceedings of ACL HLT 2011.
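A rough sketch of the language-modeling approach to document geolocation: divide the map into grid cells, build a smoothed unigram language model per cell from the geotagged training documents, and assign a new document to the cell whose model gives it the highest likelihood. This is an illustration under simplifying assumptions, not the Textgrounder implementation. It uses a fixed uniform grid (the cited 2012 paper uses an adaptive one), add-one smoothing, and three invented training documents; `CELL_DEGREES`, `cell_of`, `train`, and `geolocate` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

CELL_DEGREES = 5.0  # assumed cell size for a uniform latitude/longitude grid

# Hypothetical geotagged training documents: (text, latitude, longitude).
DOCS = [
    ("food carts breweries and the willamette river", 45.52, -122.68),
    ("live music on sixth street and breakfast tacos", 30.27, -97.74),
    ("lobster rolls on the wharf by the harbor", 43.66, -70.26),
]


def cell_of(lat, lon):
    """Map a coordinate to the (row, column) index of its grid cell."""
    return (math.floor(lat / CELL_DEGREES), math.floor(lon / CELL_DEGREES))


def train(docs):
    """Pool the word counts of all training documents that fall in each cell."""
    counts = defaultdict(Counter)
    for text, lat, lon in docs:
        counts[cell_of(lat, lon)].update(text.lower().split())
    vocab = {w for c in counts.values() for w in c}
    return counts, vocab


def geolocate(model, text):
    """Return the centre (lat, lon) of the cell that best explains the text."""
    counts, vocab = model
    words = text.lower().split()

    def log_likelihood(cell):
        total = sum(counts[cell].values())
        # Add-one smoothing, with one extra slot reserved for unseen words.
        return sum(math.log((counts[cell][w] + 1) / (total + len(vocab) + 1))
                   for w in words)

    best = max(counts, key=log_likelihood)
    return ((best[0] + 0.5) * CELL_DEGREES, (best[1] + 0.5) * CELL_DEGREES)


MODEL = train(DOCS)

if __name__ == "__main__":
    print(geolocate(MODEL, "tacos and live music"))
```

The predicted location is only as precise as the cell size, which is one motivation for grids that adapt their resolution to the density of training data.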
  98. 98. Toponym resolution
      Supervision
      - indirectly acquired toponym annotations using a gazetteer and geo-annotated Wikipedia
      Methods
      - logistic regression
      - named entity recognition
      Code
      - Fieldspring: https://github.com/utcompling/fieldspring
      Publications
      - Mike Speriosu and Jason Baldridge. Text-Driven Toponym Resolution using Indirect Supervision. To appear in proceedings of ACL 2013.

Jason Baldridge, Co-Founder and Chief Scientist at People Pattern and Associate Professor of Computational Linguistics at the University of Texas at Austin, shares recent research of his from UT Austin on text-based geolocation using Wikipedia, Twitter and other sources.
