The document describes a project called ElfYelp that uses geolocated topic models to analyze patterns in a large Danish folklore corpus collected by Evald Tang Kristensen between 1867 and 1924. It involved geotagging over 20,000 folklore stories collected from across Denmark. The project aims to explore relationships between place, meaning and context across different regions using techniques from geo-located social recommendation systems. It presents different "macroscopes" or computational analysis tools that have been developed, including WitchHunter, TrollFinder and GhostScope, to visualize topics, terms and storytellers' conceptual maps. The document also discusses using Latent Geographical Topic Analysis and non-Gaussian topic models to characterize regional folk
2. ElfYelp: Geolocated Topic Models for Pattern Discovery in a Large Folklore Corpus
Peter M. Broadwell, Timothy R. Tangherlini
#dh2015 - University of Western Sydney - July 2, 2015
Evald Tang Kristensen, Danske Sagn vol. 3, no. 108
Told by Jens Bek and Mikkel Hansen in Lille-Tåning
There were a couple of giants that had had a falling out,
one in Borum-Eshøj, and the other over in Hasle høj. So
they were going to beat each other with maces. The one
over on Borum-Eshøj hit first, but his aim was way off and
he made that water hole over by Brabrand that they call
Brabrand Lake. Then he was going to hit over, but he didn’t
have much strength left, and his mace didn’t reach further
than Gjeding Lake. So there was no point in him hitting
anymore. Now the other one was to go at it, and he started
to hit, but was much stronger. He winds up smacking down
over Borum-Eshøj, and the spiked ball comes off his mace,
and it flies further to the west and makes Lading Lake.
4. Evald Tang Kristensen’s Danish legends
5. Evald Tang Kristensen’s Danish legends and their geographical context
• Collected in Denmark between 1867 and 1924
• From 3,500 storytellers, mostly in Jutland
• 20,431 legends mention a resolvable place name
• 6,423 place names -> 2,126 lat/long pairs
• 14,254 places mentioned ≠ place told
6. Challenge: geo-topic discovery and exploration
• Explore and analyze the relationships between place, meaning and context across arbitrary regions of the geo-located ETK corpus
• Use techniques from geo-located social recommendation systems: Where are the elves? What else is in the area? Are there other areas like this one? What factors make them similar?
7. Categories in Evald Tang Kristensen’s Danish legends
[Figure: category labels shown include “Witches and their Sport” and “Hidden Folk”]
8. Story categories: multi-level indices
Primary categories:
Mound dwellers (Hidden folk)
Elves
Household spirits
Traveling monsters
Water spirits
Wiverns and small creepy-crawlies
Werewolves and nightmares
Religious legends
Death portents
Lights and portents
Heroes and their sport
Churches, monasteries, holy springs
Legends about farms and towns
Diverse place legends
Legends about treasure
Small kings and their feuds... Enemy invasions
Manor lords, ladies and mistresses
Ministers
Diverse people
Robbers, murderers and thieves
Strandings
Plague and illnesses
Secondary categories:
Robber's Christmas eve
On grain, rats and mice
Hidden folk driving or riding
The Devil as a playing companion
Jilted lovers bewitched
Swedes and Poles north of the Limfjord
Destruction of mounds. Animals sick, unrest in the house
Sand movements
Meadows and swamps
Giant graves
Cessation of the destruction of a mound
Changelings, the old child
Bad ministers
Giants build churches
Giants throw stones at churches
The murdered child, mother with the knife and washing it or the clothes
Black dogs and the like show themselves
Smiths in mounds
The church's foundations moved
Witches as revenants
9. Geo-semantic exploration
                        Borum-Eshøj  Hasle høj  Brabrand  Gjeding Lake  Lading Lake
Elves                   0            0          0         0             0
Giants and their sport  0            0          0         0             0
Mound dwellers          0            0          0         0             0
There were a couple of giants that had had a falling out, one in Borum-
Eshøj, and the other over in Hasle høj. So they were going to beat each
other with maces. The one over on Borum-Eshøj hit first, but his aim was
way off and he made that water hole over by Brabrand that they call
Brabrand Lake. Then he was going to hit over, but he didn’t have much
strength left, and his mace didn’t reach further than Gjeding Lake. So
there was no point in him hitting anymore. Now the other one was to go at
it, and he started to hit, but was much stronger. He winds up smacking
down over Borum-Eshøj, and the spiked ball comes off his mace, and it
flies further to the west and makes Lading Lake.
10. Geo-semantic exploration
                        Borum-Eshøj  Hasle høj  Brabrand  Gjeding Lake  Lading Lake
Elves                   0            0          0         0             0
Giants and their sport  1            1          1         1             1
Mound dwellers          0            0          0         0             0
11. Geo-semantic exploration
                        Borum-Eshøj  Hasle høj  Brabrand  Gjeding Lake  Lading Lake
Elves                   0            0          0         0             0
Giants and their sport  1            1          1         1             1
Mound dwellers          1            1          0         1             0
A mound man lived in a mound close to Hasle village (the mound has
now disappeared and been replaced by a gravel pit), and he was invited
once to a birth celebration over at the mound man’s place in Borum-
Eshøj, and he was supposed to be the godfather, but the day of the party,
his wife got sick and he had to stay home. He didn’t want them not to get
his godfather gift, which was going to be a gold hammer, so he went up
on his mound to throw it over there. But as he stood there swinging, the
hammer head fell off the shaft, and it flew to the northwest and landed in
a little dale near Mundelstrup and created Gjeding Lake, but the shaft
made it to Borum-Eshøj without leaving a trace.
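The counts in the geo-semantic exploration tables can be reproduced with a simple place/category tally. The sketch below is illustrative only: the `stories` records and the `cooccurrence` helper are hypothetical, and it assumes place mentions have already been resolved to canonical names (for example, "Hasle village" to Hasle høj).

```python
from collections import defaultdict

# Hypothetical records: (story category, resolved place names mentioned).
stories = [
    ("Giants and their sport",
     ["Borum-Eshøj", "Hasle høj", "Brabrand", "Gjeding Lake", "Lading Lake"]),
    ("Mound dwellers",
     ["Hasle høj", "Borum-Eshøj", "Mundelstrup", "Gjeding Lake"]),
]

def cooccurrence(stories):
    """Count, per category, how many stories mention each place."""
    counts = defaultdict(lambda: defaultdict(int))
    for category, mentioned in stories:
        for place in set(mentioned):  # count a place once per story
            counts[category][place] += 1
    return counts

counts = cooccurrence(stories)
places = ["Borum-Eshøj", "Hasle høj", "Brabrand", "Gjeding Lake", "Lading Lake"]
for category in ["Elves", "Giants and their sport", "Mound dwellers"]:
    row = [counts[category][place] for place in places]
    print(f"{category:24s}", row)
```

With many stories, the same tally yields the category-by-place matrix that a tool such as WitchHunter could visualize as concentrations in the landscape.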
12. Computational folkloristics and the macroscope
[Figures: an early microscope; a macroscope]
13. Some geo-semantic folklore “scopes”
WitchHunter: Visualizes concentrations of story categories in the landscape
TrollFinder: For a given area, finds terms and categories that are “characteristic” of stories mentioning places in the area
GhostScope: Places all storytellers at the center of the landscape, plots place references to build conceptual maps
TreasureX: Links actual places told to places mentioned, plotting the references on a map
Börner, Katy. 2011. “Plug-and-Play Macroscopes.”
Communications of the ACM 54 (3): 60–69.
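The slides do not say how TrollFinder scores “characteristic” terms. One plausible approach, shown here purely as an assumption, is to rank terms by a smoothed ratio of their rate inside the selected area to their rate in the whole corpus. The function name, the smoothing constant and the toy documents are all hypothetical.

```python
from collections import Counter

def characteristic_terms(region_docs, all_docs, top_n=3, smoothing=1.0):
    """Rank terms by smoothed (in-region rate) / (corpus-wide rate)."""
    region = Counter(term for doc in region_docs for term in doc)
    corpus = Counter(term for doc in all_docs for term in doc)
    region_total = sum(region.values())
    corpus_total = sum(corpus.values())
    vocab_size = len(corpus)
    scores = {}
    for term, n in region.items():
        p_region = (n + smoothing) / (region_total + smoothing * vocab_size)
        p_corpus = (corpus[term] + smoothing) / (corpus_total + smoothing * vocab_size)
        scores[term] = p_region / p_corpus
    # Highest ratio first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

# Toy data: stories mentioning places inside the area vs. the rest.
region_docs = [["giant", "mace", "lake"], ["giant", "mound", "lake"]]
other_docs = [["witch", "cow", "milk"], ["minister", "devil", "book"],
              ["witch", "devil", "lake"]]
top_terms = characteristic_terms(region_docs, region_docs + other_docs)
print(top_terms)
```

Terms that also occur often outside the area ("lake" in the toy data) score lower than terms concentrated inside it.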
14. WitchHunter: Category/place co-occurrence
15. TrollFinder: Finding region-specific terms
16. GhostScope: Storytellers’ conceptual geographies
17. Advanced challenges for ElfYelp
• Exploring an unfamiliar region: in an arbitrary location, what salient geo-semantic features of the corpus are found here?
• For a given region, how do we find regions that are geo-semantically similar? (Location recommendation problem)
• Pairwise comparison of places is computationally expensive. Using geographical topic models with a set number of region centroids is easier and also allows characterization of arbitrary, unlabeled points.
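A rough sketch of why a fixed set of centroids helps: comparing every place with every other grows quadratically, while comparing each place with k centroids grows linearly, and any unlabeled point can be characterized through its nearest centroid. The centroid count and coordinates below are assumptions for illustration, and hard nearest-centroid assignment is a simplification of the soft mixtures a geographical topic model produces.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_centroid(point, centroids):
    """Index of the region centroid closest to an arbitrary point."""
    return min(range(len(centroids)),
               key=lambda i: haversine_km(point, centroids[i]))

# Comparison counts: all place pairs vs. every place against k centroids.
n_places = 2126      # lat/long pairs reported for the corpus
n_centroids = 50     # assumed value, for illustration only
pairwise = n_places * (n_places - 1) // 2
via_centroids = n_places * n_centroids
print(pairwise, via_centroids)

# Approximate, illustrative centroid coordinates (not taken from ElfYelp).
centroids = [(56.16, 10.20), (56.45, 9.40), (57.05, 9.92)]
unlabeled_point = (56.15, 10.10)
nearest = nearest_centroid(unlabeled_point, centroids)
print(nearest)
```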
18. Latent Geographical Topic Analysis (LGTA)
Created to identify regional topics from collections of human-tagged, geo-located photographs (Flickr)
Any point on the map can be recognized as a mixture of multiple latent “geo-topics” that span the landscape
Yin, Zhijun, et al. 2011. “Geographical Topic Discovery and Comparison.” Proceedings of the 20th International Conference on the World Wide Web. ACM.
20. Latent Geographical Topic Analysis (LGTA)
1. Treat locations as documents containing story terms/tags
2. Central points of place clusters stand in for the rest
3. Run pLSA (probabilistic latent semantic analysis) math/magic on these points and their tags
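The pLSA step can be sketched as a small expectation-maximization loop over a location-by-tag count matrix. This is plain pLSA under stated assumptions, not the full LGTA model (which also fits the geographic distribution of each region): the toy counts are invented, and seeding each topic from one representative location (`seed_rows`) is an illustrative choice that keeps the example deterministic.

```python
def normalize(values):
    """Scale non-negative values to sum to 1 (all-zero rows stay zero)."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else list(values)

def plsa(counts, seed_rows, n_iter=30, smoothing=0.1):
    """Fit pLSA by EM.

    counts[d][w]: how often tag w occurs at location d.
    seed_rows: one location index per topic, used to initialise P(w|z).
    Returns (p_z_d, p_w_z).
    """
    n_docs, n_words, n_topics = len(counts), len(counts[0]), len(seed_rows)
    p_w_z = [normalize([c + smoothing for c in counts[d]]) for d in seed_rows]
    p_z_d = [[1.0 / n_topics] * n_topics for _ in range(n_docs)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    # M-step accumulators for P(z | d) and P(w | z).
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z

# Toy data: four hypothetical locations, tags = giant, lake, witch, cow.
counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
p_z_d, p_w_z = plsa(counts, seed_rows=[0, 2])
for d, mixture in enumerate(p_z_d):
    print(d, [round(p, 3) for p in mixture])
```

Each row of `p_z_d` is the topic mixture for one location, which is the kind of per-point breakdown the “core sample” view reports as percentages.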
21. LGTA geo-topics as regional “core samples”
Lindorm / witches and Satan / hidden folk / maiden revenants: 86.88%
Life cycle and calendrical rituals: 8.65%
Mound dwellers, ghosts and Satan’s influence: 2.22%
Things that happen in fields: 1.24%
22. ElfYelp: region geo-topics, similar regions, stories
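Once each region has a geo-topic mixture, “similar regions” can be found by comparing mixtures. The slides do not name the similarity measure ElfYelp uses; the sketch below assumes Jensen-Shannon divergence. The region names and all mixtures except the first (taken from the “core samples” percentages and renormalized) are hypothetical.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def most_similar(query, regions, top_n=2):
    """Names of the regions whose topic mixtures are closest to the query's."""
    scored = sorted(
        (js_divergence(regions[query], mix), name)
        for name, mix in regions.items() if name != query
    )
    return [name for _, name in scored[:top_n]]

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

regions = {
    "region_a": normalize([86.88, 8.65, 2.22, 1.24]),  # core-sample percentages
    "region_b": normalize([80.0, 10.0, 5.0, 5.0]),     # hypothetical
    "region_c": normalize([10.0, 70.0, 10.0, 10.0]),   # hypothetical
    "region_d": normalize([5.0, 5.0, 10.0, 80.0]),     # hypothetical
}
ranked = most_similar("region_a", regions)
print(ranked)
```

Because only mixtures are compared, two regions can rank as similar even when they are far apart on the map.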
23. Alternative: non-Gaussian geographical topics
Kling, Christoph Carl, et al. 2014. “Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections.” Proceedings of the 7th ACM International Conference on Web Search and Data Mining. ACM.
24. Next steps/future work
• Enhance feature extraction for locations when building geo-topics and improve performance with multiple tags/keywords per document
• Ability to query geo-topic mixtures for any point on the map (not just region centers): supported by LGTA, but not yet implemented in ElfYelp
• Incorporate geographic boundaries and references to toponyms (e.g., “that hill over there”) into the model
25. ElfYelp: http://etkspace.scandinavian.ucla.edu/maps/elfyelp.html
The macroscope menagerie: http://etkspace.scandinavian.ucla.edu/macroscope.html
• WitchHunter
• TrollFinder
• GhostScope
• TreasureX
• The Danish Folklore Nexus
• ...and many more
Please try it out!
26. Thanks to our funders / supporters
• American Council of Learned Societies
• The National Endowment for the Humanities
• UCLA Council on Research
• Nordic Council of Ministers
• The Institute for Pure and Applied Mathematics (NSF)
27. A Brief Advertisement: please apply!
Tim: Maybe begin with a sample story showing importance of location (or not)… this is good
Tim: Our prior work on this collection (see dh2014) has involved finding ways to make sense of a corpus that’s way too big to label by hand. Now we want to take advantage of the pervasive geographic place references in the collection to ask more questions.
Tim: People weren’t terribly mobile then; storytellers tend to situate stories in their local environment, frequently incorporating references to nearby places. We find regions to be intrinsically more interesting than individual points; note that ETK’s collecting approach basically involved “sampling” regions based on where the good storytellers were; “good” storytellers usually told stories about their local environment.
Tim: We’re getting at the idea that the countryside is much more complex than people have given it credit for being. Through folklore, people projected concerns about contemporary issues onto their local environment - though sometimes in the guise of elves and mound dwellers. Villages could have widely divergent ways of doing this, though conversely, sometimes villages that were very far apart might have quite similar formulations and ways of viewing the landscape (and be completely unaware of it). We want to explore this phenomenon.
[Consider: both Echo Park in LA and South Delhi, India have Elf cafes! - found on Yelp and Zomato]
Tim: ETK grouped stories with similar themes into volumes for publication; there are ~38 categories. This is a very shallow and unreliable system, but it’s a start. He also tagged each story with one of ~770 sub-categories, which are even more idiosyncratic but also more fine-grained, so they are potentially useful for computational profiling of the narrative landscape, along with actual keyword counts.
Tim
Tim: The idea is to use opening sample story & maybe one more to demonstrate place/term co-occurrence counts, perhaps with a map. I’ll put the slide together, or just chuck it if it’s not needed.
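A minimal sketch of the place/term co-occurrence counts mentioned above. It counts, for each (place, term) pair, the number of stories that mention both. The function name, the toy stories, and the place and term lists are all illustrative assumptions; they are not drawn from the ETK corpus or from the ElfYelp code.

```python
import re
from collections import Counter


def cooccurrence_counts(stories, places, terms):
    """Count stories in which a place name and a term both appear.

    Assumes single-word place names and terms, matched case-insensitively.
    """
    counts = Counter()
    for text in stories:
        # Tokenize on word characters so punctuation does not block a match.
        tokens = set(re.findall(r"\w+", text.lower()))
        for place in places:
            if place.lower() not in tokens:
                continue
            for term in terms:
                if term.lower() in tokens:
                    counts[(place, term)] += 1
    return counts


# Toy, invented example texts (hypothetical, for illustration only).
TOY_STORIES = [
    "The elf lived in the mound near Viborg.",
    "A witch rode past Viborg at night.",
    "An elf danced on the hill by Aarhus.",
]
TOY_COUNTS = cooccurrence_counts(TOY_STORIES, ["Viborg", "Aarhus"], ["elf", "witch"])
```

Counts like these could then be plotted per place on a map; multi-word place names or inflected Danish word forms would need a more careful matcher than this one.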
Tim: The second story used here is DS_01_0_00298 (#298), told by N Nielsen in Viborg. I also found at least one other story that was very similar to the first one, albeit in the Mound Dwellers volume (DS_01_0_01319).
Pete: Early close reading? There are many different kinds of geographically oriented “distant reading” that we can do with this corpus. We refer to the various tools we have developed for these purposes as “macroscopes”; a macroscope is a tool for modeling and exploring highly complex systems.
Pete: Here they are
Pete: Plan to have this open in a browser window for a very quick demo
Pete: Note that existing categories and topics built from keywords do not take geographical context into account in a generative way
Pete: Related: Sizov, Sergej. “GeoFolk: Latent Spatial Semantics in Web 2.0 Social Media.” Proceedings of the Third ACM International Conference on Web Search and Data Mining. ACM, 2010.
Pete
Pete
Pete: The output is a set of geo-topic “core samples” showing the proportions of particular topics in a given place (both the number of core samples and the number of geo-topics can be specified). These topic mixtures fan out from the core point (we use a Gaussian spatial distribution) and blend with the mixtures of the points around it. Given this model, we can “drill” a new core sample at any point on the map to see what’s there.
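The “drill a new core sample” idea can be sketched as follows, under stated assumptions: each region has a center, an isotropic Gaussian spread, a prior weight, and a topic mixture, and the mixture at an arbitrary point is the average of the region mixtures weighted by p(region | point). The names `drill_core_sample` and `REGIONS`, the toy numbers, and the use of planar coordinates with a single `sigma` per region are illustrative simplifications, not the ElfYelp implementation.

```python
import math


def gaussian_density(point, center, sigma):
    """Isotropic 2-D Gaussian density at `point` for a region centered at `center`."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def drill_core_sample(point, regions):
    """Blend region topic mixtures, weighting each region by p(region | point)."""
    weights = [
        r["prior"] * gaussian_density(point, r["center"], r["sigma"]) for r in regions
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("point is too far from every region center")
    mixture = [0.0] * len(regions[0]["topics"])
    for weight, region in zip(weights, regions):
        for k, p in enumerate(region["topics"]):
            mixture[k] += (weight / total) * p
    return mixture


# Two hypothetical regions with two geo-topics each (toy values, planar units).
REGIONS = [
    {"center": (0.0, 0.0), "sigma": 1.0, "prior": 0.5, "topics": [0.8, 0.2]},
    {"center": (4.0, 0.0), "sigma": 1.0, "prior": 0.5, "topics": [0.1, 0.9]},
]
MIDPOINT_SAMPLE = drill_core_sample((2.0, 0.0), REGIONS)
CENTER_SAMPLE = drill_core_sample((0.0, 0.0), REGIONS)
```

Halfway between the two centers the regions contribute equally, so the sample is an even blend of their mixtures; at a region center, that region's own mixture dominates. Real coordinates would first need projecting from latitude/longitude to a planar system for the distances to be meaningful.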
Pete: Ideally, skip this slide in favor of a live demo. Demonstrate region selection, the geo-topic “core sample” readout, the similar-places readout, info about the region (from TrollFinder), and the ability to drill down to story texts.
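The notes do not say how the “similar places” readout is computed. One common choice, sketched here purely as an assumption, is to rank places by the cosine similarity of their geo-topic mixtures. The place names and mixture values below are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length topic mixtures."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_places(target, mixtures, top_n=3):
    """Return up to `top_n` (place, score) pairs most similar to `target`."""
    scores = [
        (name, cosine_similarity(mixtures[target], mix))
        for name, mix in mixtures.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical topic mixtures for three places (toy values).
TOY_MIXTURES = {
    "PlaceA": [0.7, 0.2, 0.1],
    "PlaceB": [0.6, 0.3, 0.1],
    "PlaceC": [0.1, 0.1, 0.8],
}
RANKED = most_similar_places("PlaceA", TOY_MIXTURES)
```

Because the ranking uses only the topic mixtures and not the distance between places, two villages that are far apart can still surface as each other's nearest match.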
Pete: Other, more recent projects have focused on the “location recommendation” challenge using geo-tagged social data. This approach divides the landscape into “cells” around each point and tries to find the best “cut” of the cells such that one topic predominates in the entire shape found. This is useful for marketing purposes, but not for the types of corpus and spatial exploration we’re doing; the LGTA “core sample” approach is a better fit.
Tim: Feel free to summarize our conclusions here as well and restate the research questions this is helping us to address