Trading Consequences - Claire Grover

Presentation given at the Geospatial in the Cultural Heritage Domain - Past, Present & Future event in London on 7th March 2012. The event was organised as part of the JISC GECO project.

Transcript

  • 1. Claire Grover, University of Edinburgh. GECO Workshop 7/3/12
  • 2.
    • Funded by Digging into Data, round 2
    • Runs January 2012 to December 2014
    • Partners:
        – Ewan Klein, Claire Grover, Bea Alex (Text mining)
        – Colin Coates, Jim Clifford (Historical analysis)
        – James Reid (Data integration)
        – Aaron Quigley (Information visualization)
  • 3. Trading Consequences
    • What can archival text tell us about the economic and environmental consequences of global commodity trading during the nineteenth century?
    • Goal: Help historians to discover novel patterns and to explore new hypotheses.
    • Example questions:
        – What were the routes and volumes of international trade in resource commodities, 1850-1914?
        – What were the local environmental consequences of distant demand for these resources?
  • 4. Geolocating Cinchona
  • 5. Global supply to West Ham factories
  • 6. Project Overview
    • Scope: global, but with focus on Canadian natural resource flows to test reliability and efficacy of our approach.
    • Sources: digitised documents from the nineteenth century British Empire.
    • Methods:
        – Text mining and geoparsing to transform text into structured data, e.g., relational DB
        – Query interface targeted at historians
        – Information visualisation for interactive exploration
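The "structured data, e.g., relational DB" step on slide 6 can be pictured with a small sketch. The slides do not describe the project's actual schema, so the `mention` table, its columns, and the example rows below are all hypothetical and serve only to illustrate how text-mined commodity-location mentions could be queried once stored relationally.

```python
import sqlite3

# Hypothetical schema: one row per commodity-location mention found in a text.
SCHEMA = """
CREATE TABLE mention (
    id        INTEGER PRIMARY KEY,
    commodity TEXT NOT NULL,
    location  TEXT NOT NULL,
    year      INTEGER,
    source    TEXT NOT NULL
)
"""

def build_db(rows):
    """Load (commodity, location, year, source) tuples into an in-memory DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO mention (commodity, location, year, source) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn

def locations_for(conn, commodity):
    """Return (location, count) pairs for one commodity, most frequent first."""
    cur = conn.execute(
        "SELECT location, COUNT(*) AS n FROM mention "
        "WHERE commodity = ? GROUP BY location ORDER BY n DESC, location",
        (commodity,),
    )
    return cur.fetchall()

# Invented example rows, for illustration only.
ROWS = [
    ("teak", "Bombay", 1871, "doc-a"),
    ("teak", "Bombay", 1871, "doc-b"),
    ("teak", "Canara", 1871, "doc-a"),
    ("pine", "Quebec", 1895, "doc-c"),
]

conn = build_db(ROWS)
print(locations_for(conn, "teak"))
```

A query interface for historians, as listed under Methods, would presumably sit on top of queries of this kind, though the slides give no detail on that.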
  • 7. Text Mining Background
    • Language Technology Group (LTG): many years working on text mining in a range of domains.
    • In collaboration with EDINA developed tools for geoparsing digitised historical collections such as Bopcris, Histpop, Stormont Papers.
    • The Edinburgh Geoparser is a freely available georeferencing system, soon to be open source.
  • 8. Available via EDINA
  • 9. Edinburgh Geoparser
    • Named entity recognition (NER)
        – recognises place names and other entities in text
        – standard NEs: location, person, organisation, date. Add commodity for Trading Consequences.
    • Gazetteer look-up:
        – Unlock
        – GeoNames
        – Pleiades+ (see GAP project)
    • Georesolution to select most likely interpretation of place names in context.
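The three stages on slide 9 (recognition, gazetteer look-up, georesolution) can be sketched as a toy pipeline. This is not the Edinburgh Geoparser's code or API. The gazetteer entries, the whole-word matching used in place of real NER, and the "closest to the other place names" rule are all assumptions chosen to show how context can disambiguate an ambiguous name.

```python
import math
import re

# Toy gazetteer: place name -> candidate (label, latitude, longitude) entries.
# Coordinates are approximate and the entries are illustrative only.
GAZETTEER = {
    "Perth": [("Perth, Scotland", 56.40, -3.43),
              ("Perth, Australia", -31.95, 115.86)],
    "Edinburgh": [("Edinburgh, Scotland", 55.95, -3.19)],
    "Dundee": [("Dundee, Scotland", 56.46, -2.97)],
}

def recognise_places(text):
    """Stand-in for NER: find gazetteer names that occur as whole words."""
    return [name for name in GAZETTEER
            if re.search(r"\b%s\b" % re.escape(name), text)]

def distance(a, b):
    """Rough planar distance in degrees; enough to rank these toy candidates."""
    return math.hypot(a[1] - b[1], a[2] - b[2])

def resolve(names):
    """For each name, pick the candidate nearest to the other names' candidates."""
    resolved = {}
    for name in names:
        def cost(candidate):
            return sum(
                min(distance(candidate, other) for other in GAZETTEER[n])
                for n in names if n != name
            )
        resolved[name] = min(GAZETTEER[name], key=cost)[0]
    return resolved

text = "The goods travelled from Perth to Dundee and then on to Edinburgh."
print(resolve(recognise_places(text)))
```

In this sketch "Perth" resolves to the Scottish candidate because the other names in the same text are nearby, which is the sense in which each place name contributes context for the others.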
  • 10. NER Display (Wikipedia: Battle of Barrosa)
  • 11. Output Display (Wikipedia: Battle of Barrosa)
  • 12. Digging for what?
    • Instances in text of trade-related relationships between commodity entities, location entities and date entities.
    • Ideally also associated with organisations, quantities and sums of money.
    • Even more ideally, associated with an indication of environmental impact.
  • 13. Digging where?
    • OCR textual data from digitised datasets.
    • Three primary sets:
        – The House of Commons Parliamentary Papers (HCPP)
        – The Canadiana.org data archive
        – The Foreign and Commonwealth Office Collection from JSTOR
    • As much additional material as possible from British, Canadian, Indian, East Asian, and African newspapers, major British periodicals, and the growing databases of nineteenth-century books in English and French.
  • 14. Preliminary explorations of data
    • Initial explorations of two kinds of data using our existing text mining toolset:
        – Sources that can be mined for commodity terms to assist in the creation of ontological resources
        – Sample texts from our three main datasets.
    • Approximation of commodity NER: noun phrases which have hypernyms such as substance, physical matter, plant or animal in WordNet.
    • Non-optimized geoparsing.
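The commodity approximation on slide 14 amounts to walking up a hypernym hierarchy from a noun phrase and checking whether a commodity-like ancestor is reached. The sketch below uses a tiny hand-made hypernym map as a stand-in for WordNet, so none of the links are real WordNet data, and taking the last word of the phrase as its head noun is a simplifying assumption.

```python
# Toy hypernym links (child -> parent); a stand-in for WordNet, not real data.
HYPERNYM = {
    "teak": "wood",
    "wood": "plant material",
    "plant material": "substance",
    "cotton": "plant fibre",
    "plant fibre": "substance",
    "cattle": "animal",
    "committee": "group",
}

# Ancestors that mark a term as commodity-like, taken from the slide.
COMMODITY_ROOTS = {"substance", "physical matter", "plant", "animal"}

def hypernym_chain(term):
    """Follow parent links upward and return the ancestors in order."""
    chain = []
    seen = set()
    current = term
    while current in HYPERNYM and current not in seen:
        seen.add(current)  # guard against accidental cycles in the toy data
        current = HYPERNYM[current]
        chain.append(current)
    return chain

def looks_like_commodity(noun_phrase):
    """True if the phrase's head noun (assumed: last word) has a commodity root."""
    words = noun_phrase.lower().split()
    if not words:
        return False
    return any(h in COMMODITY_ROOTS for h in hypernym_chain(words[-1]))

for phrase in ["Burmese teak", "cattle", "select committee"]:
    print(phrase, looks_like_commodity(phrase))
```

A real run against WordNet would also have to choose among several senses per word, which this sketch ignores.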
  • 15. Commercial Botany of the Nineteenth Century (Jackson, 1890)
  • 16. Canadiana.org
  • 17. HCPP
  • 18. Location-commodity pairs in sample (2 books, 3 primary sources)

    25 most frequent locations:      25 most frequent commodities:
    Count  Location                  Count  Commodity
    8479   India                     9457   Fir
    4236   St. John                  4982   Pine
    3647   England                   3590   Cotton
    3545   New Orleans               3389   trees
    2902   Bengal                    3224   Birch
    2823   Bombay                    2654   Spars
    2788   United States             2301   silver
    2664   Calcutta                  2283   bamboo
    2556   Quebec                    2000   water
    2494   France                    1978   cattle
    2382   Canada                    1925   logs
    2210   New York                  1902   teak
    2154   Petersburg                1811   produce
    2098   Jamaica                   1804   bamboos
    2072   Europe                    1699   oil
    2041   Melbourne                 1635   wood
    1807   Dresden                   1614   Queen
    1701   Assam                     1460   Wine
    1619   China                     1310   waters
    1603   London                    1278   papers
    1544   Africa                    1244   Wood
    1435   Montreal                  1243   Hides
    1411   Ceylon                    1204   tea
    1332   Quoy                      1202   paper
    1314   Canara                    1159   tree
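Frequency lists like those on slide 18 can be produced by counting extracted mentions. The sketch below is an illustration with invented mentions, not the project's data or code. The optional `normalise` argument is a hypothetical addition: the slide lists "wood" and "Wood" as separate entries, which suggests surface forms were counted as-is, and the argument shows how such variants could be merged if wanted.

```python
from collections import Counter

def most_frequent(mentions, n, normalise=None):
    """Return the n most frequent mentions as (item, count) pairs.

    If `normalise` is given (for example str.lower), it is applied to each
    mention before counting so that surface variants are merged.
    """
    if normalise is not None:
        mentions = (normalise(m) for m in mentions)
    return Counter(mentions).most_common(n)

# Invented mentions for illustration only.
commodities = ["wood", "Wood", "teak", "Wood"]

print(most_frequent(commodities, 1))                       # surface forms
print(most_frequent(commodities, 1, normalise=str.lower))  # case-folded
```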
  • 19. Mapping commodities for the sample
  • 20. Issues and Challenges
    • Our aim is to transform historians’ understanding.
    • Will we find and be able to access all relevant source texts?
    • Text mining won’t be completely accurate – will there be enough redundancy in the data to balance this?
    • Text mining issues:
        – Low level text quality issues
        – Isolating relevant instances, i.e. location-commodity relations
        – Tables
        – French
    • Georeferencing issues
  • 21. OCR Quality
  • 22. Tables
  • 23. Some georeferencing issues
    • Which gazetteer?
        – GeoNames is global but some place names or their spellings have changed.
        – Is there an alternative?
    • Segmenting texts into appropriate units:
        – Some OCR text is in one big file, some is split into individual pages.
        – Georesolution assumes each text is a coherent whole and each place name contributes to the disambiguation context for all of the others.
    • Weighting of heuristics:
        – Population is good for modern newspaper text, but could be misleading for 19th Century.
        – Weighting coastal/port records more highly than inland ones – how do you know a place is (was) a port?
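The "weighting of heuristics" point on slide 23 can be illustrated with a scoring function whose weights decide which gazetteer candidate wins. Everything below is an assumption made for illustration: the `Candidate` fields, the log-scaled population feature, the weight values, and the two invented candidates. It does not describe how the Edinburgh Geoparser actually combines its heuristics.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    population: int
    is_port: bool

def score(candidate, w_population=1.0, w_port=0.0):
    """Weighted sum of two heuristics; the weights are hypothetical knobs."""
    pop_feature = math.log10(candidate.population + 1)
    port_feature = 1.0 if candidate.is_port else 0.0
    return w_population * pop_feature + w_port * port_feature

def best(candidates, **weights):
    """Return the highest-scoring candidate under the given weights."""
    return max(candidates, key=lambda c: score(c, **weights))

# Two invented candidates sharing one place name.
candidates = [
    Candidate("inland city", population=500_000, is_port=False),
    Candidate("small port", population=20_000, is_port=True),
]

# Population-led weighting, as might suit modern newspaper text.
print(best(candidates).name)
# Port-led weighting, as might suit nineteenth-century trade documents.
print(best(candidates, w_population=0.2, w_port=2.0).name)
```

The sketch takes `is_port` as given, which is exactly the open question the slide raises: the flag has to come from somewhere, and it should reflect the period of the text rather than the present day.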
  • 24.
    • http://www.jisc.ac.uk/whatwedo/programmes/digitisation/diggingintodata/tradingconsequences.aspx
    • http://tradingconsequences.blogs.edina.ac.uk/
    • http://twitter.com/#!/digtrade
  • 25. Thank you!
  • 26.
    • East India (Forest Conservancy). Return to an address of the honourable the House of Commons, dated 15 May 1871;-- for, "a selection of despatches and their enclosures to and from the Secretary of State for India in Council on Forest Conservancy in India, showing the measures which have been adopted, and the operations which are going on in the several presidencies and lieutenant governorships, beginning with the despatch from the Governor General in council of the 21st day of May 1862 to the present time."
    • Despatches on Forest Conservancy in India. Part I. India; Part II. Madras; Part III. Bombay
    • 1871
    • Canadiana – sessional papers 1895
    • Fifty years of economic botany
    • Commercial botany of the nineteenth century