Nova Scotia Surnames and Mapping Methods


This project shows methods of mapping surname data from 1940 in Nova Scotia, based on voter lists. The names were mapped in ArcMap to create a hard copy map and ArcGIS Online application, while a CSV database was loaded onto CartoDB to use with a Leaflet template, allowing users to query surnames and see their distribution. The idea was inspired by an interactive map of Irish surname data. The data for this project was transcribed from

Published in: Education, Travel, Business



  • 2. TOPICS
      • Historical context of Nova Scotia (where, who, etc.)
      • Data preparation (Ancestry microfilms, CSV organization)
      • Using CartoDB to store surname data in CSV format
      • Using a Leaflet template to query and present data for individual surnames
      • Using ArcMap to present cartographic data
      • Preparing data within ArcMap for use with ArcGIS Online (AGOL) and a poster
  • 3. OVERVIEW
      • This project consisted of genealogical research to create a database for three Nova Scotia counties.
      • The goal was to show geographically how surnames and their frequencies are distributed within particular communities.
      • This will benefit those interested in finding possible home communities for their ancestors (it is possible that students' great-grandparents are in this database).
      • It may also benefit historical groups looking to analyze temporal change in the distribution of common surnames.
  • 4. HISTORICAL CONTEXT
      • Historical groups and trends affecting this study area:
          • English (New England Planters, Yorkshiremen in Cumberland Co., Loyalists, etc.)
          • Scottish (Highlanders, mostly within Pictou Co.): 1700s to 1800s
          • Ulster-Scots (Colchester Co.): after 1760
          • French-Swiss Huguenots (Northumberland Shore within all three counties): via Lunenburg, 1770s
  • 5. DATA PREP - USING VOTER LISTS
      • Only three counties could be processed due to time restrictions.
      • Census lists are not legally attainable after 1921.
      • Voter lists are readily available from 1935 onwards.
      • They are provided as microfilm images indexed with Optical Character Recognition (OCR) technology so that users can query the data.
      • The data can often be incorrect:
          • Surnames with Mac as a prefix are often incorrectly indexed.
          • Letters such as c and e, or s and z, are often misidentified (Dclancy vs. Delaney, Mackensic vs. MacKenzie, Eraser vs. Fraser, with many other examples).
          • Several counties are missing.
      • Limitations: no record of Natives, no voters under 21.
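The OCR misreadings listed on this slide can be flagged for review with simple fuzzy matching. The sketch below is not part of the original project; it assumes a hypothetical reference list of known surnames (`KNOWN_SURNAMES`) and uses Python's standard `difflib` module to suggest the closest known spelling. Suggestions like these would still need to be checked by a person against the microfilm image.

```python
from difflib import get_close_matches

# Hypothetical reference list of correctly spelled surnames (assumption:
# a real list would be far longer and compiled by the researcher).
KNOWN_SURNAMES = ["Delaney", "MacKenzie", "Fraser", "Langille", "Smith"]


def suggest_surname(ocr_name, known=KNOWN_SURNAMES, cutoff=0.6):
    """Return the known surname closest to an OCR reading, or None."""
    # Compare in lower case so that Mac/Mc capitalization does not matter.
    lowered = {name.lower(): name for name in known}
    matches = get_close_matches(ocr_name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


# The three misreadings quoted on the slide.
for raw in ["Dclancy", "Mackensic", "Eraser"]:
    print(raw, "->", suggest_surname(raw))
```

The `cutoff` value is a tunable assumption: a lower cutoff catches more misreadings but also produces more false suggestions.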
  • 6. CREATING A DATABASE
      • An .xlsx workbook was used originally and was converted to a CSV so that it could be uploaded to CartoDB.
      • An Excel formula was used to return the last name of each individual:
          =IF(ISERROR(FIND(" ",A2)),A2,RIGHT(A2,LEN(A2) - FIND("~",SUBSTITUTE(A2," ","~",LEN(A2)-LEN(SUBSTITUTE(A2," ",""))))))
      • Problematic names: names ending in Jr and two-word surnames (e.g., Van Buskirk), because the formula returns only the text after the last space.
      • Finalized fields: PERSONID, Raw_Name (which was later excluded to conserve space), LASTNAME, LOCATION, COUNTY, LATITUDE, LONGITUDE.
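The Excel formula on this slide returns everything after the last space in a cell. A minimal Python sketch of the same logic shows why the problematic names arise. The second function is a hypothetical adjustment, not something the project used: its `SUFFIXES` and `PARTICLES` sets are illustrative assumptions and would need to be extended for real data.

```python
def last_name(full_name):
    """Mirror the Excel formula: text after the last space, else the whole value."""
    if " " not in full_name:
        return full_name
    return full_name.rsplit(" ", 1)[1]


# Hypothetical lookup sets (assumptions, not from the original project).
SUFFIXES = {"jr", "jr.", "sr", "sr."}
PARTICLES = {"van", "de", "le"}


def last_name_adjusted(full_name):
    """Drop trailing suffixes and keep two-word surnames such as Van Buskirk."""
    parts = full_name.split()
    while len(parts) > 1 and parts[-1].lower() in SUFFIXES:
        parts.pop()
    if len(parts) >= 3 and parts[-2].lower() in PARTICLES:
        return " ".join(parts[-2:])
    return parts[-1] if parts else ""


for name in ["John Van Buskirk", "James Fraser Jr", "Mary Langille"]:
    print(name, "|", last_name(name), "|", last_name_adjusted(name))
```

The first function reproduces the problem cases (`Buskirk`, `Jr`), while the adjusted version returns `Van Buskirk` and `Fraser` for the same inputs.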
  • 7. HOW DOES THE VOTER LIST DATA APPEAR?
      • The data is incorrectly transcribed as Miss Oeorgle Alkens (Miss George Aikens is correct).
  • 8. CARTODB
      • CartoDB is an online mapping service that allows users to upload CSV files and Esri shapefiles, among other formats.
      • Users can store 5 MB of data, use 5 unique tables, and have up to 10,000 map views per month for free.
      • Users can manipulate the map by using SQL queries to display the desired data.
      • CartoCSS can be edited to style the data; users can also modify the info window and use a visualization wizard to alter the display properties.
      • Problem: larger datasets come at a small cost, which will likely be encountered with province-wide genealogical data.
  • 9. CARTODB IN ACTION
      • Querying instances per community of Mc/MacNutt
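A query of the kind shown on this slide counts the records for both spellings of a surname and groups them by community. The sketch below is an illustration only: it uses Python's built-in `sqlite3` as a stand-in for CartoDB's PostgreSQL-based SQL, a hypothetical table name (`surnames`), and made-up sample rows. Only the column names LASTNAME, LOCATION, and COUNTY come from the slides.

```python
import sqlite3

# In-memory stand-in for the CartoDB table (table name is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE surnames (personid INTEGER, lastname TEXT, location TEXT, county TEXT)"
)

# Made-up sample rows for illustration; not data from the project.
rows = [
    (1, "McNutt", "Community A", "Colchester"),
    (2, "MacNutt", "Community A", "Colchester"),
    (3, "McNutt", "Community A", "Colchester"),
    (4, "MacNutt", "Community B", "Colchester"),
    (5, "McNutt", "Community B", "Colchester"),
    (6, "Fraser", "Community A", "Colchester"),
]
conn.executemany("INSERT INTO surnames VALUES (?, ?, ?, ?)", rows)

# Count both spellings together, one result row per community.
QUERY = """
SELECT location, COUNT(*) AS instances
FROM surnames
WHERE lastname IN ('McNutt', 'MacNutt')
GROUP BY location
ORDER BY instances DESC, location
"""
counts = conn.execute(QUERY).fetchall()
print(counts)
```

Listing both spellings in the `IN` clause is one simple way to treat Mc and Mac variants as the same name; a pattern match would be an alternative.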
  • 10. LEAFLET AND CARTODB
      • Leaflet is an open-source JavaScript library that is very lightweight (only about 33 KB).
      • It can be used with several mapping sources, including Esri and Google.
      • It uses HTML5 and CSS3.
      • For this project, a name search selects any record with the specified surname, and a dropdown filters the results by county.
      • A JavaScript function was added, which queried the CSV in CartoDB based on locations within a specific county.
      • The spiderfy option allows users to click on a location at a suitable zoom level, at which point all of the symbols at that location spread outward from the central point.
      • Note: A thank-you to Ed Symons for helping demonstrate a Leaflet template.
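The spiderfy behaviour described on this slide can be pictured as placing overlapping markers evenly on a circle around their shared location. The sketch below illustrates that geometry in Python under stated assumptions (screen coordinates in pixels, a fixed `radius`); it is not the code of the Leaflet plugin used in the project, which may lay markers out differently.

```python
import math


def spiderfy(center_x, center_y, count, radius=20.0):
    """Place `count` markers evenly on a circle around a shared center point."""
    positions = []
    for i in range(count):
        # Divide the full circle into `count` equal angular steps.
        angle = 2 * math.pi * i / count
        positions.append(
            (center_x + radius * math.cos(angle), center_y + radius * math.sin(angle))
        )
    return positions


# Four records that share one community coordinate (values are illustrative).
for x, y in spiderfy(100.0, 100.0, 4):
    print(round(x, 1), round(y, 1))
```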
  • 11. LEAFLET AND CARTODB
      • Fraser was the second most commonly occurring name within the three counties in 1940, with over 1,200 records. However, Fraser was the most commonly occurring name in Pictou County (over 1,000).
  • 12. ANOTHER EXAMPLE
      • Showing the symbology of an individual record and the spiderfy option
  • 13. PROCESSING THE DATA IN ARCMAP: LOCATION
      • The original xlsx was used.
      • A Locations feature class was created by referencing two feature classes from the nscccods2 SDE geodatabase: places and roads. This feature class was used largely for gathering coordinates to use for the CartoDB CSV.
      • Points were placed near the community, or between two communities if the electoral district consisted of both communities.
      • If a community could not be found, its position was estimated from the addresses of individuals.
  • 14. PROCESSING THE DATA IN ARCMAP: PART 2
      • As mentioned, the Locations shapefile was best used as a reference for coordinates.
      • The spreadsheet was spatially represented in ArcMap by displaying the Latitude and Longitude fields as XY data, and was saved as a feature class.
      • The Frequency tool was run to count the occurrences of every surname within each community, leaving 15,000 individual records for the 135 communities.
      • The surnames were ordered by both frequency and community, meaning that the top five values could easily be selected and exported into a new feature class.
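The frequency-and-top-five step on this slide can also be sketched outside ArcMap. The Python example below counts surnames per community and keeps the most common ones. It is an illustration of the idea rather than the ArcMap Frequency tool itself, and the sample records are made up.

```python
from collections import Counter, defaultdict


def top_surnames(records, n=5):
    """Count surnames per community and keep the n most frequent in each."""
    per_community = defaultdict(Counter)
    for location, lastname in records:
        per_community[location][lastname] += 1
    return {loc: counter.most_common(n) for loc, counter in per_community.items()}


# Made-up (location, lastname) pairs for illustration only.
sample = [
    ("Community A", "Fraser"),
    ("Community A", "Fraser"),
    ("Community A", "Fraser"),
    ("Community A", "Smith"),
    ("Community A", "Smith"),
    ("Community A", "Langille"),
    ("Community B", "Langille"),
    ("Community B", "Langille"),
    ("Community B", "Smith"),
]

for community, ranked in top_surnames(sample, n=2).items():
    print(community, ranked)
```

Each community maps to a list of `(surname, frequency)` pairs ordered from most to least frequent, which corresponds to the table that was exported as the top-five feature class.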
  • 15. TOP SURNAMES
      • Smith - Cumberland
      • Langille - Colchester
      • Fraser - Pictou
  • 16. USING ET GEO WIZARDS TO STYLIZE SURNAME DATA
      • Without further processing, the new feature class would have displayed all five surnames for a location at a single point.
      • The Build Thiessen Polygon Surface tool was used to develop a polygon that contained each community/location coordinate.
      • This helped determine the placement of the random points and allowed the cartographer to refer to a spatial region that a community had influence over.
      • The Random Points in Polygons tool could be used to generate five points per polygon, creating a total of 675 new points. The ID field had to be multiplied by 10 to create 5 points for each.
      • Both fields were exported and converted to a CSV with FME Quick Translator to reorder by location and be assigned a uniqueID.
      • These tables were brought back into ArcMap and joined based on the matching ObjectIDs.
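The idea behind generating five random points inside each community polygon can be sketched with rejection sampling: draw points in the polygon's bounding box and keep those that fall inside. The Python below is a generic illustration, not the ET GeoWizards implementation; the triangle and the fixed random seed are assumptions chosen for the example.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, count, rng=None):
    """Draw `count` random points inside a polygon with non-zero area."""
    rng = rng or random.Random()
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Illustrative triangle standing in for one Thiessen polygon.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
five_points = random_points_in_polygon(triangle, 5, rng=random.Random(0))
print(five_points)
```

With five points for each of the 135 communities, this yields the 675 points mentioned on the slide (135 × 5 = 675).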
  • 17. HOW DOES THIS LOOK?
      • Result of the Thiessen Polygon and Random Point tools. The communities are symbolized with the larger symbols, while the random points are blue.
  • 18. LABELING THE FEATURES
      • Three new label classes were created to indicate the frequency of surnames within a community:
          • Greater than 50 within a community, 20 to 50 within a community, and less than 20 within a community.
      • Proportional labelling symbols were applied.
      • An effort was made to keep labels on land and a reasonable distance from county boundaries.
      • Labeling was difficult for some coastal communities, as the font was larger than the geographical area.
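The three label classes amount to a simple classification by frequency. The sketch below expresses it in Python; it assumes the boundary values 20 and 50 both belong to the middle class, which follows from the wording on the slide ("greater than 50", "less than 20") but is not stated explicitly.

```python
def label_class(frequency):
    """Assign a surname frequency to one of the three label classes."""
    if frequency > 50:
        return "Greater than 50"
    if frequency >= 20:
        # Assumption: 20 and 50 are both included in the middle class.
        return "20 to 50"
    return "Less than 20"


for value in (75, 50, 20, 19):
    print(value, "->", label_class(value))
```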
  • 19. POINTS EDITED FOR LABELING
      • In this example, the labeling is very basic and has no settings applied.
      • The labels have been moved to enhance visibility, but more work will be needed.
      • The points largely follow the Thiessen polygons.
  • 21. EXCEPTION…
      • Due to the high population density, large number of communities, and very common surnames, the Thiessen Polygon strategy does not work well for Central Pictou Co.
  • 22. ANNOTATIONS
      • Annotations allow the data to be easily placed in any location. Users can move annotations in an edit session.
      • For this dataset, there may be many unplaced labels. These labels can still be drawn and edited in the same style as the other annotations.
      • Individual annotations can be edited to appear however the user would like.
      • Label classes can also have their original settings overridden if the user wants to change them.
      • Overlapping labels can be used with annotations.
  • 23. OTHER CARTOGRAPHIC ADDITIONS
      • DEM
      • Shapefile with dissolved counties to symbolize the coastline (layer copied 3 times, with different outline widths)
      • Community names
      • Labeled counties and water bodies
  • 25. USING AGOL TO SHOW SURNAME FREQUENCIES
      • The inspiration for this map was Mapping the Emerald Isle: a geo-genealogy of Irish Surnames.
      • ArcGIS Online can be used to display label text, but the labels must be converted to a shapefile first.
      • The data can be loaded into and used within the Find, Edit, Filter downloadable application, but this application has more limited functionality than the Irish Surname story map.
  • 26. STEPS NEEDED TO CREATE AGOL-FRIENDLY LABELS: 1
      • Labels were converted to annotations, which contain limited information.
      • The annotations and the original top five surname feature class were joined to link the feature ID of the annotation with the original object ID of a feature.
      • The Feature Outline Masks tool was used to create unique features.
          • Coordinate system: WGS 1984 Web Mercator Auxiliary Sphere. This will allow the features to draw correctly.
          • The margin around the feature must be zero.
          • Mask kind is set to exact; the mask will only include the annotation.
          • A mask was only created for placed features; this is useful for lower-scale zoom levels.
          • All features were transferred, which will allow the user to rejoin the original feature class to these new features based on the Object ID and the Feature ID.
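The Web Mercator Auxiliary Sphere coordinate system named on this slide converts latitude and longitude to metres on a sphere of radius 6,378,137 m. ArcMap performs this projection itself; the sketch below only illustrates the standard spherical formula, and the sample coordinate is an arbitrary illustrative point, not one from the project data.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator


def to_web_mercator(lat_deg, lon_deg):
    """Project WGS 1984 latitude/longitude (degrees) to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Arbitrary illustrative coordinate (latitude, longitude in degrees).
print(to_web_mercator(45.6, -62.7))
```

The formula is undefined at the poles, which is not a concern for coordinates within Nova Scotia.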
  • 27. STEPS NEEDED TO CREATE AGOL-FRIENDLY LABELS: 2
      • The Top 5 Surnames feature class was joined to the annotations, using the FeatureID (annotation) and the ObjectID (Top5Names).
      • The Top 5 Surnames feature was joined to the output of the Feature Outline Masks tool (using the FeatureID and ObjectID, again).
      • By doing so, features in AGOL will have meaningful attributes, such as community, county, and frequency.
      • Each feature will need to be created four times to account for four different zoom levels (based on the ArcGIS, Google Maps, and Bing online zoom levels): 1:144,448, 1:288,895, 1:577,791, and 1:1,155,581.
      • Minimum and maximum zoom levels need to be set at each step.
      • This map shows how the features look when two feature classes are not controlled by zoom settings.
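The four scales on this slide follow the common web tiling scheme, in which the map scale halves with every zoom level. The sketch below reproduces them in Python, assuming the usual level 0 scale of about 1:591,657,528 for that scheme; under this assumption the four scales correspond to levels 12, 11, 10, and 9.

```python
# Scale denominator at zoom level 0 in the common web tiling scheme (assumption).
LEVEL_0_SCALE = 591657527.591555


def scale_for_level(level):
    """Return the scale denominator for a zoom level; each level halves the scale."""
    return LEVEL_0_SCALE / (2 ** level)


for level in (12, 11, 10, 9):
    print(level, f"1:{round(scale_for_level(level)):,}")
```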
  • 28. PUBLISHING TO AGOL
      • Create a new document with only the four shapefiles that will be used on AGOL.
      • Publish as a feature service.
      • Turn off tiled mapping.
      • Mark these two warnings as exceptions: "Layer does not have a feature template set" and "Map is being published with data copied to the server using data frame full extent". These will not affect the output.
  • 29. USING A CONFIGURABLE APP FROM AGOL
      • The Find, Filter, Edit application from Esri allows users to easily query and select data.
      • This application can be downloaded and configured to suit the user's needs.
      • The web map created from the Surname feature service was used.
      • The filter and edit features were not useful for this assignment (changing the data is not desirable).
      • The 144K layer was used.
      • The LASTNAME field was used in the find field section.
      • The result fields are LASTNAME, Location, Frequency, and County.
      • A zoom level of 13 was applied for selected features.
  • 30. AGOL APPLICATION IN ACTION  11860d390a7423a900f09e7036cb06e
  • 31. CONCLUSIONS AND LIMITATIONS
      • Useful for family identification.
      • There are several possibilities for this data, as long as there is a geography.
      • Programming knowledge may come in handy to further the research for this project.
      • Ancestry: missing counties, time-consuming, incorrect transcriptions.
      • Possible add-in: ethnic origin symbolization (would require extensive research for each name).
      • Leaflet/CartoDB: cost-effective options for genealogy groups who cannot afford ArcGIS.
  • 32. SOURCES          1/  shSurnamesMapUsingGIS.pdf
  • 33. QUESTIONS?