
Creating A Database of Ship Citations



This is the PowerPoint slide deck from my presentation at The Charleston Conference, on November 4, 2010.


  1. CREATING A DATABASE OF SHIP CITATIONS: THE CHALLENGES ENCOUNTERED IN SHIPINDEX.ORG
     The Charleston Conference, 3 Nov 2010
     Peter McCracken, Co-Founder & Director of Content and Business Development
  2. What kinds of ships are these? Bark (or barque); Ship; Brigantine; Barquentine; Topsail Schooner; Schooner
  3. Serials :: Ships
     • Publication pattern (or format?) :: Vessel type
     • Serial title :: Ship name
     • ISSN :: IMO
     • Ship research :: Any other historical research
  4. Ships :: Other historical research
     • Problems with ships are the same as problems with personal names, geographic descriptors, etc.
     • Can also apply to concepts, as well as things
     • Also ‘non-unique’ items, like a car model
  5. Data challenges – personal names
     • Innumerable works by “Anonymous”
     • Names are often shortened
         • Pablo Picasso’s full name was Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso
     • Names have strange limitations
         • Some must be unique – consider Michael J. Fox
         • Some are very common – consider Adam Smith
  6. Data challenges – geographic names
     • Numerous variations: Köln; Cologne; Keulen; Colonia; Colònia; Kolín nad Rýnem; Cwlen; Κολωνία; Kolonjo; كولونيا; Кьолн; Ķelne; Кёльн
     • Name changes
         • Hot Springs, NM -> Truth or Consequences, NM
         • Halfway, OR ->, OR
         • Clark, TX -> DISH, TX
         • St. Petersburg -> Petrograd -> Leningrad -> St. Petersburg (“Petersburg,” or “Piter”)
  7. A “meaning-less” identifier
     • Regardless of the topic, a meaning-less identifier can provide significant assistance
     • “Meaning-less” in the sense of a one-to-many relationship between the identifier and the data
     • The identifier doesn’t change, but the data can
  8. Overview of ShipIndex.org
     • A database of citations
         • >1.42 million citations, from >200 resources
         • >140,000 citations are freely available
     • Changes how one does maritime research
         • Far more content can be researched more quickly
     • Opens up maritime research to everyone
         • No need for inside knowledge on where to start searching
     • Uncovers many hidden resources
         • Locates free, but hidden, web resources
  9. Maritime access points
     • Vessel name
     • Vessel number
         • IMO numbers are new; hull numbers change
     • Captain name
         • Captains change between voyages, and die during them
     • Rig or vessel type
         • Ships are rebuilt; definitions change; “ship”
     • ALSO: port of registration; crew members; others
  10. Vessel names – this is easy!
      • “What does the stern say?”
      • Examples: American Lloyd’s Register of American and Foreign Shipping, 1872; American Lloyd’s Register of American and Foreign Shipping, 1867
  11. Sources of errors – primary sources
      • Mistakes in primary sources are very common, and forgivable
      • Digitized version of Lloyd’s List of 1812: ships called “Adolph & Fredericka”
  12. Sources of errors – transcribers, indexers, OCR operators, etc.
      • Transcription errors are very easy to make, whether through incorrect assumptions or just mistakes
      • “Earnets” for “Earnest”; “Elizaneth” for “Elizabeth”; etc.
      • Some files are much tougher to manage than others
  13. More challenges
      • How do we locate Elizabeth? Or Mary?
          • Elizabeth = 1,899 citations
          • Mary = 2,614 citations
      • Top ten ship names, for no good reason: Mary, Maria, Elizabeth, Anna, Union, Victoria, Hope, Flora, Emma, America
      • Try to limit result sets?
          • by time period
          • by vessel rig (maybe?)
          • by location (?)
          • by nationality
  14. Changing vessel names
      • What do we do when a vessel changes its name?
      • A person researching a vessel wants to know the life of a ship; at present they need to know its previous or subsequent names
      • This can only be done when we have unique vessel identifiers – otherwise, how do you know which Elizabeth became Hogwarts Belle?
  15. Existing vessel identifiers
      • Hull Identification Number – only US; any powered boat
      • USCG Documentation Number – only US; >5 net tons
      • IMO Number – assigned by Lloyd’s/Fairplay; international; passenger ships of 100 gross tons and over, and cargo ships of 300 gross tons and over; mandatory from 1996
      • Naval identifiers – e.g., PT-109, CV-42, BB-18, DD-793, D118, etc.
      • Lloyd’s numbers, and many more…
  16. Unique historical vessel identifiers
      • Need an easy way to differentiate between “Mary,” “Mary,” and “Mary”
      • Needs to be unique and unchanging (unlike name, naval identifier, etc.)
      • Identifier itself has no meaning – no indication within it of size, nationality, etc.
      • Identifier is quickly & automatically assigned
      • Identification is coordinated with multiple organizations
  17. Creating an identifier
      • Could be done through a standards-creation process, via NISO or another organization
      • Or informally, with publicly defined guidelines, such as (just as examples):
          • Nine-digit number; ddd-ddddd-c (c = check digit)
          • Allow individuals to easily request identifiers for their vessels or their citations
          • Need ability to easily combine/split/modify
          • User-managed is likely the most cost-effective solution
  18. Creating an identifier
      • Must have buy-in from many groups
      • Should be easy to implement
      • Should be easy to use; available to many individuals and resources
      • Pre-populate as much as possible; open editing to all
      • Maintain an advisory group to address concerns, disagreements, etc.
  19. Defining <ShipIdentifier>
      • <OtherIdentifiers>
          • <IdentifierType>
          • <IdentifierNumber>
      • <ShipName>
      • <DateNameStartedInUse>
      • <DateNameEndedInUse>
      • <PreviousShipName>
      • <SubsequentShipName>
      • <RigType> – defined list of types, & “other”
      • <VoyageIdentifier> – multiple
  20. More <ShipIdentifier>
      • <MilitaryUsage?> – yes/no/unclear
      • <Nationality>
      • <ServiceBranch>
      • <HullIdentifier>
      • <VesselMeasurements>
          • <MeasurementType> – list of options
          • <MeasurementValue>
  21. Defining <VoyageIdentifier>
      • <ShipIdentifier>
      • <Captain>
      • <Crew> – multiple positions, multiple names
          • <CrewPosition>
          • <CrewmemberName>
      • <OtherVoyageIdentifiers>
          • <OtherVoyageDatabase>
          • <OtherVoyageDbId>
  22. Expanding to other fields
      • Makes discovery more manageable
      • Makes linking possible
      • Use the same concept for other areas of research, linking everything together
          • People
          • Places
          • Manufactured items
          • Artwork
          • Everything
  23. Thoughts, questions, more? Thank you – Peter McCracken
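Slide 15 lists the IMO Number among the existing vessel identifiers. An IMO number has seven digits, and the last one is a check digit: multiply the first six digits by 7, 6, 5, 4, 3 and 2, add the products, and the last digit of that sum must equal the seventh digit. The sketch below validates a number this way. The function name `is_valid_imo` is hypothetical, and the code needs Python 3.9+ for `str.removeprefix`.

```python
def is_valid_imo(number: str) -> bool:
    """Return True if `number` is a well-formed IMO ship number.

    Accepts an optional "IMO" prefix, e.g. "IMO 9074729" or "9074729".
    """
    digits = number.strip().upper().removeprefix("IMO").strip()
    # Exactly seven ASCII digits are required.
    if len(digits) != 7 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights 7, 6, 5, 4, 3, 2 apply to the first six digits.
    total = sum(int(d) * w for d, w in zip(digits[:6], range(7, 1, -1)))
    # The check digit is the last digit of the weighted sum.
    return total % 10 == int(digits[6])


print(is_valid_imo("IMO 9074729"))  # True
print(is_valid_imo("IMO 9074728"))  # False
```

The check digit catches many keying errors, but it cannot tell whether a valid-looking number belongs to the vessel a citation actually describes.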
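Slide 17 offers a nine-digit ddd-ddddd-c format, with c as a check digit, purely as an example. The slide does not say which check-digit algorithm to use. The sketch below assumes the Luhn (mod 10) algorithm, which detects every single-digit error and most swaps of adjacent digits (though not 09 and 90). Everything else here is hypothetical: the eight-digit serial as the payload and the names `make_ship_id` and `is_valid_ship_id`.

```python
import re

# Hypothetical ddd-ddddd-c layout from slide 17; ASCII digits only.
_PATTERN = re.compile(r"(\d{3})-(\d{5})-(\d)", re.ASCII)


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits (assumed algorithm)."""
    total = 0
    # Walk right to left, doubling every other digit starting with the rightmost.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def make_ship_id(serial: int) -> str:
    """Format an eight-digit serial as a hypothetical ddd-ddddd-c identifier."""
    if not 0 <= serial <= 99_999_999:
        raise ValueError("serial must fit in eight digits")
    payload = f"{serial:08d}"
    return f"{payload[:3]}-{payload[3:]}-{luhn_check_digit(payload)}"


def is_valid_ship_id(ship_id: str) -> bool:
    """Check both the layout and the check digit of an identifier."""
    m = _PATTERN.fullmatch(ship_id)
    if m is None:
        return False
    return luhn_check_digit(m.group(1) + m.group(2)) == int(m.group(3))


print(make_ship_id(1))                  # 000-00001-8
print(make_ship_id(12345678))           # 123-45678-2
print(is_valid_ship_id("123-45678-2"))  # True
print(is_valid_ship_id("123-45687-2"))  # False (two digits transposed)
```

The serial itself carries no meaning, which fits slide 16: it says nothing about size, nationality or rig. Splitting or merging records (slide 17) would then involve issuing new serials and recording the relationships, rather than editing existing identifiers.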
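Slide 12 gives transcription errors such as “Earnets” for “Earnest” and “Elizaneth” for “Elizabeth”. The talk does not describe a fix. One possible aid, assumed here, is to suggest names already in the index that are close to a query, using Python's standard `difflib`. The name list and the 0.8 similarity cutoff below are illustrative choices, not values from the talk.

```python
from difflib import get_close_matches

# A hypothetical sample of ship names already present in an index.
known_names = ["Earnest", "Elizabeth", "Mary", "Maria", "Hope", "Flora", "Emma"]


def suggest_names(query: str, names: list[str], cutoff: float = 0.8) -> list[str]:
    """Return up to three indexed names similar to `query`, best match first."""
    return get_close_matches(query, names, n=3, cutoff=cutoff)


print(suggest_names("Earnets", known_names))    # ['Earnest']
print(suggest_names("Elizaneth", known_names))  # ['Elizabeth']
```

These are suggestions, not automatic corrections. As slide 11 notes, the primary source may itself be wrong, so a human still has to decide which spelling the citation should carry.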
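Slide 13 asks whether the 2,614 “Mary” citations could be narrowed by time period, rig, location or nationality. The sketch below shows one way to apply such filters. The `Citation` fields, the `narrow` function and the sample records are all hypothetical. It makes one deliberate design choice: a citation whose rig or nationality is unknown is kept rather than dropped, because many sources omit those details and rig definitions change (slide 9).

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Citation:
    # Hypothetical fields; a real citation record would carry much more.
    ship_name: str
    year: int
    rig: Optional[str]          # None when the source does not say
    nationality: Optional[str]  # None when the source does not say


def narrow(citations: Iterable[Citation], name: str,
           start: Optional[int] = None, end: Optional[int] = None,
           rig: Optional[str] = None,
           nationality: Optional[str] = None) -> Iterator[Citation]:
    """Yield citations for `name`, optionally limited by years, rig and nationality."""
    for c in citations:
        if c.ship_name.casefold() != name.casefold():
            continue
        if start is not None and c.year < start:
            continue
        if end is not None and c.year > end:
            continue
        # Unknown values pass the rig and nationality filters.
        if rig is not None and c.rig not in (None, rig):
            continue
        if nationality is not None and c.nationality not in (None, nationality):
            continue
        yield c


# Invented sample records, for illustration only.
citations = [
    Citation("Mary", 1812, "brig", "British"),
    Citation("Mary", 1854, "schooner", "American"),
    Citation("Mary", 1858, None, "American"),
    Citation("Elizabeth", 1855, "bark", "American"),
]
hits = narrow(citations, "Mary", start=1850, end=1860, nationality="American")
print([c.year for c in hits])  # [1854, 1858]
```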
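Slides 19 to 21 list the proposed <ShipIdentifier> and <VoyageIdentifier> elements but do not give a full schema. The sketch below builds one linked ship record and one linked voyage record with Python's standard `xml.etree.ElementTree`, so the proposal is easier to picture. The following are all assumptions:

- the nesting and the `<Records>` wrapper;
- the `id` attributes;
- renaming `<MilitaryUsage?>` to `MilitaryUsage`, since `?` is not allowed in an XML element name;
- every value shown, all of which are placeholders.

`<ServiceBranch>`, `<HullIdentifier>` and `<PreviousShipName>` are left out because they do not apply to this invented vessel.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def add(parent: ET.Element, tag: str, text: Optional[str] = None) -> ET.Element:
    """Append a child element (with optional text) and return it."""
    child = ET.SubElement(parent, tag)
    if text is not None:
        child.text = text
    return child


# Hypothetical wrapper element; all values are placeholders, not real data.
records = ET.Element("Records")

ship = add(records, "ShipIdentifier")
ship.set("id", "000-00001-8")
other = add(ship, "OtherIdentifiers")
add(other, "IdentifierType", "Naval identifier")
add(other, "IdentifierNumber", "XX-000")
add(ship, "ShipName", "Mary")
add(ship, "DateNameStartedInUse", "1850")
add(ship, "DateNameEndedInUse", "1862")
add(ship, "SubsequentShipName", "Hogwarts Belle")
add(ship, "RigType", "schooner")
add(ship, "MilitaryUsage", "no")
add(ship, "Nationality", "American")
measure = add(ship, "VesselMeasurements")
add(measure, "MeasurementType", "tons burthen")
add(measure, "MeasurementValue", "120")
add(ship, "VoyageIdentifier", "V-0001")  # a ship may list many voyages

# The voyage record points back to the ship through its identifier.
voyage = add(records, "VoyageIdentifier")
voyage.set("id", "V-0001")
add(voyage, "ShipIdentifier", "000-00001-8")
add(voyage, "Captain", "A. Example")
crew = add(voyage, "Crew")
add(crew, "CrewPosition", "Mate")
add(crew, "CrewmemberName", "B. Example")
other_voyage = add(voyage, "OtherVoyageIdentifiers")
add(other_voyage, "OtherVoyageDatabase", "ExampleVoyageDB")
add(other_voyage, "OtherVoyageDbId", "12345")

ET.indent(records)  # pretty-printing; requires Python 3.9+
print(ET.tostring(records, encoding="unicode"))
```

Because the voyage refers to the ship only by its unchanging identifier, renaming the ship changes only the name elements. The link between the ship and its voyages stays intact, which is the point made on slides 7 and 14.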