URISA Connect Proper Care and Feeding of Metadata



Do you feel you are overrun with metadata requests? Does dealing with metadata make you want to lose your mind? With preparation, the care and feeding of metadata no longer has to be time-killing drudgery.

In this webinar, several tips and tricks for taming metadata will be presented. After a review of the different options for geospatial metadata, each section of the Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata (CSDGM) will be discussed in detail. By the end of the workshop, participants should be comfortable enough with the CSDGM to take the provided sample files and create their own templates.

Ultimately, by taking a few small steps to make metadata meaningful and manageable, you can take it from savage to subdued.

  • Ryan Elizabeth Bowe, GISP
    URISA Vanguard Cabinet Member (January 2012 – January 2014)
    Secretary of Cumberland URISA
    ASPRS Young Professionals Council member
    At Photo Science, I started out as a GIS Technician and have moved all around, including a stint as Alternate Sensor Operator, before settling in as Metadata and Report Manager.
    I have written hundreds of thousands of metadata files!
    I really and truly LOVE metadata and hope I can share my passion with you today.
  • Yep, I love metadata so much I consider it yummy. After a minimal overview of Metadata Meanings, we will talk about
    Metadata Madness
    Maintaining Metadata
  • I love the can-label justification for metadata. You have probably all heard the story about going into a store, finding an unlabeled can, and trying to guess whether it is cat food or tuna (and how much it costs), then being asked whether you’d buy and eat the contents without all your senses to guide you. But since this is Proper Care and Feeding of Metadata, let’s try a new example.
  • So you don’t think you need metadata? Well, then. I have some seeds to sell you.
    I don’t have a clue what they are!
    I don’t know how you should plant them.
    Sun or Shade
    Potted or Outdoors
    With lots of space to grow or a minimal footprint?
    Germination time frame?
    Do you need a “male” and “female” plant, as with kiwi?
    I don’t know how long the entity that grows will last.
    What if they are Piranha Plants?!? How do you kill them?
    Will you be picking beans until November?
    Will you instantly be killing giants like Jack the Giant Slayer if you “add water”?
    I don’t know what I’ll charge you for them.
    And, once I do charge you my mystery fee, I have no clue how I’ll deliver them to you.
    Worst of all, since I’m a mystery seed seller…you’ll have no way of knowing how to contact me if you do “get them wet” (or feed the mogwai after midnight).
  • Don’t you think Seymour wished Audrey II came with some metadata on that ill-fated total eclipse of the sun?
    So, is that a better example than Cat Food v. Tuna? About the same?
    These Little Shop of Horrors images make a great point as well…don’t wait until the metadata beastie is a big enough problem that it can consume you whole!
  • So, what is metadata? A headache, right? Like organizing all these library card catalog entries after the Ghostbusters came through. And I’ve heard metadata likened to these old-school card catalogs. But I’ve heard rumors that libraries are doing away with such things and going digital…so I have to wonder how long this comparison will be relevant.
  • It has also been likened to the information on the back of photographs. I know those have gone digital. Here is an example from Lightroom: my camera phone took this photo on the 7th of July at 8:37:02 PM. The light was fantastic (and I could get into some geeky camera terms that are well labeled, but we’re not here to talk camera stuff; we are here to talk metadata).
  • Here is another example of what I consider “current” metadata: your music collection! (8-track, Record, cassette, CD, waaa?) So, imagine you had a ton of Unknown songs in your music collection. Do you still consider metadata something to be avoided at all costs?
  • And, lastly, I believe the new reason no one really needs metadata defined anymore is the NSA “scandal”. It is scary when you put it in that context, but when you think about your highly valuable geospatial data, it’s perfect, right? You don’t have to look at the 2GB image; you can read the metadata and know ANYTHING about it. How, you ask?!?
  • I’ve had enough fun defining metadata in general, so let’s talk about geospatial metadata. Back in 1994, Bill Clinton signed Executive Order 12906, establishing the National Spatial Data Infrastructure (NSDI) and charging the Federal Geographic Data Committee (FGDC) with coordinating it, including a clearinghouse of geospatial data. The clearinghouses have changed faces over the years, but their searches have been based on the Content Standard for Digital Geospatial Metadata (CSDGM).
  • Before we talk about what TO do, let’s make sure you know EXACTLY what NOT to do.
    Do not stare at a blank slate!
    Look at the actual dataset
    Start to gather facts (talk to people who worked on the dataset if you didn’t work on it yourself)
    Request information from the “Source”
    Search for relevant templates (by searching for similar datasets, if nothing else)
  • CSDGM is your best friend while writing FGDC Metadata. I have a well-worn copy printed out by my desk. I still have to double check things when I write sections I do not use all the time. In order to familiarize myself with the document, I went through and I highlighted all the optional fields in my copy. It helped reinforce the “symbols” they use there. It also helps for days when the curly brace looks an awful lot like a parenthesis. Ahem…days when you’re feeling old. When you realize the “next” generation isn’t going to have to learn how to use a card catalog.
  • The other big geospatial metadata standard is ISO 19115. NOAA’s NCDDC has a great training series on it. I’ve taken it several times and learn something new each time. Since they cover it so well, I’m going to focus on FGDC. I know that link is difficult to read, but if you search for NOAA Metadata Training, it’s the first link. I can’t recommend their webinars and training enough.
  • At one time I would have suggested ArcGIS as better than a blank slate because it had a readable interface with the CSDGM specifications in it, but with 10.x, not so much. Here’s the old-school 9.3.x editor, may it rest in peace.
  • Now, we have this…initially. Oh, but don’t forget the nifty trick of switching it to FGDC metadata in the options.
  • Just in case you haven’t found the options interface, here it is. It is under the Customize > ArcCatalog Options menu. There, you get to choose your Metadata Style. Also notice that you can tell it whether or not to automatically update your metadata. I usually like to leave this unchecked, but it is a personal preference, and leaving it checked can be very nice if you need to track exactly how you created a feature.
  • But, now that you have the “correct” options chosen, you get this. Sigh. It’s giving me ISO descriptions down there. How’s that going to help me write FGDC metadata? At least it has some of the required tags correct (Identification Information and Metadata Reference Information are the only two required sections, right?)
  • See my cursor hovering over the Title element…
  • So, yep, we are working with ISO descriptions in an FGDC file. There are a few places where you can find FGDC descriptions, but really only on elements that exist only in FGDC. For the title, though, it’s exactly the same as Citation Information > Title (8.4): “The name by which the data set is known.” But man, does that make my head hurt! So, let’s go back and look at a couple more sections. (I’m skipping the browse graphic because that is still a very nice tool.)
  • Again…I hover over those boxes (sometimes it takes a bit of encouraging to get the text at the bottom to come up)…
  • So here are the next three sections. And the types…and I think I have an even bigger headache now!
  • Toolboxes. Before I throw my hands up and walk away from ArcGIS, I will point out the all-important difference between the Models (the icon with two blue dots, a yellow dot, and a green dot) and the Tools (the hammers) in your toolbox. For whatever reason, I have not been able to get the models to function properly; they always error out for me. Maybe it is better in 10.2, but I don’t know. I do know that the tools work, so instead of wasting time seeing whether the models work yet…I stick with the hammer-time tools!
  • Before we talk about tools other than ArcGIS, I have to remind you that you do not have to spend any money, because the only things you really need to write metadata are a text editor (such as TextPad), the standard (free online), and a validator (mp, the Metadata Parser, is free from the USGS).
  • There are plenty of tools out there other than ArcGIS. Play around with them (they all have trial periods if they aren’t free) and find one you are comfortable with. This is another thing the NOAA NCDDC training does really well: they review several different tools (Mermaid, CatMDEdit, GeoNetwork, ISOMorph, Geoportal, Altova, oXygen). Again, the main thing is to find something you’re comfortable with and run with it. For the longest time I would only use UltraEdit. Now, I use oXygen. And, if you’re really REALLY good…you’ll make your own tools. That’s another topic altogether.
  • (This is a bit of a cart-before-the-horse issue, because you will probably want to decide whether you are working in text before you commit to an XML editor. Then again, some of the platforms available will let you output the data in text or XML…so maybe it’s more of a chicken-or-the-egg debate?) Anyway, a bit more about “platforms”. Once you pick your editor, you also get to choose between text and XML. I personally love XML. It probably has something to do with the fact that the indentation spaces in the text format make me feel vapid: if you’re off by one, you’re done. I don’t play games I cannot win, and that feels like the house always wins to me. Yes, the text format is more readable, but I’ve been working with XML and the CSDGM long enough to be comfortable with the tags. These screenshots are UltraEdit (on a Mac) for the text on the left and oXygen XML for the XML on the right.
  • When it boils down to it, all you really need is a text editor (there are plenty of free ones that work just fine) and mp, the Metadata Parser. By the way, I did NOT say Internet Explorer. Please don’t try to edit your XML files in IE. It just won’t work. It is, however, a good way to check whether all your XML tags are properly formed. And it is your link to the online MP!
    If you’re up for a challenge, install the software. The problem with this is that updates come out so frequently that it is much easier to use the online version. I will say that using command-line MP builds character.
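The “open it in IE” trick is really a well-formedness check: are all the tags opened, closed, and nested properly? As a minimal sketch (assuming Python 3 and its standard-library xml.etree.ElementTree, not any tool from the talk), the same check can be scripted. It says nothing about whether the content follows the CSDGM; that is still mp’s job.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(xml_text: str) -> Optional[str]:
    """Return None if xml_text is well-formed XML, else the parser's error message.

    This only checks that tags are properly opened, closed, and nested.
    It does NOT check CSDGM content rules; validation is still mp's job.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return str(err)
    return None


good = "<metadata><idinfo><citation/></idinfo></metadata>"
bad = "<metadata><idinfo><citation></idinfo></metadata>"  # <citation> is never closed
print(check_well_formed(good))  # None
print(check_well_formed(bad))   # a "mismatched tag" message with line and column
```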
  • When the government is not shut down, this is what the online MP looks like. It will always be the most current version, so you don’t have to worry about reinstalling or downloading the latest MP. But every once in a while you might want a local copy anyway, because it may be the only link you have to MP.
  • Old school! When the government shut down, I was quite worried I wouldn’t be able to use MP. And when a metadata file needed to be written in both text and XML, I was quite terrified. Translating to text is not something I ever want to do by hand. So I looked through my files and found a version I had recently downloaded because I knew it had the LiDAR extension in it. Saved! This is what you see when you start command-line MP.
  • And here is how I used MP to translate a TIGER_unedited file to an XML file. Easy, right? 
  • Now that I have shown you some tools and formats and you have suffered through the old-school command-line example (which I had originally hoped never to have to show anyone), let me tell you a story about another tool that led to my discovery of the power of templates. A long, long time ago, maybe the fifth time I wrote metadata, a contract requiring metadata was brought to my attention. It had a link to something called “XMLInput”, but the link was broken. After some serious internet searching, I managed to find the proper link. I tried a few times to make the actual tool work, but I gave up because the templates provided with the tool were much better for me. As I have repeatedly pointed out, I love XML and I love the CSDGM. Although these templates make terrible bedtime reading (even for me), using them has made me a better metadata writer. In the background you see the 133UATemplate, whose comments contain all the information you ever needed to know about the tags; you delete the comments as you go, so it is very easy to write what you need.
    These are just a few templates that I rely on; there are often others for different clients and “profiles”. One of the hardest has to be the National Flood Insurance Program (NFIP), because it has so many fields that do not change. It should be easy, but it is a square-peg-in-a-round-hole issue: you have to describe your data with a set phrase that just…doesn’t describe the data!
    Another fun one is the USGS LiDAR base specification. It is one of the most recent revisions to MP! But, I haven’t seen the DTD updated. If you really get into it, you can revise a version of the DTD so you can see the changes in your XML editor! Give yourself a few hours…
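Here is a minimal sketch of the template idea, assuming Python 3’s standard-library xml.etree.ElementTree. The tiny TEMPLATE below is hypothetical (it is not the 133UA template): comments explain each tag and placeholder text marks what to fill. ElementTree’s default parser discards comments, so they disappear once the template is parsed and filled.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# A tiny, hypothetical commented template (NOT the actual 133UA template).
TEMPLATE = """<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <!-- origin: the organization or individual that developed the data set -->
        <origin>ORIGINATOR</origin>
        <!-- title: the name by which the data set is known -->
        <title>TITLE</title>
      </citeinfo>
    </citation>
  </idinfo>
</metadata>"""


def fill_template(template: str, values: Dict[str, str]) -> ET.Element:
    """Parse the template and set element text by path.

    The default parser drops comments, so the explanatory comments
    do not survive into the filled-in file.
    """
    root = ET.fromstring(template)
    for path, text in values.items():
        node = root.find(path)
        if node is None:
            raise KeyError(f"template has no element at {path!r}")
        node.text = text
    return root


# Hypothetical values for illustration only.
filled = fill_template(TEMPLATE, {
    "idinfo/citation/citeinfo/origin": "Example Mapping Co.",
    "idinfo/citation/citeinfo/title": "Example County Orthoimagery, 2013",
})
print(ET.tostring(filled, encoding="unicode"))
```

Raising on an unknown path is deliberate: a typo in a path fails loudly instead of silently leaving a placeholder in the finished file.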
  • Before we look at some of the sections of FGDC metadata (be thinking about which of sections 4 through 10 you want to talk about; I view the last three as building blocks, so they are the most important to discuss), I want to look at the training options available to us. Some of them are straightforward training sites and conferences, but others aren’t.
    You’d be surprised what social media can teach you! That’s where I found the NCDDC training.
    Also, lynda.com has an excellent XML class if you’re totally lost on those.
    While I like GeoSpatial Training Services, there’s nothing SPECIFICALLY on metadata, unless you need coding lessons.
  • You can spend tons of money on various materials, but the best training method is to just hit the books on your own and write metadata. So let’s hit the books and look at some of the sections.
  • I know you have all seen this before at least once now, but I’ve modified it a bit because the three building blocks were left out. There are actually ten sections of the CSDGM, with the last three being repeated throughout (just think about how many time period elements there will be in a metadata file!). What we are going to do now is go through the sections in reverse order, starting with 10, Contact Information. Here we go.
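To make the ten-section layout concrete, here is a minimal sketch (assuming Python 3’s xml.etree.ElementTree) that maps each CSDGM section number to its XML element name, reports which of the seven main sections sit under `<metadata>`, and counts how often the three building blocks are reused. The helper names are illustrative, not part of any standard tool.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# CSDGM section numbers and their XML element names.
# Sections 8-10 are the building blocks reused inside the other seven.
SECTIONS = {
    1: "idinfo",    # Identification Information (mandatory)
    2: "dataqual",  # Data Quality Information
    3: "spdoinfo",  # Spatial Data Organization Information
    4: "spref",     # Spatial Reference Information
    5: "eainfo",    # Entity and Attribute Information
    6: "distinfo",  # Distribution Information
    7: "metainfo",  # Metadata Reference Information (mandatory)
    8: "citeinfo",  # Citation Information
    9: "timeinfo",  # Time Period Information
    10: "cntinfo",  # Contact Information
}
MANDATORY = {"idinfo", "metainfo"}


def main_sections(xml_text: str) -> List[str]:
    """Return sections 1-7 found directly under <metadata>, in standard order."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [SECTIONS[n] for n in range(1, 8) if SECTIONS[n] in present]


def missing_mandatory(xml_text: str) -> List[str]:
    """Return the mandatory main sections that are absent."""
    return sorted(MANDATORY - set(main_sections(xml_text)))


def count_building_blocks(xml_text: str) -> Dict[str, int]:
    """Count every citeinfo, timeinfo, and cntinfo anywhere in the file."""
    root = ET.fromstring(xml_text)
    return {SECTIONS[n]: sum(1 for _ in root.iter(SECTIONS[n])) for n in (8, 9, 10)}


doc = "<metadata><idinfo/><spref/></metadata>"
print(main_sections(doc))      # ['idinfo', 'spref']
print(missing_mandatory(doc))  # ['metainfo']
```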
  • I can’t stress enough how important it is to get a handle on these sections, because they are used throughout.
    I’ve tried to keep a color scheme going here: red marks elements with several options available, green marks optional elements, and blue marks a little quirk. So, onto the examples.
    For the primary contact, you can have a person or an organization. For my example, it is a person. You don’t have to list a contact position (cntpos). “Mailing and physical” is an Address Type. Although the CSDGM allows free text, the suggested values are “mailing”, “physical”, and “mailing and physical”…this is absolutely NOT the first line of your address! It is supposed to tell people how they can use this address.
  • Here are the two options for Contact Person/Organization Primary. You can also have an organization along with the contact person; I just chose not to insert that here.
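The two contact options can be sketched as a small builder, assuming Python 3’s xml.etree.ElementTree and the CSDGM short names (cntperp/cntorgp, cntpos, cntaddr, addrtype, cntvoice). This is an illustrative helper, not a tool from the talk, and all names and addresses in the usage are made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_cntinfo(*, person: Optional[str] = None, org: Optional[str] = None,
                 person_primary: bool = True, position: Optional[str] = None,
                 addrtype: str = "mailing and physical", address: str = "",
                 city: str = "", state: str = "", postal: str = "",
                 voice: str = "") -> ET.Element:
    """Build a CSDGM Contact Information (cntinfo) element."""
    info = ET.Element("cntinfo")
    if person_primary:
        if not person:
            raise ValueError("a person-primary contact needs a person")
        primary = ET.SubElement(info, "cntperp")
        ET.SubElement(primary, "cntper").text = person
        if org:  # the organization is optional under cntperp
            ET.SubElement(primary, "cntorg").text = org
    else:
        if not org:
            raise ValueError("an organization-primary contact needs an organization")
        primary = ET.SubElement(info, "cntorgp")
        ET.SubElement(primary, "cntorg").text = org
        if person:  # the person is optional under cntorgp
            ET.SubElement(primary, "cntper").text = person
    if position:  # cntpos is optional
        ET.SubElement(info, "cntpos").text = position
    addr = ET.SubElement(info, "cntaddr")
    # addrtype says how the address can be used; it is NOT the first address line.
    ET.SubElement(addr, "addrtype").text = addrtype
    ET.SubElement(addr, "address").text = address
    ET.SubElement(addr, "city").text = city
    ET.SubElement(addr, "state").text = state
    ET.SubElement(addr, "postal").text = postal
    ET.SubElement(info, "cntvoice").text = voice
    return info


# Hypothetical contact for illustration only.
c = make_cntinfo(person="Jane Doe", org="Example Mapping Co.",
                 address="123 Main St.", city="Lexington", state="KY",
                 postal="40507", voice="555-555-0100")
print(ET.tostring(c, encoding="unicode"))
```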
  • There are three main types of dates: single, range (with beginning and ending), and multiple (which is made up of single dates…and by made up of I mean a multiple date and time must have at least two dates). You can also use time here, but I rarely use time entries so it is one of the tags I’d have to go back to the CSDGM to be able to write properly.
  • So here’s a range of dates.
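The three date forms map onto sngdate, rngdates, and mdattim. A minimal sketch, assuming Python 3’s xml.etree.ElementTree and YYYYMMDD calendar dates; the helper names are hypothetical, and the optional begtime/endtime elements of a range are left out. The multiple-dates helper enforces the at-least-two-dates rule.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def _sngdate(caldate: str, time: Optional[str] = None) -> ET.Element:
    sng = ET.Element("sngdate")
    ET.SubElement(sng, "caldate").text = caldate
    if time:  # time of day is optional
        ET.SubElement(sng, "time").text = time
    return sng


def single(caldate: str, time: Optional[str] = None) -> ET.Element:
    """timeinfo holding one single date."""
    info = ET.Element("timeinfo")
    info.append(_sngdate(caldate, time))
    return info


def date_range(begdate: str, enddate: str) -> ET.Element:
    """timeinfo holding a range with beginning and ending dates."""
    info = ET.Element("timeinfo")
    rng = ET.SubElement(info, "rngdates")
    ET.SubElement(rng, "begdate").text = begdate
    ET.SubElement(rng, "enddate").text = enddate
    return info


def multiple(caldates: List[str]) -> ET.Element:
    """timeinfo holding multiple single dates (at least two)."""
    if len(caldates) < 2:
        raise ValueError("multiple dates/times needs at least two dates")
    info = ET.Element("timeinfo")
    mdattim = ET.SubElement(info, "mdattim")
    for caldate in caldates:
        mdattim.append(_sngdate(caldate))
    return info


print(ET.tostring(date_range("20130701", "20130715"), encoding="unicode"))
```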
  • The only tricky part about Citation Information is that you can nest it indefinitely within Larger Work Citation. I don’t see any reason to do this, though.
  • A bit more readable version of Citeinfo with just a bit less…
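The nesting comes from lworkcit, which holds another complete citeinfo. A minimal sketch, assuming Python 3’s xml.etree.ElementTree; it writes only a few citeinfo elements (origin, pubdate, title, optional geoform, optional lworkcit, in standard order), and the citations in the usage are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_citeinfo(origin: str, pubdate: str, title: str,
                  geoform: Optional[str] = None,
                  larger_work: Optional[ET.Element] = None) -> ET.Element:
    """Build a (partial) CSDGM citeinfo, optionally nesting a larger work."""
    cite = ET.Element("citeinfo")
    ET.SubElement(cite, "origin").text = origin
    ET.SubElement(cite, "pubdate").text = pubdate
    ET.SubElement(cite, "title").text = title
    if geoform:
        ET.SubElement(cite, "geoform").text = geoform
    if larger_work is not None:
        # lworkcit holds another complete citeinfo, which could hold another...
        ET.SubElement(cite, "lworkcit").append(larger_work)
    return cite


def nesting_depth(cite: ET.Element) -> int:
    """How many citeinfo levels are chained through lworkcit."""
    depth = 1
    inner = cite.find("lworkcit/citeinfo")
    while inner is not None:
        depth += 1
        inner = inner.find("lworkcit/citeinfo")
    return depth


# Hypothetical citations: one tile cited as part of a larger project.
project = make_citeinfo("Example Mapping Co.", "2013", "Example Statewide Imagery Program")
tile = make_citeinfo("Example Mapping Co.", "20131101", "Example Orthoimage Tile 1234",
                     geoform="remote-sensing image", larger_work=project)
print(nesting_depth(tile))  # 2
```

Stopping at one level of nesting, as here, is usually plenty.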
  • Here’s the graphical CSDGM, but it is missing the most important sections in my mind…which is what we will go over next. Granted, these are the “core” seven sections and the rest are just the building blocks that appear in most of these sections and sometimes multiple times in the same section.
  • Here’s my favorite section, Metadata Reference Information! Take a look at the security information (metsi) section because it’ll look really familiar in a bit. I’ve also added in the LiDAR Base Spec…
  • And here’s the brief version…which is much more readable!
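A brief Metadata Reference Information section can be sketched like this, assuming Python 3’s xml.etree.ElementTree. The standard name and version strings are the CSDGM’s own (FGDC-STD-001-1998); the date, the contact, and the security values in the usage are placeholders, and the LiDAR Base Spec additions are not modeled.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def make_metainfo(metd: str, contact: ET.Element,
                  security: Optional[Tuple[str, str, str]] = None) -> ET.Element:
    """Build a minimal CSDGM metainfo.

    security, if given, is (classification system, classification, handling).
    """
    meta = ET.Element("metainfo")
    ET.SubElement(meta, "metd").text = metd  # metadata date, YYYYMMDD
    ET.SubElement(meta, "metc").append(contact)  # a full cntinfo
    ET.SubElement(meta, "metstdn").text = "FGDC Content Standard for Digital Geospatial Metadata"
    ET.SubElement(meta, "metstdv").text = "FGDC-STD-001-1998"
    if security is not None:
        system, classification, handling = security
        metsi = ET.SubElement(meta, "metsi")  # parallels secinfo in idinfo
        ET.SubElement(metsi, "metscs").text = system
        ET.SubElement(metsi, "metsc").text = classification
        ET.SubElement(metsi, "metshd").text = handling
    return meta


# Hypothetical organization-primary contact.
contact = ET.Element("cntinfo")
orgp = ET.SubElement(contact, "cntorgp")
ET.SubElement(orgp, "cntorg").text = "Example Mapping Co."

meta = make_metainfo("20131120", contact, security=("None", "Unclassified", "None"))
print(ET.tostring(meta, encoding="unicode"))
```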
  • Sorry, but if you thought my other slides were bad with the XML of sections, Distribution would have put you all into a coma. It’s back to the NSA and planning, which Pointy-Haired-Boss doesn’t do so well. And watch out for pointy haired bosses…they may say they’re editing your metadata but it’s in-one-ear-out-the-other. Sometimes it’s best to parse the metadata into an e-mail and say “here, read this and make sure you’re ok with it.”
  • Here are the four types of detailed entity and attribute information sections. There’s also the overview section, but that is infinitely simpler than these detailed ones. Both the overview and detailed entity and attribute information sections are definitely worth writing ahead of time and having ready to pull into the larger file if you have commonly used fields.
  • Here are the next two…hopefully that explains the four options.
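The four options correspond to the four kinds of attribute domain values: enumerated (edom), range (rdom), codeset (codesetd), and unrepresentable (udom). A minimal sketch, assuming Python 3’s xml.etree.ElementTree, that writes each enumerated value in its own attrdomv; the contour attributes in the usage are hypothetical examples of the reusable snippets mentioned above.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def enumerated(values: List[Tuple[str, str, str]]) -> List[ET.Element]:
    """One attrdomv/edom per (value, definition, definition source)."""
    domains = []
    for value, definition, source in values:
        attrdomv = ET.Element("attrdomv")
        edom = ET.SubElement(attrdomv, "edom")
        ET.SubElement(edom, "edomv").text = value
        ET.SubElement(edom, "edomvd").text = definition
        ET.SubElement(edom, "edomvds").text = source
        domains.append(attrdomv)
    return domains


def range_domain(minimum: str, maximum: str) -> List[ET.Element]:
    attrdomv = ET.Element("attrdomv")
    rdom = ET.SubElement(attrdomv, "rdom")
    ET.SubElement(rdom, "rdommin").text = minimum
    ET.SubElement(rdom, "rdommax").text = maximum
    return [attrdomv]


def codeset(name: str, source: str) -> List[ET.Element]:
    attrdomv = ET.Element("attrdomv")
    codesetd = ET.SubElement(attrdomv, "codesetd")
    ET.SubElement(codesetd, "codesetn").text = name
    ET.SubElement(codesetd, "codesets").text = source
    return [attrdomv]


def unrepresentable(description: str) -> List[ET.Element]:
    attrdomv = ET.Element("attrdomv")
    ET.SubElement(attrdomv, "udom").text = description
    return [attrdomv]


def make_attr(label: str, definition: str, source: str,
              domains: List[ET.Element]) -> ET.Element:
    attr = ET.Element("attr")
    ET.SubElement(attr, "attrlabl").text = label
    ET.SubElement(attr, "attrdef").text = definition
    ET.SubElement(attr, "attrdefs").text = source
    attr.extend(domains)
    return attr


# Hypothetical contour attributes.
contype = make_attr("CONTYPE", "Contour type", "Producer defined", enumerated([
    ("Index", "Index contour", "Producer defined"),
    ("Intermediate", "Intermediate contour", "Producer defined"),
]))
elev = make_attr("ELEV", "Contour elevation", "Producer defined", range_domain("100", "450"))
```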
  • Spatial Reference is the largest section, but it is one of the most important ones. We start with the two easiest ones: geographic and local. I exported geograph from ArcGIS (see, it has its uses), but I really don’t care too much for the resolution! It does pass through MP, though. And local is just a description. Keep this in mind when you are trying to make sure that the HARN is noticed.
  • We are going to go back a bit, because this is an easier section that stays the same no matter which one you choose. It’s the planar coordinate information, where the planar coordinate encoding method can be coordinate pair, distance and bearing, or row and column. Then the rest is simple, right? Abscissa and ordinate resolution are the nominal minimum distances between adjacent x (abscissa) and y (ordinate) values.
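Both the geographic block and the planar coordinate information can be sketched as small builders, assuming Python 3’s xml.etree.ElementTree. Two assumptions to flag: this sketch handles only the coordinate pair and row and column encodings (distance and bearing uses distbrep instead of coordrep), and it writes resolutions in plain decimal notation so a tiny value never comes out as something like `1e-05`.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# "distance and bearing" uses distbrep rather than coordrep; not handled here.
COORDREP_ENCODINGS = {"coordinate pair", "row and column"}


def _num(x: float) -> str:
    """Write a number in plain decimal notation (no exponent form)."""
    return format(Decimal(repr(x)), "f")


def make_geograph(latres: float, longres: float,
                  unit: str = "Decimal degrees") -> ET.Element:
    geograph = ET.Element("geograph")
    ET.SubElement(geograph, "latres").text = _num(latres)
    ET.SubElement(geograph, "longres").text = _num(longres)
    ET.SubElement(geograph, "geogunit").text = unit
    return geograph


def make_planci(absres: float, ordres: float, units: str = "meters",
                encoding: str = "row and column") -> ET.Element:
    if encoding not in COORDREP_ENCODINGS:
        raise ValueError(f"this sketch supports only {sorted(COORDREP_ENCODINGS)}")
    planci = ET.Element("planci")
    ET.SubElement(planci, "plance").text = encoding
    coordrep = ET.SubElement(planci, "coordrep")
    # nominal minimum distance between adjacent x (abscissa) and y (ordinate) values
    ET.SubElement(coordrep, "absres").text = _num(absres)
    ET.SubElement(coordrep, "ordres").text = _num(ordres)
    ET.SubElement(planci, "plandu").text = units
    return planci


# Hypothetical 0.5 m imagery.
print(ET.tostring(make_planci(0.5, 0.5), encoding="unicode"))
```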
  • UTM 14…note the differences between LCC and TM
  • I only have a few horizontal references but they feel as if they are the most deeply nested and confusing section of metadata (almost giving Distribution a run for its money). Anyone tell me what state plane zone I have here?
  • And, one more look at map projections: an Albers. I left the spaces in there so we can scroll back through and see what changes (besides the values).
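The UTM example is the most mechanical of these, since every Transverse Mercator parameter follows from the zone number. A minimal sketch, assuming Python 3’s xml.etree.ElementTree and the CSDGM convention that negative zone numbers denote the southern hemisphere. For contrast, a Lambert Conformal Conic (lambertc) carries one or two stdparll elements where Transverse Mercator carries the scale factor sfctrmer.

```python
import xml.etree.ElementTree as ET


def utm_gridsys(zone: int) -> ET.Element:
    """Build a CSDGM gridsys for a UTM zone (negative zone = southern hemisphere)."""
    if not 1 <= abs(zone) <= 60:
        raise ValueError("UTM zone must be 1-60 (negative for southern hemisphere)")
    central_meridian = -183 + 6 * abs(zone)  # zone 1 is centered on 177 W
    gridsys = ET.Element("gridsys")
    ET.SubElement(gridsys, "gridsysn").text = "Universal Transverse Mercator"
    utm = ET.SubElement(gridsys, "utm")
    ET.SubElement(utm, "utmzone").text = str(zone)
    transmer = ET.SubElement(utm, "transmer")
    ET.SubElement(transmer, "sfctrmer").text = "0.9996"  # scale factor at central meridian
    ET.SubElement(transmer, "longcm").text = str(central_meridian)
    ET.SubElement(transmer, "latprjo").text = "0"
    ET.SubElement(transmer, "feast").text = "500000"
    # southern-hemisphere zones use a 10,000,000 m false northing
    ET.SubElement(transmer, "fnorth").text = "0" if zone > 0 else "10000000"
    return gridsys


print(ET.tostring(utm_gridsys(14), encoding="unicode"))
```

For UTM Zone 14 this gives a central meridian of -99, which is a quick sanity check against what ArcGIS exports.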
  • Sorry it is a bit early for Christmas unless you are a retail store, but the Horizontal Datum Name is optional and is restricted to NAD83 and NAD27. No free text option here. Ellipsoid Name has suggested GRS80 and Clarke 1866. So, what do you do with HARN? Ahh…the fun things to think of…
  • Both the Altitude and Depth systems have similar formats, as you can see here. And, while there are huge lists of suggested names, they both allow free text.
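Here is a sketch of the geodetic model and an altitude system, assuming Python 3’s xml.etree.ElementTree. The ellipsoid table holds the commonly published semi-major axis and inverse flattening for GRS 80 and Clarke 1866; it does not settle the HARN question from the earlier slide, and the datum names and resolution in the usage are examples only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Semi-major axis (meters) and denominator of flattening ratio.
ELLIPSOIDS = {
    "Geodetic Reference System 80": (6378137.0, 298.257222101),
    "Clarke 1866": (6378206.4, 294.978698214),
}


def make_geodetic(ellips: str, horizdn: Optional[str] = None) -> ET.Element:
    """Build a geodetic element; raises KeyError for ellipsoids not in the table."""
    semiaxis, denflat = ELLIPSOIDS[ellips]
    geodetic = ET.Element("geodetic")
    if horizdn:  # Horizontal Datum Name is optional
        ET.SubElement(geodetic, "horizdn").text = horizdn
    ET.SubElement(geodetic, "ellips").text = ellips
    ET.SubElement(geodetic, "semiaxis").text = repr(semiaxis)
    ET.SubElement(geodetic, "denflat").text = repr(denflat)
    return geodetic


def make_altsys(altdatum: str, altres: str, altunits: str = "meters",
                altenc: str = "Explicit elevation coordinate included with horizontal coordinates"
                ) -> ET.Element:
    """Build vertdef/altsys; depthsys follows the same pattern with depth* tags."""
    vertdef = ET.Element("vertdef")
    altsys = ET.SubElement(vertdef, "altsys")
    ET.SubElement(altsys, "altdatum").text = altdatum
    ET.SubElement(altsys, "altres").text = altres
    ET.SubElement(altsys, "altunits").text = altunits
    ET.SubElement(altsys, "altenc").text = altenc
    return vertdef


g = make_geodetic("Geodetic Reference System 80", "North American Datum of 1983")
v = make_altsys("North American Vertical Datum of 1988", "0.01")
```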
  • These are all green because you have options – you can have indirect or direct. Every once in a while I’m really glad I work in pixel (or dot) land. Raster spatial data organization information is much easier to fill out than vector spatial data organization information. Yes, I have to count rows and columns but for the most part the size of an image is standardized quite nicely. Not always, but…
  • …vectors have two formats, Spatial Data Transfer Standard (SDTS) and Vector Product Format (VPF)…just when you thought raster’s row and column counts looked bad…vectors have the option of a point and vector object count, plus a type that is somewhat difficult to figure out. If you have questions about that, please visit the website on the slide.
  • Again, I find rows and columns much nicer than this. But, I don’t get out of it that easily. I have to write metadata for vectors as well. We have seamlines and contours and planimetric database features.
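The raster versus vector contrast shows up clearly in code. A minimal sketch, assuming Python 3’s xml.etree.ElementTree: the raster version needs only a type and row/column counts, while the SDTS vector version needs one sdtsterm per object type. The counts below are hypothetical, not from any real dataset.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def raster_spdoinfo(rows: int, cols: int, rasttype: str = "Pixel") -> ET.Element:
    spdoinfo = ET.Element("spdoinfo")
    ET.SubElement(spdoinfo, "direct").text = "Raster"
    rastinfo = ET.SubElement(spdoinfo, "rastinfo")
    ET.SubElement(rastinfo, "rasttype").text = rasttype
    ET.SubElement(rastinfo, "rowcount").text = str(rows)
    ET.SubElement(rastinfo, "colcount").text = str(cols)
    return spdoinfo


def vector_spdoinfo(counts: Dict[str, int]) -> ET.Element:
    """counts maps an SDTS point/vector object type to its object count."""
    spdoinfo = ET.Element("spdoinfo")
    ET.SubElement(spdoinfo, "direct").text = "Vector"
    ptvctinf = ET.SubElement(spdoinfo, "ptvctinf")
    for sdtstype, count in counts.items():
        sdtsterm = ET.SubElement(ptvctinf, "sdtsterm")
        ET.SubElement(sdtsterm, "sdtstype").text = sdtstype
        ET.SubElement(sdtsterm, "ptvctcnt").text = str(count)
    return spdoinfo


r = raster_spdoinfo(5000, 5000)
# Hypothetical counts: contour lines as strings plus spot elevations as points.
v = vector_spdoinfo({"String": 1520, "Entity point": 87})
```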
  • I hope you have all seen the Accuracy v. Precision graphs…thanks again to Overboard (one of my favorite comic strips) for another interpretation of accuracy and precision, with a process involved as well. Which is exactly what the Data Quality section is! Three types of accuracy (attribute, horizontal positional, and vertical positional) and Lineage (sources and, more importantly, processes), with a little cloud cover thrown in for good measure.
  • The accuracy sections are quite varied but have the same basic format: a report with the optional quantitative bit which gives a value and how the value was assessed. And then two text reports.
  • Here is the attribute accuracy report. Even though attributes are drastically different from positional accuracy values, they still have that report, value, and explanation format. I don’t have a very good example here, but one might be a classification attribute (whether it is contour type or feature type [road v. building]): you could report how accurately the attributes were categorized. Also notice that the quantitative section is always optional.
  • Again I am just quoting the standard here, but this would be where you could explain how the depression and intermediate contours were created and, if there are any void areas in the data, why they are present and, if you’re working in pixel land, what color they are.
  • And here is the horizontal positional accuracy section. Sometimes I find it difficult to write both the report and the explanation because they end up sounding exactly alike. What I usually do to keep this from happening is to report the target in the report and keep the explanation to exactly how that value was derived and which value it is (NMAS, NSSDA, etc.).
  • Notice that the vertical and horizontal tags are both “optional” … you have to have one or the other. You won’t just have positional accuracy without anything else. (Of course, I’m a “completist” in that sense. I hate seeing sections with N/A or None. If that’s the case, just leave it out!)
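When the explanation says NSSDA, the value is usually computed from checkpoint residuals. A minimal sketch, assuming Python 3 and the NSSDA 95% confidence factors (1.7308 × RMSE_r horizontally, valid when RMSE_x and RMSE_y are about equal; 1.9600 × RMSE_z vertically). The residuals are hypothetical meters, and only four are used to keep the arithmetic visible; the NSSDA itself calls for at least 20 checkpoints. Following the approach above, the report states the target and the explanation says how the value was derived.

```python
import math
import xml.etree.ElementTree as ET
from typing import Sequence


def rmse(errors: Sequence[float]) -> float:
    """Root mean square error of checkpoint residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def nssda_horizontal(dx: Sequence[float], dy: Sequence[float]) -> float:
    """NSSDA horizontal accuracy at 95% confidence: 1.7308 * RMSE_r.

    The 1.7308 factor assumes RMSE_x and RMSE_y are (nearly) equal.
    """
    rmse_r = math.sqrt(rmse(dx) ** 2 + rmse(dy) ** 2)
    return 1.7308 * rmse_r


def nssda_vertical(dz: Sequence[float]) -> float:
    """NSSDA vertical accuracy at 95% confidence: 1.9600 * RMSE_z."""
    return 1.9600 * rmse(dz)


def make_horizpa(report: str, value: float, explanation: str) -> ET.Element:
    horizpa = ET.Element("horizpa")
    ET.SubElement(horizpa, "horizpar").text = report       # the target
    qhorizpa = ET.SubElement(horizpa, "qhorizpa")          # optional quantitative part
    ET.SubElement(qhorizpa, "horizpav").text = f"{value:.3f}"
    ET.SubElement(qhorizpa, "horizpae").text = explanation  # how the value was derived
    return horizpa


# Hypothetical residuals in meters.
dx = [0.3, -0.3, 0.3, -0.3]
dy = [0.3, -0.3, 0.3, -0.3]
acc = nssda_horizontal(dx, dy)
pa = make_horizpa(
    "Compiled to meet 1.0 meter horizontal accuracy at the 95% confidence level.",
    acc,
    "Tested 0.734 meters horizontal accuracy at 95% confidence level (NSSDA) against checkpoints.",
)
```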
  • This is an overly simplified, but optional, source info. Theoretically, if a source is used in the process step, it should be defined. It is repeatable, but I warn you to use some restraint here. It gets ugly quickly!
  • So, there are two tricky domains here. srcused and srcprod. Both have to be “Source Citation Abbreviations from the Source Information entries.” But there aren’t any checks built into MP for this. Somehow, they trust us. Plus, I suppose since it is optional it really doesn’t have to be thoroughly checked. Again, it is repeatable. And again I warn that you should use caution. I know it says “information about a single event” in the process step description but no one wants to read about every single mouse movement.
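Since mp doesn’t cross-check srcused and srcprod against the Source Information entries, a few lines of script can. A minimal sketch, assuming Python 3’s xml.etree.ElementTree; the function name and the sample lineage are hypothetical. It lists every srcused/srcprod value with no matching srccitea.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def unmatched_sources(xml_text: str) -> List[Tuple[int, str, str]]:
    """Find srcused/srcprod values with no matching srccitea.

    Returns (process step number, tag, value) for each mismatch.
    """
    root = ET.fromstring(xml_text)
    known = {(e.text or "").strip() for e in root.iter("srccitea")}
    problems = []
    for step_no, step in enumerate(root.iter("procstep"), start=1):
        for tag in ("srcused", "srcprod"):
            for element in step.findall(tag):
                value = (element.text or "").strip()
                if value not in known:
                    problems.append((step_no, tag, value))
    return problems


# Hypothetical lineage: "DEM" is produced but never defined as a source.
LINEAGE = """<lineage>
  <srcinfo><srccitea>LIDAR2013</srccitea></srcinfo>
  <procstep>
    <procdesc>Generated a bare-earth DEM from the classified LiDAR points.</procdesc>
    <srcused>LIDAR2013</srcused>
    <procdate>20131101</procdate>
    <srcprod>DEM</srcprod>
  </procstep>
</lineage>"""
print(unmatched_sources(LINEAGE))  # [(1, 'srcprod', 'DEM')]
```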
  • We’re almost there! Cloud Cover is the last tag in Data Quality! It feels a bit out of place, but the amount of data which is obscured does fit! I hope that you see the silver lining on the metadata cloud right now.
  • Before we get to the last section, we will look at Distribution. If you do the minimum, it is very easy. But this is an old section that shows how much has changed since 1998. One of the sections that isn’t required is the Standard Order Process, which can have two forms. One of which is Non-Digital Form. But WAIT!!! What does the D in CSDGM stand for?!?
  • Just for argument’s sake, this is what the non-digital form looks like (along with the optional resource description at the top…just thought I’d throw that in there). Still nice and easy!
  • So, here’s the first half of the digital form information. You might be able to tell my heart’s not in this. Typically I’m a data producer, and that does not include data distribution. So, in order for me to write this I have to consult my crystal ball. AKA, I have to beg and plead for someone to contact the client to make sure it is OK to leave it out, or to have them write it for me. I don’t know their URLs and I usually don’t know their fees.
  • So, while my crystal ball unfogs and tells me how to write this section does anyone see any problems with this screen capture of formats from the CSDGM? I do…SHP, JPG, SID, MDB, GDB…just to name a few. But this does allow Free Text.
  • And here’s the offline option. It’s nice and logical, I think.
  • So, this is what you need to have for the online option. There’s also Dialup Instructions, but I won’t waste our time on that. It has even been removed from ArcGIS with the following message “Dialup instructions can’t be provided in an item’s metadata using ArcGIS. If existing metadata includes this information it should be updated to include current information by which the item can be obtained.”
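For the online option, the nesting is deep but regular. A minimal sketch, assuming Python 3’s xml.etree.ElementTree, that builds a Standard Order Process with one digital form and an online network resource; the offline option (offoptn, which needs offmedia and recfmt) is left out. The format name, URL, and fees text are placeholders to be confirmed with whoever actually distributes the data.

```python
import xml.etree.ElementTree as ET


def make_stdorder(formname: str, network_resource: str, fees: str) -> ET.Element:
    """Build stdorder > digform (online option only) plus the required fees."""
    stdorder = ET.Element("stdorder")
    digform = ET.SubElement(stdorder, "digform")
    digtinfo = ET.SubElement(digform, "digtinfo")
    ET.SubElement(digtinfo, "formname").text = formname  # free text, e.g. SHP
    digtopt = ET.SubElement(digform, "digtopt")
    onlinopt = ET.SubElement(digtopt, "onlinopt")
    computer = ET.SubElement(onlinopt, "computer")
    networka = ET.SubElement(computer, "networka")
    ET.SubElement(networka, "networkr").text = network_resource
    ET.SubElement(stdorder, "fees").text = fees  # required within stdorder
    return stdorder


# Placeholder values for illustration only.
s = make_stdorder("SHP", "https://example.com/downloads/contours.zip",
                  "No charge for online download")
print(ET.tostring(s, encoding="unicode"))
```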
  • Well, we’ve made it this far. One more to go. It is probably my second favorite section. Metainfo being my first. I just hope you feel more like the bear in this photo than the fish by this point in the webinar.
  • There’s a very, very good reason to save this for last. You’ve probably had to research most of these things in one form or another to write the other sections…so for the most part this information will be easy. We’ll take it a few pieces at a time now.
  • The citation is simply a nested citeinfo (by now, with as many ellipses as there have been, I hope you have seen just how important it was for us to start backwards…those are the building blocks of FGDC metadata, so it is imperative to understand how to write those sections). Then comes what I think is probably one of the hardest sections to write, because you have to have an excellent understanding of the data: the description section. It has an abstract (did your teachers always have you write the abstract last as well?), a purpose, and catch-all supplemental information. Again, the time period information is present.
  • Spatial Domain is a really spatial one for me…I think about all the people using clearinghouses to search for data that is powered by my metadata’s bounding coordinates. And my heart goes pitter pat!
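Since clearinghouse searches lean on those bounding coordinates, it helps to compute them rather than type them. A minimal sketch, assuming Python 3’s xml.etree.ElementTree, input points already in geographic (longitude, latitude) decimal degrees, and data that do not cross the 180th meridian; the points in the usage are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def make_bounding(points: Iterable[Tuple[float, float]]) -> ET.Element:
    """Build spdom/bounding from (longitude, latitude) pairs in decimal degrees.

    Assumes the data do not cross the 180th meridian.
    """
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    lons = [lon for lon, _ in pts]
    lats = [lat for _, lat in pts]
    spdom = ET.Element("spdom")
    bounding = ET.SubElement(spdom, "bounding")
    ET.SubElement(bounding, "westbc").text = str(min(lons))
    ET.SubElement(bounding, "eastbc").text = str(max(lons))
    ET.SubElement(bounding, "northbc").text = str(max(lats))
    ET.SubElement(bounding, "southbc").text = str(min(lats))
    return spdom


# Hypothetical corner points.
spdom = make_bounding([(-89.5, 36.5), (-82.0, 39.1), (-85.0, 35.0)])
print(ET.tostring(spdom, encoding="unicode"))
```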
  • Keywords also make my heart go pitter pat because I think of people searching for specific datasets based on keywords. Here’s the basic structure. I use the first two frequently, but not the second two.
  • And here are three examples. Notice that I’ve repeated place. That’s important to notice: if you change the thesaurus, you need a new keyword section (theme, place, stratum, or temporal). You can repeat the keywords, though. So, in my example, the data falls in KY and TN.
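The one-thesaurus-per-section rule is easy to automate. A minimal sketch, assuming Python 3’s xml.etree.ElementTree: keywords come in as (type, thesaurus, keyword) triples, each distinct type-and-thesaurus pair becomes its own section, and sections are written in CSDGM order. “Example State Abbreviation Thesaurus” is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# keyword type -> (thesaurus tag, keyword tag), in CSDGM order
KEYWORD_TAGS = {
    "theme": ("themekt", "themekey"),
    "place": ("placekt", "placekey"),
    "stratum": ("stratkt", "stratkey"),
    "temporal": ("tempkt", "tempkey"),
}


def make_keywords(entries: List[Tuple[str, str, str]]) -> ET.Element:
    """entries are (keyword type, thesaurus, keyword) triples.

    Each distinct (type, thesaurus) pair becomes its own section.
    """
    groups: Dict[Tuple[str, str], List[str]] = {}
    for kind, thesaurus, keyword in entries:
        if kind not in KEYWORD_TAGS:
            raise ValueError(f"unknown keyword type {kind!r}")
        groups.setdefault((kind, thesaurus), []).append(keyword)
    keywords = ET.Element("keywords")
    for kind, (kt_tag, key_tag) in KEYWORD_TAGS.items():
        for (group_kind, thesaurus), keys in groups.items():
            if group_kind != kind:
                continue
            section = ET.SubElement(keywords, kind)
            ET.SubElement(section, kt_tag).text = thesaurus
            for key in keys:
                ET.SubElement(section, key_tag).text = key
    return keywords


kw = make_keywords([
    ("place", "None", "Kentucky"),
    ("place", "None", "Tennessee"),
    ("theme", "ISO 19115 Topic Category", "elevation"),
    ("place", "Example State Abbreviation Thesaurus", "KY"),
    ("place", "Example State Abbreviation Thesaurus", "TN"),
])
print(ET.tostring(kw, encoding="unicode"))
```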
  • We really are almost there. Once you get past the legal jargon (but, ever wonder what legal rights you’ll have if someone doesn’t read your metadata file or if the file doesn’t travel with the data?) we get into the optional sections of IDINFO. Finally. Point of contact, a nested cntinfo, is first.
  • More optional elements. The security information (secinfo) will look familiar from Metadata Reference Information. Browse graphic…which I highly recommend in ArcGIS. Data Credit is where data producers get credit for all their hard work. Native Data Set Environment: you can go as detailed as you want in there and list the different versions. So, I should have listed 9.3.1 and 10.1 for ArcGIS, right? And, lastly, crossref, another section that can quickly get carried away. Then again, there are several valid ways to use it!
  • Data manipulation techniques change
    Software is updated
    Data itself is updated
    MP is updated
    The general rule is if the data changes, you should revisit the metadata. Maybe you have some static layers. Great. You don’t have to change your metadata. Then again, maybe you update that static layer (images, maybe) every so often. After you use the first dataset’s metadata you can always improve the future “generations” of metadata. What tags make more sense for your organization? Which don’t make any sense at all?
  • When I read “Who Moved My Cheese?” I thought about data being “moved”…which made me realize that if the data were moved, I’d have to update the metadata. Revisiting metadata can be painful, but it also lets you revamp the metadata quality. And, if you have a large group of people “messing” with your data, you will have to update the metadata frequently. I hope they all take good notes and can tell you what they did, or that you have a flawless backup system!
  • http://s.ngm.com/2012/04/titanic/img/titanic-bow-615.jpg
    There’s so much more to talk about in terms of metadata. We could go through each line of the files and explain it, compare it in XML and Text…go through the different ways to display the metadata once it is written…. I encourage you to go out and try some things out. Set up your snippets for easy use. Try some new program. Even try to break ArcGIS (hey, I have a warped sense of fun but discovering new “features” in the software is at least interesting).
  • Vanguard Cabinet Informative Session Friday, November 22 at noon

    1. Proper Care and Feeding of Metadata: Ryan E. Bowe, GISP; Photo Science, Inc. – A Quantum Spatial Company; Vanguard Cabinet Member; Cumberland URISA Secretary
    2. What worked last year…
    3. That’s the tip of the iceberg!
    4. Additional URISA Educational Offerings: URISA In Person workshops; URISA Connect webinars, including GIS Program Management in December; URISA Leadership Academy in Calgary, Alberta (May 5-9, 2014) and in Louisville, Kentucky (October 13-17, 2014); GIS-Pro 2014 Annual Conference in New Orleans, Louisiana (September 8-11, 2014). For more information on all of these events, visit http://www.urisa.org