Automating Quality Control

Validating your data is a critical step in almost every workflow. Learn how to build FME workspaces to automatically detect and repair problems with attributes, geometry, and more, and how to build a portal to let end users perform data validation on demand. Plus, learn about new functionality in FME Server for detecting workspace failures.

Automating Quality Control

  1. 1. Level 3: Automating Quality Control
  2. 2. AGENDA 1. Why do we validate data? 2. Indoor Mapping standards compliance 3. Validating CAD data 4. Validating topology 5. Automating validation workflows
  3. 3. Data validation means checking ... ● Single objects (geometry and attributes) ● Relationships between objects ● Completeness ● Correctness ● Standards compliance
  4. 4. Data validation means checking ... ● Schema or data model ● Attribute values and domains ● Geometry ● Topology and spatial relationships ● Networks ● And more
  5. 5. Indoor Mapping ● Venues worldwide are generating indoor maps of their spaces for: ○ Space management / planning ○ Geolocating assets ○ Helping patrons navigate
  6. 6. Indoor Mapping Challenges ● Must integrate multiple sources to produce an indoor map. ○ GeoJSON, Revit, IFC, CAD (Autodesk, Bentley), Civil 3D, Esri Geodatabase, databases, CityGML … ● Must transform inconsistent data. ● Must comply with specifications of the indoor format, e.g. IMDF, HERE, ArcGIS Indoors, IndoorGML. ○ Strict data models and explicit spatial relationships. ● Venues constantly change, so maps need to be updated automatically.
  7. 7. Tips for Validating Attributes ● Phone Numbers / UUID / Business Names: ○ AttributeValidator and regular expressions (^\+[0-9-]{10,15}$|^$) ● Hours of Opening – OSM Standard: ○ “24/7”, “Mo-Fr 08:30-20:00” ● Websites: ○ Regular expressions ^http://|^https:// ○ HTTPCaller & HTTP Status Code
  8. 8. Useful Transformers for Validating Geometries ● GeometryValidator – pass only valid geometries. ● GeometryFilter – filter by geometry type and pass only valid ones. ● SpatialFilter or SpatialRelator – ensure valid spatial relationships. ○ Choosing the right spatial join transformer: see the article
  9. 9. Automated IMDF Validation A. Upload your IMDF data and get your validation report, or B. Add an IMDFValidator transformer to your workspace (available from FME Hub).
  10. 10. CAD Standards Compliance
  11. 11. CAD Data: key source of data updates for many GIS departments. ● Very loose schemas or data models. ● Hard to impose a drawing standard on contractors. ● Often more detail than is needed in GIS.
  12. 12. City of Kitchener ● Digital Submission Compliance ● Contractor CAD data added to GIS ● CAD standard ○ Standards Checker ○ Attribute Checker ○ Topology Checker
  13. 13. Validating Topology: Hydrographic Networks, Electric, Water, Gas Networks
  14. 14. Tips for Validating Topology ● Relationships include: ○ Connectivity ○ Adjacency ○ Enclosure ● Rules: ○ ISO 19110 Feature Catalog ○ Database connectivity rules
  15. 15. Natural Resources Canada: maintaining the feature catalog for Canada’s national map ● Spatial Relationships ● Attribute Values
  16. 16. FME Workspace for NRCan’s Catalog Validation
  17. 17. Utility Network Topology: Connectivity
  18. 18. Utility Network Migration Workspaces ● Schema mapping ● Topology ○ Geometric Network (lines & junctions) ○ Explicit network (associations between junctions & devices) ● Creating Assemblies ArcGIS Device UN Assembly
  19. 19. Transformers for Validating Connectivity ● NetworkTopologyCalculator for building geometric networks (lines & junctions). ● SpatialFilter for identifying objects that are supposed to connect, e.g. devices on lines. ● TopologyBuilder and PointOnLineOverlayer for building connected features and identifying missing junctions/devices.
  20. 20. Validating Automatically Tip: set up your data validation workflows to run automatically. ● On a schedule, e.g. daily quality control. ● In response to an event. ○ “Watch” a directory, FTP, Amazon S3 bucket … ○ Email. ○ Database triggers. ● As a web service. ● Self-serve drag-and-drop webpage (or mobile app) that anyone on the team can use.
  21. 21. To ensure complete, correct, and compliant data, we must check: ● Attributes ● Geometries ● Topology
  22. 22. Data Validation Resources ● Improving Data Compliancy Using FME (City of Kitchener) ● CAD Data Validation using FME (Colonial Pipeline) ● Data Validation Victories: Tips for Better Data Quality (Safe Webinar) ● FME Extensive Usage Inside the Mapping Production System (Natural Resources Canada) ● Creating & Validating IMDF (Knowledge Center) ● Ultimate Geospatial Data Validation Checklist (Safe Blog) ● IMDF Validator
  23. 23. Questions?

Editor's Notes

  • Grabber: Technology is an amplifier or multiplier. Technology can amplify both good and poor processes. In the context of data quality - technology can amplify the benefits of working with high quality data in your work - like increased productivity. OR it can amplify the effects of poor data quality - frustrating users, poor decisions based on incorrect data.

    Subject: I’m going to discuss some ideas on data validation and how FME can help with different validation tasks.

    Message: Why are we talking about data validation and compliance again? Because it’s important.
    We’ve talked about data validation quite a bit in the past. Working with high quality, accurate data is more fun. You get to focus your time on solving real problems rather than just cleaning data. FME has many tools to ensure your data is valid and complies with the data standards used in your organisation.

    Why is compliance important? Because:
    Garbage in, garbage out
    Bad data wastes time and resources
    No one has fun working with bad data
    FME is a great tool for helping you to validate your data
  • We’ve had great customer presentations from FME users such as City of Kitchener and Natural Resources Canada on what they are doing with FME on data validation. We’ll present a summary of some of their thoughts here.
  • Data validation takes on many forms:
    Are we validating a single object (self-intersection, attribute validation), or are we validating the relationship between objects (spatial contains, parent-child)?
    Completeness – Is the data complete, with no missing mandatory fields? For example, if you’re creating indoor mapping data you need entrance
    Correctness – Do the data values meet the standard or data model that has been agreed on?

  • But there are also feature relationships:
    Data Model: Is there a table relationship that has to be met, such as parent-child?
    Topology: Is the topology correct? Does an island in a lake touch a lake edge? Do county boundaries touch or overlap? Do devices sit on the lines of a utility network?
    Network: Does a line form part of a network or is it disconnected? Do the network attributes also conform to the network? For example, for an electrical circuit, do all the circuit IDs match for a given circuit? Does an 8” pipe connect to a 6” pipe?
    Network Topology: Are there junctions at the same location (duplicates) creating topology errors?
  • Why:
    Indoor mapping is driven by “space management” and “indoor navigation”.
    The “blue dot” disappears the moment you go inside - we need to move the blue dot inside venues
    Navigation - allowing people to find their gate. Determining pedestrian choke points. Finding the best locations for revenue sources

    90% of our time is spent indoors.

    Who’s interested:
    Conference centers
    Sports arenas, shopping malls / retail
    Train stations
    Finance, real estate, property management (big on TRIRIGA)
    Campuses, higher education
  • There are challenges around creating indoor data that we don’t generally see in other mapping applications:
    Integrate: Often multiple data sources need to be combined.
    Transform: Data suppliers are not used to sharing their data.
    Revit used for design, but as-built models are rare
    CAD data has very loose data schemas
    Architectural data requirements do not align with indoor mapping data requirements
    e.g. in CAD, walls have an inside & outside. Indoor just needs the centerline
    Indoor needs explicit ‘entrances’, which are often not modeled in architecture.
    Comply: Indoor formats have strict data models that often don’t match architectural models. These formats also have explicit spatial relationships between objects.
    Automate: Many venues change constantly -
    Airports have gates shared between domestic & international
    Conference centers have dynamic layouts
    So timeliness of data can be critical.
    Keep your indoor maps in sync by setting up your workflow to run as new data arrives (FME Server).

    *** You can use FME to do all of this! Integrate sources, transform them to meet requirements, and automatically keep indoor maps synchronized. ***
  • Here are some examples of validating attribution typical in Indoor mapping scenarios. Note - this is non-spatial data. So FME isn’t restricted to validating spatial data
    You can demo these in: 1.AttributeValidation.fmw

    AttributeValidator is your go-to resource for validating attributes, whether it’s a domain list or a regular expression match, e.g. phone numbers, UUIDs, valid names. There are also external tools you can call. For example, there is a Python function for validating the version of a UUID code.
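As a rough illustration of the two checks just mentioned, here is a minimal Python sketch outside FME. It assumes the phone pattern from the slide (with the leading `+` escaped) and uses the standard library `uuid` module to read the UUID version. The function names `is_valid_phone` and `is_valid_uuid` are made up for this example and are not FME transformers.

```python
import re
import uuid

# Phone pattern from the slide: a leading "+" followed by 10-15 digits or
# hyphens, or an empty string (the attribute may be left blank).
PHONE_RE = re.compile(r"^\+[0-9-]{10,15}$|^$")


def is_valid_phone(value: str) -> bool:
    """Return True when the whole value matches the phone pattern."""
    return PHONE_RE.fullmatch(value) is not None


def is_valid_uuid(value: str, version: int = 4) -> bool:
    """Return True when value parses as a UUID of the expected version."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False  # not a UUID at all
    return parsed.version == version
```

Each function covers one explicit test, which mirrors the one-test-per-custom-transformer approach described in these notes.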

    Hours of Opening: Who’s been in one of the Exit Rooms? If you don’t check the closing hours carefully it’s easy to get left in there all night. OSM has a standard around opening times for enterprises. These need to be validated to ensure that the likes of Apple & Google maps show us the correct opening hours. There’s a great site for building opening hours example strings and also an API for working with opening hours that can be used to validate the opening hours strings. At Safe we’re not proud - if there is a great tool available that users can use in a workflow, we’ll give access to it!
    Creating Opening Hours webpage

    We can validate a URL in two ways.
    Ensure that the URL is basically valid using regex - great website here.
    Test to see if the website returns a result.
    FME can do this with regular expressions and HTTPCaller
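The two-step website check described above can be sketched in plain Python as follows. This is only an analogue of the regular expression plus HTTPCaller approach, not FME's implementation. `fetch_status` and `check_website` are hypothetical names, and treating any status from 200 to 399 as "valid" is an assumption made for this example.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Optional

# Scheme test from the slide.
URL_RE = re.compile(r"^http://|^https://")


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrives."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, malformed URL


def check_website(url: str,
                  fetch: Callable[[str], Optional[int]] = fetch_status) -> str:
    """Step 1: test the scheme with a regex. Step 2: test the response."""
    if not URL_RE.match(url):
        return "invalid: scheme"
    status = fetch(url)
    if status is None:
        return "invalid: no response"
    if 200 <= status < 400:
        return "valid"
    return f"invalid: HTTP {status}"
```

The `fetch` parameter lets you swap in a stub when you want to try the logic without touching the network, e.g. `check_website("https://example.com", fetch=lambda u: 404)`.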

    We have found that creating custom transformers that each cover a single explicit validation test is the cleanest way to create validation workflows. We learnt this tip from our colleagues at NRCan, as we’ll see a little later
  • GeometryValidator
    Many options for single object geometry data validation such as self-intersection, duplicate points etc.
    Simply validate the geometry type. You can’t throw a point at a line feature class in a geodatabase.
    SpatialFilter / SpatialRelator and the *Overlayer series of transformers - i.e. PointOnLineOverlayer
    Validate the spatial relations between different objects - more on this later.

    Choosing the appropriate spatial relationship tool in FME - great article here
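To make the single-object checks named above (duplicate points, self-intersection) concrete, here is a small pure-Python sketch. It is not how GeometryValidator works internally. It only flags consecutive duplicate vertices and proper crossings between non-adjacent segments of a 2D line, and it ignores touching or collinear overlaps. All function names are invented for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def duplicate_points(coords: List[Point]) -> List[int]:
    """Return indexes of vertices that repeat the vertex just before them."""
    return [i for i in range(1, len(coords)) if coords[i] == coords[i - 1]]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Cross product: > 0 for a left turn, < 0 for a right turn, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _properly_cross(p1: Point, p2: Point, p3: Point, p4: Point) -> bool:
    """True when segments p1-p2 and p3-p4 cross at an interior point of both."""
    d1 = _orient(p3, p4, p1)
    d2 = _orient(p3, p4, p2)
    d3 = _orient(p1, p2, p3)
    d4 = _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def self_intersects(coords: List[Point]) -> bool:
    """True when any two non-adjacent segments of the line properly cross."""
    segments = list(zip(coords, coords[1:]))
    for i in range(len(segments)):
        for j in range(i + 2, len(segments)):
            if _properly_cross(*segments[i], *segments[j]):
                return True
    return False
```

The pairwise loop is quadratic in the number of segments, which is fine for a demonstration but would be too slow for large datasets.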
  • Safe has made an IMDF Validator available to make it easy to validate your indoor IMDF datasets before they are submitted to Apple. This is an example of automating your data validation workflows. Go to link to show the tool. Just drag and drop your file.
    There is also an IMDFValidator custom transformer you can include in your own FME workflows (available on FME Hub).

    Demonstrate the Validator using the source dataset. This is the result of the Esri to IMDF tutorial. You can download and run the tutorial if you wish. The report results are in ./results/IMDFReport.html, if you don’t want to wait for the email with the link. No need to explain all the results.

    IMDF Validator has been used by 40+ organisations for validating their IMDF datasets.

    The IMDF specification requires conformance to about 240 rules. We have created a series of custom transformers, like the ValidateHours and CheckWebsite, that are used for the validation of different objects - there are about 140 unique tests in the IMDF validator
  • Processing digital submissions of CAD data is often a key part of the data processing workflow in GIS departments.
    Drawing compliance (Colonial Pipeline)
    Data compliance (City of Kitchener)
    MMCD (Master Municipal Construction Documents Association) in BC is an organisation trying to impose standards on data for municipalities to improve efficiency and accuracy of processing CAD data

    Problems when taking delivery of CAD data:
    Very loose standards & data models
    Data suppliers:
    Small contractors and architects with limited knowledge of GIS and more structured database data models
    A lot of detail - example
    There’s more to a CAD drawing than data - the frame and title blocks can also be validated - this is done by Colonial Pipeline
  • Quality GIS data is critical for CoK. Their Esri Enterprise Geodb is linked to other information systems such as: AMANDA, Cityworks, SAP, Stormwater Rate
    Most GIS data is updated through contractor data. To improve efficiency and accuracy, CoK developed a CAD standard that all contractors need to conform to for data delivery. CoK defined a standard DWG template for AutoCAD Map 3D

    All the validation is driven by Excel spreadsheets that define the rules for the different validation steps.

    Kitchener logo has a link to the full presentation by David

    Planned updates to the digital submission compliance include :
    Update the attribute & topology checkers - they were created in 2013! - to use any new and improved FME functionality
    Perhaps more use of FME Server

    DEMO: Based on the CofK data validation
    The Attribute Checker uses a CSV file produced in Excel that identifies what object data field names should be attached to the entities on specific AutoCAD layers. Additional checks on each field can be performed – minimum and maximum character lengths, field data types, minimum and maximum number values, pick list restrictions, and whether the field is required to have a value or not.
    Here the key transformer is the Joiner. Each entity on a layer is joined to each layer attribute in the CSV. Then the entity’s attributes and their values are compared against the list of ‘valid’ attributes and values retrieved from the CSV
    Quick demo here - ..\demos\2 AttributeChecker
    Concept is very simple - match the feature to a record in the spreadsheet. If there is a join then the record is valid for that test. If there is no join then the feature is invalid.
    You can see that there is a custom transformer specifically for domain tests - in this case testing the MATERIAL domain. An example of building custom transformers for specific tests.

    You could do this in AttributeValidator, but using a spreadsheet makes it a little easier to maintain the schema in the long run - for example, if you wanted to change the MATERIAL domain, you just have to edit the spreadsheet

    Colonial Pipeline took this one step further and validated the entire CAD drawing including both the drawing space and the paper space (Frame, Titles etc.)
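The spreadsheet-driven Attribute Checker idea described above (join each entity to the rule rows for its layer, then compare) might look roughly like this in Python. The CSV columns, layer names, and pick list values below are invented for illustration and are not City of Kitchener's actual rule sheet. Only a few of the checks mentioned (required, maximum length, pick list) are shown.

```python
import csv
import io
from typing import Dict, List

# Hypothetical rule sheet exported from Excel: one row per layer/field pair.
RULES_CSV = """layer,field,required,max_length,allowed
WATERMAIN,MATERIAL,yes,10,PVC|DI|CI
WATERMAIN,DIAMETER,yes,4,
HYDRANT,OWNER,no,20,
"""


def load_rules(text: str) -> Dict[str, List[dict]]:
    """Group the rule rows by layer name."""
    rules: Dict[str, List[dict]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        rules.setdefault(row["layer"], []).append(row)
    return rules


def check_entity(layer: str, attributes: Dict[str, str],
                 rules: Dict[str, List[dict]]) -> List[str]:
    """Join an entity to its layer's rules and list every rule it breaks."""
    layer_rules = rules.get(layer)
    if layer_rules is None:  # no join: the layer is not in the rule sheet
        return [f"layer {layer!r} is not in the rule sheet"]
    known = {rule["field"] for rule in layer_rules}
    problems = [f"unexpected field {name!r}"
                for name in attributes if name not in known]
    for rule in layer_rules:
        field = rule["field"]
        value = attributes.get(field, "")
        if value == "":
            if rule["required"] == "yes":
                problems.append(f"{field} is required")
            continue
        if len(value) > int(rule["max_length"]):
            problems.append(f"{field} is longer than {rule['max_length']}")
        allowed = rule["allowed"].split("|") if rule["allowed"] else []
        if allowed and value not in allowed:
            problems.append(f"{field} value {value!r} is not in the pick list")
    return problems
```

Changing the MATERIAL domain then means editing one cell in the rule sheet, with no change to the checking code.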

  • When we talk about topology there are three primary relationships we can look for: connectivity, adjacency, and enclosure.
    Most utility networks have connectivity rules:
    A water main must connect to a smaller water main through a reducer
    The high side of a transformer can only connect to the primary conductors
    Some networks, like hydrographic networks, also include enclosure rules:
    An island in a lake must be inside a lake boundary, but can’t touch the boundary

    Topology rules can be formalized in a feature catalog, e.g. the ISO 19110 feature catalog, or can be defined in a rule set in a database, such as the ArcPro Utility Network rules.

    Esri Utility Networks
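As one example of a connectivity rule from the notes above (a water main must connect to a smaller water main through a reducer), here is a minimal Python sketch. The data layout (pipes keyed by ID with a from-node, to-node, and diameter, plus a dictionary of devices per node) is an assumption for this example and does not follow the Esri Utility Network schema.

```python
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical layout: pipe ID -> (from_node, to_node, diameter_in_inches)
Pipe = Tuple[Hashable, Hashable, float]


def missing_reducers(pipes: Dict[str, Pipe],
                     devices: Dict[Hashable, str]) -> List[Hashable]:
    """Return nodes where pipe diameters change without a reducer device."""
    diameters_at_node: Dict[Hashable, Set[float]] = {}
    for from_node, to_node, diameter in pipes.values():
        for node in (from_node, to_node):
            diameters_at_node.setdefault(node, set()).add(diameter)
    return sorted(
        node for node, sizes in diameters_at_node.items()
        if len(sizes) > 1 and devices.get(node) != "reducer"
    )
```

For instance, an 8” pipe meeting a 6” pipe at node `"b"` is reported unless `devices["b"]` is `"reducer"`.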
  • NRCan - Natural Resources Canada CCCOT/CCMEO division is responsible for Canada’s national map.
    NRCan uses FME in a wide range of data production and validation tasks. All their data is described with an ISO 19110 compliant feature catalog. NRCan uses their feature catalogs to drive the validation process
    Data validation uses the catalog to ensure attribute and topological compliance.

    A bit more on the ISO 19110 feature catalog here
  • FME reads the feature catalog and then validates the data against the catalog rules. DatabaseJoiner is the key transformer for grabbing the correct rule for each feature being validated. They have built a series of custom transformers for each specific test. Catalog validations include:
    Spatial relations validation, domain attribute validation, proximity validation, minimal dimension validation, segmentation validation, data clipping validation

    You can see the similar pattern here. A catalog of your rules either in a spreadsheet or database, and then a specific custom transformer to validate that rule. This makes maintenance of your validation rules easier, if there is a comprehensive set of rules.
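The pattern described above, a catalog of rules plus one dedicated check per kind of rule, can be sketched as a small dispatch table in Python. The catalog contents, feature attributes, and check names here are all hypothetical. A real ISO 19110 feature catalog is far richer, and NRCan's custom transformers are not reproduced here.

```python
from typing import Callable, Dict, List

Feature = Dict[str, object]
Rule = Dict[str, object]


def check_domain(feature: Feature, rule: Rule) -> bool:
    """Attribute value must be one of the values listed in the rule."""
    return feature.get(rule["attribute"]) in rule["values"]


def check_min_dimension(feature: Feature, rule: Rule) -> bool:
    """Feature length must reach the minimum given in the rule."""
    return feature.get("length", 0.0) >= rule["minimum"]


# One small function per test, looked up by the test name stored in the catalog.
CHECKS: Dict[str, Callable[[Feature, Rule], bool]] = {
    "domain": check_domain,
    "min_dimension": check_min_dimension,
}

# Stand-in for a rule catalog kept in a spreadsheet or database.
CATALOG: Dict[str, List[Rule]] = {
    "road": [
        {"test": "domain", "attribute": "surface", "values": ["paved", "gravel"]},
        {"test": "min_dimension", "minimum": 5.0},
    ],
}


def validate(feature: Feature) -> List[str]:
    """Return the names of the catalog tests that the feature fails."""
    failures = []
    for rule in CATALOG.get(feature["type"], []):
        if not CHECKS[rule["test"]](feature, rule):
            failures.append(rule["test"])
    return failures
```

Adding a new kind of test means adding one function and one `CHECKS` entry. Adding a new rule for an existing test only touches the catalog.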
  • We’ve been working with our colleagues at Esri to build migration workspaces from the ArcGIS Geometric Network to Esri UN Asset Package.
    Migrating to Utility Networks involves creating high fidelity devices from ‘simple’ devices in the original ArcGIS Geometric Network.
    Success in a migration like this, or any other data migration, depends on understanding the quality of the source data. Garbage in, garbage out. Validation will tell you:
    Do you have to do clean-up before you can start the migration?
    Can the migration workflows include some clean-up?

    FME has tools to help with these decisions. We’ve already looked at AttributeValidator for assessing attribute values and domains and some topology validation. FME also has tools you can use to check geometric network connectivity. This might include:
    Validate the connectivity of lines - water lines, conductors
    Validate devices sit on vertices on lines
    Detect missing junctions - such as T’s or Taps
    Check for duplicate device locations and duplicate vertices on lines
    You might also have to check database relationships -
    Check relationships - device to device unit tables
  • NetworkTopologyCalculator - builds a connected network and gives each network an ID. Very good for visualizing network inconsistencies

    SpatialFilter - great for identifying objects that are supposed to connect but do not.

    TopologyBuilder & PointOnLineOverlayer can build lists of connected features at nodes/junctions, which can be analysed to detect missing junctions and check the type of junction. A good example is a T connector. If a water pipe has a lateral line from a water main, there should be a T-connector at that junction. Similarly, there should be a reducer at a node where the pipe diameter changes, and a tap at an intersection of three conductors.

    DEMO: TopologyValidation.fmw
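The two connectivity ideas in this note, giving each connected network an ID and finding nodes where three or more lines meet without a T-connector, can be sketched in Python as below. This is a simplified stand-in and not the algorithm used by NetworkTopologyCalculator or TopologyBuilder. It assumes lines are already split at junctions and that connected endpoints have exactly equal coordinates. The `"tee"` device label is invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[float, float]
Line = Tuple[Node, Node]  # (start point, end point)


def network_ids(lines: Dict[str, Line]) -> Dict[str, int]:
    """Give every line the ID of the connected network it belongs to."""
    parent: Dict[Node, Node] = {}

    def find(node: Node) -> Node:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Union the two endpoints of every line.
    for start, end in lines.values():
        parent[find(start)] = find(end)

    ids: Dict[Node, int] = {}
    result: Dict[str, int] = {}
    for name, (start, _end) in lines.items():
        root = find(start)
        result[name] = ids.setdefault(root, len(ids) + 1)
    return result


def missing_tees(lines: Dict[str, Line],
                 junctions: Dict[Node, str]) -> List[Node]:
    """Return nodes where three or more lines meet without a 'tee' device."""
    degree: Dict[Node, int] = defaultdict(int)
    for start, end in lines.values():
        degree[start] += 1
        degree[end] += 1
    return sorted(node for node, count in degree.items()
                  if count >= 3 and junctions.get(node) != "tee")
```

A line whose network ID differs from the rest of the dataset is a quick hint that it is disconnected from the main network.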
  • Just building a workspace that validates a dataset is one way of automating your validation process. For example, AutoCAD Drawing standards (DWS) files include tools for validating layers, attribution etc. But processing drawings with DWS can still be a very manual process (Colonial Pipeline talks about this in one of their FME presentations). An FME workspace can encapsulate all your validation rules into one workflow. In addition, you can automate how those validation tasks are triggered using FME Server. This can take the form of directory or FTP site watchers, emailing data for validation, drag n’ drop.

    Opportunity to mention Automations in terms of event-based workflows.
    Opportunity to mention FME Data Express in terms of self-serve options (anyone can run your validation workspace on their mobile device, all they need to do is pass in the file they want to validate!)
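To show the idea behind a directory watcher in isolation, here is a minimal polling sketch in Python. FME Server provides its own directory, FTP, and email triggers, and this is not how they are implemented. `scan_once` and the `validate` callback are hypothetical names: the callback stands in for whatever validation workflow you would run on each new file.

```python
import os
from typing import Callable, Dict, List, Set


def scan_once(directory: str, seen: Set[str],
              validate: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Validate every file not seen before and return a report per file name."""
    reports: Dict[str, List[str]] = {}
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.is_file() and entry.path not in seen:
            seen.add(entry.path)  # remember it so it is validated only once
            reports[entry.name] = validate(entry.path)
    return reports
```

Calling `scan_once` on a timer (for example in a loop with `time.sleep`) turns it into a simple watcher. A production setup would also need to handle files that are still being written when the scan runs.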
  • Simple example here on the FME Server demos (link on image)
  • In conclusion:
    Clean data is key to working with data in today’s world, where we have highly integrated data systems
    Validating your data should be a key part of your data processes
    FME can help with attributes, geometries, and topologies
  • Message repeat :
    Why is compliance important? Because:
    Garbage in, garbage out
    Bad data wastes time and resources
    No one has fun working with bad data
    FME is a great tool for helping you to validate your data

    FME has all the tools you need to check every part of your datasets, no matter what the format. There are also tools for Repairing your data - but that is a topic for another day!

    QA should be a part of EVERY WORKFLOW.
  • Call to action: Talk to the experts team for ideas or review some of these materials to find ideas on data validation workflows that work for you

    Here are some references to the data validation stories that I’ve mentioned in this presentation + other resources

