Giddens ecn2013

Published in: Technology, Business

  • Crash course: I hope you learn some technical material today and find out where you can reach out for help.
  • Close to 20 years of programming experience, with 9 years focused directly on informatics to help make things better.
  • Background with herbaria, building equipment and software to capture specimen images. First year at ECN.
  • Hodges number vs scientific names

    1. Getting collection data, maps, and images online via open source and commercial solutions
       • Michael Giddens
    2. Software developer with a focus on biodiversity informatics. Follow me @silverbiology
    3. What we do
       • Design workflows and software to optimize image capture
       • Analyze & process label images
       • Create portals for entomological and scientific collections
       • Develop interactive maps to tell stories about data
       • Provide support and technical advice for NSF projects
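    4. Digitization & Data Capture
       • Seconds count
       • 28,800 seconds in an 8-hour work day
       • 100k specimens @ 30 seconds each = 34.7 days of continuous (24-hour) work, or about 104 work days
       • 100k @ 29 seconds = 33.6 days (about 101 work days)
       • …
       • 100k @ 15 seconds = 17.4 days (about 52 work days)
       • Humans are not robots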
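
The arithmetic on this slide can be checked with a short sketch. The slide's day counts appear to divide by 86,400 seconds (a 24-hour day) rather than by the 28,800-second work day it quotes, so the sketch reports both; the function and constant names are illustrative, not from the talk.

```python
SECONDS_PER_CALENDAR_DAY = 24 * 60 * 60  # 86,400
SECONDS_PER_WORK_DAY = 8 * 60 * 60       # 28,800, the figure quoted on the slide


def days_needed(items: int, seconds_each: float, seconds_per_day: int) -> float:
    """Return how many days of the given length the whole batch takes."""
    return items * seconds_each / seconds_per_day


if __name__ == "__main__":
    for seconds_each in (30, 29, 15):
        calendar = days_needed(100_000, seconds_each, SECONDS_PER_CALENDAR_DAY)
        work = days_needed(100_000, seconds_each, SECONDS_PER_WORK_DAY)
        print(f"{seconds_each} s each: {calendar:.1f} calendar days, {work:.1f} work days")
```

Shaving one second per item saves roughly one calendar day of continuous work per 100k items, which is the point of "seconds count".
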
    5. Solutions
       • Look at every action as a micro task
       • Find tasks to fill any wait time
       • Stick to a single workflow
       • Filename conventions are important
       • Stick with image sizes and formats needed
       • Renaming filenames using scanners or data entry, e.g. SilverImage
       • Backup Images!!!
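
As one way to picture a filename convention driven by a scanned barcode, here is a minimal sketch. The convention itself (collection code plus digits, an underscore, a two-digit image sequence) and the `LSAM0012345` example value are assumptions made for illustration; this is not how SilverImage names files.

```python
import re

# Hypothetical convention: letters (collection code) followed by digits.
BARCODE_PATTERN = re.compile(r"^[A-Z]+\d+$")


def image_filename(barcode: str, sequence: int, extension: str = "jpg") -> str:
    """Build a file name from a scanned barcode and an image sequence number."""
    barcode = barcode.strip().upper()
    if not BARCODE_PATTERN.match(barcode):
        # Reject bad scans early instead of writing a misnamed file.
        raise ValueError(f"unexpected barcode format: {barcode!r}")
    return f"{barcode}_{sequence:02d}.{extension.lower().lstrip('.')}"


if __name__ == "__main__":
    print(image_filename(" lsam0012345 ", 1))
```

Validating the barcode before naming the file catches mis-scans at capture time, when the specimen is still in front of the operator.
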
    6. Things we learned
       • Make sure your lighting environment does not change
       • Dragon dictation is not accurate enough for numbers or scientific words
       • Manually renaming files is slow
       • Some student workers do not care as much as you do about your collection
       • People get burned out
    7. Data Processing
       • Optical Character Recognition Engines
       • Machine Learning
       • Crowd Sourcing
       • Human in the Middle
    8. Optical Character Recognition Engines
       • Free
         – Tesseract
       • Commercial
         – OmniPage
         – ABBYY
       • Services
         – www.silverbiology.com
       • Font Training
       • No handwriting solution on market
    9. Machine Learning
       • Data Dictionaries
       • Conditional logic / Decision Trees
       • Past data to predict future data
       • Label / Word Boundaries
       • Orientation
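
The "data dictionaries" idea can be sketched as fuzzy-matching OCR output against a list of accepted terms. This uses Python's standard `difflib` as a stand-in; the genus list, the function name, and the 0.8 cutoff are assumptions for illustration, not the approach described in the talk.

```python
from difflib import get_close_matches
from typing import Optional, Sequence

# Hypothetical data dictionary of accepted genus names.
GENUS_DICTIONARY = ["Danaus", "Papilio", "Vanessa", "Bombus", "Apis"]


def correct_token(token: str, dictionary: Sequence[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest dictionary entry, or None if nothing is close enough."""
    matches = get_close_matches(token.capitalize(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ("Danans", "Papi1io", "Xyz"):
        print(raw, "->", correct_token(raw, GENUS_DICTIONARY))
```

Returning `None` rather than a best guess keeps unmatched tokens available for a later human review step.
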
    10. Crowd Sourcing
        • Notes From Nature: http://www.notesfromnature.org
          – Ornithological from Natural History Museum
          – Calbug – Essig Museum Collections
    11. ALA Volunteer Program
        • http://volunteer.ala.org.au
    12. Human In The Middle
        • Rotating Images
        • Tagging Areas
        • Metadata tagging
        • Identifying False Positives
        • Verification Steps
        • Bulk Validation
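
One simple way to put a human in the middle is to accept high-confidence results automatically and queue the rest for verification. The sketch below assumes each record carries a confidence score from some upstream step; the `OcrRecord` type, its fields, and the 0.9 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class OcrRecord:
    specimen_id: str
    text: str
    confidence: float  # 0.0 to 1.0, assumed to come from the OCR step


def split_for_review(
    records: Iterable[OcrRecord], threshold: float = 0.9
) -> Tuple[List[OcrRecord], List[OcrRecord]]:
    """Return (auto_accepted, needs_review) based on the confidence threshold."""
    accepted: List[OcrRecord] = []
    needs_review: List[OcrRecord] = []
    for record in records:
        (accepted if record.confidence >= threshold else needs_review).append(record)
    return accepted, needs_review


if __name__ == "__main__":
    batch = [OcrRecord("A1", "Danaus plexippus", 0.97), OcrRecord("A2", "Dan?us", 0.41)]
    accepted, needs_review = split_for_review(batch)
    print(len(accepted), "accepted;", len(needs_review), "queued for a person")
```
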
    13. Web Portals
        • In-House
        • Specify 6 Portal
        • Symbiota
        • SilverCollection
          – California Academy of Sciences
          – Angelo State Natural History Museum
          – Louisiana State Arthropod Museum
          – Kansas State University Entomology Dept.
          – Mississippi Entomological Museum
          – NLBIF
    14. Explore / Browse
        • Taxonomy
        • Taxonomy (Filtered)
        • Family
        • Genus
        • Type Status
        • Regions
        • Collectors
        • Custom
    15. Custom Checklists
    16. Spreadsheet Format
    17. Collecting Events
    18. Images
    19. Specimen Details
    20. Reports
    21. Interactive Maps
    22. • Online service to map, analyze, and build applications with your data
        • Simple to use
        • Easily create distribution maps, heat maps, and category maps
        • Access to full geospatial query engine
        • Visualizing ecological models
        • Works well with lots of data
    23. GBIF - 350 Million Records
        • http://www.gbif.org/occurrence
    24. Visualizing two months in the life of seagull Eric
        • Blog on Lifewatch by Peter Desmet
    25. Interactive Occurrence Data
    26. Interactive Map Modes
        • Density Maps
        • Polygons
        • Grids
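
A grid map mode boils down to counting occurrences per cell. The following is a minimal sketch of that binning step, assuming points arrive as (latitude, longitude) pairs in decimal degrees; the function name and the one-degree default cell size are illustrative and not taken from the mapping service shown in the talk.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def grid_counts(points: Iterable[Tuple[float, float]], cell_degrees: float = 1.0) -> Counter:
    """Count points per grid cell, keyed by the cell's south-west corner (lat, lon)."""
    counts: Counter = Counter()
    for lat, lon in points:
        # floor() keeps negative coordinates in the correct cell.
        cell = (
            math.floor(lat / cell_degrees) * cell_degrees,
            math.floor(lon / cell_degrees) * cell_degrees,
        )
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    sample = [(30.4, -91.2), (30.9, -91.7), (37.7, -122.4)]
    for cell, n in grid_counts(sample).items():
        print(cell, n)
```

The per-cell counts can then drive cell colors for a grid map, or be smoothed for a density map.
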
    27. Useful Tools Provided By the Global Biodiversity Information Facility
        http://tools.gbif.org
        • Darwin Core Archive Assistant
        • Darwin Core Archive Validator
        • Higher Taxonomy Services
        • Name Finder
        • Name Parser
        • GBIF API Services
        • Integrated Publishing Toolkit (IPT)
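
To give a feel for the GBIF API services, here is a small sketch that builds an occurrence search URL. It assumes the current public `https://api.gbif.org/v1/occurrence/search` endpoint and its `scientificName`, `country`, and `limit` query parameters, none of which are named on the slide; the helper function is hypothetical and makes no network call.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: the public GBIF v1 occurrence search (not named on the slide).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(scientific_name: str, country: Optional[str] = None, limit: int = 20) -> str:
    """Build a GBIF occurrence search URL; country is an ISO 3166-1 alpha-2 code."""
    params = {"scientificName": scientific_name, "limit": limit}
    if country:
        params["country"] = country
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(occurrence_search_url("Danaus plexippus", country="US", limit=5))
```

Fetching the URL with any HTTP client should return JSON; the response is not parsed here because its exact shape is outside what the slides cover.
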
    28. Global Names Architecture
        http://www.globalnames.org
        • Global Names Recognition and Discovery
        • Global Names Index
    29. Questions?
        • Michael Giddens
        • www.silverbiology.com
