YAMZ.net: better, faster, cheaper taxonomy building

YAMZ.net is a tool for taxonomy building. Metadata vocabulary standardization ranks among the most awful design-by-committee experiences, whether at the international standards level or at the working group level. We use a crowdsourced metadata dictionary with reputation-based voting, in which every term gets a unique persistent identifier. The second half consists of exercises to see how it all works in practice.

Published in: Technology

  1. 1. YAMZ.net: better, faster, cheaper taxonomy building Anne Bowser, Jane Greenberg, Greg Janée, John Kunze, Manoj Tuguru
  2. 2. Twofer 1. John: term hardening in the open yamz.net dictionary • Crowdsourced, but with reputation-based voting • Every term has a unique persistent identifier (PID) 2. Jane: taxonomy testing with yamz.net • Give it a whirl • Feedback
  3. 3. • Open identifiers deserve their own festival • 9 and 10 November in Reykjavik, Iceland! • If you’re doing something interesting with PIDs (or you want to!) come and share your ideas with a crowd of like-minded innovators • Register at http://pidapalooza.org
  4. 4. What is Metadata? Structured information about things in collections, e.g., • Tools in a database • Bird observations in a local park • Insect specimens in a museum drawer Metadata is used in every knowledge domain, e.g., • Software in a repository (many per-domain repos) • Books in a library (many per-domain libraries) • Inventory in shopping sites
  5. 5. Metadata as structured info? Structure = special punctuation to set off regions, e.g., "Cost: 12.00 USD" and "<rating> 3 stars </rating>" • Elements: Cost, rating • Values: 12.00 USD, 3 stars Terms to standardize: • USD, stars, Cost, rating, …
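The two punctuation styles on this slide can be pulled apart mechanically. The following is a minimal sketch, not anything taken from YAMZ itself: the regular expressions and the function names `parse_metadata_line`, `is_number`, and `terms_to_standardize` are assumptions made for illustration. It splits each line into an element and a value, then lists the words a vocabulary would need to standardize.

```python
import re

# Illustrative patterns (assumptions, not from the slides), one per punctuation style.
TAG_PATTERN = re.compile(
    r"^\s*<(?P<element>[A-Za-z_]\w*)>\s*(?P<value>.*?)\s*</(?P=element)>\s*$"
)
LABEL_PATTERN = re.compile(r"^\s*(?P<element>[A-Za-z_]\w*)\s*:\s*(?P<value>.+?)\s*$")


def parse_metadata_line(line):
    """Return (element, value) for a line in either style, or None if neither fits."""
    for pattern in (TAG_PATTERN, LABEL_PATTERN):
        match = pattern.match(line)
        if match:
            return match.group("element"), match.group("value")
    return None


def is_number(token):
    """True if the token parses as a number, so it is data rather than vocabulary."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def terms_to_standardize(pairs):
    """Collect element names and non-numeric value words (units such as USD, stars)."""
    terms = []
    for element, value in pairs:
        words = [token for token in value.split() if not is_number(token)]
        for candidate in [element, *words]:
            if candidate not in terms:
                terms.append(candidate)
    return terms


lines = ["Cost: 12.00 USD", "<rating> 3 stars </rating>"]
pairs = [parse_metadata_line(line) for line in lines]
print(pairs)
print(terms_to_standardize(pairs))
```

Both styles yield the same kind of (element, value) pair, which is why the slide treats element names and unit words alike as terms needing agreed definitions.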
  6. 6. yamz.net (yet another metadata zoo)
  7. 7. Problem: traditional standardization • Change by committee is ugly, costly, and slow • Example: Dublin Core, 15 cross-domain terms • Same terms after 5 years as after 11 months • New terms banned for fear of upsetting a fragile consensus (Image: European Parliament Technology - DG ITEC @ flickr)
  8. 8. The Metadata Universe Jenn Riley, IU
  13. 13. An alternate metadata universe • Vision: one dictionary, one namespace • All research domains, any part of “metadata speech” • Names, values, units, relationships, ... (Image: SimonRobertson @ flickr)
  14. 14. Crowdsourced, but with voting. 3 classes of term: • vernacular ← all terms are born here • canonical ← these don’t evolve • deprecated ← so terms never go away. Each term gets a unique persistent id. Example: identifier: http://n2t.net/ark:/99152/h1193 | term: oba | definition: other (origin: from Tagalog)
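One way to picture the term record on this slide is as a small data structure keyed by its persistent identifier. This is a sketch under stated assumptions, not the YAMZ schema: the names `TermClass`, `Term`, and `deprecate` are invented here, and the second identifier is a made-up placeholder. Only the `oba` record comes from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class TermClass(Enum):
    VERNACULAR = "vernacular"   # all terms are born here
    CANONICAL = "canonical"     # these don't evolve
    DEPRECATED = "deprecated"   # kept, so terms never go away


@dataclass
class Term:
    identifier: str             # unique persistent identifier (PID)
    term: str
    definition: str
    term_class: TermClass = TermClass.VERNACULAR

    def deprecate(self):
        """Mark the term as deprecated instead of deleting it."""
        self.term_class = TermClass.DEPRECATED


# The example record from the slide.
oba = Term(
    identifier="http://n2t.net/ark:/99152/h1193",
    term="oba",
    definition="other (origin: from Tagalog)",
)

# A hypothetical competing entry for the same word; identifier and definition are placeholders.
rival = Term(
    identifier="http://example.org/ark:/00000/placeholder",
    term="oba",
    definition="hypothetical alternative definition",
)

# Keying on the identifier lets both definitions of the same word coexist.
dictionary = {entry.identifier: entry for entry in (oba, rival)}

rival.deprecate()
print(len(dictionary), oba.term_class.value, rival.term_class.value)
```

Keying the dictionary on the identifier rather than the word is the design choice that matters: two competing definitions of one word remain distinct, and a deprecated entry stays resolvable.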
  15. 15. Reputation-based voting resists “gaming” • Meritocracy: strong terms rise, weak terms decline • Lessons from StackOverflow, Internet standards, and Wikipedia processes (Image: Karunakar Rayker @ flickr)
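The slides do not describe the scoring formula, so the following is only an assumed illustration of why weighting votes by reputation can resist gaming. The functions `raw_tally` and `weighted_score`, the reputation numbers, and the default weight of 1 for unknown accounts are all hypothetical.

```python
def raw_tally(votes):
    """Plain up/down count: every account counts the same."""
    return sum(votes.values())


def weighted_score(votes, reputation, default_weight=1):
    """Sum of votes, each multiplied by the voter's reputation (assumed scheme)."""
    return sum(
        direction * reputation.get(user, default_weight)
        for user, direction in votes.items()
    )


# Two established users vote a term up; five throwaway accounts vote it down.
votes = {"alice": +1, "bob": +1}
votes.update({f"sock{i}": -1 for i in range(1, 6)})

# Hypothetical reputation values; accounts not listed get the default weight.
reputation = {"alice": 50, "bob": 40}

print(raw_tally(votes))
print(weighted_score(votes, reputation))
```

Under a plain count the throwaway accounts win, while under the weighted score the two established voters dominate, which is the "strong terms rise" behavior the slide points to.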
  16. 16. YAMZ usage patterns: search for terms (words and definitions), then • Find a term you love: great, use it • Find a term you kind of love: try it out, comment, engage with author • No workable term found: instantly enter own term and watch for comments • Find a word you love but a definition you hate: “I want that word!”, so enter a competing term
  17. 17. Term tag in YAMZ
  18. 18. Crowd-sourced yamz.net taxonomy builder • Meritocracy via reputation-based voting • Per-domain and cross-domain • Persistent identifiers to avoid ambiguity Classes of terms • Responsive and relevant: vernacular • Leverage and stability: canonical • Permanent historical record: deprecated
  19. 19. PART 2: Testing YAMZ 1. YAMZ basic functions • Search • Comment • Create a term 2. Citscitools portal #citsci example 3. Give it a try • 2 cases 4. Feedback, next steps
  20. 20. YAMZ basic functions (http://yamz.net/, http://yamz.net/about) 1. Search 2. Comment 3. Create a term
  21. 21. Describing tools … relax …
  22. 22. Describing tools … continued
  23. 23. Portal for Citscitools
  24. 24. Case studies (http://yamz.net/) 1. Citizen scientist: Rebecca 2. Decision maker: Doug
  25. 25. Citizen Scientist: Rebecca. Rebecca and her neighbors are concerned about the air quality …learn…air quality monitoring stations are not granular enough to represent their neighborhood. Rebecca purchases an ambient air quality sensor she finds online and recruits neighbors …demonstrates a violation of the National Ambient Air Quality Standards. Their Regional EPA contact informs them that the data cannot be used because of quality assurance issues. …a quick online search on SciStarter Project Database, Rebecca finds a local EPA-supported project in need of help. …The Tools Database makes it easy for them to read reviews of low-cost instruments, including those generating data accepted by the local EPA’s standards, and to “build, borrow or buy” the tools. She and her neighbors regularly meet up for training and to share their concerns with the Regional EPA office. The residents and the EPA develop mutual respect for each other and work together to discover and address community concerns.
  26. 26. Decision Maker: Doug. Doug works…state wildlife department. His job is to interpret data from wildlife studies to determine whether to allocate more state resources towards protecting the habitat of an endangered species. Doug reads studies from state wildlife agency…university scientists, but…sample size and geographic coverage are insufficient to understand the movement of this species. He stumbles across a citizen science project that uses camera traps, collects hair and scat samples, and analyzes species DNA in a local DIY biology space. While impressed, he is skeptical of their methods and equipment because it’s a volunteer effort. Doug contacts the citizen science group and asks about their methods and equipment. The group directs him to the SciStarter Citizen Science Tools Database, where they borrowed their equipment from, and he finds a wealth of information documenting the development and accuracy of the equipment. This information allows Doug to incorporate citizen science research on the species in question into his own justification for using state resources to increase conservation of this species.
  27. 27. Rebecca: … online search on SciStarter Project Database, Rebecca finds a local EPA-supported project in need of help. …The Tools Database makes it easy for them to read reviews of low-cost instruments, including those generating data accepted by the local EPA’s standards, and to “build, borrow or buy” the tools. She and her neighbors regularly meet up for training and to share their concerns with the Regional EPA office. The residents and the EPA develop mutual respect for each other and work together to discover and address community concerns. Doug: Doug contacts the citizen science group and asks about their methods and equipment. The group directs him to the SciStarter Citizen Science Tools Database, where they borrowed their equipment from, and he finds a wealth of information documenting the development and accuracy of the equipment. This information allows Doug to incorporate citizen science research on the species in question into his own justification for using state resources to increase conservation of this species.
  28. 28. MAKE your metadata term(s): http://yamz.net/ 1. Discuss your case • How would you search #citsci? • What metadata terms would help? 2. Explore YAMZ • Spend a few minutes searching YAMZ to see whether any terms are suitable (or not suitable) • Comment on a term 3. Identify 1 to 3 metadata terms and add them to YAMZ 4. Feedback
  29. 29. Feedback - discussion 1. 3 things that you like 2. 3 things that you found confusing 3. other…
  30. 30. Discussion and Conclusions 1. Complementary and alternative approaches • More than one way to skin a cat… • Rationalist approach drawbacks: tunnel vision, egotism, and scope creep • Ownership has appeal, community ownership++ 2. Next steps • Populate, test, engage • Ranking algorithm • Community engagement Toothbrush
