Presentation given at the Museums and the Web conference 2017 in Cleveland, OH, USA about the DigiBird project. The proposal can be found online here: http://mw17.mwconf.org/proposal/digibird-on-the-fly-collection-integration-supported-by-the-crowd/ and the full paper here: http://mw17.mwconf.org/paper/digibird-on-the-fly-collection-integration-supported-by-the-crowd/
2. valorisation project
May 2016 to November 2016
DigiBird project
Chris Dijkshoorn Cristina Bucur Lora Aroyo
Maarten Brinkerink Sander Pietersen Saskia Scheltjens
3. ‣ Why DigiBird?
‣ The DigiBird project:
‣ overview of the system
‣ institutions and projects involved
‣ Crowdsourcing and its challenges
‣ Conclusions
Outline
8. ‣ Digital collection:
‣ 650,000+ object records
‣ 274,000+ images
‣ Links to thesauri
‣ Graph search algorithm
‣ Annotate online collections
‣ Different domains: Bible, birds, fashion, etc.
Rijksmuseum - Accurator
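The slide names a graph search algorithm but does not describe it. Below is a minimal sketch of the general idea, assuming a breadth-first search over links between thesaurus concepts and collection objects. It is not the actual Accurator implementation. All identifiers, labels and the `max_depth` cut-off are hypothetical.

```python
from collections import deque

# Hypothetical toy data: (subject, predicate, object) triples linking collection
# objects to thesaurus concepts. Identifiers and labels are illustrative only.
TRIPLES = [
    ("obj:print-1", "depicts", "ioc:Turdus_merula"),
    ("obj:painting-2", "depicts", "ioc:Turdus_philomelos"),
    ("ioc:Turdus_merula", "broader", "ioc:Turdus"),
    ("ioc:Turdus_philomelos", "broader", "ioc:Turdus"),
]
LABELS = {
    "ioc:Turdus_merula": "Common Blackbird",
    "ioc:Turdus_philomelos": "Song Thrush",
    "ioc:Turdus": "Turdus",
}


def build_graph(triples):
    """Build an undirected adjacency map so links can be followed both ways."""
    graph = {}
    for subject, _predicate, obj in triples:
        graph.setdefault(subject, set()).add(obj)
        graph.setdefault(obj, set()).add(subject)
    return graph


def graph_search(keyword, labels, graph, max_depth=2):
    """Breadth-first search starting from concepts whose label contains the keyword.

    Returns (distance, object) pairs for collection objects ("obj:" nodes)
    reached within max_depth steps; a smaller distance means a closer match.
    """
    start = [node for node, label in labels.items() if keyword.lower() in label.lower()]
    distance = {node: 0 for node in start}
    queue = deque(start)
    while queue:
        node = queue.popleft()
        if distance[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return sorted((d, node) for node, d in distance.items() if node.startswith("obj:"))


graph = build_graph(TRIPLES)
print(graph_search("blackbird", LABELS, graph))               # [(1, 'obj:print-1')]
print(graph_search("blackbird", LABELS, graph, max_depth=3))  # [(1, 'obj:print-1'), (3, 'obj:painting-2')]
```

Raising the depth limit trades precision for recall: the song thrush painting is only reached through the shared genus concept.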
9. ‣ Focus on art-historical information
‣ Use of crowdsourcing and niche groups with expertise
‣ Semantic annotations: a set of connected resources
‣ Occasional lack of expertise regarding subject matter annotations
Rijksmuseum - Accurator
11. Xeno-canto Foundation for Nature Sounds
● Popularise bird sounds & recording
● Improve access to bird sounds
● Increase knowledge of bird sounds
● Build the ultimate, complete sound guide
● Through www.xeno-canto.org
12. Xeno-canto collection in numbers
‣ 4,728+ hours of recordings
‣ 328,632 recordings
‣ 9,662 species (> 90% of all bird species)
‣ 3,309 recordists
17. ‣ Cultural-historical organization
‣ > 70% of Dutch audiovisual heritage since 1898
‣ 1,000,000+ hours of TV, radio, music and film
‣ 11 petabytes of data
‣ Various video collections
Netherlands Institute for Sound & Vision
18. What's that?
‣ Video tagging game
‣ Audiovisual heritage collections
‣ 3 implementations
‣ > 1,000,000 social tags
‣ DigiBird project
‣ Natuurbeelden collection
20. ‣ “Cognitive surplus”: build a bigger collection
‣ Gather annotations by engaging the crowd
‣ Everyone can participate
‣ Break hard problems into smaller tasks
‣ Scalable and cheap way of generating content
Crowdsourcing
21. BUT…
23. ‣ Crowdsourcing tasks are undertaken in isolation
‣ It takes time to collect data
‣ It demands continuous promotional effort
‣ It is challenging for institutions to incorporate
the results of crowdsourcing into their existing
infrastructure
Crowdsourcing Challenges
25. ‣ Every institution has its own system
‣ No visibility of similar initiatives
Challenge 1: Crowdsourcing tasks are undertaken in isolation
26. DigiBird solution
‣ Create a hub
‣ Provide on the fly integration
‣ Use a shared vocabulary
30. Thesauri can bridge collections
IOC World Bird List
‣ 33,801 terms
‣ Structured using Simple Knowledge Organization System (SKOS)
‣ Persistent identifiers
Importance of a shared vocabulary
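How a shared vocabulary bridges collections can be sketched as a label index: every preferred or alternative label (in the spirit of SKOS `prefLabel`/`altLabel`) points to one persistent concept identifier, so "Merel" and "Turdus merula" resolve to the same concept. This is a minimal in-memory sketch, not DigiBird's code; the `Concept` class, the sample labels and the use of the `la` language tag for scientific names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS-like concept: one persistent identifier, several labels."""
    uri: str
    pref_labels: dict  # language tag -> preferred label (like skos:prefLabel)
    alt_labels: list = field(default_factory=list)  # like skos:altLabel


# Hypothetical excerpt of a bird thesaurus; "la" marks the scientific name here.
VOCABULARY = [
    Concept("ioc:Turdus_merula",
            {"en": "Common Blackbird", "nl": "Merel", "la": "Turdus merula"}),
    Concept("ioc:Turdus_philomelos",
            {"en": "Song Thrush", "nl": "Zanglijster", "la": "Turdus philomelos"}),
]


def build_label_index(concepts):
    """Map every case-folded label to the identifiers of the concepts carrying it."""
    index = {}
    for concept in concepts:
        for label in list(concept.pref_labels.values()) + concept.alt_labels:
            index.setdefault(label.casefold(), set()).add(concept.uri)
    return index


def resolve(label, index):
    """Return the sorted concept identifiers matching a label (empty if none)."""
    return sorted(index.get(label.casefold(), set()))


index = build_label_index(VOCABULARY)
print(resolve("Merel", index))          # ['ioc:Turdus_merula']
print(resolve("turdus merula", index))  # ['ioc:Turdus_merula']
```

Because each platform can be queried with whichever label it understands, the persistent identifier, not the label, is what links the results.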
31. Goals
‣ Make results available on the fly
‣ Provide insight into progress
DigiBird pipeline
34. ‣ Crowdsourcing relies on voluntary contributions
‣ Unpredictable when people will contribute
How DigiBird helps
‣ Monitor progress
Challenge 2: It takes time to collect data
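Monitoring progress amounts to aggregating contributions over time. A minimal sketch, assuming each contribution is logged as a (platform, ISO 8601 timestamp) pair; the log format and sample records are hypothetical, not DigiBird's actual data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical contribution log: (platform, ISO 8601 timestamp) pairs.
CONTRIBUTIONS = [
    ("accurator", "2016-09-01T10:15:00"),
    ("accurator", "2016-09-01T14:02:00"),
    ("waisda", "2016-09-01T16:45:00"),
    ("accurator", "2016-09-02T09:30:00"),
]


def daily_counts(records):
    """Count contributions per (platform, day), sorted by platform and then day."""
    counts = Counter()
    for platform, timestamp in records:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[(platform, day)] += 1
    return dict(sorted(counts.items()))


def running_totals(records, platform):
    """Cumulative number of contributions for one platform, day by day."""
    total, progress = 0, []
    for (name, day), count in daily_counts(records).items():
        if name == platform:
            total += count
            progress.append((day, total))
    return progress


print(running_totals(CONTRIBUTIONS, "accurator"))  # [('2016-09-01', 2), ('2016-09-02', 3)]
```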
38. ‣ Data silos
‣ Trust in data
Challenge 4: It is challenging for institutions to incorporate the results of crowdsourcing into their existing infrastructure
39. DigiBird solutions
‣ Provide a way to directly access data
‣ Different output formats
‣ Refine and review contributions
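Offering different output formats means serializing the same integrated records several ways. A minimal sketch covering two of the formats (plain JSON and N-Quads), assuming each record carries an item IRI and a creator name; the record layout, the `serialize` helper and the example.org IRIs are hypothetical. The `dc:creator` IRI is the standard Dublin Core elements namespace.

```python
import json

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"


def literal(value):
    """Render a plain N-Quads string literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_nquads(records, graph_iri):
    """One quad per record: <item> <dc:creator> "name" <graph> ."""
    return "".join(
        f"<{r['id']}> <{DC_CREATOR}> {literal(r['creator'])} <{graph_iri}> .\n"
        for r in records
    )


def serialize(records, fmt, graph_iri):
    """Dispatch on the requested output format (hypothetical format names)."""
    if fmt == "nquads":
        return to_nquads(records, graph_iri)
    if fmt == "json":
        return json.dumps(records, indent=2, sort_keys=True)
    raise ValueError(f"unsupported format: {fmt}")


RECORDS = [{"id": "http://example.org/digibird/item/1", "creator": "Example Recordist"}]
print(serialize(RECORDS, "nquads", "http://example.org/digibird/graph/1"), end="")
```

Using the graph position of each quad to record the source platform is one way to keep provenance visible, which helps institutions decide how far to trust incoming contributions.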
42. ‣ DigiBird: a hub for 4 distinct crowdsourcing projects
‣ Integrated access to different collections about birds
‣ Real-time access to newly added information
‣ Engaging audiences in crowdsourcing
Conclusions
44. On the fly collection integration supported by the crowd
Cristina-Iulia Bucur c.i.bucur@vu.nl
Chris Dijkshoorn c.r.dijkshoorn@vu.nl
www.digibird.org
Source code is available online:
‣ github.com/rasvaan/digibird_api
‣ github.com/rasvaan/digibird_client
Useful links:
‣ accurator.nl
‣ nederlandsesoorten.nl
‣ rijksmuseum.nl
‣ xeno-canto.org
‣ waisda.beeldengeluid.nl
45. DigiBird pipeline example: retrieve information about a blackbird
‣ Request formulation: the query concept ioc:Turdus_merula is turned into a query filter "Merel" (Dutch for blackbird), a search request "Merel" and a request parameter "Turdus merula"
‣ Data retrieval: results come back as JSON and SPARQL result lists
‣ Data integration: source fields such as rec and creator are mapped to dc:creator
‣ Response formulation: return JSON, JSON-LD, N-Quads or Turtle
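The data integration step of the blackbird example maps each source's native fields (`rec`, `creator`) onto the shared `dc:creator` property, while one source has no creator field at all. A minimal sketch of that mapping, assuming results have already been parsed into Python dicts; the source names, mapping table and sample values are hypothetical.

```python
# Hypothetical per-source field mappings, mirroring the pipeline example: one source
# exposes the creator as "rec", two as "creator", and one has no creator field.
FIELD_MAPS = {
    "source_a": {},
    "source_b": {"rec": "dc:creator"},
    "source_c": {"creator": "dc:creator"},
    "source_d": {"creator": "dc:creator"},
}


def integrate(results_by_source, field_maps):
    """Rename native fields to shared Dublin Core properties and merge all results."""
    merged = []
    for source, records in results_by_source.items():
        mapping = field_maps.get(source, {})
        for record in records:
            item = {"source": source}  # keep track of where each result came from
            for native, shared in mapping.items():
                if native in record:
                    item[shared] = record[native]
            merged.append(item)
    return merged


# Illustrative raw results, e.g. after parsing the JSON and SPARQL result lists.
RESULTS = {
    "source_a": [{"title": "Merel"}],
    "source_b": [{"rec": "Recordist X"}],
    "source_c": [{"creator": "Artist Y"}],
}
for item in integrate(RESULTS, FIELD_MAPS):
    print(item)
```

Fields without a mapping are dropped here for brevity; a fuller version would carry them along or map them to further shared properties.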