The document discusses bridging the gap between people and the massive amount of online multimedia content. It proposes decomposing videos and images into smaller fragments and building a media graph to link these fragments based on semantic relationships. Both machine learning and crowdsourcing are used to analyze and enrich media with metadata at scale. The goal is to turn "mute" images and context-free videos into relationship-aware media that allows nonlinear exploration. This would provide a more engaging experience for online audiences.
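The media graph idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' implementation: the `Fragment` class, the `build_media_graph` and `related` helpers, the media identifiers, and the timings are all hypothetical, and "semantic relationship" is reduced here to two fragments sharing a label.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical unit: a time-bounded piece of a video (or a still image)."""
    media_id: str
    start_s: float
    end_s: float
    labels: frozenset


def build_media_graph(fragments):
    """Link every pair of fragments that share at least one label.

    Returns graph[a][b] = set of labels that a and b have in common.
    """
    by_label = defaultdict(list)
    for frag in fragments:
        for label in frag.labels:
            by_label[label].append(frag)

    graph = defaultdict(lambda: defaultdict(set))
    for label, frags in by_label.items():
        for a in frags:
            for b in frags:
                if a != b:
                    graph[a][b].add(label)
    return graph


def related(graph, fragment):
    """Neighbours of a fragment, most shared labels first."""
    neighbours = graph.get(fragment, {})
    return sorted(
        neighbours,
        key=lambda f: (-len(neighbours[f]), f.media_id, f.start_s),
    )


# Made-up example data; the label strings mirror the kind of tags shown on the slides.
skyline = Fragment("image-1", 0.0, 0.0,
                   frozenset({"NEW YORK CITY", "SUNSET", "CENTRAL PARK"}))
clip_a = Fragment("video-1", 12.0, 30.0,
                  frozenset({"CENTRAL PARK", "NEW YORK CITY"}))
clip_b = Fragment("video-2", 5.0, 9.0, frozenset({"SUNSET"}))
clip_c = Fragment("video-2", 40.0, 55.0, frozenset({"HUDSON RIVER"}))

graph = build_media_graph([skyline, clip_a, clip_b, clip_c])
```

Calling `related(graph, skyline)` walks from the image to the video fragments that share its labels, which is the kind of nonlinear exploration the description refers to. A real system would presumably use richer relations than label overlap.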
Crowdsourcing & Nichesourcing: Enriching Cultural Heritage with Experts & Cr... (Lora Aroyo)
Presentation at the "Past, Present and Future of Digital Humanities & Social Sciences in the Netherlands" event, http://www.ehumanities.nl/past-present-and-future-of-digital-humanities-social-sciences-in-the-netherlands-programme-and-abstracts-2/
Data Science with Human in the Loop @Faculty of Science #Leiden University (Lora Aroyo)
Software systems are becoming ever more intelligent and more useful, but the way we interact with these machines too often reveals that they don’t actually understand people. Knowledge Representation and Semantic Web focus on the scientific challenges involved in providing human knowledge in machine-readable form. However, we observe that various types of human knowledge cannot yet be captured by machines, especially when dealing with wide ranges of real-world tasks and contexts. The key scientific challenge is to provide an approach to capturing human knowledge in a way that is scalable and adequate to real-world needs. Human Computation has begun to scientifically study how human intelligence at scale can be used to methodologically improve machine-based knowledge and data management. My research is focusing on understanding human computation for improving how machine-based systems can acquire, capture and harness human knowledge and thus become even more intelligent. In this talk I will show how the CrowdTruth framework (http://crowdtruth.org) facilitates data collection, processing and analytics of human computation knowledge.
Some project links:
- http://controcurator.org/
- http://crowdtruth.org/
- http://diveproject.beeldengeluid.nl/
- http://vu-amsterdam-web-media-group.github.io/linkflows/
Museums & the Web 2016 Presentation: Enriching Collections with Expert Knowle... (Lora Aroyo)
http://mw2016.museumsandtheweb.com/proposal/accurator-enriching-collections-with-expert-knowledge-from-the-crowd/
Crowdsourcing is not a new phenomenon for museums. There are good examples for museums (e.g., Powerhouse museum, steve.museum). But not all crowdsourcing initiatives are successful. Crowdsourced tagging does not always contribute to a better understanding of art and can even be confusing.
The Rijksmuseum and the VU University Amsterdam developed the Accurator: a visual tool to get experts in domains like birds, bibles, ships, castles, etc. involved in annotating art and enrich the museums’ metadata with expertise that is not available internally.
In this how-to session, we demonstrate the tool and the ways other museums can implement this Open Web application for their own collections.
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr... (Lora Aroyo)
This presentation was given at the NL eScience Center during the "De Geest Uit De Fles" event for the kick-off of the eHumanities project in 2014:
http://esciencecenter.nl/agenda/703-26-may-de-geest-uit-de-fles/
Stitch by Stitch: Annotating Fashion at the Rijksmuseum (Lora Aroyo)
https://www.rijksmuseum.nl/en/stitch-by-stitch
http://annotate.accurator.nl/
Fashion can be found everywhere in museums. Fashion heritage collected over centuries: costumes, accessories, paintings, prints and photographs. But while some clothes and accessories are easily found and identified, others are obscure and require a trained eye to describe. What are we looking at? What kind of sleeve is this? Which materials and techniques have been used? More specific descriptions of the images facilitate better use of digital collections and enable users to wander through them in detail.
Visualization of Disagreement-based Quality Metrics of Crowdsourcing Data (CrowdTruth)
Crowdsourcing represents a significant source of data which needs to be analyzed and interpreted. These tasks influence the quality of the output as well as the efficiency of the process. Visualization has proved to be an effective way of dealing with large amounts of data. In this paper we propose a visualization analytic model in the context of the CrowdTruth framework and CrowdTruth metrics for optimizing the crowdsourcing process and improving its data quality. The requirements for the dynamic, scalable and interactive visualizations were extracted through literature and interviews with users of the framework.
Truth is a Lie: Rules & Semantics from Crowd Perspectives (RR'2015 Keynote) (Lora Aroyo)
http://crowdtruth.org
Processing real-world data with the crowd leaves one thing absolutely clear - there is no single notion of truth, but rather a spectrum that has to account for context, opinions, perspectives and shades of grey. CrowdTruth is a new framework for processing of human semantics drawn more from the notion of consensus than from set theory.
Utilizing Social Health Websites for Cognitive Computing and Clinical Decisio... (CrowdTruth)
Crowdsourced annotation data offers cognitive computing systems insights into lay semantics. This is especially important in health care, where medical terminology is often not aligned with patients' lay language. However, the general crowd often has limited medical knowledge. Therefore this research investigated the opportunities of social health websites for obtaining ground truth annotation data for cognitive computing systems, including clinical decision support systems. By identifying these websites and analyzing their data, it offers a starting point for the future utilization of user-generated health content for cognitive systems. However, the opportunities of social health data are currently limited by various legal regulations. Therefore this paper also dwells on the legal aspects of implementing social health data for cognitive computing systems.
Similar to "Video Killed the Radio Star": From MTV to Snapchat (20)
The Rijksmuseum Collection as Linked Data (Lora Aroyo)
Presentation at ISWC2018: http://iswc2018.semanticweb.org/sessions/the-rijksmuseum-collection-as-linked-data/ of our paper published originally in the Semantic Web Journal: http://www.semantic-web-journal.net/content/rijksmuseum-collection-linked-data-2
Many museums are currently providing online access to their collections. The state of the art research in the last decade shows that it is beneficial for institutions to provide their datasets as Linked Data in order to achieve easy cross-referencing, interlinking and integration. In this paper, we present the Rijksmuseum linked dataset (accessible at http://datahub.io/dataset/rijksmuseum), along with collection and vocabulary statistics, as well as lessons learned from the process of converting the collection to Linked Data. The version of March 2016 contains over 350,000 objects, including detailed descriptions and high-quality images released under a public domain license.
FAIRview: Responsible Video Summarization @NYCML'18 (Lora Aroyo)
Presentation at the NYC Media Lab (NYCML2018). There is a growing demand for news videos online, with more consumers preferring to watch the news than read or listen to it. On the publisher side, there is a growing effort to use video summarization technology in order to create easy-to-consume previews (trailers) for different types of broadcast programs. How can we measure the quality of video summaries and their potential to misinform? This workshop will inform participants about automatic video summarization algorithms and how to produce more “representative” video summaries. The research presented is from the FAIRview project and is supported by the Digital News Innovation Fund (DNI Fund), which is part of the Google News Initiative.
DH Benelux 2017 Panel: A Pragmatic Approach to Understanding and Utilising Ev... (Lora Aroyo)
Lora Aroyo, Chiel van den Akker, Marnix van Berchum, Lodewijk Petram, Gerard Kuys, Tommaso Caselli, Jacco van Ossenbruggen, Victor de Boer, Sabrina Sauer, Berber Hagedoorn
Crowdsourcing ambiguity aware ground truth - collective intelligence 2017 (Lora Aroyo)
The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to the volume of data and lack of annotators. Typically these practices use inter-annotator agreement as a measure of quality. However, this assumption often creates issues in practice. Previous experiments we performed found that inter-annotator disagreement is usually not captured, either because the number of annotators is too small to capture the full diversity of opinion, or because the crowd data is aggregated with metrics that enforce consensus, such as majority vote. These practices create artificial data that is neither general nor reflects the ambiguity inherent in the data.
To address these issues, we propose a method for crowdsourcing ground truth by harnessing inter-annotator disagreement. We present an alternative approach for crowdsourcing ground truth data that, instead of enforcing an agreement between annotators, captures the ambiguity inherent in semantic annotation through the use of disagreement-aware metrics for aggregating crowdsourcing responses. Based on this principle, we have implemented the CrowdTruth framework for machine-human computation, which first introduced the disagreement-aware metrics and built a pipeline to process crowdsourcing data with these metrics.
In this paper, we apply the CrowdTruth methodology to collect data over a set of diverse tasks: medical relation extraction, Twitter event identification, news event extraction and sound interpretation. We prove that capturing disagreement is essential for acquiring a high-quality ground truth. We achieve this by comparing the quality of the data aggregated with CrowdTruth metrics with a majority vote, a method which enforces consensus among annotators. By applying our analysis over a set of diverse tasks we show that, even though ambiguity manifests differently depending on the task, our theory of inter-annotator disagreement as a property of ambiguity is generalizable.
My ESWC 2017 keynote: Disrupting the Semantic Comfort Zone (Lora Aroyo)
Ambiguity in interpreting signs is not a new idea, yet the vast majority of research in machine interpretation of signals such as speech, language, images, video, audio, etc., tends to ignore ambiguity. This is evidenced by the fact that metrics for quality of machine understanding rely on a ground truth, in which each instance (a sentence, a photo, a sound clip, etc.) is assigned a discrete label, or set of labels, and the machine’s prediction for that instance is compared to the label to determine if it is correct. This determination yields the familiar precision, recall, accuracy, and f-measure metrics, but clearly presupposes that this determination can be made. CrowdTruth is a form of collective intelligence based on a vector representation that accommodates diverse interpretation perspectives and encourages human annotators to disagree with each other, in order to expose latent elements such as ambiguity and worker quality. In other words, CrowdTruth assumes that when annotators disagree on how to label an example, it is because the example is ambiguous, the worker isn’t doing the right thing, or the task itself is not clear. In previous work on CrowdTruth, the focus was on how the disagreement signals from low quality workers and from unclear tasks can be isolated. Recently, we observed that disagreement can also signal ambiguity. The basic hypothesis is that, if workers disagree on the correct label for an example, then it will be more difficult for a machine to classify that example. The elaborate data analysis to determine if the source of the disagreement is ambiguity supports our intuition that low clarity signals ambiguity, while high clarity sentences quite obviously express one or more of the target relations. In this talk I will share the experiences and lessons learned on the path to understanding diversity in human interpretation and the ways to capture it as ground truth to enable machines to deal with such diversity.
Achieving Expert-Level Annotation Quality with CrowdTruth: The Case of Medical Relation Extraction. Anca Dumitrache, Lora Aroyo and Chris Welty. ==> http://ceur-ws.org/Vol-1428/
#CrowdTruth: Linked Data for Information Extraction @ISWC2015 (Lora Aroyo)
CrowdTruth Measures for Language Ambiguity: The Case of Medical Relation Extraction. Anca Dumitrache, Lora Aroyo and Chris Welty ==> http://oak.dcs.shef.ac.uk/ld4ie2015/LD4IE2015/Program.html
12. massive amount of digital content to explore …
http://lora-aroyo.org ! http://slideshare.net/laroyo ! @laroyo
13. but at some point it all looks the same …
14. Massive Scale:
A lifetime of video content is uploaded to YouTube every day.
Granularity Mismatch:
Searching for the relevant video fragments is still not possible.
Passive Engagement:
Video is still primarily a linear net-time viewing activity
15. … people search & browse
with some implicit relevance in mind
18. there is huge semantic & cultural GAP
19. software systems are ever more intelligent
but they don’t actually understand people
20. focus on human knowledge in machine-readable form
but there are types of human knowledge that can’t be captured by machines
21. classical AI involves human experts to manually provide training knowledge for machines
human expert-based ground truth does not scale for current demand for machines to deal with wide ranges of real-world tasks and contexts
22. we need to be able to ….
support of multiple perspectives
26. humans accurately perform interpretation tasks
can their effort be adequately harnessed in a scientifically reliable manner that scales across tasks, contexts & data modalities?
27. Quantity is the new Quality
Human Computation adopts human intelligence at scale to improve purely machine-based systems
28. diversity of opinion
independent
decentralized
aggregated
James Surowiecki, “the wise crowd”
29. a novel approach to gather diversity of perspectives & opinions from the crowd, expand expert vocabularies with these and gather a new type of gold standard for machines
L. Aroyo, C. Welty: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. ACM WebSci 2013.
L. Aroyo, C. Welty: The Three Sides of CrowdTruth. Journal of Human Computation, 2014.
http://CrowdTruth.org
http://data.CrowdTruth.org/
http://game.crowdtruth.org
30. Visual Content Domination
• 90% of information transmitted to the brain is visual (processed 60,000X faster in the brain than text)
• Videos increase average page conversion rates by 86%
• Visuals are social-media-ready/friendly - easily sharable
• Posts with visuals receive 94% more page visits
• Visuals are becoming easier and easier to create as photo / video editing tools become more accessible
31. any piece of media can be the starting point to
a world of compelling visual experiences.
turning “mute” images into content-aware images.
32. NEW JERSEY
HUDSON RIVER
CENTRAL PARK
URBANIZATION
VERIZON
METLIFE BUILDING
SUNSET
EAST RIVER
NEW YORK CITY
SKYSCRAPER
UPPER EAST SIDE
turning “mute” images into content-aware images.
any piece of media can be the starting point to
a world of compelling visual experiences.
33. combining machine processing with
crowdsourcing for enriching, curating &
gathering metadata
quickly & cheaply — at scale.
NEW JERSEY
HUDSON RIVER
CENTRAL PARK
URBANIZATION
VERIZON
METLIFE BUILDING
SUNSET
EAST RIVER
NEW YORK CITY
SKYSCRAPER
UPPER EAST SIDE
34. NEW JERSEY
HUDSON RIVER
CENTRAL PARK
URBANIZATION
VERIZON
NEW YORK CITY
SKYSCRAPER
METLIFE BUILDING
UPPER EAST SIDE
EAST RIVER
MIDTOWN
MANHATTAN
PAN-AM BUILDING
PAN-AM AIRLINES HELICOPTER CRASH
AIR TRAVEL
ARCHITECTURE
turning “context-free” images into
relationship-aware images
35. NEW JERSEY
HUDSON RIVER
CENTRAL PARK
URBANIZATION
VERIZON
NEW YORK CITY
SKYSCRAPER
METLIFE BUILDING
UPPER EAST SIDE
EAST RIVER
MIDTOWN
MANHATTAN
PAN-AM BUILDING
PAN-AM AIRLINES HELICOPTER CRASH
AIR TRAVEL
ARCHITECTURE
… not only for images, but also for videos
YOUTUBE: NYC FROM THE
EMPIRE STATE BUILDING
allowing viewers to explore relationships across themes,
locations, characters, etc. — within a video.
41. BRIDGING THE GAP BETWEEN
PEOPLE & THE OVERWHELMING
AMOUNT OF ONLINE MULTIMEDIA CONTENT
42. HyperVideos:
Link video fragments in non-linear paths
Binging Engagement:
Construct continuous and interactive experiences
Video Snacks:
Break video down into snackable moments
SOLUTIONS
43. • Decomposing & granular description of images & videos.
• Constructing mediaGraph with rich media semantics.
• Continuously enriching & consolidating machine, expert, & user content descriptions.
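The three steps above can be illustrated with a minimal sketch. It is not the mediaGraph implementation from the slides, which is not described in detail; the class names `Fragment` and `MediaGraph`, the method names, and the example media identifiers are all hypothetical. The sketch assumes that a fragment is a time range of a video, that each description records whether it came from a machine, an expert or a user, and that two fragments are linked when they share a concept.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A time-bounded piece of a video (hypothetical structure)."""
    media_id: str
    start_s: float
    end_s: float


class MediaGraph:
    """Links fragments to concepts and, through shared concepts, to each other."""

    def __init__(self):
        self._concepts_of = defaultdict(set)   # fragment -> {(concept, source)}
        self._fragments_of = defaultdict(set)  # concept -> {fragment}

    def describe(self, fragment, concept, source):
        # `source` records who supplied the description:
        # "machine", "expert" or "user" (an assumed vocabulary).
        self._concepts_of[fragment].add((concept, source))
        self._fragments_of[concept].add(fragment)

    def related(self, fragment):
        """Return other fragments sharing a concept, with the shared concepts."""
        links = defaultdict(set)
        for concept, _source in self._concepts_of[fragment]:
            for other in self._fragments_of[concept]:
                if other != fragment:
                    links[other].add(concept)
        return dict(links)


# Hypothetical fragments, described with tags taken from the slides.
graph = MediaGraph()
skyline = Fragment("nyc-from-empire-state", 0.0, 12.5)
newsreel = Fragment("pan-am-newsreel", 40.0, 55.0)
graph.describe(skyline, "METLIFE BUILDING", "machine")
graph.describe(skyline, "PAN-AM BUILDING", "expert")
graph.describe(newsreel, "PAN-AM BUILDING", "user")
```

Calling `graph.related(skyline)` then returns the newsreel fragment together with the shared concept, which is the kind of link a viewer could follow for nonlinear exploration.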
60. Results of First Pilot
– The first 6 months:
• 44,362 pageviews
• 12,279 visits (3+ min online)
• 555 registered players (thousands of anonymous players!)
– 340,551 tags added to 602 items
– 137,421 matches
61. 11 Participating Museums
1,782 Works of Art in the Research
36,981 Tags collected
2,017 Users who tagged
First two years (2006-2008)
Q: Why did you tag?
[Bar chart, 0% to 100%, comparing Public and MMA respondents. Answer options: don't remember; to connect with others; so that I could find works again later; other (please specify); to learn about art; to improve search for other users; for fun; to help museums document art work]
62. Tags by Documentalists
• Tags describe mainly short segments
• Tags are often not very specific
• Tags do not describe programmes as a whole
• User tags were useful & specific --> domain dependent
63. user vocabulary
8% in professional vocabulary
23% in Dutch lexicon
89% found on Google
locations (7%)
engeland
persons (31%)
objects (57%)
Riste Gligorov et al.: On the Role of User-Generated Metadata in A/V Collections. KCAP, Int. Conference on Knowledge Capture, 2011
Crowd vs. Professionals
64. WAISDA? Tags vs. Rest
System                        MAP
All user tags                 0.219
Consensus user tags only      0.143
NCRV tags                     0.138
NCRV catalog                  0.077
Captions                      0.157
Captions + User tags          0.247
Captions + NCRV catalog       0.183
Captions + NCRV tags          0.201
NCRV tags + User tags         0.263
NCRV tags + NCRV catalog      0.150
All – User tags               0.208
All                           0.276
All tags better than consensus only
• Improvement of 53%
• Consensus tags have
  • higher precision: 0.59 vs. 0.49
  • but lower recall: 0.28 vs. 0.42
65. WAISDA? Tags vs. Rest (MAP table as on slide 64)
All tags better than rest
• Individually
  • beat NCRV tags by 69%
  • beat captions by 39%
66. WAISDA? Tags vs. Rest (MAP table as on slide 64)
All tags better than rest
• Individually
  • beat NCRV tags by 69%
  • beat captions by 39%
• Combined
  • Improvement of 5%
67. WAISDA? Tags vs. Rest (MAP table as on slide 64)
All data performs best
• largely due to contribution of user tags – 33%
68. WAISDA? Tags vs. Rest (MAP table as on slide 64)
All tags better than consensus only
• Improvement of 53%
• Consensus tags have
  • higher precision: 0.59 vs. 0.49
  • but lower recall: 0.28 vs. 0.42
All tags better than rest
• Individually
  • beat NCRV tags by 69%
  • beat captions by 39%
• Combined
  • Improvement of 5%
All data performs best
• largely due to contribution of user tags – 33%
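The percentages on these slides read as relative improvements in MAP, and they can be recomputed from the table. The sketch below assumes the formula (system − baseline) / baseline and assumes which pairs of rows each percentage compares; the slides do not state either, and the helper name `relative_improvement` is hypothetical.

```python
# MAP scores copied from the table on the slides.
MAP = {
    "All user tags": 0.219,
    "Consensus user tags only": 0.143,
    "NCRV tags": 0.138,
    "Captions": 0.157,
    "NCRV tags + User tags": 0.263,
    "All - User tags": 0.208,
    "All": 0.276,
}


def relative_improvement(system, baseline):
    """Relative MAP gain of `system` over `baseline`, in percent (assumed formula)."""
    return (MAP[system] - MAP[baseline]) / MAP[baseline] * 100


# Assumed pairings of (system, baseline) behind each claim on the slides.
comparisons = [
    ("All user tags", "Consensus user tags only"),
    ("All user tags", "NCRV tags"),
    ("All user tags", "Captions"),
    ("All", "NCRV tags + User tags"),
    ("All", "All - User tags"),
]
for system, baseline in comparisons:
    gain = relative_improvement(system, baseline)
    print(f"{system} vs. {baseline}: {gain:.1f}%")
```

Under these assumptions the table reproduces the 53%, 39%, 5% and 33% figures after rounding. The comparison with NCRV tags comes out at roughly 59% rather than the 69% stated on the slides, so that figure may rest on unrounded scores or a different baseline.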
72. only a small fraction of about 8000 items
are currently on display
73. … online collection grows
125,000 artworks already available
another 40,000 are added every year
74. expertise of museum professionals is in
describing & annotating collection with art-
historical information, e.g. when they were
created, by whom, etc.
75. detailed information about depicted objects, e.g.
which species the animal or plant belongs to,
is in most cases not available
76. annotated only with “bird with blue head near
branch with red leaf”
species of the bird and the plant are missing
77. use crowdsourcing to get more annotations
use nichesourcing, i.e. niches of people with the
right expertise, to add more specific information
78. use sources like Twitter to find experts or
groups of experts on certain areas, e.g. bird
lovers, ornithologists or people who enjoy bird-
watching in their spare time
79. platform where users enter tags:
(1) structured vocabulary terms or (2) free text
http://annotate.accurator.nl
80. for tasks that are too difficult:
game in which players can carry out an expert
annotation task with some assistance
81. BIRDWATCHING RIJKSMUSEUM
Sunday October 4, 10:00 - 14:00
Cuypers Library Rijksmuseum
On World Animal Day, the Rijksmuseum will host a
birdwatching day in collaboration with Naturalis
Biodiversity Center, Wikimedia Netherlands and the
COMMIT/ SEALINCMedia project.
We are looking for bird watchers to join an expedition
through the digital collections and help the
museums identify bird species in works of art.
82. dive.beeldengeluid.nl
In Digital Hermeneutics
Event-centric Exploration @Sound & Vision and Royal Library
3rd Prize at the Semantic Web Challenge 2014
83. OPENIMAGES.EU
• 3000 videos
• NL Institute for Sound & Vision
• mostly news broadcasts
DELPHER.NL
• 1.5 Million Scans of Radio bulletins (hand annotated)
• 1937 – 1984
84. Simple Event Model (SEM), OpenAnnotation (OA) & SKOS
DIVE:MEDIA OBJECT
SEM:EVENT
SEM:PLACE
SEM:TIME
SEM:ACTOR
SKOS:CONCEPT
OA:ANNOTATION
• LINKS TO EUROPEANA (MULTILINGUAL)
• LINKS TO DBPEDIA
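The classes listed on this slide can be pictured as a small set of triples. The following is a sketch only: it stores triples as plain Python tuples instead of using an RDF library, all `ex:` identifiers are hypothetical, and it assumes that an OA annotation connects a media object (target) to an event (body). How DIVE actually links media objects to events is not stated on the slide.

```python
# Triples as (subject, predicate, object) tuples. The ex: resources are
# hypothetical; class and property names follow SEM, OA and SKOS.
TRIPLES = {
    ("ex:video1", "rdf:type", "dive:MediaObject"),
    ("ex:scan1", "rdf:type", "dive:MediaObject"),
    ("ex:event1", "rdf:type", "sem:Event"),
    ("ex:event1", "sem:hasPlace", "ex:place1"),
    ("ex:event1", "sem:hasActor", "ex:actor1"),
    ("ex:event1", "sem:hasTime", "ex:time1"),
    ("ex:concept1", "rdf:type", "skos:Concept"),
    ("ex:concept1", "skos:prefLabel", "AIR TRAVEL"),
    # Assumed linking pattern: annotation target = media, body = event.
    ("ex:anno1", "rdf:type", "oa:Annotation"),
    ("ex:anno1", "oa:hasTarget", "ex:video1"),
    ("ex:anno1", "oa:hasBody", "ex:event1"),
    ("ex:anno2", "rdf:type", "oa:Annotation"),
    ("ex:anno2", "oa:hasTarget", "ex:scan1"),
    ("ex:anno2", "oa:hasBody", "ex:event1"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in TRIPLES."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in TRIPLES."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def events_in(media_object):
    """Events attached to a media object through an annotation."""
    return {
        body
        for anno in subjects("oa:hasTarget", media_object)
        for body in objects(anno, "oa:hasBody")
        if "sem:Event" in objects(body, "rdf:type")
    }


def media_sharing_events(media_object):
    """Other media objects annotated with at least one of the same events."""
    found = set()
    for event in events_in(media_object):
        for anno in subjects("oa:hasBody", event):
            found |= objects(anno, "oa:hasTarget")
    return found - {media_object}
```

In this toy data a video and a scanned radio bulletin are both annotated with the same event, so `media_sharing_events("ex:video1")` leads from the video to the scan. That is the event-centric style of navigation the slide alludes to, here reduced to set lookups.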
85. Digital Submarine UI: Infinity of Exploration
Events: Linking Objects
Crowd: Bringing the Human Perspectives
Linked (Open) Data
86. Entity & Event Extraction with CrowdTruth.org
ENTITY EXTRACTION
EVENTS CROWDSOURCING AND LINKING TO
CONCEPTS THROUGH CROWDTRUTH.ORG
SEGMENTATION & KEYFRAMES
LINKING EVENTS AND
CONCEPTS TO KEYFRAMES
87. Erp, M. van; Oomen, J.; Segers, R.; Akker, C. van de; Aroyo, L.; Jacobs, G.; Legêne, S.; Meij, L. van der; Ossenbruggen, J.R. van; Schreiber, G.: Automatic Heritage Metadata Enrichment with Historic Events. Museums and the Web 2011
http://www.museumsandtheweb.com/mw2011/papers/automatic_heritage_metadata_enrichment_with_hi
89. Interpretation Support for Online Collections
“Digital Hermeneutics: Agora and the online understanding of cultural heritage”. In Proc. of Web Science Conference (ACM: New York, 2011)
92. Links from the slides
On the Web
• http://waida.nl
• http://prestoprime.org
• http://agora.cs.vu.nl
• http://sealincmedia.wordpress.com
• http://dive.beeldengeluid.nl
• http://diveplu.beeldengeluid.nl
• http://annotate.accurator.nl
• http://accurator.nl
• http://crowdtruth.org
• http://data.crowdtruth.org
• http://game.crowdtruth.org
• http://www.adweek.com/socialtimes/millennials-love-video-on-mobile-social-channels-infographic/622313
• http://www.blogherald.com/2010/10/27/history-of-online-video/
• http://wm.cs.vu.nl
On Twitter
@waisda
@agora-project
@sealincmedia
@prestocenter
@vistatv
#CrowdTruth
#Accurator
93. Lecture Reading Material
http://www.aaai.org/ojs/index.php/aimagazine/article/view/2564
Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation
https://www.wired.com/2006/06/crowds/
THE RISE OF CROWDSOURCING
https://www.microsoft.com/en-us/research/project/algorithmic-crowdsourcing/
http://cci.mit.edu/publications/CCIwp2011-04.pdf
Programming the Global Brain
http://www.orchid.ac.uk/eprints/248/1/main.pdf
The ACTIVECROWDTOOLKIT: An Open-Source Tool for Benchmarking Active Learning Algorithms for Crowdsourcing Research