This document summarizes the Search and Hyperlinking task at the 2014 MediaEval benchmarking initiative. It provides an overview of the task and the datasets used from 2012 to 2014, reports the results of the submissions to the search and hyperlinking sub-tasks using evaluation metrics such as MAP and P@5/10, and discusses lessons learned, such as the effect of prosodic features and metadata on performance. Finally, it acknowledges the contributions of the BBC and others in preparing the datasets and hosting user trials.
Unexpected Effects of Rescue Robots’ Team-Membership in a Virtual Environment (streamspotter)
Corine Horsch, Nanja Smets, Mark Neerincx, and Raymond Cuijpers on the "Unexpected Effects of Rescue Robots’ Team-Membership in a virtual Environment" at ISCRAM 2013 in Baden-Baden.
10th International Conference on Information Systems for Crisis Response and Management
12-15 May 2013, Baden-Baden, Germany
Confidentiality Protection in Crowdsourcing (Simran, IIIT Hyderabad)
Crowdsourcing is the practice of obtaining information or input for a task or project by floating the task out to a pool of people who are usually not full-time employees. Use of crowdsourcing to get work done is on the rise due to the benefits it offers. Trends and surveys show that the conventional workforce is moving towards the gig economy, and by 2020, 43% of the US workforce was expected to be an on-demand workforce [9]. In our work, we focus on higher-level software development/engineering tasks and how they can lead to confidentiality loss in the process of posting a task and getting it done. We examine the different stages and components of the crowdsourcing cycle for potential sources of information leaks. We conducted a survey to study how people perceive this problem and their level of understanding when it comes to sharing information online. We also analyzed a dataset of previously posted tasks to gain insight into whether the problem occurs in the real world. Conversations between task posters and workers were studied, along with a dataset of worker reviews, for deeper insight into the problem and its detection. Based on this analysis, we propose NLP-based techniques to detect such potential leaks and nudge the task poster before the information is disseminated. Such an additional layer of scrutiny at the company level, applied before a task is crowdsourced, keeps loss of confidential information in check.
Privacy Protection Models and Defamation Caused by k-anonymity (Hiroshi Nakagawa)
This slide deck introduces mathematical models of privacy protection. The models explained are: 1) private information retrieval, 2) IR with homomorphic encryption, 3) k-anonymity, 4) l-diversity, and finally 5) defamation caused by k-anonymity.
"EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs", as presented at the 17th International Semantic Web Conference (ISWC), 9 October 2018, Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Comparing Retrieval Effectiveness of Alternative Content Segmentation Methods... (Maria Eskevich)
We present an exploratory study of the retrieval of semi-professional user-generated Internet video. The study is based on the MediaEval 2011 Rich Speech Retrieval (RSR) task for which the dataset was taken from the Internet sharing platform blip.tv, and search queries associated with specific speech acts occurring in the video. We compare results from three participant groups using: automatic speech recognition system transcript (ASR), metadata manually assigned to each video by the user who uploaded it, and their combination. RSR 2011 was a known-item search for a single manually identified ideal jump-in point in the video for each query where playback should begin. Retrieval effectiveness is measured using the MRR and mGAP metrics.
Using different transcript segmentation methods the participants tried to maximize the rank of the relevant item and to locate the nearest match to the ideal jump-in point. Results indicate that best overall results are obtained for topically homogeneous segments which have a strong overlap with the relevant region associated with the jump-in point, and that use of metadata can be beneficial when segments are unfocused or cover more than one topic.
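As a rough illustration of the metrics mentioned above, MRR and a simplified mGAP-style score (which discounts the reciprocal rank by the distance between the returned start time and the ideal jump-in point) might be computed as below. The linear penalty is an illustrative choice, not the official mGAP definition:

```python
def mrr(ranks):
    """Mean Reciprocal Rank: ranks[i] is the 1-based rank of the known
    item for query i, or None if it was not retrieved."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)

def mgap(results, penalty_per_sec=0.01):
    """Simplified mGAP-style score: like MRR, but each reciprocal rank
    is discounted by the offset (in seconds) between the returned start
    time and the ideal jump-in point.  `results` holds (rank, offset_sec)
    tuples, or None for misses.  The discount actually used in RSR 2011
    may differ; this is an illustrative variant."""
    total = 0.0
    for res in results:
        if res is None:
            continue
        rank, offset = res
        total += max(0.0, 1.0 - penalty_per_sec * abs(offset)) / rank
    return total / len(results)

print(mrr([1, 2, None]))                 # (1 + 0.5 + 0) / 3 = 0.5
print(mgap([(1, 0.0), (2, 30.0), None])) # (1.0 + 0.35) / 3 = 0.45
```

The penalty makes a perfectly ranked result worthless if its start point is far from where playback should begin, which is the intuition behind rewarding good jump-in points.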
The Clinical Significance of Transcript Alignment DiscrepanciesReece Hart
Gene transcripts are the lens through which we understand variants that are identified by genome sequencing, reported in scientific literature, and communicated on clinical reports. An accurate, shared representation of transcripts is essential to communicating variants reliably. This talk presents observations of significant discrepancies between sources of transcripts that will lead to discrepancies in the clinical interpretation of variants, and tools that we have released to contend with these complexities.
The Search and Hyperlinking Task at MediaEval 2014 (multimediaeval)
The Search and Hyperlinking Task at MediaEval 2014 is the third edition of this task. As in previous editions, it consisted of two sub-tasks: (i) answering search queries from a collection of roughly 2700 hours of BBC broadcast TV material, and (ii) linking anchor segments from within the videos to other target segments within the video collection. For MediaEval 2014, both sub-tasks were based on an ad-hoc retrieval scenario, and were evaluated using a pooling procedure across participants' submissions with crowdsourced relevance assessment on Amazon Mechanical Turk.
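The pooling procedure described above can be sketched as follows; the run format and the pool depth of 10 are assumptions for illustration, not the task's exact protocol:

```python
def build_pool(runs, depth=10):
    """Pooling for relevance assessment: take the top-`depth` results
    from each submitted run and judge the union once per unique result.
    `runs` maps run-id -> {query_id: ranked list of (video, start, end)}.
    Duplicates across runs are judged only once, which is what keeps
    assessment affordable when many systems submit."""
    pool = {}  # query_id -> set of results to judge
    for ranking in runs.values():
        for qid, results in ranking.items():
            pool.setdefault(qid, set()).update(results[:depth])
    return pool

runs = {
    "run_A": {"q1": [("v1", 10, 40), ("v2", 0, 30)]},
    "run_B": {"q1": [("v1", 10, 40), ("v3", 5, 25)]},
}
print({q: len(s) for q, s in build_pool(runs).items()})  # {'q1': 3}
```

Each pooled result then becomes one unit of crowdsourced judgment, as in the Mechanical Turk setup described later in the slides.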
Benchmarking Domain-specific Expert Search using Workshop Program Committees (Toine Bogers)
Traditionally, relevance assessments for expert search have been gathered through self-assessment or based on the opinions of co-workers. We introduce three benchmark datasets for expert search that use conference workshops for relevance assessment. Our data sets cover entire research domains as opposed to single institutions. In addition, they provide a larger number of topic-person associations and allow a more objective and fine-grained evaluation of expertise than existing data sets do. We present and discuss baseline results for a language modelling and a topic-centric approach to expert search. We find that the topic-centric approach achieves the best results on domain-specific datasets.
Presented at the CSTA workshop, CIKM 2013, October 28, 2013
A 1h webinar on RecSys for the Udacity NanoDegree Program "How to become a Data Scientist" : https://in.udacity.com/course/data-scientist-nanodegree--nd025
Learning by example: training users through high-quality query suggestions (Claudia Hauff)
A presentation given at UvA in September 2015, discussing joint work with Morgan Harvey and David Elsweiler.
Full paper: http://dl.acm.org/citation.cfm?id=2767731
Slides from my talk on Personalised Access to Linked Data. Presented at the EKAW 2014 conference. The poster to this paper won the best poster award at the conference!
Improving Semantic Search Using Query Log Analysis (Stuart Wrigley)
Despite the attention Semantic Search is continuously gaining, several challenges affecting tool performance and user experience remain unsolved. Among these are: matching user terms with the search space, adopting view-based interfaces in the Open Web, and supporting users while building their queries. This paper proposes an approach that moves a step towards tackling these challenges by creating models of usage of Linked Data concepts and properties, extracted from semantic query logs as a source of collaborative knowledge. We use two sets of query logs from the USEWOD workshops to create our models and show the potential of using them in the mentioned areas.
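A usage model of this kind could be bootstrapped by simply counting concept and property URIs across logged queries; a minimal sketch (the regex and log format are assumptions, not the USEWOD log specification):

```python
import re
from collections import Counter

def property_usage(log_lines):
    """Build a usage model from SPARQL query logs: count how often each
    URI occurs across queries.  Popular properties can then back query
    suggestion or term matching.  Illustrative only: real logs need
    proper SPARQL parsing, deduplication, and bot filtering."""
    uri = re.compile(r"<(http://[^>]+)>")
    counts = Counter()
    for line in log_lines:
        counts.update(uri.findall(line))
    return counts

logs = [
    "SELECT ?s WHERE { ?s <http://dbpedia.org/ontology/birthPlace> ?o }",
    "SELECT ?o WHERE { ?s <http://dbpedia.org/ontology/birthPlace> ?o . "
    "?s <http://xmlns.com/foaf/0.1/name> ?n }",
]
print(property_usage(logs).most_common(1))
# [('http://dbpedia.org/ontology/birthPlace', 2)]
```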
[UPDATE] Udacity webinar on Recommendation Systems (Axel de Romblay)
A 1h webinar on RecSys for the Udacity NanoDegree Program "How to become a Data Scientist" : https://in.udacity.com/course/data-scientist-nanodegree--nd025.
The link to the ipynb : https://www.kaggle.com/axelderomblay/udacity-workshop-on-recommendation-systems
Keynote at the Chilean Week of Computer Science. I present a brief overview of recommender system algorithms, and then my work on tag-based recommendation, implicit feedback, and visual interactive interfaces.
AMET University - B.Sc. Data Science Syllabus (ametinstitute)
AMET University's Data Science program is a comprehensive course designed to equip students with the knowledge and skills they need to excel in the field of data science. With the ever-increasing importance of data-driven decision-making in today's digital age, the program places a strong emphasis on ensuring that students are well equipped to navigate the vast landscape of data analytics.
The program is designed to provide students with a solid foundation in statistical analysis, machine learning, and data visualization techniques. Through a combination of lectures, hands-on exercises, and practical projects, students learn how to extract valuable insights from complex datasets and use them to drive informed decision-making.
One of the key features of the program is its focus on real-world applications. Students are encouraged to work on projects that are relevant to their field of study or industry, and are provided with the necessary resources and support to succeed. This approach helps students to gain practical experience and develop the skills they need to excel in their careers.
In addition to the core curriculum, the program also offers a range of elective courses that allow students to specialize in areas of their interest. These courses cover a wide range of topics, including big data analytics, data mining, and data-driven marketing. With such a diverse range of options, students can tailor their learning experience to suit their individual needs and career goals.
Overall, AMET University's Data Science program is an excellent choice for anyone looking to build a successful career in data science. With its rigorous curriculum, practical focus, and flexible options, the program provides students with the tools they need to succeed in today's data-driven world.
"Building research-related skills to Drive Your Success" delivered to GPSS Sept 4, 2013. Followed by Paul Barnard presenting on research ethics processes.
Applications of Machine Learning to Location-based Social Networks (Joan Capdevila Pujol)
This work is part of a seminar talk given at Universitat de Girona (UdG). It is basically an introduction to Location-based Social Networks through two Machine Learning applications: a recommendation system and an event discovery technique.
Invited talk at USTC and SJTU, discussing recent progress in object re-identification against very large repositories, especially the problems of fast key point detection, feature repeatability prediction, aggregation, and object repository indexing and search.
Similar to Search and Hyperlinking Overview @MediaEval2014 (20)
Nucleophilic Addition of Carbonyl Compounds (SSR02)
Nucleophilic addition is the most important reaction of carbonyls. Not just aldehydes and ketones, but also carboxylic acid derivatives in general.
Carbonyls undergo addition reactions with a large range of nucleophiles.
Comparing the relative basicity of the nucleophile and the product is extremely helpful in determining how reversible the addition reaction is. Reactions with Grignards and hydrides are irreversible. Reactions with weak bases like halides and carboxylates generally don’t happen.
Electronic effects (inductive effects, electron donation) have a large impact on reactivity.
Large groups adjacent to the carbonyl will slow the rate of reaction.
Neutral nucleophiles can also add to carbonyls, although their additions are generally slower and more reversible. Acid catalysis is sometimes employed to increase the rate of addition.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their capacity to enable complex behavior composed of discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige... (University of Maribor)
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024
The Evolution of Science Education: PraxiLabs' Vision (mediapraxi)
The rise of virtual labs has been a key tool in universities and schools, enhancing active learning and student engagement.
💥 Let’s dive into the future of science and shed light on PraxiLabs’ crucial role in transforming this field!
ANOMALOUS SECONDARY GROWTH IN DICOT ROOTS (RASHMI M G)
This presentation covers abnormal, or anomalous, secondary growth in plants. Secondary growth is an increase in plant girth due to the vascular cambium or cork cambium. Anomalous secondary growth does not follow the normal pattern of a single vascular cambium producing xylem internally and phloem externally.
What are greenhouse gases, and how many gases affect the Earth? (moosaasad1975)
What greenhouse gases are, how they affect the Earth and its environment, what the future of the environment and the Earth looks like, and how weather and climate are affected.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
BREEDING METHODS FOR DISEASE RESISTANCE (RASHMI M G)
Plant breeding for disease resistance is a strategy to reduce crop losses caused by disease. Plants have an innate immune system that allows them to recognize pathogens and provide resistance. However, breeding for long-lasting resistance often involves combining multiple resistance genes.
Phenomics-assisted breeding in crop improvement (IshaGoswami9)
The population is expected to reach about 9 billion by 2050, and climate change makes it difficult to meet the food requirements of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics of multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data linkable to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
4. Users

| Main group              | User                            | Target                      |
| Researchers & Educators | Journalists                     | Research                    |
|                         | Academic researchers & students | Investigate                 |
|                         | Academic educators              | Educate                     |
| Public users            | Citizens                        | Entertainment, Infotainment |
| Media Professionals     | Broadcast Professionals         | Reuse                       |
|                         | Media Archivists                | Annotate                    |
7. [Timeline figure, 1998-2015: from DATA to BIG DATA?; test collections range from not representative to representative]
8. Search & Hyperlinking task
• User oriented: aims to explore the needs of real users expressed as queries.
  – How: UK citizens and crowdsourcing for retrieval assessment
• Temporal aspect: seeks to direct users to the relevant parts of retrieved video (“jump-in point”).
  – How: segmentation, segment overlap, transcripts, prosodic and visual features (low-level, high-level; keyframes)
• Multimodal: aims to investigate technologies for addressing variety in user needs and expectations
  – varied visual and audio contributions, intentional gap between query and multimodal descriptors in content
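One common way to obtain the overlapping transcript segments mentioned above is a fixed-length sliding window over the time-aligned transcript; a minimal sketch, where the window length and 50% overlap are illustrative choices rather than the task's prescribed values:

```python
def sliding_segments(words, seg_len=60.0, step=30.0):
    """Cut a transcript into fixed-length, overlapping segments.
    `words` is a list of (start_time_sec, token) pairs; returns
    (seg_start, seg_end, text) triples.  Each segment covers
    [t, t + seg_len) and consecutive windows overlap by seg_len - step."""
    if not words:
        return []
    segments = []
    t, last = words[0][0], words[-1][0]
    while t <= last:
        tokens = [w for s, w in words if t <= s < t + seg_len]
        if tokens:
            segments.append((t, t + seg_len, " ".join(tokens)))
        t += step
    return segments

words = [(0, "news"), (20, "weather"), (45, "sport"), (70, "music")]
for seg in sliding_segments(words):
    print(seg)
```

The overlap matters for retrieval: a relevant passage that straddles a hard segment boundary still appears intact in at least one window, at the cost of some index redundancy.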
9. ME Search & Hyperlinking task in development: 2012-2014

Search sub-task (2012 / 2013 / 2014):
- Dataset: BlipTv / BBC / BBC
- Transcripts released: 2 ASR / 3 ASR / 3 ASR
- Prosodic features: no / yes / yes
- Visual clues for queries: yes / no / no
- Concept detection: - / yes / yes
- Type of the task: Known-item / Ad-hoc / Ad-hoc
- Query creation: PC / iPad / iPad
- Number of queries: 30/30 / 4/50 / 50/30
- Relevance assessment: MTurk / users (BBC) / MTurk
- Numbers of assessed cases: 30 / 50 / 9 900
- Evaluation metrics: MRR, MASP, MASDWP (2012); MAP(-bin/tol) (2013, 2014)

Hyperlinking sub-task (2012 / 2013 / 2014):
- Dataset: BlipTv / BBC / BBC
- Transcripts released: 2 ASR / 3 ASR / 3 ASR
- Prosodic features: no / yes / yes
- Concept detection: - / yes / yes
- Type of the task: Known-item / Ad-hoc / Ad-hoc
- Anchor creation: PC / iPad / iPad
- Number of anchors: 30/30 / 11/ / 98/30
- Relevance assessment: MTurk / MTurk / MTurk
- Numbers of assessed cases: 3 517 / 9 975 / 13 141
- Evaluation metrics: MAP (2012); MAP(-bin/tol), P@5/10 (2013, 2014)
10. Dataset: Video collection
• BBC copyright-cleared broadcast material:
  – Videos:
    • Development set: 6 weeks between 01.04.2008 and 11.05.2008 (1335 hours / 2323 videos)
    • Test set: 11 weeks between 12.05.2008 and 31.07.2008 (2686 hours / 3528 videos)
  – Manually transcribed subtitles
  – Metadata
• Additional data:
  – ASR: LIMSI/Vocapia, LIUM, NST-Sheffield
  – Shot boundaries, keyframes
  – Output of visual concept detectors by University of Leuven and University of Oxford
11. Dataset: Query
• 28 users: policeman, hairdresser, bouncer, sales manager, student, self-employed
• Two-hour session on iPads:
  – Search the archive (document level)
  – Define clips (segment level)
  – Define anchors (anchor level)
• Workflow: Statement of Information Need → Search → Refine → Relevant Clips → Define Anchors
12. Data cleaning: Usable Information Need
• The description clearly specifies what is relevant
• A query with a suitable title exists
• Sufficient relevant segments exist (verified by trying the query)
13. Data cleaning: Process
• For each information need in the batch:
  1. Check whether it is usable
  2. If in doubt, use search to look for relevant data
  3. Reword and spell-check the description
  4. Select the first suitable query
  5. Save
14. Data cleaning: Usable Anchor
• Longer than 5 seconds
• The destination description clearly identifies the material the user wants to see when activating the anchor described by the label
• It is likely that there are some relevant items in the collection
15. Data cleaning: Process
• For each information need in the assigned batch:
  – Go through the anchors:
    • Check whether each is usable
    • Reword and spell-check the description
    • Assess whether links are likely to be found in the collection (possibly using search)
  – Save
18. Ground truth creation
• Queries/anchors: user studies at the BBC:
  – 28 users with the following profile:
    • Age: 18–30 years old
    • Use search engines and services on iPads on a daily basis
• Relevance assessment: via crowdsourcing on the Amazon MTurk platform:
  – Top 10 results from 58 search and 62 hyperlinking submissions
  – 1 judgment per query or anchor, accepted or rejected by an automated algorithm; special cases of user typos checked manually
  – Number of evaluated HITs: 9 900 for search, and 13 141 for hyperlinking
19. Evaluation metrics
• P@5/10/20
• MAP-based:
  – MAP: takes into account any overlapping segment
  – MAP-bin: relevant segments are binned for relevance
  – MAP-tol: only the start times of the segments are considered
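To make the MAP-tol idea concrete, here is an illustrative sketch (not the official task scoring tool) in which a retrieved segment counts as relevant only if its start time falls within a tolerance window of a ground-truth jump-in point; the function name and the 30-second default tolerance are assumptions:

```python
# Average precision with a tolerance on segment start times, in the
# spirit of the MAP-tol variant described above (illustrative only).

def average_precision_tol(ranked_starts, relevant_starts, tol=30.0):
    """ranked_starts: start times (s) of retrieved segments, best first.
    relevant_starts: ground-truth start times. tol: tolerance in seconds."""
    hits = 0
    precision_sum = 0.0
    matched = set()
    for rank, start in enumerate(ranked_starts, 1):
        # A result is a hit if it lands near a not-yet-matched relevant start.
        for i, rel in enumerate(relevant_starts):
            if i not in matched and abs(start - rel) <= tol:
                matched.add(i)
                hits += 1
                precision_sum += hits / rank
                break
    return precision_sum / len(relevant_starts) if relevant_starts else 0.0

# Example: two of three relevant jump-in points are found within 30 s.
ap = average_precision_tol([10.0, 500.0, 95.0], [0.0, 100.0, 900.0])
```

MAP is then the mean of this average precision over all queries or anchors.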
27. Lessons learned
1. iPad vs PC = different user behaviour and expectations of the system.
2. Prosodic features broaden the scope of the search sub-task.
3. Shot-segmentation-based units achieve the worst scores in both sub-tasks.
4. Use of metadata improves results in both sub-tasks.
29. The Search and Hyperlinking task was supported by
We are grateful to
Jana Eggink and
Andy O'Dwyer
from the BBC for preparing the collection and hosting the user trials.
... and of course Martha for advice & crowdsourcing access.
30. JRS at Search and Hyperlinking of Television Content Task
Werner Bailer, Harald Stiegler
MediaEval Workshop, Barcelona, Oct. 2014
31. Linking sub-task
• Matching terms from textual resources
• Reranking based on visual similarity (VLAT)
• Using visual concepts (alone or in addition)
• Results:
  – Differences between the different text resources
  – Context helped in only a few cases
  – Visual reranking provides a small improvement
  – Visual concepts did not provide improvements
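The visual reranking step above can be sketched as a simple linear interpolation of a text retrieval score with a visual similarity score; the weight, tuple layout, and function name are illustrative assumptions, not the JRS team's exact formulation:

```python
# Hedged sketch: re-rank a text-retrieval result list by fusing each
# item's text score with its visual similarity to the anchor segment.

def rerank(results, alpha=0.8):
    """results: list of (item_id, text_score, visual_similarity) tuples.
    alpha: weight of the text score (assumed value, not tuned)."""
    def fused(r):
        _, text, visual = r
        return alpha * text + (1.0 - alpha) * visual
    return sorted(results, key=fused, reverse=True)

# Item "b" overtakes "a" once visual similarity is taken into account.
ranked = rerank([("a", 0.9, 0.1), ("b", 0.7, 0.95)])
```

A small alpha shift is enough to reorder near-ties, which matches the observation that visual reranking yields only a small improvement.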
32. Zsombor Paróczi, Bálint Fodor, Gábor Szűcs
Solution with concept enrichment
• Concept enrichment: the set of words is extended with their synonyms or other conceptually connected words.
• Top 10 vs. top 50 conceptually connected words for each word
• Conclusion: the results show that concept enrichment with fewer words gives better precision, since larger expansions introduce more noise.
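The enrichment idea can be sketched as follows; the tiny related-word table is a stand-in for a real resource such as WordNet, and the function and variable names are illustrative:

```python
# Illustrative sketch of concept enrichment: extend each query word
# with its top-k conceptually connected words.

RELATED = {  # stand-in for a real synonym/concept resource
    "car": ["automobile", "vehicle", "engine", "road", "driver"],
    "race": ["competition", "sprint", "track", "speed", "contest"],
}

def enrich(query_words, k=2):
    """Return the query extended with up to k related words per term.
    A small k keeps precision higher; larger expansions add noise."""
    expanded = list(query_words)
    for word in query_words:
        expanded.extend(RELATED.get(word, [])[:k])
    return expanded

expanded_query = enrich(["car", "race"], k=2)
# ['car', 'race', 'automobile', 'vehicle', 'competition', 'sprint']
```

The k parameter plays the role of the top-10 vs. top-50 choice discussed above.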
33. Television Linked To The Web
LinkedTV @ MediaEval 2014
Search and Hyperlinking Task
H.A. Le1, Q.M. Bui1, B. Huet1, B. Cervenková2, J. Bouchner2, E. Apostolidis3,
F. Markatopoulou3, A. Pournaras3, V. Mezaris3, D. Stein4, S. Eickeler4, and M. Stadtschnitzer4
1 - Eurecom, Sophia Antipolis, France.
2 - University of Economics, Prague, Czech Republic.
3 - Information Technologies Institute, CERTH, Thessaloniki, Greece.
4 - Fraunhofer IAIS, Sankt Augustin, Germany.
16-17 Oct 2014
www.linkedtv.eu
34. Reasons to visit the LinkedTV poster
• Different granularities: video level, scene level (visual/topic) and sentence level.
• Different features: text (subtitles/transcripts), visual concepts, keywords, etc.
LinkedTV @ MediaEval 2014 Search and Hyperlinking Task
35. Reasons to visit the LinkedTV poster
• How to incorporate visual information into the search?
• Visual concept detection in the search query: mapping between query keywords and visual concepts (151 semantic concepts from TRECVID 2012)
  – Semantic word distance based on WordNet
  – Identification of salient visual concepts from Google Image search results (query keywords)
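A keyword-to-concept mapping of this kind can be sketched as below; LinkedTV used a WordNet-based semantic distance, but here a crude character-trigram overlap stands in so the example stays self-contained, and the concept list and threshold are assumptions:

```python
# Hedged sketch: map a query keyword onto a fixed visual-concept
# vocabulary by lexical similarity (stand-in for WordNet distance).

CONCEPTS = ["airplane", "boat", "person", "road", "building"]

def trigrams(word):
    """Character trigrams; short words fall back to the word itself."""
    return {word[i:i + 3] for i in range(len(word) - 2)} if len(word) >= 3 else {word}

def similarity(a, b):
    """Jaccard overlap of character trigrams, in [0, 1]."""
    ga, gb = trigrams(a), trigrams(b)
    return len(ga & gb) / len(ga | gb)

def map_keyword(keyword, concepts=CONCEPTS, threshold=0.2):
    """Return the best-matching concept, or None if nothing is close."""
    best = max(concepts, key=lambda c: similarity(keyword, c))
    return best if similarity(keyword, best) >= threshold else None

concept = map_keyword("boats")  # maps onto the "boat" concept
```

A real system would replace `similarity` with a WordNet distance and keep the same selection-plus-threshold structure.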
36. Reasons to visit the LinkedTV poster
• How to incorporate visual information into the search?
• Integration of detected visual concepts into the search:
  – Designing an enriched query based on textual (text query) and visual information (range query)
  – Fusion of the text score (Solr) and visual concept scores
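The fusion step can be sketched as a linear combination of a text retrieval score with the detector confidences of the concepts mapped to the query; the weight, the use of the strongest concept, and the score ranges are assumptions, not the LinkedTV team's exact formula:

```python
# Hedged sketch: fuse a (normalised) text score, e.g. from Solr, with
# visual-concept detector scores for the concepts mapped to the query.

def fuse(text_score, concept_scores, weight=0.7):
    """text_score: assumed normalised to [0, 1].
    concept_scores: detector confidences for the query's mapped concepts.
    The visual part uses the strongest matching concept."""
    visual = max(concept_scores, default=0.0)
    return weight * text_score + (1.0 - weight) * visual

score = fuse(0.8, [0.3, 0.9])  # 0.7*0.8 + 0.3*0.9 = 0.83
```

With no detected concepts the formula degrades gracefully to a down-weighted text score, so text-only and multimodal results stay comparable.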