In this talk, I will discuss the extensions we have made to our approach to semantic image segmentation. I will show how the results of object detectors and spatial priors can be naturally integrated into our hierarchical conditional random field (HCRF) approach based on the harmony potential. The addition of these extra cues, as well as class-specific normalization of classifier outputs, significantly improves segmentation quality.
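The class-specific normalization mentioned above can be sketched as follows. This is an illustrative example, not the authors' code: the scores, statistics, and class names are hypothetical, and the assumption is simply that raw scores from independently trained per-class classifiers are rescaled using validation-set statistics before entering the CRF potentials.

```python
# Illustrative sketch (hypothetical values): class-specific normalization
# of raw classifier scores so that scores become comparable across classes
# before they are used as unary potentials.

def normalize_per_class(scores, class_stats):
    """Rescale raw per-class scores to zero mean / unit variance per class.

    scores      -- dict mapping class name -> raw classifier score
    class_stats -- dict mapping class name -> (mean, std) estimated on
                   held-out validation data (made-up values here)
    """
    normalized = {}
    for cls, s in scores.items():
        mean, std = class_stats[cls]
        normalized[cls] = (s - mean) / std if std > 0 else 0.0
    return normalized

# Hypothetical example: the 'cow' classifier emits systematically higher
# raw scores than 'sheep'; after normalization 'sheep' correctly wins.
raw = {"cow": 2.1, "sheep": 0.9}
stats = {"cow": (2.0, 0.5), "sheep": (0.5, 0.4)}
print(normalize_per_class(raw, stats))
```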
Open source communities and business eco system strategy - OW2 Consortium fro... (SpagoWorld)
The presentation supported the speech by Gabriele Ruffatti, Engineering Group's Architectures and Consulting Director and OW2 Board member, at Slovenia Business Linux Conference 2010 (27th-28th Sept 2010 - Portoroz, Slovenia).
The PASCAL organization provides e-book access to over 250,000 titles through four e-book platforms to support students, faculty and staff at academic institutions across South Carolina. Usage of e-books is increasing on college campuses. PASCAL offers training on the e-book platforms and promotes discovery of and access to e-books through the PASCALCat catalog and other methods. Top subjects used include social science, history, business/economics and political science. PASCAL will continue assessing usage and expanding e-book access.
This document provides information about lasers and their use in ophthalmology. It begins with definitions of laser and its acronym. It then discusses the history and development of lasers from 1917 to present. The key properties and mechanisms of laser light production are described. Common types of ophthalmic lasers and their applications are outlined, including Nd:YAG, excimer, and diode lasers used for conditions like glaucoma, refractive error correction, and retinal diseases. The laser-tissue interaction mechanisms of thermal, photochemical and ionizing effects are summarized. The document concludes with sections on laser instrumentation and delivery systems and specific laser procedures in ophthalmology.
The document describes the ORUSSI project which aims to develop an optimized platform for real-time road monitoring using a network of roadside sensors like cameras. It seeks to efficiently deploy and add sensors to surveillance systems. The project will develop a novel platform combining research in semantic transcoding, scalable video coding, wireless communication and roadside equipment. It demonstrates several computer vision algorithms running directly on cameras including vehicle counting, speed estimation, feature detection and anomaly detection. It also shows selective video transcoding to preserve features for detection while efficiently encoding videos. A large dataset was collected of vehicle videos under different conditions for the project activities.
IM3I is a flexible system for managing and publishing multimedia content. It provides a service-oriented architecture allowing multiple views of media stored in repositories. This improves reuse, repurposing, and sharing of rich media. Automatic annotation of audio and video is performed through customizable processing pipelines. Services provide syntactic and semantic annotations. Visual annotation uses Bag-of-Words with MSER, SURF, and SIFT features. An ontology-based search and browsing engine is accessible as a service or through rich interfaces. Other services and interfaces allow tagging and content-based image retrieval. Publishing functions are provided through additional services and interfaces.
The system provides a service-oriented architecture that allows for multiple viewpoints of multimedia data inside repositories. The analysis layer is responsible for extracting low-level features and semantic annotations from media files through various processing pipelines. Annotation of visual and audio content is performed using bag-of-visual words, MSER, SURF, SIFT features, and SVM classifiers. The system was evaluated for usability, allowing users to search, annotate, and interact with videos and interfaces.
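The bag-of-visual-words step mentioned above can be sketched in a few lines. This is a minimal illustration, not the system's implementation: the toy 2-D descriptors and three-word codebook are hypothetical, and in practice the descriptors would come from MSER/SURF/SIFT detectors and the histograms would feed SVM classifiers.

```python
# Minimal sketch of bag-of-visual-words quantization (illustrative only):
# each local descriptor is assigned to its nearest codebook centroid, and
# the image is represented as a histogram of visual-word counts.

def bow_histogram(descriptors, codebook):
    """descriptors: list of feature vectors; codebook: list of centroids."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    hist = [0] * len(codebook)
    for d in descriptors:
        # assign descriptor to its nearest visual word
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(d, codebook[k]))
        hist[nearest] += 1
    return hist

# Hypothetical 2-D toy data: a 3-word codebook and four descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (4.8, 5.2), (5.1, 4.9)]
print(bow_histogram(descriptors, codebook))  # → [1, 1, 2]
```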
The document summarizes an interactive video search and browsing system called Orione. It uses an ontology created from a lexicon and WordNet to provide automatic video annotations using visual features. The system has a web-based interface for simple or advanced searches and browsing video archives by concepts. It also allows manual annotations. The multitouch interface lets users browse the ontology to select concepts and search/organize results with gestures. Usability tests were conducted on the system.
This document describes two interactive video search and browsing systems - a web application using the Rich Internet Application (RIA) paradigm, and a multi-touch collaborative application. Both systems use the same ontology-based video search engine, which allows semantic searching and browsing of video collections. The web application provides query expansion and interactive search interfaces, while the multi-touch application enables collaborative browsing and organization of video search results. The systems aim to provide responsive and intuitive interfaces for searching and exploring video archives.
DanThe is an online service, developed and implemented by Tuscany Region and MICC – Media Integration and Communication Center – University of Florence, to promote the resources related to digital cultural heritage of Tuscany. DanThe provides a direct access to collections, databases, regional museums, libraries and catalogues of cultural heritage.
The document provides a walkthrough of the IM3I multimedia information management platform. It describes several key features of the platform including multi-user video annotation, simple and advanced search, ontology browsing, and an authoring environment. The walkthrough explains how to use each feature through screenshots and step-by-step instructions. It aims to demonstrate the flexibility and customizability of the IM3I platform for managing and exploiting large multimedia archives.
IM3I provides a single point of access to manage and publish all types of digital content, including audio, video, and text files stored locally or online. It offers tools for processing, analyzing, indexing, tagging, searching, and publishing multimedia content in an integrated service-oriented environment. IM3I allows users to design flexible interfaces to publish their media in a way that meets their needs, with all changes to content and metadata instantly updating across publication interfaces.
IM3I is an immersive multimedia management and publishing platform that provides a framework for searching, summarizing, and visualizing large multimedia archives. It is based on a service-oriented architecture and can integrate a variety of content processing, analysis, indexing, tagging, annotation, search, and publishing services. The platform has been applied in media production workflows, educational content workflows, and for publishing archived content. It allows flexible composition of services into pipelines and custom interfaces to support various content use cases and user roles.
The IM3I project addresses the needs of media and communication industries facing advancing technologies and changing media consumption by developing highly customizable interfaces to search, summarize, and visualize large multimedia archives. Funded by the EU, IM3I provides a service-oriented architecture allowing multiple views of media data within repositories for more flexible interaction and sharing of rich media, opening new opportunities for content owners.
Semantic image segmentation is the process of assigning semantically relevant labels to all pixels in an image. Hierarchical Conditional Random Fields (HCRFs) are a popular and successful approach to this problem. One reason for their popularity is their ability to incorporate contextual information at different scales. However, existing HCRF models do not allow multiple labels to be assigned to individual nodes. At higher scales in the image, this results in an oversimplified model, since multiple classes can reasonably be expected to appear within a single region. This simplified model especially limits the impact that observations at larger scales may have on the CRF model. Furthermore, neglecting the information at larger scales is undesirable since class-label estimates based on these scales are more reliable than at smaller, noisier scales.
MediaPick is a tangible semantic media retrieval system that allows users to browse concepts from an ontology structure to search for and retrieve video results from large media libraries. It uses a multi-touch overlay screen to detect gestures for interacting with the results. The system exploits a search engine and semantic reasoning to retrieve and organize related multimedia content based on the user's queries and gestures.
This document discusses interactive visual representations of complex information structures. It presents several existing systems that visualize semantic data and search results. It then describes a new visual interactive framework that can extract and merge results from diverse knowledge repositories. The framework uses a main data source along with related multimedia and social media sources. It generates a semantic XML structure and two interactive visual interfaces - a geometric paradigm and an urban paradigm - to explore the information. An experimental analysis evaluated the quality of the visual paradigms and usability of the system.
This document describes a method for accurately evaluating HER-2 amplification in fluorescence in situ hybridization (FISH) images. The method involves extracting nuclei from FISH images, assigning each nucleus a reliability score based on shape and size compliance with a template model, and computing the ratio of HER-2 to CEP-17 markers only using the most reliable nuclei. The method was tested on a dataset of 40 FISH images classified by experts into categories of HER-2 amplification. Using a training set to determine an optimal reliability score threshold maximized classification accuracy evaluated on a test set.
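The scoring step described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the paper's code: the field names, counts, and threshold value are hypothetical; the key idea is that the HER-2/CEP-17 ratio is computed only over nuclei whose reliability score exceeds the learned threshold.

```python
# Illustrative sketch (hypothetical data): compute the HER-2/CEP-17 ratio
# using only nuclei whose reliability score passes a learned threshold.

def her2_ratio(nuclei, reliability_threshold):
    """nuclei: list of dicts with 'reliability', 'her2', 'cep17' counts."""
    reliable = [n for n in nuclei if n["reliability"] >= reliability_threshold]
    if not reliable:
        raise ValueError("no nucleus passed the reliability threshold")
    her2 = sum(n["her2"] for n in reliable)
    cep17 = sum(n["cep17"] for n in reliable)
    return her2 / cep17

# Made-up example: the low-reliability nucleus is excluded from the ratio.
nuclei = [
    {"reliability": 0.9, "her2": 8, "cep17": 2},
    {"reliability": 0.8, "her2": 6, "cep17": 2},
    {"reliability": 0.3, "her2": 1, "cep17": 4},  # ignored: unreliable shape/size
]
print(her2_ratio(nuclei, reliability_threshold=0.5))  # → 3.5
```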
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Trusted Execution Environment for Decentralized Process Mining (LucaBarbaro3)
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Skybuffer SAM4U tool for SAP license adoption (Tatiana Kojar)
Manage and optimize your license adoption and consumption with SAM4U, a free SAP software asset management tool for customers.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to production.
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
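The steady-state power-flow calculation described above can be illustrated with a small worked example. This is not the Power Grid Model API: it is a minimal DC power-flow sketch on a hypothetical 3-bus network with made-up reactances and loads, showing the kind of question such an engine answers (how power flows split across lines for a given loading).

```python
# Minimal DC power-flow sketch (illustrative only, hypothetical network):
# build the reduced susceptance matrix, solve B * theta = P for bus angles,
# then recover line flows from angle differences.

def dc_power_flow(n_bus, lines, injections, slack=0):
    """lines: (from, to, reactance); injections: net power per bus (p.u.)."""
    idx = [b for b in range(n_bus) if b != slack]   # non-slack buses
    pos = {b: i for i, b in enumerate(idx)}
    n = len(idx)
    # Assemble the reduced susceptance matrix (slack row/column removed).
    B = [[0.0] * n for _ in range(n)]
    for f, t, x in lines:
        b = 1.0 / x
        if f != slack and t != slack:
            B[pos[f]][pos[t]] -= b
            B[pos[t]][pos[f]] -= b
        if f != slack:
            B[pos[f]][pos[f]] += b
        if t != slack:
            B[pos[t]][pos[t]] += b
    p = [injections[b] for b in idx]
    # Solve B * theta = p by Gaussian elimination with back-substitution.
    for i in range(n):
        for j in range(i + 1, n):
            r = B[j][i] / B[i][i]
            for k in range(i, n):
                B[j][k] -= r * B[i][k]
            p[j] -= r * p[i]
    theta_red = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(B[i][k] * theta_red[k] for k in range(i + 1, n))
        theta_red[i] = (p[i] - s) / B[i][i]
    theta = [0.0] * n_bus                # slack angle stays at 0
    for b in idx:
        theta[b] = theta_red[pos[b]]
    # Line flow is proportional to the angle difference across the line.
    return {(f, t): (theta[f] - theta[t]) / x for f, t, x in lines}

# Hypothetical example: slack at bus 0, loads of 1.0 and 0.5 p.u. at buses 1, 2.
flows = dc_power_flow(
    3,
    lines=[(0, 1, 0.1), (1, 2, 0.1), (0, 2, 0.2)],
    injections=[0.0, -1.0, -0.5],
)
print(flows)  # bus 0 supplies 1.0 p.u. toward bus 1 and 0.5 toward bus 2
```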
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
- Insightful presentations covering two practical applications of the Power Grid Model.
- An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
- An interactive brainstorming session to discuss and propose new feature requests.
- An opportunity to connect with fellow Power Grid Model enthusiasts and users.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
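As a purely hypothetical illustration of the kind of enrichment discussed here (the element names and structure are invented for the example, not taken from the presentation), a model prompted with the plain sentence "Take 200 mg twice daily" might be expected to produce markup such as:

```xml
<!-- Hypothetical output: plain text enriched with XML markup -->
<dosage>
  <dose unit="mg">200</dose>
  <frequency>twice daily</frequency>
</dosage>
```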
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfflufftailshop
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Digital Marketing Trends in 2024 | Guide for Staying Ahead
1. PASCAL VOC 2010: semantic object segmentation and action recognition in still images
Andrew D. Bagdanov
bagdanov@cvc.uab.es
Departamento de Ciencias de la Computación
Universidad Autónoma de Barcelona
Xavier, Pep, Nataliya, Wenjuan, Fahad
The CVC PASCAL VOC Team CVC PASCAL VOC 2010
2. Overview
On 03/05/2010 the PASCAL VOC competition was announced and the training and validation sets were published.
The 20 semantic categories for the competition remain the same: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, potted plant, sheep, sofa, train, and tv/monitor.
3. Old competitions, new competitions
There are two (+ 1/2) main challenges in PASCAL.
Image classification is the prediction of the presence/absence of an instance of a class in a test image.
Object detection is the prediction of the bounding box and label of each object from the twenty target classes in a test image.
Semantic image segmentation is the assignment of one of the twenty class labels to every pixel in a test image.
Image segmentation is becoming a mainstream competition.
Action recognition in still images was included as a new “taster challenge” this year.
Taster competitions are used to measure interest in new problems.
4. Our contributions to PASCAL VOC 2010
Last year we participated in the Detection, Classification and Segmentation challenges.
This year we decided to concentrate on Classification and Segmentation. Our segmentation technique relies heavily on classification.
We also fielded a team in Action Recognition this year to see what that’s all about.
As always, success in PASCAL VOC challenges is approximately 85% engineering, 10% inspiration and 5% luck (if you’re lucky).
5. Outline
1 Introduction
Overview of the challenges
Our contribution and main ideas
2 The harmony potential 2.0: fusing across scale
Building on last year’s submission
Fusing across scales and learning
3 Action recognition
A torrent of features
Exploiting the size of the problem
4 Discussion
6. Giving semantics to pixels
[Figure panels: Image, Object, Class]
Semantic image segmentation is not object segmentation.
Only for simple cases are they the same.
7. Turning a hard problem into a harder one
[Figure panels: Image, Object, Class]
The objective is to assign semantic labels to every pixel.
Fine distinctions must be made.
8. Make that a very hard one
[Figure panels: Image, Object, Class]
The objective is to assign semantic labels to every pixel.
Fine distinctions must be made.
Occlusions, varying viewpoint and size complicate things.
9. Action recognition in still images
New competition this year: human action recognition in still images.
Individual images sampled from the Flickr dataset.
Bounding boxes of the human in each image are provided.
Very important: we don’t have to solve the detection problem.
Action recognition is offered as a “taster challenge” in order to gauge interest in the general problem.
It was difficult to hypothesize about what would succeed and what would not in this challenge.
10. Action classes
11. Segmentation: the role of context
Context provides very important cues for making fine discriminations at the (super-)pixel scale.
We can exploit three levels of scale: local, mid-level and global [Zhu, NIPS2008].
Existing techniques apply overly-simplified models of context that do not generalize upward from local to global scales.
12. Segmentation: global constraints on label combinations
Our principal idea is to use global classification to enhance segmentation results.
Global image classification results tend to be less noisy than local ones.
We will use them to constrain the combinations of semantic labels we are likely to encounter during segmentation.
We showed last year how a tractable inference technique can be devised for this labeling problem (our PASCAL 2009 entry).
This year we also show how mid-level context can be incorporated in the form of object detections.
We also show how position priors can be similarly incorporated into the framework to provide class-specific location information.
Finally, we devised a stochastic steepest ascent technique for optimizing the many parameters in a class-specific way.
13. Action recognition: driven by data limitations
Initial experiments confirmed our intuition about the limitations of the data.
Structural learning: sampling of pose space not dense enough.
Latent SVM: object interactions under-sampled as well.
Multiple kernel learning: converges to simple selection.
From a very early stage, we decided to treat action recognition as an image classification problem.
We exploit the small dataset size by performing extensive cross validation.
Features are one of our strong points, and we had to get the feature pipeline running for Classification in any case.
14. HCRFs for labeling problems
We represent our segmentation problem as a graph G = (V, E).
V is used for indexing random variables, and E is the set of undirected edges representing compatibility relationships between random variables.
X = {Xi} denotes the set of random variables, or nodes, for i ∈ V.
An energy function will be defined over graphical configurations of random variables.
By the Hammersley-Clifford theorem, the probability of a configuration x = {xi} can be written as the negative exponential of an energy function E(x) = ∑c∈C ϕc(xc), where ϕc is the potential function of clique c ∈ C.
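The clique decomposition above can be made concrete with a small Python sketch. The graph, label set, and potential values below are invented purely for illustration; the sketch just evaluates E(x) as a sum over clique potentials and the corresponding unnormalized Hammersley-Clifford probability exp(−E(x)).

```python
import math

def energy(x, potentials):
    """E(x) = sum over cliques c of phi_c(x_c): `potentials` maps a clique
    (a tuple of node indices) to its potential function."""
    return sum(phi(tuple(x[i] for i in c)) for c, phi in potentials.items())

def unnormalized_prob(x, potentials):
    """Hammersley-Clifford: P(x) is proportional to exp(-E(x))."""
    return math.exp(-energy(x, potentials))

# Toy model: two binary nodes, two unary cliques, one Potts-style pairwise clique.
potentials = {
    (0,): lambda xc: 0.25 if xc[0] == 1 else 0.0,
    (1,): lambda xc: 0.5 if xc[0] == 0 else 0.0,
    (0, 1): lambda xc: 0.0 if xc[0] == xc[1] else 1.0,  # penalize disagreement
}
print(energy([1, 1], potentials))  # 0.25
print(energy([1, 0], potentials))  # 0.25 + 0.5 + 1.0 = 1.75
```

Low-energy configurations correspond to high-probability labelings, which is why the MAP labeling is found by minimizing E.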
15. Consistency potentials for labeling problems
The energy function of G can be written as:
E(x) = ∑i∈V φ(xi) + ∑(i,j)∈EL ψL(xi, xj) + ∑(i,g)∈EG ψG(xi, xg).
The unary term φ(xi) depends on a single probability P(Xi = xi | Oi), where Oi is the observation that affects Xi in the model.
The smoothness potential ψL(xi, xj) determines the pairwise relationship between two local nodes.
The consistency potential ψG(xi, xg) expresses the dependency between local nodes and a global node.
The maximum a posteriori (MAP) estimate of the optimal labeling is:
x∗ = arg minx E(x).
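A toy numerical version of this three-term energy (all probabilities and penalty weights below are invented for illustration) shows how the MAP labeling x∗ = argmin E(x) trades off unary evidence, smoothness, and consistency with the global node; here it is recovered by exhaustive search over the tiny configuration space.

```python
import math
from itertools import product

LABELS = [0, 1]                       # toy two-label problem
P_LOCAL = [[0.8, 0.2], [0.4, 0.6]]    # invented P(Xi = xi | Oi) for two nodes

def phi(i, xi):                       # unary term: negative log-likelihood
    return -math.log(P_LOCAL[i][xi])

def psi_L(xi, xj):                    # smoothness: Potts penalty on a local edge
    return 0.0 if xi == xj else 0.7

def psi_G(xi, xg):                    # consistency with the global node
    return 0.0 if xi == xg else 1.0

def E(x, xg):
    return (sum(phi(i, xi) for i, xi in enumerate(x))
            + psi_L(x[0], x[1])                   # the single local edge (0, 1)
            + sum(psi_G(xi, xg) for xi in x))     # edges to the global node

# MAP estimate by brute force: x* = argmin E(x)
x_star = min(((x, g) for x in product(LABELS, repeat=2) for g in LABELS),
             key=lambda cfg: E(*cfg))
print(x_star)  # ((0, 0), 0): smoothness and consistency override node 1's weak preference
```

Brute force is only viable for toy graphs; the talk's later slides use α-expansion graph cuts for real images.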
16. HCRF models of image segmentation
[Figure: HCRF models with smoothness-free, Potts, and Robust P^N potentials — (Shotton et al, CVPR2008), (Plath et al, ICML2009), (Ladicky et al, ICCV2009)]
Colored nodes represent (hidden) semantic labels.
Dark nodes represent image measurements.
Red edges represent penalties imposed by the potential.
17. Different features for discrimination
The previously mentioned approaches all try to make global distinctions using local information:
either by voting of local observations (Potts),
or by penalizing rampantly discordant local label assignments (Robust P^N).
None of these techniques try to exploit truly global information to constrain local labels.
And none incorporate the notion of encoding combinations of primitive node labels at the global level.
18. The harmony potential: selective subsets
Only labels that do not agree with the subset are penalized.
This can represent more diverse combinations.
19. The harmony potential: overview
20. Ranked subsampling of P(L)
We can do this using the following posterior:
P(ℓ ⊆ x∗g | O) ∝ P(ℓ ⊆ x∗g) P(O | ℓ ⊆ x∗g).
This allows us to effectively rank possible global node labels, and thus to prioritize candidates in the search for the optimal label x∗g.
P(ℓ ⊆ x∗g | O) establishes an order on subsets of the (unknown) optimal labeling of the global node x∗g that guides the consideration of global labels.
We may not be able to exhaustively consider all labels in P(L), but at least we consider the most likely candidates for x∗g.
And image classification can give us an estimate of this posterior.
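One minimal way to realize such a ranking, assuming (purely for illustration, not as the actual model) that per-class classifier outputs can be treated as independent probabilities, is to score every candidate label subset and keep only the top K. With 20 classes the 2^20 subsets cannot all be used during inference, which is exactly why only the highest-ranked candidates are retained.

```python
from itertools import combinations

def rank_label_subsets(class_probs, top_k):
    """Score each non-empty label subset S by an (assumed independent) posterior:
    score(S) = prod_{c in S} p_c * prod_{c not in S} (1 - p_c), then keep the top K."""
    classes = list(class_probs)
    scored = []
    for r in range(1, len(classes) + 1):
        for S in combinations(classes, r):
            score = 1.0
            for c in classes:
                score *= class_probs[c] if c in S else 1.0 - class_probs[c]
            scored.append((S, score))
    scored.sort(key=lambda t: -t[1])
    return scored[:top_k]

# Invented image-classifier posteriors for four VOC classes.
probs = {'person': 0.9, 'horse': 0.7, 'sofa': 0.1, 'tvmonitor': 0.05}
for S, s in rank_label_subsets(probs, 3):
    print(S, s)
```

The top-ranked subset here is ('person', 'horse'), matching the intuition that the global node should consider co-occurring high-confidence classes first.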
21. PASCAL 2010: pushing the limit
The previous slides describe the approach used for our PASCAL 2009 submission.
The discriminative model was based only on SVMs trained to discriminate object classes from their own backgrounds.
Starting with the harmony potential approach, this year we concentrated on adding cues derived from different levels of mid-level context.
We found the HCRF model with harmony potential to be very useful for performing this fusion.
Our hypothesis at the end of the 2009 competition was that detection would be essential for pushing forward the state-of-the-art.
22. PASCAL 2010: fusing across scales
1 FG/BG: 20 SVMs trained to discriminate classes from their own background. The same discriminative model used last year, essential for localizing object boundaries.
2 CLASS: 20 SVMs trained to discriminate each object class from the other object classes. Essential for distinguishing objects with similar backgrounds (e.g. cows from sheep, birds from planes). Incorporated directly into the unary potential.
3 LOC: 20 class-specific location priors, computed from ground truth segmentations by simple spatial averaging. A form of top-down mid-level context.
4 OBJ: 20 class-specific object detectors [Felzenszwalb 2010], converted to superpixel scores by selecting the highest scoring detection intersecting each pixel of the superpixel. A type of bottom-up mid-level context.
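The OBJ cue in item 4 could be sketched roughly as follows. The data layout (a superpixel as a list of pixel coordinates, a detection as a box, class index, and score) is hypothetical, not the actual implementation; the point is just the max-over-intersecting-detections rule.

```python
def detection_cue(superpixel_pixels, detections, num_classes):
    """For each class, keep the score of the highest-scoring detection whose
    bounding box contains at least one pixel of the superpixel."""
    scores = [float('-inf')] * num_classes
    for (x0, y0, x1, y1), cls, score in detections:
        hits = any(x0 <= x <= x1 and y0 <= y <= y1 for (x, y) in superpixel_pixels)
        if hits and score > scores[cls]:
            scores[cls] = score
    return scores

sp = [(3, 3), (4, 3), (4, 4)]                 # pixels of one superpixel
dets = [((0, 0, 5, 5), 0, 0.8),               # class 0 box covering the superpixel
        ((0, 0, 5, 5), 0, 0.3),               # weaker detection, same class
        ((10, 10, 20, 20), 1, 0.9)]           # class 1 box that misses it
print(detection_cue(sp, dets, 2))             # [0.8, -inf]
```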
23. PASCAL 2010: learning unary potentials
We compute the unary potential by weighting the classification scores {si(k, xi)}k∈F through a sigmoid function. The unary potential becomes:
φL(xi) = −µL Ki ∑k∈F log [1 / (1 + exp(fi(k, xi)))]
fi(k, xi) = a(k, xi) si(k, xi) + b(k, xi)
µL is the weighting factor of the local unary potential, and Ki normalizes over the number of pixels inside the superpixel.
We have two sigmoid parameters for each class/cue pair: a(k, xi) and b(k, xi).
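A direct transcription of these two formulas, with invented cue scores and sigmoid parameters; in the real system a(k, xi) and b(k, xi) are learned per class/cue pair by cross validation, not fixed as below.

```python
import math

def unary_potential(scores, a, b, mu_L, K_i):
    """phi_L(x_i) = -mu_L * K_i * sum_k log(1 / (1 + exp(f_k)))
    with f_k = a[k] * s[k] + b[k]  (per-cue sigmoid calibration)."""
    total = 0.0
    for k, s in scores.items():
        f = a[k] * s + b[k]
        total += math.log(1.0 / (1.0 + math.exp(f)))
    return -mu_L * K_i * total

# Invented raw scores for the four cues on one superpixel/label pair.
cues = {'FG/BG': 1.2, 'CLASS': -0.4, 'LOC': 0.1, 'OBJ': 0.7}
a = {k: -2.0 for k in cues}   # hypothetical learned slope per cue
b = {k: 0.0 for k in cues}    # hypothetical learned offset per cue
val = unary_potential(cues, a, b, mu_L=1.0, K_i=0.01)
print(val)
```

With a negative slope, high raw scores map to low energy, so the labeling the cues agree on becomes cheap for the MAP inference to choose.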
24. Datasets
We have evaluated the harmony potential approach on two standard, publicly available datasets.
The PASCAL VOC 2010 Segmentation Challenge dataset contains 2250 color images of 20 different semantic classes.
This set is split into 750 images for training, 750 images for testing, and 750 for validation.
The Microsoft MSRC-21 dataset contains 591 color images of 21 object classes.
We do our own splits for cross-validation on MSRC-21.
25. Unsupervised segmentation
Images are first over-segmented with quick-shift to derive superpixels [Fulkerson, ICCV 2009].
This preserves object boundaries while simplifying the representation.
Working at the superpixel level reduces the number of nodes in the CRF by 10² to 10⁵ per image.
26. Local classification scores: P(Xi = xi | Oi)
We extract patches with 50% overlap on a regular grid at several resolutions (12, 24, 36 and 48 pixels in diameter).
Patches are described with SIFT, color and, for MSRC-21, location features.
A vocabulary is constructed using k-means to quantize to 1000 SIFT words and 400 color words.
An SVM classifier using an intersection kernel is built for each semantic category.
A similar number of positive and negative examples are used: around 8,000 superpixel samples in total for MSRC-21, and 20,000 for VOC 2010 for each class.
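The vocabulary-quantization step of this pipeline can be sketched as follows; the toy 2-D "descriptors" and three-word vocabulary below stand in for the real 128-D SIFT descriptors and the 1000-word k-means vocabulary.

```python
def quantize(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word (k-means centroid)
    and return a normalized bag-of-words histogram."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(d, vocabulary[j])))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist]

vocab = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]     # toy 3-word vocabulary
descs = [(0.1, 0.1), (0.9, 1.0), (0.2, 0.0), (0.1, 0.9)]
print(quantize(descs, vocab))   # [0.5, 0.25, 0.25]
```

The resulting histograms are what the per-class intersection-kernel SVMs are trained on.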
27. Global potential and general approach
For the PASCAL 2010 dataset we use our entry to the 2010 VOC Classification Challenge [Khan, IJCV2010 (submitted)].
It uses a bag-of-words representation based on SIFT and color SIFT, plus spatial pyramids and color attention [Khan, ICCV 2009].
An SVM classifier with a χ² kernel is trained for each semantic category in the dataset.
The FG/BG and CLASS cues are computed by training a discriminative model using an SVM with histogram intersection kernel.
Except for the additional cues and optimization strategy, the architecture is the same as our approach described at CVPR [Gonfaus, CVPR2010].
28. Learning the HCRF parameters
We found it essential to train the per-class sigmoid parameters through cross validation.
Classification scores are learned independently, are unbalanced, and are effectively incomparable in many cases.
The sigmoid functions weight the importance of each cue for each class.
In addition to these (180) sigmoid parameters, we also must learn the weighting factors for each potential.
We use a stochastic steepest ascent technique to optimize these parameters on a validation set.
In each step we randomly generate new instances of the parameters.
New parameter instances are generated using a Gibbs-like sampling strategy.
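In the spirit of the optimizer described here (the objective below is a stand-in for validation mAP, and the step size and schedule are invented), a Gibbs-like random search perturbs one randomly chosen parameter at a time and keeps the change only when the validation objective improves:

```python
import random

def stochastic_ascent(params, objective, steps=200, sigma=0.1, seed=0):
    """Hill-climbing sketch: resample one coordinate at a time (Gibbs-like)
    and accept the new parameter instance if the objective improves."""
    rng = random.Random(seed)
    best = dict(params)
    best_score = objective(best)
    for _ in range(steps):
        cand = dict(best)
        key = rng.choice(list(cand))          # pick one parameter to perturb
        cand[key] += rng.gauss(0.0, sigma)
        score = objective(cand)
        if score > best_score:                # keep only improving moves
            best, best_score = cand, score
    return best, best_score

# Toy objective with its maximum (0) at a=1, b=-2, standing in for validation mAP.
obj = lambda p: -((p['a'] - 1.0) ** 2 + (p['b'] + 2.0) ** 2)
best, score = stochastic_ascent({'a': 0.0, 'b': 0.0}, obj, steps=500)
print(best, score)
```

Because each evaluation is a full validation run in the real system, this kind of derivative-free, one-coordinate-at-a-time search is a pragmatic fit for the 180+ sigmoid and weighting parameters.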
30. Qualitative results: MSRC-21
31. Quantitative results: MSRC-21
MSRC-21 contains more multi-class images than PASCAL.
Our performance demonstrates the benefits of incorporating global scale when making local decisions.
32. Qualitative results: PASCAL 2010
33. Quantitative results: PASCAL 2010
FG/BG shows the performance of our baseline (PASCAL 2009) approach.
At the top, performance on the validation set (i.e. how well we thought we were doing).
Image tags indicate how well the technique can perform with perfect global information.
34. The cost of segmentation
The optimal MAP label configuration x∗ is inferred using α-expansion graph cuts [Kolmogorov, PAMI2004].
The global node uses the 100 most probable label subsets obtained from ranked subsampling.
[Plot: mAP on MSRC-21 and on PASCAL VOC 2010 as a function of the number of labels selected (1–200)]
35. Qualitative results: PASCAL 2010 failures
Context is sometimes weighted too much.
When the global classifier fails, little can be done.
36. Every little bit helps
37. A photo finish
mAP on PASCAL VOC 2010 per cue combination:
FG-BG: 33.9
CLASS: 23.4
LOC: 20.1
OBJ: 26.2
FG-BG + CLASS: 36.6
All: 40.4
[Plot: mAP on PASCAL VOC 2010 vs. number of optimization iterations (0–3000)]
The final results were tough to call between BONN and CVC.
In the end, fusion over many scales and per-class, per-feature parameter optimization won.
38. The action recognition taster
Images were collected from Flickr using action queries. A set of nine actions was chosen in the end.
They are disjoint from the main challenge dataset.
Only a subset of people are annotated (bounding box + action).
This subset is labelled with exactly one action class.
Important point: we don’t have to solve the detection problem.
Most action classes in the challenge contain either large variation in scale or large variations in pose (or both).
40. Grouplets and poselets
Two state-of-the-art approaches to action recognition in still images: the grouplets of Fei-Fei Li [Yao et al, CVPR2010], and the latent poses of Greg Mori [Yang et al, CVPR2010].
41. Treat it like image classification
Initial experiments confirmed our intuition about the limitations of the data.
Structural learning: sampling of pose space not dense enough.
Latent SVM: complexity of object interactions problematic.
Multiple kernel learning: converges to simple selection.
State-of-the-art techniques rely on learning complex structural models of pose variations over many training examples.
From a very early stage, we decided to treat action recognition as an image classification problem.
We exploit the small dataset size by performing extensive cross validation.
42. The classification pipeline
43. Action recognition: features
SIFT, color SIFT (normalized R/G and opponent), self-similarity, SURF, PHOG (good for capturing pose), and color attention (focuses on interesting color features).
Sparse and dense variations of most of these.
Plus a range of pyramid configurations (1, 2 × 2, 3 × 3, 4 × 4).
Object detectors are also incorporated using a simple occurrence histogram [Felzenszwalb 2010].
The goal was to incorporate all of this into a BoVW classifier and push the limits of what is possible using classical BoW on actions.
44. Introduction The data
Harmony potential 2.0: fusing across scale State-of-the-art
Action recognition Our approach
Discussion Results
Action recognition: contextual pyramids
Context was also important for most object classes.
We used a type of foreground/background pyramid decomposition
that split features into object or background.
The was done using a type of spatial soft-assign based on the
distance to the boundary of the object.
For some classes, we also assigned contextual object regions that
model the appearance of objects associated with them (the “horsy
box”).
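A minimal sketch of what such a distance-based spatial soft-assignment could look like (the signed-distance formulation and sigmoid weighting are assumptions for illustration, not the team's exact scheme):

```python
import numpy as np

def fg_bg_soft_weights(points, box, sigma=0.1):
    """Soft foreground/background split of local features around an object box.

    points: (N, 2) feature coordinates (x, y), normalized to [0, 1].
    box:    (x0, y0, x1, y1) object bounding box in the same coordinates.
    sigma:  softness of the transition at the box boundary.

    Returns (w_fg, w_bg): per-feature weights summing to 1, so each feature
    contributes softly to both the foreground and background histograms.
    """
    x0, y0, x1, y1 = box
    # Signed distance to the box boundary: positive inside, negative outside.
    dx = np.minimum(points[:, 0] - x0, x1 - points[:, 0])
    dy = np.minimum(points[:, 1] - y0, y1 - points[:, 1])
    signed_dist = np.minimum(dx, dy)
    # Sigmoid soft-assignment: deep inside -> ~1 (object), far outside -> ~0.
    w_fg = 1.0 / (1.0 + np.exp(-signed_dist / sigma))
    return w_fg, 1.0 - w_fg
```

Features near the boundary then split their mass between the object and background histograms rather than being assigned hard to one or the other.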
Action recognition: learning in the design space
In the end, after all of the combinatorics introduced by pyramids
and other variations, we had about 100 feature configurations in a
big pool.
Most attempts to automatically learn the parameters of these
features were total failures.
Except one: initial experiments with multiple kernel learning (MKL)
showed that it quickly converges towards class-specific feature
selection rather than mixing.
With such a small dataset, and a little heuristic trimming, we were
able to exhaustively explore a part of the design space.
This resulted in the best per-class feature combinations.
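The exhaustive per-class search could be sketched as below; the scoring callable stands in for per-class cross-validated average precision, and the names and trimming bound are illustrative assumptions:

```python
from itertools import combinations


def best_feature_combination(candidates, cv_score, max_size=3):
    """Exhaustively search small feature combinations, scored by cross-validation.

    candidates: list of feature-configuration names from the trimmed pool.
    cv_score:   callable mapping a tuple of feature names to a CV score
                (a stand-in here for per-class cross-validated AP).
    max_size:   heuristic cap on combination size to keep the search feasible.
    """
    best, best_score = None, float("-inf")
    for k in range(1, max_size + 1):
        for combo in combinations(candidates, k):
            s = cv_score(combo)
            if s > best_score:
                best, best_score = combo, s
    return best, best_score
```

With a pool of ~100 configurations this is only tractable after heuristic trimming and with small combination sizes, which matches the point that the small dataset made exhaustive exploration possible at all.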
Action recognition: classification
We experimented with a number of kernels (histogram
intersection, χ², bin-ratio distance).
There wasn’t a huge difference among these kernels.
In the end, we chose histogram intersection for our submission as
it appeared to generalize better.
In addition to over-fitting less, there are no parameters to tune and
it is very fast.
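The chosen kernel is simple enough to state in a few lines. A sketch (the vectorized broadcasting implementation is one possible way to compute it, and assumes modest matrix sizes):

```python
import numpy as np

def histogram_intersection_kernel(X, Y):
    """Histogram intersection kernel: K[i, j] = sum_d min(X[i, d], Y[j, d]).

    X: (n, d), Y: (m, d) -- rows are (typically L1-normalized) histograms.
    Parameter-free, and a valid Mercer kernel for non-negative features.
    """
    # Broadcast to (n, m, d) and reduce over the histogram dimension;
    # fine for modest n and m, though memory-heavy for large Gram matrices.
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)
```

The resulting Gram matrix can be fed to any SVM implementation that accepts precomputed kernels, which makes it easy to swap against χ² or bin-ratio alternatives during validation.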
Overall results: average precision
Per-class AP
Per technique median average precision
Qualitative results
When the horsy box and detectors fail, context dominates.
The classifier is still surprisingly robust.
Qualitative results
Some fine discriminations very difficult to make.
Probably difficult even for humans.
Qualitative results
People taking photos should be banned.
Classes with large pose variations were the most difficult.
Discussion: semantic image segmentation
The harmony potential works well for fusing global information into
local segmentations.
This year we showed that the harmony potential framework is also
appropriate for incorporating different types of mid-level cues.
Ranked sub-sampling, driven by the same posterior as used to
define the global potential function, renders the optimization
problem tractable.
Most useful when multiple semantic classes co-occur frequently.
Per-class learning of parameters essential (about +5% in final
results).
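The ranked sub-sampling idea could be sketched as follows; the independence assumption (scoring a label subset by the product of per-class posteriors) and all names here are illustrative, not the exact potential used:

```python
from itertools import combinations
import numpy as np

def ranked_label_subsets(posteriors, labels, k=10, max_labels=3):
    """Rank candidate label subsets by a global classifier posterior.

    Instead of optimizing over all 2^L label combinations, score each
    small subset by the product of its per-class posteriors (a simplifying
    independence assumption) and keep only the top-k as candidate global
    states, rendering the optimization tractable.
    """
    scored = []
    for size in range(1, max_labels + 1):
        for subset in combinations(range(len(labels)), size):
            score = float(np.prod([posteriors[i] for i in subset]))
            scored.append((score, [labels[i] for i in subset]))
    # Highest-posterior subsets first; only these enter the optimization.
    scored.sort(key=lambda t: -t[0])
    return scored[:k]
```

This is most useful exactly when multiple semantic classes co-occur frequently, since the top-ranked subsets then capture the plausible joint labelings.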
Discussion: action recognition
This year’s taster challenge on action recognition was little more
than a toy.
However, we have demonstrated what is possible using proven
techniques from image classification.
We feel that object context, in particular object interaction context,
is the way forward.
The PASCAL dataset is the right direction to go (more general),
but we need more samples.
The future: segmentation
Semantic image segmentation has come a long way, but still has a
long way to go.
It is becoming a mainstream event in PASCAL.
This year segmentation arrived at a sort of three-way détente
between CVC (winner 2010), BONN (winner 2009), and OXFORD (best
paper award, ECCV 2010).
Each has its own approach, with its own advantages and
disadvantages.
Engineering can probably maximize results.
It is becoming mature, and we can begin thinking about what new
applications are enabled by such technologies.
The future: action recognition
It seems that action recognition in still images is a popular
challenge.
The PASCAL organizers are keen to promote it for the future.
The focus will remain on still images, but perhaps with more
emphasis on incorporating user interaction as well.
It seems that the community is becoming more interested in the
“alternative” PASCAL challenges.
The multimedia community probably has an important role to play
here.