This document summarizes a multimodal approach to video genre classification. Audio features (such as block-level spectral patterns) and visual features (such as color histograms) are extracted from the video, and text features are derived from automatic speech recognition (ASR) transcripts and metadata. Machine learning classifiers are trained on these features, alone and in combination. On the development set, the best average F-score, 68%, is obtained for genre tagging with text from ASR and metadata, and increasing the number of modalities improves accuracy over single-modality approaches.
ARF @ MediaEval 2012: Multimodal Video Classification
1. ~ Multimodal Video Classification ~
ARF (Austria-Romania-France) team
Bogdan IONESCU*(1,3), bionescu@imag.pub.ro
Ionuț MIRONICĂ(1), imironica@imag.pub.ro
Klaus SEYERLEHNER(2), music@cp.jku.at
Peter KNEES(2), peter.knees@jku.at
Jan SCHLÜTER(4), jan.schlueter@ofai.at
Markus SCHEDL(2), markus.schedl@jku.at
Horia CUCU(1), horia.cucu@upb.ro
Andi BUZO(1), andi.buzo@upb.ro
Patrick LAMBERT(3), patrick.lambert@univ-savoie.fr
*this work was partially supported under European Structural Funds EXCEL POSDRU/89/1.5/S/62557.
(1) University POLITEHNICA of Bucharest
(4) Austrian Research Institute for Artificial Intelligence
2. Presentation outline
• The approach
• Video content description
• Experimental results
• Conclusions and future work
MediaEval - Pisa, Italy, 4-5 October 2012
3. The approach
> challenge: find a way to assign (genre) tags to unknown videos;
> approach: machine learning paradigm;
[diagram: part of the video database is labeled with genres (web, food, autos, …); a classifier is trained on the labeled data and then labels the unlabeled data, yielding a tagged video database]
4. The approach: classification
> the entire process relies on the concept of “similarity” computed between content annotations (numeric features);
> this year's focus is on:
objective 1: go (truly) multimodal: visual, audio, text;
objective 2: test a broad range of classifiers and descriptor combinations.
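The idea of "similarity" between numeric features can be illustrated with a minimal sketch. It assumes cosine similarity and a 1-nearest-neighbour rule, neither of which is stated on the slide; the feature values, genre labels, and function names are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def tag_video(features, labeled):
    """Return the genre of the most similar labeled video (1-NN, an assumption)."""
    best_genre, _ = max(
        ((genre, cosine_similarity(features, vec)) for genre, vec in labeled),
        key=lambda pair: pair[1],
    )
    return best_genre

# Hypothetical labeled descriptors (three dimensions, made-up values).
labeled_videos = [
    ("autos", [0.9, 0.1, 0.0]),
    ("food", [0.1, 0.8, 0.2]),
    ("web", [0.0, 0.2, 0.9]),
]
predicted = tag_video([0.8, 0.2, 0.1], labeled_videos)
```

The classifiers actually tested in the experiments are more varied; this only shows how a similarity measure turns labeled descriptors into a tag for an unknown video.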
5. Video content description - audio
block-level audio features (also capture local temporal information)
(blocks e.g. 50% overlapping; summarized by average, median, variance, ...)
• Spectral Pattern ~ soundtrack’s timbre;
• delta Spectral Pattern ~ strength of onsets;
• variance delta Spectral Pattern ~ variation of the onset strength;
• Logarithmic Fluctuation Pattern ~ rhythmic aspects;
• Correlation Pattern ~ loudness changes;
• Spectral Contrast Pattern ~ ”toneness”;
• Local Single Gaussian model ~ timbral;
• George Tzanetakis model ~ timbral;
[Klaus Seyerlehner et al., MIREX’11, USA]
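The block-level idea (overlapping blocks of frames, summarized by average, median, and variance) can be sketched as follows. This is a hedged illustration only: the block size, the frame values, and all function names are hypothetical, and the per-block descriptor used here (a plain per-band mean) is a stand-in, not the Spectral Pattern of the cited work.

```python
from statistics import mean, median, pvariance

def split_into_blocks(frames, block_size, overlap=0.5):
    """Cut a sequence of frames into fixed-size blocks with fractional overlap."""
    hop = max(1, int(block_size * (1.0 - overlap)))
    return [
        frames[start:start + block_size]
        for start in range(0, len(frames) - block_size + 1, hop)
    ]

def block_descriptor(block):
    """Stand-in per-block descriptor: mean of each band over the block."""
    return [mean(column) for column in zip(*block)]

def summarize_blocks(descriptors):
    """Summarize the block descriptors over the whole track, per dimension."""
    columns = list(zip(*descriptors))
    return {
        "average": [mean(c) for c in columns],
        "median": [median(c) for c in columns],
        "variance": [pvariance(c) for c in columns],
    }

# Eight frames with two spectral bands each (made-up values).
frames = [[float(i), float(2 * i)] for i in range(8)]
blocks = split_into_blocks(frames, block_size=4, overlap=0.5)
summary = summarize_blocks([block_descriptor(b) for b in blocks])
```

With a block size of 4 and 50% overlap, consecutive blocks share two frames, so local temporal structure inside each block is kept before the track-level summary is taken.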
6. Video content description - audio
standard audio features (audio frame-based)
• Zero-Crossing Rate,
• Linear Predictive Coefficients,
• Line Spectral Pairs,
• Mel-Frequency Cepstral Coefficients,
• spectral centroid, flux, rolloff, and kurtosis,
+ variance of each feature over a certain window.
global feature = mean & variance of the frame features f1 f2 … fn over time
[B. Mathieu et al., Yaafe toolbox, ISMIR’10, Netherlands]
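As a hedged illustration of the frame-based scheme, the sketch below computes one standard feature (zero-crossing rate) per frame and aggregates the per-frame values into a global mean and variance. The frames are made-up sample values and the function names are hypothetical; the slides use the Yaafe toolbox rather than code like this.

```python
from statistics import mean, pvariance

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def global_feature(frame_features):
    """Aggregate per-frame values f1..fn into (mean, variance)."""
    return mean(frame_features), pvariance(frame_features)

frames = [
    [1.0, -1.0, 1.0, -1.0, 1.0],  # sign alternates at every step
    [1.0, 1.0, 1.0, 1.0, 1.0],    # no sign change
    [1.0, 1.0, -1.0, -1.0, 1.0],  # two sign changes
]
zcr_per_frame = [zero_crossing_rate(f) for f in frames]
zcr_mean, zcr_variance = global_feature(zcr_per_frame)
```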
7. Video content description - visual
MPEG-7 & color/texture descriptors (visual frame-based)
• Local Binary Pattern,
• Autocorrelogram,
• Color Coherence Vector,
• Color Layout Pattern,
• Edge Histogram,
• Classic color histogram,
• Scalable Color Descriptor,
• Color moments.
global feature = mean & dispersion & skewness & kurtosis & median & root mean square of the frame features f1 f2 … fn over time
[OpenCV toolbox, http://opencv.willowgarage.com]
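The temporal aggregation of a frame-level visual descriptor into a global feature can be sketched as below. This is a minimal sketch under stated assumptions: "dispersion" is taken to mean the population standard deviation, kurtosis is the non-excess fourth standardized moment, and the input values and function name are hypothetical.

```python
import math
from statistics import mean, median

def temporal_statistics(values):
    """Mean, dispersion (std. dev., assumed), skewness, kurtosis, median, RMS."""
    n = len(values)
    mu = mean(values)
    variance = sum((v - mu) ** 2 for v in values) / n
    sigma = math.sqrt(variance)
    if sigma == 0.0:
        # Constant sequence: higher moments are undefined, report zeros.
        skewness, kurtosis = 0.0, 0.0
    else:
        skewness = sum((v - mu) ** 3 for v in values) / (n * sigma ** 3)
        kurtosis = sum((v - mu) ** 4 for v in values) / (n * sigma ** 4)
    return {
        "mean": mu,
        "dispersion": sigma,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "median": median(values),
        "rms": math.sqrt(sum(v * v for v in values) / n),
    }

# One descriptor dimension observed over five frames (made-up values).
stats = temporal_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
```

Applying this to every dimension of a frame descriptor yields a fixed-length vector per video, whatever the number of frames.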
8. Video content description - visual
feature descriptors (visual frame-based)
• Histogram of oriented Gradients (HoG) ~ counts occurrences of gradient orientation in localized portions of an image (20° per bin),
• Harris corner detector,
• Speeded Up Robust Feature (SURF).
[figure: feature points (e.g. Harris); image source http://www.ifp.illinois.edu/~yuhuang]
[OpenCV toolbox, http://opencv.willowgarage.com]
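The orientation counting behind HoG can be sketched for a single image cell. The sketch assumes unsigned orientations in [0°, 180°), which gives 9 bins at 20° per bin, and central-difference gradients; both are assumptions, as are the function name and the toy image. It counts occurrences as the slide describes, whereas the usual HoG formulation weights each vote by gradient magnitude.

```python
import math

def orientation_histogram(image, bin_width_deg=20):
    """Count gradient orientations (unsigned, 0-180 degrees) in one cell."""
    n_bins = 180 // bin_width_deg
    hist = [0] * n_bins
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences along x and y.
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            if gx == 0 and gy == 0:
                continue  # flat region, no orientation to count
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width_deg) % n_bins] += 1
    return hist

# 4x4 cell with a vertical edge: intensity changes only along x.
cell = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
hist = orientation_histogram(cell)
```

All four interior pixels of the toy cell have a purely horizontal gradient, so every count falls into the first bin.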
9. Video content description - text
TF-IDF descriptors (Term Frequency-Inverse Document Frequency)
> text sources: ASR and metadata,
1. remove XML markup,
2. remove terms below the 5% percentile of the frequency distribution,
3. select the term corpus: for each genre class, retain the m terms (e.g. m = 150 for ASR and 20 for metadata) with the highest χ² values that occur more frequently in that class than in the complement classes,
4. represent each document by its TF-IDF values over the selected terms.
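Steps 3 and 4 can be sketched as follows. This is a hedged illustration: it uses the standard 2x2 chi-square statistic on term presence, a length-normalized term frequency, and a plain logarithmic IDF, which are common choices but not confirmed by the slides. The tiny corpus, the value of m, and all function names are hypothetical, and at least two genres are assumed.

```python
import math
from collections import Counter

def chi_square(term, genre, docs):
    """Chi-square statistic of term presence vs. genre membership (2x2 table)."""
    a = sum(1 for g, words in docs if g == genre and term in words)
    b = sum(1 for g, words in docs if g != genre and term in words)
    c = sum(1 for g, words in docs if g == genre and term not in words)
    d = sum(1 for g, words in docs if g != genre and term not in words)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_terms(genre, docs, m):
    """Keep the m highest-chi-square terms that are more frequent in the genre."""
    in_genre = [words for g, words in docs if g == genre]
    out_genre = [words for g, words in docs if g != genre]
    vocabulary = {t for _, words in docs for t in words}
    candidates = [
        t for t in vocabulary
        if sum(t in w for w in in_genre) / len(in_genre)
        > sum(t in w for w in out_genre) / len(out_genre)
    ]
    # Sort by descending chi-square, then alphabetically for determinism.
    candidates.sort(key=lambda t: (-chi_square(t, genre, docs), t))
    return candidates[:m]

def tf_idf(words, corpus, docs):
    """TF-IDF vector of one document over the selected term corpus."""
    counts = Counter(words)
    vector = []
    for term in corpus:
        df = sum(1 for _, w in docs if term in w)
        tf = counts[term] / len(words)
        vector.append(tf * math.log(len(docs) / df) if df else 0.0)
    return vector

# Made-up (genre, tokens) pairs standing in for ASR transcripts.
docs = [
    ("autos", ["car", "engine", "road", "the"]),
    ("autos", ["car", "race", "the"]),
    ("food", ["recipe", "oven", "the"]),
    ("food", ["recipe", "taste", "the"]),
]
autos_terms = select_terms("autos", docs, m=1)
vector = tf_idf(docs[0][1], ["car", "the"], docs)
```

In this toy corpus "the" appears in every document, so it is never selected and its IDF is zero, while "car" separates the autos class perfectly.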
10. Experimental results: devset (5,127 seq.)
> classifiers from Weka (Bayes, lazy, functional, trees, etc.),
> cross-validation (train 50% – test 50%),
avg. Fscore (over all genres)
- visual descriptors achieve 30% ± 10%,
- using more visual descriptors is not more accurate than using a few,
- best combination: LBP + CCV + histogram (Fscore = 41.2%).
[Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
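For reference, an average F-score over all genres can be computed as in the sketch below. It assumes an unweighted (macro) average of per-genre F1 scores; the slides do not say whether the average is weighted by class size. The labels and predictions are made up.

```python
def f_score(true_labels, predicted_labels, genre):
    """F1 score of one genre from paired true and predicted labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == genre and p == genre)
    fp = sum(1 for t, p in pairs if t != genre and p == genre)
    fn = sum(1 for t, p in pairs if t == genre and p != genre)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def average_f_score(true_labels, predicted_labels):
    """Unweighted mean of the per-genre F1 scores (an assumption)."""
    genres = sorted(set(true_labels))
    return sum(
        f_score(true_labels, predicted_labels, g) for g in genres
    ) / len(genres)

true_labels = ["autos", "autos", "food", "food", "web", "web"]
predicted = ["autos", "food", "food", "food", "web", "autos"]
avg = average_f_score(true_labels, predicted)
```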
11. Experimental results: devset (5,127 seq.)
> cross-validation (train 50% – test 50%),
avg. Fscore (over all genres)
- audio is still better than visual (improvement of ~6%),
- the proposed block-based audio features are better than the standard ones (by ~10%).
[Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
12. Experimental results: devset (5,127 seq.)
> cross-validation (train 50% – test 50%),
avg. Fscore (over all genres)
- ASR from LIMSI is more representative than ASR from LIUM (by ~3%),
- best performance: ASR LIMSI + metadata (Fscore = 68%).
[Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
13. Experimental results: devset (5,127 seq.)
> cross-validation (train 50% – test 50%),
avg. Fscore (over all genres)
- among the automatic descriptors, audio-visual is close to text (ASR),
- increasing the number of modalities increases the performance.
[Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
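One simple way to combine modalities is early fusion, where the descriptors of each modality are concatenated into a single vector before classification. The sketch below assumes this scheme together with per-modality L2 normalization; the slides do not state how the modalities were actually combined, and the vectors and function names are made up.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors stay zero)."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return [0.0 for _ in vector]
    return [v / norm for v in vector]

def fuse(*modalities):
    """Early fusion (assumed): normalize each modality, then concatenate."""
    fused = []
    for vector in modalities:
        fused.extend(l2_normalize(vector))
    return fused

audio = [3.0, 4.0]         # hypothetical audio descriptor
visual = [0.0, 5.0, 0.0]   # hypothetical visual descriptor
fused = fuse(audio, visual)
```

Normalizing each modality separately keeps a descriptor with large raw values from dominating the fused vector.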
14. Experimental results: official runs (9,550 seq.)
> train on devset, test on testset (SVM linear),
Run1: LBP + CCV + hist + audio block-based
Run2: TF-IDF on ASR LIMSI
Run3: audio block-based + LBP + CCV + hist + TF-IDF on ASR LIMSI
Run4: audio block-based
Run5: TF-IDF on metadata + ASR LIMSI
[chart: reference levels from MediaEval 2011, MAP 12% and MAP 10.3%]
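The official runs are scored with MAP (mean average precision). A minimal sketch of the usual definition follows; it assumes that every relevant video appears somewhere in the ranked list of its genre, and the rankings are made up rather than taken from the runs.

```python
def average_precision(ranked_relevance):
    """AP of one ranked list; entries are True where the video has the genre."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the per-genre average precisions."""
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)

# Hypothetical ranked results per genre, best match first.
rankings = {
    "autos": [True, False, True, False],
    "food": [False, True, False, False],
}
map_score = mean_average_precision(rankings)
```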
15. Experimental results: official runs (9,550 seq.)
> genre MAP for Run 5 (TF-IDF on ASR + metadata) and Run 1 (visual + audio),
[chart: per-genre MAP; values called out: autos 52%, gaming 71%, religion 71%, environment 50%]
16. Conclusions and future work
> classification adapts to the corpus: changing the corpus will change the performance;
> audio-visual descriptors are inherently limited;
> how far can we go with ad-hoc classification without human intervention?
> future work:
more elaborate late fusion?
pursue tests on the entire data set;
perhaps a more elaborate Bag-of-Visual-Words.
Acknowledgement: we would like to thank Prof. Fausto Giunchiglia and
Prof. Nicu Sebe from University of Trento for their support.
17. thank you !
any questions ?