Active Human-in-the-Loop Deep Learning for Cultural Metadata Enrichment by Natasa Sofou - EuropeanaTech Conference 2018





  1. Active Human-in-the-Loop Deep Learning for Cultural Metadata Enrichment. Europeana Tech 2018. Eddie Dervakos, Antonis Korkofigas, Natasa Sofou, Giorgos Stamou. Intelligent Systems Laboratory, National Technical University of Athens
  2. Context and Motivation
     Artificial Intelligence
     ● Process large amounts of data
     ● Deep Data
     ● Improve content discoverability through automatic information extraction for metadata enrichment
     Digital Cultural Heritage
     ● Huge amount of content in Digital Cultural Heritage
     ● Content is not easily accessible or discoverable
     ● Poor metadata quality
     ● Manual annotation cannot solve the problem: it is costly and does not scale
  3. Automatic metadata enrichment
     ● Use AI-enabled services to extract meaningful and useful information from the available content (additional metadata that does not yet exist).
     ● Information is usually extracted and identified through classification: content is assigned to predefined categories.
     Example: Vanitas Still Life with the Spinario, Pieter Claesz. Image courtesy of Rijksmuseum, Public Domain.
     ● Classify the image based on its type and format: Oil Painting
     ● Classify the image based on its content with respect to musical instruments: Violin
  4. AI and classification
     Diagram: Training Data → Feature Selection → Learning Algorithm (e.g. SVM, CNN, …) → AI Classifier; Data → AI Classifier → Output
     Information extraction: classify a data object into predefined categories
     Restrictions: training data availability (lack of annotated content)
  5. AI model: CNN for feature selection and classification
     Learning algorithm & classification: takes some input data (image, music recording, text, etc.) and outputs a class (a cat, a dog, etc.) or the class probabilities that best describe the data (class labels needed)
     Unsupervised Feature Extraction
     ● Feature extraction is necessary because data usually carry too much redundant and/or irrelevant information
     ● Many possible features to consider
     ● Start from a wide set of data features and arrive at a more restricted set with strong representational power
     Diagram: convolution + nonlinearity and max pooling (convolution and pooling layers), then fully connected layers, then an N binary classification vector (piano, sax, violin, cello, voice)
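The convolution, nonlinearity and max-pooling stages named on this slide can be illustrated with a minimal pure-Python sketch. It is only a one-dimensional toy, not the network used in the talk: the `signal`, the `kernel` values and the pooling size are hypothetical, and ReLU is assumed as the nonlinearity because the slide does not name one.

```python
from typing import List


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Valid (no padding) 1-D cross-correlation, the operation CNN layers call convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values: List[float]) -> List[float]:
    """Nonlinearity (assumed here): negative responses are set to zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


# Hypothetical input and filter, chosen only to make the arithmetic easy to follow.
signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, -2.0, 0.0, 1.0]
kernel = [-1.0, 0.0, 1.0]

features = max_pool1d(relu(conv1d(signal, kernel)), size=2)
print(features)  # [3.0, 0.0, 1.0]
```

Each stage shortens the representation (9 samples, then 7 filter responses, then 3 pooled features), which is the "wide set to restricted set" reduction the bullets describe.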
  6. Human-in-the-Loop
     Diagram: AI classifier; Confident → Output; Uncertain → Human Annotation → Active Learning
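One way to read the confident/uncertain split is as a threshold on the classifier's top class probability. The sketch below assumes that reading; the function name `route_predictions`, the 0.9 threshold and the record identifiers are hypothetical and do not come from the slides.

```python
def route_predictions(predictions, threshold=0.9):
    """Split items into an auto-accepted list and a human-review queue.

    predictions maps an item id to a dict of class probabilities.
    The threshold is an assumed value for illustration only.
    """
    confident, uncertain = [], []
    for item_id, probs in predictions.items():
        label = max(probs, key=probs.get)  # most probable class
        if probs[label] >= threshold:
            confident.append((item_id, label))  # goes straight to the output
        else:
            uncertain.append(item_id)  # goes to a human annotator
    return confident, uncertain


# Hypothetical classifier outputs for two recordings.
predictions = {
    "rec-001": {"violin": 0.97, "cello": 0.02, "piano": 0.01},
    "rec-002": {"violin": 0.48, "cello": 0.45, "piano": 0.07},
}
confident, uncertain = route_predictions(predictions)
print(confident)  # [('rec-001', 'violin')]
print(uncertain)  # ['rec-002']
```

In practice the threshold would be tuned against the cost of human annotation; a lower value sends fewer items to annotators at the price of more automatic errors.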
  7. HITL: Active learning approach
     Diagram: AI model; labelled training set L; unlabelled pool of data U; human annotator; steps "learn a model" and "select queries"
     Given an unlabelled pool of examples U:
     ● Rank the examples in order of informativeness (using existing methods and defining new ones based on description logic)
     ● Query the labels of the most informative examples of U
     ● The newly labelled examples are added to the training data L
     ● The model is retrained on the new training data
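The four steps above can be sketched as a pool-based loop. This is a toy under stated assumptions, not the authors' system: the model is a one-dimensional boundary between two class means, informativeness is plain uncertainty sampling (distance to the boundary) rather than the description-logic measures the slide mentions, and the `oracle` function stands in for the human annotator.

```python
def fit_boundary(labelled):
    """Toy model: decision boundary halfway between the two class means."""
    zeros = [x for x, y in labelled if y == 0]
    ones = [x for x, y in labelled if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def informativeness(x, boundary):
    """Uncertainty sampling: points nearer the boundary score higher."""
    return -abs(x - boundary)


def active_learning(labelled, pool, oracle, rounds=3, batch=1):
    labelled, pool = list(labelled), list(pool)
    boundary = fit_boundary(labelled)  # learn an initial model from L
    for _ in range(rounds):
        if not pool:
            break
        # Rank the unlabelled pool U by informativeness, best first.
        pool.sort(key=lambda x: informativeness(x, boundary), reverse=True)
        queries, pool = pool[:batch], pool[batch:]
        # Query the annotator and add the new labels to L.
        labelled += [(x, oracle(x)) for x in queries]
        # Retrain on the enlarged training set.
        boundary = fit_boundary(labelled)
    return boundary, labelled, pool


def oracle(x):
    """Hypothetical annotator: the true class changes at x = 5."""
    return int(x >= 5.0)


L0 = [(0.0, 0), (10.0, 1)]
U0 = [1.0, 3.0, 4.5, 5.5, 7.0, 9.0]
boundary, L, U = active_learning(L0, U0, oracle, rounds=3)
print(len(L), len(U), boundary)
```

With this data the loop spends its first queries on 4.5 and 5.5, the points closest to the boundary, rather than on the easy examples at 1.0 or 9.0.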
  8. Musical Instrument Identification
     Data: music audio signals (synthetic and real recordings)
     Representations: audio waveform, spectrogram
     Classes (labels): ac. guitar, el. guitar, violin, cello, saxophone, organ, piano, voice, flute, clarinet, trumpet
     Datasets:
     ● NSynth: 1.5k, 5 classes
     ● MIS: 2k, 5 classes
     ● IRMAS: 6.5k, 5 classes
     ● Europeana Sounds, user-annotated set: 2k, 63 labels
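The step from audio waveform to spectrogram can be illustrated with a naive short-time Fourier transform. The sketch below is deliberately simple and makes several assumptions the slides do not state: the frame size, the hop length, a rectangular window, and a synthetic test tone. A real pipeline would use an FFT library and typically a tapered window.

```python
import cmath
import math


def spectrogram(samples, frame_size=8, hop=4):
    """Magnitudes of a naive short-time DFT.

    Returns one row per frame and one column per frequency bin
    (bins 0 .. frame_size // 2). No window function is applied.
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        row = []
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            row.append(abs(coeff))
        frames.append(row)
    return frames


# Synthetic tone with exactly 2 cycles per 8-sample frame (hypothetical input).
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = spectrogram(tone)

# The strongest bin in every frame is bin 2, matching the tone's frequency.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(len(spec), peak_bins)
```

The resulting two-dimensional array (time by frequency) is the kind of image-like input a CNN can process.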
  9. Unsupervised learning for feature extraction; HITL: Deep Active learning
     Diagram labels: CNN; learning algorithm; labelled data L; unlabelled data U; most informative samples; human annotations; model updating
  10. Experimental Results
      Problems
      ● Recording quality (scratches)
      ● Unknown instruments
      ● Piano is often annotated as string
      ● User annotations need extra validation

      L: labelled set of data, L = L1 + L2. Each track is divided into segments of 3 seconds.
      Confidence := output of the final (softmax) layer of the CNN

      Labelled set L by source dataset:
      Class   IRMAS   MIS    NSynth   Total
      cel     388     471    0        859
      pia     721     0      300      1021
      vio     580     228    0        808
      cla     505     239    0        744
      org     682     0      176      858
      gac     637     0      300      937
      tru     441     235    0        676
      flu     451     219    0        670
      sax     626     323    0        949
      gel     760     0      300      1060
      voi     778     0      300      1078
      Total   6569    1715   1376

      Split of L into L1 and L2 (totals per class):
      Class   L1    L2
      cel     604   255
      pia     738   283
      vio     574   234
      cla     515   229
      org     575   283
      gac     672   265
      tru     468   208
      flu     468   202
      sax     649   300
      gel     739   321
      voi     760   318
      Total

      Accuracy:
      Training   L1 accuracy   L2 accuracy   L accuracy   Europeana Sounds Annotated Set
      L1         85.18%        71.25%        81.00%
      L2
      L          90.55%        89.85%        90.34%
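Two details from this slide lend themselves to a short sketch: cutting each track into 3-second segments and reading a confidence value off the softmax output. The code below is a minimal illustration under assumptions not stated in the slides: the 8 kHz sample rate is hypothetical, a trailing segment shorter than 3 seconds is dropped, and confidence is taken as the largest softmax probability.

```python
import math


def split_into_segments(samples, sample_rate, seconds=3):
    """Cut a track into consecutive fixed-length segments.

    Assumption: a trailing remainder shorter than one segment is discarded.
    """
    size = sample_rate * seconds
    return [
        samples[i:i + size]
        for i in range(0, len(samples) - size + 1, size)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """Assumed definition: the probability of the most likely class."""
    return max(softmax(logits))


# 10 seconds of silence at a hypothetical 8 kHz sample rate.
track = [0.0] * (10 * 8000)
segments = split_into_segments(track, sample_rate=8000)
print(len(segments))  # 3 full segments; the last second is dropped

# Hypothetical raw scores for three classes.
print(round(confidence([2.0, 0.5, 0.1]), 3))
```

A confidence near 1/N for N classes means the network cannot separate the classes for that segment, which makes the segment a natural candidate for human annotation.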
  11. Other Application Domains
  12. Europeana Photography (early results)
      Assessing the aesthetic quality of images
      Attributes:
      ● balance, fill the frame, lead room, rule of thirds, motion blur, simple, color harmony, framed, leading lines, shallow DOF, repetition and pattern, symmetry
      ● interesting content, object emphasis, good lighting, color harmony, vivid color, shallow depth of field, motion blur, rule of thirds, balancing element, repetition, and symmetry
  13. Europeana Fashion
      Extracting information from catwalk images:
      ● Identification of clothing items (dress, shoes, skirts, trousers, swimsuits)
      ● Cloth categories
      ● Details
      ● Colors
      ● Patterns
      ● Fabric texture
      Images courtesy of European Fashion, under copyright.