Low computational cost algorithms for photo clustering and mail signature detection in the cloud
Daniel Manchón
Co-directors: Xavi Giró (UPC) and Omar Pera (Pixable)
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering
• Mail signature detection
• Conclusions
• Introduction
• Requirements
• Design
• Results
Motivation: Photo clustering
Motivation: Mail signature detection
Motivation: Cloud computing
Pixable internship
- Social photo aggregation
- Photo ranking
- Editorial content
- Contacts feeds
- Owned by Singtel
- Photo storage
- Synchronization across multiple devices
- Support for RAW
- CallerID application
- Multiple contact source support
- Contact backup and synchronization
- SPAM detection
Photofeed tasks
• Instagram source (in production)
• Referrals and invitations method
• New Relic integration
• Photo clustering and summarization
• Photo download service (in production)
Contactive tasks
• Mail scraping monitoring
• Signature detection
• Identity analysis improvement
• Tooling (in production)
GPI research assistant
• MediaEval 2013 (published paper)
• ICMR SEWM (published paper)
• Pyxel software framework
• MediaEval 2014
Multimedia retrieval conference
GPI: Image and Video Processing Group
Photo Clustering: Intro
Event detection
State of the art: PhotoTOC [Platt et al., PACRIM 2003]
Photo Clustering: Requirements
MediaEval requirements
• Event generation
• Visual and metadata information available
• F1 and NMI as evaluation metrics
• 400k annotated photo dataset
Photofeed constraints
• User data stored in the Amazon cloud and MongoDB
• Low computational cost
• Easily configurable through a REST API
Design
(a) Temporal sorting of each user's photos independently
[Figure: two example users, John and Emily]
(b) Temporal-based oversegmentation into mini-clusters, following PhotoTOC [Platt et al., PACRIM 2003]
Each mini-cluster is modelled by its mean values.
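Steps (a) and (b) can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the thesis: it assumes a fixed time gap as the split criterion, whereas PhotoTOC uses an adaptive one, and the names `oversegment`, `mean_time` and `max_gap` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def oversegment(timestamps: List[datetime], max_gap: timedelta) -> List[List[datetime]]:
    """Sort one user's photos by time and start a new mini-cluster at every large gap."""
    clusters: List[List[datetime]] = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)  # close enough to the previous photo
        else:
            clusters.append([t])  # gap too large (assumed fixed threshold): new mini-cluster
    return clusters


def mean_time(cluster: List[datetime]) -> datetime:
    """Model a mini-cluster by the mean of its time stamps."""
    start = cluster[0]
    offset = sum((t - start).total_seconds() for t in cluster) / len(cluster)
    return start + timedelta(seconds=offset)


# Time stamps of one user, processed independently of other users.
photos = [
    datetime(2010, 12, 14, 23, 11, 10),
    datetime(2010, 12, 13, 2, 11, 10),
    datetime(2010, 12, 13, 3, 11, 10),
]
clusters = oversegment(photos, max_gap=timedelta(hours=2))
first_mean = mean_time(clusters[0])
```

Oversegmenting on purpose keeps this step cheap; the later merging step decides which neighbouring mini-clusters belong to the same event.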
Example photo metadata:
Username=John; T.taken=2010-09-10 02:10:12; GPS=(42.1, -10); tags=live, stage, deerhunter
Username=emily; T.taken=2010-12-13 02:11:10; GPS=(43, -8.40); tags=live, deerhunter
Username=emily; T.taken=2010-12-13 03:11:10; GPS=(no data); tags=live, stones
Username=emily; T.taken=2010-12-14 23:11:10; GPS=(43.2, -8.2); tags=sound, test
(c) Sequential merging of mini-clusters
[Figure: mini-clusters along the time axis t, each summarized by avg(·); "?" marks the merge decision]
Results
$$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$$
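As a small worked example, the F1 measure can be computed from precision P and recall R as below. The function name `f1_score` and the convention of returning 0 when both inputs are 0 are assumptions made for this sketch.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention to avoid a division by zero
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.5, 0.5)
skewed = f1_score(1.0, 0.5)
```

Because it is a harmonic mean, F1 stays low whenever either precision or recall is low.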
UPC: 3rd place out of 12 teams
Mail signature detection: Intro
Key topics
• Email information extraction
• SPAM detection
• Low computation
State of the art
Learning to extract signature and reply lines from email [Vitor R. Carvalho and William W. Cohen, 2004]
Mail signature detection: Requirements
• Mail scraping service improvement
• Pre-process the input to reduce the execution time
• Adapt the mail scraping service to the Contactive product
[Figure: user mailbox, a pre-processing step marked "?" (filter only signatures, less information), the mail scraping service, and the resulting MongoDB entries]
Example MongoDB entry:
id 89012
name John Doe
email j.doe@gmail.com
linkedin Id 7788455367_e
phone 789675463
Design
2. Problem Definition and Corpus
A signature block is the set of lines, usually in the end of a message, that contain information about the sender,
such as personal name, affiliation, postal address, web address, email address, telephone number, etc. Quotes from
famous persons and creative ASCII drawings are often present in this block also. An example of a signature block
can be seen in last six lines of the email message pictured in Figure 1 (marked with the line label <sig>). Figure 1
also contains six lines of text that were quoted from a preceding message (marked with the line label <reply>). In
this paper we will call such lines reply lines.
<other> From: wcohen@cs.cmu.edu
<other> To: Vitor Carvalho <vitor@cs.cmu.edu>
<other> Subject: Re: Did you try to compile javadoc recently?
<other> Date: 25 Mar 2004 12:05:51 -0500
<other>
<other> Try cvs update –dP, this removes files & directories that have been
<other> deleted from cvs.
<other> - W
<other>
<reply> On Wed, 2004-03-24 at 19:58, Vitor Carvalho wrote:
<reply> > I’ve just checked-out the baseline m3 code and
<reply> > "Ant dist" is working fine, but "ant javadoc" is not.
<reply> > Thanks
<reply> > Vitor
<other>
<sig> ------------------------------------------------------------------
<sig> William W. Cohen “Would you drive a mime
<sig> wcohen@cs.cmu.edu nuts if you played a
<sig> http://www.wcohen.com blank audio tape
<sig> Associate Research Professor full blast?”
<sig> CALD, Carnegie-Mellon University - S. Wright
Figure 1 - Excerpt from a labeled email message
(a) Split the last K mail lines and retrieve the annotations
[Figure labels: last K lines; ground-truth annotations]
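Step (a) could look roughly like the sketch below. It assumes the corpus stores one line per row, prefixed with its label as in the excerpt (`<other>`, `<reply>`, `<sig>`). The helper names `parse_labeled` and `last_k` are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

# One label tag, an optional space, then the line text (which may be empty).
LABEL_RE = re.compile(r"^<(other|reply|sig)>\s?(.*)$")


def parse_labeled(lines: Sequence[str]) -> List[Tuple[str, str]]:
    """Turn '<label> text' rows into (label, text) pairs."""
    pairs = []
    for line in lines:
        match = LABEL_RE.match(line)
        if match is None:
            raise ValueError(f"unlabeled line: {line!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def last_k(pairs: Sequence[Tuple[str, str]], k: int) -> Tuple[List[str], List[str]]:
    """Keep only the last k lines and return (texts, ground-truth labels)."""
    tail = list(pairs[-k:]) if k > 0 else []
    return [text for _, text in tail], [label for label, _ in tail]


sample = [
    "<other> - W",
    "<other>",
    "<reply> > Thanks",
    "<sig> wcohen@cs.cmu.edu",
    "<sig> http://www.wcohen.com",
]
texts, labels = last_k(parse_labeled(sample), k=2)
```

Restricting the classifier to the last K lines keeps the input small, which fits the low-computation requirement, since signatures usually sit at the end of a message.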
(b) Feature extraction
Lines → N feature patterns
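A possible shape for step (b) is sketched below: each line is mapped to N binary features, one per pattern. The five regular expressions are illustrative assumptions, not the feature set of the thesis or of Carvalho and Cohen.

```python
import re
from typing import List, Sequence

# Hypothetical feature patterns; a real system would use a larger, tuned set.
PATTERNS = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
    ("url", re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)),
    ("phone", re.compile(r"\+?\d[\d\s().-]{6,}\d")),
    ("separator", re.compile(r"^\s*[-_=*]{3,}\s*$")),
    ("blank", re.compile(r"^\s*$")),
]


def extract_features(lines: Sequence[str]) -> List[List[int]]:
    """Return a K x N matrix: one row per line, one 0/1 column per pattern."""
    return [[1 if pattern.search(line) else 0 for _, pattern in PATTERNS] for line in lines]


matrix = extract_features([
    "wcohen@cs.cmu.edu",
    "http://www.wcohen.com",
    "------",
    "",
    "Tel: +34 789 675 463",
])
```

Regular-expression features are cheap to evaluate, which suits the low-computation constraint.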
(c) SVM training and model generation
Feature matrix [K×N] + ground-truth vector [K] → SVM training → Model
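To make the training step concrete, here is a minimal pure-Python sketch of a linear SVM trained by sub-gradient descent on the hinge loss. It is an assumption-heavy illustration: the slides do not state which SVM library, kernel or parameters were used, the sketch is binary (signature versus the rest) although three classes are involved, and the toy features and hyperparameters are invented.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 200,
    learning_rate: float = 0.1,
    regularization: float = 0.01,
) -> Tuple[List[float], float]:
    """Fit weights and bias on a K x N feature matrix and K labels in {-1, +1}."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            # L2 regularization: shrink the weights a little at every step.
            weights = [w * (1.0 - learning_rate * regularization) for w in weights]
            if y * score < 1.0:
                # The sample violates the margin: move the hyperplane towards it.
                weights = [w + learning_rate * y * v for w, v in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return +1 (signature) or -1 (not signature) for one feature row."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score >= 0.0 else -1


# Toy data with K = 6 lines and N = 2 hypothetical features:
# (number of contact patterns in the line, relative position in the mail).
X = [[0, 0.0], [0, 0.2], [0, 0.4], [1, 0.6], [1, 0.8], [2, 1.0]]
y = [-1, -1, -1, 1, 1, 1]
model = train_linear_svm(X, y)
predictions = [predict(*model, row) for row in X]
```

A multi-class setup (other, reply, signature) could be obtained by training one such binary model per class, which is again an assumption rather than something the slides specify.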
[Figure: Lines → pre-process → Features → Model → Classes]
Classes:
● Other
● Reply
● Signature
Results
With annotated dataset:
$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
Without annotated dataset:
Manual evaluation (Contactive user base mailboxes)
Conclusions
• Academic
• Papers: MediaEval 2013 and ICMR SEWM; MediaEval 2014 in preparation.
• UPC Pyxel framework foundations
• Industrial
• Contributions to Pixable in production servers:
• Instagram integration
• Photofeed Downloader
• Mail signature detection: Proof of concept successful.
• Work in the USA!
Thank you very much!!
Q&A
BACKUP SLIDES
Design
(c) Sequential merging of mini-clusters
Weighted modalities:
● creation (or upload) time
● geolocation
● textual labels
● same user
Distance used for each modality:
● Time stamp (d = L1)
● Geolocation (d = haversine)
● Text labels (d = Jaccard)
● Same user (d = boolean)
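The four distances can be sketched as below. Function names, units (seconds and kilometres), the Earth radius constant, and the handling of two empty tag sets are assumptions made for illustration.

```python
import math
from datetime import datetime
from typing import Set, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for the haversine formula


def time_distance(a: datetime, b: datetime) -> float:
    """L1 distance between two time stamps, in seconds."""
    return abs((a - b).total_seconds())


def haversine_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty tag sets are treated as identical (assumption)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def same_user_distance(user_a: str, user_b: str) -> float:
    """Boolean distance: 0 for the same user, 1 otherwise."""
    return 0.0 if user_a == user_b else 1.0


d_time = time_distance(datetime(2010, 12, 13, 2, 11, 10), datetime(2010, 12, 13, 3, 11, 10))
d_tags = jaccard_distance({"live", "stage", "deerhunter"}, {"live", "deerhunter"})
d_geo = haversine_km((0.0, 0.0), (0.0, 180.0))
```

Each distance lives on a different scale (seconds, kilometres, a ratio, a flag), so the values have to be normalized before they can be combined with weights.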
Mean and standard deviation are learned on pairs of photos within the same training event.
[Figure: phi function]
[Figure: decision threshold]
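One way these pieces could fit together is sketched below. The slides do not give the formulas, so everything here is an assumption: the phi function is taken to be the standard normal CDF, the normalized distances are averaged with the modality weights, modalities with no data (such as a missing GPS position) are skipped, and two mini-clusters are merged when the score falls below the threshold. All names and numbers are hypothetical.

```python
import math
from typing import Dict, Tuple


def phi(z: float) -> float:
    """Standard normal CDF, assumed here to be the 'phi function' of the slides."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def merge_score(
    distances: Dict[str, float],
    stats: Dict[str, Tuple[float, float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of normalized distances over the available modalities."""
    total_weight = sum(weights[m] for m in distances)
    weighted = 0.0
    for modality, d in distances.items():
        mean, std = stats[modality]  # learned on same-event photo pairs
        weighted += weights[modality] * phi((d - mean) / std)
    return weighted / total_weight


def should_merge(score: float, threshold: float) -> bool:
    """Merge two neighbouring mini-clusters when they look similar enough."""
    return score < threshold


# Hypothetical statistics and weights; GPS is absent, so it is simply not listed.
stats = {"time": (3600.0, 1800.0), "tags": (0.5, 0.2)}
weights = {"time": 2.0, "tags": 1.0}
score = merge_score({"time": 3600.0, "tags": 0.5}, stats, weights)
```

Mapping every distance into the range 0 to 1 with the same function makes the modalities comparable, so the weights express relative importance rather than compensating for units.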

Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Recently uploaded (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Low computational cost algorithms for photo clustering and mail signature detection in the cloud

  • 1. Low computational cost algorithms for photo clustering and mail signature detection in the cloud. Daniel Manchón. Co-directors: Xavi Giró (UPC), Omar Pera (Pixable)
  • 2. Outline • Motivation! • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection • Conclusions • Introduction • Requirements • Design • Results 2
  • 3. Motivation: Photo clustering
  • 4. Motivation: Mail signature detection
  • 5. Motivation: Cloud computing
  • 6. Outline • Motivation • Tasks summary • Pixable internship! • GPI research assistant • Photo clustering • Mail signature detection • Conclusions • Introduction • Requirements • Design • Results 6
  • 7. Pixable internship - Social photos aggregation - Photo ranking - Editorial content - Contacts feeds - Owned by Singtel - Photo storage - Synchronization across multiple devices - Support for RAW - CallerID application - Multiple contact source support - Contact backup and synchronization - SPAM detection
  • 8. Photofeed tasks • Instagram source (in-production) • Referrals and invitations method • "New Relic" integration • Photo clustering and summarization • Photo download service (in-production)
  • 9. Contactive tasks • Mail scraping monitoring • Signature detection • Identity analysis improvement • Tooling (in-production)
  • 10. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant! • Photo clustering • Mail signature detection • Conclusions • Introduction • Requirements • Design • Results 10
  • 11. GPI research assistant • Mediaeval 2013 (published paper) • ICMR SEWM (published paper) • Pyxel software framework • Mediaeval 2014 (Slide notes: Multimedia retrieval conference. GPI: Image and Video Processing Group.)
  • 12. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering! • Mail signature detection • Conclusions • Introduction! • Requirements • Design • Results 12
  • 13. Photo Clustering: Intro • Event detection • State of the art: PhotoTOC [Platt et al., PACRIM 2003]
  • 14. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering! • Mail signature detection • Conclusions • Introduction • Requirements! • Design • Results 14
  • 15. Photo Clustering: Requirements • User data stored in Amazon cloud and MongoDB • Low computational cost • Easily configurable using REST API • Event generation • Visual and metadata information available • F1 and NMI as evaluation metrics • 400k annotated photo dataset (Slide group labels: Mediaeval requirements; Photofeed constraints.)
  • 16. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering! • Mail signature detection • Conclusions • Introduction • Requirements • Design! • Results 16
  • 17. Design (a) Temporal sorting of each user's photos independently (example users: "Hi, I’m John." "Hi, I’m Emily.")
  • 18. Design (b) Temporal-based oversegmentation in mini-clusters, following PhotoTOC [Platt et al., PACRIM 2003]
  • 19. Design (b) Temporal-based oversegmentation in mini-clusters, each modeled by its mean values. Example photo metadata:
      - Username= John; T.taken= 2010-09-10 02:10:12; GPS= (42.1,-10); tags= live,stage,deerhunter
      - Username= emily; T.taken= 2010-12-13 02:11:10; GPS= (43,-8.40); tags= live,deerhunter
      - Username= emily; T.taken= 2010-12-13 03:11:10; GPS= (no data); tags= live,stones
      - Username= emily; T.taken= 2010-12-14 23:11:10; GPS= (43.2,-8.2); tags= sound,test
  • 20. Design (c) Sequential merging of mini-clusters [Figure: mini-clusters along the time axis t, each summarized by avg(·); a "?" marks the merge decision between neighbours]
  • 21. Design (c) Sequential merging of mini-clusters 21
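Steps (a) and (b) of the photo clustering design can be sketched as follows. This is a minimal illustration, not the code used at Pixable: the dictionary keys `user` and `taken`, the function names, and the fixed two-hour `max_gap` are all assumptions made for the example, and the fixed gap is a simplification that stands in for PhotoTOC's own splitting criterion. The sample photos are the four metadata records shown on slide 19.

```python
from datetime import datetime, timedelta
from itertools import groupby


def mini_clusters(photos, max_gap=timedelta(hours=2)):
    """Split each user's time-sorted photos wherever the gap exceeds max_gap.

    photos: list of dicts with the hypothetical keys "user" and "taken".
    max_gap: assumed fixed threshold (a simplification, not PhotoTOC's rule).
    """
    # (a) temporal sorting, handled independently for every user
    ordered = sorted(photos, key=lambda p: (p["user"], p["taken"]))
    clusters = []
    for _, user_photos in groupby(ordered, key=lambda p: p["user"]):
        current = []
        for photo in user_photos:
            # (b) start a new mini-cluster when the time gap is too large
            if current and photo["taken"] - current[-1]["taken"] > max_gap:
                clusters.append(current)
                current = []
            current.append(photo)
        clusters.append(current)
    return clusters


def mean_time(cluster):
    """Mean capture time of a mini-cluster (one of its 'mean values')."""
    base = cluster[0]["taken"]
    total = sum((p["taken"] - base for p in cluster), timedelta())
    return base + total / len(cluster)


photos = [
    {"user": "John", "taken": datetime(2010, 9, 10, 2, 10, 12)},
    {"user": "emily", "taken": datetime(2010, 12, 13, 2, 11, 10)},
    {"user": "emily", "taken": datetime(2010, 12, 13, 3, 11, 10)},
    {"user": "emily", "taken": datetime(2010, 12, 14, 23, 11, 10)},
]
clusters = mini_clusters(photos)
```

With these inputs, Emily's first two photos fall into one mini-cluster and her third photo starts a new one, while John's photo forms a mini-cluster of its own.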
  • 22. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering! • Mail signature detection • Conclusions • Introduction • Requirements • Design • Results 22
  • 23. Results • F1 = 2 · P · R / (P + R) • UPC: 3rd place of 12 teams!
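For reference, the F1 score quoted on the results slide is the harmonic mean of precision P and recall R. A small sketch of the computation follows; the function name and the zero-division convention are assumptions, and the numbers passed in the test are illustrative, not results from the thesis.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        # Assumed convention: define F1 as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, F1 stays low whenever either precision or recall is low, which is why it is preferred over a plain average for clustering and detection tasks.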
  • 24. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection! • Conclusions • Introduction! • Requirements • Design • Results 24
  • 25. Mail signature detection: Intro • Key topics: Email information extraction • SPAM detection • Low computation • State of the art: "Learning to extract signature and reply lines from email" [Vitor R. Carvalho and William W. Cohen, 2004]
  • 26. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection! • Conclusions • Introduction • Requirements! • Design • Results 26
  • 27. Mail signature detection: Requirements • Mail scraping service improvement • Pre-process the input to reduce the execution time • Adapt the mail scraping service to the Contactive product [Diagram: User mailbox → ? (less information: filter only signatures) → Mail scraping service → MongoDB entries] Example entry: id 89012; name John Doe; email j.doe@gmail.com; linkedin Id 7788455367_e; phone 789675463
  • 28. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection! • Conclusions • Introduction • Requirements • Design! • Results 28
  • 29. Design (a) Split the last K mail lines and retrieve the annotations (Last K lines; Ground truth annotations). Slide background, an excerpt from Carvalho and Cohen (2004):
      2. Problem Definition and Corpus
      A signature block is the set of lines, usually in the end of a message, that contain information about the sender, such as personal name, affiliation, postal address, web address, email address, telephone number, etc. Quotes from famous persons and creative ASCII drawings are often present in this block also. An example of a signature block can be seen in last six lines of the email message pictured in Figure 1 (marked with the line label <sig>). Figure 1 also contains six lines of text that were quoted from a preceding message (marked with the line label <reply>). In this paper we will call such lines reply lines.
      <other> From: wcohen@cs.cmu.edu
      <other> To: Vitor Carvalho <vitor@cs.cmu.edu>
      <other> Subject: Re: Did you try to compile javadoc recently?
      <other> Date: 25 Mar 2004 12:05:51 -0500
      <other>
      <other> Try cvs update –dP, this removes files & directories that have been
      <other> deleted from cvs.
      <other> - W
      <other>
      <reply> On Wed, 2004-03-24 at 19:58, Vitor Carvalho wrote:
      <reply> > I’ve just checked-out the baseline m3 code and
      <reply> > "Ant dist" is working fine, but "ant javadoc" is not.
      <reply> > Thanks
      <reply> > Vitor
      <other>
      <sig> ------------------------------------------------------------------
      <sig> William W. Cohen “Would you drive a mime
      <sig> wcohen@cs.cmu.edu nuts if you played a
      <sig> http://www.wcohen.com blank audio tape
      <sig> Associate Research Professor full blast?”
      <sig> CALD, Carnegie-Mellon University - S. Wright
      Figure 1 - Excerpt from a labeled email message
  • 30. Design (b) Feature extraction (diagram labels: Lines; N feature patterns). Slide background: the Carvalho and Cohen (2004) "Problem Definition and Corpus" excerpt from slide 29.
  • 31. Design (c) SVM training and model generation: Feature matrix [KxN] + Vector ground truth [K] → SVM training → Model. Slide background: a cropped copy of the Carvalho and Cohen (2004) excerpt from slide 29.
  • 32. Design (c) SVM training and model generation: Lines → pre-process → Features → Model → Classes (Other, Reply, Signature)
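Steps (a) and (b) of the signature detection design, taking the last K lines of a message and turning each line into N pattern features, can be sketched as below. The slides do not list the actual feature patterns, so the five regular expressions here (email, URL, phone, separator, quoted reply) are illustrative assumptions, as are the function names. The resulting K x N matrix, paired with a ground-truth label per line, is the kind of input an SVM trainer would take; the training step itself is left out of this sketch.

```python
import re

# Hypothetical feature patterns; the real system's pattern set is not given.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "url": re.compile(r"(https?://|www\.)\S+", re.IGNORECASE),
    "phone": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
    "separator": re.compile(r"^\s*[-_=*]{2,}\s*$"),
    "quoted": re.compile(r"^\s*>"),
}


def last_lines(message, k):
    """(a) Keep only the last k lines, where signatures usually appear."""
    lines = message.splitlines()
    return lines[-k:] if k > 0 else []


def line_features(line):
    """(b) One binary feature per pattern, in PATTERNS order."""
    return [1 if pattern.search(line) else 0 for pattern in PATTERNS.values()]


def feature_matrix(message, k):
    """K x N matrix: one row per line, one column per feature pattern."""
    return [line_features(line) for line in last_lines(message, k)]


message = "\n".join([
    "Try cvs update",
    "> Thanks",
    "> Vitor",
    "--------",
    "William W. Cohen",
    "wcohen@cs.cmu.edu",
    "http://www.wcohen.com",
])
rows = feature_matrix(message, 4)
```

Restricting the work to the last K lines is what keeps the computational cost low: the classifier never sees the body of long messages.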
  • 33. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection! • Conclusions • Introduction • Requirements • Design • Results 33
  • 34. Results • F1 = 2 · Precision · Recall / (Precision + Recall) • With annotated dataset • Without annotated dataset: manual evaluation on Contactive user base mailboxes
  • 35. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection • Conclusions • Introduction • Requirements • Design • Results 35
  • 36. Conclusions • Academic • Papers: Mediaeval 2013 and ICMR SEWM, with Mediaeval 2014 in preparation. • UPC Pyxel framework foundations • Industrial • Contributions to Pixable in production servers: • Instagram integration • Photofeed Downloader • Mail signature detection: successful proof of concept. • Work in the USA!
  • 37. Thank you very much!! Q&A 37
  • 39. Design (c) Sequential merging of mini-clusters. Weighted modalities: ● creation (or upload) time ● geolocation ● textual labels ● same user
  • 40. Design (c) Sequential merging of mini-clusters. Distances: Time stamp (d=L1) ● Geolocation (d=haversine) ● Text labels (d=Jaccard) ● Same user (d=boolean)
  • 42. Design (c) Sequential merging of mini-clusters. Mean and std. deviation learned on pairs of photos within the same training event.
  • 43. Design (c) Sequential merging of mini-clusters: phi function
  • 44. Design (c) Sequential merging of mini-clusters: decision threshold
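The four per-modality distances named on slide 40 and a thresholded merge decision can be sketched as follows. The distance functions are the standard definitions. The merge rule is only an assumption for illustration: the slides do not spell out the phi function, so a plain z-score, computed from the mean and standard deviation that would be learned on same-event training pairs, stands in for it. The weights, statistics, and threshold in the test are made-up numbers.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def time_l1(t1, t2):
    """L1 distance between two time stamps expressed in seconds."""
    return abs(t1 - t2)


def jaccard_distance(tags_a, tags_b):
    """1 - |A intersect B| / |A union B| over the two tag sets."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)


def same_user_distance(user_a, user_b):
    """Boolean distance: 0 for the same user, 1 otherwise."""
    return 0.0 if user_a == user_b else 1.0


def should_merge(distances, stats, weights, threshold):
    """Assumed rule: merge when the weighted mean z-score is below threshold.

    distances: modality name -> distance, or None when data is missing (no GPS).
    stats: modality name -> (mean, std) learned on same-event training pairs.
    """
    score, total = 0.0, 0.0
    for name, d in distances.items():
        if d is None:
            continue  # skip modalities with missing metadata
        mean, std = stats[name]
        score += weights[name] * (d - mean) / std
        total += weights[name]
    return total > 0 and score / total <= threshold
```

Skipping missing modalities and renormalizing by the remaining weights is one way to handle photos without GPS data, such as the third example on slide 19.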