
Low computational cost algorithms for photo clustering and mail signature detection in the cloud

Published in: Technology

  1. Low computational cost algorithms for photo clustering and mail signature detection in the cloud. Daniel Manchón. Co-directors: Xavi Giró (UPC), Omar Pera (Pixable)
  2. Outline: Motivation • Tasks summary (Pixable internship, GPI research assistant) • Photo clustering • Mail signature detection • Conclusions. Subsections: Introduction • Requirements • Design • Results. Current section: Motivation
  3. Motivation: Photo clustering
  4. Motivation: Mail signature detection
  5. Motivation: Cloud computing
  6. Outline (current section: Tasks summary, Pixable internship)
  7. Pixable internship: Social photo aggregation • Photo ranking • Editorial content • Contact feeds • Owned by Singtel • Photo storage • Synchronization across multiple devices • Support for RAW • Caller ID application • Support for multiple contact sources • Contact backup and synchronization • Spam detection
  8. Photofeed tasks: Instagram source (in production) • Referrals and invitations method • New Relic integration • Photo clustering and summarization • Photo download service (in production)
  9. Contactive tasks: Mail scraping monitoring • Signature detection • Identity analysis improvement • Tooling (in production)
  10. Outline (current section: Tasks summary, GPI research assistant)
  11. GPI research assistant (GPI: Image and Video Processing Group): MediaEval 2013 (published paper) • ICMR SEWM (published paper; multimedia retrieval conference) • Pyxel software framework • MediaEval 2014
  12. Outline (current section: Photo clustering, Introduction)
  13. Photo Clustering: Intro. Event detection. State of the art: PhotoTOC [Platt et al., PACRIM 2003]
  14. Outline (current section: Photo clustering, Requirements)
  15. Photo Clustering: Requirements. Photofeed constraints: User data stored in the Amazon cloud and MongoDB • Low computational cost • Easily configurable through a REST API. MediaEval requirements: Event generation • Visual and metadata information available • F1 and NMI as evaluation metrics • 400k annotated photo dataset
  16. Outline (current section: Photo clustering, Design)
  17. Design (a): Temporal sorting of each user's photos independently (example users: John and Emily)
  18. Design (b): Temporal-based oversegmentation into mini-clusters (PhotoTOC [Platt et al., PACRIM 2003])
  19. Design (b): Temporal-based oversegmentation into mini-clusters, each modeled by its mean values. Example photo metadata:
      Username = John, T.taken = 2010-09-10 02:10:12, GPS = (42.1, -10), tags = live, stage, deerhunter
      Username = emily, T.taken = 2010-12-13 02:11:10, GPS = (43, -8.40), tags = live, deerhunter
      Username = emily, T.taken = 2010-12-13 03:11:10, GPS = (no data), tags = live, stones
      Username = emily, T.taken = 2010-12-14 23:11:10, GPS = (43.2, -8.2), tags = sound, test
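Steps (a) and (b) can be illustrated with a minimal sketch. The slides do not state the splitting rule, so the fixed 12-hour gap below is an assumption made for brevity, as are the names `MAX_GAP`, `mini_clusters` and `summarize`; the records are the four metadata examples from slide 19. Photos without GPS are skipped when averaging coordinates, which is also an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Assumed fixed gap; the slides do not state the splitting rule actually used.
MAX_GAP = timedelta(hours=12)

# Metadata records copied from the example on slide 19.
PHOTOS = [
    {"user": "John", "taken": datetime(2010, 9, 10, 2, 10, 12),
     "gps": (42.1, -10.0), "tags": {"live", "stage", "deerhunter"}},
    {"user": "emily", "taken": datetime(2010, 12, 13, 2, 11, 10),
     "gps": (43.0, -8.40), "tags": {"live", "deerhunter"}},
    {"user": "emily", "taken": datetime(2010, 12, 13, 3, 11, 10),
     "gps": None, "tags": {"live", "stones"}},
    {"user": "emily", "taken": datetime(2010, 12, 14, 23, 11, 10),
     "gps": (43.2, -8.2), "tags": {"sound", "test"}},
]


def mini_clusters(photos, max_gap=MAX_GAP):
    """Sort each user's photos by time and split where the gap exceeds max_gap."""
    by_user = defaultdict(list)
    for photo in photos:
        by_user[photo["user"]].append(photo)

    clusters = []
    for user_photos in by_user.values():
        user_photos.sort(key=lambda p: p["taken"])
        current = [user_photos[0]]
        for prev, cur in zip(user_photos, user_photos[1:]):
            if cur["taken"] - prev["taken"] > max_gap:
                clusters.append(current)
                current = []
            current.append(cur)
        clusters.append(current)
    return clusters


def summarize(cluster):
    """Model a mini-cluster by mean time, mean GPS (where known) and tag union."""
    base = cluster[0]["taken"]
    offsets = [(p["taken"] - base).total_seconds() for p in cluster]
    gps = [p["gps"] for p in cluster if p["gps"] is not None]
    return {
        "user": cluster[0]["user"],
        "taken": base + timedelta(seconds=mean(offsets)),
        "gps": (mean(g[0] for g in gps), mean(g[1] for g in gps)) if gps else None,
        "tags": set().union(*(p["tags"] for p in cluster)),
    }
```

With these records, Emily's two photos taken one hour apart fall into the same mini-cluster, while her photo from the following night starts a new one.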
  20. Design (c): Sequential merging of mini-clusters. Figure: mini-clusters along the time axis t, each summarized by avg(·); the question is whether neighboring mini-clusters should be merged
  21. Design (c): Sequential merging of mini-clusters
  22. Outline (current section: Photo clustering, Results)
  23. Results: F1 = 2PR / (P + R), where P is precision and R is recall. UPC placed 3rd of 12 teams.
  24. Outline (current section: Mail signature detection, Introduction)
  25. Mail signature detection: Intro. Key topics: Email information extraction • Spam detection • Low computation. State of the art: "Learning to extract signature and reply lines from email" [Vitor R. Carvalho and William W. Cohen, 2004]
  26. Outline (current section: Mail signature detection, Requirements)
  27. Mail signature detection: Requirements. Mail scraping service improvement • Pre-process the input to reduce the execution time • Adapt the mail scraping service to the Contactive product. Diagram labels: User mailbox; filter (only signatures, less information); Mail scraping service; MongoDB entries (example: id 89012, name John Doe, email j.doe@gmail.com, linkedin Id 7788455367_e, phone 789675463)
  28. Outline (current section: Mail signature detection, Design)
  29. Design (a): Split the last K mail lines and retrieve the annotations (last K lines, ground-truth annotations). The slide shows an excerpt from Carvalho and Cohen (2004), "2. Problem Definition and Corpus":
      A signature block is the set of lines, usually in the end of a message, that contain information about the sender, such as personal name, affiliation, postal address, web address, email address, telephone number, etc. Quotes from famous persons and creative ASCII drawings are often present in this block also. An example of a signature block can be seen in last six lines of the email message pictured in Figure 1 (marked with the line label <sig>). Figure 1 also contains six lines of text that were quoted from a preceding message (marked with the line label <reply>). In this paper we will call such lines reply lines.
      <other> From: wcohen@cs.cmu.edu
      <other> To: Vitor Carvalho <vitor@cs.cmu.edu>
      <other> Subject: Re: Did you try to compile javadoc recently?
      <other> Date: 25 Mar 2004 12:05:51 -0500
      <other>
      <other> Try cvs update –dP, this removes files & directories that have been
      <other> deleted from cvs.
      <other> - W
      <other>
      <reply> On Wed, 2004-03-24 at 19:58, Vitor Carvalho wrote:
      <reply> > I’ve just checked-out the baseline m3 code and
      <reply> > "Ant dist" is working fine, but "ant javadoc" is not.
      <reply> > Thanks
      <reply> > Vitor
      <other>
      <sig> ------------------------------------------------------------------
      <sig> William W. Cohen “Would you drive a mime
      <sig> wcohen@cs.cmu.edu nuts if you played a
      <sig> http://www.wcohen.com blank audio tape
      <sig> Associate Research Professor full blast?”
      <sig> CALD, Carnegie-Mellon University - S. Wright
      Figure 1 - Excerpt from a labeled email message
  30. Design (b): Feature extraction. Each line is matched against N feature patterns (the slide repeats the excerpt from Carvalho and Cohen, 2004, as background)
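Steps (a) and (b) of the signature detector can be sketched as follows. The slides do not list the feature patterns, so the five regular expressions in `PATTERNS` are hypothetical examples chosen from the kinds of cues named in the quoted definition (email address, web address, telephone number) plus separator and quoted-reply markers; `last_lines` and `extract_features` are illustrative names.

```python
import re

# Hypothetical line-level patterns; the actual N patterns are not given in the slides.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "url": re.compile(r"https?://\S+|www\.\S+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
    "separator": re.compile(r"^\s*[-_=*]{2,}\s*$"),
    "quoted_reply": re.compile(r"^\s*>"),
}


def last_lines(message, k):
    """Step (a): keep only the last k lines of the message body."""
    lines = message.splitlines()
    return lines[-k:] if k > 0 else []


def extract_features(lines):
    """Step (b): build a K x N matrix of 0/1 flags (one row per line, one column per pattern)."""
    return [[int(bool(p.search(line))) for p in PATTERNS.values()] for line in lines]


MESSAGE = """Try cvs update -dP, this removes files.
> "Ant dist" is working fine
--------------------
William W. Cohen
wcohen@cs.cmu.edu
http://www.wcohen.com"""

features = extract_features(last_lines(MESSAGE, 4))
```

Restricting the input to the last K lines keeps the feature matrix small, which fits the stated requirement of pre-processing the input to reduce execution time.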
  31. Design (c): SVM training and model generation. Feature matrix [K×N] + ground-truth vector [K] → SVM training → Model
  32. Design (c): SVM training and model generation. Lines → pre-process → Features → Model → Classes: Other • Reply • Signature
  33. Outline (current section: Mail signature detection, Results)
  34. Results: F1 = 2 · Precision · Recall / (Precision + Recall). With annotated dataset. Without annotated dataset: manual evaluation on Contactive user base mailboxes
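As a worked example of the F1 formula, the sketch below scores per-line predictions for one class. The label names come from Figure 1 of the cited paper; the function name `f1_score` and the sample predictions are made up for illustration and are not results from the thesis.

```python
def f1_score(predicted, truth, positive="sig"):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only: 2 true positives, 1 false positive, 1 false negative.
truth = ["other", "reply", "sig", "sig", "sig", "other"]
predicted = ["other", "reply", "sig", "sig", "other", "sig"]
score = f1_score(predicted, truth)
```

Here precision and recall are both 2/3, so F1 is also 2/3.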
  35. Outline (current section: Conclusions)
  36. Conclusions. Academic: Papers at MediaEval 2013 and ICMR SEWM, with MediaEval 2014 in preparation • UPC Pyxel framework foundations. Industrial: Contributions to Pixable production servers (Instagram integration, Photofeed Downloader) • Mail signature detection: successful proof of concept • Work in the USA
  37. Thank you very much! Q&A
  38. Backup slides
  39. Design (c): Sequential merging of mini-clusters. Weighted modalities: creation (or upload) time • geolocation • textual labels • same user
  40. Design (c): Sequential merging of mini-clusters. Distance per modality: time stamp (d = L1) • geolocation (d = haversine) • text labels (d = Jaccard) • same user (d = boolean)
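The four per-modality distances named on the slide can be written with the standard library alone. This is a minimal sketch: the function names, the choice of kilometres and seconds as units, and the mean Earth radius of 6371 km are assumptions, not details taken from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def time_distance(t1, t2):
    """L1 distance between two timestamps given in seconds."""
    return abs(t1 - t2)


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def jaccard_distance(tags_a, tags_b):
    """1 - |A ∩ B| / |A ∪ B|; defined as 0 when both tag sets are empty."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def user_distance(user_a, user_b):
    """Boolean distance: 0 for the same user, 1 otherwise."""
    return 0.0 if user_a == user_b else 1.0
```

For the tag sets of the first two example photos on slide 19, {live, stage, deerhunter} and {live, deerhunter}, the Jaccard distance is 1 - 2/3 = 1/3.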
  41. Design (c): Sequential merging of mini-clusters
  42. Design (c): Sequential merging of mini-clusters. Mean and standard deviation learned on pairs of photos within the same training event
  43. Design (c): Sequential merging of mini-clusters. Phi function
  44. Design (c): Sequential merging of mini-clusters. Decision threshold
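The backup slides name the ingredients of the merge decision (weighted modalities, a learned mean and standard deviation per modality, a phi function, a decision threshold) but the transcript does not preserve the formula. The sketch below is one plausible reading, stated as an assumption: phi is taken to be the Gaussian cumulative distribution function applied to each distance, the normalized distances are combined by a weighted average, and two mini-clusters are merged when the result falls below the threshold. All names and numeric values are hypothetical.

```python
import math


def phi(distance, mean, std):
    """Assumed normalization: Gaussian CDF of the distance, mapping it into (0, 1)."""
    return 0.5 * (1.0 + math.erf((distance - mean) / (std * math.sqrt(2.0))))


def combined_distance(distances, stats, weights):
    """Weighted average of normalized distances over the modalities that are present."""
    total_weight = sum(weights[m] for m in distances)
    return sum(weights[m] * phi(d, *stats[m]) for m, d in distances.items()) / total_weight


def should_merge(distances, stats, weights, threshold=0.5):
    """Merge two mini-clusters when their combined distance is below the threshold."""
    return combined_distance(distances, stats, weights) < threshold


# Hypothetical (mean, std) per modality, standing in for values learned on
# pairs of photos within the same training event.
STATS = {"time": (3600.0, 1800.0), "tags": (0.5, 0.25)}
# Hypothetical modality weights.
WEIGHTS = {"time": 2.0, "tags": 1.0}

close_pair = {"time": 0.0, "tags": 0.0}
typical_pair = {"time": 3600.0, "tags": 0.5}
```

Because the weights are renormalized over the modalities actually passed in, a missing modality (such as the photo with no GPS data on slide 19) can simply be left out of `distances`.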
