Email as a datasource for applications



Our email contains years of important personal information: key contacts, versions of documents, discussions around important projects or deals. It's a datasource that is too often ignored by developers, and the brave ones who don't ignore it are in for a bumpy ride dealing with the tedious details of arcane protocols.

The presentation covers the potential use cases for email data, the various ways to access it, the common pitfalls, and the different tools targeted at this.


  1. 1. Email as a datasource for apps
        Bruno Morency @brunomorency
  2. 2. What this presentation will be about
        • Overview of the technologies that make up email
        • How your apps can fit in that picture
        • An intro to IMAP and message bodies with common pitfalls
        • Overview of Context.IO
  3. 3. “The reports of my death were greatly exaggerated” - Email
  4. 4. 2.9 billion in 2010
  5. 5. 3.8 billion by 2014 180B messages/day 340M tweets/day
  6. 6. Email is a communications system: group collaboration, task management, document collaboration, customer support, app notification, project management, client relationship, applicant tracking, photo sharing, bug tracking, Nigerian extortion
  7. 7. Overview of protocols and standards, or “which acronym does what”
  8. 8. SMTP: Protocol for transmission of emails across the internet
  9. 9. • Message transport, nothing to do with content
        • Defines the envelope (sender and recipients)
        • Does not define the message headers
        • Chain from client to recipient’s server
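The split between envelope and headers can be sketched with Python's standard library. This is a minimal illustration, not something from the slides: the addresses, the `build_message` and `send` helpers, and the SMTP host are all hypothetical, and `send` is defined but never called here because it needs a real server.

```python
from email.message import EmailMessage
import smtplib


def build_message() -> EmailMessage:
    """Build a message whose headers name only the visible recipient."""
    msg = EmailMessage()
    msg["From"] = "App <app@example.com>"
    msg["To"] = "alice@example.com"
    msg["Subject"] = "Weekly report"
    msg.set_content("Hello Alice")
    return msg


# The envelope is handed to SMTP separately from the headers.
# Hypothetical addresses: the archive mailbox never appears in the message.
ENVELOPE_FROM = "bounces@example.com"
ENVELOPE_TO = ["alice@example.com", "archive@example.com"]


def send(host: str, msg: EmailMessage) -> None:
    """Hand the message to an SMTP server (hypothetical host, not run here)."""
    with smtplib.SMTP(host) as server:
        # from_addr / to_addrs become MAIL FROM and RCPT TO, i.e. the envelope.
        server.send_message(msg, from_addr=ENVELOPE_FROM, to_addrs=ENVELOPE_TO)
```

The server delivers to every envelope recipient, whatever the `To` header says, which is how Bcc works.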
  10. 10. DKIM, SPF: Standards for sender signatures and to prevent sender spoofing
  11. 11. • Complement spam filters
          • Opens the message and checks headers to decide if it will deliver it to the inbox
          • As a receiver, it’s one more way to block spam
          • As a sender, it’s a tool you must master to avoid ending up in the spam folder
          • Email deliverability is an industry by itself
  12. 12. IMAP: Protocol to allow a client to access and manipulate emails on a receiving server.
  13. 13. • All messages and their folder organization are on the server
          • Clients poll to learn about new messages that arrive or actions made through other clients
          • While it doesn’t send messages, clients usually store sent messages through it
  14. 14. POP: Protocol to allow a client to retrieve emails from a receiving server.
  15. 15. • The server only serves as a temporary buffer for received messages
          • Classification and message state are purely client-side concepts
          • Many clients can access the same account but can’t coordinate anything
  16. 16. RFC-822, MIME, Multipart: Standards defining headers and the actual body of the message
  17. 17. Where does your app fit in there?
  18. 18. Typical things apps want to do with email
          1. Send emails to users
          2. Receive emails from users
          3. Access emails users send and receive.
  19. 19. Email is a communications system: group collaboration, task management, document collaboration, customer support, app notification, project management, client relationship, applicant tracking, photo sharing, bug tracking, Nigerian extortion
  20. 20. Introduction to IMAP
  21. 21. Me: “App Developer, meet IMAP. IMAP, meet App Developer.”
          IMAP: “I don’t give a sh*t about you, App Developer. Go away!”
  22. 22. 1. Connect to the IMAP server and authenticate
          > openssl s_client -crlf -connect
          [ a few lines of SSL and server info ]
          * OK Gimap ready for requests from  zw8i38638oab.180
          a001 LOGIN username password
          * CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE
          a001 OK username authenticated (Success)
  23. 23. 3. LIST mailboxes
          a002 LIST "" "*"
          * LIST (\HasChildren) "/" "Drive"
          * LIST (\Noselect \HasChildren) "/" "Drive/Dev"
          * LIST (\HasNoChildren) "/" "Drive/Dev/A"
          * LIST (\HasNoChildren) "/" "Drive/Dev/B"
          * LIST (\HasNoChildren) "/" "INBOX"
          * LIST (\HasNoChildren) "/" "Archive"
          * LIST (\HasNoChildren) "/" "Sent Mail"
          * LIST (\HasNoChildren) "/" "Drafts"
          * LIST (\HasNoChildren) "/" "Spam"
          * LIST (\HasChildren) "/" "My folder"
          * LIST (\HasNoChildren) "/" "My folder/label A"
          * LIST (\HasNoChildren) "/" "My folder/label B"
          a002 OK Success
  24. 24. 4. SELECT a mailbox
          a003 SELECT "Drive/Dev"
          * FLAGS (\Answered \Flagged \Draft \Deleted \Seen)
          * OK [PERMANENTFLAGS (\Deleted \Seen \*)] Limited
          * OK [UIDVALIDITY 614213447] UIDs valid
          * OK [UIDNEXT 1042] Predicted next UID
          * 84 EXISTS
          * 3 RECENT
          a003 OK [READ-WRITE] Drive/Dev selected. (Success)
  25. 25. 5. FETCH messages
          a013 FETCH 80:81 (FLAGS BODY[HEADER.FIELDS (DATE FROM SUBJECT)])
          * 80 FETCH (FLAGS (\Seen) BODY[HEADER.FIELDS (DATE FROM SUBJECT)] {101}
          Date: Mon, 26 Jul 2012 14:05:16 -0400
          From: Dominik Gehl <>
          Subject: test
          )
          * 81 FETCH (FLAGS (\Seen) BODY[HEADER.FIELDS (DATE FROM SUBJECT)] {115}
          From: Dominik Gehl <>
          Subject: Payment required error
          Date: Tue, 27 Mar 2012 09:28:01 -0400
          )
          a013 OK Success
  26. 26. 6. FLAG a message as read
          a015 STORE 81 +FLAGS (\Seen)
          * 81 FETCH (FLAGS ())
          a015 OK Success
  27. 27. 7. CLOSE the mailbox and LOGOUT the account
          a023 CLOSE
          a023 OK Returned to authenticated state. (Success)
          a024 LOGOUT
          * BYE LOGOUT Requested
          a024 OK LOGOUT completed. (Success)
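The same session can be driven from code. Here is a rough sketch using Python's standard `imaplib`; the `walk_through` function, its parameters, and the choice to fetch only the newest message are assumptions made for illustration. It needs a real server and credentials, so it is defined but not run here.

```python
import imaplib


def walk_through(host: str, username: str, password: str, mailbox: str = "INBOX") -> list:
    """Replay the raw session: LOGIN, LIST, SELECT, FETCH, STORE, CLOSE, LOGOUT."""
    conn = imaplib.IMAP4_SSL(host)              # TLS connection, port 993 by default
    try:
        conn.login(username, password)           # LOGIN
        conn.list()                              # LIST "" "*"
        status, data = conn.select(mailbox)      # SELECT; data[0] holds the EXISTS count
        if status != "OK":
            raise RuntimeError(f"cannot select {mailbox!r}")
        headers = []
        if int(data[0]) > 0:
            last = data[0].decode()              # highest sequence number = newest message
            # BODY.PEEK avoids setting \Seen implicitly, unlike plain BODY[...]
            status, parts = conn.fetch(
                last, "(FLAGS BODY.PEEK[HEADER.FIELDS (DATE FROM SUBJECT)])"
            )
            # Literal payloads come back as (envelope, bytes) tuples.
            headers = [part[1] for part in parts if isinstance(part, tuple)]
            conn.store(last, "+FLAGS", "\\Seen")  # STORE n +FLAGS (\Seen)
        conn.close()                             # CLOSE the selected mailbox
        return headers
    finally:
        conn.logout()                            # LOGOUT, even after an error
```

A call would look like `walk_through("imap.example.com", "user", "secret", "Drive/Dev")`, with a hypothetical host and credentials.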
  28. 28. That didn’t seem so bad!
  29. 29. Pitfall #1: Identifying messages
          • There is no persistent primary key you can rely on to retrieve a message
          • Message Sequence Number
          • Unique Identifier
  30. 30. Sequence Number
          • Ascending and contiguous sequence. If the mailbox says 11 exist, you can fetch messages with seq. nb. 1 to 11
          • They can (and will) be reassigned during a session.
  31. 31. Unique Identifier (aka UID)
          • 32-bit value uniquely identifying a message within a mailbox.
          • Ascending but not necessarily incremental nor contiguous.
          • If you move a message to another mailbox, it will get a new UID in that new mailbox
          • Changes if the mailbox UIDVALIDITY changes
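One way to live with this is to key cached messages on the mailbox name, UIDVALIDITY, and UID together, and to throw the cache away when UIDVALIDITY changes. The sketch below shows that idea only; `MessageKey` and `MailboxCache` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MessageKey:
    """A UID only identifies a message together with its mailbox and UIDVALIDITY."""
    mailbox: str
    uidvalidity: int
    uid: int


@dataclass
class MailboxCache:
    mailbox: str
    uidvalidity: int
    uids: set = field(default_factory=set)

    def sync_validity(self, server_uidvalidity: int) -> bool:
        """Return True if the cache is still usable, False if it had to be reset."""
        if server_uidvalidity != self.uidvalidity:
            # Every cached UID is now meaningless: start over.
            self.uidvalidity = server_uidvalidity
            self.uids.clear()
            return False
        return True

    def key_for(self, uid: int) -> MessageKey:
        return MessageKey(self.mailbox, self.uidvalidity, uid)
```

The value to pass to `sync_validity` is the one the server reports in its `[UIDVALIDITY ...]` response to SELECT.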
  32. 32. Pitfall #2: Special-use folders (or lack thereof)
          • Only the INBOX mailbox has a special meaning.
          • Everything else has the meaning the client wants it to have (which may not be in English)
          • Gmail has XLIST, which adds mailbox attributes (Inbox, Sent, Starred, ...)
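When the server does expose such attributes, the folder can be found by attribute instead of by name. The sketch below parses LIST/XLIST response lines in the shape `imaplib` returns them; the sample lines, the regular expression, and both helper names are illustrative assumptions, and mailbox names in modified UTF-7 are not decoded.

```python
import re

# (attributes) "delimiter" name
LIST_LINE = re.compile(r'\((?P<attrs>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')

# Hypothetical response lines for an account with German folder names.
SAMPLE = [
    b'(\\HasNoChildren \\Inbox) "/" "Inbox"',
    b'(\\HasNoChildren \\Sent) "/" "Gesendet"',
    b'(\\HasNoChildren) "/" "My folder/label A"',
]


def parse_list_line(line: bytes):
    """Split one LIST/XLIST response line into (attributes, delimiter, name)."""
    match = LIST_LINE.match(line.decode("utf-8"))
    if match is None:
        raise ValueError(f"unrecognized LIST line: {line!r}")
    attrs = set(match.group("attrs").split())
    delim = match.group("delim").strip('"')
    name = match.group("name").strip('"')
    return attrs, delim, name


def find_by_attribute(lines, attribute: str):
    """Return the first mailbox name carrying the attribute, or None."""
    for line in lines:
        attrs, _delim, name = parse_list_line(line)
        if attribute in attrs:
            return name
    return None
```

With the sample above, looking up `"\\Sent"` finds the sent-mail folder even though its name is not "Sent".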
  33. 33. Pitfall #3: No data until you select a mailbox
          • Anything that searches or fetches messages is done within the context of a mailbox
          • Can’t get an account-wide list of messages
  34. 34. Pitfall #4: Threads
          • It’s an extension that isn’t widely available and, even then, restricted to a single mailbox
          • Gmail’s thread ID attribute (X-GM-THRID) to the rescue
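On Gmail, the thread ID can be requested as a FETCH attribute (`X-GM-THRID` in Gmail's IMAP extensions) and used to group messages. The sketch below only covers the grouping step. It assumes the items look like what `imaplib` returns for something like `conn.uid("FETCH", "1:*", "(X-GM-THRID)")`; the sample items and the `group_by_thread` helper are made up for illustration.

```python
import re
from collections import defaultdict

THRID = re.compile(rb"X-GM-THRID (\d+)")
UID = re.compile(rb"UID (\d+)")

# Hypothetical FETCH response items, one per message.
SAMPLE = [
    b"1 (X-GM-THRID 1278455344230334865 UID 4)",
    b"2 (X-GM-THRID 1266894439832287888 UID 5)",
    b"3 (X-GM-THRID 1278455344230334865 UID 9)",
]


def group_by_thread(fetch_items):
    """Group message UIDs by Gmail thread id."""
    threads = defaultdict(list)
    for item in fetch_items:
        thrid = THRID.search(item)
        uid = UID.search(item)
        if thrid and uid:
            threads[int(thrid.group(1))].append(int(uid.group(1)))
    return dict(threads)
```

The UIDs are still scoped to the selected mailbox, so grouping across mailboxes means repeating this per mailbox and merging on the thread id.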
  35. 35. Pitfall #5: Attachment?
          • You need to get and parse the body structure
          • As far as IMAP is concerned, an attachment is the same thing as any other MIME part
  36. 36. Pitfall #6: Deleting messages
          • Setting the \Deleted flag marks the message for deletion but it’s still there
          • EXPUNGE will remove all messages with the \Deleted flag from the currently selected mailbox
  37. 37. Pitfall #7: Keeping up with deleted messages
          • Purging the client-side message list is a PITA.
          • The server won’t tell you which messages were deleted; you just have to figure out that some have been and find which ones they were.
          • It’s the same if you want to keep track of the \Seen flag.
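In practice this often comes down to comparing the UIDs you cached with the UIDs the server reports now (for example from `UID SEARCH ALL`), and doing the same for the set of seen messages. A minimal sketch of that comparison, with hypothetical helper names:

```python
def diff_uids(cached_uids, server_uids):
    """Compare cached UIDs with the server's current UIDs for one mailbox."""
    cached, current = set(cached_uids), set(server_uids)
    return {
        "deleted": sorted(cached - current),  # cached locally, gone on the server
        "added": sorted(current - cached),    # on the server, not cached yet
    }


def seen_changes(cached_seen, server_seen):
    """UIDs whose seen state differs between the cache and the server."""
    cached, current = set(cached_seen), set(server_seen)
    return {
        "now_seen": sorted(current - cached),
        "now_unseen": sorted(cached - current),
    }
```

This only holds while UIDVALIDITY is unchanged; otherwise the two UID sets are not comparable.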
  38. 38. The joys of parsing email messages
          Yay! I fetched a message! Now what do I do?
  39. 39. A simple message
          Delivered-To: sysadmin@context.io
          Return-Path: <>
          Received: by  with SMTP id n8mr410292qct.135.1336583200550;
                  Wed, 09 May 2012 10:06:40 -0700 (PDT)
          Received: from  ( [])
                  by  with ESMTP id b2si1383913qcd.195.2012.05.06.40;
                  Wed, 09 May 2012 10:06:40 -0700 (PDT)
          Date: Wed, 9 May 2012 17:06:39 +0000 (UTC)
          From: Amazon EC2 Notification <>
          To: "Sys Admin" <>
          Cc: "Alerts" <>
          Message-ID: <>
          Subject: Notice: Amazon EC2 Instance scheduled for retirement
          MIME-Version: 1.0
          Content-Type: text/plain; charset=UTF-8
          Content-Transfer-Encoding: 7bit

          Hello, ...
  40. 40. A message with an attachment
          MIME-Version: 1.0
          Content-Type: multipart/mixed; boundary=_MYBOUNDARY_

          --_MYBOUNDARY_
          Content-Type: text/plain

          This is the body of the message.
          --_MYBOUNDARY_
          Content-Type: image/jpeg; name="IMG_713.jpg"
          Content-Disposition: attachment; filename="IMG_713.jpg"; size=6379099;
          Content-Transfer-Encoding: base64

          /9j/4AAQSkZJRgABAgAAZABkAAD/7AARRHVja3kAAQAEAAAAZA+4AJkFkb2JlAGTAAAAAAQMAAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMD8IAEQgAegG1AwERAIRAQMRAfEASMAAQACAwEBAQEBAAAAAAAAAAAHCAUGCQQDAgEKAQEAAgIDAQAAAAAAAAAAAAAABgcFCAEDBAIQAAEEAgEBBgQGAQUAAAAAAAUCAwQGAQcAEjBQERMUFRBgFhcgQHAhNAhBMSIjJDURAAIC==
          --_MYBOUNDARY_--
  41. 41. A message with alternative parts
          MIME-Version: 1.0
          Content-Type: multipart/alternative; boundary=_MYBOUNDARY_

          --_MYBOUNDARY_
          Content-Type: text/plain; charset="us-ascii"
          Content-Transfer-Encoding: quoted-printable

          Hello! Here’s a message with *rich* text
          --_MYBOUNDARY_
          Content-Type: text/html; charset="us-ascii"
          Content-Transfer-Encoding: quoted-printable

          <html><body>Hello! Here’s a message with <b>rich</b> text</body></html>
          --_MYBOUNDARY_--
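Most languages ship a MIME parser, so this structure rarely needs to be split by hand. As a sketch, here is the alternative-parts message run through Python's standard `email` package; the raw text is a lightly adapted copy of the slide (the curly apostrophe is spelled out so the sample stays pure ASCII).

```python
from email import message_from_string, policy

RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=_MYBOUNDARY_

--_MYBOUNDARY_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hello! Here is a message with *rich* text
--_MYBOUNDARY_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<html><body>Hello! Here is a message with <b>rich</b> text</body></html>
--_MYBOUNDARY_--
"""

# policy.default gives the modern EmailMessage API (get_body, iter_parts, ...).
msg = message_from_string(RAW, policy=policy.default)

# Each alternative is a separate MIME part of the same content.
part_types = [part.get_content_type() for part in msg.iter_parts()]

# Pick a representation; get_content() undoes the transfer encoding.
plain = msg.get_body(preferencelist=("plain",))
html = msg.get_body(preferencelist=("html",))
plain_text = plain.get_content().strip()
html_text = html.get_content().strip()
```

Which alternative to prefer is an application decision: plain text is easier to index, HTML keeps the formatting.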
  42. 42. Pitfall #1: Message-ID is optional
          • Great to track messages but the spec says it’s optional.
          • ... and it’s not always there.
  43. 43. Pitfall #2: In-Reply-To, References
          • Refer to the Message-ID of other emails
          • Very useful to rebuild threads
          • ... until an Outlook user jumps in and replaces them with their own Thread-Topic and Thread-Index headers
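A rough sketch of rebuilding threads from these headers follows. It assumes a simple strategy that is not from the slides: the parent is the last entry of References, else In-Reply-To, and a message without a Message-ID gets a generated key hashed from a few headers. All helper names and sample messages are hypothetical, and Outlook's Thread-Index is not handled.

```python
import hashlib
from email import message_from_string


def message_key(msg) -> str:
    """Message-ID if present, else a generated key built from other headers."""
    message_id = msg.get("Message-ID")
    if message_id:
        return str(message_id).strip()
    basis = "|".join(str(msg.get(name, "")) for name in ("Date", "From", "Subject"))
    return "generated:" + hashlib.sha1(basis.encode("utf-8")).hexdigest()


def parent_key(msg):
    """Last id in References, else In-Reply-To, else None."""
    references = str(msg.get("References", "")).split()
    if references:
        return references[-1]
    in_reply_to = str(msg.get("In-Reply-To", "")).split()
    return in_reply_to[0] if in_reply_to else None


def thread_roots(messages):
    """Map each message key to the root of its reply chain."""
    parents = {message_key(m): parent_key(m) for m in messages}
    roots = {}
    for key in parents:
        root, visited = key, set()
        # Follow parent links; stop at a message with no parent or on a cycle.
        while parents.get(root) and root not in visited:
            visited.add(root)
            root = parents[root]
        roots[key] = root
    return roots


RAW_MESSAGES = [
    "Message-ID: <a@example.com>\nSubject: Plan\n\nhi\n",
    "Message-ID: <b@example.com>\nIn-Reply-To: <a@example.com>\n"
    "References: <a@example.com>\nSubject: Re: Plan\n\nok\n",
    # No Message-ID at all: still threadable through References.
    "References: <a@example.com> <b@example.com>\nFrom: x@example.com\n"
    "Subject: Re: Plan\n\nno id\n",
]
MESSAGES = [message_from_string(raw) for raw in RAW_MESSAGES]
```

Here all three sample messages resolve to the same root, `<a@example.com>`.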
  44. 44. Pitfall #3: Attachments are what you decide them to be
          • Content-Disposition tells you attachment or inline. Should a signature image be considered a file attachment?
          • TNEF attachments
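To make that decision concrete, the sketch below lists the candidate parts of a message with Python's standard `email` package and applies one possible policy: inline images are not counted as files. The policy, the helper names, and the sample message are assumptions for illustration, and TNEF parts are not unpacked.

```python
from email.message import EmailMessage


def classify_parts(msg):
    """List (filename, content type, disposition) for parts that are not plain body text."""
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        disposition = part.get_content_disposition()  # 'attachment', 'inline' or None
        filename = part.get_filename()
        if disposition is None and filename is None:
            continue  # an ordinary body part
        found.append((filename, part.get_content_type(), disposition))
    return found


def is_file_attachment(disposition, content_type):
    """One possible policy: inline images (e.g. signature logos) are not files."""
    if disposition == "attachment":
        return True
    return not content_type.startswith("image/")


# Hypothetical message: a text body, a photo, and an inline signature image.
sample = EmailMessage()
sample.set_content("This is the body of the message.")
sample.add_attachment(b"\xff\xd8\xff", maintype="image", subtype="jpeg",
                      filename="IMG_713.jpg")
sample.add_attachment(b"GIF89a", maintype="image", subtype="gif",
                      filename="logo.gif", disposition="inline")

parts = classify_parts(sample)
files = [name for name, ctype, disp in parts if is_file_attachment(disp, ctype)]
```

With this policy the photo counts as a file and the signature logo does not; a different application could reasonably decide otherwise.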
  45. 45. webhooks, threads, contacts, messages, files
  46. 46. Demo of Context.IO console