RfC2822 for Mere Mortals


This is a presentation I did years ago, but I heard that there are still people using it as a reference. So here it is, slightly cleaned up. If you are writing systems that process email addresses in some form or another, you might want to read this.

Published in: Internet



  • 1. What's in an Email Address? RFC2822 Em@il @ddresses for Mere Mortals Schalk W. Cronjé @ysb33r
  • 2. Why This Topic? ● Recurring bugs in software we build ● Lack of understanding at all levels – Developers – Testers – Support People ● Assumptions made, without reading RFCs ● Understanding RFCs is not straightforward – RTFM is difficult when TFM cannot be found ● We require a basic reference
  • 3. Content ● Overview ● Local-part ● Domain-part ● Valid or not? ● The real world
  • 4. Brave, brave RFC World ● RFC1034, RFC1035: Domain name specification. ● RFC821, RFC2821: Restrictions on email addresses at protocol levels. ● RFC822, RFC2822: Specifies layout of email transmitted over internet. Specifies format of email address. ● RFC2047: Encoding of 8-bit in RFC2822 header fields ● RFC3490: Encoding international domain names ● RFC1123 (partially updated by RFC2821): Requirements for internet hosts
  • 5. Address Format Modern format local-part @ domain-part Historic format (RFC821/RFC2821) source-route : local-part @ domain-part
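
A minimal sketch of splitting the modern format into its two parts. The function name `split_address` is illustrative and not from the slides. It assumes the domain-part never contains `@`, so the last `@` is taken as the separator even when a quoted local-part contains one; it does not handle the historic source-route form.

```python
def split_address(address):
    """Split 'local-part@domain-part' at the last '@' (assumption: no '@' in the domain)."""
    local_part, sep, domain_part = address.rpartition("@")
    if not sep or not local_part or not domain_part:
        raise ValueError("expected local-part@domain-part")
    return local_part, domain_part
```
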
  • 6. RFC2822 Local Parts ● Unrestricted characters: 0..9 a..z A..Z ! # $ % & ' * + - / = ? ^ _ ` | { } ~ . ● Quotable characters (quoted by “): < [ ( : @ ; ) ] > , non-ws-ctrl ● Illegal characters: all 8-bit. ● Whitespace: ws-ctrl illegal, only used for folding in headers; space character is valid if quoted [ RFC2821: 4.1.2; RFC2822: 3.2, 3.4 ]
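
The unrestricted character set above can be turned into a small check for the unquoted form of a local-part. This is a sketch only: the names `ATEXT` and `is_unquoted_local_part` are made up for illustration, and it treats the dot as a separator that may not lead, trail, or repeat, which is how RFC2822 defines the unquoted (dot-atom) form.

```python
import string

# The unrestricted characters listed on the slide, without the dot.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")

def is_unquoted_local_part(local_part):
    """True if local_part consists of runs of ATEXT separated by single dots."""
    if not local_part:
        return False
    labels = local_part.split(".")
    # An empty label means a leading, trailing, or doubled dot.
    return all(label and set(label) <= ATEXT for label in labels)
```

Anything that fails this check either needs quoting (spaces, `@`, and the other quotable characters) or is not valid at all (8-bit characters).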
  • 7. Local Payload ● Routing characters – ! % have been used for local-routing in legacy systems, including UUCP and MHS. – Can be used to bypass routing in mis-configured systems. ● Shell exploits – | / ` $ have been used to attempt remote command execution
  • 8. Does Case Matter? ● Case is ignored in domain == ● Strictly-speaking case matters in local-parts != – Most MTAs ignore case – RFC2821 discourages use of case as a distinguishing factor [ RFC2821: 2.4 ]
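
One way to express these case rules when comparing two addresses, as a sketch: the domain is always compared case-insensitively, while the local-part is compared exactly unless the caller opts in to folding. The function name `same_mailbox` and the `fold_local_case` switch are assumptions made for this example.

```python
def same_mailbox(a, b, fold_local_case=False):
    """Compare two addresses: domain ignores case, local-part does not by default."""
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    if domain_a.lower() != domain_b.lower():
        return False
    if fold_local_case:
        # Mirrors what most MTAs do in practice, not what the RFC strictly says.
        return local_a.lower() == local_b.lower()
    return local_a == local_b
```
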
  • 9. Does Size Matter? ● RFC2821 places limitations on length of local-part and domain-part – 64 characters for local-part – 255 characters for domain-part ● This is normally not a problem for messages transmitted across the internet, but can be problematic for in-house applications or encoded email addresses such as X.400. ● Many MTAs will now ignore this length restriction as long as the overall SMTP protocol line length restriction is not exceeded. [ RFC2821: ]
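
A sketch of the length limits as a configurable check. The constant and function names are illustrative; the `enforce` flag is an assumption that reflects the point above that many MTAs no longer apply the restriction.

```python
MAX_LOCAL_PART = 64    # limit for local-part quoted on the slide
MAX_DOMAIN_PART = 255  # limit for domain-part quoted on the slide

def within_length_limits(local_part, domain_part, enforce=True):
    """Check both parts against the RFC2821 limits unless enforcement is off."""
    if not enforce:
        return True
    return len(local_part) <= MAX_LOCAL_PART and len(domain_part) <= MAX_DOMAIN_PART
```
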
  • 10. Domain Parts ● Can either be an RFC1035 domain or an address literal ● Valid characters for domain names: a..z A..Z 0..9 - ● Subdomains separated by dot character. ● Subdomain may not start or end with dash. ● 255 characters max length. ● 63 characters max per subdomain. ● Cannot start or end in dot. ● Restriction on subdomains starting with a digit has been relaxed.
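
The domain name rules above translate fairly directly into code. The following is a sketch under those rules only; `is_valid_domain_name` and its `allow_trailing_dot` option are hypothetical names, the option being one example of the configurability that real-world systems tend to need.

```python
import re

# One subdomain: letters, digits and dashes, not starting or ending with a dash.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def is_valid_domain_name(domain, allow_trailing_dot=False):
    """Apply the slide's rules: 255 max overall, 63 max per subdomain, no stray dots."""
    if allow_trailing_dot and domain.endswith("."):
        domain = domain[:-1]
    if not domain or len(domain) > 255:
        return False
    # An empty label means a leading, trailing, or doubled dot.
    return all(len(label) <= 63 and LABEL.fullmatch(label)
               for label in domain.split("."))
```
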
  • 11. Address Literals ● Workarounds for when host names cannot be resolved. – @[protocol:host-address] – IPv4: @[] – IPv6: @[IPv6:fe80::a00:20ff:fec2:2ef4] ● Protocol must be registered with IANA. [ RFC2821: 4.1.3 ]
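
A sketch of recognising address literals with Python's standard `ipaddress` module. The function name `parse_address_literal` is illustrative. It assumes an untagged literal is IPv4 and only understands the `IPv6:` tag; any other registered protocol tag is simply reported as unrecognised.

```python
import ipaddress

def parse_address_literal(domain_part):
    """Return an ipaddress object for a '[...]' literal, or None if it is not one."""
    if not (domain_part.startswith("[") and domain_part.endswith("]")):
        return None
    body = domain_part[1:-1]
    try:
        if body[:5].lower() == "ipv6:":
            return ipaddress.IPv6Address(body[5:])
        return ipaddress.IPv4Address(body)
    except ValueError:
        # Covers malformed addresses and unknown protocol tags.
        return None
```

A bare IP address without brackets returns `None` here, so the caller can reject it rather than hand it to the resolver.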
  • 12. International Domain Names ● Domain names not representable in US-ASCII can be registered ● Such domain names cannot be handled by DNS or existing protocols ● RFC 3490 describes the encoding/decoding of such domain names from presentation to protocol: exä => ● Potential for phishing
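
Python ships an `idna` codec that implements the RFC 3490 conversion, so the presentation-to-protocol step can be sketched as below. The two function names and the domain `bücher.example` are illustrative and not taken from the slides.

```python
def domain_to_protocol_form(domain):
    """Presentation form (Unicode) to protocol form (ASCII), per RFC 3490."""
    return domain.encode("idna").decode("ascii")

def domain_to_presentation_form(domain):
    """Protocol form (ASCII) back to presentation form (Unicode)."""
    return domain.encode("ascii").decode("idna")

print(domain_to_protocol_form("bücher.example"))  # xn--bcher-kva.example
```

Comparing or filtering on the protocol form is one way to reduce the phishing risk from look-alike presentation forms.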
  • 13. Valid or not? ● Valid even under strict RFC2822 interpretation ● Most punctuation characters are valid in the local part, including: {$cha?k*cr%nje}
  • 14. Valid or not? schalk_cronje@[] ● Yes, the domain part is an address-literal ● Acceptance of address-literals should be configurable – They can be security risks – RFC2821 prefers usage of MX-based deliveries.
  • 15. Valid or not? schalk_cronje@ ● No, it is neither an address-literal nor a valid domain name. ● Some systems will attempt to deliver this by passing the domain part to the domain-resolving subsystem, which in turn will simply return the IP address. – This violates RFC1123 – This is a potential security risk. [ RFC1123: 2.1 ]
  • 16. Valid or not? ● Not valid according to RFC1035 ● Limitation lifted in RFC1123. [ RFC1123: 2.1 ]
  • 17. Valid or not? schalk_cronje@#192168 ● Valid in RFC821 for compatibility with non-TCP/IP networks. ● Outlawed by RFC2821. ● Not supported by any modern MTA. [ RFC821: 4.1.2; RFC2821: F.4 ]
  • 18. Valid or not? ● No, domain-part may not start with a dot. [ RFC2822: 3.2.4 ]
  • 19. Valid or not? ● No, strictly RFC2822 states that domain-part may not end with a dot. ● RFC1034 uses the dot-ending to indicate absolute domains (FQDN) in resource records. ● Most systems will accept, resolve and deliver this. [ RFC2822: 3.2.4; RFC1034: 3.1]
  • 20. Valid or not? ● No, consecutive dots are not allowed in domain parts. [ RFC2822: 3.2.4; RFC1034: 3.1]
  • 21. Valid or not? ● No. – Local-parts may not start with a dot. – Consecutive dots are not allowed in local parts. ● Pragmatically, many known MTAs don’t care [ RFC2822: 3.2.4]
  • 22. Valid or not? ● No, _ is not valid in domain names ● Some DNS servers will support this. ● Some sites do use the _ for internal systems. ● It remains illegal for internet operations [ RFC2821: 4.1.3 ]
  • 23. Valid or not? ● No, @ cannot be used unquoted in local parts “schalk_cronje@lon_eng” [ RFC2822: 3.2.5, 3.4 ]
  • 24. Local-part Quoting ● Quoting should only be used where absolutely necessary ● Where a quoted form has an unquoted form... – The two forms are equivalent – The unquoted form should be used for transmission ● Quoting is performed by enclosing the local-part in quotes or preceding a character with a backslash. [ RFC2821: 4.1.2 ]
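
The "quote only where necessary" rule can be sketched as follows. `quote_local_part` is a hypothetical helper that takes the raw local-part value, returns it untouched when the unquoted form is legal, and otherwise wraps it in quotes with backslash escapes. It assumes 7-bit input without control characters and does not validate that.

```python
import string

# Characters allowed unquoted in a local-part, without the dot.
_ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")

def quote_local_part(local_part):
    """Return the unquoted form if it is legal, else the quoted form."""
    labels = local_part.split(".")
    if local_part and all(label and set(label) <= _ATEXT for label in labels):
        return local_part
    # Backslash first, so the escapes added for quotes are not doubled.
    escaped = local_part.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```
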
  • 25. Valid or not? <> ● No, this is an envelope for email addresses ● The following is valid: “<schalk_cronje>”
  • 26. Valid or not? schalk_O” ● No, the double quote is a quoting character.
  • 27. Valid or not? schalk_O' ● Yes, apostrophe is valid in unquoted form
  • 28. Valid or not? “schalk_O\”cronje” ● This is debatable ● Neither RFC2821 nor RFC2822 is completely clear whether the double quote is valid if escaped. Note that the backslash, "\", is a quote character, which is used to indicate that the next character is to be used literally [ RFC2821: 4.1.2 ]
  • 29. Valid or not? schalk_cronjé ● Not at RFC2821/RFC2822 levels - contains at least one 8-bit character ● Can be completely valid at the presentation level – Email client can take care of translation between a user-readable form and a form suitable for transmission ● There is NO agreed standard for encoding non-US-ASCII in local parts
  • 30. My 8-bit's Worth ● Custom encoding is valid, when both the sender and receiver will know about the encoding – Intermediate relays will simply pass it through ● UTF-7: ● RFC2047 (adapted): =?UTF-8?Q?schalk_cronj=C3=A9? ● Storing email addresses with 8-bit content in XML is problematic – requires encoding.
  • 31. The 8-bit Legacy ● RFC822 was written in a 7-bit world – It can be misinterpreted as allowing 8-bit characters. ● Some MTAs will actually transmit 8-bit characters in email addresses ● In-house systems might have a requirement for 8-bit ● An email system must be able to allow, block, quarantine or filter on 8-bit characters.
  • 32. Valid or not? "`echo haX0r | /usr/bin/passwd root --stdin`" ● Valid even under strict RFC2822 interpretation ● Quoting allows for spaces and | to be used ● Imagine if this was passed to a shell script in a badly configured system!
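
To illustrate the risk, here is a sketch of neutralising such an address before it gets anywhere near a shell, using the standard `shlex` module. The helper name `shell_safe` is made up for this example; passing the address as a separate argument to a program, with no shell involved at all, is generally the safer design.

```python
import shlex

def shell_safe(address):
    """Quote an address so that a POSIX shell treats it as one literal word."""
    return shlex.quote(address)

address = '"`echo haX0r | /usr/bin/passwd root --stdin`"'
# Splitting the quoted form the way a POSIX shell would yields the original
# text as a single argument, with nothing executed.
assert shlex.split(shell_safe(address)) == [address]
```
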
  • 33. Valid or not? "@lon-eng,@scm-eng:schalk_cronje" ● Valid even under strict RFC2822 interpretation ● Quoting allows for @ : , to be used
  • 34. Valid or not? @lon-eng, ● Valid even under strict RFC2822 interpretation ● This is an example of a source-route. ● Usage is deprecated ● It is best to remove them, before relaying. [ RFC2821: 3.7, C, F.2 ]
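
Removing a source-route before relaying might look like the sketch below. `strip_source_route` is an illustrative name, the host names in the usage are placeholders, and the code assumes the route hosts are plain domain names (no address literals containing a colon). A quoted local-part that merely looks like a route is left alone because it does not start with `@`.

```python
def strip_source_route(path):
    """Drop a leading '@host1,@host2:' source-route, keeping the mailbox."""
    if path.startswith("@"):
        _route, sep, mailbox = path.partition(":")
        if sep and mailbox:
            return mailbox
    return path

print(strip_source_route("@lon-eng,@scm-eng:schalk_cronje@example.com"))
# schalk_cronje@example.com
```
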
  • 35. Practical Validation ● Address validation cannot purely be performed against the RFC ● Context is very important ● Validation at user-level will differ from that at protocol-level. RFC rule of thumb: Be as lenient as possible in what you accept, but as strict as possible in what you send out.
  • 36. Validation Context ● Context places additional demands on validation algorithms ● Validation algorithms must be configurable – Allows for specifics in user environments – Allows for adaptability within various code subsystems
  • 37. Pattern Matching ● DOS-patterns (*?) are useful, but not good enough ● Regex is a better way to perform complex pattern matches – Not all users understand regex – It is therefore good to give users the option of an input notation, but use regex internally to perform the matching
  • 38. The *? Problem schalk* ● The above is a valid email address ● Was the intention to filter for this exact address? ● Or was the intention to filter for addresses such as ● Regex: – schalk* – schalk.*
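
A sketch of accepting DOS-style patterns from the user while matching with regex internally, with an explicit switch for the "exact address" reading of `*`. The names `wildcard_to_regex`, `matches` and the `literal` flag are assumptions made for this example, as is the domain in the usage.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a DOS-style pattern: '*' is any run, '?' is any one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts))

def matches(pattern, address, literal=False):
    """Match as a wildcard pattern, or as an exact address when literal=True."""
    if literal:
        return pattern == address
    return wildcard_to_regex(pattern).fullmatch(address) is not None

print(matches("schalk*@example.com", "schalk_cronje@example.com"))                # True
print(matches("schalk*@example.com", "schalk_cronje@example.com", literal=True))  # False
```

Making the user state which meaning is intended avoids guessing whether `*` is a wildcard or part of a valid address.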
  • 39. Lists of Addresses ● RFC2822 uses the comma for separating address lists in headers ● A common misconception is that it is easy to delimit addresses using ; or ,. ● Although it is possible, it is no trivial task to parse lists such as, “s,c,h,a,l,k” ,s,cha, , “sch”,alk”
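
To show why a plain split on `,` is not enough, here is a sketch of a splitter that respects quoting and backslash escapes. `split_address_list` is an illustrative name and the addresses in the usage are placeholders. The sketch deliberately ignores comments, display names and angle brackets, which a full RFC2822 parser would also have to handle.

```python
def split_address_list(header_value):
    """Split on commas that are outside double quotes; drop empty entries."""
    parts, current = [], []
    in_quotes = False
    escaped = False
    for ch in header_value:
        if escaped:
            current.append(ch)      # character after a backslash is literal
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [part for part in parts if part]

print(split_address_list('"s,c,h,a,l,k"@example.com, s@example.com,, "sch\\",alk"@example.com'))
# ['"s,c,h,a,l,k"@example.com', 's@example.com', '"sch\\",alk"@example.com']
```
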
  • 40. Real World Violations ● Use of _ in domain-part ● Domain part starts with dot ● Domain part ends in dot ● 4000 characters in local part ● 8-bit characters in local-part
  • 41. What can we do? ● Developers should never make any assumptions as to what the customer might need or to what the customer's infrastructure might be – Code to be as RFC-compliant as possible, but allow for configurability as and when needed. – User interfaces should be context-sensitive. ● Testers should ensure that nobody makes such assumptions
  • 42. Handling email addresses is an extraordinarily complex matter for something so seemingly simple. Next time you enter an email address... you might not want to take it for granted. Questions?