What's in an Email Address? 
RFC2822 Em@il @ddresses for Mere Mortals 
Schalk W. Cronjé 
@ysb33r

RfC2822 for Mere Mortals


This is a presentation I did years ago, but I heard that there are still people using it as a reference. So here it is, slightly cleaned up. If you are writing systems that process email addresses in some form or another, you might want to read this.

Published in: Internet

  1. What's in an Email Address?
     RFC2822 Em@il @ddresses for Mere Mortals
     Schalk W. Cronjé
     @ysb33r
  2. Why This Topic?
     ● Recurring bugs in software we build
     ● Lack of understanding at all levels
       – Developers
       – Testers
       – Support People
     ● Assumptions made, without reading RFCs
     ● Understanding RFCs is not straightforward
       – RTFM is difficult when TFM cannot be found
     ● We require a basic reference
  3. Content
     ● Overview
     ● Local-part
     ● Domain-part
     ● Valid or not?
     ● The real world
  4. Brave, brave RFC World
     ● RFC1034, RFC1035: Domain name specification.
     ● RFC821, RFC2821: Restrictions on email addresses at protocol levels.
     ● RFC822, RFC2822: Specifies layout of email transmitted over internet. Specifies format of email address.
     ● RFC2047: Encoding of 8-bit in RFC2822 header fields.
     ● RFC3490: Encoding international domain names.
     ● RFC1123 (partially updated by RFC2821): Requirements for internet hosts.
  5. Address Format
     Modern format
       local-part @ domain-part
     Historic format (RFC821/RFC2821)
       source-route : local-part @ domain-part
  6. RFC2822 Local Parts
     ● Unrestricted characters
       0..9 a..z A..Z ! # $ % & ' * + - / = ? ^ _ ` | { } ~ .
     ● Quotable characters (quoted by ")
       < [ ( : @ ; ) ] > , non-ws-ctrl
     ● Illegal characters
       All 8-bit.
     ● Whitespace
       ws-ctrl illegal, only used for folding in headers
       space character is valid if quoted
     [ RFC2821: 4.1.2; RFC2822: 3.2, 3.4 ]
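
The unrestricted character set above can be turned into a small check for unquoted local parts. This is a minimal sketch, not a full RFC2822 parser: the function name `is_unquoted_local_part` is made up for illustration, quoted forms are deliberately not handled, and the dot rules (no leading, trailing or consecutive dots) follow the dot-atom definition in RFC2822 3.2.4.

```python
import re

# Characters the slide lists as usable without quoting (dot handled separately).
ATEXT = r"0-9A-Za-z!#$%&'*+\-/=?^_`|{}~"

# One or more runs of allowed characters, separated by single dots.
UNQUOTED_LOCAL_PART = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")


def is_unquoted_local_part(local_part: str) -> bool:
    """Return True if local_part needs no quoting (hypothetical helper)."""
    return UNQUOTED_LOCAL_PART.fullmatch(local_part) is not None


if __name__ == "__main__":
    for candidate in ["schalk_cronje", "{$cha?k*cr%nje}", "schalk..cronje", "a@b"]:
        print(candidate, is_unquoted_local_part(candidate))
```

A local part that fails this check is not necessarily invalid; it may simply need quoting.
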
  7. Local Payload
     ● Routing characters
       – ! % have been used for local-routing in legacy systems, including UUCP and MHS.
       – Can be used to bypass routing in mis-configured systems.
     ● Shell exploits
       – | / ` $ have been used to attempt remote command execution
  8. Does Case Matter?
     ● Case is ignored in domain
       ntaba.biz == ntaba.biz
     ● Strictly-speaking case matters in local-parts
       schalk@ntaba.biz != ScHaLk@ntaba.biz
       – Most MTAs ignore case
       – RFC2821 discourages use of case as a distinguishing factor
     [ RFC2821: 2.4 ]
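
One way to express these case rules in code is a comparison that always folds the domain and folds the local part only when policy says so. The sketch below is illustrative: `same_mailbox` and its `ignore_local_case` switch are assumed names, and splitting on the last `@` is a simplification that presumes the address has already been validated.

```python
def same_mailbox(a: str, b: str, ignore_local_case: bool = False) -> bool:
    """Compare two addresses: domains case-insensitively, local parts per policy."""
    # Split on the last "@" so a quoted local part containing "@" stays intact.
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    if domain_a.lower() != domain_b.lower():
        return False
    if ignore_local_case:
        # What most MTAs do in practice.
        return local_a.lower() == local_b.lower()
    # The strict reading: case matters in the local part.
    return local_a == local_b


if __name__ == "__main__":
    print(same_mailbox("schalk@ntaba.biz", "schalk@NTABA.BIZ"))
    print(same_mailbox("schalk@ntaba.biz", "ScHaLk@ntaba.biz"))
    print(same_mailbox("schalk@ntaba.biz", "ScHaLk@ntaba.biz", ignore_local_case=True))
```
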
  9. Does Size Matter?
     ● RFC2821 places limitations on length of local-part and domain-part
       – 64 characters for local-part
       – 255 characters for domain-part
     ● This is normally not a problem for messages transmitted across the internet, but can be problematic for in-house applications or encoded email addresses such as X.400.
     ● Many MTAs will now ignore this length restriction as long as the overall SMTP protocol line length restriction is not exceeded.
     [ RFC2821: 4.5.3.1 ]
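
Because many MTAs relax these limits, a length check is probably best written so that it can be switched off. A minimal sketch follows; the function name, the `enforce_limits` flag and the message wording are all assumptions made for illustration.

```python
MAX_LOCAL_PART = 64    # RFC2821 4.5.3.1
MAX_DOMAIN_PART = 255  # RFC2821 4.5.3.1


def length_problems(address: str, enforce_limits: bool = True) -> list:
    """Return a list of human-readable length violations (empty if none)."""
    problems = []
    if not enforce_limits:
        # Mirrors MTAs that ignore the limits entirely.
        return problems
    local_part, _, domain_part = address.rpartition("@")
    if len(local_part) > MAX_LOCAL_PART:
        problems.append(f"local-part exceeds {MAX_LOCAL_PART} characters")
    if len(domain_part) > MAX_DOMAIN_PART:
        problems.append(f"domain-part exceeds {MAX_DOMAIN_PART} characters")
    return problems


if __name__ == "__main__":
    print(length_problems("a" * 65 + "@ntaba.biz"))
```
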
  10. Domain Parts
      ● Can either be an RFC1035 domain or an address literal
      ● Valid characters for domain names:
        a..z A..Z 0..9 -
      ● Subdomains separated by dot character.
      ● Subdomain may not start or end with dash.
      ● 255 characters max length.
      ● 63 characters max per subdomain.
      ● Cannot start or end in dot.
      ● Restriction of subdomain starting with digit has been relaxed.
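
The domain-name rules on this slide translate fairly directly into a syntactic check. The sketch below is an assumption-laden illustration (the name `is_valid_domain` is invented, and address literals are out of scope). It only looks at syntax, so an all-numeric name such as 192.168.1.1 passes these character rules and would need a separate test.

```python
import re

# A subdomain: letters, digits and dashes, not starting or ending with a dash.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_valid_domain(domain: str) -> bool:
    """Apply the length, dot and character rules to a domain name."""
    if not domain or len(domain) > 255:
        return False
    # A leading, trailing or doubled dot produces an empty label, which fails.
    return all(
        len(label) <= 63 and LABEL.fullmatch(label) is not None
        for label in domain.split(".")
    )


if __name__ == "__main__":
    for name in ["ntaba.biz", "1967.com", ".ntaba.biz", "lon_eng.ntaba.biz"]:
        print(name, is_valid_domain(name))
```
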
  11. Address Literals
      ● Workarounds for when host names cannot be resolved.
        – @[protocol:host-address]
        – IPv4: @[192.1.1.1]
        – IPv6: @[IPv6:fe80::a00:20ff:fec2:2ef4]
      ● Protocol must be registered with IANA.
      [ RFC2821: 4.1.3 ]
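
Recognising the two literal forms shown above can be sketched with Python's standard `ipaddress` module. The helper name `parse_address_literal` is hypothetical, and this sketch only understands the bare IPv4 form and the `IPv6:` tag; any other registered protocol tag is treated as unrecognised.

```python
import ipaddress


def parse_address_literal(domain_part: str):
    """Return an IPv4Address/IPv6Address for a bracketed literal, else None."""
    if not (domain_part.startswith("[") and domain_part.endswith("]")):
        return None
    body = domain_part[1:-1]
    try:
        if body[:5].lower() == "ipv6:":
            return ipaddress.IPv6Address(body[5:])
        return ipaddress.IPv4Address(body)
    except ValueError:
        # Raised by ipaddress for anything that is not a well-formed address.
        return None


if __name__ == "__main__":
    print(parse_address_literal("[192.1.1.1]"))
    print(parse_address_literal("[IPv6:fe80::a00:20ff:fec2:2ef4]"))
    print(parse_address_literal("192.1.1.1"))
```

Since acceptance of address literals is a policy decision, a caller would typically combine this with a configuration switch rather than accept every literal that parses.
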
  12. International Domain Names
      ● Domain names not representable in US-ASCII can be registered
      ● Such domain names cannot be handled by DNS or existing protocols
      ● RFC 3490 describes the encoding/decoding of such domain names from presentation to protocol:
        exämple.com => xn--exmple-cua.com
      ● Potential for phishing
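
The presentation-to-protocol conversion can be tried out with Python's built-in `idna` codec, which implements RFC 3490. This is a small sketch; the two function names are invented for illustration, and newer IDNA revisions may treat some characters differently from this codec.

```python
def to_protocol_form(domain: str) -> str:
    """Encode a presentation-form domain into its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def to_presentation_form(domain: str) -> str:
    """Decode an ASCII-compatible domain back into its user-readable form."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    wire = to_protocol_form("exämple.com")
    print(wire)
    print(to_presentation_form(wire))
```

Showing users the protocol form alongside the presentation form is one possible mitigation for look-alike domains.
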
  13. Valid or not?
      schalk_cronje@ntaba.biz
      ● Valid even under strict RFC2822 interpretation
      ● Most punctuation is valid in local part, including:
        {$cha?k*cr%nje}@ntaba.biz
  14. Valid or not?
      schalk_cronje@[192.168.1.1]
      ● Yes, the domain part is an address-literal
      ● Acceptance of address-literals should be configurable
        – They can be security risks
        – RFC2821 prefers usage of MX-based deliveries.
  15. Valid or not?
      schalk_cronje@192.168.1.1
      ● No, it is neither an address-literal nor a valid domain name.
      ● Some systems will attempt to deliver this by passing 192.168.1.1 to the domain resolving subsystem, which in turn will simply return the IP address.
        – This violates RFC1123
        – This is a potential security risk.
      [ RFC1123: 2.1 ]
  16. Valid or not?
      schalk_cronje@1967.com
      ● Not valid according to RFC1035
      ● Limitation lifted in RFC1123.
      [ RFC1123: 2.1 ]
  17. Valid or not?
      schalk_cronje@#192168
      ● Valid in RFC821 for compatibility with non-TCP/IP networks.
      ● Outlawed by RFC2821.
      ● Not supported by any modern MTA.
      [ RFC821: 4.1.2; RFC2821: F.4 ]
  18. Valid or not?
      schalk_cronje@.ntaba.biz
      ● No, domain-part may not start with a dot.
      [ RFC2822: 3.2.4 ]
  19. Valid or not?
      schalk_cronje@ntaba.biz.
      ● No, strictly RFC2822 states that domain-part may not end with a dot.
      ● RFC1034 uses the dot-ending to indicate absolute domains (FQDN) in resource records.
      ● Most systems will accept, resolve and deliver this
      [ RFC2822: 3.2.4; RFC1034: 3.1 ]
  20. Valid or not?
      schalk_cronje@ntaba..biz.
      ● No, consecutive dots are not allowed in domain parts.
      [ RFC2822: 3.2.4; RFC1034: 3.1 ]
  21. Valid or not?
      .schalk_cronje@ntaba.biz
      schalk..cronje@ntaba.biz
      ● No.
        – Local-parts may not start with a dot.
        – Consecutive dots are not allowed in local parts.
      ● Pragmatically, many known MTAs don't care
      [ RFC2822: 3.2.4 ]
  22. Valid or not?
      schalk_cronje@lon_eng.ntaba.biz
      ● No, _ is not valid in domain names
      ● Some DNS servers will support this.
      ● Some sites do use the _ for internal systems.
      ● It remains illegal for internet operations
      [ RFC2821: 4.1.3 ]
  23. Valid or not?
      schalk_cronje@lon_eng@ntaba.biz
      ● No, @ cannot be used unquoted in local parts
      ● The following are valid:
        "schalk_cronje@lon_eng"@ntaba.biz
        schalk_cronje\@lon_eng@ntaba.biz
      [ RFC2822: 3.2.5, 3.4 ]
  24. Local-part Quoting
      ● Quoting should only be used where absolutely necessary
      ● Where a quoted form has an unquoted form...
        – The two forms are equivalent
        – The unquoted form should be used for transmission
      ● Quoting is performed by enclosing local-part in quotes or preceding a character by backslash.
      [ RFC2821: 4.1.2 ]
  25. Valid or not?
      <schalk_cronje@ntaba.biz>
      ● No, this is an envelope for email addresses
      ● The following is valid:
        "<schalk_cronje>"@ntaba.biz
  26. Valid or not?
      schalk_O"cronje@ntaba.biz
      ● No, the double quote is a quoting character.
  27. Valid or not?
      schalk_O'cronje@ntaba.biz
      ● Yes, apostrophe is valid in unquoted form
  28. Valid or not?
      "schalk_O\"cronje"@ntaba.biz
      ● This is debatable
      ● Neither RFC2821 nor RFC2822 is completely clear whether the double quote is valid if escaped
      Note that the backslash, "\", is a quote character, which is used to indicate that the next character is to be used literally
      [ RFC2821: 4.1.2 ]
  29. Valid or not?
      schalk_cronjé@ntaba.biz
      ● Not at RFC2821/RFC2822 levels - contains at least one 8-bit character
      ● Can be completely valid at the presentation level
        – Email client can take care of translation between a user-readable form and a level suitable for transmission
      ● There is NO agreed standard for encoding non-US-ASCII in local parts
  30. My 8-bit's Worth
      ● Custom encoding is valid, when both the sender and receiver will know about the encoding
        – Intermediate relays will simply pass it through
      ● UTF-7: schalk+AF8-cronj+AOk@ntaba.biz
      ● RFC2047 (adapted): =?UTF-8?Q?schalk_cronj=C3=A9?=@ntaba.biz
      ● Storing email addresses with 8-bit content in XML is problematic – requires encoding.
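
The UTF-7 variant of such a private encoding can be sketched with Python's built-in `utf-7` codec. Both function names are assumptions, and the scheme only works when sender and receiver have agreed on it. Note that `+` is itself a legal local-part character, so an address containing a literal `+` changes when encoded.

```python
def encode_utf7_address(address: str) -> str:
    """Turn a user-readable address into a 7-bit form using UTF-7."""
    return address.encode("utf-7").decode("ascii")


def decode_utf7_address(wire_form: str) -> str:
    """Turn a UTF-7 encoded address back into its user-readable form."""
    return wire_form.encode("ascii").decode("utf-7")


if __name__ == "__main__":
    print(decode_utf7_address("schalk+AF8-cronj+AOk@ntaba.biz"))
    print(encode_utf7_address("schalk_cronjé@ntaba.biz"))
```

The encoder may choose a different but equivalent spelling from the one on the slide, for example leaving the underscore unencoded, so comparisons are best made on the decoded form.
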
  31. The 8-bit Legacy
      ● RFC822 was written in a 7-bit world
        – It can be misread as permitting 8-bit characters.
      ● Some MTAs will actually transmit 8-bit characters in email addresses
      ● In-house systems might have a requirement for 8-bit
      ● An email system must be able to allow, block, quarantine or filter on 8-bit characters.
  32. Valid or not?
      "`echo haX0r | /usr/bin/passwd root --stdin`"@ntaba.biz
      ● Valid even under strict RFC2822 interpretation
      ● Quoting allows for spaces and | to be used
      ● Imagine if this was passed to a shell script in a badly configured system!
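
To make the risk concrete, the sketch below shows two ways of handing an address to an external program without letting a shell interpret it. The `notify-user` command and its `--to` flag are hypothetical placeholders. `shlex.quote` is the standard-library helper for the case where a shell string cannot be avoided.

```python
import shlex


def build_argv(address: str) -> list:
    """Preferred: pass the address as its own argv element, with no shell involved."""
    # "notify-user" is a made-up command used only for illustration.
    return ["notify-user", "--to", address]


def build_shell_command(address: str) -> str:
    """Fallback: quote the address so a POSIX shell treats it as plain data."""
    return "notify-user --to " + shlex.quote(address)


if __name__ == "__main__":
    hostile = '"`echo haX0r | /usr/bin/passwd root --stdin`"@ntaba.biz'
    print(build_argv(hostile))
    print(build_shell_command(hostile))
```

The argv list would be passed to something like `subprocess.run` without `shell=True`, so backticks and pipes never reach a shell.
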
  33. Valid or not?
      "@lon-eng,@scm-eng:schalk_cronje"@ntaba.biz
      ● Valid even under strict RFC2822 interpretation
      ● Quoting allows for @ : , to be used
  34. Valid or not?
      @lon-eng,@scm-eng:schalk_cronje@ntaba.biz
      ● Valid even under strict RFC2822 interpretation
      ● This is an example of a source-route.
      ● Usage is deprecated
      ● It is best to remove them, before relaying.
      [ RFC2821: 3.7, C, F.2 ]
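
Removing a source route before relaying can be sketched as below. The function name is invented, and the sketch assumes the route hosts are plain names: an address literal containing a colon inside the route would need a real parser.

```python
def strip_source_route(path: str) -> str:
    """Drop a leading '@host,@host:' source route, if one is present."""
    if path.startswith("@"):
        _route, separator, mailbox = path.partition(":")
        if separator:
            return mailbox
    # No route, or a quoted local part that merely contains route-like text.
    return path


if __name__ == "__main__":
    print(strip_source_route("@lon-eng,@scm-eng:schalk_cronje@ntaba.biz"))
    print(strip_source_route('"@lon-eng,@scm-eng:schalk_cronje"@ntaba.biz'))
```
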
  35. Practical Validation
      ● Address validation cannot purely be performed against the RFC
      ● Context is very important
      ● Validation at user-level will differ from that at protocol-level.
      RFC rule of thumb: Be as lenient as possible in what you accept, but as strict as possible in what you send out.
  36. Validation Context
      ● Context places additional demands on validation algorithms
      ● Validation algorithms must be configurable
        – Allows for specifics in user environments
        – Allows for adaptability within various code subsystems
  37. Pattern Matching
      ● DOS-patterns (*?) are useful, but not good enough
      ● Regex is a better way to perform complex pattern matches
        – Not all users understand regex
        – It is therefore good to give users the option of an input notation, but use regex internally to perform the matching
  38. The *? Problem
      schalk*cronje@ntaba.biz
      ● The above is a valid email address
      ● Was the intention to filter for this exact address?
      ● Or was the intention to filter for addresses such as schalkRfcDudecronje@ntaba.biz
      ● Regex:
        – schalk\*cronje@ntaba.biz
        – schalk.*cronje@ntaba.biz
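
The two readings of the same user input can be made explicit by compiling it in two different ways, using regex internally as suggested earlier. This is a minimal sketch with invented function names. Which function to call is exactly the question the user interface has to ask.

```python
import re


def literal_to_regex(address: str):
    """Match this exact address; * and ? are ordinary characters."""
    return re.compile(re.escape(address))


def wildcard_to_regex(pattern: str):
    """Treat * as 'any run of characters' and ? as 'any single character'."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


if __name__ == "__main__":
    user_input = "schalk*cronje@ntaba.biz"
    candidate = "schalkRfcDudecronje@ntaba.biz"
    print(bool(literal_to_regex(user_input).fullmatch(candidate)))
    print(bool(wildcard_to_regex(user_input).fullmatch(candidate)))
```
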
  39. Lists of Addresses
      ● RFC2822 uses the comma for separating address lists in headers
      ● A common misconception is that it is easy to delimit addresses using ; or ,.
      ● Although it is possible, it is no trivial task to parse lists such as
        schalk@ntaba.biz, "s,c,h,a,l,k"@ntaba.biz , s\,cha\,lk@ntaba.biz , "sch\",alk"@ntaba.biz
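
A splitter has to track at least quoting and backslash escapes before it can trust a comma. The sketch below shows that minimum. `split_address_list` is an illustrative name, and display names, comments, angle brackets and groups, which a real RFC2822 header may contain, are deliberately left out.

```python
def split_address_list(header_value: str) -> list:
    """Split on commas that are neither inside quotes nor backslash-escaped."""
    addresses = []
    current = []
    in_quotes = False
    escaped = False
    for ch in header_value:
        if escaped:
            # The character after a backslash is always taken literally.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            addresses.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    addresses.append("".join(current).strip())
    return [address for address in addresses if address]


if __name__ == "__main__":
    header = r'schalk@ntaba.biz, "s,c,h,a,l,k"@ntaba.biz , "sch\",alk"@ntaba.biz'
    for address in split_address_list(header):
        print(address)
```
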
  40. Real World Violations
      ● Use of _ in domain-part
      ● Domain part starts with dot
      ● Domain part ends in dot
      ● 4000 characters in local part
      ● 8-bit characters in local-part
  41. What can we do?
      ● Developers should never make any assumptions as to what the customer might need or to what the customer's infrastructure might be
        – Code to be as RFC-compliant as possible, but allow for configurability as and when needed.
        – User interfaces should be context-sensitive.
      ● Testers should ensure that nobody makes such assumptions
  42. Handling email addresses is an extraordinarily complex matter for something very simple.
      Next time you enter an email address...
      ...you might not want to take it for granted
      Questions?
