This document discusses Unicode transformation attacks and provides examples of how applications can be manipulated. It covers how Unicode lets systems support multiple languages, how characters are assigned unique numbers, and examples of lookalike characters. It also describes how data can be encoded to disguise malicious code, examples of bypassing filters by changing case or using lookalikes, and real world examples like compromising Spotify and Twitter accounts by creating usernames with special characters. The document recommends ways to prevent these issues, like canonicalizing input and performing security checks after decoding.
Unicode
3. • What is Unicode?
• How Apps deal with Unicode
• Unicode Transformation Attack
• Real World Examples
• How To Manipulate Applications
• Remediation
5. • Unicode lets computer systems support more
languages, allowing for worldwide use
• Encodings such as UTF-8 and UTF-16 can store a single character in multiple bytes
• It provides a unique number for every character,
no matter what the platform, no matter what the
program, no matter what the language
7. • Classic example: c0rn ;)
o=U+006f, ο=U+03bf, о=U+043e
• Latin Small Letter O, Greek Small Letter Omicron, Cyrillic Small
Letter O
• Searches for the above can turn up different
results in Google
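To make the difference between these lookalikes visible, here is a minimal Python sketch that prints the code point and official Unicode name of each of the three letters. The helper name `describe` is illustrative and not part of the original slides.

```python
import unicodedata

# The three lookalike letters from the slide, written as escapes so they
# cannot be confused with each other in the source code itself.
LOOKALIKES = ["\u006f", "\u03bf", "\u043e"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in LOOKALIKES:
    print(ch, describe(ch))

# The glyphs look alike, but the three strings are all different.
print(len(set(LOOKALIKES)))
```

Any comparison, search index, or uniqueness check that works on raw code points treats these as three unrelated characters.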
8. • Data can be entered using Unicode to disguise
malicious code and permit various Unicode
transformation issues, such as Best-Fit
Mapping
9. • Occurs when a character X gets transformed to
an entirely different character Y.
• Character X in the source encoding doesn't exist
in the destination encoding, so the App
attempts to find a best match.
• This happens when characters are transcoded between
Unicode and another character encoding.
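The best-fit behavior described above can be sketched in Python. Python's built-in codecs do not apply best-fit mappings on their own, so the table `BEST_FIT` below is a hypothetical two-entry stand-in for what a legacy code page conversion might do. The function names are also illustrative.

```python
# Hypothetical best-fit table: a tiny stand-in for the mapping a legacy
# code page conversion might apply. Real tables are much larger.
BEST_FIT = str.maketrans({"\uff1c": "<", "\uff1e": ">"})


def naive_filter(text: str) -> bool:
    """Return True if the text contains no ASCII angle brackets."""
    return "<" not in text and ">" not in text


def transcode(text: str) -> str:
    """Simulate conversion to a narrower encoding with best-fit enabled."""
    return text.translate(BEST_FIT)


payload = "\uff1cscript\uff1e"          # fullwidth angle brackets
passed_filter = naive_filter(payload)   # the check runs before transcoding
after = transcode(payload)              # the brackets become ASCII here

print(passed_filter, after, naive_filter(after))
```

The filter approves the fullwidth input, and the dangerous ASCII characters only appear after the conversion, when nothing checks them any more.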
11. • The application lowercases the input after
filtering, not before.
• The string "script" is prevented by the filter,
but the string "scrİpt" is allowed.
• Possibility of using many lookalikes:
AΑАᐱᗅᗋᗩᴀᴬ⍲A
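The ordering problem on this slide can be illustrated with a short Python sketch. Two assumptions differ from the slide. First, Python's `str.lower()` maps İ (U+0130) to "i" plus a combining dot, so "scrİpt" does not become "script" in Python. Second, the sketch therefore uses U+212A KELVIN SIGN, which Python lowercases to a plain "k", with a hypothetical blocklist entry `onclick`.

```python
BLOCKED = "onclick"  # hypothetical blocklist entry, not from the slides


def filter_then_lower(text: str) -> str:
    """Flawed order: check the raw input, then lowercase it."""
    if BLOCKED in text:
        raise ValueError("blocked")
    return text.lower()


def lower_then_filter(text: str) -> str:
    """Safer order: lowercase first, then check the result."""
    lowered = text.lower()
    if BLOCKED in lowered:
        raise ValueError("blocked")
    return lowered


payload = "onclic\u212a"  # ends in U+212A KELVIN SIGN, not the letter k

print(filter_then_lower(payload))  # the blocked word slips through

try:
    lower_then_filter(payload)
except ValueError as exc:
    print("rejected:", exc)
```

Whether a particular lookalike works depends on the case-mapping rules of the platform, which is why the check has to run on the final, transformed string.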
12. • Unicode character U+FF1C FULLWIDTH LESS-THAN SIGN
(＜) is transformed into U+003C LESS-THAN SIGN (<) due
to best-fit mapping.
• Unicode transformation for Cross-Site Scripting or SQL
Injection, using overlong UTF-8 sequences:
• %C0%BE = >
• %C0%BC = <
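The two byte pairs above are overlong two-byte UTF-8 forms of ASCII characters. The sketch below shows the bit arithmetic a lenient decoder would perform and contrasts it with Python's strict UTF-8 decoder, which rejects such sequences. `naive_decode_2byte` is a deliberately unsafe illustration, not a real library function.

```python
def naive_decode_2byte(b0: int, b1: int) -> str:
    """Decode a two-byte UTF-8 sequence WITHOUT rejecting overlong forms.

    Layout: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy.
    """
    return chr(((b0 & 0x1F) << 6) | (b1 & 0x3F))


def strict_decode(data: bytes) -> str:
    """Decode with Python's built-in decoder, which rejects overlong forms."""
    return data.decode("utf-8")


print(naive_decode_2byte(0xC0, 0xBE))  # lenient result for %C0%BE
print(naive_decode_2byte(0xC0, 0xBC))  # lenient result for %C0%BC

try:
    strict_decode(b"\xc0\xbc")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

A filter that looks for the single byte 0x3C never sees it in the overlong form, so the lenient decoder downstream is what makes the attack work.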
13. • The URL-encoded GET parameter locale is set to
acux5291%C0%BEz1%C0%BCz2a%90bcxuca5291
• Here is a part of the HTTP request.
https://vendors-unit.prudential.com/OA_HTML/help?locale=acux5291%C0%BEz1%C0%BCz2a%90bcxuca5291&group=FND:LIBRARY:US&topic=US/FND/@ICX_FWK_LABS_HOME_PAGE
14. • In the HTTP response, the overlong sequences were converted to
their short forms (> and <)
<input type="hidden" value="acux5291>z1<z2a�bcxuca5291" name="group">
• The encoded input
acux5291%C0%BEz1%C0%BCz2a%90bcxuca5291
is transformed into
acux5291>z1<z2a�bcxuca5291
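One way a server could handle the parameter shown above is to decode it strictly, reject it on failure, and HTML-escape whatever it echoes back. The Python sketch below uses only the standard library. The function names and the rendering of the hidden input are assumptions for illustration, not the affected application's code.

```python
import html
from urllib.parse import unquote_to_bytes

# The parameter value from the request on the slide.
RAW = "acux5291%C0%BEz1%C0%BCz2a%90bcxuca5291"


def decode_param(raw):
    """Percent-decode and strictly UTF-8 decode; return None if invalid."""
    try:
        return unquote_to_bytes(raw).decode("utf-8")
    except UnicodeDecodeError:
        return None


def render_hidden_input(value: str) -> str:
    """Echo a value into an HTML attribute with escaping applied."""
    escaped = html.escape(value, quote=True)
    return f'<input type="hidden" value="{escaped}" name="group">'


print(decode_param(RAW))        # invalid UTF-8, so the request is rejected
print(decode_param("en%2DUS"))  # a well-formed value decodes normally
print(render_hidden_input('a>z1<z2"'))
```

Strict decoding stops the overlong bytes at the door, and escaping on output covers the case where angle brackets arrive by some other route.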
16. • Spotify supported Unicode usernames.
• The existing user account bigbird was hijacked.
• The attacker created a new Spotify account with the username
ᴮᴵᴳᴮᴵᴿᴰ (string u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30').
• The attacker requested a password reset for the new account.
• A password reset link was sent to the email address of the new account, and the attacker used
it to change the password.
• Instead of logging into that account with username ᴮᴵᴳᴮᴵᴿᴰ, the attacker logged
in with username bigbird and the new password.
• Account compromised.
17. • The canonical_username function was not idempotent: applying it once
gave a different result than applying it twice. It behaved like a “toLower” function.
• A user signs up with username BigBird, which is normalized to
bigbird.
• Another user signs up as ᴮᴵᴳᴮᴵᴿᴰ, which is
normalized to BIGBIRD the first time, but to bigbird the
next time.
• ᴮᴵᴳᴮᴵᴿᴰ requests a password reset email, and the link in it can be used to
reset bigbird’s account.
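The "BIGBIRD the first time, bigbird the next time" behavior can be reproduced with a simplified model in Python. This is not Spotify's actual code. `flawed_canonical` is a hypothetical function that lowercases before applying NFKC normalization, which is enough to show the same non-idempotent result.

```python
import unicodedata


def flawed_canonical(name: str) -> str:
    """Lowercase first, then apply NFKC. Not idempotent for some inputs."""
    return unicodedata.normalize("NFKC", name.lower())


# The attacker's username, built from MODIFIER LETTER CAPITAL characters.
attacker = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"

once = flawed_canonical(attacker)   # lower() has no effect, NFKC gives capitals
twice = flawed_canonical(once)      # now lower() applies to plain capitals

print(once, twice)
```

The modifier letters have no lowercase mapping, so the first pass only unfolds them into ordinary capitals. The second pass then lowercases those capitals and lands on the victim's name.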
18. • Use canonicalization
– An important aspect of input sanitization
– Converts data with various possible
representations into a standard "canonical"
representation deemed acceptable by the
application, for example by mapping all characters to lower case
– Treat “BigBird”, “ᴮᴵᴳᴮᴵᴿᴰ” and “bigbird” as the same
name, since canonicalization maps all of them to
‘bigbird’
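A canonicalization step along these lines might look like the following Python sketch. It assumes NFKC normalization followed by case folding is an acceptable canonical form for usernames. A real system may prefer a dedicated profile for identifiers, and the names `canonical_username` and `register` are illustrative.

```python
import unicodedata


def canonical_username(name: str) -> str:
    """Map a username to one canonical form: NFKC, case fold, NFKC again."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)


taken = set()


def register(name: str) -> bool:
    """Register a username; refuse it if its canonical form is taken."""
    key = canonical_username(name)
    if key in taken:
        return False
    taken.add(key)
    return True


superscript = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"

print(register("BigBird"))    # first registration succeeds
print(register(superscript))  # same canonical form, refused
print(register("bigbird"))    # same canonical form, refused
```

Because every lookup and uniqueness check goes through the same function, and applying it twice gives the same result as applying it once, the lookalike name can never be registered alongside the original.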
19. • The vulnerability was noticed when the compromised
accounts started RETWEETING a tweet with a "♥" symbol
that was followed by a string of code/Parameter.
• Users didn’t even have to click on the tweet sent out by the
Twitter account @derGeruhn. Just the act of viewing the
tweet would cause the user to automatically retweet
• Affected accounts also involuntarily retweeted cross-site
scripting (XSS) code as a result of the vulnerability
• That tweet was retweeted
over 84,000 times
20. • TweetDeck didn’t escape HTML characters if a Unicode
character was in the tweet text
• The Unicode heart (which gets replaced with an
image by TweetDeck) somehow prevented the tweet
from being HTML-escaped.
• TweetDeck was not supposed to display this as an
image, because it's simple text,
which should be escaped to
"&hearts;".
21. 1. When converting strings used in security-
sensitive operations, use documented options
which prevent the use of best-fit mappings.
2. A suitable canonical form should be chosen and
all user input canonicalized into that form before
any authorization decisions are performed.
3. Security checks should be carried out after UTF-8
decoding is completed.
X is only allowed if X == canonical(X)
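The rule "X is only allowed if X == canonical(X)", combined with checking after decoding, can be sketched in Python as follows. The choice of NFKC plus case folding as the canonical form is an assumption made for this example, and `is_allowed` is a hypothetical name.

```python
import unicodedata


def canonical(text: str) -> str:
    """Assumed canonical form for this sketch: NFKC plus case folding."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return unicodedata.normalize("NFKC", folded)


def is_allowed(raw: bytes) -> bool:
    """Decode strictly first, then accept only already-canonical input."""
    try:
        text = raw.decode("utf-8")   # strict: overlong forms are rejected
    except UnicodeDecodeError:
        return False
    return text == canonical(text)


print(is_allowed(b"bigbird"))                  # already canonical
print(is_allowed("BigBird".encode("utf-8")))   # differs from canonical form
print(is_allowed(b"\xc0\xbc"))                 # invalid UTF-8
```

Rejecting non-canonical input outright, instead of silently converting it, means any later security decision sees exactly the string that was checked.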
22. • Here’s a chart with all the new emoji in yellow
including my favorite “1F595” which will be a
hit on Twitter.
• http://www.unicode.org/charts/PDF/Unicode-7.0/U70-1F300.pdf