Unicode

• What is Unicode?
• How Apps deal with Unicode
• Unicode Transformation Attack
• Real World Examples
• How To Manipulate Applications
• Remediation

• Unicode lets computer systems support more
languages, allowing for world wide use
• Stores characters with multiple bytes
• It provides a unique number for every character,
no matter what the platform, no matter what the
program, no matter what the language

• Every character has a unique number
• A = U+0041
• < = U+003C

• Classic example: c0rn ;)
o=U+006f, ο=U+03bf, о=U+043e
• Latin Small o, Greek Small O, Cyrillic Small
Letter o
• Searches for the above can turn up different
results in Google

• Data can be entered using Unicode to disguise
malicious code and permit various Unicode
transformation issues, such as Best-Fit
Mapping

• Occurs when a character X gets transformed to
an entirely different character Y.
• Character X in the source encoding doesn't exist
in the destination encoding, so the App
attempts to find a best match.
• So the characters are transcoded between
Unicode and another encoding language.

• Lowercase operation on the input after
filtering.
• The string "script" is prevented by the filter,
but the string "scrİpt" is allowed.
• Possibility of using many lookalikes:
AΑАᐱᗅᗋᗩᴀᴬ⍲Ａ

• Unicode character U+FF1C FULLWIDTH LESS-THAN SIGN
(＜) transformed into U+003C LESS-THAN SIGN (<) due
to best-fit.
• Unicode Transformation for Cross-Site Scripting or SQL
Injection;
• %C0%BE = >
• %C0%BC = <

• URL encoded GET input locale is set to
acux5291%C0%BEz1%C0%BCz2a%90bcxuca5291
• Here is a part of the HTTP request.
https://vendors-unit.prudential.com/OA_HTML/help?locale=
&group=FND:LIBRARY:US&topic=US/FND/@ICX_FWK_LABS_H
OME_PAGE

• In the HTTP response, this character was converted to
the short form (<)
<input type="hidden" value="acux5291>z1<z2a�bcxuca5291" name="group">
• Unicode character
is transformed into
acux5291>z1<z2a�bcxuca5291

• ?locale=%c0%bcscript%3E&group=FND:LIBRARY:US&topic=US/FND/@ICX
_FWK_LABS_HOME_PAGE
• ?locale=%3E&group=FND:LIBRARY:US&topic=US/FND/@ICX_FWK_LABS_H
OME_PAGE
• ?locale=%c0%bcscript/%3E&group=FND:LIBRARY:US&topic=US/FND/@IC
X_FWK_LABS_HOME_PAGE

• Supported Unicode usernames.
• Existing user account bigbird hijacked.
• Attacker created a new Spotify account with username
ᴮᴵᴳᴮᴵᴿᴰ (string u’u1d2eu1d35u1d33u1d2eu1d35u1d3fu1d30′).
• Send a request for a password reset for your new account.
• A password reset link is sent to the email for your new account. Use
it to change the password.
• Instead of logging into that account with username ᴮᴵᴳᴮᴵᴿᴰ, logged
with username bigbird with the new password.
• Account compromised.

• The canonical_username function only implemented
the first time. Function like “toLower” implemented.
• Users signs up with username BigBird, normalized to
bigbird.
• Another user signs up as ᴮᴵᴳᴮᴵᴿᴰ, which also gets
normalized to BIGBIRD the first time, but bigbird the
next time.
• ᴮᴵᴳᴮᴵᴿᴰ requests a password reset email, but with it can
reset bigbird’s account.

• Use Canonicalizing
– Important aspect of input sanitization
– Converting data with various possible
representations into a standard "canonical"
representation deemed acceptable by the
application mapping all characters to lower case
– Treat “BigBird”, “ ᴮᴵᴳᴮᴵᴿᴰ ” and “bigbird” as the same
by Canonicalizing as they would all be mapped to
‘bigbird’

• The vulnerability was noticed when the compromised
accounts started RETWEETING a tweet with a "♥" symbol
that was followed by a string of code/Parameter.
• Users didn’t even have to click on the tweet sent out by the
Twitter account @derGeruhn. Just the act of viewing the
tweet would cause the user to automatically retweet
• Affected accounts also involuntarily re-tweeted a cross-site
scripting (XSS) code as a result of the vulnerability
• That tweet hit the max re-tweet
over 84,000 times

• TweetDeck didn’t escape HTML-chars if a Unicode-
char is in the tweet -text
• The Unicode-Heart (which gets replaced with an
image by TweetDeck) somehow prevents the Tweet
from being HTML-escaped.
• TweetDeck was not supposed to display this as an
image.
Because it's simple Text,
which should be escaped to
"&hearts;".

1. When converting strings used in security-
sensitive operations, use documented options
which prevent the use of best-fit mappings.
2. A suitable canonical form should be chosen and
all user input canonicalized into that form before
any authorization decisions are performed.
3. Security checks should be carried out after UTF-
8 decoding is completed.
X is only allowed if X==canonical(X)

• Here’s a chart with all the new emoji in yellow
including my favorite “1F595” which will be a
hit on Twitter.
• http://www.unicode.org/charts/PDF/Unicode-
7.0/U70-1F300.pdf

• http://www.rishida.net/tools/conversion/
• http://www.fileformat.info/info/unicode/char/a.htm
• http://www.panix.com/~eli/unicode/convert.cgi?text=
Unicode
• http://unicode-table.com/en/
• http://www.unicode.org/charts/PDF/Unicode-7.0/U70-
1F300.pdf

Unicode

More Related Content

What's hot

Similar to Unicode

More from Ronan Dunne, CEH, SSCP

Recently uploaded

Unicode