English is not the only language that the Internet “speaks.” Internationalised Domain Names (IDNs) now allow for domain names in Arabic, Cyrillic, Chinese, and other non-Latin characters. This session will show how to trace IDNs and will examine some of the IDN info security issues. There will be a quick introduction to working with foreign language Websites and useful tips for using online search and translation tools.
Internationalised Domain Names & Internet Investigations
1. Internationalised Domain
Names, Foreign Language
Websites, & Investigations
Jonathan D. Abolins
Thu, 28 July 2011
11:00 AM - 12:00 PM PDT (GMT-07:00)
Post-Webinar Version with additional notes.
2. Introduction
About me
Why this topic
Some notes about this presentation’s approach.
3. Note About Translation Tools
Machine translation tools help a lot.
But they can also leave out much or mislead.
Helps to know the languages involved or work
with a competent translator.
But the translators might not know about some
recent Internet developments.
4. Quick Overview of Terms
Labels – example: www.veresoftware.com
Label 1: www, Label 2: veresoftware, Label 3: com
TLD – Top Level Domain (e.g., .com or .uk)
ccTLD – Country Code TLD (e.g., .uk, .ru)
IDN – Internationalised Domain Name
Unicode
ACE – ASCII Compatible Encoding
Punycode (RFC 3492), a form of ACE
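A minimal sketch of the terms above, assuming plain dot-separated host names. The function names `split_labels` and `describe` are made up for illustration; they are not part of any tool mentioned in the slides.

```python
def split_labels(hostname):
    """Split a host name into its dot-separated labels."""
    return hostname.rstrip(".").split(".")


def describe(hostname):
    """Return the labels, the TLD (last label), and any ACE labels."""
    labels = split_labels(hostname)
    return {
        "labels": labels,
        "tld": labels[-1],
        # ACE (Punycode) labels carry the "xn--" prefix.
        "ace_labels": [l for l in labels if l.lower().startswith("xn--")],
    }


if __name__ == "__main__":
    print(describe("www.veresoftware.com"))
    print(describe("xn--rgbn6c.xn--wgbh1c"))
```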
5. OSINT in an Alphabet Soup of
the Networked World
But see http://www.cartoonistgroup.com/store/add.php?iid=8381
Sometimes, alphabet soup is soup, not a coded message.
6. A couple of Examples of non-
English Windows 7 Desktops
First is Russian.
Second is Arabic. Note the shift to the right.
They were done by switching the languages on
one of my Windows 7 Ultimate PCs.
The GUI labels for My Documents, My Music,
etc. are localised. But the underlying directory
names, as seen via dir command in a CMD
window, did not change.
7.
8.
9. The Net No Longer “Speaks”
Primarily English
Old days
Had to use code pages (character encodings) for
non-Latin text. Can be confusing.
Difficult to mix languages.
Now
Unicode covers most of the world’s writing systems.
90+ scripts.
Still encounter code pages.
10. But Underlying Code is
Universal
Bits & Bytes
Programming languages
HTML codes
IP Addresses
Etc.
This can work to your advantage!
11. If a foreign site offers English, why
read the foreign language version?
http://krebsonsecurity.com/2010/12/russian-police-only-translate-the-good-news/
15. Note for the Previous Slides…
Sometimes the foreign site might be using a site
structure developed in the English speaking
world. Particularly the case with some Web
forums.
Other times, the Web designers are trying to
avoid problems with mixing texts for directory
and file names.
In any case, the file path info often can be a
help.
16. Tip: Google Chrome Has
Built-in Translation Function
http://habrahabr.ru/blogs/DIY/
17. Search Tip:
A Picture is Worth 1K Words
An image search might help to zero in on the entries of
interest.
Especially useful if you want to save time wading
through foreign language hits.
Example search for the RASKAT (Раскат) data
destruction device from Russia. Look for images
that look “computerish”.
18. Google Translate Annoyance:
URL Conversion
Tried to type in “http://www.xakep.ru”
but Google “Russified” it.
Uncheck the Phonetic Typing box
before entering URLs for site
translation
19.
20. Internationalised Domain
Names (IDN)
Intro – The Phonebook Analogy
Imagine a phonebook where people could have entries in their preferred
scripts. Mr. Wong could have his in Chinese. Ms. Romanov could have
hers in Russian. And so on. Many people will choose to have both Latin
text and foreign text entries for the same phone number. Makes it easier
for their family and friends to find them. But others fret about the
different texts.
Underneath it all, however, the phone system hardware, networks, and
the phone numbers remain the same.
Something like this is happening with the Internet.
21. The First Four IDN ccTLDs
In May 2010
United Arab Emirates: .امارات
Saudi Arabia: .السعودية
Russian Federation: .рф
Egypt: .مصر
More IDN ccTLDs have been launched.
Remember, IDNs can also exist under non-IDN ccTLDs.
Example: גינדי.com or bücher.com
http://blog.icann.org/2010/05/idn-cctlds-%E2%80%93-the-first-four/
28. Fun with Arabic & Other
RTL (Right-to-Left) IDN URLs
Reading direction can switch.
Example URL.
http://سجل.مصر/Files/GeneralPolicy.pdf
1 ----> <----------2 3 --------------------------------------------->
The direction changes can cause problems in
various tools and procedures.
This is where Punycode really helps.
http://xn--rgbn6c.xn--wgbh1c/Files/GeneralPolicy.pdf
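A rough sketch of why the reading direction switches inside such a URL. It groups characters into left-to-right and right-to-left runs using Python's `unicodedata` module. This is a simplification for illustration only, not the full Unicode Bidirectional Algorithm, and `direction_runs` is a made-up name.

```python
import unicodedata


def direction_runs(text):
    """Group text into runs of LTR or RTL characters.

    Simplification: neutral characters (dots, slashes, digits)
    are attached to whichever run precedes them.
    """
    runs = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):      # Hebrew-type and Arabic-type letters
            direction = "RTL"
        elif bidi == "L":            # Latin and other left-to-right letters
            direction = "LTR"
        else:                        # neutral: inherit the previous direction
            direction = runs[-1][1] if runs else "LTR"
        if runs and runs[-1][1] == direction:
            runs[-1][0] += ch
        else:
            runs.append([ch, direction])
    return [(run_text, run_dir) for run_text, run_dir in runs]


if __name__ == "__main__":
    url = "http://\u0633\u062c\u0644.\u0645\u0635\u0631/Files/GeneralPolicy.pdf"
    for run_text, run_dir in direction_runs(url):
        # ascii() keeps the output printable on any console.
        print(run_dir, ascii(run_text))
```

The three runs correspond to the three numbered arrows on the slide. The Punycode form contains only left-to-right characters, so it comes out as a single run.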
29. Punycode
DNS works with Punycode for IDN labels
Example: سجل.مصر
Punycode: xn--rgbn6c.xn--wgbh1c
.xn--wgbh1c is Punycode for the مصر IDN ccTLD.
Note the distinctive xn-- prefix.
Much safer way to store & use IDNs.
Various online and offline tools for conversion.
Conversion works in both directions.
Unicode IDN <-> Punycode.
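One possible offline conversion, sketched with Python's built-in `idna` codec. Note the assumption: this codec follows the older IDNA 2003 rules, so results for some edge-case characters may differ from newer tools. The helper names `to_ace` and `to_unicode` are illustrative only.

```python
def to_ace(hostname):
    """Unicode IDN -> Punycode (ACE) form, label by label."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname):
    """Punycode (ACE) form -> Unicode IDN."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    # Escapes are used so the example does not depend on editor fonts.
    print(to_ace("b\u00fccher.com"))
    print(to_ace("\u0633\u062c\u0644.\u0645\u0635\u0631"))
    print(ascii(to_unicode("xn--p1ai")))
```

Only the host name part goes through the conversion. The scheme and the file path of a URL are left alone.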
32. Challenges with IDNs
Recognising what it is.
(domain name, URL, e-mail address).
Which end is the ccTLD?
What language is it?
What country of registry?
Sad 'cause I can't find the ص (Saad) key.
(How do I enter the IDN?)
Some characters have multiple codes.
Many tools don't work correctly with IDNs.
Homograph (Look-alike) Attacks
33. Recognising IDNs. Not just URLs.
How About IDN E-mail Addresses?
What if you found a note with this:
ваше_имя@письмо.рф ?
Would you know it’s an
e-mail address?
Would your translator
recognise it as an e-mail
address?
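A hedged sketch of one way to spot and normalise such a string: split at the last “@” and convert only the domain part to Punycode. The function name `email_to_ace` is hypothetical, and the local part (before the “@”) is deliberately left untouched, since Punycode applies to domain labels only.

```python
def email_to_ace(address):
    """Convert the domain part of an e-mail style string to ACE form."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an e-mail style address")
    # Built-in codec (IDNA 2003 rules); the local part is not encoded.
    return local + "@" + domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    result = email_to_ace("ваше_имя@письмо.рф")
    # Show only the domain, which is now plain ASCII.
    print(result.rpartition("@")[2])
```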
34. By the Way, What About Vocalisation of
URLs & e-Mail Addresses in Foreign
Languages?
The way a URL or an email address – IDN or not – is
said can differ across languages.
How is the “at” symbol or the “dot” said?
Example with Russian and “Ivan@pochta.ru”:
“Ivan sobachka pochta tochka ru”
or
“Ivan sobachka pochta dot ru”
Sobachka (собачка – “little dog”) is a popular Russian way of
vocalising the “@” sign.
Tochka (точка – “point”) or Dot (дот) used for the “.” mark.
How to say an e-mail address in Russian:
http://www.themoscowtimes.com/opinion/article/the-really-cool-people-say-dot/439857.html
36. How Do I Type the IDN?
Copy & Paste
Directly from page
Google Translate
Wikipedia
Keyboard input
Need the right keyboard or
keytops.
System setup for allowing
the foreign language input.
Character map tools
37. One Character, Multiple Codes
http://singapore41.icann.org/meetings/singapore2011/presentation-idn-variant-tlds-update-20jun11-en.pdf
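A small illustration of the idea, using examples that are my own assumption rather than taken from the linked ICANN presentation: the Arabic and Farsi forms of the letter yeh look alike but have different code points, while “é” can be stored either as one code point or as “e” plus a combining accent.

```python
import unicodedata


def code_points(text):
    """List each character as (U+XXXX, Unicode name)."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in text]


def same_after_nfc(a, b):
    """True if two strings are equal once normalised to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(code_points("\u064a"))                 # Arabic yeh
    print(code_points("\u06cc"))                 # Farsi yeh
    print(same_after_nfc("\u00e9", "e\u0301"))   # composed vs. decomposed
    print(same_after_nfc("\u064a", "\u06cc"))    # distinct letters
```

Normalisation merges the two spellings of “é”, but it does not merge the two yeh letters. That second kind of variant has to be handled by registry policy or by comparing code points directly.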
38. Common Net Commands & IDN
Windows cmd CLI a problem w/o modification
Tools have to be able to handle Unicode.
ping
nslookup
dig
Whois (can be tricky at times)
Punycode is more reliable.
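A sketch of the “use Punycode” advice: convert the name once, then hand the ASCII form to the usual tools. The helper `ace_commands` is hypothetical and only builds the command strings. It does not run them, and no tool-specific options are assumed.

```python
TOOLS = ("ping", "nslookup", "dig", "whois")


def ace_commands(hostname):
    """Build command lines that use the ASCII (Punycode) form of a name."""
    ace = hostname.encode("idna").decode("ascii")  # built-in IDNA 2003 codec
    return [f"{tool} {ace}" for tool in TOOLS]


if __name__ == "__main__":
    for command in ace_commands("b\u00fccher.com"):
        print(command)
```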
40. Whois & IDN ccTLD Domains
Whois on the domain name might not always
work well with some IDN ccTLD domains.
But there are options, including:
Get and lookup IP address
Use IANA db & Delegation Record
46. Homographs for Fraud
& Punycode for Detection
http://www.facebook.com/
Really is http://www.facebook.com/
http://www.facebοok.com/
http://www.xn--facebok-dpf.com/
http://www.faceboοk.com/
http://www.xn--facebok-epf.com/
http://www.facebοοk.com/
http://www.xn--facebk-m0ea.com/
http://idnaconv.phlymail.de/
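A minimal sketch of using Punycode for detection: decode any xn-- label and list the scripts it contains. Taking the first word of each character's Unicode name as the “script” is a rough heuristic assumed here for illustration, and `scripts_in_label` and `inspect_host` are made-up names.

```python
import unicodedata


def scripts_in_label(label):
    """Approximate the scripts in a label from Unicode character names."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def inspect_host(hostname):
    """Return (label, decoded label, scripts, mixed?) for each label."""
    report = []
    for label in hostname.split("."):
        if label.lower().startswith("xn--"):
            shown = label.encode("ascii").decode("idna")
        else:
            shown = label
        scripts = scripts_in_label(shown)
        report.append((label, shown, sorted(scripts), len(scripts) > 1))
    return report


if __name__ == "__main__":
    for row in inspect_host("www.xn--facebok-dpf.com"):
        print(ascii(row))  # ascii() keeps look-alike characters visible
```

A mixed-script result is only a hint. A look-alike drawn from the same script, such as the dotless “ı” that the xn--hotmal-t9a label decodes to, would not be flagged by this check, so the xn-- prefix itself remains the more dependable warning sign.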
47. Homograph Attack Concerns
Raised by various people, including 3ric
Johanson at Shmoocon in 2005.
He registered www.xn--pypal-4ve.com to spoof
PayPal.
Anti-Phishing Working Group Global Phishing
Survey 1H2010: last true homograph attack
was in 2009. A “hotmail.net” look-alike:
xn--hotmal-t9a.net
Global Phishing Survey 1H2010: http://tinyurl.com/2ch5o87
50. Code Pages /Character
Encodings
Examples:
Arabic: Windows 1256, IBM 864
Cyrillic: IBM 855, KOI8-R, Windows 1251
Hebrew: IBM 862, Windows 1255
See also http://en.wikipedia.org/wiki/Code_pages
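A short sketch of why the code page matters: the same bytes decode to different text under each Cyrillic code page listed above. The codec names are Python's spellings of those code pages, and `try_decodings` is a hypothetical helper.

```python
CYRILLIC_CODE_PAGES = ("cp855", "koi8_r", "cp1251")


def try_decodings(raw_bytes, encodings=CYRILLIC_CODE_PAGES):
    """Decode the same bytes under several code pages for comparison."""
    return {enc: raw_bytes.decode(enc, errors="replace") for enc in encodings}


if __name__ == "__main__":
    raw = "Привет".encode("koi8_r")   # bytes as a KOI8-R page would send them
    for enc, text in try_decodings(raw).items():
        print(enc, ascii(text))
```

Only one of the candidates gives readable Russian. The others give the familiar garbage text, which is often the first sign that the wrong code page was assumed.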
51. Character Encoding in Internet
documents
If page doesn’t render properly:
Check HTML source for clues like
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=KOI8-R">
Server’s country location might be a clue.
Try browser’s character encoding tools. (Firefox
example)
For Cyrillic, check out these tools:
Universal Cyrillic Decoder page http://2cyr.com/decode/
Russian Anywhere (re) package for many Linux distros.
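The “check the HTML source” step could be sketched as follows. The regular expression is a simple assumption that covers META tags like the one shown above. It is not a full HTML parser, and the names `declared_charset` and `decode_page` are illustrative.

```python
import re

# Matches charset=KOI8-R, charset="utf-8", etc. (case-insensitive).
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def declared_charset(html_source):
    """Return the charset named in the HTML source, or None."""
    match = CHARSET_RE.search(html_source)
    return match.group(1).lower() if match else None


def decode_page(raw_bytes, fallback="utf-8"):
    """Decode a page using its declared charset, else the fallback."""
    # The declaration itself is ASCII, so a lossy ASCII peek is enough.
    head = raw_bytes[:2048].decode("ascii", errors="replace")
    charset = declared_charset(head) or fallback
    return raw_bytes.decode(charset, errors="replace")


if __name__ == "__main__":
    meta = '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=KOI8-R">'
    print(declared_charset(meta))
```

If a page declares a charset name that Python does not know, `decode_page` raises `LookupError`. In that case, trial and error with likely code pages is the fallback.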
53. Firefox – Character Encoding
Set to Auto Detect
In recent versions of Firefox,
Firefox button
-> Web Developer
-> Character Encoding
-> Auto Detect
In some cases, trial & error is
needed.
This method also can work
for local files.
http://www.lena.ru/songs.html
54. Resources
ICANN
IDN Info: http://www.icann.org/en/topics/idn/
Blog: http://blog.icann.org/
IDN Wiki: http://idn.icann.org/
IDN TLD Map: http://www.icann.org/en/maps/idntld.htm
IDN Blog
http://idnblog.com/
Verisign IDN FAQ
http://www.verisigninc.com/en_US/products-and-services/domain-name-services/domain-information-center/idn-resources/idn-faq/index.xhtml
This Domain Name is Greek to Me: An Introduction to Internationalized
Domain Names for Investigators (DFI News)
http://www.dfinews.com/article/domain-name-greek-me-introduction-internationalized-domain-names-investigators?page=0,1
Internationalized Domain Names & Investigations in the Networked World
(one of the DojoCon 2010 videos)
http://www.irongeek.com/i.php?page=videos/dojocon-2010-videos
55. Resources (cont)
XN--ICANN
http://www.hackerfactor.com/blog/index.php?/archives/321-xn-ICANN.html
IDNForums.Com
Emphasis upon buying & selling IDN domains.
http://www.idnforums.com/
IANA ccTLDs Database
http://www.iana.org/domains/root/db/#
Strathclyde Forensics – IDN Homograph Attacks
http://www.computerforensicsglasgow.info/IDN_Homograph_Attacks.htm
New Arrival in Russian Spam – .РФ
http://www.thesecurityblog.com/2011/02/new-arrival-in-russian-spam-%D1%80%D1%84/
An IDN – Punycode Converter
http://idnaconv.phlymail.de/
How to say an e-mail address in Russian
http://www.themoscowtimes.com/opinion/article/the-really-cool-people-say-dot/439857.html
56. Resources (cont)
Keyboard Setup
How to Change Keyboard Language
http://www.lib.uchicago.edu/e/using/catalog/inputoptions.html
http://tlt.its.psu.edu/suggestions/international/keyboards/winkey.html
http://www.al-bab.com/arab/comp.htm
Translation and Language Issues
American Translators Association: Getting It Right (insights into translation
issues)
http://www.atanet.org/publications/getting_it_right.php
Basis Technology – Excellent papers & presentations on language issues.
http://www.basistech.com/resources/
(The links on the left have more papers on topics such as Middle Eastern Languages, Digital Forensics,
etc.)
57. Resources: Google Searches
for Some IDN ccTLDs
Republic of Korea: 한국
http://www.google.com/search?q=site%3A.한국
Serbia: СРБ
http://www.google.com/search?q=site%3A%D0%A1%D0%A0%D0%91
People's Republic of China: 中国
http://www.google.com/search?q=site%3A.%E4%B8%AD%E5%9B%BD
http://www.google.com/search?q=site%3A.%E4%B8%AD%E5%9C%8B
Hong Kong SAR: 香港
http://www.google.com/search?q=site%3A.%E9%A6%99%E6%B8%AF
Taiwan: 台湾
http://www.google.com/search?q=site%3A.%E5%8F%B0%E6%B9%BE
http://www.google.com/search?q=site%3A.%E5%8F%B0%E7%81%A3
Egypt: مصر
http://www.google.com/search?q=site%3A.مصر
Jordan: الاردن
http://www.google.com/search?q=site%3A.%D8%A7%D9%84%D8%A7%D8%B1%D8%AF%D9%86
Saudi Arabia: السعودية
http://www.google.com/search?q=site%3A.%D8%A7%D9%84%D8%B3%D8%B9%D9%88%D8%AF%D9%8A%D8%A9
Russian Federation: РФ
http://www.google.com/search?q=site%3A.%D0%A0%D0%A4
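The percent-encoded search links above can be rebuilt for other IDN ccTLDs with a sketch like this one. It assumes the same Google search URL pattern used on this slide, and `site_search_url` is a made-up helper name.

```python
from urllib.parse import quote


def site_search_url(tld):
    """Build a site: search link for a TLD, percent-encoding as UTF-8."""
    query = quote("site:." + tld, safe="")
    return "http://www.google.com/search?q=" + query


if __name__ == "__main__":
    print(site_search_url("РФ"))
    print(site_search_url("中国"))
```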