Internationalised Domain Names & Internet Investigations

English is not the only language that the Internet “speaks.” Internationalised Domain Names (IDNs) now allow for domain names in Arabic, Cyrillic, Chinese, and other non-Latin scripts. This session shows how to trace IDNs and examines some of the information security issues they raise. It also gives a quick introduction to working with foreign-language websites, with useful tips for using online search and translation tools.



    Presentation Transcript

    • Internationalised Domain Names, Foreign Language Websites, & Investigations. Jonathan D. Abolins. Thu, 28 July 2011, 11:00 AM - 12:00 PM PDT (GMT-08:00). Post-Webinar Version with additional notes.
    • Introduction: About me. Why this topic. Some notes about this presentation’s approach.
    • Note About Translation Tools Machine translation tools help a lot. But they can also leave out much or mislead. Helps to know the languages involved or work with a competent translator.  But the translators might not know about some recent Internet developments.
    • Quick Overview of Terms Labels – example: www.veresoftware.com Label 1 Label 2 Label 3 TLD – Top Level Domain (e.g., .com or .uk) ccTLD – Country Code TLD (e.g., .uk, .ru) IDN – Internationalised Domain Name Unicode ACE – ASCII Compatible Encoding Punycode (RFC 3492), a form of ACE
    • OSINT in an Alphabet Soup ofthe Networked WorldBut see http://www.cartoonistgroup.com/store/add.php?iid=8381Sometimes, alphabet soup is soup, not a coded message.
    • A couple of Examples of non-English Windows 7 Desktops First is Russian. Second is Arabic. Note the shift to the right. They were done by switching the languages on one of my Windows 7 Ultimate PCs. The GUI labels for My Documents, My Music, etc. are localised. But the underlying directory names, as seen via dir command in a CMD window, did not change.
    • The Net No Longer “Speaks” Primarily English. Old days: Had to use code pages (character encodings) for non-Latin text. Can be confusing. Difficult to mix languages. Now: Unicode covers most of the world’s writing systems. 90+ scripts. Still encounter code pages.
    • But Underlying Code is Universal: Bits & Bytes, Programming languages, HTML codes, IP Addresses, Etc. This can work to your advantage!
    • If a foreign site offers English, why read the foreign language version? http://krebsonsecurity.com/2010/12/russian-police-only-translate-the-good-news/
    • What if you can’t read Russian?
    • File/Pathnames May Have Clues… http://www.mvd.ru/news/
    • File/Pathnames May Have Clues… http://www.mvd.ru/presscenter/
    • Note for the Previous Slides… Sometimes the foreign site might be using a site structure developed in the English speaking world. Particularly the case with some Web forums. Other times, the Web designers are trying to avoid problems with mixing texts for directory and file names. In any case, the file path info often can be a help.
    • Tip: Google Chrome Has Built-in Translation Function. http://habrahabr.ru/blogs/DIY/
    • Search Tip: A Picture is Worth 1K Words. An image search might help to zero in on the entries of interest. Especially useful if you want to save time wading through foreign language hits. Example search for the RASKAT (Раскат) data destruction device from Russia. Look for images that look “computerish”.
    • Google Translate Annoyance: URL Conversion. Tried to type in “http://www.xakep.ru” but Google “Russified” it. Uncheck the Phonetic Typing box before entering URLs for site translation.
    • Internationalised Domain Names (IDN) Intro – The Phonebook Analogy. Imagine a phonebook where people could have entries in their preferred scripts. Mr. Wong could have his in Chinese. Ms. Romanov could have hers in Russian. And so on. Many people will choose to have both Latin text and foreign text entries for the same phone number. Makes it easier for their family and friends to find them. But others fret about the different texts. Underneath it all, however, the phone system hardware, networks, and the phone numbers remain the same. Something like this is happening with the Internet.
    • The First Four IDN ccTLDs in May 2010. United Arab Emirates: .امارات ; Saudi Arabia: .السعودية ; Russian Federation: .рф ; Egypt: .مصر More IDN ccTLDs have been launched. Remember, IDNs can also exist under non-IDN TLDs. Example: גינדי.com or bücher.com http://blog.icann.org/2010/05/idn-cctlds-%E2%80%93-the-first-four/
    • Examples of IDNs & Punycode: גינדי.com ; 스타벅스코리아.com ; газпром.рф ; سجل.مصر ; 汕头大学.中国 (Punycode: xn--pssza05mm53a.xn--fiqs8s)
    • Gindi Realty (Israel): גינדי.com Punycode: http://xn--6dbcrb7a.com/
    • Offline IDN Example
    • Starbucks Korea: 스타벅스코리아.com Punycode: http://xn--oy2b35ckwhba574atvuzkc.com/
    • Shantou University (PRC): 汕头大学.中国 Same as http://stu.edu.cn Punycode: http://xn--pssza05mm53a.xn--fiqs8s/
    • Sajela.MiSr (Egypt): سجل.مصر Punycode: http://xn--rgbn6c.xn--wgbh1c/
    • Fun with Arabic & Other RTL (Right to Left) IDN URLs. Reading direction can switch. Example URL: http://سجل.مصر/Files/GeneralPolicy.pdf Reading order: (1) “http://” runs left to right, (2) the Arabic domain name runs right to left, (3) the path “/Files/GeneralPolicy.pdf” runs left to right again. The direction changes can cause problems in various tools and procedures. This is where Punycode really helps: http://xn--rgbn6c.xn--wgbh1c/Files/GeneralPolicy.pdf
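A URL with an IDN host can be rewritten into its all-ASCII form in a few lines. The following is a minimal sketch, assuming Python's built-in "idna" codec (which follows the older IDNA 2003 rules) is acceptable for the names involved; the function name to_ascii_url is hypothetical, and the example host bücher.com is the one mentioned on an earlier slide.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return the URL with its host name converted to Punycode (ACE) form.

    Sketch only: user-info (user:password@) is dropped, and only the host
    is converted; the path and query are left untouched.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # The built-in "idna" codec converts each label and adds the xn-- prefix.
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_ascii_url("http://bücher.com/Files/GeneralPolicy.pdf"))
# -> http://xn--bcher-kva.com/Files/GeneralPolicy.pdf
```

Storing the ASCII form alongside the original avoids the direction-switching problem in logs, reports, and command-line tools.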
    • Punycode DNS works with Punycode for IDN labels Example: ‫ﺳﺟل.ﻣﺻر‬ Punycode: xn--rgbn6c.xn--wgbh1c .xn--wgbh1c is Punycode for the ‫ ﻣﺻر‬IDN ccTLD.  Note the distinctive xn– prefix. Much safer way to store & use IDNs. Various online and offline tools for conversion. Conversions works in both directions. Unicode IDN <-> Punycode.
    • An Online Converter: http://idnaconv.phlymail.de/
    • idn: An Offline IDN Converter (Linux)
    • Challenges with IDNs. Recognising what it is (domain name, URL, e-mail address). Which end is the ccTLD? What language is it? What country of registry? Sad cause I can't find the ص (Saad) key. (How do I enter the IDN?) Some characters have multiple codes. Many tools don't work correctly with IDNs. Homograph (look-alike) attacks.
    • Recognising IDNs. Not just URLs. How About IDN E-mail Addresses? What if you found a note with this: ваше_имя@письмо.рф ? Would you know it’s an e-mail address? Would your translator recognise it as an e-mail address?
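A script can help spot such addresses in seized text. The sketch below is a rough heuristic, not a full e-mail address parser: the pattern EMAIL_RE and the function find_addresses are hypothetical names, and the domain conversion again assumes Python's built-in "idna" codec.

```python
import re

# \w is Unicode-aware in Python 3, so Cyrillic, Arabic, CJK, etc. all match.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_addresses(text):
    """Return (address, punycode_domain) pairs for e-mail-like tokens."""
    results = []
    for match in EMAIL_RE.finditer(text):
        address = match.group(0)
        domain = address.partition("@")[2]
        try:
            ace_domain = domain.encode("idna").decode("ascii")
        except UnicodeError:
            ace_domain = None  # not convertible under the codec's rules
        results.append((address, ace_domain))
    return results


for address, ace in find_addresses("Found on a note: ваше_имя@письмо.рф ?"):
    print(address, "->", ace)
```

The Punycode form of the domain part is what can then be fed to DNS or Whois tools.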
    • By the Way, What About Vocalisation of URLs & E-mail Addresses in Foreign Languages? The way a URL or an e-mail address (IDN or not) is said can differ across languages. How is the “at” symbol or the “dot” said? Example with Russian and “Ivan@pochta.ru”: “Ivan sobachka pochta tochka ru” or “Ivan sobachka pochta dot ru”. Sobachka (собачка – “little dog”) is a popular Russian way of vocalising the “@” sign. Tochka (точка – “point”) or Dot (дот) is used for the “.” mark. How to say an e-mail address in Russian: http://www.themoscowtimes.com/opinion/article/the-really-cool-people-say-dot/439857.html
    • What Does the IDN URL Mean?
    • How Do I Type the IDN? Copy & paste: directly from page, Google Translate, Wikipedia. Keyboard input: need the right keyboard or keytops; system setup for allowing the foreign language input. Character map tools.
    • One Character, Multiple Codes. http://singapore41.icann.org/meetings/singapore2011/presentation-idn-variant-tlds-update-20jun11-en.pdf
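One way to see the "multiple codes" problem is with Unicode normalisation. The sketch below is illustrative only and uses examples chosen here, not taken from the linked ICANN presentation: an accented letter that can be stored as one or two code points, and an Arabic presentation form of the letter meem. The helper name same_after_nfkc is hypothetical.

```python
import unicodedata

composed = "\u00e9"      # "é" stored as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The two strings look identical on screen but compare as different.
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)

initial_meem = "\ufee3"  # ARABIC LETTER MEEM INITIAL FORM (presentation form)
plain_meem = "\u0645"    # ARABIC LETTER MEEM


def same_after_nfkc(a: str, b: str) -> bool:
    """Compare two strings after compatibility normalisation (NFKC)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


print(same_after_nfkc(initial_meem, plain_meem))
```

Normalising both sides before comparing or searching reduces missed matches, although it does not cover every IDN variant case (for example, script-specific variants that normalisation leaves distinct).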
    • Common Net Commands & IDN. Windows cmd CLI is a problem without modification. Tools have to be able to handle Unicode: ping, nslookup, dig, Whois (can be tricky at times). Punycode is more reliable.
    • Not All Our Tools Are Unicode or IDN-Ready
    • Whois & IDN ccTLD Domains. Whois on the domain name might not always work well with some IDN ccTLD domains. But there are options, including: get and look up the IP address; use the IANA db & Delegation Record.
    • IANA Root Zone db: http://www.iana.org/domains/root/db/#
    • IANA Delegation Records: http://www.iana.org/domains/root/db/xn--p1ai.html
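The delegation record address uses the Punycode form of the TLD, so it can be built from the Unicode form. This is a minimal sketch that assumes the URL pattern shown on the slide above applies to other TLDs as well; the function name iana_record_url is hypothetical.

```python
def iana_record_url(tld: str) -> str:
    """Build the IANA delegation record URL for a TLD given in Unicode or ASCII."""
    label = tld.strip().lstrip(".").lower()
    # Convert the TLD label to its ACE (xn--) form; ASCII labels are unchanged.
    ace = label.encode("idna").decode("ascii")
    return f"http://www.iana.org/domains/root/db/{ace}.html"


print(iana_record_url(".рф"))
# -> http://www.iana.org/domains/root/db/xn--p1ai.html
```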
    • Security Concern: Homograph Attacks
    • Are These Sets The Same? АаВьСсЕеНКкМРрОоТуХхЗ AaBbCcEeHKkMPpOoTyXx3
    • Looking at the Underlying Code. Cyrillic: АаВьСсЕеНКкМРрОоТуХхЗ = 0410 0430 0412 044C 0421 0441 0415 0435 041D 041A 043A 041C 0420 0440 041E 043E 0422 0443 0425 0445 0417. ASCII: AaBbCcEeHKkMPpOoTyXx3 = 0041 0061 0042 0062 0043 0063 0045 0065 0048 004B 006B 004D 0050 0070 004F 006F 0054 0079 0058 0078 0033.
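The code points behind look-alike text can be listed with a short script. The following sketch uses only Python's standard unicodedata module; the function name describe is made up for illustration.

```python
import unicodedata


def describe(text):
    """Return (character, hex code point, Unicode name) for each character."""
    return [(ch, f"{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# First two characters are Cyrillic (U+0410, U+0430); last two are Latin.
for ch, code, name in describe("\u0410\u0430" + "Aa"):
    print(ch, code, name)
```

The Unicode names make the difference obvious even when the glyphs are visually identical.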
    • Homographs for Fraud & Punycode for Detection. http://www.facebook.com/ really is http://www.facebook.com/ ; http://www.facebοok.com/ -> http://www.xn--facebok-dpf.com/ ; http://www.faceboοk.com/ -> http://www.xn--facebok-epf.com/ ; http://www.facebοοk.com/ -> http://www.xn--facebk-m0ea.com/ Converter: http://idnaconv.phlymail.de/
    • Homograph Attack Concerns. Raised by various people, including 3ric Johanson at Shmoocon in 2005. He registered www.xn--pypal-4ve.com to spoof PayPal. Anti-Phishing Working Group Global Phishing Survey 1H2010: last true homograph attack was in 2009. A “hotmail.net” look-alike: xn--hotmal-t9a.net Global Phishing Survey 1H2010: http://tinyurl.com/2ch5o87
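Besides converting to Punycode, a quick heuristic is to flag labels that mix scripts. The sketch below approximates a character's script by the first word of its Unicode name, which is an assumption made for brevity rather than a proper script lookup; the function names are hypothetical. Host names already in xn-- form would need to be decoded to Unicode first.

```python
import unicodedata


def scripts_in_label(label):
    """Approximate the set of scripts used by the letters in one DNS label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "GREEK SMALL LETTER OMICRON" -> "GREEK"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(hostname):
    """True if any single label mixes letters from more than one script."""
    return any(len(scripts_in_label(label)) > 1 for label in hostname.split("."))


print(looks_mixed("www.faceb\u03bfok.com"))  # Latin letters plus a Greek omicron
print(looks_mixed("www.facebook.com"))
```

This does not catch whole-script look-alikes, where every letter in the label comes from the same non-Latin script, so it complements rather than replaces the Punycode check.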
    • Not All Homographs Are Bad. Clever Homograph: xakep.ru
    • Special Topic: Character Encodings
    • Code Pages / Character Encodings. Examples: Arabic: Windows 1256, IBM 864. Cyrillic: IBM 855, KOI8-R, Windows 1251. Hebrew: IBM 862, Windows 1255. See also http://en.wikipedia.org/wiki/Code_pages
    • Character Encoding in Internet Documents. If page doesn’t render properly: Check HTML source for clues like <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=KOI8-R">. Server’s country location might be a clue. Try browser’s character encoding tools (Firefox example). For Cyrillic, check out these tools: Universal Cyrillic Decoder page http://2cyr.com/decode/ and the Russian Anywhere (re) package for many Linux distros.
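Both steps, reading the declared charset and trial-and-error decoding, can be scripted for saved pages. This is a rough sketch: the regular expression is a simple heuristic rather than an HTML parser, the candidate list reuses the Cyrillic code pages named on the earlier slide, and the function names are hypothetical.

```python
import re

# Cyrillic candidates from the code page examples: KOI8-R, Windows 1251, IBM 855.
CANDIDATES = ("koi8_r", "cp1251", "cp855")


def declared_charset(html_bytes):
    """Return the charset named in the raw HTML, or None if none is found."""
    m = re.search(rb"charset\s*=\s*[\"']?([\w-]+)", html_bytes, re.IGNORECASE)
    return m.group(1).decode("ascii") if m else None


def try_decodings(raw, encodings=CANDIDATES):
    """Decode the same bytes with each candidate so the results can be compared."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


sample = "Привет".encode("koi8_r")  # bytes as a KOI8-R page would store them
for enc, text in try_decodings(sample).items():
    print(enc, "->", text)
```

Only one of the candidates yields readable Russian; the others decode without error but produce gibberish, which is why a human (or a dictionary check) still has to pick the winner.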
    • Example: http://www.lena.ru/songs.html
    • Firefox – Character Encoding Set to Auto Detect. In recent versions of Firefox: Firefox button -> Web Developer -> Character Encoding -> Auto Detect. In some cases, trial & error is needed. This method also can work for local files. http://www.lena.ru/songs.html
    • Resources ICANN  IDN Info: http://www.icann.org/en/topics/idn/  Blog: http://blog.icann.org/  IDN Wiki: http://idn.icann.org/  IDN TLD Map: http://www.icann.org/en/maps/idntld.htm IDN Blog http://idnblog.com/ Verisign IDN FAQ http://www.verisigninc.com/en_US/products-and-services/domain-name- services/domain-information-center/idn-resources/idn-faq/index.xhtml This Domain Name is Greek to Me: An Introduction to Internationalized Domain Names for Investigators (DFI News) http://www.dfinews.com/article/domain-name-greek-me-introduction-internationalized-domain- names-investigators?page=0,1 Internationalized Domain Names & Investigations in the Networked World (one of the DojoCon 2010 videos) http://www.irongeek.com/i.php?page=videos/dojocon-2010-videos
    • Resources (cont) XN—ICANN http://www.hackerfactor.com/blog/index.php?/archives/321-xn-ICANN.html IDNForums.Com Emphasis upon buying & selling IDN domains. http://www.idnforums.com/ IANA ccTLDs Database http://www.iana.org/domains/root/db/# Stratchclyde Forensics – IDN Homograph Attacks http://www.computerforensicsglasgow.info/IDN_Homograph_Attacks.htm New Arrival in Russian Spam – .РФ http://www.thesecurityblog.com/2011/02/new-arrival-in-russian-spam-%D1%80%D1%84/ An IDN – Punycode Converter http://idnaconv.phlymail.de/ How to say an e-mail address in Russian http://www.themoscowtimes.com/opinion/article/the-really-cool-people-say- dot/439857.html
    • Resources (cont)Keyboard Setup How to Change Keyboard Language http://www.lib.uchicago.edu/e/using/catalog/inputoptions.html http://tlt.its.psu.edu/suggestions/international/keyboards/winkey.html http://www.al-bab.com/arab/comp.htmTranslation and Language Issues American Translators Association: Getting It Right (insights into translation issues) http://www.atanet.org/publications/getting_it_right.php Basis Technology – Excellent papers & presentations on language issues. http://www.basistech.com/resources/ (The links on the left have more papers on topics such as Middle Eastern Languages, Digital Forensics, etc.)
    • Resources: Google Searches for Some IDN ccTLDs. Republic of Korea: 한국 http://www.google.com/search?q=site%3A.한국 Serbia: СРБ http://www.google.com/search?q=site%3A%D0%A1%D0%A0%D0%91 People's Republic of China: 中国 http://www.google.com/search?q=site%3A.%E4%B8%AD%E5%9B%BD http://www.google.com/search?q=site%3A.%E4%B8%AD%E5%9C%8B Hong Kong SAR: 香港 http://www.google.com/search?q=site%3A.%E9%A6%99%E6%B8%AF Taiwan: 台湾 http://www.google.com/search?q=site%3A.%E5%8F%B0%E6%B9%BE http://www.google.com/search?q=site%3A.%E5%8F%B0%E7%81%A3 Egypt: مصر http://www.google.com/search?q=site%3A.مصر Jordan: الاردن http://www.google.com/search?q=site%3A.%D8%A7%D9%84%D8%A7%D8%B1%D8%AF%D9%86 Saudi Arabia: السعودية http://www.google.com/search?q=site%3A.%D8%A7%D9%84%D8%B3%D8%B9%D9%88%D8%AF%D9%8A%D8%A9 Russian Federation: РФ http://www.google.com/search?q=site%3A.%D0%A0%D0%A4
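The percent-encoded search addresses above can be generated rather than typed. The sketch below assumes the same "site:" query pattern as the listed links and uses urllib.parse.quote from the standard library; the function name site_search_url is hypothetical.

```python
from urllib.parse import quote


def site_search_url(tld: str) -> str:
    """Build a Google 'site:' search URL for a TLD, percent-encoding non-ASCII."""
    query = f"site:.{tld.lstrip('.')}"
    # safe="" makes quote() encode the colon as %3A; non-ASCII becomes UTF-8 %XX.
    return "http://www.google.com/search?q=" + quote(query, safe="")


print(site_search_url("РФ"))
# -> http://www.google.com/search?q=site%3A.%D0%A0%D0%A4
```

The percent-encoded form is pure ASCII, so it survives copying between tools that mishandle Unicode or right-to-left text.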
    • Thank you. E-mail: Jon.Abolins@gmail.com ; Twitter: @jabolins ; Web: idn.MeydaOnline.com