• 2005-02-08

Sorry to be blunt, but someone must have been asleep. The astonishing new phishing exploit that has generated a lot of commotion since the weekend relies on something so obvious that you don’t have to be a coder or know anything about what makes web browsers tick to understand the principle. Being a moderately competent web writer or designer will do.

It all relies on IDN, or Internationalized Domain Names. The possibility for domain names (and thus URIs) to be written in any character set covered by the Unicode standard is a necessary step towards the internationalization of the web: clearly a Good Thing. Modern browsers[1] support them, more or less well. In my tests of non-ASCII URIs, Opera came out best.

The evident problem with IDN comes from homographs. Usually, homographs are two or more words that are spelled the same but differ in meaning, e.g. fell (past tense of fall), fell (synonym of chop down) and fell (synonym of harmful). In the world of internationalization and multiple character encodings, homographs are letters or strings that look alike or very similar, but nonetheless belong to different Unicode code charts or blocks.

In the example given by the Shmoo Group (who published the exploit), the phony URL is written as http://www.p&#1072;ypal.com/. When this address is handed to a browser via an href attribute (in a link), the &#1072; entity resolves to а. Looks like an a, right? Except that it isn’t. Decimal 1072 gives us hexadecimal 430, and looking this up in the Unicode data sheet, we find that &#1072; corresponds to CYRILLIC SMALL LETTER A. Which is not the same as LATIN SMALL LETTER A, hexadecimal 61 (decimal 97), the plain a that the correct paypal domain uses.
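If you want to check the lookup yourself, here is a minimal Python sketch using only the standard library. The helper name `describe` is my own invention for illustration; the numeric reference 1072 is the one from the spoofed link.

```python
import html
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Resolve the numeric character reference the way a browser would.
spoofed_char = html.unescape("&#1072;")
latin_char = "a"

print(describe(spoofed_char))      # U+0430 CYRILLIC SMALL LETTER A
print(describe(latin_char))        # U+0061 LATIN SMALL LETTER A
print(spoofed_char == latin_char)  # False
```

The two characters render almost identically in most fonts, yet they compare as unequal, which is all a domain lookup cares about.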

Cyrillic а and Latin a are different by definition, since they belong to two different writing systems. Not accounting for homographs when giving out internationalized domain names leads to the nightmare of multiple different URLs that are visually indistinguishable.
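One way a browser or registrar could account for this is to flag any domain label that mixes writing systems. What follows is only a rough sketch of that idea in Python, not a description of what any browser actually does. The function names are hypothetical, and guessing the script from the first word of a character's Unicode name is a simplifying assumption; a serious check would use the real Unicode script property.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the scripts used in a label via Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # The first word of the name is e.g. 'LATIN' or 'CYRILLIC'.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(hostname: str) -> bool:
    """True if any dot-separated label contains letters from several scripts."""
    return any(len(scripts_in_label(label)) > 1 for label in hostname.split("."))


genuine = "www.paypal.com"
spoofed = "www.p\u0430ypal.com"  # second letter is CYRILLIC SMALL LETTER A

print(looks_mixed(genuine))  # False
print(looks_mixed(spoofed))  # True

# The ASCII form that actually goes over the wire differs as well.
print(genuine.encode("idna"))
print(spoofed.encode("idna"))
```

Note that a label written entirely in Cyrillic look-alikes would pass this test, so it narrows the problem rather than solving it.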

P.S.: Follow the first link in this post (to Boing Boing) and/or update your browser to protect yourself. Most importantly: always type in the URL of security-sensitive sites by hand.

Note: I first wrote “homophones” in the title, even though I knew I was going to write about homographs. This is the reason the permalink is so ugly. My bad.


[1]: And since Internet Explorer only understands ASCII domain names, it is not affected by this vulnerability.


1 comment(s) for 'Dangerous homographs'

  1. blacky (Comment, 2005-02-13 00:34)

    What’s even more embarrassing is the fact that when IDNs were proposed, this exploit was known and documented, and proposals were made to remedy it. A very classic case of all the people who thought twice now being able to say “Told you so”.